research article open access characterization of … · research article open access...

18
RESEARCH ARTICLE Open Access Characterization of the tandem CWCH2 sequence motif: a hallmark of inter-zinc finger interactions Minoru Hatayama 1 , Jun Aruga 1,2* Abstract Background: The C2H2 zinc finger (ZF) domain is widely conserved among eukaryotic proteins. In Zic/Gli/Zap1 C2H2 ZF proteins, the two N-terminal ZFs form a single structural unit by sharing a hydrophobic core. This structural unit defines a new motif comprised of two tryptophan side chains at the center of the hydrophobic core. Because each tryptophan residue is located between the two cysteine residues of the C2H2 motif, we have named this structure the tandem CWCH2 (tCWCH2) motif. Results: Here, we characterized 587 tCWCH2-containing genes using data derived from public databases. We categorized genes into 11 classes including Zic/Gli/Glis, Arid2/Rsc9, PacC, Mizf, Aebp2, Zap1/ZafA, Fungl, Zfp106, Twincl, Clr1, and Fungl-4ZF, based on sequence similarity, domain organization, and functional similarities. tCWCH2 motifs are mostly found in organisms belonging to the Opisthokonta (metazoa, fungi, and choanoflagellates) and Amoebozoa (amoeba, Dictyostelium discoideum). By comparison, the C2H2 ZF motif is distributed widely among the eukaryotes. The structure and organization of the tCWCH2 motif, its phylogenetic distribution, and molecular phylogenetic analysis suggest that prototypical tCWCH2 genes existed in the Opisthokonta ancestor. Within-group or between-group comparisons of the tCWCH2 amino acid sequence identified three additional sequence features (site-specific amino acid frequencies, longer linker sequence between two C2H2 ZFs, and frequent extra-sequences within C2H2 ZF motifs). Conclusion: These features suggest that the tCWCH2 motif is a specialized motif involved in inter-zinc finger interactions. Background Zinc finger (ZF) domains (ZFDs) are found in a large number of eukaryotic proteins [1-3]. A single ZF nor- mally forms a globular structure that is stabilized by binding a zinc ion. Many classes of ZFs have been described in public databases (Pfam, http://pfam.sanger. ac.uk/; Prosite, http://au.expasy.org/prosite/; Interpro, http://www.ebi.ac.uk/interpro/; SMART, http://smart. embl-heidelberg.de/), with Cys-Xn-Cys-Xn-His-Xn-His (C2H2) being one of the most common. Most C2H2 ZF proteins contain tandem arrays of the C2H2 motif, which are located at specific intervals to form a func- tional domain. ZFDs were originally identified as the DNA-binding domain of transcription factor IIIA (TFIIIa) and other transcription factors, but accumulat- ing evidence suggests that ZFDs also bind RNA and proteins [4-6]. Since ZFDs are essential for many biolo- gical processes, their structure-function relationships have been well studied. For the DNA-binding ZFs, the structures of ZFD-DNA complexes have been elucidated for GLI [7], TFIIIA [8], zif268 [9], YY1 [10], and WT1 [11]. These findings have enabled the development of synthetic ZF proteins as versatile molecular tools [12]. Despite their important and widespread roles, the struc- tural basis of ZFD-protein interactions is less under- stood than those of ZFD-DNA interactions. The identification of the critical structural features of protein-binding ZFDs remains elusive [5] but some clues are available for a group of ZFs that mediate pro- tein-to-protein interactions [4-6]. Previously, we studied the intra-molecular protein-to-protein interaction between two adjacent C2H2 ZFs [13]. The two N-term- inal ZFs (ZF1 and ZF2) of human ZIC3, which has a ZFD composed of five ZFs (ZF1-ZF5), form a single structural unit through a common hydrophobic core. * Correspondence: [email protected] 1 Laboratory for Behavioral and Developmental Disorders, RIKEN Brain Science Institute, Wako-shi, Saitama 351-0198, Japan Hatayama and Aruga BMC Evolutionary Biology 2010, 10:53 http://www.biomedcentral.com/1471-2148/10/53 © 2010 Hatayama and Aruga; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Upload: dangbao

Post on 29-May-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

RESEARCH ARTICLE Open Access

Characterization of the tandem CWCH2 sequencemotif a hallmark of inter-zinc finger interactionsMinoru Hatayama1 Jun Aruga12

Abstract

Background The C2H2 zinc finger (ZF) domain is widely conserved among eukaryotic proteins In ZicGliZap1C2H2 ZF proteins the two N-terminal ZFs form a single structural unit by sharing a hydrophobic core Thisstructural unit defines a new motif comprised of two tryptophan side chains at the center of the hydrophobiccore Because each tryptophan residue is located between the two cysteine residues of the C2H2 motif we havenamed this structure the tandem CWCH2 (tCWCH2) motif

Results Here we characterized 587 tCWCH2-containing genes using data derived from public databases Wecategorized genes into 11 classes including ZicGliGlis Arid2Rsc9 PacC Mizf Aebp2 Zap1ZafA Fungl Zfp106Twincl Clr1 and Fungl-4ZF based on sequence similarity domain organization and functional similarities tCWCH2motifs are mostly found in organisms belonging to the Opisthokonta (metazoa fungi and choanoflagellates) andAmoebozoa (amoeba Dictyostelium discoideum) By comparison the C2H2 ZF motif is distributed widely amongthe eukaryotes The structure and organization of the tCWCH2 motif its phylogenetic distribution and molecularphylogenetic analysis suggest that prototypical tCWCH2 genes existed in the Opisthokonta ancestor Within-groupor between-group comparisons of the tCWCH2 amino acid sequence identified three additional sequence features(site-specific amino acid frequencies longer linker sequence between two C2H2 ZFs and frequent extra-sequenceswithin C2H2 ZF motifs)

Conclusion These features suggest that the tCWCH2 motif is a specialized motif involved in inter-zinc fingerinteractions

BackgroundZinc finger (ZF) domains (ZFDs) are found in a largenumber of eukaryotic proteins [1-3] A single ZF nor-mally forms a globular structure that is stabilized bybinding a zinc ion Many classes of ZFs have beendescribed in public databases (Pfam httppfamsangeracuk Prosite httpauexpasyorgprosite Interprohttpwwwebiacukinterpro SMART httpsmartembl-heidelbergde) with Cys-Xn-Cys-Xn-His-Xn-His(C2H2) being one of the most common Most C2H2 ZFproteins contain tandem arrays of the C2H2 motifwhich are located at specific intervals to form a func-tional domain ZFDs were originally identified as theDNA-binding domain of transcription factor IIIA(TFIIIa) and other transcription factors but accumulat-ing evidence suggests that ZFDs also bind RNA and

proteins [4-6] Since ZFDs are essential for many biolo-gical processes their structure-function relationshipshave been well studied For the DNA-binding ZFs thestructures of ZFD-DNA complexes have been elucidatedfor GLI [7] TFIIIA [8] zif268 [9] YY1 [10] and WT1[11] These findings have enabled the development ofsynthetic ZF proteins as versatile molecular tools [12]Despite their important and widespread roles the struc-tural basis of ZFD-protein interactions is less under-stood than those of ZFD-DNA interactionsThe identification of the critical structural features of

protein-binding ZFDs remains elusive [5] but someclues are available for a group of ZFs that mediate pro-tein-to-protein interactions [4-6] Previously we studiedthe intra-molecular protein-to-protein interactionbetween two adjacent C2H2 ZFs [13] The two N-term-inal ZFs (ZF1 and ZF2) of human ZIC3 which has aZFD composed of five ZFs (ZF1-ZF5) form a singlestructural unit through a common hydrophobic core

Correspondence jarugabrainrikenjp1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

copy 2010 Hatayama and Aruga licensee BioMed Central Ltd This is an Open Access article distributed under the terms of the CreativeCommons Attribution License (httpcreativecommonsorglicensesby20) which permits unrestricted use distribution andreproduction in any medium provided the original work is properly cited

This ZF-connecting hydrophobic core is associated withtwo tryptophan residues each of which is locatedbetween the zinc-binding cysteine residues in ZF1 andZF2 Mutation of the tryptophan in ZF1 (W255G) per-turbs the subcellular localization and function of theprotein and is associated with a pathological and conge-nital heart malformation [1314] The importance ofthese tryptophan residues is further supported by theirconservation among more than 40 Zic proteins identi-fied in a wide range of eumetazoan species [15]Previous studies have revealed that human GLI1

(PDBID 2GLI) and yeast Zap1 (PDBID 1ZW8) alsopossess ZFs with two tryptophans in the correspondingregion and that these tryptophans are located in thehydrophobic core formed by two adjacent ZFs [16]These data raise the possibility that domains containingthe consensus sequence ldquoCys-X-Trp-Xn-Cys-Xn-His-Xn-His-Xn-Cys-X-Trp-Xn-Cys-Xn-His-Xn-Hisrdquo whichwe have named the tandem CWCH2 (tCWCH2) motifare involved in the interaction between the two adjacentZFs Since our knowledge of the tCWCH2 motif is lim-ited to a small group of proteins the biological signifi-cance of the tCWCH2 structure is unclearIn the present study we performed a computer-based

analysis of the tCWCH2 motif using sequence dataderived from public databases We classified thetCWCH2-containing sequences into gene classes andthen determined their conservation status in each pro-tein family and their phylogenic distribution We alsoidentified sequence features unique to tCWCH2-con-taining ZFDs The significance of the tCWCH2 motif isdiscussed in terms of its possible structural and func-tional roles

ResultsThree-dimensional structure of known tCWCH2 motifsTo investigate the positions of the two key tryptophanresidues in different tCWCH2 motifs we performed astructural alignment to compare the three-dimensional(3D) structures of the tCWCH2 motifs from ZIC3(PDBID 2RPC) GLI1 (PDBID 2GLI) and Zap1 (PDBID1ZW8) ZFDs (Figure 1A) with that of two C2H2 ZFsOur data show that all of these C2H2 ZF motifs formglobular structures composed of two anti-parallel bsheets and an a helix (bba) The two tryptophan resi-dues between the two zinc-chelating cysteine residues(Figure 1B) were localized onto the anti-parallel b sheetwhere they were juxtaposed to each other The relativepositions of the two tryptophans were similar amongthe three tCWCH2 structures as was each zinc-chelat-ing residue The tryptophan side chains were close tothe zinc-chelating histidine in the same ZF and thehydrophobic residues in the same and the opposite ZF(within 5Aring Additional file 1) This result suggested that

the tryptophans play a role in both the stabilization oftheir own ZFs and the formation of a hydrophobic corebetween the two adjacent ZFs The conserved position-ing of the tCWCH2 motif-forming residue in evolutio-narily distant sources led us to investigate itsphylogenetic distribution and structural features

Database search for tCWCH2-containing sequencesWe performed a comprehensive search of tCWCH2motif-containing genes in current databases We per-formed an initial search of the non-redundant NCBIprotein database with PHI-BLAST using the tCWCH2pattern deduced from the ZIC comparison and humanZIC3 (NP_003404) as the query sequence This searchyielded a set of 709 amino acid sequences that were ten-tatively classified into 11 gene classes according to theirannotations and the pilot phylogenetic tree analyses(see Methods Figure 2) For the fine analysis ofthe tCWCH2-containing sequences we performedTBLASTN and PBLAST searches of the non-redundantsequence collection of the NCBI database using therepresentative sequences of the initial 11 gene classesand 3 specific genes (two Dictyostelium genes and oneMonosiga gene) as the key sequences In order to exam-ine the distribution of the tCWCH2 sequence motif in aphylogenetic tree of eukaryotes TBLASTN searcheswere also performed on a whole-genome shotgunsequence database of 24 organisms The sequencesobtained were manually checked to remove any redun-dant sequences The final tCWCH2 collection contained587 sequences that represented the non-redundanttCWCH2-containing sequences from a wide range oforganisms (Additional file 2) Because the Mizf andZap1ZafA family genes and one Dictyostelium genecontain two independent tCWCH2 motifs our collec-tion included a total of 637 independent sequenceregions for tCWCH2At this point we re-evaluated the initial sequence pat-tern for PHI-BLAST because it was based solely on thesequence compilation of the Zic family genes The initialPHI-BLAST pattern recovered 90 (572637) of the col-lected sequences (Additional file 3) Among the 65 non-matching sequences 20 sequences showed deviation inthe C W and H residues of the tCWCH2 motif and 50sequences showed derangement of the sequence lengthbetween the two CWCH2 motifs This mismatchappeared to be distributed randomly and suggested thatoptimization of the PHI-BLAST pattern was unlikely tobe informative Therefore we optimized the lengths ofthe intervening sequences PHI-BLAST searches usingthe original pattern ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo with a tCWCH2consensus sequence (see Methods) yielded 911sequences of which 459 sequences were included in our

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 2 of 18

non-redundant tCWCH2 collection This number corre-sponded to 83 of the tCWCH2 sequences in the NCBIprotein database (total = 551) A looser patternldquoxCxWx(135)Cx(515)Hx(25)Hx(542)CxWx(16)Cx(531)Hx(25)Hxrdquo yielded 968 sequences that contains469 tCWCH2 sequences (85 of the total) (Additionalfile 3) The sequences identified by the looser patternincluded Zfp407 sequences that contain a pair ofCWCH2 ZFs separated by another two C2H2 ZFsThese data suggest that decreasing the stringency of thesearch (increasing the intervening arbitrary residuenumber) impairs the fidelity of the search Thereforewe consider that the original PHI-BLAST pattern ismore useful for the tCWCH2 survey

Classification and domain structure of tCWCH2sequence-containing proteinsWe classified the non-redundant tCWCH2-containingsequences based on the available flanking amino acidsequences of the tCWCH2 domain (Figure 2) Eachgene contained one or two tCWCH2 motifs among atotal of 2-9 C2H2 ZF motifs Proteins containing onlyone tCWCH2 domain suggest that a single tCWCH2constitutes the minimal ZFD component In the ZFDs

composed of both tCWCH2 and non-tCWCH2 C2H2tCWCH2s were always placed at the N-terminal end ofthe clustered ZFDsThe 587 sequences included one fungusmetazoan

common gene family (Arid2Rsc9) four metazoan genefamilies (ZicGliGlis Mizf Aebp2 Zfp106) three fungusgene families (PacC Zap1ZafA Clr1) two social amoebagenes (named here as DdHp) and one Monosiga gene Inaddition we identified three novel fungal gene groups(Tandem-CWCH2-protein C-terminal Fungus-Gli-likeand Fungus-Gli-like-4ZF) We have summarized thestructural features of each tCWCH2 gene class togetherwith major gene functions below and in Figure 2

Function structure and classification of the gene classescontaining tCWCH2ZicGliGlisThe proteins encoded by these three gene classes med-iate various processes in animal development [17-23]Their ZFD is commonly composed of five C2H2 ZFsand can bind DNA [724-26]Arid2Rsc9Arid2Rsc9 genes are known to encode components ofan ATP-dependent chromatin remodeling complex

Figure 1 Structures of tCWCH2 sequence motifs from three different proteins (A) Superimposition of the 3D structures of ZIC3 GLI1 andZap1 tCWCH2 is shown in stereo view Backbones of the protein structures are indicated by the flat ribbon model The side chains of twoconserved tryptophan residues in tCWCH2 are indicated by the stick model Red = ZIC3 (PDBID 2RPC) Blue = GLI1 (PDBID 2GLI) Green = Zap1(PDBID 1ZW8) (B) Amino acid sequence alignment of the tCWCH2 regions Zinc-chelating cysteine and histidine residues are shown in grayboxes and conserved tryptophan residues are shown in white letters with black boxes In (A) and (B) the gray lines with arrowheads indicate thesequence between the two CWCH2 motifs (linker sequence)

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 3 of 18

Figure 2 Domain structure and function of tCWCH2-containing proteins The 11 gene classes are listed in descending order of the numberof representative genes in each class A Monosiga gene and two Dictyostelium genes which do not belong to these classes are indicated Genename is indicated at the left of each row (open box = C2H2 ZF gray box = CWCH2 gray boxes linked with thick lines = tCWCH2 open boxwith curved ends = other domains) Representative genes and their amino acid length are indicated on the left and references are shown onthe right Hs Homo sapiens Dm Drosophila melanogaster Sc Saccharomyces cerevisiae Afu Aspergillus fumigatus Spo Schizosaccharomycespombe Ro Rhizopus oryzae Mb Monosiga brevicollis Dd Dictyostelium discoideum

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 4 of 18

[27-29] The ARID (AT-rich interaction domain) andtCWCH2 domains are conserved in Arid2 and Rsc9(Additional file 4) Another conserved sequence motifwas found in the C-terminal flanking region of thetCWCH2 motif This sequence motif is summarized asldquoIxL (ST) AxL (IV) L (KR) N (IL) times (KR)rdquo and wenamed this motif the LIL domain PHI-BLAST searchesfor LIL domains suggested that this domain was distrib-uted only in the Arid2Rsc9 proteins (data not shown)PacCPacC mediates gene expression regulation by ambientpH in filamentous fungi and yeasts [3031] It containsan N-terminal DNA binding domain with three ZFsOther ZF domains named A B and C are involved inresponses to pH change [3233]MizfMizf was originally reported as a seven ZF-containingprotein [34-36] We identified two additional C2H2 ZFsin the gene sequence encoding this protein (ZF2 andZF8 in Additional file 5) The two tCWCH2 motifs inthe first four ZFs (ZF1-ZF4) were separated by longerintervening sequences (gt35 aa) than most ZFsAebp2Mammalian Aebp2 is a DNA binding transcription fac-tor [37] Its Drosophila homolog jing is required forcellular differentiation [38-42]Zap1ZafAThe proteins encoded by Zap1ZafA should be groupedtogether because of the similarity in structure of their ZFdomains (Additional file 6) and similarities in their func-tional roles in zinc homeostasis [43-49] Our alignments ofZap1ZafA sequences from seven fungal species (Ustilagomaydis Cryptococcus neoformans Aspergillus fumigatusMagnaporthe grisea Candida albicans Saccharomyces cer-evisiae and Yarrowia lipolytica) reveal significant similarityin their ZFDs (Additional file 6) The ZFDs contain a maxi-mum of eight C2H2 ZF motifs in which the two N-terminalZF pairs (ZF1-ZF2 ZF3-ZF4) can form tCWCH2 motifsZF1-2 and ZF3-4 are separated by longer interveningsequences than most ZFs Our alignments showed thatconservation of ZF1 ZF2 and ZF8 was incompleteFungus-Gli-like (Fungl)The amino acid sequences encoded by this group ofgenes have been known as ldquofungus Gli likerdquo Here wehave named these sequences Fungl (Fungus-Gli-like)(Additional file 7 data not shown)Zfp106Zfp106 is a metazoan ZF protein of unknown function[5051] It contains two pairs of ZFs at each end WD40repeats (IPR001680 Zfp106) are known to mediate pro-tein-protein interactions [52-55]Tandem-CWCH2-protein-C-terminal (Twincl)The sequences are composed of 468 to 1458 aminoacid residues (data not shown) On the basis of these

structural features (Figure 2 Additional file 8) wenamed this gene family Twincl (Tandem CWCH2 pro-tein C-terminal)Clr1The ZFs among members of the Clr1 family [56] werehighly divergent and other sequences were not con-served (Additional file 9 data not shown) Therefore weomitted this gene family from the consensus sequenceand phylogenetic tree analysesFungus-Gli-like-4ZF (Fungl-4ZF)This gene group includes annotations from Rhizopusoryzae and Batrachochytrium dendrobatidis representingthe fungal phyla Mucoromycotina and Chytridiomycotarespectively The proteins encoded by the Fungl-4ZFgene family contain four ZFs that are weakly similar tothose in Gli and Fungl (Additional file 7) Based on theZF motif organization we tentatively grouped themseparately from Fungl However based on theirsequence similarity and phylogenetic classification it ispossible that the Fungl-4ZF and Fungl gene groups arederived from a common ancestral gene [5758]Dictyostelium discoideum tCWCH2The social amoeba Dictyostelium discoideum belongs tothe Amoebozoa supergroup There are two tCWCH2-containing genes in this gene class DdHp1 (Dictyoste-lium discoideum hypothetical protein) and DdHp2 TheZFD alignment of these genes is shown in Additionalfile 10Monosiga brevicollis tCWCH2Since Monosiga brevicollis is the only species among thechoanoflagellates in which the genome has been fullysequenced the presence of one tCWCH2 sequence(Additional file 10) may be more functionally meaning-ful than the singleton sequences in metazoa and fungiThe existence of an RFX domain (position 376-423)raised the possibility that the tCWCH2 sequence inMonosiga brevicollis and Arid2Rsc9 is derived from acommon ancestral geneUnclassified sequencesSix sequences were not classified into the above 13 cate-gories (Additional file 2) Four out of the six unclassifiedsequences contained weak similarities to PacC FunglFungl-4ZF and Mizf The remaining two sequencesfrom fungi did not show any significant homology tothe other sequencesIsolated tCWCH2 in metazoan gene familiesWe found rare occurrences of the tCWCH2 motif insome metazoan gene families including three tCWCH2-containing sequences in the Sp8 family (n = 37) one inthe TRP2 family (n = 8) one in the Kruumlppel-like factor5 family (n = 14) and one in the pbrm-1 family(n = 21) (Additional file 2) We did not include thesegenes in the following analysis because their significancewas not clear

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 5 of 18

tCWCH2 sequences in plant databasesTwo plant tCWCH2-containing sequences were foundOne sequence (XP_001771543) was the REF6 homologof a moss (Bryophyta Physcomitrella patens) (Additionalfile 10) Although REF6 is conserved in many plants theCWCH2 was not found in other plant species includingArabidopsis thaliana Oryza sativa and Vitis vinifera(Additional file 10) The other sequence AK110182 wasfrom Oryza sativa but is highly homologous to PacCThis sequence was otherwise only detected in fungi andtherefore might represent contamination of Oryza sativawith fungal materialIn the course of the classification procedure we

extracted the sequences belonging to each gene familyirrespective of the presence of tCWCH2 motifs identi-fied by BLAST search using representative entire aminoacid sequences Within this group we examined theextent of conservation of the cysteine tryptophan andhistidine residues contributing to the tCWCH2 motifsThe analysis indicated that the tryptophans in CWCH2motif-constituting residues were generally conserved(87-100) (Table 1)

Similarity among tCWCH2 gene classesWe performed molecular phylogenetic analyses to iden-tify similarities among the tCWCH2 gene classes Forthis purpose we selected flanking ZFs containingtCWCH2 (tCWCH2+1ZF) because the tCWCH2 itselfyielded very little information on sequence similaritiespresumably because it is short In addition genes widely

distributed among fungi or metazoa were used for thisanalysis because our preliminary analysis suggested thatgenes with a narrow phylogenetic distribution (TwinclClr1 Fungl-4ZF Dictyostelium tCWCH2) had stronglydivergent sequences that may confound the phylogenetictree analyses [59] Using the tCWCH2+1ZF (3ZFs)amino acid sequences we constructed phylogenetictrees by the Bayesian inference (BI) maximal likelihood(ML) and neighbor joining (NJ) methods The 89 (80tCWCH2 + 9 TFIIIA) sequences representing a widephylogenic range were subjected to this analysis usingthe DNA-binding classical C2H2 ZFs of TFIIIA as out-group sequences (Figure 3 sequence alignment in Addi-tional file 11 detailed tree in BI ML NJ in Additionalfiles 12 13 and 14)Each class of tCWCH2-containing genes was strappedwith high bootstrap values in the BI ML and NJ meth-ods supporting the validity of the above classificationPhylogenetic tree analyses revealed the relationshipsamong the tCWCH2 gene families and revealed a novelrelationship among the metazoan Zic Gli and Glis genefamilies These three gene families were groupedtogether whereas Gli and Glis formed a sub-group inthe branch of the ZicGliGlis group It is known thatvertebrate Zic Gli and Glis are composed of six threeand three paralogs respectively Interestingly Glis2 andGlis3 were separately grouped with their insect homo-logs (DmGlis2 and DmGlis3 respectively) and sea ane-mone Nematostella homologs (NveGlis2 and NveGlis3respectively Additional files 12 13 and 14) indicatingthat the two paralogs existed in the common ancestorsof cnidarians and bilaterians On the other hand verte-brate Zic and Gli paralogs were not grouped with theinvertebrate homologs (Additional files 12 13 and 14)suggesting that vertebrate Zic and Gli paralogs weregenerated in vertebrate ancestorsFungl was determined to have sequence similarity with

the ZicGliGlis group The Fungl group branch wasalways located closest to the ZicGliGlis group in theBI ML and NJ trees but statistical support was weak(BI 93) The other gene groups with more than twoclasses of tCWCH2 motif were not strongly supported(BI lt70) However Zap1ZafA class and Aebp2 classproteins were grouped together by the BI and ML meth-ods and the PacC family was placed in the basal root ofthe BI and ML trees beside the other tCWCH2+ZF1sequences from metazoa and fungi

Distribution of the tCWCH2 sequences in thephylogenetic treeBased on the above analysis we investigated the phylo-geny of the tCWCH2 motif (Figure 3) In global termstCWCH2 sequences were detected in all of the organ-isms we examined in the Opisthokonta supergroup

Table 1 Conservation of the tryptophan residues for thetCWCH2 motif in each gene family

Gene n

ZicGliGlis 282 993

Arid2Rsc9 61 967

PacC 48 1000

MizfZF12 40 925

MizfZF34 40 1000

Aebp2 39 1000

Zap1ZafAZF12 10 1000

Zap1ZafAZF34 33 970

Fungl 32 1000

Zfp106 16 875

Twincl 13 1000

Clr1 4 1000

Fungus Gli like 4ZF 2 1000

Dictyostelium gene 3 1000

Monosiga gene 1 1000

total 624 982aThe percentages indicate conservation of the tryptophan residues in eachgene class In Mizf and Zap1ZafA there are two independent tCWCH2 motifsfor which ZF12 (ZF1-ZF2) indicates the N-terminally located one and ZF34(ZF3-ZF4) indicates the C-terminally located one

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 6 of 18

including metazoa fungi choanoflagellate (Monosigabrevicollis) and microsporidia (Encephalitozoon cuniculi)In the Amoebozoa supergroup we found two tCWCH2-containing genes in the social amoeba Dictyostelium dis-coideum but not in Entamoeba histolytica Apart fromOpisthokonta and Amoebozoa two sequences werefound in the Plantae databases but their meaning is notclear Thus it appears that the distribution of thetCWCH2 sequence motif is essentially limited to theUniconta (a collective term for the Opisthokonta andAmoebozoa supergroups) in current sequence databasesWe examined the distribution of each tCWCH2 motif

class in the phylogenetic tree (Figure 4) Arid2Rsc9 classgenes were most widely detected in both fungi and meta-zoa On the other hand tCWCH2 was not detected in thesequences from the order Saccharomycetales (NP_013579Saccharomyces cerevisiae XP_718578 Candida albicansXP_506030 Yarrowia lipolytica) ZicGliGlis Aebp2 andMizf were widely distributed in metazoan groups butZfp106 was detected only in Bilateralia except EcdysozoaPacC Zap1ZafA and Fungl class genes were widely dis-tributed among the fungi groups Distribution of Twinclclass genes was restricted to Aspergillus and its close rela-tives Clr1 class sequences were found only in Schizosac-charomyces and Cryptococcus Fungl-4ZF class genes wererecovered only from the basal fungi (Rhizopus oryzae andBatrachochytrium dendrobatidis)

Additional sequence features in the tCWCH2 motifsWe generated tCWCH2 consensus sequences for eachgene class as well as for all classes to identify any addi-tional sequence features The Prosite databasePDOC00028 alignment was used for the reference con-sensus sequence of general C2H2 The extent of conser-vation is indicated by the size of letters and theconsensus sequences are aligned graphically (Figure 5)Classical C2H2 ZFs are known to have a consensussequence of ldquo(FY)xCx2CxFx7Lx2Hx4Hrdquo [1] ThetCWCH2 motifs also contain conserved phenylalanineand leucine residues between the cysteine and histidineresidues Phenylalanine and leucine are hydrophobicresidues that generally mediate a hydrophobic interac-tion within a single ZF (data not shown) These dataindicate that the general structure of a single ZF is wellconserved in the tCWCH2 motif (Figure 6A) and thatthere are some tCWCH2-specific structures apart fromthe conserved tryptophan residuesFirstly a position-specific bias to hydrophobic residues

was found in the C-terminal region adjacent to the firsthistidine residue in ZF1 and ZF2 (j1 j2 in Figure 56A) Valine leucine and isoleucine were frequentlyfound at j1 position (341 220 and 353 respec-tively) and valine leucine isoleucine and methioninewere frequently found at j2 position (143 143391 and 242 respectively) in contrast to those in a

Figure 3 Phylogenic tree of tCWCH2 (A) Tree indicating similarities among the tCWCH2 gene classes Statistical analysis was performed usingthree ZFs (tCWCH2 and a ZF in the C-terminal flanking region) (B) Sub-tree of ZicGliGlis class gene The alignment for this analysis is shown inAdditional file 11 BI NL and NJ analyses were carried out to construct the molecular phylogenetic trees (Additional files 12 13 and 14) The treepattern is based on the BI tree Scores for each branch indicate the statistical support values obtained in each phylogenetic tree constructionmethod BI (postprobability)ML (bootstrap value)NJ (bootstrap value) The absence of scores (-) indicates the branches that were notsupported by the corresponding methods

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 7 of 18

compilation of general C2H2 sequences (PDOC00028)(valine 42 leucine 95 isoleucine 53 methionine78 Figure 6B) The difference in the summed fre-quency of each position was statistically significant (j1P lt 1 times 10-100 j2 P lt 1 times 10-100 in a c2 test) Thesepositions were close to the tryptophan residues on theopposite side of the ZF as determined in the superim-posed 3D structures of Zic Gli and Zap1 tCWCH2s(Figure 6A) The biased residue frequency may reflectthe spatial restriction of amino acid choices Since theCWCH2 structure suggests the presence of four hydro-phobic residues in a compact space the residue pairedwith each tryptophan could not be a residue with alarge side chain such as phenylalanine or tryptophanSecondly the linker sequence of the two ZFs

in tCWCH2 was significantly longer (118 plusmn 44 aa aver-age plusmn standard deviation) than those of generalC2H2 sequences (Prosite PDOC00028 81 plusmn 50 aa)(P lt 1 times 10-100 Mann-Whitney U-test Figure 7) Therewere no canonical TGE(KR)P-like sequences in thetCWCH2 linker sequences whereas the TGE(KR)P lin-ker sequence was found in about 65 of the sequences inthe PDOC00028 alignment Accordingly conservation of

the tCWCH2 linker sequences was generally low (Fig-ure 5) and their lengths were more strongly divergentthan those of TGE(KR)P ZFs (P = 3 times 10-7 F-test) Elon-gation of the linker was commonly found in the Zap1ZafA (ZF1-2 162 plusmn 57 aa) Mizf (ZF1-2 192 plusmn 44 aaZF3-4 175 plusmn 23 aa) and Arid2Rsc9 (140 plusmn 69 aa)classesThirdly extra sequences were inserted between the

tryptophan and cysteine residues in tCWCH2 This typeof insertion was limited to the sequences from the firstZF of ZicGliGlis Arid2Rsc9 and Zap1ZafA and thesecond ZF of Arid2Rsc9 classes The number of aminoacid residues between the two cysteines of tCWCH2varied from four to 34 and was mostly four residues inother classes [15] This insertion forms an extra loopingstructure in ZIC3 (PDBID 2RPC Figure 1 6A) [13]

DiscussionThe tCWCH2 motif is a hallmark of inter-zinc fingerinteractionsThe tCWCH2 motif was originally proposed as an evo-lutionarily conserved sequence involved in the interac-tion between ZFs in human ZIC3 [1315] The presence

Figure 4 Distribution of tCWCH2-containing genes in a eukaryotic phylogenetic tree Tree pattern (gray curved lines) is based on [78-80]The distribution of tCWCH2 sequences is indicated by the black curved line The distribution of each tCWCH2 gene class is indicated by acolored area The Monosiga tCWCH2 sequence may be derived from the common ancestor of Arid2Rsc9 (See Results)

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 8 of 18

Figure 5 Sequence conservation among the classes of tCWCH2 sequence motifs Amino acid sequences were aligned and consensussequences were generated (see Methods) Mizf ZF1-2 Mizf ZF3-4 and Zap1ZafA ZF1-2 and ZF3-4 are indicated separately The PDOC00028C2H2 consensus is shown at the bottom as a general C2H2 consensus sequence (indicated by the tandem repeat of the consensus sequence)The short insertion sequences listed below were initially located at the sites indicated by the colored arrowheads in the alignments describedabove but were separated to allow more comprehensive analyses These sequences represent the longer linker sequences and the insertion ofextra sequences described in the Results section To achieve this analysis we omitted the four sequences AAWT01013938 (Schmidteamediterranea Aebp2) CAG05504 (Tetraodon nigroviridis Gli) XP_001602003 (Nasonia vitripennis Gli) and XP_785526 (Strongylocentrotus purpuratusGli) because these sequences contain exceptionally divergent sequences that disrupt the alignments

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 9 of 18

of the tCWCH2 motif in the Gli family Glis familyZap1 and PacC has been previously reported [13]Superimposition analysis revealed a tightly conservedstructure among ZIC3 GLI and Zap1 supporting theimportance of the tCWCH2 motif in structural termsStrong conservation of the tCWCH2 tryptophan resi-dues in each gene class suggests that there has beenstrong selective pressure for them during evolutionOur analysis revealed additional structural features of

tCWCH2 besides the conserved tryptophans which maybe explained by adaptation of its role in mediating theinter-ZF interaction (Figure 8) Firstly there is a prefer-ence for valine leucine or isoleucine in the j1 and j2position This can be regarded as an adaptive adjust-ment for forming hydrophobic cores of the appropriatesize Furthermore hydrophobic interaction between j1and j2 is likely because they are closely located within4Aring distance in the 3D structure models of ZIC3 GLIand Zap1 (Additional file 1 data not shown) This alsocan influence the amino acid preferences in j1 and j2position Secondly elongation of the linker sequencesmay also reflect a structural adaptation to allow themotif to face the two globular C2H2 units Linker lengthand sequence are important for the function of theDNA-binding C2H2 ZF because they determine the dis-tance of each ZF and the linker contacts to the DNAbase [60-62] However optimization of the canonical

linker length for DNA-binding may not be enough toyield opposed positioning of ZF units The bendingbetween the two ZFs may be larger than is dictated bythe DNA binding state of the ZFs as revealed in theGLI-DNA complex structure [7] Longer linker sizeseems to be a general feature of tCWCH2 but conserva-tion of the linker sequences within each class variesnotably Linker conservation in PacC is particularlyhigh despite the widespread distribution of this gene infungi thus suggesting that the linker in PacC is underevolutionary constraintAnother structural feature the addition of extra

sequences is plausible if we assume the reduction ofstructural constraint as a consequence of specializationin the inter-ZF interaction For example the conservationof tCWCH2-forming ZF1 is lower than that of ZF2-5 inZic proteins [15] In the case of the GLI-DNA complexthe tCWCH2-forming ZF1 does not directly participatein DNA binding [12] and the sequence conservation ofGli family ZF1 is also lower than in ZF2-5 (Figure 5unpublished observation) The extra sequences in theZicGliGlis Zap1ZafA (ZF3-4) and AridRsc9 classesshow poor conservation (Figure 5)It is unclear if tCWCH2 is the only structure required

for inter-ZF interactions The C2H2 ZFD is capable ofachieving protein-to-protein interactions in many cases[6] Interestingly in a previous study to design synthetic

Figure 6 j1 and j2 positions of tCWCH2 (A) Stereo view of the tCWCH2 sequence motif A wire model of the main chains is shown withZIC3 GLI1 and Zap1 in green and TFIIIA and Zif268 in red The side chains of tryptophan residues in tCWCH2 and those in the j1 j2 residuesare indicated by cylinders ZIC3 GLI1 and Zap1 are gray cylinders and the others are red cylinders The corresponding positions (j1 j2) areindicated by red boxes in Figure 5 (B) Frequency of amino acid appearance in j1 j2 and those of general C2H2 (PDOC00028) is shown Valineleucine and isoleucine appear frequently at both positions and methionine frequently appears in j2 This table summarizes all of the tCWCH2sequences Highlighted numbers indicate the preferential residues for j1 and j2 in tCWCH2

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 10 of 18

Figure 7 Comparison of linker length between the tCWCH2 sequence motif and general C2H2 ZF The bars indicate the mean lengths ofthe linker sequence between C2H2 motifs in the indicated groups We defined the linker length as the number of amino acids between the lasthistidine of a ZF and the first cysteine of the C-terminally flanking ZF The PDOC00028 (general C2H2) alignment contained 12326 sequencesand we found 10025 linkers in it The general C2H2 sequences was further divided into two groups based on the presence or absence of ldquoTGE(KR)Prdquo The sequence numbers of each group tCWCH2 614 general C2H2 10025 ldquoTGE(KP)rdquo 4640 ldquonon-TGE(KR)Prdquo 5385) Error bar standarddeviation P lt 1 times 10-100 in Mann-Whitney U-test

Figure 8 Evolution of the tCWCH2 domain We propose that the tCWCH2 patterns were generated from classical C2H2 sequencesconcurrently with the acquisition of the additional sequence features

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 11 of 18

ZFs utilizing in vitro-evolving protein-to-protein interac-tions [16] it was found that the hydrophobic residuesin zif268 that interacted between the artificial proteinsand the zif268 C2H2 ZF were similar to those involvedin the GLI inter-ZF interaction Superimpositionrevealed that the position corresponding to the trypto-phan in tCWCH2 is occupied by a valine suggestingthat either intramolecular or intermolecular protein-to-protein interactions mediated by hydrophobic residuesare generally utilized for C2H2 ZFs However the strongconservation of the tCWCH2 and accompanying struc-tural features suggest that tCWCH2 is a highly specia-lized C2H2 ZF for inter-ZF interactions between twoadjacent ZFs

Evolution of the tCWCH2 motifWe hypothesize that the original tCWCH2 motifs weregenerated from classical C2H2 ZFs (Figure 8) that occurwidely in eukaryotic organisms including Plantae Exca-vata Chromalveolata and Rhizaria tCWCH2 motifswere found in the vicinity of C2H2 ZF in severaltCWCH2 gene classes It is proposed that the tCWCH2sequence appeared in the common ancestor of Opistho-konta or Unikonta (OpisthokontaAmoebozoa) Theacquisition of the tryptophans and additional structuralfeatures of tCWCH2 may have occurred concurrently inthe course of evolution Whether the evolution oftCWCH2 occurred once or multiple times duringOpisthokontaAmoebozoa evolution remains unclearHowever its short sequence and some isolated distribu-tion in the plant supergroup and metazoan gene familiesmay favor the multiple-origins hypothesis Although thelineage relationship among the tCWCH2 classes remainsunclear we can presume the presence of the followingtCWCH2 ancestral genes an Arid2Rsc9 commonancestor which may have existed before the diversifica-tion of metazoa and fungi a ZicGliGlis commonancestor an Aebp2 ancestor and a Mizf ancestor in theearly metazoa a Fungl ancestor a PacC ancestor aFungl-4ZF ancestor and a Zap1ZafA ancestor in theearly fungi a Zfp106 ancestor in early bilaterians aTwincl ancestor in the founder of Aspergillus and itsclose siblings and a Clr1 ancestor in the founder ofDikarya althogh most Dikarya may have lost the geneOnce the tCWCH2 structures were established theymay have been more strongly conserved in these classes

Function and role of tCWCH2There are several cases that directly indicate the func-tional importance of the tCWCH2 motif in living organ-isms In the case of PacC mutation of the tryptophan oftCWCH2 induces a loss of its DNA binding activity[3132] In case of human ZIC3 the tryptophan muta-tion is proposed to be associated with the occurrence of

a familial congenital heart defect and biochemical analy-sis revealed that transcriptional activity and protein sta-bility are decreased in the mutant protein [1314] Inyeast Zap1 interaction between the two ZF where thetCWCH2 motif is located is required for the high-affi-nity binding of the zinc ion [45] Also it has beenshown that the inter-ZF interaction is influenced by theconcentration of zinc ion [46] suggesting that thetCWCH2 motif is involved in zinc-sensing in yeastApart from the instances referred to above the func-

tional significance of the tCWCH2 domain remainsunknown We propose that tCWCH2 may play rolesother than the direct DNA binding that is often exertedby classical C2H2 because the linker lengths oftCWCH2 are more highly divergent than those of thecanonical C2H2 ZFs in which linker length is a criticalparameter for DNA-C2H2 ZF interactions [60-63]Furthermore the structure of GLI ZIC3 and Zap1tCWCH2 domains in which the two adjacent ZFs form asingle globular structure [71347] is clearly differentfrom the structure of the DNA-binding C2H2 ZF domainwhere each globular ZF unit wraps around the majorgroove of the DNA [7-1164] (PDBID GLI 2GLI YY11UBD WT1 2RPT Zif268 1MEY 1P47 TFIIIA 1TF3)In the case of GLI tCWCH2 which is the only tCWCH2with a known DNA-binding 3D structure the N-terminalZF of GLI1 tCWCH2 does not contact DNA [7]Assuming that the tCWCH2 domain has a function

other than DNA binding what roles could tCWCH2play Firstly our domain structure analysis implies thattCWCH2 likely regulates its neighboring domains suchas canonical C2H2 ZF WD40 and LIL ZicGliGlisPacC Mizf Aebp2 Zap1ZafA Fungl Clr1 and Fungl-4ZF possess canonical ZF(s) in the C-terminal flankingregion of the tCWCH2 In the case of PacC analyses ofCWCH2 motif-containing tryptophan mutants suggeststhat DNA binding mediated by ZF2 and ZF3 may beinfluenced by tCWCH2 motifs in ZF1 and ZF2 [32] Itis possible that tCWCH2 motifs in the other membersof this group act similarly in the regulation of neighbor-ing canonical C2H2 domains In transcription factorsDNA binding domains such as the canonical C2H2 ZFcan define a target sequence and fix the protein position(angle distance direction) with respect to the DNAThe tCWCH2 motif could provide a lsquohingersquo or modula-tory junction that controls the relative positioning ofother functional domains Secondly tCWCH2 motifsmay provide a capping structure that isolates the struc-tural influence of N-terminal flanking region In thecase of ZIC3 mutations in its tCWCH2 motif increasethe random coil contents more than the zinc-free state[13] A unique terminal structure in a domain composedof repeating structural motifs is also seen in other con-served domains [65-67] In the case of small leucine-rich

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 12 of 18

repeats and ankyrin repeats N-capping structures areresponsible for protein stability [6667] ThirdlytCWCH2 binding to other molecules is possible and isobserved for many conserved domains Transcriptionalregulator and chromatin remodeling proteins form com-plexes with many different proteins in order to be func-tionally active [6869] Therefore the tCWCH2 ZF maynot be associated with DNA binding but may serve inprotein recognition similar to many C2H2 ZFs [5]The distribution and conservation of the tCWCH2

sequence motif in various gene classes suggests that thismotif has important biological roles The genes we iden-tified containing one or more tCWCH2 motifs areknown to be involved in various biological processesincluding chromatin remodeling (Arid2Rsc9) zinchomeostasis (Zap1ZafA) pH sensing (PacC) cell cycleregulation and transcriptional regulation (ZicGliGlisPacC Mizf Aebp2 Zap1ZafA) In addition given thatseveral gene classes remain to be functionally character-ized (including Fungl Zfp106 Twincl Clr1 Fungl-4ZFMonosiga gene DdHp) the known functional repertoireof the tCWCH2 motif is likely to increase We proposethat the tCWCH2 sequence motif is a widespread andfunctional protein structural motif and that furtherstructural and functional analyses are required to fullyunderstand its roles in biological processes

Conclusions(1) The 3D structure of tCWCH2 is highly conservedamong human ZIC3 human GLI1 and yeast Zap1(2) Current databases contained 587 tCWCH2

sequences that can be classified into 11 major geneclasses (ZicGliGlis Arid2Rsc9 PacC Mizf Aebp2Zap1ZafA Fungl Zfp106 Twincl Clr1 and Fungl-4ZF)and other minority classes The tCWCH2 motifs werefound in transcription regulatory factors chromatinremodeling factors and pHzinc homeostasis regulatingfactors(3) The tCWCH2 motif is mostly found in Opistho-

konta (metazoa fungi and choanoflagellates) andAmoebozoa (amoeba Dictyostelium discoideum)(4) The tCWCH2 motif contains three additional struc-

tural features (i) bias to hydrophobic residues in the j1and j2 positions (ii) longer linkers between the two C2H2motifs and (iii) insertion of extra sequences The struc-tural features suggest that the tCWCH2 motif is a specia-lized motif involved in inter-zinc finger interactions

MethodsProtein structural analysisProtein structures were obtained from the NCBI(National Center for Biotechnology Information) databaseas PDB format files and we used iMol httpwwwpirxcomiMol and Swiss-PdbViewer httpspdbvvital-itch

as the PDB viewer applications The latter was also usedfor superimposing multiple protein structures Magic fitoptions were selected for the superimposition of structuredata because the number of amino acid residues was dif-ferent between proteins Zif268 (PDBID 1P47) andTFIIIA (PDBID 2J7J) were used for the reference struc-ture of the classical C2H2 ZF For the distance analysis ofthe neighboring amino acid residues we used ldquoNeighborsof Selected Residue Side chainrdquo option of the Swiss-PdbViewer The software showed the list of the neighbor-ing residues with distances We dealt with the residues ifthe same residues were detected as the neighbors inmore than 10 models of ZIC3 and Zap1 PDB files thatinclude 20 models respectively

Database searchWe conducted web-based BLAST searches using PatternHit Initiated (PHI)-BLAST of NCBI databases httpblastncbinlmnihgovBlastcgi For the sequence pattern of thePHI-BLAST searches we used ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo The initialsequence pattern was deduced from the compilation ofthe Zic family sequences [1570] and subsequently theintervening sequence lengths were modified to the abovepattern until no additional sequences appeared in thesearch The seed sequences obtained by PHI-BLASTsearch were first classified into gene families based ontheir annotation and on phylogenetic tree analyses(see below) Then a TBLASTN search was performedagainst (1) all of the following databases GenBank+RefSeqNucleotides+EMBL+DDBJ+PDB excluding whole-genomeshotgun sequences and expressed sequence tags (nr optionin the NCBI TBLASTN search) and (2) whole genomeshotgun sequences of the organisms shown in Table 2The whole genome sequence TBLASTN search was per-formed using representative sequences for each genefamily (Table 3) initially on a eukaryote-wide basis andthen on a metazoa-and-fungus-wide basis For the meta-zoa-and-fungus-wide search sequences closely related tothe selected genes were searched in the following gene-species combinations Arid2Rsc9 PacC Zap1ZafAFungl Twincl Clr1 and Fungl-4ZF were searched infungi and ZicGliGlis Arid2Rsc9 Mizf Aebp2 andZfp106 were searched in metazoa The target organismsare listed in Table 2 These organisms were selected onthe basis of the following criteria (1) high genomesequence coverage (gt5times) (2) member of a major Eukar-yote taxon or considered to be in a phylogeneticallyimportant position The collected sequences were manu-ally checked for the presence of the sequence of thetCWCH2 motif In addition we performed proteinBLAST searches using representative sequences for 16gene families (Aebp2 Arid2 Clr1 Csr1 Fungl Fungl-4ZFGli Glis Mizf PacC Rsc9 Twincl ZafA Zap1 Zfp106

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 13 of 18

Table 2 Eukaryotic organisms used for the TBLASTN search

Organism Supergroup Lower taxon

Entamoeba W Amoebozoa Entamoebida

Dictyostelium W Amoebozoa Mycetozoa

Paramecium W Chromalveolata Ciliata

Tetrahymena W Chromalveolata Ciliata

Guillardia W Chromalveolata Cryptophyta

Cryptosporidium W Chromalveolata Dinoflagella

Giardia W Excavata Diplomonadia

Euglena gracilis W Excavata Euglenozoa

Trichomonas vaginalis W Excavata Parabasalidea

Monosiga W Opisthokonta Choanoflagellata

Batrachochytrium dendrobatidis MF Opisthokonta Fungi Chytridiomycota

Aspergillus W Opisthokonta Fungi Dikarya Ascomycota

Aspergillus fumigatus MF Opisthokonta Fungi Dikarya Ascomycota

Candida W Opisthokonta Fungi Dikarya Ascomycota

Candida albicans MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans W Opisthokonta Fungi Dikarya Ascomycota

Neurospora W Opisthokonta Fungi Dikarya Ascomycota

Neurospora crassa MF Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces cerevisiae MF Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces pombe MF Opisthokonta Fungi Dikarya Ascomycota

Yarrowia lipolytica MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans MF Opisthokonta Fungi Dikarya Basidiomycota

Ustilago maydis MF Opisthokonta Fungi Dikarya Basidiomycota

Antonospora locustae MF Opisthokonta Fungi Microsporidia

Encephalitozoon W Opisthokonta Fungi Microsporidia

Encephalitozoon cuniculi MF Opisthokonta Fungi Microsporidia

Rhizopus oryzae MF Opisthokonta Fungi Mucoromycotina

Monosiga brevicollis MF Opisthokonta Metazoa Choanoflagellida

Homo W Opisthokonta Metazoa Chordata

Nematostella W Opisthokonta Metazoa Cnidaria

Nematostella vectensis MF Opisthokonta Metazoa Cnidaria Anthozoa

Platynereis dumerilii MF Opisthokonta Metazoa Cnidaria Anthozoa

Hydra magnipapillata MF Opisthokonta Metazoa Cnidaria Hydrozoa

Hydra vulgaris MF Opisthokonta Metazoa Cnidaria Hydrozoa

Lottia gigantea MF Opisthokonta Metazoa Mollusca Gastropoda

Trichoplax W Opisthokonta Metazoa Placozoa

Trichoplax adhaerens MF Opisthokonta Metazoa Placozoa Trichoplax

Schmidtea mediterranea MF Opisthokonta Metazoa Platyhelminthes

Chlamydomonas W Plantae Chlorophyta

Arabidopsis W Plantae Tracheophyta

Oryza sativa W Plantae Tracheophyta

Bigelowiella W Rhizaria Cercozoa

Species name and taxa are indicated W = Eukaryote-wide search MF = metazoa-and-fungus-wide search

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 14 of 18

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 2: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

This ZF-connecting hydrophobic core is associated withtwo tryptophan residues each of which is locatedbetween the zinc-binding cysteine residues in ZF1 andZF2 Mutation of the tryptophan in ZF1 (W255G) per-turbs the subcellular localization and function of theprotein and is associated with a pathological and conge-nital heart malformation [1314] The importance ofthese tryptophan residues is further supported by theirconservation among more than 40 Zic proteins identi-fied in a wide range of eumetazoan species [15]Previous studies have revealed that human GLI1

(PDBID 2GLI) and yeast Zap1 (PDBID 1ZW8) alsopossess ZFs with two tryptophans in the correspondingregion and that these tryptophans are located in thehydrophobic core formed by two adjacent ZFs [16]These data raise the possibility that domains containingthe consensus sequence ldquoCys-X-Trp-Xn-Cys-Xn-His-Xn-His-Xn-Cys-X-Trp-Xn-Cys-Xn-His-Xn-Hisrdquo whichwe have named the tandem CWCH2 (tCWCH2) motifare involved in the interaction between the two adjacentZFs Since our knowledge of the tCWCH2 motif is lim-ited to a small group of proteins the biological signifi-cance of the tCWCH2 structure is unclearIn the present study we performed a computer-based

analysis of the tCWCH2 motif using sequence dataderived from public databases We classified thetCWCH2-containing sequences into gene classes andthen determined their conservation status in each pro-tein family and their phylogenic distribution We alsoidentified sequence features unique to tCWCH2-con-taining ZFDs The significance of the tCWCH2 motif isdiscussed in terms of its possible structural and func-tional roles

ResultsThree-dimensional structure of known tCWCH2 motifsTo investigate the positions of the two key tryptophanresidues in different tCWCH2 motifs we performed astructural alignment to compare the three-dimensional(3D) structures of the tCWCH2 motifs from ZIC3(PDBID 2RPC) GLI1 (PDBID 2GLI) and Zap1 (PDBID1ZW8) ZFDs (Figure 1A) with that of two C2H2 ZFsOur data show that all of these C2H2 ZF motifs formglobular structures composed of two anti-parallel bsheets and an a helix (bba) The two tryptophan resi-dues between the two zinc-chelating cysteine residues(Figure 1B) were localized onto the anti-parallel b sheetwhere they were juxtaposed to each other The relativepositions of the two tryptophans were similar amongthe three tCWCH2 structures as was each zinc-chelat-ing residue The tryptophan side chains were close tothe zinc-chelating histidine in the same ZF and thehydrophobic residues in the same and the opposite ZF(within 5Aring Additional file 1) This result suggested that

the tryptophans play a role in both the stabilization oftheir own ZFs and the formation of a hydrophobic corebetween the two adjacent ZFs The conserved position-ing of the tCWCH2 motif-forming residue in evolutio-narily distant sources led us to investigate itsphylogenetic distribution and structural features

Database search for tCWCH2-containing sequencesWe performed a comprehensive search of tCWCH2motif-containing genes in current databases We per-formed an initial search of the non-redundant NCBIprotein database with PHI-BLAST using the tCWCH2pattern deduced from the ZIC comparison and humanZIC3 (NP_003404) as the query sequence This searchyielded a set of 709 amino acid sequences that were ten-tatively classified into 11 gene classes according to theirannotations and the pilot phylogenetic tree analyses(see Methods Figure 2) For the fine analysis ofthe tCWCH2-containing sequences we performedTBLASTN and PBLAST searches of the non-redundantsequence collection of the NCBI database using therepresentative sequences of the initial 11 gene classesand 3 specific genes (two Dictyostelium genes and oneMonosiga gene) as the key sequences In order to exam-ine the distribution of the tCWCH2 sequence motif in aphylogenetic tree of eukaryotes TBLASTN searcheswere also performed on a whole-genome shotgunsequence database of 24 organisms The sequencesobtained were manually checked to remove any redun-dant sequences The final tCWCH2 collection contained587 sequences that represented the non-redundanttCWCH2-containing sequences from a wide range oforganisms (Additional file 2) Because the Mizf andZap1ZafA family genes and one Dictyostelium genecontain two independent tCWCH2 motifs our collec-tion included a total of 637 independent sequenceregions for tCWCH2At this point we re-evaluated the initial sequence pat-tern for PHI-BLAST because it was based solely on thesequence compilation of the Zic family genes The initialPHI-BLAST pattern recovered 90 (572637) of the col-lected sequences (Additional file 3) Among the 65 non-matching sequences 20 sequences showed deviation inthe C W and H residues of the tCWCH2 motif and 50sequences showed derangement of the sequence lengthbetween the two CWCH2 motifs This mismatchappeared to be distributed randomly and suggested thatoptimization of the PHI-BLAST pattern was unlikely tobe informative Therefore we optimized the lengths ofthe intervening sequences PHI-BLAST searches usingthe original pattern ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo with a tCWCH2consensus sequence (see Methods) yielded 911sequences of which 459 sequences were included in our

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 2 of 18

non-redundant tCWCH2 collection This number corre-sponded to 83 of the tCWCH2 sequences in the NCBIprotein database (total = 551) A looser patternldquoxCxWx(135)Cx(515)Hx(25)Hx(542)CxWx(16)Cx(531)Hx(25)Hxrdquo yielded 968 sequences that contains469 tCWCH2 sequences (85 of the total) (Additionalfile 3) The sequences identified by the looser patternincluded Zfp407 sequences that contain a pair ofCWCH2 ZFs separated by another two C2H2 ZFsThese data suggest that decreasing the stringency of thesearch (increasing the intervening arbitrary residuenumber) impairs the fidelity of the search Thereforewe consider that the original PHI-BLAST pattern ismore useful for the tCWCH2 survey

Classification and domain structure of tCWCH2sequence-containing proteinsWe classified the non-redundant tCWCH2-containingsequences based on the available flanking amino acidsequences of the tCWCH2 domain (Figure 2) Eachgene contained one or two tCWCH2 motifs among atotal of 2-9 C2H2 ZF motifs Proteins containing onlyone tCWCH2 domain suggest that a single tCWCH2constitutes the minimal ZFD component In the ZFDs

composed of both tCWCH2 and non-tCWCH2 C2H2tCWCH2s were always placed at the N-terminal end ofthe clustered ZFDsThe 587 sequences included one fungusmetazoan

common gene family (Arid2Rsc9) four metazoan genefamilies (ZicGliGlis Mizf Aebp2 Zfp106) three fungusgene families (PacC Zap1ZafA Clr1) two social amoebagenes (named here as DdHp) and one Monosiga gene Inaddition we identified three novel fungal gene groups(Tandem-CWCH2-protein C-terminal Fungus-Gli-likeand Fungus-Gli-like-4ZF) We have summarized thestructural features of each tCWCH2 gene class togetherwith major gene functions below and in Figure 2

Function structure and classification of the gene classescontaining tCWCH2ZicGliGlisThe proteins encoded by these three gene classes med-iate various processes in animal development [17-23]Their ZFD is commonly composed of five C2H2 ZFsand can bind DNA [724-26]Arid2Rsc9Arid2Rsc9 genes are known to encode components ofan ATP-dependent chromatin remodeling complex

Figure 1 Structures of tCWCH2 sequence motifs from three different proteins (A) Superimposition of the 3D structures of ZIC3 GLI1 andZap1 tCWCH2 is shown in stereo view Backbones of the protein structures are indicated by the flat ribbon model The side chains of twoconserved tryptophan residues in tCWCH2 are indicated by the stick model Red = ZIC3 (PDBID 2RPC) Blue = GLI1 (PDBID 2GLI) Green = Zap1(PDBID 1ZW8) (B) Amino acid sequence alignment of the tCWCH2 regions Zinc-chelating cysteine and histidine residues are shown in grayboxes and conserved tryptophan residues are shown in white letters with black boxes In (A) and (B) the gray lines with arrowheads indicate thesequence between the two CWCH2 motifs (linker sequence)

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 3 of 18

Figure 2 Domain structure and function of tCWCH2-containing proteins The 11 gene classes are listed in descending order of the numberof representative genes in each class A Monosiga gene and two Dictyostelium genes which do not belong to these classes are indicated Genename is indicated at the left of each row (open box = C2H2 ZF gray box = CWCH2 gray boxes linked with thick lines = tCWCH2 open boxwith curved ends = other domains) Representative genes and their amino acid length are indicated on the left and references are shown onthe right Hs Homo sapiens Dm Drosophila melanogaster Sc Saccharomyces cerevisiae Afu Aspergillus fumigatus Spo Schizosaccharomycespombe Ro Rhizopus oryzae Mb Monosiga brevicollis Dd Dictyostelium discoideum

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 4 of 18

[27-29] The ARID (AT-rich interaction domain) andtCWCH2 domains are conserved in Arid2 and Rsc9(Additional file 4) Another conserved sequence motifwas found in the C-terminal flanking region of thetCWCH2 motif This sequence motif is summarized asldquoIxL (ST) AxL (IV) L (KR) N (IL) times (KR)rdquo and wenamed this motif the LIL domain PHI-BLAST searchesfor LIL domains suggested that this domain was distrib-uted only in the Arid2Rsc9 proteins (data not shown)PacCPacC mediates gene expression regulation by ambientpH in filamentous fungi and yeasts [3031] It containsan N-terminal DNA binding domain with three ZFsOther ZF domains named A B and C are involved inresponses to pH change [3233]MizfMizf was originally reported as a seven ZF-containingprotein [34-36] We identified two additional C2H2 ZFsin the gene sequence encoding this protein (ZF2 andZF8 in Additional file 5) The two tCWCH2 motifs inthe first four ZFs (ZF1-ZF4) were separated by longerintervening sequences (gt35 aa) than most ZFsAebp2Mammalian Aebp2 is a DNA binding transcription fac-tor [37] Its Drosophila homolog jing is required forcellular differentiation [38-42]Zap1ZafAThe proteins encoded by Zap1ZafA should be groupedtogether because of the similarity in structure of their ZFdomains (Additional file 6) and similarities in their func-tional roles in zinc homeostasis [43-49] Our alignments ofZap1ZafA sequences from seven fungal species (Ustilagomaydis Cryptococcus neoformans Aspergillus fumigatusMagnaporthe grisea Candida albicans Saccharomyces cer-evisiae and Yarrowia lipolytica) reveal significant similarityin their ZFDs (Additional file 6) The ZFDs contain a maxi-mum of eight C2H2 ZF motifs in which the two N-terminalZF pairs (ZF1-ZF2 ZF3-ZF4) can form tCWCH2 motifsZF1-2 and ZF3-4 are separated by longer interveningsequences than most ZFs Our alignments showed thatconservation of ZF1 ZF2 and ZF8 was incompleteFungus-Gli-like (Fungl)The amino acid sequences encoded by this group ofgenes have been known as ldquofungus Gli likerdquo Here wehave named these sequences Fungl (Fungus-Gli-like)(Additional file 7 data not shown)Zfp106Zfp106 is a metazoan ZF protein of unknown function[5051] It contains two pairs of ZFs at each end WD40repeats (IPR001680 Zfp106) are known to mediate pro-tein-protein interactions [52-55]Tandem-CWCH2-protein-C-terminal (Twincl)The sequences are composed of 468 to 1458 aminoacid residues (data not shown) On the basis of these

structural features (Figure 2 Additional file 8) wenamed this gene family Twincl (Tandem CWCH2 pro-tein C-terminal)Clr1The ZFs among members of the Clr1 family [56] werehighly divergent and other sequences were not con-served (Additional file 9 data not shown) Therefore weomitted this gene family from the consensus sequenceand phylogenetic tree analysesFungus-Gli-like-4ZF (Fungl-4ZF)This gene group includes annotations from Rhizopusoryzae and Batrachochytrium dendrobatidis representingthe fungal phyla Mucoromycotina and Chytridiomycotarespectively The proteins encoded by the Fungl-4ZFgene family contain four ZFs that are weakly similar tothose in Gli and Fungl (Additional file 7) Based on theZF motif organization we tentatively grouped themseparately from Fungl However based on theirsequence similarity and phylogenetic classification it ispossible that the Fungl-4ZF and Fungl gene groups arederived from a common ancestral gene [5758]Dictyostelium discoideum tCWCH2The social amoeba Dictyostelium discoideum belongs tothe Amoebozoa supergroup There are two tCWCH2-containing genes in this gene class DdHp1 (Dictyoste-lium discoideum hypothetical protein) and DdHp2 TheZFD alignment of these genes is shown in Additionalfile 10Monosiga brevicollis tCWCH2Since Monosiga brevicollis is the only species among thechoanoflagellates in which the genome has been fullysequenced the presence of one tCWCH2 sequence(Additional file 10) may be more functionally meaning-ful than the singleton sequences in metazoa and fungiThe existence of an RFX domain (position 376-423)raised the possibility that the tCWCH2 sequence inMonosiga brevicollis and Arid2Rsc9 is derived from acommon ancestral geneUnclassified sequencesSix sequences were not classified into the above 13 cate-gories (Additional file 2) Four out of the six unclassifiedsequences contained weak similarities to PacC FunglFungl-4ZF and Mizf The remaining two sequencesfrom fungi did not show any significant homology tothe other sequencesIsolated tCWCH2 in metazoan gene familiesWe found rare occurrences of the tCWCH2 motif insome metazoan gene families including three tCWCH2-containing sequences in the Sp8 family (n = 37) one inthe TRP2 family (n = 8) one in the Kruumlppel-like factor5 family (n = 14) and one in the pbrm-1 family(n = 21) (Additional file 2) We did not include thesegenes in the following analysis because their significancewas not clear

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 5 of 18

tCWCH2 sequences in plant databasesTwo plant tCWCH2-containing sequences were foundOne sequence (XP_001771543) was the REF6 homologof a moss (Bryophyta Physcomitrella patens) (Additionalfile 10) Although REF6 is conserved in many plants theCWCH2 was not found in other plant species includingArabidopsis thaliana Oryza sativa and Vitis vinifera(Additional file 10) The other sequence AK110182 wasfrom Oryza sativa but is highly homologous to PacCThis sequence was otherwise only detected in fungi andtherefore might represent contamination of Oryza sativawith fungal materialIn the course of the classification procedure we

extracted the sequences belonging to each gene familyirrespective of the presence of tCWCH2 motifs identi-fied by BLAST search using representative entire aminoacid sequences Within this group we examined theextent of conservation of the cysteine tryptophan andhistidine residues contributing to the tCWCH2 motifsThe analysis indicated that the tryptophans in CWCH2motif-constituting residues were generally conserved(87-100) (Table 1)

Similarity among tCWCH2 gene classesWe performed molecular phylogenetic analyses to iden-tify similarities among the tCWCH2 gene classes Forthis purpose we selected flanking ZFs containingtCWCH2 (tCWCH2+1ZF) because the tCWCH2 itselfyielded very little information on sequence similaritiespresumably because it is short In addition genes widely

distributed among fungi or metazoa were used for thisanalysis because our preliminary analysis suggested thatgenes with a narrow phylogenetic distribution (TwinclClr1 Fungl-4ZF Dictyostelium tCWCH2) had stronglydivergent sequences that may confound the phylogenetictree analyses [59] Using the tCWCH2+1ZF (3ZFs)amino acid sequences we constructed phylogenetictrees by the Bayesian inference (BI) maximal likelihood(ML) and neighbor joining (NJ) methods The 89 (80tCWCH2 + 9 TFIIIA) sequences representing a widephylogenic range were subjected to this analysis usingthe DNA-binding classical C2H2 ZFs of TFIIIA as out-group sequences (Figure 3 sequence alignment in Addi-tional file 11 detailed tree in BI ML NJ in Additionalfiles 12 13 and 14)Each class of tCWCH2-containing genes was strappedwith high bootstrap values in the BI ML and NJ meth-ods supporting the validity of the above classificationPhylogenetic tree analyses revealed the relationshipsamong the tCWCH2 gene families and revealed a novelrelationship among the metazoan Zic Gli and Glis genefamilies These three gene families were groupedtogether whereas Gli and Glis formed a sub-group inthe branch of the ZicGliGlis group It is known thatvertebrate Zic Gli and Glis are composed of six threeand three paralogs respectively Interestingly Glis2 andGlis3 were separately grouped with their insect homo-logs (DmGlis2 and DmGlis3 respectively) and sea ane-mone Nematostella homologs (NveGlis2 and NveGlis3respectively Additional files 12 13 and 14) indicatingthat the two paralogs existed in the common ancestorsof cnidarians and bilaterians On the other hand verte-brate Zic and Gli paralogs were not grouped with theinvertebrate homologs (Additional files 12 13 and 14)suggesting that vertebrate Zic and Gli paralogs weregenerated in vertebrate ancestorsFungl was determined to have sequence similarity with

the ZicGliGlis group The Fungl group branch wasalways located closest to the ZicGliGlis group in theBI ML and NJ trees but statistical support was weak(BI 93) The other gene groups with more than twoclasses of tCWCH2 motif were not strongly supported(BI lt70) However Zap1ZafA class and Aebp2 classproteins were grouped together by the BI and ML meth-ods and the PacC family was placed in the basal root ofthe BI and ML trees beside the other tCWCH2+ZF1sequences from metazoa and fungi

Distribution of the tCWCH2 sequences in thephylogenetic treeBased on the above analysis we investigated the phylo-geny of the tCWCH2 motif (Figure 3) In global termstCWCH2 sequences were detected in all of the organ-isms we examined in the Opisthokonta supergroup

Table 1 Conservation of the tryptophan residues for thetCWCH2 motif in each gene family

Gene n

ZicGliGlis 282 993

Arid2Rsc9 61 967

PacC 48 1000

MizfZF12 40 925

MizfZF34 40 1000

Aebp2 39 1000

Zap1ZafAZF12 10 1000

Zap1ZafAZF34 33 970

Fungl 32 1000

Zfp106 16 875

Twincl 13 1000

Clr1 4 1000

Fungus Gli like 4ZF 2 1000

Dictyostelium gene 3 1000

Monosiga gene 1 1000

total 624 982aThe percentages indicate conservation of the tryptophan residues in eachgene class In Mizf and Zap1ZafA there are two independent tCWCH2 motifsfor which ZF12 (ZF1-ZF2) indicates the N-terminally located one and ZF34(ZF3-ZF4) indicates the C-terminally located one

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 6 of 18

including metazoa fungi choanoflagellate (Monosigabrevicollis) and microsporidia (Encephalitozoon cuniculi)In the Amoebozoa supergroup we found two tCWCH2-containing genes in the social amoeba Dictyostelium dis-coideum but not in Entamoeba histolytica Apart fromOpisthokonta and Amoebozoa two sequences werefound in the Plantae databases but their meaning is notclear Thus it appears that the distribution of thetCWCH2 sequence motif is essentially limited to theUniconta (a collective term for the Opisthokonta andAmoebozoa supergroups) in current sequence databasesWe examined the distribution of each tCWCH2 motif

class in the phylogenetic tree (Figure 4) Arid2Rsc9 classgenes were most widely detected in both fungi and meta-zoa On the other hand tCWCH2 was not detected in thesequences from the order Saccharomycetales (NP_013579Saccharomyces cerevisiae XP_718578 Candida albicansXP_506030 Yarrowia lipolytica) ZicGliGlis Aebp2 andMizf were widely distributed in metazoan groups butZfp106 was detected only in Bilateralia except EcdysozoaPacC Zap1ZafA and Fungl class genes were widely dis-tributed among the fungi groups Distribution of Twinclclass genes was restricted to Aspergillus and its close rela-tives Clr1 class sequences were found only in Schizosac-charomyces and Cryptococcus Fungl-4ZF class genes wererecovered only from the basal fungi (Rhizopus oryzae andBatrachochytrium dendrobatidis)

Additional sequence features in the tCWCH2 motifsWe generated tCWCH2 consensus sequences for eachgene class as well as for all classes to identify any addi-tional sequence features The Prosite databasePDOC00028 alignment was used for the reference con-sensus sequence of general C2H2 The extent of conser-vation is indicated by the size of letters and theconsensus sequences are aligned graphically (Figure 5)Classical C2H2 ZFs are known to have a consensussequence of ldquo(FY)xCx2CxFx7Lx2Hx4Hrdquo [1] ThetCWCH2 motifs also contain conserved phenylalanineand leucine residues between the cysteine and histidineresidues Phenylalanine and leucine are hydrophobicresidues that generally mediate a hydrophobic interac-tion within a single ZF (data not shown) These dataindicate that the general structure of a single ZF is wellconserved in the tCWCH2 motif (Figure 6A) and thatthere are some tCWCH2-specific structures apart fromthe conserved tryptophan residuesFirstly a position-specific bias to hydrophobic residues

was found in the C-terminal region adjacent to the firsthistidine residue in ZF1 and ZF2 (j1 j2 in Figure 56A) Valine leucine and isoleucine were frequentlyfound at j1 position (341 220 and 353 respec-tively) and valine leucine isoleucine and methioninewere frequently found at j2 position (143 143391 and 242 respectively) in contrast to those in a

Figure 3 Phylogenic tree of tCWCH2 (A) Tree indicating similarities among the tCWCH2 gene classes Statistical analysis was performed usingthree ZFs (tCWCH2 and a ZF in the C-terminal flanking region) (B) Sub-tree of ZicGliGlis class gene The alignment for this analysis is shown inAdditional file 11 BI NL and NJ analyses were carried out to construct the molecular phylogenetic trees (Additional files 12 13 and 14) The treepattern is based on the BI tree Scores for each branch indicate the statistical support values obtained in each phylogenetic tree constructionmethod BI (postprobability)ML (bootstrap value)NJ (bootstrap value) The absence of scores (-) indicates the branches that were notsupported by the corresponding methods

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 7 of 18

compilation of general C2H2 sequences (PDOC00028)(valine 42 leucine 95 isoleucine 53 methionine78 Figure 6B) The difference in the summed fre-quency of each position was statistically significant (j1P lt 1 times 10-100 j2 P lt 1 times 10-100 in a c2 test) Thesepositions were close to the tryptophan residues on theopposite side of the ZF as determined in the superim-posed 3D structures of Zic Gli and Zap1 tCWCH2s(Figure 6A) The biased residue frequency may reflectthe spatial restriction of amino acid choices Since theCWCH2 structure suggests the presence of four hydro-phobic residues in a compact space the residue pairedwith each tryptophan could not be a residue with alarge side chain such as phenylalanine or tryptophanSecondly the linker sequence of the two ZFs

in tCWCH2 was significantly longer (118 plusmn 44 aa aver-age plusmn standard deviation) than those of generalC2H2 sequences (Prosite PDOC00028 81 plusmn 50 aa)(P lt 1 times 10-100 Mann-Whitney U-test Figure 7) Therewere no canonical TGE(KR)P-like sequences in thetCWCH2 linker sequences whereas the TGE(KR)P lin-ker sequence was found in about 65 of the sequences inthe PDOC00028 alignment Accordingly conservation of

the tCWCH2 linker sequences was generally low (Fig-ure 5) and their lengths were more strongly divergentthan those of TGE(KR)P ZFs (P = 3 times 10-7 F-test) Elon-gation of the linker was commonly found in the Zap1ZafA (ZF1-2 162 plusmn 57 aa) Mizf (ZF1-2 192 plusmn 44 aaZF3-4 175 plusmn 23 aa) and Arid2Rsc9 (140 plusmn 69 aa)classesThirdly extra sequences were inserted between the

tryptophan and cysteine residues in tCWCH2 This typeof insertion was limited to the sequences from the firstZF of ZicGliGlis Arid2Rsc9 and Zap1ZafA and thesecond ZF of Arid2Rsc9 classes The number of aminoacid residues between the two cysteines of tCWCH2varied from four to 34 and was mostly four residues inother classes [15] This insertion forms an extra loopingstructure in ZIC3 (PDBID 2RPC Figure 1 6A) [13]

DiscussionThe tCWCH2 motif is a hallmark of inter-zinc fingerinteractionsThe tCWCH2 motif was originally proposed as an evo-lutionarily conserved sequence involved in the interac-tion between ZFs in human ZIC3 [1315] The presence

Figure 4 Distribution of tCWCH2-containing genes in a eukaryotic phylogenetic tree Tree pattern (gray curved lines) is based on [78-80]The distribution of tCWCH2 sequences is indicated by the black curved line The distribution of each tCWCH2 gene class is indicated by acolored area The Monosiga tCWCH2 sequence may be derived from the common ancestor of Arid2Rsc9 (See Results)

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 8 of 18

Figure 5 Sequence conservation among the classes of tCWCH2 sequence motifs Amino acid sequences were aligned and consensussequences were generated (see Methods) Mizf ZF1-2 Mizf ZF3-4 and Zap1ZafA ZF1-2 and ZF3-4 are indicated separately The PDOC00028C2H2 consensus is shown at the bottom as a general C2H2 consensus sequence (indicated by the tandem repeat of the consensus sequence)The short insertion sequences listed below were initially located at the sites indicated by the colored arrowheads in the alignments describedabove but were separated to allow more comprehensive analyses These sequences represent the longer linker sequences and the insertion ofextra sequences described in the Results section To achieve this analysis we omitted the four sequences AAWT01013938 (Schmidteamediterranea Aebp2) CAG05504 (Tetraodon nigroviridis Gli) XP_001602003 (Nasonia vitripennis Gli) and XP_785526 (Strongylocentrotus purpuratusGli) because these sequences contain exceptionally divergent sequences that disrupt the alignments

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 9 of 18

of the tCWCH2 motif in the Gli family Glis familyZap1 and PacC has been previously reported [13]Superimposition analysis revealed a tightly conservedstructure among ZIC3 GLI and Zap1 supporting theimportance of the tCWCH2 motif in structural termsStrong conservation of the tCWCH2 tryptophan resi-dues in each gene class suggests that there has beenstrong selective pressure for them during evolutionOur analysis revealed additional structural features of

tCWCH2 besides the conserved tryptophans which maybe explained by adaptation of its role in mediating theinter-ZF interaction (Figure 8) Firstly there is a prefer-ence for valine leucine or isoleucine in the j1 and j2position This can be regarded as an adaptive adjust-ment for forming hydrophobic cores of the appropriatesize Furthermore hydrophobic interaction between j1and j2 is likely because they are closely located within4Aring distance in the 3D structure models of ZIC3 GLIand Zap1 (Additional file 1 data not shown) This alsocan influence the amino acid preferences in j1 and j2position Secondly elongation of the linker sequencesmay also reflect a structural adaptation to allow themotif to face the two globular C2H2 units Linker lengthand sequence are important for the function of theDNA-binding C2H2 ZF because they determine the dis-tance of each ZF and the linker contacts to the DNAbase [60-62] However optimization of the canonical

linker length for DNA-binding may not be enough toyield opposed positioning of ZF units The bendingbetween the two ZFs may be larger than is dictated bythe DNA binding state of the ZFs as revealed in theGLI-DNA complex structure [7] Longer linker sizeseems to be a general feature of tCWCH2 but conserva-tion of the linker sequences within each class variesnotably Linker conservation in PacC is particularlyhigh despite the widespread distribution of this gene infungi thus suggesting that the linker in PacC is underevolutionary constraintAnother structural feature the addition of extra

sequences is plausible if we assume the reduction ofstructural constraint as a consequence of specializationin the inter-ZF interaction For example the conservationof tCWCH2-forming ZF1 is lower than that of ZF2-5 inZic proteins [15] In the case of the GLI-DNA complexthe tCWCH2-forming ZF1 does not directly participatein DNA binding [12] and the sequence conservation ofGli family ZF1 is also lower than in ZF2-5 (Figure 5unpublished observation) The extra sequences in theZicGliGlis Zap1ZafA (ZF3-4) and AridRsc9 classesshow poor conservation (Figure 5)It is unclear if tCWCH2 is the only structure required

for inter-ZF interactions The C2H2 ZFD is capable ofachieving protein-to-protein interactions in many cases[6] Interestingly in a previous study to design synthetic

Figure 6 j1 and j2 positions of tCWCH2 (A) Stereo view of the tCWCH2 sequence motif A wire model of the main chains is shown withZIC3 GLI1 and Zap1 in green and TFIIIA and Zif268 in red The side chains of tryptophan residues in tCWCH2 and those in the j1 j2 residuesare indicated by cylinders ZIC3 GLI1 and Zap1 are gray cylinders and the others are red cylinders The corresponding positions (j1 j2) areindicated by red boxes in Figure 5 (B) Frequency of amino acid appearance in j1 j2 and those of general C2H2 (PDOC00028) is shown Valineleucine and isoleucine appear frequently at both positions and methionine frequently appears in j2 This table summarizes all of the tCWCH2sequences Highlighted numbers indicate the preferential residues for j1 and j2 in tCWCH2

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 10 of 18

Figure 7 Comparison of linker length between the tCWCH2 sequence motif and general C2H2 ZF The bars indicate the mean lengths ofthe linker sequence between C2H2 motifs in the indicated groups We defined the linker length as the number of amino acids between the lasthistidine of a ZF and the first cysteine of the C-terminally flanking ZF The PDOC00028 (general C2H2) alignment contained 12326 sequencesand we found 10025 linkers in it The general C2H2 sequences was further divided into two groups based on the presence or absence of ldquoTGE(KR)Prdquo The sequence numbers of each group tCWCH2 614 general C2H2 10025 ldquoTGE(KP)rdquo 4640 ldquonon-TGE(KR)Prdquo 5385) Error bar standarddeviation P lt 1 times 10-100 in Mann-Whitney U-test

Figure 8 Evolution of the tCWCH2 domain We propose that the tCWCH2 patterns were generated from classical C2H2 sequencesconcurrently with the acquisition of the additional sequence features

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 11 of 18

ZFs utilizing in vitro-evolving protein-to-protein interac-tions [16] it was found that the hydrophobic residuesin zif268 that interacted between the artificial proteinsand the zif268 C2H2 ZF were similar to those involvedin the GLI inter-ZF interaction Superimpositionrevealed that the position corresponding to the trypto-phan in tCWCH2 is occupied by a valine suggestingthat either intramolecular or intermolecular protein-to-protein interactions mediated by hydrophobic residuesare generally utilized for C2H2 ZFs However the strongconservation of the tCWCH2 and accompanying struc-tural features suggest that tCWCH2 is a highly specia-lized C2H2 ZF for inter-ZF interactions between twoadjacent ZFs

Evolution of the tCWCH2 motifWe hypothesize that the original tCWCH2 motifs weregenerated from classical C2H2 ZFs (Figure 8) that occurwidely in eukaryotic organisms including Plantae Exca-vata Chromalveolata and Rhizaria tCWCH2 motifswere found in the vicinity of C2H2 ZF in severaltCWCH2 gene classes It is proposed that the tCWCH2sequence appeared in the common ancestor of Opistho-konta or Unikonta (OpisthokontaAmoebozoa) Theacquisition of the tryptophans and additional structuralfeatures of tCWCH2 may have occurred concurrently inthe course of evolution Whether the evolution oftCWCH2 occurred once or multiple times duringOpisthokontaAmoebozoa evolution remains unclearHowever its short sequence and some isolated distribu-tion in the plant supergroup and metazoan gene familiesmay favor the multiple-origins hypothesis Although thelineage relationship among the tCWCH2 classes remainsunclear we can presume the presence of the followingtCWCH2 ancestral genes an Arid2Rsc9 commonancestor which may have existed before the diversifica-tion of metazoa and fungi a ZicGliGlis commonancestor an Aebp2 ancestor and a Mizf ancestor in theearly metazoa a Fungl ancestor a PacC ancestor aFungl-4ZF ancestor and a Zap1ZafA ancestor in theearly fungi a Zfp106 ancestor in early bilaterians aTwincl ancestor in the founder of Aspergillus and itsclose siblings and a Clr1 ancestor in the founder ofDikarya althogh most Dikarya may have lost the geneOnce the tCWCH2 structures were established theymay have been more strongly conserved in these classes

Function and role of tCWCH2There are several cases that directly indicate the func-tional importance of the tCWCH2 motif in living organ-isms In the case of PacC mutation of the tryptophan oftCWCH2 induces a loss of its DNA binding activity[3132] In case of human ZIC3 the tryptophan muta-tion is proposed to be associated with the occurrence of

a familial congenital heart defect and biochemical analy-sis revealed that transcriptional activity and protein sta-bility are decreased in the mutant protein [1314] Inyeast Zap1 interaction between the two ZF where thetCWCH2 motif is located is required for the high-affi-nity binding of the zinc ion [45] Also it has beenshown that the inter-ZF interaction is influenced by theconcentration of zinc ion [46] suggesting that thetCWCH2 motif is involved in zinc-sensing in yeastApart from the instances referred to above the func-

tional significance of the tCWCH2 domain remainsunknown We propose that tCWCH2 may play rolesother than the direct DNA binding that is often exertedby classical C2H2 because the linker lengths oftCWCH2 are more highly divergent than those of thecanonical C2H2 ZFs in which linker length is a criticalparameter for DNA-C2H2 ZF interactions [60-63]Furthermore the structure of GLI ZIC3 and Zap1tCWCH2 domains in which the two adjacent ZFs form asingle globular structure [71347] is clearly differentfrom the structure of the DNA-binding C2H2 ZF domainwhere each globular ZF unit wraps around the majorgroove of the DNA [7-1164] (PDBID GLI 2GLI YY11UBD WT1 2RPT Zif268 1MEY 1P47 TFIIIA 1TF3)In the case of GLI tCWCH2 which is the only tCWCH2with a known DNA-binding 3D structure the N-terminalZF of GLI1 tCWCH2 does not contact DNA [7]Assuming that the tCWCH2 domain has a function

other than DNA binding what roles could tCWCH2play Firstly our domain structure analysis implies thattCWCH2 likely regulates its neighboring domains suchas canonical C2H2 ZF WD40 and LIL ZicGliGlisPacC Mizf Aebp2 Zap1ZafA Fungl Clr1 and Fungl-4ZF possess canonical ZF(s) in the C-terminal flankingregion of the tCWCH2 In the case of PacC analyses ofCWCH2 motif-containing tryptophan mutants suggeststhat DNA binding mediated by ZF2 and ZF3 may beinfluenced by tCWCH2 motifs in ZF1 and ZF2 [32] Itis possible that tCWCH2 motifs in the other membersof this group act similarly in the regulation of neighbor-ing canonical C2H2 domains In transcription factorsDNA binding domains such as the canonical C2H2 ZFcan define a target sequence and fix the protein position(angle distance direction) with respect to the DNAThe tCWCH2 motif could provide a lsquohingersquo or modula-tory junction that controls the relative positioning ofother functional domains Secondly tCWCH2 motifsmay provide a capping structure that isolates the struc-tural influence of N-terminal flanking region In thecase of ZIC3 mutations in its tCWCH2 motif increasethe random coil contents more than the zinc-free state[13] A unique terminal structure in a domain composedof repeating structural motifs is also seen in other con-served domains [65-67] In the case of small leucine-rich

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 12 of 18

repeats and ankyrin repeats N-capping structures areresponsible for protein stability [6667] ThirdlytCWCH2 binding to other molecules is possible and isobserved for many conserved domains Transcriptionalregulator and chromatin remodeling proteins form com-plexes with many different proteins in order to be func-tionally active [6869] Therefore the tCWCH2 ZF maynot be associated with DNA binding but may serve inprotein recognition similar to many C2H2 ZFs [5]The distribution and conservation of the tCWCH2

sequence motif in various gene classes suggests that thismotif has important biological roles The genes we iden-tified containing one or more tCWCH2 motifs areknown to be involved in various biological processesincluding chromatin remodeling (Arid2Rsc9) zinchomeostasis (Zap1ZafA) pH sensing (PacC) cell cycleregulation and transcriptional regulation (ZicGliGlisPacC Mizf Aebp2 Zap1ZafA) In addition given thatseveral gene classes remain to be functionally character-ized (including Fungl Zfp106 Twincl Clr1 Fungl-4ZFMonosiga gene DdHp) the known functional repertoireof the tCWCH2 motif is likely to increase We proposethat the tCWCH2 sequence motif is a widespread andfunctional protein structural motif and that furtherstructural and functional analyses are required to fullyunderstand its roles in biological processes

Conclusions(1) The 3D structure of tCWCH2 is highly conservedamong human ZIC3 human GLI1 and yeast Zap1(2) Current databases contained 587 tCWCH2

sequences that can be classified into 11 major geneclasses (ZicGliGlis Arid2Rsc9 PacC Mizf Aebp2Zap1ZafA Fungl Zfp106 Twincl Clr1 and Fungl-4ZF)and other minority classes The tCWCH2 motifs werefound in transcription regulatory factors chromatinremodeling factors and pHzinc homeostasis regulatingfactors(3) The tCWCH2 motif is mostly found in Opistho-

konta (metazoa fungi and choanoflagellates) andAmoebozoa (amoeba Dictyostelium discoideum)(4) The tCWCH2 motif contains three additional struc-

tural features (i) bias to hydrophobic residues in the j1and j2 positions (ii) longer linkers between the two C2H2motifs and (iii) insertion of extra sequences The struc-tural features suggest that the tCWCH2 motif is a specia-lized motif involved in inter-zinc finger interactions

MethodsProtein structural analysisProtein structures were obtained from the NCBI(National Center for Biotechnology Information) databaseas PDB format files and we used iMol httpwwwpirxcomiMol and Swiss-PdbViewer httpspdbvvital-itch

as the PDB viewer applications The latter was also usedfor superimposing multiple protein structures Magic fitoptions were selected for the superimposition of structuredata because the number of amino acid residues was dif-ferent between proteins Zif268 (PDBID 1P47) andTFIIIA (PDBID 2J7J) were used for the reference struc-ture of the classical C2H2 ZF For the distance analysis ofthe neighboring amino acid residues we used ldquoNeighborsof Selected Residue Side chainrdquo option of the Swiss-PdbViewer The software showed the list of the neighbor-ing residues with distances We dealt with the residues ifthe same residues were detected as the neighbors inmore than 10 models of ZIC3 and Zap1 PDB files thatinclude 20 models respectively

Database searchWe conducted web-based BLAST searches using PatternHit Initiated (PHI)-BLAST of NCBI databases httpblastncbinlmnihgovBlastcgi For the sequence pattern of thePHI-BLAST searches we used ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo The initialsequence pattern was deduced from the compilation ofthe Zic family sequences [1570] and subsequently theintervening sequence lengths were modified to the abovepattern until no additional sequences appeared in thesearch The seed sequences obtained by PHI-BLASTsearch were first classified into gene families based ontheir annotation and on phylogenetic tree analyses(see below) Then a TBLASTN search was performedagainst (1) all of the following databases GenBank+RefSeqNucleotides+EMBL+DDBJ+PDB excluding whole-genomeshotgun sequences and expressed sequence tags (nr optionin the NCBI TBLASTN search) and (2) whole genomeshotgun sequences of the organisms shown in Table 2The whole genome sequence TBLASTN search was per-formed using representative sequences for each genefamily (Table 3) initially on a eukaryote-wide basis andthen on a metazoa-and-fungus-wide basis For the meta-zoa-and-fungus-wide search sequences closely related tothe selected genes were searched in the following gene-species combinations Arid2Rsc9 PacC Zap1ZafAFungl Twincl Clr1 and Fungl-4ZF were searched infungi and ZicGliGlis Arid2Rsc9 Mizf Aebp2 andZfp106 were searched in metazoa The target organismsare listed in Table 2 These organisms were selected onthe basis of the following criteria (1) high genomesequence coverage (gt5times) (2) member of a major Eukar-yote taxon or considered to be in a phylogeneticallyimportant position The collected sequences were manu-ally checked for the presence of the sequence of thetCWCH2 motif In addition we performed proteinBLAST searches using representative sequences for 16gene families (Aebp2 Arid2 Clr1 Csr1 Fungl Fungl-4ZFGli Glis Mizf PacC Rsc9 Twincl ZafA Zap1 Zfp106

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 13 of 18

Table 2 Eukaryotic organisms used for the TBLASTN search

Organism Supergroup Lower taxon

Entamoeba W Amoebozoa Entamoebida

Dictyostelium W Amoebozoa Mycetozoa

Paramecium W Chromalveolata Ciliata

Tetrahymena W Chromalveolata Ciliata

Guillardia W Chromalveolata Cryptophyta

Cryptosporidium W Chromalveolata Dinoflagella

Giardia W Excavata Diplomonadia

Euglena gracilis W Excavata Euglenozoa

Trichomonas vaginalis W Excavata Parabasalidea

Monosiga W Opisthokonta Choanoflagellata

Batrachochytrium dendrobatidis MF Opisthokonta Fungi Chytridiomycota

Aspergillus W Opisthokonta Fungi Dikarya Ascomycota

Aspergillus fumigatus MF Opisthokonta Fungi Dikarya Ascomycota

Candida W Opisthokonta Fungi Dikarya Ascomycota

Candida albicans MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans W Opisthokonta Fungi Dikarya Ascomycota

Neurospora W Opisthokonta Fungi Dikarya Ascomycota

Neurospora crassa MF Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces cerevisiae MF Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces pombe MF Opisthokonta Fungi Dikarya Ascomycota

Yarrowia lipolytica MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans MF Opisthokonta Fungi Dikarya Basidiomycota

Ustilago maydis MF Opisthokonta Fungi Dikarya Basidiomycota

Antonospora locustae MF Opisthokonta Fungi Microsporidia

Encephalitozoon W Opisthokonta Fungi Microsporidia

Encephalitozoon cuniculi MF Opisthokonta Fungi Microsporidia

Rhizopus oryzae MF Opisthokonta Fungi Mucoromycotina

Monosiga brevicollis MF Opisthokonta Metazoa Choanoflagellida

Homo W Opisthokonta Metazoa Chordata

Nematostella W Opisthokonta Metazoa Cnidaria

Nematostella vectensis MF Opisthokonta Metazoa Cnidaria Anthozoa

Platynereis dumerilii MF Opisthokonta Metazoa Cnidaria Anthozoa

Hydra magnipapillata MF Opisthokonta Metazoa Cnidaria Hydrozoa

Hydra vulgaris MF Opisthokonta Metazoa Cnidaria Hydrozoa

Lottia gigantea MF Opisthokonta Metazoa Mollusca Gastropoda

Trichoplax W Opisthokonta Metazoa Placozoa

Trichoplax adhaerens MF Opisthokonta Metazoa Placozoa Trichoplax

Schmidtea mediterranea MF Opisthokonta Metazoa Platyhelminthes

Chlamydomonas W Plantae Chlorophyta

Arabidopsis W Plantae Tracheophyta

Oryza sativa W Plantae Tracheophyta

Bigelowiella W Rhizaria Cercozoa

Species name and taxa are indicated W = Eukaryote-wide search MF = metazoa-and-fungus-wide search

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 14 of 18

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 3: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

non-redundant tCWCH2 collection This number corre-sponded to 83 of the tCWCH2 sequences in the NCBIprotein database (total = 551) A looser patternldquoxCxWx(135)Cx(515)Hx(25)Hx(542)CxWx(16)Cx(531)Hx(25)Hxrdquo yielded 968 sequences that contains469 tCWCH2 sequences (85 of the total) (Additionalfile 3) The sequences identified by the looser patternincluded Zfp407 sequences that contain a pair ofCWCH2 ZFs separated by another two C2H2 ZFsThese data suggest that decreasing the stringency of thesearch (increasing the intervening arbitrary residuenumber) impairs the fidelity of the search Thereforewe consider that the original PHI-BLAST pattern ismore useful for the tCWCH2 survey

Classification and domain structure of tCWCH2sequence-containing proteinsWe classified the non-redundant tCWCH2-containingsequences based on the available flanking amino acidsequences of the tCWCH2 domain (Figure 2) Eachgene contained one or two tCWCH2 motifs among atotal of 2-9 C2H2 ZF motifs Proteins containing onlyone tCWCH2 domain suggest that a single tCWCH2constitutes the minimal ZFD component In the ZFDs

composed of both tCWCH2 and non-tCWCH2 C2H2tCWCH2s were always placed at the N-terminal end ofthe clustered ZFDsThe 587 sequences included one fungusmetazoan

common gene family (Arid2Rsc9) four metazoan genefamilies (ZicGliGlis Mizf Aebp2 Zfp106) three fungusgene families (PacC Zap1ZafA Clr1) two social amoebagenes (named here as DdHp) and one Monosiga gene Inaddition we identified three novel fungal gene groups(Tandem-CWCH2-protein C-terminal Fungus-Gli-likeand Fungus-Gli-like-4ZF) We have summarized thestructural features of each tCWCH2 gene class togetherwith major gene functions below and in Figure 2

Function structure and classification of the gene classescontaining tCWCH2ZicGliGlisThe proteins encoded by these three gene classes med-iate various processes in animal development [17-23]Their ZFD is commonly composed of five C2H2 ZFsand can bind DNA [724-26]Arid2Rsc9Arid2Rsc9 genes are known to encode components ofan ATP-dependent chromatin remodeling complex

Figure 1 Structures of tCWCH2 sequence motifs from three different proteins (A) Superimposition of the 3D structures of ZIC3 GLI1 andZap1 tCWCH2 is shown in stereo view Backbones of the protein structures are indicated by the flat ribbon model The side chains of twoconserved tryptophan residues in tCWCH2 are indicated by the stick model Red = ZIC3 (PDBID 2RPC) Blue = GLI1 (PDBID 2GLI) Green = Zap1(PDBID 1ZW8) (B) Amino acid sequence alignment of the tCWCH2 regions Zinc-chelating cysteine and histidine residues are shown in grayboxes and conserved tryptophan residues are shown in white letters with black boxes In (A) and (B) the gray lines with arrowheads indicate thesequence between the two CWCH2 motifs (linker sequence)

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 3 of 18

Figure 2 Domain structure and function of tCWCH2-containing proteins The 11 gene classes are listed in descending order of the numberof representative genes in each class A Monosiga gene and two Dictyostelium genes which do not belong to these classes are indicated Genename is indicated at the left of each row (open box = C2H2 ZF gray box = CWCH2 gray boxes linked with thick lines = tCWCH2 open boxwith curved ends = other domains) Representative genes and their amino acid length are indicated on the left and references are shown onthe right Hs Homo sapiens Dm Drosophila melanogaster Sc Saccharomyces cerevisiae Afu Aspergillus fumigatus Spo Schizosaccharomycespombe Ro Rhizopus oryzae Mb Monosiga brevicollis Dd Dictyostelium discoideum

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 4 of 18

[27-29] The ARID (AT-rich interaction domain) andtCWCH2 domains are conserved in Arid2 and Rsc9(Additional file 4) Another conserved sequence motifwas found in the C-terminal flanking region of thetCWCH2 motif This sequence motif is summarized asldquoIxL (ST) AxL (IV) L (KR) N (IL) times (KR)rdquo and wenamed this motif the LIL domain PHI-BLAST searchesfor LIL domains suggested that this domain was distrib-uted only in the Arid2Rsc9 proteins (data not shown)PacCPacC mediates gene expression regulation by ambientpH in filamentous fungi and yeasts [3031] It containsan N-terminal DNA binding domain with three ZFsOther ZF domains named A B and C are involved inresponses to pH change [3233]MizfMizf was originally reported as a seven ZF-containingprotein [34-36] We identified two additional C2H2 ZFsin the gene sequence encoding this protein (ZF2 andZF8 in Additional file 5) The two tCWCH2 motifs inthe first four ZFs (ZF1-ZF4) were separated by longerintervening sequences (gt35 aa) than most ZFsAebp2Mammalian Aebp2 is a DNA binding transcription fac-tor [37] Its Drosophila homolog jing is required forcellular differentiation [38-42]Zap1ZafAThe proteins encoded by Zap1ZafA should be groupedtogether because of the similarity in structure of their ZFdomains (Additional file 6) and similarities in their func-tional roles in zinc homeostasis [43-49] Our alignments ofZap1ZafA sequences from seven fungal species (Ustilagomaydis Cryptococcus neoformans Aspergillus fumigatusMagnaporthe grisea Candida albicans Saccharomyces cer-evisiae and Yarrowia lipolytica) reveal significant similarityin their ZFDs (Additional file 6) The ZFDs contain a maxi-mum of eight C2H2 ZF motifs in which the two N-terminalZF pairs (ZF1-ZF2 ZF3-ZF4) can form tCWCH2 motifsZF1-2 and ZF3-4 are separated by longer interveningsequences than most ZFs Our alignments showed thatconservation of ZF1 ZF2 and ZF8 was incompleteFungus-Gli-like (Fungl)The amino acid sequences encoded by this group ofgenes have been known as ldquofungus Gli likerdquo Here wehave named these sequences Fungl (Fungus-Gli-like)(Additional file 7 data not shown)Zfp106Zfp106 is a metazoan ZF protein of unknown function[5051] It contains two pairs of ZFs at each end WD40repeats (IPR001680 Zfp106) are known to mediate pro-tein-protein interactions [52-55]Tandem-CWCH2-protein-C-terminal (Twincl)The sequences are composed of 468 to 1458 aminoacid residues (data not shown) On the basis of these

structural features (Figure 2 Additional file 8) wenamed this gene family Twincl (Tandem CWCH2 pro-tein C-terminal)Clr1The ZFs among members of the Clr1 family [56] werehighly divergent and other sequences were not con-served (Additional file 9 data not shown) Therefore weomitted this gene family from the consensus sequenceand phylogenetic tree analysesFungus-Gli-like-4ZF (Fungl-4ZF)This gene group includes annotations from Rhizopusoryzae and Batrachochytrium dendrobatidis representingthe fungal phyla Mucoromycotina and Chytridiomycotarespectively The proteins encoded by the Fungl-4ZFgene family contain four ZFs that are weakly similar tothose in Gli and Fungl (Additional file 7) Based on theZF motif organization we tentatively grouped themseparately from Fungl However based on theirsequence similarity and phylogenetic classification it ispossible that the Fungl-4ZF and Fungl gene groups arederived from a common ancestral gene [5758]Dictyostelium discoideum tCWCH2The social amoeba Dictyostelium discoideum belongs tothe Amoebozoa supergroup There are two tCWCH2-containing genes in this gene class DdHp1 (Dictyoste-lium discoideum hypothetical protein) and DdHp2 TheZFD alignment of these genes is shown in Additionalfile 10Monosiga brevicollis tCWCH2Since Monosiga brevicollis is the only species among thechoanoflagellates in which the genome has been fullysequenced the presence of one tCWCH2 sequence(Additional file 10) may be more functionally meaning-ful than the singleton sequences in metazoa and fungiThe existence of an RFX domain (position 376-423)raised the possibility that the tCWCH2 sequence inMonosiga brevicollis and Arid2Rsc9 is derived from acommon ancestral geneUnclassified sequencesSix sequences were not classified into the above 13 cate-gories (Additional file 2) Four out of the six unclassifiedsequences contained weak similarities to PacC FunglFungl-4ZF and Mizf The remaining two sequencesfrom fungi did not show any significant homology tothe other sequencesIsolated tCWCH2 in metazoan gene familiesWe found rare occurrences of the tCWCH2 motif insome metazoan gene families including three tCWCH2-containing sequences in the Sp8 family (n = 37) one inthe TRP2 family (n = 8) one in the Kruumlppel-like factor5 family (n = 14) and one in the pbrm-1 family(n = 21) (Additional file 2) We did not include thesegenes in the following analysis because their significancewas not clear

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 5 of 18

tCWCH2 sequences in plant databasesTwo plant tCWCH2-containing sequences were foundOne sequence (XP_001771543) was the REF6 homologof a moss (Bryophyta Physcomitrella patens) (Additionalfile 10) Although REF6 is conserved in many plants theCWCH2 was not found in other plant species includingArabidopsis thaliana Oryza sativa and Vitis vinifera(Additional file 10) The other sequence AK110182 wasfrom Oryza sativa but is highly homologous to PacCThis sequence was otherwise only detected in fungi andtherefore might represent contamination of Oryza sativawith fungal materialIn the course of the classification procedure we

extracted the sequences belonging to each gene familyirrespective of the presence of tCWCH2 motifs identi-fied by BLAST search using representative entire aminoacid sequences Within this group we examined theextent of conservation of the cysteine tryptophan andhistidine residues contributing to the tCWCH2 motifsThe analysis indicated that the tryptophans in CWCH2motif-constituting residues were generally conserved(87-100) (Table 1)

Similarity among tCWCH2 gene classesWe performed molecular phylogenetic analyses to iden-tify similarities among the tCWCH2 gene classes Forthis purpose we selected flanking ZFs containingtCWCH2 (tCWCH2+1ZF) because the tCWCH2 itselfyielded very little information on sequence similaritiespresumably because it is short In addition genes widely

distributed among fungi or metazoa were used for thisanalysis because our preliminary analysis suggested thatgenes with a narrow phylogenetic distribution (TwinclClr1 Fungl-4ZF Dictyostelium tCWCH2) had stronglydivergent sequences that may confound the phylogenetictree analyses [59] Using the tCWCH2+1ZF (3ZFs)amino acid sequences we constructed phylogenetictrees by the Bayesian inference (BI) maximal likelihood(ML) and neighbor joining (NJ) methods The 89 (80tCWCH2 + 9 TFIIIA) sequences representing a widephylogenic range were subjected to this analysis usingthe DNA-binding classical C2H2 ZFs of TFIIIA as out-group sequences (Figure 3 sequence alignment in Addi-tional file 11 detailed tree in BI ML NJ in Additionalfiles 12 13 and 14)Each class of tCWCH2-containing genes was strappedwith high bootstrap values in the BI ML and NJ meth-ods supporting the validity of the above classificationPhylogenetic tree analyses revealed the relationshipsamong the tCWCH2 gene families and revealed a novelrelationship among the metazoan Zic Gli and Glis genefamilies These three gene families were groupedtogether whereas Gli and Glis formed a sub-group inthe branch of the ZicGliGlis group It is known thatvertebrate Zic Gli and Glis are composed of six threeand three paralogs respectively Interestingly Glis2 andGlis3 were separately grouped with their insect homo-logs (DmGlis2 and DmGlis3 respectively) and sea ane-mone Nematostella homologs (NveGlis2 and NveGlis3respectively Additional files 12 13 and 14) indicatingthat the two paralogs existed in the common ancestorsof cnidarians and bilaterians On the other hand verte-brate Zic and Gli paralogs were not grouped with theinvertebrate homologs (Additional files 12 13 and 14)suggesting that vertebrate Zic and Gli paralogs weregenerated in vertebrate ancestorsFungl was determined to have sequence similarity with

the ZicGliGlis group The Fungl group branch wasalways located closest to the ZicGliGlis group in theBI ML and NJ trees but statistical support was weak(BI 93) The other gene groups with more than twoclasses of tCWCH2 motif were not strongly supported(BI lt70) However Zap1ZafA class and Aebp2 classproteins were grouped together by the BI and ML meth-ods and the PacC family was placed in the basal root ofthe BI and ML trees beside the other tCWCH2+ZF1sequences from metazoa and fungi

Distribution of the tCWCH2 sequences in thephylogenetic treeBased on the above analysis we investigated the phylo-geny of the tCWCH2 motif (Figure 3) In global termstCWCH2 sequences were detected in all of the organ-isms we examined in the Opisthokonta supergroup

Table 1 Conservation of the tryptophan residues for thetCWCH2 motif in each gene family

Gene n

ZicGliGlis 282 993

Arid2Rsc9 61 967

PacC 48 1000

MizfZF12 40 925

MizfZF34 40 1000

Aebp2 39 1000

Zap1ZafAZF12 10 1000

Zap1ZafAZF34 33 970

Fungl 32 1000

Zfp106 16 875

Twincl 13 1000

Clr1 4 1000

Fungus Gli like 4ZF 2 1000

Dictyostelium gene 3 1000

Monosiga gene 1 1000

total 624 982aThe percentages indicate conservation of the tryptophan residues in eachgene class In Mizf and Zap1ZafA there are two independent tCWCH2 motifsfor which ZF12 (ZF1-ZF2) indicates the N-terminally located one and ZF34(ZF3-ZF4) indicates the C-terminally located one

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 6 of 18

including metazoa fungi choanoflagellate (Monosigabrevicollis) and microsporidia (Encephalitozoon cuniculi)In the Amoebozoa supergroup we found two tCWCH2-containing genes in the social amoeba Dictyostelium dis-coideum but not in Entamoeba histolytica Apart fromOpisthokonta and Amoebozoa two sequences werefound in the Plantae databases but their meaning is notclear Thus it appears that the distribution of thetCWCH2 sequence motif is essentially limited to theUniconta (a collective term for the Opisthokonta andAmoebozoa supergroups) in current sequence databasesWe examined the distribution of each tCWCH2 motif

class in the phylogenetic tree (Figure 4) Arid2Rsc9 classgenes were most widely detected in both fungi and meta-zoa On the other hand tCWCH2 was not detected in thesequences from the order Saccharomycetales (NP_013579Saccharomyces cerevisiae XP_718578 Candida albicansXP_506030 Yarrowia lipolytica) ZicGliGlis Aebp2 andMizf were widely distributed in metazoan groups butZfp106 was detected only in Bilateralia except EcdysozoaPacC Zap1ZafA and Fungl class genes were widely dis-tributed among the fungi groups Distribution of Twinclclass genes was restricted to Aspergillus and its close rela-tives Clr1 class sequences were found only in Schizosac-charomyces and Cryptococcus Fungl-4ZF class genes wererecovered only from the basal fungi (Rhizopus oryzae andBatrachochytrium dendrobatidis)

Additional sequence features in the tCWCH2 motifsWe generated tCWCH2 consensus sequences for eachgene class as well as for all classes to identify any addi-tional sequence features The Prosite databasePDOC00028 alignment was used for the reference con-sensus sequence of general C2H2 The extent of conser-vation is indicated by the size of letters and theconsensus sequences are aligned graphically (Figure 5)Classical C2H2 ZFs are known to have a consensussequence of ldquo(FY)xCx2CxFx7Lx2Hx4Hrdquo [1] ThetCWCH2 motifs also contain conserved phenylalanineand leucine residues between the cysteine and histidineresidues Phenylalanine and leucine are hydrophobicresidues that generally mediate a hydrophobic interac-tion within a single ZF (data not shown) These dataindicate that the general structure of a single ZF is wellconserved in the tCWCH2 motif (Figure 6A) and thatthere are some tCWCH2-specific structures apart fromthe conserved tryptophan residuesFirstly a position-specific bias to hydrophobic residues

was found in the C-terminal region adjacent to the firsthistidine residue in ZF1 and ZF2 (j1 j2 in Figure 56A) Valine leucine and isoleucine were frequentlyfound at j1 position (341 220 and 353 respec-tively) and valine leucine isoleucine and methioninewere frequently found at j2 position (143 143391 and 242 respectively) in contrast to those in a

Figure 3 Phylogenic tree of tCWCH2 (A) Tree indicating similarities among the tCWCH2 gene classes Statistical analysis was performed usingthree ZFs (tCWCH2 and a ZF in the C-terminal flanking region) (B) Sub-tree of ZicGliGlis class gene The alignment for this analysis is shown inAdditional file 11 BI NL and NJ analyses were carried out to construct the molecular phylogenetic trees (Additional files 12 13 and 14) The treepattern is based on the BI tree Scores for each branch indicate the statistical support values obtained in each phylogenetic tree constructionmethod BI (postprobability)ML (bootstrap value)NJ (bootstrap value) The absence of scores (-) indicates the branches that were notsupported by the corresponding methods

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 7 of 18

compilation of general C2H2 sequences (PDOC00028)(valine 42 leucine 95 isoleucine 53 methionine78 Figure 6B) The difference in the summed fre-quency of each position was statistically significant (j1P lt 1 times 10-100 j2 P lt 1 times 10-100 in a c2 test) Thesepositions were close to the tryptophan residues on theopposite side of the ZF as determined in the superim-posed 3D structures of Zic Gli and Zap1 tCWCH2s(Figure 6A) The biased residue frequency may reflectthe spatial restriction of amino acid choices Since theCWCH2 structure suggests the presence of four hydro-phobic residues in a compact space the residue pairedwith each tryptophan could not be a residue with alarge side chain such as phenylalanine or tryptophanSecondly the linker sequence of the two ZFs

in tCWCH2 was significantly longer (118 plusmn 44 aa aver-age plusmn standard deviation) than those of generalC2H2 sequences (Prosite PDOC00028 81 plusmn 50 aa)(P lt 1 times 10-100 Mann-Whitney U-test Figure 7) Therewere no canonical TGE(KR)P-like sequences in thetCWCH2 linker sequences whereas the TGE(KR)P lin-ker sequence was found in about 65 of the sequences inthe PDOC00028 alignment Accordingly conservation of

the tCWCH2 linker sequences was generally low (Fig-ure 5) and their lengths were more strongly divergentthan those of TGE(KR)P ZFs (P = 3 times 10-7 F-test) Elon-gation of the linker was commonly found in the Zap1ZafA (ZF1-2 162 plusmn 57 aa) Mizf (ZF1-2 192 plusmn 44 aaZF3-4 175 plusmn 23 aa) and Arid2Rsc9 (140 plusmn 69 aa)classesThirdly extra sequences were inserted between the

tryptophan and cysteine residues in tCWCH2 This typeof insertion was limited to the sequences from the firstZF of ZicGliGlis Arid2Rsc9 and Zap1ZafA and thesecond ZF of Arid2Rsc9 classes The number of aminoacid residues between the two cysteines of tCWCH2varied from four to 34 and was mostly four residues inother classes [15] This insertion forms an extra loopingstructure in ZIC3 (PDBID 2RPC Figure 1 6A) [13]

DiscussionThe tCWCH2 motif is a hallmark of inter-zinc fingerinteractionsThe tCWCH2 motif was originally proposed as an evo-lutionarily conserved sequence involved in the interac-tion between ZFs in human ZIC3 [1315] The presence

Figure 4 Distribution of tCWCH2-containing genes in a eukaryotic phylogenetic tree Tree pattern (gray curved lines) is based on [78-80]The distribution of tCWCH2 sequences is indicated by the black curved line The distribution of each tCWCH2 gene class is indicated by acolored area The Monosiga tCWCH2 sequence may be derived from the common ancestor of Arid2Rsc9 (See Results)

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 8 of 18

Figure 5 Sequence conservation among the classes of tCWCH2 sequence motifs Amino acid sequences were aligned and consensussequences were generated (see Methods) Mizf ZF1-2 Mizf ZF3-4 and Zap1ZafA ZF1-2 and ZF3-4 are indicated separately The PDOC00028C2H2 consensus is shown at the bottom as a general C2H2 consensus sequence (indicated by the tandem repeat of the consensus sequence)The short insertion sequences listed below were initially located at the sites indicated by the colored arrowheads in the alignments describedabove but were separated to allow more comprehensive analyses These sequences represent the longer linker sequences and the insertion ofextra sequences described in the Results section To achieve this analysis we omitted the four sequences AAWT01013938 (Schmidteamediterranea Aebp2) CAG05504 (Tetraodon nigroviridis Gli) XP_001602003 (Nasonia vitripennis Gli) and XP_785526 (Strongylocentrotus purpuratusGli) because these sequences contain exceptionally divergent sequences that disrupt the alignments

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 9 of 18

of the tCWCH2 motif in the Gli family Glis familyZap1 and PacC has been previously reported [13]Superimposition analysis revealed a tightly conservedstructure among ZIC3 GLI and Zap1 supporting theimportance of the tCWCH2 motif in structural termsStrong conservation of the tCWCH2 tryptophan resi-dues in each gene class suggests that there has beenstrong selective pressure for them during evolutionOur analysis revealed additional structural features of

tCWCH2 besides the conserved tryptophans which maybe explained by adaptation of its role in mediating theinter-ZF interaction (Figure 8) Firstly there is a prefer-ence for valine leucine or isoleucine in the j1 and j2position This can be regarded as an adaptive adjust-ment for forming hydrophobic cores of the appropriatesize Furthermore hydrophobic interaction between j1and j2 is likely because they are closely located within4Aring distance in the 3D structure models of ZIC3 GLIand Zap1 (Additional file 1 data not shown) This alsocan influence the amino acid preferences in j1 and j2position Secondly elongation of the linker sequencesmay also reflect a structural adaptation to allow themotif to face the two globular C2H2 units Linker lengthand sequence are important for the function of theDNA-binding C2H2 ZF because they determine the dis-tance of each ZF and the linker contacts to the DNAbase [60-62] However optimization of the canonical

linker length for DNA-binding may not be enough toyield opposed positioning of ZF units The bendingbetween the two ZFs may be larger than is dictated bythe DNA binding state of the ZFs as revealed in theGLI-DNA complex structure [7] Longer linker sizeseems to be a general feature of tCWCH2 but conserva-tion of the linker sequences within each class variesnotably Linker conservation in PacC is particularlyhigh despite the widespread distribution of this gene infungi thus suggesting that the linker in PacC is underevolutionary constraintAnother structural feature the addition of extra

sequences is plausible if we assume the reduction ofstructural constraint as a consequence of specializationin the inter-ZF interaction For example the conservationof tCWCH2-forming ZF1 is lower than that of ZF2-5 inZic proteins [15] In the case of the GLI-DNA complexthe tCWCH2-forming ZF1 does not directly participatein DNA binding [12] and the sequence conservation ofGli family ZF1 is also lower than in ZF2-5 (Figure 5unpublished observation) The extra sequences in theZicGliGlis Zap1ZafA (ZF3-4) and AridRsc9 classesshow poor conservation (Figure 5)It is unclear if tCWCH2 is the only structure required

for inter-ZF interactions The C2H2 ZFD is capable ofachieving protein-to-protein interactions in many cases[6] Interestingly in a previous study to design synthetic

Figure 6 j1 and j2 positions of tCWCH2 (A) Stereo view of the tCWCH2 sequence motif A wire model of the main chains is shown withZIC3 GLI1 and Zap1 in green and TFIIIA and Zif268 in red The side chains of tryptophan residues in tCWCH2 and those in the j1 j2 residuesare indicated by cylinders ZIC3 GLI1 and Zap1 are gray cylinders and the others are red cylinders The corresponding positions (j1 j2) areindicated by red boxes in Figure 5 (B) Frequency of amino acid appearance in j1 j2 and those of general C2H2 (PDOC00028) is shown Valineleucine and isoleucine appear frequently at both positions and methionine frequently appears in j2 This table summarizes all of the tCWCH2sequences Highlighted numbers indicate the preferential residues for j1 and j2 in tCWCH2

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 10 of 18

Figure 7 Comparison of linker length between the tCWCH2 sequence motif and general C2H2 ZF The bars indicate the mean lengths ofthe linker sequence between C2H2 motifs in the indicated groups We defined the linker length as the number of amino acids between the lasthistidine of a ZF and the first cysteine of the C-terminally flanking ZF The PDOC00028 (general C2H2) alignment contained 12326 sequencesand we found 10025 linkers in it The general C2H2 sequences was further divided into two groups based on the presence or absence of ldquoTGE(KR)Prdquo The sequence numbers of each group tCWCH2 614 general C2H2 10025 ldquoTGE(KP)rdquo 4640 ldquonon-TGE(KR)Prdquo 5385) Error bar standarddeviation P lt 1 times 10-100 in Mann-Whitney U-test

Figure 8 Evolution of the tCWCH2 domain We propose that the tCWCH2 patterns were generated from classical C2H2 sequencesconcurrently with the acquisition of the additional sequence features

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 11 of 18

ZFs utilizing in vitro-evolving protein-to-protein interac-tions [16] it was found that the hydrophobic residuesin zif268 that interacted between the artificial proteinsand the zif268 C2H2 ZF were similar to those involvedin the GLI inter-ZF interaction Superimpositionrevealed that the position corresponding to the trypto-phan in tCWCH2 is occupied by a valine suggestingthat either intramolecular or intermolecular protein-to-protein interactions mediated by hydrophobic residuesare generally utilized for C2H2 ZFs However the strongconservation of the tCWCH2 and accompanying struc-tural features suggest that tCWCH2 is a highly specia-lized C2H2 ZF for inter-ZF interactions between twoadjacent ZFs

Evolution of the tCWCH2 motifWe hypothesize that the original tCWCH2 motifs weregenerated from classical C2H2 ZFs (Figure 8) that occurwidely in eukaryotic organisms including Plantae Exca-vata Chromalveolata and Rhizaria tCWCH2 motifswere found in the vicinity of C2H2 ZF in severaltCWCH2 gene classes It is proposed that the tCWCH2sequence appeared in the common ancestor of Opistho-konta or Unikonta (OpisthokontaAmoebozoa) Theacquisition of the tryptophans and additional structuralfeatures of tCWCH2 may have occurred concurrently inthe course of evolution Whether the evolution oftCWCH2 occurred once or multiple times duringOpisthokontaAmoebozoa evolution remains unclearHowever its short sequence and some isolated distribu-tion in the plant supergroup and metazoan gene familiesmay favor the multiple-origins hypothesis Although thelineage relationship among the tCWCH2 classes remainsunclear we can presume the presence of the followingtCWCH2 ancestral genes an Arid2Rsc9 commonancestor which may have existed before the diversifica-tion of metazoa and fungi a ZicGliGlis commonancestor an Aebp2 ancestor and a Mizf ancestor in theearly metazoa a Fungl ancestor a PacC ancestor aFungl-4ZF ancestor and a Zap1ZafA ancestor in theearly fungi a Zfp106 ancestor in early bilaterians aTwincl ancestor in the founder of Aspergillus and itsclose siblings and a Clr1 ancestor in the founder ofDikarya althogh most Dikarya may have lost the geneOnce the tCWCH2 structures were established theymay have been more strongly conserved in these classes

Function and role of tCWCH2There are several cases that directly indicate the func-tional importance of the tCWCH2 motif in living organ-isms In the case of PacC mutation of the tryptophan oftCWCH2 induces a loss of its DNA binding activity[3132] In case of human ZIC3 the tryptophan muta-tion is proposed to be associated with the occurrence of

a familial congenital heart defect and biochemical analy-sis revealed that transcriptional activity and protein sta-bility are decreased in the mutant protein [1314] Inyeast Zap1 interaction between the two ZF where thetCWCH2 motif is located is required for the high-affi-nity binding of the zinc ion [45] Also it has beenshown that the inter-ZF interaction is influenced by theconcentration of zinc ion [46] suggesting that thetCWCH2 motif is involved in zinc-sensing in yeastApart from the instances referred to above the func-

tional significance of the tCWCH2 domain remainsunknown We propose that tCWCH2 may play rolesother than the direct DNA binding that is often exertedby classical C2H2 because the linker lengths oftCWCH2 are more highly divergent than those of thecanonical C2H2 ZFs in which linker length is a criticalparameter for DNA-C2H2 ZF interactions [60-63]Furthermore the structure of GLI ZIC3 and Zap1tCWCH2 domains in which the two adjacent ZFs form asingle globular structure [71347] is clearly differentfrom the structure of the DNA-binding C2H2 ZF domainwhere each globular ZF unit wraps around the majorgroove of the DNA [7-1164] (PDBID GLI 2GLI YY11UBD WT1 2RPT Zif268 1MEY 1P47 TFIIIA 1TF3)In the case of GLI tCWCH2 which is the only tCWCH2with a known DNA-binding 3D structure the N-terminalZF of GLI1 tCWCH2 does not contact DNA [7]Assuming that the tCWCH2 domain has a function

other than DNA binding what roles could tCWCH2play Firstly our domain structure analysis implies thattCWCH2 likely regulates its neighboring domains suchas canonical C2H2 ZF WD40 and LIL ZicGliGlisPacC Mizf Aebp2 Zap1ZafA Fungl Clr1 and Fungl-4ZF possess canonical ZF(s) in the C-terminal flankingregion of the tCWCH2 In the case of PacC analyses ofCWCH2 motif-containing tryptophan mutants suggeststhat DNA binding mediated by ZF2 and ZF3 may beinfluenced by tCWCH2 motifs in ZF1 and ZF2 [32] Itis possible that tCWCH2 motifs in the other membersof this group act similarly in the regulation of neighbor-ing canonical C2H2 domains In transcription factorsDNA binding domains such as the canonical C2H2 ZFcan define a target sequence and fix the protein position(angle distance direction) with respect to the DNAThe tCWCH2 motif could provide a lsquohingersquo or modula-tory junction that controls the relative positioning ofother functional domains Secondly tCWCH2 motifsmay provide a capping structure that isolates the struc-tural influence of N-terminal flanking region In thecase of ZIC3 mutations in its tCWCH2 motif increasethe random coil contents more than the zinc-free state[13] A unique terminal structure in a domain composedof repeating structural motifs is also seen in other con-served domains [65-67] In the case of small leucine-rich

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 12 of 18

repeats and ankyrin repeats N-capping structures areresponsible for protein stability [6667] ThirdlytCWCH2 binding to other molecules is possible and isobserved for many conserved domains Transcriptionalregulator and chromatin remodeling proteins form com-plexes with many different proteins in order to be func-tionally active [6869] Therefore the tCWCH2 ZF maynot be associated with DNA binding but may serve inprotein recognition similar to many C2H2 ZFs [5]The distribution and conservation of the tCWCH2

sequence motif in various gene classes suggests that thismotif has important biological roles The genes we iden-tified containing one or more tCWCH2 motifs areknown to be involved in various biological processesincluding chromatin remodeling (Arid2Rsc9) zinchomeostasis (Zap1ZafA) pH sensing (PacC) cell cycleregulation and transcriptional regulation (ZicGliGlisPacC Mizf Aebp2 Zap1ZafA) In addition given thatseveral gene classes remain to be functionally character-ized (including Fungl Zfp106 Twincl Clr1 Fungl-4ZFMonosiga gene DdHp) the known functional repertoireof the tCWCH2 motif is likely to increase We proposethat the tCWCH2 sequence motif is a widespread andfunctional protein structural motif and that furtherstructural and functional analyses are required to fullyunderstand its roles in biological processes

Conclusions(1) The 3D structure of tCWCH2 is highly conservedamong human ZIC3 human GLI1 and yeast Zap1(2) Current databases contained 587 tCWCH2

sequences that can be classified into 11 major geneclasses (ZicGliGlis Arid2Rsc9 PacC Mizf Aebp2Zap1ZafA Fungl Zfp106 Twincl Clr1 and Fungl-4ZF)and other minority classes The tCWCH2 motifs werefound in transcription regulatory factors chromatinremodeling factors and pHzinc homeostasis regulatingfactors(3) The tCWCH2 motif is mostly found in Opistho-

konta (metazoa fungi and choanoflagellates) andAmoebozoa (amoeba Dictyostelium discoideum)(4) The tCWCH2 motif contains three additional struc-

tural features (i) bias to hydrophobic residues in the j1and j2 positions (ii) longer linkers between the two C2H2motifs and (iii) insertion of extra sequences The struc-tural features suggest that the tCWCH2 motif is a specia-lized motif involved in inter-zinc finger interactions

MethodsProtein structural analysisProtein structures were obtained from the NCBI(National Center for Biotechnology Information) databaseas PDB format files and we used iMol httpwwwpirxcomiMol and Swiss-PdbViewer httpspdbvvital-itch

as the PDB viewer applications The latter was also usedfor superimposing multiple protein structures Magic fitoptions were selected for the superimposition of structuredata because the number of amino acid residues was dif-ferent between proteins Zif268 (PDBID 1P47) andTFIIIA (PDBID 2J7J) were used for the reference struc-ture of the classical C2H2 ZF For the distance analysis ofthe neighboring amino acid residues we used ldquoNeighborsof Selected Residue Side chainrdquo option of the Swiss-PdbViewer The software showed the list of the neighbor-ing residues with distances We dealt with the residues ifthe same residues were detected as the neighbors inmore than 10 models of ZIC3 and Zap1 PDB files thatinclude 20 models respectively

Database searchWe conducted web-based BLAST searches using PatternHit Initiated (PHI)-BLAST of NCBI databases httpblastncbinlmnihgovBlastcgi For the sequence pattern of thePHI-BLAST searches we used ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo The initialsequence pattern was deduced from the compilation ofthe Zic family sequences [1570] and subsequently theintervening sequence lengths were modified to the abovepattern until no additional sequences appeared in thesearch The seed sequences obtained by PHI-BLASTsearch were first classified into gene families based ontheir annotation and on phylogenetic tree analyses(see below) Then a TBLASTN search was performedagainst (1) all of the following databases GenBank+RefSeqNucleotides+EMBL+DDBJ+PDB excluding whole-genomeshotgun sequences and expressed sequence tags (nr optionin the NCBI TBLASTN search) and (2) whole genomeshotgun sequences of the organisms shown in Table 2The whole genome sequence TBLASTN search was per-formed using representative sequences for each genefamily (Table 3) initially on a eukaryote-wide basis andthen on a metazoa-and-fungus-wide basis For the meta-zoa-and-fungus-wide search sequences closely related tothe selected genes were searched in the following gene-species combinations Arid2Rsc9 PacC Zap1ZafAFungl Twincl Clr1 and Fungl-4ZF were searched infungi and ZicGliGlis Arid2Rsc9 Mizf Aebp2 andZfp106 were searched in metazoa The target organismsare listed in Table 2 These organisms were selected onthe basis of the following criteria (1) high genomesequence coverage (gt5times) (2) member of a major Eukar-yote taxon or considered to be in a phylogeneticallyimportant position The collected sequences were manu-ally checked for the presence of the sequence of thetCWCH2 motif In addition we performed proteinBLAST searches using representative sequences for 16gene families (Aebp2 Arid2 Clr1 Csr1 Fungl Fungl-4ZFGli Glis Mizf PacC Rsc9 Twincl ZafA Zap1 Zfp106

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 13 of 18

Table 2 Eukaryotic organisms used for the TBLASTN search

Organism Supergroup Lower taxon

Entamoeba W Amoebozoa Entamoebida

Dictyostelium W Amoebozoa Mycetozoa

Paramecium W Chromalveolata Ciliata

Tetrahymena W Chromalveolata Ciliata

Guillardia W Chromalveolata Cryptophyta

Cryptosporidium W Chromalveolata Dinoflagella

Giardia W Excavata Diplomonadia

Euglena gracilis W Excavata Euglenozoa

Trichomonas vaginalis W Excavata Parabasalidea

Monosiga W Opisthokonta Choanoflagellata

Batrachochytrium dendrobatidis MF Opisthokonta Fungi Chytridiomycota

Aspergillus W Opisthokonta Fungi Dikarya Ascomycota

Aspergillus fumigatus MF Opisthokonta Fungi Dikarya Ascomycota

Candida W Opisthokonta Fungi Dikarya Ascomycota

Candida albicans MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans W Opisthokonta Fungi Dikarya Ascomycota

Neurospora W Opisthokonta Fungi Dikarya Ascomycota

Neurospora crassa MF Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces cerevisiae MF Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces pombe MF Opisthokonta Fungi Dikarya Ascomycota

Yarrowia lipolytica MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans MF Opisthokonta Fungi Dikarya Basidiomycota

Ustilago maydis MF Opisthokonta Fungi Dikarya Basidiomycota

Antonospora locustae MF Opisthokonta Fungi Microsporidia

Encephalitozoon W Opisthokonta Fungi Microsporidia

Encephalitozoon cuniculi MF Opisthokonta Fungi Microsporidia

Rhizopus oryzae MF Opisthokonta Fungi Mucoromycotina

Monosiga brevicollis MF Opisthokonta Metazoa Choanoflagellida

Homo W Opisthokonta Metazoa Chordata

Nematostella W Opisthokonta Metazoa Cnidaria

Nematostella vectensis MF Opisthokonta Metazoa Cnidaria Anthozoa

Platynereis dumerilii MF Opisthokonta Metazoa Cnidaria Anthozoa

Hydra magnipapillata MF Opisthokonta Metazoa Cnidaria Hydrozoa

Hydra vulgaris MF Opisthokonta Metazoa Cnidaria Hydrozoa

Lottia gigantea MF Opisthokonta Metazoa Mollusca Gastropoda

Trichoplax W Opisthokonta Metazoa Placozoa

Trichoplax adhaerens MF Opisthokonta Metazoa Placozoa Trichoplax

Schmidtea mediterranea MF Opisthokonta Metazoa Platyhelminthes

Chlamydomonas W Plantae Chlorophyta

Arabidopsis W Plantae Tracheophyta

Oryza sativa W Plantae Tracheophyta

Bigelowiella W Rhizaria Cercozoa

Species name and taxa are indicated W = Eukaryote-wide search MF = metazoa-and-fungus-wide search

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 14 of 18

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 4: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

Figure 2 Domain structure and function of tCWCH2-containing proteins The 11 gene classes are listed in descending order of the numberof representative genes in each class A Monosiga gene and two Dictyostelium genes which do not belong to these classes are indicated Genename is indicated at the left of each row (open box = C2H2 ZF gray box = CWCH2 gray boxes linked with thick lines = tCWCH2 open boxwith curved ends = other domains) Representative genes and their amino acid length are indicated on the left and references are shown onthe right Hs Homo sapiens Dm Drosophila melanogaster Sc Saccharomyces cerevisiae Afu Aspergillus fumigatus Spo Schizosaccharomycespombe Ro Rhizopus oryzae Mb Monosiga brevicollis Dd Dictyostelium discoideum

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 4 of 18

[27-29] The ARID (AT-rich interaction domain) andtCWCH2 domains are conserved in Arid2 and Rsc9(Additional file 4) Another conserved sequence motifwas found in the C-terminal flanking region of thetCWCH2 motif This sequence motif is summarized asldquoIxL (ST) AxL (IV) L (KR) N (IL) times (KR)rdquo and wenamed this motif the LIL domain PHI-BLAST searchesfor LIL domains suggested that this domain was distrib-uted only in the Arid2Rsc9 proteins (data not shown)PacCPacC mediates gene expression regulation by ambientpH in filamentous fungi and yeasts [3031] It containsan N-terminal DNA binding domain with three ZFsOther ZF domains named A B and C are involved inresponses to pH change [3233]MizfMizf was originally reported as a seven ZF-containingprotein [34-36] We identified two additional C2H2 ZFsin the gene sequence encoding this protein (ZF2 andZF8 in Additional file 5) The two tCWCH2 motifs inthe first four ZFs (ZF1-ZF4) were separated by longerintervening sequences (gt35 aa) than most ZFsAebp2Mammalian Aebp2 is a DNA binding transcription fac-tor [37] Its Drosophila homolog jing is required forcellular differentiation [38-42]Zap1ZafAThe proteins encoded by Zap1ZafA should be groupedtogether because of the similarity in structure of their ZFdomains (Additional file 6) and similarities in their func-tional roles in zinc homeostasis [43-49] Our alignments ofZap1ZafA sequences from seven fungal species (Ustilagomaydis Cryptococcus neoformans Aspergillus fumigatusMagnaporthe grisea Candida albicans Saccharomyces cer-evisiae and Yarrowia lipolytica) reveal significant similarityin their ZFDs (Additional file 6) The ZFDs contain a maxi-mum of eight C2H2 ZF motifs in which the two N-terminalZF pairs (ZF1-ZF2 ZF3-ZF4) can form tCWCH2 motifsZF1-2 and ZF3-4 are separated by longer interveningsequences than most ZFs Our alignments showed thatconservation of ZF1 ZF2 and ZF8 was incompleteFungus-Gli-like (Fungl)The amino acid sequences encoded by this group ofgenes have been known as ldquofungus Gli likerdquo Here wehave named these sequences Fungl (Fungus-Gli-like)(Additional file 7 data not shown)Zfp106Zfp106 is a metazoan ZF protein of unknown function[5051] It contains two pairs of ZFs at each end WD40repeats (IPR001680 Zfp106) are known to mediate pro-tein-protein interactions [52-55]Tandem-CWCH2-protein-C-terminal (Twincl)The sequences are composed of 468 to 1458 aminoacid residues (data not shown) On the basis of these

structural features (Figure 2 Additional file 8) wenamed this gene family Twincl (Tandem CWCH2 pro-tein C-terminal)Clr1The ZFs among members of the Clr1 family [56] werehighly divergent and other sequences were not con-served (Additional file 9 data not shown) Therefore weomitted this gene family from the consensus sequenceand phylogenetic tree analysesFungus-Gli-like-4ZF (Fungl-4ZF)This gene group includes annotations from Rhizopusoryzae and Batrachochytrium dendrobatidis representingthe fungal phyla Mucoromycotina and Chytridiomycotarespectively The proteins encoded by the Fungl-4ZFgene family contain four ZFs that are weakly similar tothose in Gli and Fungl (Additional file 7) Based on theZF motif organization we tentatively grouped themseparately from Fungl However based on theirsequence similarity and phylogenetic classification it ispossible that the Fungl-4ZF and Fungl gene groups arederived from a common ancestral gene [5758]Dictyostelium discoideum tCWCH2The social amoeba Dictyostelium discoideum belongs tothe Amoebozoa supergroup There are two tCWCH2-containing genes in this gene class DdHp1 (Dictyoste-lium discoideum hypothetical protein) and DdHp2 TheZFD alignment of these genes is shown in Additionalfile 10Monosiga brevicollis tCWCH2Since Monosiga brevicollis is the only species among thechoanoflagellates in which the genome has been fullysequenced the presence of one tCWCH2 sequence(Additional file 10) may be more functionally meaning-ful than the singleton sequences in metazoa and fungiThe existence of an RFX domain (position 376-423)raised the possibility that the tCWCH2 sequence inMonosiga brevicollis and Arid2Rsc9 is derived from acommon ancestral geneUnclassified sequencesSix sequences were not classified into the above 13 cate-gories (Additional file 2) Four out of the six unclassifiedsequences contained weak similarities to PacC FunglFungl-4ZF and Mizf The remaining two sequencesfrom fungi did not show any significant homology tothe other sequencesIsolated tCWCH2 in metazoan gene familiesWe found rare occurrences of the tCWCH2 motif insome metazoan gene families including three tCWCH2-containing sequences in the Sp8 family (n = 37) one inthe TRP2 family (n = 8) one in the Kruumlppel-like factor5 family (n = 14) and one in the pbrm-1 family(n = 21) (Additional file 2) We did not include thesegenes in the following analysis because their significancewas not clear

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 5 of 18

tCWCH2 sequences in plant databasesTwo plant tCWCH2-containing sequences were foundOne sequence (XP_001771543) was the REF6 homologof a moss (Bryophyta Physcomitrella patens) (Additionalfile 10) Although REF6 is conserved in many plants theCWCH2 was not found in other plant species includingArabidopsis thaliana Oryza sativa and Vitis vinifera(Additional file 10) The other sequence AK110182 wasfrom Oryza sativa but is highly homologous to PacCThis sequence was otherwise only detected in fungi andtherefore might represent contamination of Oryza sativawith fungal materialIn the course of the classification procedure we

extracted the sequences belonging to each gene familyirrespective of the presence of tCWCH2 motifs identi-fied by BLAST search using representative entire aminoacid sequences Within this group we examined theextent of conservation of the cysteine tryptophan andhistidine residues contributing to the tCWCH2 motifsThe analysis indicated that the tryptophans in CWCH2motif-constituting residues were generally conserved(87-100) (Table 1)

Similarity among tCWCH2 gene classesWe performed molecular phylogenetic analyses to iden-tify similarities among the tCWCH2 gene classes Forthis purpose we selected flanking ZFs containingtCWCH2 (tCWCH2+1ZF) because the tCWCH2 itselfyielded very little information on sequence similaritiespresumably because it is short In addition genes widely

distributed among fungi or metazoa were used for thisanalysis because our preliminary analysis suggested thatgenes with a narrow phylogenetic distribution (TwinclClr1 Fungl-4ZF Dictyostelium tCWCH2) had stronglydivergent sequences that may confound the phylogenetictree analyses [59] Using the tCWCH2+1ZF (3ZFs)amino acid sequences we constructed phylogenetictrees by the Bayesian inference (BI) maximal likelihood(ML) and neighbor joining (NJ) methods The 89 (80tCWCH2 + 9 TFIIIA) sequences representing a widephylogenic range were subjected to this analysis usingthe DNA-binding classical C2H2 ZFs of TFIIIA as out-group sequences (Figure 3 sequence alignment in Addi-tional file 11 detailed tree in BI ML NJ in Additionalfiles 12 13 and 14)Each class of tCWCH2-containing genes was strappedwith high bootstrap values in the BI ML and NJ meth-ods supporting the validity of the above classificationPhylogenetic tree analyses revealed the relationshipsamong the tCWCH2 gene families and revealed a novelrelationship among the metazoan Zic Gli and Glis genefamilies These three gene families were groupedtogether whereas Gli and Glis formed a sub-group inthe branch of the ZicGliGlis group It is known thatvertebrate Zic Gli and Glis are composed of six threeand three paralogs respectively Interestingly Glis2 andGlis3 were separately grouped with their insect homo-logs (DmGlis2 and DmGlis3 respectively) and sea ane-mone Nematostella homologs (NveGlis2 and NveGlis3respectively Additional files 12 13 and 14) indicatingthat the two paralogs existed in the common ancestorsof cnidarians and bilaterians On the other hand verte-brate Zic and Gli paralogs were not grouped with theinvertebrate homologs (Additional files 12 13 and 14)suggesting that vertebrate Zic and Gli paralogs weregenerated in vertebrate ancestorsFungl was determined to have sequence similarity with

the ZicGliGlis group The Fungl group branch wasalways located closest to the ZicGliGlis group in theBI ML and NJ trees but statistical support was weak(BI 93) The other gene groups with more than twoclasses of tCWCH2 motif were not strongly supported(BI lt70) However Zap1ZafA class and Aebp2 classproteins were grouped together by the BI and ML meth-ods and the PacC family was placed in the basal root ofthe BI and ML trees beside the other tCWCH2+ZF1sequences from metazoa and fungi

Distribution of the tCWCH2 sequences in thephylogenetic treeBased on the above analysis we investigated the phylo-geny of the tCWCH2 motif (Figure 3) In global termstCWCH2 sequences were detected in all of the organ-isms we examined in the Opisthokonta supergroup

Table 1 Conservation of the tryptophan residues for thetCWCH2 motif in each gene family

Gene n

ZicGliGlis 282 993

Arid2Rsc9 61 967

PacC 48 1000

MizfZF12 40 925

MizfZF34 40 1000

Aebp2 39 1000

Zap1ZafAZF12 10 1000

Zap1ZafAZF34 33 970

Fungl 32 1000

Zfp106 16 875

Twincl 13 1000

Clr1 4 1000

Fungus Gli like 4ZF 2 1000

Dictyostelium gene 3 1000

Monosiga gene 1 1000

total 624 982aThe percentages indicate conservation of the tryptophan residues in eachgene class In Mizf and Zap1ZafA there are two independent tCWCH2 motifsfor which ZF12 (ZF1-ZF2) indicates the N-terminally located one and ZF34(ZF3-ZF4) indicates the C-terminally located one

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 6 of 18

including metazoa fungi choanoflagellate (Monosigabrevicollis) and microsporidia (Encephalitozoon cuniculi)In the Amoebozoa supergroup we found two tCWCH2-containing genes in the social amoeba Dictyostelium dis-coideum but not in Entamoeba histolytica Apart fromOpisthokonta and Amoebozoa two sequences werefound in the Plantae databases but their meaning is notclear Thus it appears that the distribution of thetCWCH2 sequence motif is essentially limited to theUniconta (a collective term for the Opisthokonta andAmoebozoa supergroups) in current sequence databasesWe examined the distribution of each tCWCH2 motif

class in the phylogenetic tree (Figure 4) Arid2Rsc9 classgenes were most widely detected in both fungi and meta-zoa On the other hand tCWCH2 was not detected in thesequences from the order Saccharomycetales (NP_013579Saccharomyces cerevisiae XP_718578 Candida albicansXP_506030 Yarrowia lipolytica) ZicGliGlis Aebp2 andMizf were widely distributed in metazoan groups butZfp106 was detected only in Bilateralia except EcdysozoaPacC Zap1ZafA and Fungl class genes were widely dis-tributed among the fungi groups Distribution of Twinclclass genes was restricted to Aspergillus and its close rela-tives Clr1 class sequences were found only in Schizosac-charomyces and Cryptococcus Fungl-4ZF class genes wererecovered only from the basal fungi (Rhizopus oryzae andBatrachochytrium dendrobatidis)

Additional sequence features in the tCWCH2 motifsWe generated tCWCH2 consensus sequences for eachgene class as well as for all classes to identify any addi-tional sequence features The Prosite databasePDOC00028 alignment was used for the reference con-sensus sequence of general C2H2 The extent of conser-vation is indicated by the size of letters and theconsensus sequences are aligned graphically (Figure 5)Classical C2H2 ZFs are known to have a consensussequence of ldquo(FY)xCx2CxFx7Lx2Hx4Hrdquo [1] ThetCWCH2 motifs also contain conserved phenylalanineand leucine residues between the cysteine and histidineresidues Phenylalanine and leucine are hydrophobicresidues that generally mediate a hydrophobic interac-tion within a single ZF (data not shown) These dataindicate that the general structure of a single ZF is wellconserved in the tCWCH2 motif (Figure 6A) and thatthere are some tCWCH2-specific structures apart fromthe conserved tryptophan residuesFirstly a position-specific bias to hydrophobic residues

was found in the C-terminal region adjacent to the firsthistidine residue in ZF1 and ZF2 (j1 j2 in Figure 56A) Valine leucine and isoleucine were frequentlyfound at j1 position (341 220 and 353 respec-tively) and valine leucine isoleucine and methioninewere frequently found at j2 position (143 143391 and 242 respectively) in contrast to those in a

Figure 3 Phylogenic tree of tCWCH2 (A) Tree indicating similarities among the tCWCH2 gene classes Statistical analysis was performed usingthree ZFs (tCWCH2 and a ZF in the C-terminal flanking region) (B) Sub-tree of ZicGliGlis class gene The alignment for this analysis is shown inAdditional file 11 BI NL and NJ analyses were carried out to construct the molecular phylogenetic trees (Additional files 12 13 and 14) The treepattern is based on the BI tree Scores for each branch indicate the statistical support values obtained in each phylogenetic tree constructionmethod BI (postprobability)ML (bootstrap value)NJ (bootstrap value) The absence of scores (-) indicates the branches that were notsupported by the corresponding methods

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 7 of 18

compilation of general C2H2 sequences (PDOC00028)(valine 42 leucine 95 isoleucine 53 methionine78 Figure 6B) The difference in the summed fre-quency of each position was statistically significant (j1P lt 1 times 10-100 j2 P lt 1 times 10-100 in a c2 test) Thesepositions were close to the tryptophan residues on theopposite side of the ZF as determined in the superim-posed 3D structures of Zic Gli and Zap1 tCWCH2s(Figure 6A) The biased residue frequency may reflectthe spatial restriction of amino acid choices Since theCWCH2 structure suggests the presence of four hydro-phobic residues in a compact space the residue pairedwith each tryptophan could not be a residue with alarge side chain such as phenylalanine or tryptophanSecondly the linker sequence of the two ZFs

in tCWCH2 was significantly longer (118 plusmn 44 aa aver-age plusmn standard deviation) than those of generalC2H2 sequences (Prosite PDOC00028 81 plusmn 50 aa)(P lt 1 times 10-100 Mann-Whitney U-test Figure 7) Therewere no canonical TGE(KR)P-like sequences in thetCWCH2 linker sequences whereas the TGE(KR)P lin-ker sequence was found in about 65 of the sequences inthe PDOC00028 alignment Accordingly conservation of

the tCWCH2 linker sequences was generally low (Fig-ure 5) and their lengths were more strongly divergentthan those of TGE(KR)P ZFs (P = 3 times 10-7 F-test) Elon-gation of the linker was commonly found in the Zap1ZafA (ZF1-2 162 plusmn 57 aa) Mizf (ZF1-2 192 plusmn 44 aaZF3-4 175 plusmn 23 aa) and Arid2Rsc9 (140 plusmn 69 aa)classesThirdly extra sequences were inserted between the

tryptophan and cysteine residues in tCWCH2 This typeof insertion was limited to the sequences from the firstZF of ZicGliGlis Arid2Rsc9 and Zap1ZafA and thesecond ZF of Arid2Rsc9 classes The number of aminoacid residues between the two cysteines of tCWCH2varied from four to 34 and was mostly four residues inother classes [15] This insertion forms an extra loopingstructure in ZIC3 (PDBID 2RPC Figure 1 6A) [13]

DiscussionThe tCWCH2 motif is a hallmark of inter-zinc fingerinteractionsThe tCWCH2 motif was originally proposed as an evo-lutionarily conserved sequence involved in the interac-tion between ZFs in human ZIC3 [1315] The presence

Figure 4 Distribution of tCWCH2-containing genes in a eukaryotic phylogenetic tree Tree pattern (gray curved lines) is based on [78-80]The distribution of tCWCH2 sequences is indicated by the black curved line The distribution of each tCWCH2 gene class is indicated by acolored area The Monosiga tCWCH2 sequence may be derived from the common ancestor of Arid2Rsc9 (See Results)

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 8 of 18

Figure 5 Sequence conservation among the classes of tCWCH2 sequence motifs Amino acid sequences were aligned and consensussequences were generated (see Methods) Mizf ZF1-2 Mizf ZF3-4 and Zap1ZafA ZF1-2 and ZF3-4 are indicated separately The PDOC00028C2H2 consensus is shown at the bottom as a general C2H2 consensus sequence (indicated by the tandem repeat of the consensus sequence)The short insertion sequences listed below were initially located at the sites indicated by the colored arrowheads in the alignments describedabove but were separated to allow more comprehensive analyses These sequences represent the longer linker sequences and the insertion ofextra sequences described in the Results section To achieve this analysis we omitted the four sequences AAWT01013938 (Schmidteamediterranea Aebp2) CAG05504 (Tetraodon nigroviridis Gli) XP_001602003 (Nasonia vitripennis Gli) and XP_785526 (Strongylocentrotus purpuratusGli) because these sequences contain exceptionally divergent sequences that disrupt the alignments

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 9 of 18

of the tCWCH2 motif in the Gli family Glis familyZap1 and PacC has been previously reported [13]Superimposition analysis revealed a tightly conservedstructure among ZIC3 GLI and Zap1 supporting theimportance of the tCWCH2 motif in structural termsStrong conservation of the tCWCH2 tryptophan resi-dues in each gene class suggests that there has beenstrong selective pressure for them during evolutionOur analysis revealed additional structural features of

tCWCH2 besides the conserved tryptophans which maybe explained by adaptation of its role in mediating theinter-ZF interaction (Figure 8) Firstly there is a prefer-ence for valine leucine or isoleucine in the j1 and j2position This can be regarded as an adaptive adjust-ment for forming hydrophobic cores of the appropriatesize Furthermore hydrophobic interaction between j1and j2 is likely because they are closely located within4Aring distance in the 3D structure models of ZIC3 GLIand Zap1 (Additional file 1 data not shown) This alsocan influence the amino acid preferences in j1 and j2position Secondly elongation of the linker sequencesmay also reflect a structural adaptation to allow themotif to face the two globular C2H2 units Linker lengthand sequence are important for the function of theDNA-binding C2H2 ZF because they determine the dis-tance of each ZF and the linker contacts to the DNAbase [60-62] However optimization of the canonical

linker length for DNA-binding may not be enough toyield opposed positioning of ZF units The bendingbetween the two ZFs may be larger than is dictated bythe DNA binding state of the ZFs as revealed in theGLI-DNA complex structure [7] Longer linker sizeseems to be a general feature of tCWCH2 but conserva-tion of the linker sequences within each class variesnotably Linker conservation in PacC is particularlyhigh despite the widespread distribution of this gene infungi thus suggesting that the linker in PacC is underevolutionary constraintAnother structural feature the addition of extra

sequences is plausible if we assume the reduction ofstructural constraint as a consequence of specializationin the inter-ZF interaction For example the conservationof tCWCH2-forming ZF1 is lower than that of ZF2-5 inZic proteins [15] In the case of the GLI-DNA complexthe tCWCH2-forming ZF1 does not directly participatein DNA binding [12] and the sequence conservation ofGli family ZF1 is also lower than in ZF2-5 (Figure 5unpublished observation) The extra sequences in theZicGliGlis Zap1ZafA (ZF3-4) and AridRsc9 classesshow poor conservation (Figure 5)It is unclear if tCWCH2 is the only structure required

for inter-ZF interactions The C2H2 ZFD is capable ofachieving protein-to-protein interactions in many cases[6] Interestingly in a previous study to design synthetic

Figure 6 j1 and j2 positions of tCWCH2 (A) Stereo view of the tCWCH2 sequence motif A wire model of the main chains is shown withZIC3 GLI1 and Zap1 in green and TFIIIA and Zif268 in red The side chains of tryptophan residues in tCWCH2 and those in the j1 j2 residuesare indicated by cylinders ZIC3 GLI1 and Zap1 are gray cylinders and the others are red cylinders The corresponding positions (j1 j2) areindicated by red boxes in Figure 5 (B) Frequency of amino acid appearance in j1 j2 and those of general C2H2 (PDOC00028) is shown Valineleucine and isoleucine appear frequently at both positions and methionine frequently appears in j2 This table summarizes all of the tCWCH2sequences Highlighted numbers indicate the preferential residues for j1 and j2 in tCWCH2

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 10 of 18

Figure 7 Comparison of linker length between the tCWCH2 sequence motif and general C2H2 ZF The bars indicate the mean lengths ofthe linker sequence between C2H2 motifs in the indicated groups We defined the linker length as the number of amino acids between the lasthistidine of a ZF and the first cysteine of the C-terminally flanking ZF The PDOC00028 (general C2H2) alignment contained 12326 sequencesand we found 10025 linkers in it The general C2H2 sequences was further divided into two groups based on the presence or absence of ldquoTGE(KR)Prdquo The sequence numbers of each group tCWCH2 614 general C2H2 10025 ldquoTGE(KP)rdquo 4640 ldquonon-TGE(KR)Prdquo 5385) Error bar standarddeviation P lt 1 times 10-100 in Mann-Whitney U-test

Figure 8 Evolution of the tCWCH2 domain We propose that the tCWCH2 patterns were generated from classical C2H2 sequencesconcurrently with the acquisition of the additional sequence features

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 11 of 18

ZFs utilizing in vitro-evolving protein-to-protein interac-tions [16] it was found that the hydrophobic residuesin zif268 that interacted between the artificial proteinsand the zif268 C2H2 ZF were similar to those involvedin the GLI inter-ZF interaction Superimpositionrevealed that the position corresponding to the trypto-phan in tCWCH2 is occupied by a valine suggestingthat either intramolecular or intermolecular protein-to-protein interactions mediated by hydrophobic residuesare generally utilized for C2H2 ZFs However the strongconservation of the tCWCH2 and accompanying struc-tural features suggest that tCWCH2 is a highly specia-lized C2H2 ZF for inter-ZF interactions between twoadjacent ZFs

Evolution of the tCWCH2 motifWe hypothesize that the original tCWCH2 motifs weregenerated from classical C2H2 ZFs (Figure 8) that occurwidely in eukaryotic organisms including Plantae Exca-vata Chromalveolata and Rhizaria tCWCH2 motifswere found in the vicinity of C2H2 ZF in severaltCWCH2 gene classes It is proposed that the tCWCH2sequence appeared in the common ancestor of Opistho-konta or Unikonta (OpisthokontaAmoebozoa) Theacquisition of the tryptophans and additional structuralfeatures of tCWCH2 may have occurred concurrently inthe course of evolution Whether the evolution oftCWCH2 occurred once or multiple times duringOpisthokontaAmoebozoa evolution remains unclearHowever its short sequence and some isolated distribu-tion in the plant supergroup and metazoan gene familiesmay favor the multiple-origins hypothesis Although thelineage relationship among the tCWCH2 classes remainsunclear we can presume the presence of the followingtCWCH2 ancestral genes an Arid2Rsc9 commonancestor which may have existed before the diversifica-tion of metazoa and fungi a ZicGliGlis commonancestor an Aebp2 ancestor and a Mizf ancestor in theearly metazoa a Fungl ancestor a PacC ancestor aFungl-4ZF ancestor and a Zap1ZafA ancestor in theearly fungi a Zfp106 ancestor in early bilaterians aTwincl ancestor in the founder of Aspergillus and itsclose siblings and a Clr1 ancestor in the founder ofDikarya althogh most Dikarya may have lost the geneOnce the tCWCH2 structures were established theymay have been more strongly conserved in these classes

Function and role of tCWCH2There are several cases that directly indicate the func-tional importance of the tCWCH2 motif in living organ-isms In the case of PacC mutation of the tryptophan oftCWCH2 induces a loss of its DNA binding activity[3132] In case of human ZIC3 the tryptophan muta-tion is proposed to be associated with the occurrence of

a familial congenital heart defect and biochemical analy-sis revealed that transcriptional activity and protein sta-bility are decreased in the mutant protein [1314] Inyeast Zap1 interaction between the two ZF where thetCWCH2 motif is located is required for the high-affi-nity binding of the zinc ion [45] Also it has beenshown that the inter-ZF interaction is influenced by theconcentration of zinc ion [46] suggesting that thetCWCH2 motif is involved in zinc-sensing in yeastApart from the instances referred to above the func-

tional significance of the tCWCH2 domain remainsunknown We propose that tCWCH2 may play rolesother than the direct DNA binding that is often exertedby classical C2H2 because the linker lengths oftCWCH2 are more highly divergent than those of thecanonical C2H2 ZFs in which linker length is a criticalparameter for DNA-C2H2 ZF interactions [60-63]Furthermore the structure of GLI ZIC3 and Zap1tCWCH2 domains in which the two adjacent ZFs form asingle globular structure [71347] is clearly differentfrom the structure of the DNA-binding C2H2 ZF domainwhere each globular ZF unit wraps around the majorgroove of the DNA [7-1164] (PDBID GLI 2GLI YY11UBD WT1 2RPT Zif268 1MEY 1P47 TFIIIA 1TF3)In the case of GLI tCWCH2 which is the only tCWCH2with a known DNA-binding 3D structure the N-terminalZF of GLI1 tCWCH2 does not contact DNA [7]Assuming that the tCWCH2 domain has a function

other than DNA binding what roles could tCWCH2play Firstly our domain structure analysis implies thattCWCH2 likely regulates its neighboring domains suchas canonical C2H2 ZF WD40 and LIL ZicGliGlisPacC Mizf Aebp2 Zap1ZafA Fungl Clr1 and Fungl-4ZF possess canonical ZF(s) in the C-terminal flankingregion of the tCWCH2 In the case of PacC analyses ofCWCH2 motif-containing tryptophan mutants suggeststhat DNA binding mediated by ZF2 and ZF3 may beinfluenced by tCWCH2 motifs in ZF1 and ZF2 [32] Itis possible that tCWCH2 motifs in the other membersof this group act similarly in the regulation of neighbor-ing canonical C2H2 domains In transcription factorsDNA binding domains such as the canonical C2H2 ZFcan define a target sequence and fix the protein position(angle distance direction) with respect to the DNAThe tCWCH2 motif could provide a lsquohingersquo or modula-tory junction that controls the relative positioning ofother functional domains Secondly tCWCH2 motifsmay provide a capping structure that isolates the struc-tural influence of N-terminal flanking region In thecase of ZIC3 mutations in its tCWCH2 motif increasethe random coil contents more than the zinc-free state[13] A unique terminal structure in a domain composedof repeating structural motifs is also seen in other con-served domains [65-67] In the case of small leucine-rich

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 12 of 18

repeats and ankyrin repeats N-capping structures areresponsible for protein stability [6667] ThirdlytCWCH2 binding to other molecules is possible and isobserved for many conserved domains Transcriptionalregulator and chromatin remodeling proteins form com-plexes with many different proteins in order to be func-tionally active [6869] Therefore the tCWCH2 ZF maynot be associated with DNA binding but may serve inprotein recognition similar to many C2H2 ZFs [5]The distribution and conservation of the tCWCH2

sequence motif in various gene classes suggests that thismotif has important biological roles The genes we iden-tified containing one or more tCWCH2 motifs areknown to be involved in various biological processesincluding chromatin remodeling (Arid2Rsc9) zinchomeostasis (Zap1ZafA) pH sensing (PacC) cell cycleregulation and transcriptional regulation (ZicGliGlisPacC Mizf Aebp2 Zap1ZafA) In addition given thatseveral gene classes remain to be functionally character-ized (including Fungl Zfp106 Twincl Clr1 Fungl-4ZFMonosiga gene DdHp) the known functional repertoireof the tCWCH2 motif is likely to increase We proposethat the tCWCH2 sequence motif is a widespread andfunctional protein structural motif and that furtherstructural and functional analyses are required to fullyunderstand its roles in biological processes

Conclusions(1) The 3D structure of tCWCH2 is highly conservedamong human ZIC3 human GLI1 and yeast Zap1(2) Current databases contained 587 tCWCH2

sequences that can be classified into 11 major geneclasses (ZicGliGlis Arid2Rsc9 PacC Mizf Aebp2Zap1ZafA Fungl Zfp106 Twincl Clr1 and Fungl-4ZF)and other minority classes The tCWCH2 motifs werefound in transcription regulatory factors chromatinremodeling factors and pHzinc homeostasis regulatingfactors(3) The tCWCH2 motif is mostly found in Opistho-

konta (metazoa fungi and choanoflagellates) andAmoebozoa (amoeba Dictyostelium discoideum)(4) The tCWCH2 motif contains three additional struc-

tural features (i) bias to hydrophobic residues in the j1and j2 positions (ii) longer linkers between the two C2H2motifs and (iii) insertion of extra sequences The struc-tural features suggest that the tCWCH2 motif is a specia-lized motif involved in inter-zinc finger interactions

MethodsProtein structural analysisProtein structures were obtained from the NCBI(National Center for Biotechnology Information) databaseas PDB format files and we used iMol httpwwwpirxcomiMol and Swiss-PdbViewer httpspdbvvital-itch

as the PDB viewer applications The latter was also usedfor superimposing multiple protein structures Magic fitoptions were selected for the superimposition of structuredata because the number of amino acid residues was dif-ferent between proteins Zif268 (PDBID 1P47) andTFIIIA (PDBID 2J7J) were used for the reference struc-ture of the classical C2H2 ZF For the distance analysis ofthe neighboring amino acid residues we used ldquoNeighborsof Selected Residue Side chainrdquo option of the Swiss-PdbViewer The software showed the list of the neighbor-ing residues with distances We dealt with the residues ifthe same residues were detected as the neighbors inmore than 10 models of ZIC3 and Zap1 PDB files thatinclude 20 models respectively

Database searchWe conducted web-based BLAST searches using PatternHit Initiated (PHI)-BLAST of NCBI databases httpblastncbinlmnihgovBlastcgi For the sequence pattern of thePHI-BLAST searches we used ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo The initialsequence pattern was deduced from the compilation ofthe Zic family sequences [1570] and subsequently theintervening sequence lengths were modified to the abovepattern until no additional sequences appeared in thesearch The seed sequences obtained by PHI-BLASTsearch were first classified into gene families based ontheir annotation and on phylogenetic tree analyses(see below) Then a TBLASTN search was performedagainst (1) all of the following databases GenBank+RefSeqNucleotides+EMBL+DDBJ+PDB excluding whole-genomeshotgun sequences and expressed sequence tags (nr optionin the NCBI TBLASTN search) and (2) whole genomeshotgun sequences of the organisms shown in Table 2The whole genome sequence TBLASTN search was per-formed using representative sequences for each genefamily (Table 3) initially on a eukaryote-wide basis andthen on a metazoa-and-fungus-wide basis For the meta-zoa-and-fungus-wide search sequences closely related tothe selected genes were searched in the following gene-species combinations Arid2Rsc9 PacC Zap1ZafAFungl Twincl Clr1 and Fungl-4ZF were searched infungi and ZicGliGlis Arid2Rsc9 Mizf Aebp2 andZfp106 were searched in metazoa The target organismsare listed in Table 2 These organisms were selected onthe basis of the following criteria (1) high genomesequence coverage (gt5times) (2) member of a major Eukar-yote taxon or considered to be in a phylogeneticallyimportant position The collected sequences were manu-ally checked for the presence of the sequence of thetCWCH2 motif In addition we performed proteinBLAST searches using representative sequences for 16gene families (Aebp2 Arid2 Clr1 Csr1 Fungl Fungl-4ZFGli Glis Mizf PacC Rsc9 Twincl ZafA Zap1 Zfp106

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 13 of 18

Table 2 Eukaryotic organisms used for the TBLASTN search

Organism Supergroup Lower taxon

Entamoeba W Amoebozoa Entamoebida

Dictyostelium W Amoebozoa Mycetozoa

Paramecium W Chromalveolata Ciliata

Tetrahymena W Chromalveolata Ciliata

Guillardia W Chromalveolata Cryptophyta

Cryptosporidium W Chromalveolata Dinoflagella

Giardia W Excavata Diplomonadia

Euglena gracilis W Excavata Euglenozoa

Trichomonas vaginalis W Excavata Parabasalidea

Monosiga W Opisthokonta Choanoflagellata

Batrachochytrium dendrobatidis MF Opisthokonta Fungi Chytridiomycota

Aspergillus W Opisthokonta Fungi Dikarya Ascomycota

Aspergillus fumigatus MF Opisthokonta Fungi Dikarya Ascomycota

Candida W Opisthokonta Fungi Dikarya Ascomycota

Candida albicans MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans W Opisthokonta Fungi Dikarya Ascomycota

Neurospora W Opisthokonta Fungi Dikarya Ascomycota

Neurospora crassa MF Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces cerevisiae MF Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces pombe MF Opisthokonta Fungi Dikarya Ascomycota

Yarrowia lipolytica MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans MF Opisthokonta Fungi Dikarya Basidiomycota

Ustilago maydis MF Opisthokonta Fungi Dikarya Basidiomycota

Antonospora locustae MF Opisthokonta Fungi Microsporidia

Encephalitozoon W Opisthokonta Fungi Microsporidia

Encephalitozoon cuniculi MF Opisthokonta Fungi Microsporidia

Rhizopus oryzae MF Opisthokonta Fungi Mucoromycotina

Monosiga brevicollis MF Opisthokonta Metazoa Choanoflagellida

Homo W Opisthokonta Metazoa Chordata

Nematostella W Opisthokonta Metazoa Cnidaria

Nematostella vectensis MF Opisthokonta Metazoa Cnidaria Anthozoa

Platynereis dumerilii MF Opisthokonta Metazoa Cnidaria Anthozoa

Hydra magnipapillata MF Opisthokonta Metazoa Cnidaria Hydrozoa

Hydra vulgaris MF Opisthokonta Metazoa Cnidaria Hydrozoa

Lottia gigantea MF Opisthokonta Metazoa Mollusca Gastropoda

Trichoplax W Opisthokonta Metazoa Placozoa

Trichoplax adhaerens MF Opisthokonta Metazoa Placozoa Trichoplax

Schmidtea mediterranea MF Opisthokonta Metazoa Platyhelminthes

Chlamydomonas W Plantae Chlorophyta

Arabidopsis W Plantae Tracheophyta

Oryza sativa W Plantae Tracheophyta

Bigelowiella W Rhizaria Cercozoa

Species name and taxa are indicated W = Eukaryote-wide search MF = metazoa-and-fungus-wide search

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 14 of 18

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 5: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

[27-29] The ARID (AT-rich interaction domain) andtCWCH2 domains are conserved in Arid2 and Rsc9(Additional file 4) Another conserved sequence motifwas found in the C-terminal flanking region of thetCWCH2 motif This sequence motif is summarized asldquoIxL (ST) AxL (IV) L (KR) N (IL) times (KR)rdquo and wenamed this motif the LIL domain PHI-BLAST searchesfor LIL domains suggested that this domain was distrib-uted only in the Arid2Rsc9 proteins (data not shown)PacCPacC mediates gene expression regulation by ambientpH in filamentous fungi and yeasts [3031] It containsan N-terminal DNA binding domain with three ZFsOther ZF domains named A B and C are involved inresponses to pH change [3233]MizfMizf was originally reported as a seven ZF-containingprotein [34-36] We identified two additional C2H2 ZFsin the gene sequence encoding this protein (ZF2 andZF8 in Additional file 5) The two tCWCH2 motifs inthe first four ZFs (ZF1-ZF4) were separated by longerintervening sequences (gt35 aa) than most ZFsAebp2Mammalian Aebp2 is a DNA binding transcription fac-tor [37] Its Drosophila homolog jing is required forcellular differentiation [38-42]Zap1ZafAThe proteins encoded by Zap1ZafA should be groupedtogether because of the similarity in structure of their ZFdomains (Additional file 6) and similarities in their func-tional roles in zinc homeostasis [43-49] Our alignments ofZap1ZafA sequences from seven fungal species (Ustilagomaydis Cryptococcus neoformans Aspergillus fumigatusMagnaporthe grisea Candida albicans Saccharomyces cer-evisiae and Yarrowia lipolytica) reveal significant similarityin their ZFDs (Additional file 6) The ZFDs contain a maxi-mum of eight C2H2 ZF motifs in which the two N-terminalZF pairs (ZF1-ZF2 ZF3-ZF4) can form tCWCH2 motifsZF1-2 and ZF3-4 are separated by longer interveningsequences than most ZFs Our alignments showed thatconservation of ZF1 ZF2 and ZF8 was incompleteFungus-Gli-like (Fungl)The amino acid sequences encoded by this group ofgenes have been known as ldquofungus Gli likerdquo Here wehave named these sequences Fungl (Fungus-Gli-like)(Additional file 7 data not shown)Zfp106Zfp106 is a metazoan ZF protein of unknown function[5051] It contains two pairs of ZFs at each end WD40repeats (IPR001680 Zfp106) are known to mediate pro-tein-protein interactions [52-55]Tandem-CWCH2-protein-C-terminal (Twincl)The sequences are composed of 468 to 1458 aminoacid residues (data not shown) On the basis of these

structural features (Figure 2 Additional file 8) wenamed this gene family Twincl (Tandem CWCH2 pro-tein C-terminal)Clr1The ZFs among members of the Clr1 family [56] werehighly divergent and other sequences were not con-served (Additional file 9 data not shown) Therefore weomitted this gene family from the consensus sequenceand phylogenetic tree analysesFungus-Gli-like-4ZF (Fungl-4ZF)This gene group includes annotations from Rhizopusoryzae and Batrachochytrium dendrobatidis representingthe fungal phyla Mucoromycotina and Chytridiomycotarespectively The proteins encoded by the Fungl-4ZFgene family contain four ZFs that are weakly similar tothose in Gli and Fungl (Additional file 7) Based on theZF motif organization we tentatively grouped themseparately from Fungl However based on theirsequence similarity and phylogenetic classification it ispossible that the Fungl-4ZF and Fungl gene groups arederived from a common ancestral gene [5758]Dictyostelium discoideum tCWCH2The social amoeba Dictyostelium discoideum belongs tothe Amoebozoa supergroup There are two tCWCH2-containing genes in this gene class DdHp1 (Dictyoste-lium discoideum hypothetical protein) and DdHp2 TheZFD alignment of these genes is shown in Additionalfile 10Monosiga brevicollis tCWCH2Since Monosiga brevicollis is the only species among thechoanoflagellates in which the genome has been fullysequenced the presence of one tCWCH2 sequence(Additional file 10) may be more functionally meaning-ful than the singleton sequences in metazoa and fungiThe existence of an RFX domain (position 376-423)raised the possibility that the tCWCH2 sequence inMonosiga brevicollis and Arid2Rsc9 is derived from acommon ancestral geneUnclassified sequencesSix sequences were not classified into the above 13 cate-gories (Additional file 2) Four out of the six unclassifiedsequences contained weak similarities to PacC FunglFungl-4ZF and Mizf The remaining two sequencesfrom fungi did not show any significant homology tothe other sequencesIsolated tCWCH2 in metazoan gene familiesWe found rare occurrences of the tCWCH2 motif insome metazoan gene families including three tCWCH2-containing sequences in the Sp8 family (n = 37) one inthe TRP2 family (n = 8) one in the Kruumlppel-like factor5 family (n = 14) and one in the pbrm-1 family(n = 21) (Additional file 2) We did not include thesegenes in the following analysis because their significancewas not clear

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 5 of 18

tCWCH2 sequences in plant databasesTwo plant tCWCH2-containing sequences were foundOne sequence (XP_001771543) was the REF6 homologof a moss (Bryophyta Physcomitrella patens) (Additionalfile 10) Although REF6 is conserved in many plants theCWCH2 was not found in other plant species includingArabidopsis thaliana Oryza sativa and Vitis vinifera(Additional file 10) The other sequence AK110182 wasfrom Oryza sativa but is highly homologous to PacCThis sequence was otherwise only detected in fungi andtherefore might represent contamination of Oryza sativawith fungal materialIn the course of the classification procedure we

extracted the sequences belonging to each gene familyirrespective of the presence of tCWCH2 motifs identi-fied by BLAST search using representative entire aminoacid sequences Within this group we examined theextent of conservation of the cysteine tryptophan andhistidine residues contributing to the tCWCH2 motifsThe analysis indicated that the tryptophans in CWCH2motif-constituting residues were generally conserved(87-100) (Table 1)

Similarity among tCWCH2 gene classesWe performed molecular phylogenetic analyses to iden-tify similarities among the tCWCH2 gene classes Forthis purpose we selected flanking ZFs containingtCWCH2 (tCWCH2+1ZF) because the tCWCH2 itselfyielded very little information on sequence similaritiespresumably because it is short In addition genes widely

distributed among fungi or metazoa were used for thisanalysis because our preliminary analysis suggested thatgenes with a narrow phylogenetic distribution (TwinclClr1 Fungl-4ZF Dictyostelium tCWCH2) had stronglydivergent sequences that may confound the phylogenetictree analyses [59] Using the tCWCH2+1ZF (3ZFs)amino acid sequences we constructed phylogenetictrees by the Bayesian inference (BI) maximal likelihood(ML) and neighbor joining (NJ) methods The 89 (80tCWCH2 + 9 TFIIIA) sequences representing a widephylogenic range were subjected to this analysis usingthe DNA-binding classical C2H2 ZFs of TFIIIA as out-group sequences (Figure 3 sequence alignment in Addi-tional file 11 detailed tree in BI ML NJ in Additionalfiles 12 13 and 14)Each class of tCWCH2-containing genes was strappedwith high bootstrap values in the BI ML and NJ meth-ods supporting the validity of the above classificationPhylogenetic tree analyses revealed the relationshipsamong the tCWCH2 gene families and revealed a novelrelationship among the metazoan Zic Gli and Glis genefamilies These three gene families were groupedtogether whereas Gli and Glis formed a sub-group inthe branch of the ZicGliGlis group It is known thatvertebrate Zic Gli and Glis are composed of six threeand three paralogs respectively Interestingly Glis2 andGlis3 were separately grouped with their insect homo-logs (DmGlis2 and DmGlis3 respectively) and sea ane-mone Nematostella homologs (NveGlis2 and NveGlis3respectively Additional files 12 13 and 14) indicatingthat the two paralogs existed in the common ancestorsof cnidarians and bilaterians On the other hand verte-brate Zic and Gli paralogs were not grouped with theinvertebrate homologs (Additional files 12 13 and 14)suggesting that vertebrate Zic and Gli paralogs weregenerated in vertebrate ancestorsFungl was determined to have sequence similarity with

the ZicGliGlis group The Fungl group branch wasalways located closest to the ZicGliGlis group in theBI ML and NJ trees but statistical support was weak(BI 93) The other gene groups with more than twoclasses of tCWCH2 motif were not strongly supported(BI lt70) However Zap1ZafA class and Aebp2 classproteins were grouped together by the BI and ML meth-ods and the PacC family was placed in the basal root ofthe BI and ML trees beside the other tCWCH2+ZF1sequences from metazoa and fungi

Distribution of the tCWCH2 sequences in thephylogenetic treeBased on the above analysis we investigated the phylo-geny of the tCWCH2 motif (Figure 3) In global termstCWCH2 sequences were detected in all of the organ-isms we examined in the Opisthokonta supergroup

Table 1 Conservation of the tryptophan residues for thetCWCH2 motif in each gene family

Gene n

ZicGliGlis 282 993

Arid2Rsc9 61 967

PacC 48 1000

MizfZF12 40 925

MizfZF34 40 1000

Aebp2 39 1000

Zap1ZafAZF12 10 1000

Zap1ZafAZF34 33 970

Fungl 32 1000

Zfp106 16 875

Twincl 13 1000

Clr1 4 1000

Fungus Gli like 4ZF 2 1000

Dictyostelium gene 3 1000

Monosiga gene 1 1000

total 624 982aThe percentages indicate conservation of the tryptophan residues in eachgene class In Mizf and Zap1ZafA there are two independent tCWCH2 motifsfor which ZF12 (ZF1-ZF2) indicates the N-terminally located one and ZF34(ZF3-ZF4) indicates the C-terminally located one

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 6 of 18

including metazoa fungi choanoflagellate (Monosigabrevicollis) and microsporidia (Encephalitozoon cuniculi)In the Amoebozoa supergroup we found two tCWCH2-containing genes in the social amoeba Dictyostelium dis-coideum but not in Entamoeba histolytica Apart fromOpisthokonta and Amoebozoa two sequences werefound in the Plantae databases but their meaning is notclear Thus it appears that the distribution of thetCWCH2 sequence motif is essentially limited to theUniconta (a collective term for the Opisthokonta andAmoebozoa supergroups) in current sequence databasesWe examined the distribution of each tCWCH2 motif

class in the phylogenetic tree (Figure 4) Arid2Rsc9 classgenes were most widely detected in both fungi and meta-zoa On the other hand tCWCH2 was not detected in thesequences from the order Saccharomycetales (NP_013579Saccharomyces cerevisiae XP_718578 Candida albicansXP_506030 Yarrowia lipolytica) ZicGliGlis Aebp2 andMizf were widely distributed in metazoan groups butZfp106 was detected only in Bilateralia except EcdysozoaPacC Zap1ZafA and Fungl class genes were widely dis-tributed among the fungi groups Distribution of Twinclclass genes was restricted to Aspergillus and its close rela-tives Clr1 class sequences were found only in Schizosac-charomyces and Cryptococcus Fungl-4ZF class genes wererecovered only from the basal fungi (Rhizopus oryzae andBatrachochytrium dendrobatidis)

Additional sequence features in the tCWCH2 motifsWe generated tCWCH2 consensus sequences for eachgene class as well as for all classes to identify any addi-tional sequence features The Prosite databasePDOC00028 alignment was used for the reference con-sensus sequence of general C2H2 The extent of conser-vation is indicated by the size of letters and theconsensus sequences are aligned graphically (Figure 5)Classical C2H2 ZFs are known to have a consensussequence of ldquo(FY)xCx2CxFx7Lx2Hx4Hrdquo [1] ThetCWCH2 motifs also contain conserved phenylalanineand leucine residues between the cysteine and histidineresidues Phenylalanine and leucine are hydrophobicresidues that generally mediate a hydrophobic interac-tion within a single ZF (data not shown) These dataindicate that the general structure of a single ZF is wellconserved in the tCWCH2 motif (Figure 6A) and thatthere are some tCWCH2-specific structures apart fromthe conserved tryptophan residuesFirstly a position-specific bias to hydrophobic residues

was found in the C-terminal region adjacent to the firsthistidine residue in ZF1 and ZF2 (j1 j2 in Figure 56A) Valine leucine and isoleucine were frequentlyfound at j1 position (341 220 and 353 respec-tively) and valine leucine isoleucine and methioninewere frequently found at j2 position (143 143391 and 242 respectively) in contrast to those in a

Figure 3 Phylogenic tree of tCWCH2 (A) Tree indicating similarities among the tCWCH2 gene classes Statistical analysis was performed usingthree ZFs (tCWCH2 and a ZF in the C-terminal flanking region) (B) Sub-tree of ZicGliGlis class gene The alignment for this analysis is shown inAdditional file 11 BI NL and NJ analyses were carried out to construct the molecular phylogenetic trees (Additional files 12 13 and 14) The treepattern is based on the BI tree Scores for each branch indicate the statistical support values obtained in each phylogenetic tree constructionmethod BI (postprobability)ML (bootstrap value)NJ (bootstrap value) The absence of scores (-) indicates the branches that were notsupported by the corresponding methods

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 7 of 18

compilation of general C2H2 sequences (PDOC00028)(valine 42 leucine 95 isoleucine 53 methionine78 Figure 6B) The difference in the summed fre-quency of each position was statistically significant (j1P lt 1 times 10-100 j2 P lt 1 times 10-100 in a c2 test) Thesepositions were close to the tryptophan residues on theopposite side of the ZF as determined in the superim-posed 3D structures of Zic Gli and Zap1 tCWCH2s(Figure 6A) The biased residue frequency may reflectthe spatial restriction of amino acid choices Since theCWCH2 structure suggests the presence of four hydro-phobic residues in a compact space the residue pairedwith each tryptophan could not be a residue with alarge side chain such as phenylalanine or tryptophanSecondly the linker sequence of the two ZFs

in tCWCH2 was significantly longer (118 plusmn 44 aa aver-age plusmn standard deviation) than those of generalC2H2 sequences (Prosite PDOC00028 81 plusmn 50 aa)(P lt 1 times 10-100 Mann-Whitney U-test Figure 7) Therewere no canonical TGE(KR)P-like sequences in thetCWCH2 linker sequences whereas the TGE(KR)P lin-ker sequence was found in about 65 of the sequences inthe PDOC00028 alignment Accordingly conservation of

the tCWCH2 linker sequences was generally low (Fig-ure 5) and their lengths were more strongly divergentthan those of TGE(KR)P ZFs (P = 3 times 10-7 F-test) Elon-gation of the linker was commonly found in the Zap1ZafA (ZF1-2 162 plusmn 57 aa) Mizf (ZF1-2 192 plusmn 44 aaZF3-4 175 plusmn 23 aa) and Arid2Rsc9 (140 plusmn 69 aa)classesThirdly extra sequences were inserted between the

tryptophan and cysteine residues in tCWCH2 This typeof insertion was limited to the sequences from the firstZF of ZicGliGlis Arid2Rsc9 and Zap1ZafA and thesecond ZF of Arid2Rsc9 classes The number of aminoacid residues between the two cysteines of tCWCH2varied from four to 34 and was mostly four residues inother classes [15] This insertion forms an extra loopingstructure in ZIC3 (PDBID 2RPC Figure 1 6A) [13]

DiscussionThe tCWCH2 motif is a hallmark of inter-zinc fingerinteractionsThe tCWCH2 motif was originally proposed as an evo-lutionarily conserved sequence involved in the interac-tion between ZFs in human ZIC3 [1315] The presence

Figure 4 Distribution of tCWCH2-containing genes in a eukaryotic phylogenetic tree Tree pattern (gray curved lines) is based on [78-80]The distribution of tCWCH2 sequences is indicated by the black curved line The distribution of each tCWCH2 gene class is indicated by acolored area The Monosiga tCWCH2 sequence may be derived from the common ancestor of Arid2Rsc9 (See Results)

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 8 of 18

Figure 5 Sequence conservation among the classes of tCWCH2 sequence motifs Amino acid sequences were aligned and consensussequences were generated (see Methods) Mizf ZF1-2 Mizf ZF3-4 and Zap1ZafA ZF1-2 and ZF3-4 are indicated separately The PDOC00028C2H2 consensus is shown at the bottom as a general C2H2 consensus sequence (indicated by the tandem repeat of the consensus sequence)The short insertion sequences listed below were initially located at the sites indicated by the colored arrowheads in the alignments describedabove but were separated to allow more comprehensive analyses These sequences represent the longer linker sequences and the insertion ofextra sequences described in the Results section To achieve this analysis we omitted the four sequences AAWT01013938 (Schmidteamediterranea Aebp2) CAG05504 (Tetraodon nigroviridis Gli) XP_001602003 (Nasonia vitripennis Gli) and XP_785526 (Strongylocentrotus purpuratusGli) because these sequences contain exceptionally divergent sequences that disrupt the alignments

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 9 of 18

of the tCWCH2 motif in the Gli family Glis familyZap1 and PacC has been previously reported [13]Superimposition analysis revealed a tightly conservedstructure among ZIC3 GLI and Zap1 supporting theimportance of the tCWCH2 motif in structural termsStrong conservation of the tCWCH2 tryptophan resi-dues in each gene class suggests that there has beenstrong selective pressure for them during evolutionOur analysis revealed additional structural features of

tCWCH2 besides the conserved tryptophans which maybe explained by adaptation of its role in mediating theinter-ZF interaction (Figure 8) Firstly there is a prefer-ence for valine leucine or isoleucine in the j1 and j2position This can be regarded as an adaptive adjust-ment for forming hydrophobic cores of the appropriatesize Furthermore hydrophobic interaction between j1and j2 is likely because they are closely located within4Aring distance in the 3D structure models of ZIC3 GLIand Zap1 (Additional file 1 data not shown) This alsocan influence the amino acid preferences in j1 and j2position Secondly elongation of the linker sequencesmay also reflect a structural adaptation to allow themotif to face the two globular C2H2 units Linker lengthand sequence are important for the function of theDNA-binding C2H2 ZF because they determine the dis-tance of each ZF and the linker contacts to the DNAbase [60-62] However optimization of the canonical

linker length for DNA-binding may not be enough toyield opposed positioning of ZF units The bendingbetween the two ZFs may be larger than is dictated bythe DNA binding state of the ZFs as revealed in theGLI-DNA complex structure [7] Longer linker sizeseems to be a general feature of tCWCH2 but conserva-tion of the linker sequences within each class variesnotably Linker conservation in PacC is particularlyhigh despite the widespread distribution of this gene infungi thus suggesting that the linker in PacC is underevolutionary constraintAnother structural feature the addition of extra

sequences is plausible if we assume the reduction ofstructural constraint as a consequence of specializationin the inter-ZF interaction For example the conservationof tCWCH2-forming ZF1 is lower than that of ZF2-5 inZic proteins [15] In the case of the GLI-DNA complexthe tCWCH2-forming ZF1 does not directly participatein DNA binding [12] and the sequence conservation ofGli family ZF1 is also lower than in ZF2-5 (Figure 5unpublished observation) The extra sequences in theZicGliGlis Zap1ZafA (ZF3-4) and AridRsc9 classesshow poor conservation (Figure 5)It is unclear if tCWCH2 is the only structure required

for inter-ZF interactions The C2H2 ZFD is capable ofachieving protein-to-protein interactions in many cases[6] Interestingly in a previous study to design synthetic

Figure 6 j1 and j2 positions of tCWCH2 (A) Stereo view of the tCWCH2 sequence motif A wire model of the main chains is shown withZIC3 GLI1 and Zap1 in green and TFIIIA and Zif268 in red The side chains of tryptophan residues in tCWCH2 and those in the j1 j2 residuesare indicated by cylinders ZIC3 GLI1 and Zap1 are gray cylinders and the others are red cylinders The corresponding positions (j1 j2) areindicated by red boxes in Figure 5 (B) Frequency of amino acid appearance in j1 j2 and those of general C2H2 (PDOC00028) is shown Valineleucine and isoleucine appear frequently at both positions and methionine frequently appears in j2 This table summarizes all of the tCWCH2sequences Highlighted numbers indicate the preferential residues for j1 and j2 in tCWCH2

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 10 of 18

Figure 7 Comparison of linker length between the tCWCH2 sequence motif and general C2H2 ZF The bars indicate the mean lengths ofthe linker sequence between C2H2 motifs in the indicated groups We defined the linker length as the number of amino acids between the lasthistidine of a ZF and the first cysteine of the C-terminally flanking ZF The PDOC00028 (general C2H2) alignment contained 12326 sequencesand we found 10025 linkers in it The general C2H2 sequences was further divided into two groups based on the presence or absence of ldquoTGE(KR)Prdquo The sequence numbers of each group tCWCH2 614 general C2H2 10025 ldquoTGE(KP)rdquo 4640 ldquonon-TGE(KR)Prdquo 5385) Error bar standarddeviation P lt 1 times 10-100 in Mann-Whitney U-test

Figure 8 Evolution of the tCWCH2 domain We propose that the tCWCH2 patterns were generated from classical C2H2 sequencesconcurrently with the acquisition of the additional sequence features

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 11 of 18

ZFs utilizing in vitro-evolving protein-to-protein interac-tions [16] it was found that the hydrophobic residuesin zif268 that interacted between the artificial proteinsand the zif268 C2H2 ZF were similar to those involvedin the GLI inter-ZF interaction Superimpositionrevealed that the position corresponding to the trypto-phan in tCWCH2 is occupied by a valine suggestingthat either intramolecular or intermolecular protein-to-protein interactions mediated by hydrophobic residuesare generally utilized for C2H2 ZFs However the strongconservation of the tCWCH2 and accompanying struc-tural features suggest that tCWCH2 is a highly specia-lized C2H2 ZF for inter-ZF interactions between twoadjacent ZFs

Evolution of the tCWCH2 motifWe hypothesize that the original tCWCH2 motifs weregenerated from classical C2H2 ZFs (Figure 8) that occurwidely in eukaryotic organisms including Plantae Exca-vata Chromalveolata and Rhizaria tCWCH2 motifswere found in the vicinity of C2H2 ZF in severaltCWCH2 gene classes It is proposed that the tCWCH2sequence appeared in the common ancestor of Opistho-konta or Unikonta (OpisthokontaAmoebozoa) Theacquisition of the tryptophans and additional structuralfeatures of tCWCH2 may have occurred concurrently inthe course of evolution Whether the evolution oftCWCH2 occurred once or multiple times duringOpisthokontaAmoebozoa evolution remains unclearHowever its short sequence and some isolated distribu-tion in the plant supergroup and metazoan gene familiesmay favor the multiple-origins hypothesis Although thelineage relationship among the tCWCH2 classes remainsunclear we can presume the presence of the followingtCWCH2 ancestral genes an Arid2Rsc9 commonancestor which may have existed before the diversifica-tion of metazoa and fungi a ZicGliGlis commonancestor an Aebp2 ancestor and a Mizf ancestor in theearly metazoa a Fungl ancestor a PacC ancestor aFungl-4ZF ancestor and a Zap1ZafA ancestor in theearly fungi a Zfp106 ancestor in early bilaterians aTwincl ancestor in the founder of Aspergillus and itsclose siblings and a Clr1 ancestor in the founder ofDikarya althogh most Dikarya may have lost the geneOnce the tCWCH2 structures were established theymay have been more strongly conserved in these classes

Function and role of tCWCH2There are several cases that directly indicate the func-tional importance of the tCWCH2 motif in living organ-isms In the case of PacC mutation of the tryptophan oftCWCH2 induces a loss of its DNA binding activity[3132] In case of human ZIC3 the tryptophan muta-tion is proposed to be associated with the occurrence of

a familial congenital heart defect and biochemical analy-sis revealed that transcriptional activity and protein sta-bility are decreased in the mutant protein [1314] Inyeast Zap1 interaction between the two ZF where thetCWCH2 motif is located is required for the high-affi-nity binding of the zinc ion [45] Also it has beenshown that the inter-ZF interaction is influenced by theconcentration of zinc ion [46] suggesting that thetCWCH2 motif is involved in zinc-sensing in yeastApart from the instances referred to above the func-

tional significance of the tCWCH2 domain remainsunknown We propose that tCWCH2 may play rolesother than the direct DNA binding that is often exertedby classical C2H2 because the linker lengths oftCWCH2 are more highly divergent than those of thecanonical C2H2 ZFs in which linker length is a criticalparameter for DNA-C2H2 ZF interactions [60-63]Furthermore the structure of GLI ZIC3 and Zap1tCWCH2 domains in which the two adjacent ZFs form asingle globular structure [71347] is clearly differentfrom the structure of the DNA-binding C2H2 ZF domainwhere each globular ZF unit wraps around the majorgroove of the DNA [7-1164] (PDBID GLI 2GLI YY11UBD WT1 2RPT Zif268 1MEY 1P47 TFIIIA 1TF3)In the case of GLI tCWCH2 which is the only tCWCH2with a known DNA-binding 3D structure the N-terminalZF of GLI1 tCWCH2 does not contact DNA [7]Assuming that the tCWCH2 domain has a function

other than DNA binding what roles could tCWCH2play Firstly our domain structure analysis implies thattCWCH2 likely regulates its neighboring domains suchas canonical C2H2 ZF WD40 and LIL ZicGliGlisPacC Mizf Aebp2 Zap1ZafA Fungl Clr1 and Fungl-4ZF possess canonical ZF(s) in the C-terminal flankingregion of the tCWCH2 In the case of PacC analyses ofCWCH2 motif-containing tryptophan mutants suggeststhat DNA binding mediated by ZF2 and ZF3 may beinfluenced by tCWCH2 motifs in ZF1 and ZF2 [32] Itis possible that tCWCH2 motifs in the other membersof this group act similarly in the regulation of neighbor-ing canonical C2H2 domains In transcription factorsDNA binding domains such as the canonical C2H2 ZFcan define a target sequence and fix the protein position(angle distance direction) with respect to the DNAThe tCWCH2 motif could provide a lsquohingersquo or modula-tory junction that controls the relative positioning ofother functional domains Secondly tCWCH2 motifsmay provide a capping structure that isolates the struc-tural influence of N-terminal flanking region In thecase of ZIC3 mutations in its tCWCH2 motif increasethe random coil contents more than the zinc-free state[13] A unique terminal structure in a domain composedof repeating structural motifs is also seen in other con-served domains [65-67] In the case of small leucine-rich

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 12 of 18

repeats and ankyrin repeats N-capping structures areresponsible for protein stability [6667] ThirdlytCWCH2 binding to other molecules is possible and isobserved for many conserved domains Transcriptionalregulator and chromatin remodeling proteins form com-plexes with many different proteins in order to be func-tionally active [6869] Therefore the tCWCH2 ZF maynot be associated with DNA binding but may serve inprotein recognition similar to many C2H2 ZFs [5]The distribution and conservation of the tCWCH2

sequence motif in various gene classes suggests that thismotif has important biological roles The genes we iden-tified containing one or more tCWCH2 motifs areknown to be involved in various biological processesincluding chromatin remodeling (Arid2Rsc9) zinchomeostasis (Zap1ZafA) pH sensing (PacC) cell cycleregulation and transcriptional regulation (ZicGliGlisPacC Mizf Aebp2 Zap1ZafA) In addition given thatseveral gene classes remain to be functionally character-ized (including Fungl Zfp106 Twincl Clr1 Fungl-4ZFMonosiga gene DdHp) the known functional repertoireof the tCWCH2 motif is likely to increase We proposethat the tCWCH2 sequence motif is a widespread andfunctional protein structural motif and that furtherstructural and functional analyses are required to fullyunderstand its roles in biological processes

Conclusions(1) The 3D structure of tCWCH2 is highly conservedamong human ZIC3 human GLI1 and yeast Zap1(2) Current databases contained 587 tCWCH2

sequences that can be classified into 11 major geneclasses (ZicGliGlis Arid2Rsc9 PacC Mizf Aebp2Zap1ZafA Fungl Zfp106 Twincl Clr1 and Fungl-4ZF)and other minority classes The tCWCH2 motifs werefound in transcription regulatory factors chromatinremodeling factors and pHzinc homeostasis regulatingfactors(3) The tCWCH2 motif is mostly found in Opistho-

konta (metazoa fungi and choanoflagellates) andAmoebozoa (amoeba Dictyostelium discoideum)(4) The tCWCH2 motif contains three additional struc-

tural features (i) bias to hydrophobic residues in the j1and j2 positions (ii) longer linkers between the two C2H2motifs and (iii) insertion of extra sequences The struc-tural features suggest that the tCWCH2 motif is a specia-lized motif involved in inter-zinc finger interactions

MethodsProtein structural analysisProtein structures were obtained from the NCBI(National Center for Biotechnology Information) databaseas PDB format files and we used iMol httpwwwpirxcomiMol and Swiss-PdbViewer httpspdbvvital-itch

as the PDB viewer applications The latter was also usedfor superimposing multiple protein structures Magic fitoptions were selected for the superimposition of structuredata because the number of amino acid residues was dif-ferent between proteins Zif268 (PDBID 1P47) andTFIIIA (PDBID 2J7J) were used for the reference struc-ture of the classical C2H2 ZF For the distance analysis ofthe neighboring amino acid residues we used ldquoNeighborsof Selected Residue Side chainrdquo option of the Swiss-PdbViewer The software showed the list of the neighbor-ing residues with distances We dealt with the residues ifthe same residues were detected as the neighbors inmore than 10 models of ZIC3 and Zap1 PDB files thatinclude 20 models respectively

Database searchWe conducted web-based BLAST searches using PatternHit Initiated (PHI)-BLAST of NCBI databases httpblastncbinlmnihgovBlastcgi For the sequence pattern of thePHI-BLAST searches we used ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo The initialsequence pattern was deduced from the compilation ofthe Zic family sequences [1570] and subsequently theintervening sequence lengths were modified to the abovepattern until no additional sequences appeared in thesearch The seed sequences obtained by PHI-BLASTsearch were first classified into gene families based ontheir annotation and on phylogenetic tree analyses(see below) Then a TBLASTN search was performedagainst (1) all of the following databases GenBank+RefSeqNucleotides+EMBL+DDBJ+PDB excluding whole-genomeshotgun sequences and expressed sequence tags (nr optionin the NCBI TBLASTN search) and (2) whole genomeshotgun sequences of the organisms shown in Table 2The whole genome sequence TBLASTN search was per-formed using representative sequences for each genefamily (Table 3) initially on a eukaryote-wide basis andthen on a metazoa-and-fungus-wide basis For the meta-zoa-and-fungus-wide search sequences closely related tothe selected genes were searched in the following gene-species combinations Arid2Rsc9 PacC Zap1ZafAFungl Twincl Clr1 and Fungl-4ZF were searched infungi and ZicGliGlis Arid2Rsc9 Mizf Aebp2 andZfp106 were searched in metazoa The target organismsare listed in Table 2 These organisms were selected onthe basis of the following criteria (1) high genomesequence coverage (gt5times) (2) member of a major Eukar-yote taxon or considered to be in a phylogeneticallyimportant position The collected sequences were manu-ally checked for the presence of the sequence of thetCWCH2 motif In addition we performed proteinBLAST searches using representative sequences for 16gene families (Aebp2 Arid2 Clr1 Csr1 Fungl Fungl-4ZFGli Glis Mizf PacC Rsc9 Twincl ZafA Zap1 Zfp106

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 13 of 18

Table 2 Eukaryotic organisms used for the TBLASTN search

Organism Supergroup Lower taxon

Entamoeba W Amoebozoa Entamoebida

Dictyostelium W Amoebozoa Mycetozoa

Paramecium W Chromalveolata Ciliata

Tetrahymena W Chromalveolata Ciliata

Guillardia W Chromalveolata Cryptophyta

Cryptosporidium W Chromalveolata Dinoflagella

Giardia W Excavata Diplomonadia

Euglena gracilis W Excavata Euglenozoa

Trichomonas vaginalis W Excavata Parabasalidea

Monosiga W Opisthokonta Choanoflagellata

Batrachochytrium dendrobatidis MF Opisthokonta Fungi Chytridiomycota

Aspergillus W Opisthokonta Fungi Dikarya Ascomycota

Aspergillus fumigatus MF Opisthokonta Fungi Dikarya Ascomycota

Candida W Opisthokonta Fungi Dikarya Ascomycota

Candida albicans MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans W Opisthokonta Fungi Dikarya Ascomycota

Neurospora W Opisthokonta Fungi Dikarya Ascomycota

Neurospora crassa MF Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces cerevisiae MF Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces pombe MF Opisthokonta Fungi Dikarya Ascomycota

Yarrowia lipolytica MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans MF Opisthokonta Fungi Dikarya Basidiomycota

Ustilago maydis MF Opisthokonta Fungi Dikarya Basidiomycota

Antonospora locustae MF Opisthokonta Fungi Microsporidia

Encephalitozoon W Opisthokonta Fungi Microsporidia

Encephalitozoon cuniculi MF Opisthokonta Fungi Microsporidia

Rhizopus oryzae MF Opisthokonta Fungi Mucoromycotina

Monosiga brevicollis MF Opisthokonta Metazoa Choanoflagellida

Homo W Opisthokonta Metazoa Chordata

Nematostella W Opisthokonta Metazoa Cnidaria

Nematostella vectensis MF Opisthokonta Metazoa Cnidaria Anthozoa

Platynereis dumerilii MF Opisthokonta Metazoa Cnidaria Anthozoa

Hydra magnipapillata MF Opisthokonta Metazoa Cnidaria Hydrozoa

Hydra vulgaris MF Opisthokonta Metazoa Cnidaria Hydrozoa

Lottia gigantea MF Opisthokonta Metazoa Mollusca Gastropoda

Trichoplax W Opisthokonta Metazoa Placozoa

Trichoplax adhaerens MF Opisthokonta Metazoa Placozoa Trichoplax

Schmidtea mediterranea MF Opisthokonta Metazoa Platyhelminthes

Chlamydomonas W Plantae Chlorophyta

Arabidopsis W Plantae Tracheophyta

Oryza sativa W Plantae Tracheophyta

Bigelowiella W Rhizaria Cercozoa

Species name and taxa are indicated W = Eukaryote-wide search MF = metazoa-and-fungus-wide search

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 14 of 18

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 6: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

tCWCH2 sequences in plant databasesTwo plant tCWCH2-containing sequences were foundOne sequence (XP_001771543) was the REF6 homologof a moss (Bryophyta Physcomitrella patens) (Additionalfile 10) Although REF6 is conserved in many plants theCWCH2 was not found in other plant species includingArabidopsis thaliana Oryza sativa and Vitis vinifera(Additional file 10) The other sequence AK110182 wasfrom Oryza sativa but is highly homologous to PacCThis sequence was otherwise only detected in fungi andtherefore might represent contamination of Oryza sativawith fungal materialIn the course of the classification procedure we

extracted the sequences belonging to each gene familyirrespective of the presence of tCWCH2 motifs identi-fied by BLAST search using representative entire aminoacid sequences Within this group we examined theextent of conservation of the cysteine tryptophan andhistidine residues contributing to the tCWCH2 motifsThe analysis indicated that the tryptophans in CWCH2motif-constituting residues were generally conserved(87-100) (Table 1)

Similarity among tCWCH2 gene classesWe performed molecular phylogenetic analyses to iden-tify similarities among the tCWCH2 gene classes Forthis purpose we selected flanking ZFs containingtCWCH2 (tCWCH2+1ZF) because the tCWCH2 itselfyielded very little information on sequence similaritiespresumably because it is short In addition genes widely

distributed among fungi or metazoa were used for thisanalysis because our preliminary analysis suggested thatgenes with a narrow phylogenetic distribution (TwinclClr1 Fungl-4ZF Dictyostelium tCWCH2) had stronglydivergent sequences that may confound the phylogenetictree analyses [59] Using the tCWCH2+1ZF (3ZFs)amino acid sequences we constructed phylogenetictrees by the Bayesian inference (BI) maximal likelihood(ML) and neighbor joining (NJ) methods The 89 (80tCWCH2 + 9 TFIIIA) sequences representing a widephylogenic range were subjected to this analysis usingthe DNA-binding classical C2H2 ZFs of TFIIIA as out-group sequences (Figure 3 sequence alignment in Addi-tional file 11 detailed tree in BI ML NJ in Additionalfiles 12 13 and 14)Each class of tCWCH2-containing genes was strappedwith high bootstrap values in the BI ML and NJ meth-ods supporting the validity of the above classificationPhylogenetic tree analyses revealed the relationshipsamong the tCWCH2 gene families and revealed a novelrelationship among the metazoan Zic Gli and Glis genefamilies These three gene families were groupedtogether whereas Gli and Glis formed a sub-group inthe branch of the ZicGliGlis group It is known thatvertebrate Zic Gli and Glis are composed of six threeand three paralogs respectively Interestingly Glis2 andGlis3 were separately grouped with their insect homo-logs (DmGlis2 and DmGlis3 respectively) and sea ane-mone Nematostella homologs (NveGlis2 and NveGlis3respectively Additional files 12 13 and 14) indicatingthat the two paralogs existed in the common ancestorsof cnidarians and bilaterians On the other hand verte-brate Zic and Gli paralogs were not grouped with theinvertebrate homologs (Additional files 12 13 and 14)suggesting that vertebrate Zic and Gli paralogs weregenerated in vertebrate ancestorsFungl was determined to have sequence similarity with

the ZicGliGlis group The Fungl group branch wasalways located closest to the ZicGliGlis group in theBI ML and NJ trees but statistical support was weak(BI 93) The other gene groups with more than twoclasses of tCWCH2 motif were not strongly supported(BI lt70) However Zap1ZafA class and Aebp2 classproteins were grouped together by the BI and ML meth-ods and the PacC family was placed in the basal root ofthe BI and ML trees beside the other tCWCH2+ZF1sequences from metazoa and fungi

Distribution of the tCWCH2 sequences in thephylogenetic treeBased on the above analysis we investigated the phylo-geny of the tCWCH2 motif (Figure 3) In global termstCWCH2 sequences were detected in all of the organ-isms we examined in the Opisthokonta supergroup

Table 1 Conservation of the tryptophan residues for thetCWCH2 motif in each gene family

Gene n

ZicGliGlis 282 993

Arid2Rsc9 61 967

PacC 48 1000

MizfZF12 40 925

MizfZF34 40 1000

Aebp2 39 1000

Zap1ZafAZF12 10 1000

Zap1ZafAZF34 33 970

Fungl 32 1000

Zfp106 16 875

Twincl 13 1000

Clr1 4 1000

Fungus Gli like 4ZF 2 1000

Dictyostelium gene 3 1000

Monosiga gene 1 1000

total 624 982aThe percentages indicate conservation of the tryptophan residues in eachgene class In Mizf and Zap1ZafA there are two independent tCWCH2 motifsfor which ZF12 (ZF1-ZF2) indicates the N-terminally located one and ZF34(ZF3-ZF4) indicates the C-terminally located one

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 6 of 18

including metazoa fungi choanoflagellate (Monosigabrevicollis) and microsporidia (Encephalitozoon cuniculi)In the Amoebozoa supergroup we found two tCWCH2-containing genes in the social amoeba Dictyostelium dis-coideum but not in Entamoeba histolytica Apart fromOpisthokonta and Amoebozoa two sequences werefound in the Plantae databases but their meaning is notclear Thus it appears that the distribution of thetCWCH2 sequence motif is essentially limited to theUniconta (a collective term for the Opisthokonta andAmoebozoa supergroups) in current sequence databasesWe examined the distribution of each tCWCH2 motif

class in the phylogenetic tree (Figure 4) Arid2Rsc9 classgenes were most widely detected in both fungi and meta-zoa On the other hand tCWCH2 was not detected in thesequences from the order Saccharomycetales (NP_013579Saccharomyces cerevisiae XP_718578 Candida albicansXP_506030 Yarrowia lipolytica) ZicGliGlis Aebp2 andMizf were widely distributed in metazoan groups butZfp106 was detected only in Bilateralia except EcdysozoaPacC Zap1ZafA and Fungl class genes were widely dis-tributed among the fungi groups Distribution of Twinclclass genes was restricted to Aspergillus and its close rela-tives Clr1 class sequences were found only in Schizosac-charomyces and Cryptococcus Fungl-4ZF class genes wererecovered only from the basal fungi (Rhizopus oryzae andBatrachochytrium dendrobatidis)

Additional sequence features in the tCWCH2 motifsWe generated tCWCH2 consensus sequences for eachgene class as well as for all classes to identify any addi-tional sequence features The Prosite databasePDOC00028 alignment was used for the reference con-sensus sequence of general C2H2 The extent of conser-vation is indicated by the size of letters and theconsensus sequences are aligned graphically (Figure 5)Classical C2H2 ZFs are known to have a consensussequence of ldquo(FY)xCx2CxFx7Lx2Hx4Hrdquo [1] ThetCWCH2 motifs also contain conserved phenylalanineand leucine residues between the cysteine and histidineresidues Phenylalanine and leucine are hydrophobicresidues that generally mediate a hydrophobic interac-tion within a single ZF (data not shown) These dataindicate that the general structure of a single ZF is wellconserved in the tCWCH2 motif (Figure 6A) and thatthere are some tCWCH2-specific structures apart fromthe conserved tryptophan residuesFirstly a position-specific bias to hydrophobic residues

was found in the C-terminal region adjacent to the firsthistidine residue in ZF1 and ZF2 (j1 j2 in Figure 56A) Valine leucine and isoleucine were frequentlyfound at j1 position (341 220 and 353 respec-tively) and valine leucine isoleucine and methioninewere frequently found at j2 position (143 143391 and 242 respectively) in contrast to those in a

Figure 3 Phylogenic tree of tCWCH2 (A) Tree indicating similarities among the tCWCH2 gene classes Statistical analysis was performed usingthree ZFs (tCWCH2 and a ZF in the C-terminal flanking region) (B) Sub-tree of ZicGliGlis class gene The alignment for this analysis is shown inAdditional file 11 BI NL and NJ analyses were carried out to construct the molecular phylogenetic trees (Additional files 12 13 and 14) The treepattern is based on the BI tree Scores for each branch indicate the statistical support values obtained in each phylogenetic tree constructionmethod BI (postprobability)ML (bootstrap value)NJ (bootstrap value) The absence of scores (-) indicates the branches that were notsupported by the corresponding methods

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 7 of 18

compilation of general C2H2 sequences (PDOC00028)(valine 42 leucine 95 isoleucine 53 methionine78 Figure 6B) The difference in the summed fre-quency of each position was statistically significant (j1P lt 1 times 10-100 j2 P lt 1 times 10-100 in a c2 test) Thesepositions were close to the tryptophan residues on theopposite side of the ZF as determined in the superim-posed 3D structures of Zic Gli and Zap1 tCWCH2s(Figure 6A) The biased residue frequency may reflectthe spatial restriction of amino acid choices Since theCWCH2 structure suggests the presence of four hydro-phobic residues in a compact space the residue pairedwith each tryptophan could not be a residue with alarge side chain such as phenylalanine or tryptophanSecondly the linker sequence of the two ZFs

in tCWCH2 was significantly longer (118 plusmn 44 aa aver-age plusmn standard deviation) than those of generalC2H2 sequences (Prosite PDOC00028 81 plusmn 50 aa)(P lt 1 times 10-100 Mann-Whitney U-test Figure 7) Therewere no canonical TGE(KR)P-like sequences in thetCWCH2 linker sequences whereas the TGE(KR)P lin-ker sequence was found in about 65 of the sequences inthe PDOC00028 alignment Accordingly conservation of

the tCWCH2 linker sequences was generally low (Fig-ure 5) and their lengths were more strongly divergentthan those of TGE(KR)P ZFs (P = 3 times 10-7 F-test) Elon-gation of the linker was commonly found in the Zap1ZafA (ZF1-2 162 plusmn 57 aa) Mizf (ZF1-2 192 plusmn 44 aaZF3-4 175 plusmn 23 aa) and Arid2Rsc9 (140 plusmn 69 aa)classesThirdly extra sequences were inserted between the

tryptophan and cysteine residues in tCWCH2 This typeof insertion was limited to the sequences from the firstZF of ZicGliGlis Arid2Rsc9 and Zap1ZafA and thesecond ZF of Arid2Rsc9 classes The number of aminoacid residues between the two cysteines of tCWCH2varied from four to 34 and was mostly four residues inother classes [15] This insertion forms an extra loopingstructure in ZIC3 (PDBID 2RPC Figure 1 6A) [13]

DiscussionThe tCWCH2 motif is a hallmark of inter-zinc fingerinteractionsThe tCWCH2 motif was originally proposed as an evo-lutionarily conserved sequence involved in the interac-tion between ZFs in human ZIC3 [1315] The presence

Figure 4 Distribution of tCWCH2-containing genes in a eukaryotic phylogenetic tree Tree pattern (gray curved lines) is based on [78-80]The distribution of tCWCH2 sequences is indicated by the black curved line The distribution of each tCWCH2 gene class is indicated by acolored area The Monosiga tCWCH2 sequence may be derived from the common ancestor of Arid2Rsc9 (See Results)

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 8 of 18

Figure 5 Sequence conservation among the classes of tCWCH2 sequence motifs Amino acid sequences were aligned and consensussequences were generated (see Methods) Mizf ZF1-2 Mizf ZF3-4 and Zap1ZafA ZF1-2 and ZF3-4 are indicated separately The PDOC00028C2H2 consensus is shown at the bottom as a general C2H2 consensus sequence (indicated by the tandem repeat of the consensus sequence)The short insertion sequences listed below were initially located at the sites indicated by the colored arrowheads in the alignments describedabove but were separated to allow more comprehensive analyses These sequences represent the longer linker sequences and the insertion ofextra sequences described in the Results section To achieve this analysis we omitted the four sequences AAWT01013938 (Schmidteamediterranea Aebp2) CAG05504 (Tetraodon nigroviridis Gli) XP_001602003 (Nasonia vitripennis Gli) and XP_785526 (Strongylocentrotus purpuratusGli) because these sequences contain exceptionally divergent sequences that disrupt the alignments

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 9 of 18

of the tCWCH2 motif in the Gli family Glis familyZap1 and PacC has been previously reported [13]Superimposition analysis revealed a tightly conservedstructure among ZIC3 GLI and Zap1 supporting theimportance of the tCWCH2 motif in structural termsStrong conservation of the tCWCH2 tryptophan resi-dues in each gene class suggests that there has beenstrong selective pressure for them during evolutionOur analysis revealed additional structural features of

tCWCH2 besides the conserved tryptophans which maybe explained by adaptation of its role in mediating theinter-ZF interaction (Figure 8) Firstly there is a prefer-ence for valine leucine or isoleucine in the j1 and j2position This can be regarded as an adaptive adjust-ment for forming hydrophobic cores of the appropriatesize Furthermore hydrophobic interaction between j1and j2 is likely because they are closely located within4Aring distance in the 3D structure models of ZIC3 GLIand Zap1 (Additional file 1 data not shown) This alsocan influence the amino acid preferences in j1 and j2position Secondly elongation of the linker sequencesmay also reflect a structural adaptation to allow themotif to face the two globular C2H2 units Linker lengthand sequence are important for the function of theDNA-binding C2H2 ZF because they determine the dis-tance of each ZF and the linker contacts to the DNAbase [60-62] However optimization of the canonical

linker length for DNA-binding may not be enough toyield opposed positioning of ZF units The bendingbetween the two ZFs may be larger than is dictated bythe DNA binding state of the ZFs as revealed in theGLI-DNA complex structure [7] Longer linker sizeseems to be a general feature of tCWCH2 but conserva-tion of the linker sequences within each class variesnotably Linker conservation in PacC is particularlyhigh despite the widespread distribution of this gene infungi thus suggesting that the linker in PacC is underevolutionary constraintAnother structural feature the addition of extra

sequences is plausible if we assume the reduction ofstructural constraint as a consequence of specializationin the inter-ZF interaction For example the conservationof tCWCH2-forming ZF1 is lower than that of ZF2-5 inZic proteins [15] In the case of the GLI-DNA complexthe tCWCH2-forming ZF1 does not directly participatein DNA binding [12] and the sequence conservation ofGli family ZF1 is also lower than in ZF2-5 (Figure 5unpublished observation) The extra sequences in theZicGliGlis Zap1ZafA (ZF3-4) and AridRsc9 classesshow poor conservation (Figure 5)It is unclear if tCWCH2 is the only structure required

for inter-ZF interactions The C2H2 ZFD is capable ofachieving protein-to-protein interactions in many cases[6] Interestingly in a previous study to design synthetic

Figure 6 j1 and j2 positions of tCWCH2 (A) Stereo view of the tCWCH2 sequence motif A wire model of the main chains is shown withZIC3 GLI1 and Zap1 in green and TFIIIA and Zif268 in red The side chains of tryptophan residues in tCWCH2 and those in the j1 j2 residuesare indicated by cylinders ZIC3 GLI1 and Zap1 are gray cylinders and the others are red cylinders The corresponding positions (j1 j2) areindicated by red boxes in Figure 5 (B) Frequency of amino acid appearance in j1 j2 and those of general C2H2 (PDOC00028) is shown Valineleucine and isoleucine appear frequently at both positions and methionine frequently appears in j2 This table summarizes all of the tCWCH2sequences Highlighted numbers indicate the preferential residues for j1 and j2 in tCWCH2

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 10 of 18

Figure 7 Comparison of linker length between the tCWCH2 sequence motif and general C2H2 ZF The bars indicate the mean lengths ofthe linker sequence between C2H2 motifs in the indicated groups We defined the linker length as the number of amino acids between the lasthistidine of a ZF and the first cysteine of the C-terminally flanking ZF The PDOC00028 (general C2H2) alignment contained 12326 sequencesand we found 10025 linkers in it The general C2H2 sequences was further divided into two groups based on the presence or absence of ldquoTGE(KR)Prdquo The sequence numbers of each group tCWCH2 614 general C2H2 10025 ldquoTGE(KP)rdquo 4640 ldquonon-TGE(KR)Prdquo 5385) Error bar standarddeviation P lt 1 times 10-100 in Mann-Whitney U-test

Figure 8 Evolution of the tCWCH2 domain We propose that the tCWCH2 patterns were generated from classical C2H2 sequencesconcurrently with the acquisition of the additional sequence features

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 11 of 18

ZFs utilizing in vitro-evolving protein-to-protein interac-tions [16] it was found that the hydrophobic residuesin zif268 that interacted between the artificial proteinsand the zif268 C2H2 ZF were similar to those involvedin the GLI inter-ZF interaction Superimpositionrevealed that the position corresponding to the trypto-phan in tCWCH2 is occupied by a valine suggestingthat either intramolecular or intermolecular protein-to-protein interactions mediated by hydrophobic residuesare generally utilized for C2H2 ZFs However the strongconservation of the tCWCH2 and accompanying struc-tural features suggest that tCWCH2 is a highly specia-lized C2H2 ZF for inter-ZF interactions between twoadjacent ZFs

Evolution of the tCWCH2 motifWe hypothesize that the original tCWCH2 motifs weregenerated from classical C2H2 ZFs (Figure 8) that occurwidely in eukaryotic organisms including Plantae Exca-vata Chromalveolata and Rhizaria tCWCH2 motifswere found in the vicinity of C2H2 ZF in severaltCWCH2 gene classes It is proposed that the tCWCH2sequence appeared in the common ancestor of Opistho-konta or Unikonta (OpisthokontaAmoebozoa) Theacquisition of the tryptophans and additional structuralfeatures of tCWCH2 may have occurred concurrently inthe course of evolution Whether the evolution oftCWCH2 occurred once or multiple times duringOpisthokontaAmoebozoa evolution remains unclearHowever its short sequence and some isolated distribu-tion in the plant supergroup and metazoan gene familiesmay favor the multiple-origins hypothesis Although thelineage relationship among the tCWCH2 classes remainsunclear we can presume the presence of the followingtCWCH2 ancestral genes an Arid2Rsc9 commonancestor which may have existed before the diversifica-tion of metazoa and fungi a ZicGliGlis commonancestor an Aebp2 ancestor and a Mizf ancestor in theearly metazoa a Fungl ancestor a PacC ancestor aFungl-4ZF ancestor and a Zap1ZafA ancestor in theearly fungi a Zfp106 ancestor in early bilaterians aTwincl ancestor in the founder of Aspergillus and itsclose siblings and a Clr1 ancestor in the founder ofDikarya althogh most Dikarya may have lost the geneOnce the tCWCH2 structures were established theymay have been more strongly conserved in these classes

Function and role of tCWCH2There are several cases that directly indicate the func-tional importance of the tCWCH2 motif in living organ-isms In the case of PacC mutation of the tryptophan oftCWCH2 induces a loss of its DNA binding activity[3132] In case of human ZIC3 the tryptophan muta-tion is proposed to be associated with the occurrence of

a familial congenital heart defect and biochemical analy-sis revealed that transcriptional activity and protein sta-bility are decreased in the mutant protein [1314] Inyeast Zap1 interaction between the two ZF where thetCWCH2 motif is located is required for the high-affi-nity binding of the zinc ion [45] Also it has beenshown that the inter-ZF interaction is influenced by theconcentration of zinc ion [46] suggesting that thetCWCH2 motif is involved in zinc-sensing in yeastApart from the instances referred to above the func-

tional significance of the tCWCH2 domain remainsunknown We propose that tCWCH2 may play rolesother than the direct DNA binding that is often exertedby classical C2H2 because the linker lengths oftCWCH2 are more highly divergent than those of thecanonical C2H2 ZFs in which linker length is a criticalparameter for DNA-C2H2 ZF interactions [60-63]Furthermore the structure of GLI ZIC3 and Zap1tCWCH2 domains in which the two adjacent ZFs form asingle globular structure [71347] is clearly differentfrom the structure of the DNA-binding C2H2 ZF domainwhere each globular ZF unit wraps around the majorgroove of the DNA [7-1164] (PDBID GLI 2GLI YY11UBD WT1 2RPT Zif268 1MEY 1P47 TFIIIA 1TF3)In the case of GLI tCWCH2 which is the only tCWCH2with a known DNA-binding 3D structure the N-terminalZF of GLI1 tCWCH2 does not contact DNA [7]Assuming that the tCWCH2 domain has a function

other than DNA binding what roles could tCWCH2play Firstly our domain structure analysis implies thattCWCH2 likely regulates its neighboring domains suchas canonical C2H2 ZF WD40 and LIL ZicGliGlisPacC Mizf Aebp2 Zap1ZafA Fungl Clr1 and Fungl-4ZF possess canonical ZF(s) in the C-terminal flankingregion of the tCWCH2 In the case of PacC analyses ofCWCH2 motif-containing tryptophan mutants suggeststhat DNA binding mediated by ZF2 and ZF3 may beinfluenced by tCWCH2 motifs in ZF1 and ZF2 [32] Itis possible that tCWCH2 motifs in the other membersof this group act similarly in the regulation of neighbor-ing canonical C2H2 domains In transcription factorsDNA binding domains such as the canonical C2H2 ZFcan define a target sequence and fix the protein position(angle distance direction) with respect to the DNAThe tCWCH2 motif could provide a lsquohingersquo or modula-tory junction that controls the relative positioning ofother functional domains Secondly tCWCH2 motifsmay provide a capping structure that isolates the struc-tural influence of N-terminal flanking region In thecase of ZIC3 mutations in its tCWCH2 motif increasethe random coil contents more than the zinc-free state[13] A unique terminal structure in a domain composedof repeating structural motifs is also seen in other con-served domains [65-67] In the case of small leucine-rich

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 12 of 18

repeats and ankyrin repeats N-capping structures areresponsible for protein stability [6667] ThirdlytCWCH2 binding to other molecules is possible and isobserved for many conserved domains Transcriptionalregulator and chromatin remodeling proteins form com-plexes with many different proteins in order to be func-tionally active [6869] Therefore the tCWCH2 ZF maynot be associated with DNA binding but may serve inprotein recognition similar to many C2H2 ZFs [5]The distribution and conservation of the tCWCH2

sequence motif in various gene classes suggests that thismotif has important biological roles The genes we iden-tified containing one or more tCWCH2 motifs areknown to be involved in various biological processesincluding chromatin remodeling (Arid2Rsc9) zinchomeostasis (Zap1ZafA) pH sensing (PacC) cell cycleregulation and transcriptional regulation (ZicGliGlisPacC Mizf Aebp2 Zap1ZafA) In addition given thatseveral gene classes remain to be functionally character-ized (including Fungl Zfp106 Twincl Clr1 Fungl-4ZFMonosiga gene DdHp) the known functional repertoireof the tCWCH2 motif is likely to increase We proposethat the tCWCH2 sequence motif is a widespread andfunctional protein structural motif and that furtherstructural and functional analyses are required to fullyunderstand its roles in biological processes

Conclusions(1) The 3D structure of tCWCH2 is highly conservedamong human ZIC3 human GLI1 and yeast Zap1(2) Current databases contained 587 tCWCH2

sequences that can be classified into 11 major geneclasses (ZicGliGlis Arid2Rsc9 PacC Mizf Aebp2Zap1ZafA Fungl Zfp106 Twincl Clr1 and Fungl-4ZF)and other minority classes The tCWCH2 motifs werefound in transcription regulatory factors chromatinremodeling factors and pHzinc homeostasis regulatingfactors(3) The tCWCH2 motif is mostly found in Opistho-

konta (metazoa fungi and choanoflagellates) andAmoebozoa (amoeba Dictyostelium discoideum)(4) The tCWCH2 motif contains three additional struc-

tural features (i) bias to hydrophobic residues in the j1and j2 positions (ii) longer linkers between the two C2H2motifs and (iii) insertion of extra sequences The struc-tural features suggest that the tCWCH2 motif is a specia-lized motif involved in inter-zinc finger interactions

MethodsProtein structural analysisProtein structures were obtained from the NCBI(National Center for Biotechnology Information) databaseas PDB format files and we used iMol httpwwwpirxcomiMol and Swiss-PdbViewer httpspdbvvital-itch

as the PDB viewer applications The latter was also usedfor superimposing multiple protein structures Magic fitoptions were selected for the superimposition of structuredata because the number of amino acid residues was dif-ferent between proteins Zif268 (PDBID 1P47) andTFIIIA (PDBID 2J7J) were used for the reference struc-ture of the classical C2H2 ZF For the distance analysis ofthe neighboring amino acid residues we used ldquoNeighborsof Selected Residue Side chainrdquo option of the Swiss-PdbViewer The software showed the list of the neighbor-ing residues with distances We dealt with the residues ifthe same residues were detected as the neighbors inmore than 10 models of ZIC3 and Zap1 PDB files thatinclude 20 models respectively

Database searchWe conducted web-based BLAST searches using PatternHit Initiated (PHI)-BLAST of NCBI databases httpblastncbinlmnihgovBlastcgi For the sequence pattern of thePHI-BLAST searches we used ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo The initialsequence pattern was deduced from the compilation ofthe Zic family sequences [1570] and subsequently theintervening sequence lengths were modified to the abovepattern until no additional sequences appeared in thesearch The seed sequences obtained by PHI-BLASTsearch were first classified into gene families based ontheir annotation and on phylogenetic tree analyses(see below) Then a TBLASTN search was performedagainst (1) all of the following databases GenBank+RefSeqNucleotides+EMBL+DDBJ+PDB excluding whole-genomeshotgun sequences and expressed sequence tags (nr optionin the NCBI TBLASTN search) and (2) whole genomeshotgun sequences of the organisms shown in Table 2The whole genome sequence TBLASTN search was per-formed using representative sequences for each genefamily (Table 3) initially on a eukaryote-wide basis andthen on a metazoa-and-fungus-wide basis For the meta-zoa-and-fungus-wide search sequences closely related tothe selected genes were searched in the following gene-species combinations Arid2Rsc9 PacC Zap1ZafAFungl Twincl Clr1 and Fungl-4ZF were searched infungi and ZicGliGlis Arid2Rsc9 Mizf Aebp2 andZfp106 were searched in metazoa The target organismsare listed in Table 2 These organisms were selected onthe basis of the following criteria (1) high genomesequence coverage (gt5times) (2) member of a major Eukar-yote taxon or considered to be in a phylogeneticallyimportant position The collected sequences were manu-ally checked for the presence of the sequence of thetCWCH2 motif In addition we performed proteinBLAST searches using representative sequences for 16gene families (Aebp2 Arid2 Clr1 Csr1 Fungl Fungl-4ZFGli Glis Mizf PacC Rsc9 Twincl ZafA Zap1 Zfp106

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 13 of 18

Table 2 Eukaryotic organisms used for the TBLASTN search

Organism Supergroup Lower taxon

Entamoeba W Amoebozoa Entamoebida

Dictyostelium W Amoebozoa Mycetozoa

Paramecium W Chromalveolata Ciliata

Tetrahymena W Chromalveolata Ciliata

Guillardia W Chromalveolata Cryptophyta

Cryptosporidium W Chromalveolata Dinoflagella

Giardia W Excavata Diplomonadia

Euglena gracilis W Excavata Euglenozoa

Trichomonas vaginalis W Excavata Parabasalidea

Monosiga W Opisthokonta Choanoflagellata

Batrachochytrium dendrobatidis MF Opisthokonta Fungi Chytridiomycota

Aspergillus W Opisthokonta Fungi Dikarya Ascomycota

Aspergillus fumigatus MF Opisthokonta Fungi Dikarya Ascomycota

Candida W Opisthokonta Fungi Dikarya Ascomycota

Candida albicans MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans W Opisthokonta Fungi Dikarya Ascomycota

Neurospora W Opisthokonta Fungi Dikarya Ascomycota

Neurospora crassa MF Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces cerevisiae MF Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces pombe MF Opisthokonta Fungi Dikarya Ascomycota

Yarrowia lipolytica MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans MF Opisthokonta Fungi Dikarya Basidiomycota

Ustilago maydis MF Opisthokonta Fungi Dikarya Basidiomycota

Antonospora locustae MF Opisthokonta Fungi Microsporidia

Encephalitozoon W Opisthokonta Fungi Microsporidia

Encephalitozoon cuniculi MF Opisthokonta Fungi Microsporidia

Rhizopus oryzae MF Opisthokonta Fungi Mucoromycotina

Monosiga brevicollis MF Opisthokonta Metazoa Choanoflagellida

Homo W Opisthokonta Metazoa Chordata

Nematostella W Opisthokonta Metazoa Cnidaria

Nematostella vectensis MF Opisthokonta Metazoa Cnidaria Anthozoa

Platynereis dumerilii MF Opisthokonta Metazoa Cnidaria Anthozoa

Hydra magnipapillata MF Opisthokonta Metazoa Cnidaria Hydrozoa

Hydra vulgaris MF Opisthokonta Metazoa Cnidaria Hydrozoa

Lottia gigantea MF Opisthokonta Metazoa Mollusca Gastropoda

Trichoplax W Opisthokonta Metazoa Placozoa

Trichoplax adhaerens MF Opisthokonta Metazoa Placozoa Trichoplax

Schmidtea mediterranea MF Opisthokonta Metazoa Platyhelminthes

Chlamydomonas W Plantae Chlorophyta

Arabidopsis W Plantae Tracheophyta

Oryza sativa W Plantae Tracheophyta

Bigelowiella W Rhizaria Cercozoa

Species name and taxa are indicated W = Eukaryote-wide search MF = metazoa-and-fungus-wide search

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 14 of 18

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 7: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

including metazoa fungi choanoflagellate (Monosigabrevicollis) and microsporidia (Encephalitozoon cuniculi)In the Amoebozoa supergroup we found two tCWCH2-containing genes in the social amoeba Dictyostelium dis-coideum but not in Entamoeba histolytica Apart fromOpisthokonta and Amoebozoa two sequences werefound in the Plantae databases but their meaning is notclear Thus it appears that the distribution of thetCWCH2 sequence motif is essentially limited to theUniconta (a collective term for the Opisthokonta andAmoebozoa supergroups) in current sequence databasesWe examined the distribution of each tCWCH2 motif

class in the phylogenetic tree (Figure 4) Arid2Rsc9 classgenes were most widely detected in both fungi and meta-zoa On the other hand tCWCH2 was not detected in thesequences from the order Saccharomycetales (NP_013579Saccharomyces cerevisiae XP_718578 Candida albicansXP_506030 Yarrowia lipolytica) ZicGliGlis Aebp2 andMizf were widely distributed in metazoan groups butZfp106 was detected only in Bilateralia except EcdysozoaPacC Zap1ZafA and Fungl class genes were widely dis-tributed among the fungi groups Distribution of Twinclclass genes was restricted to Aspergillus and its close rela-tives Clr1 class sequences were found only in Schizosac-charomyces and Cryptococcus Fungl-4ZF class genes wererecovered only from the basal fungi (Rhizopus oryzae andBatrachochytrium dendrobatidis)

Additional sequence features in the tCWCH2 motifsWe generated tCWCH2 consensus sequences for eachgene class as well as for all classes to identify any addi-tional sequence features The Prosite databasePDOC00028 alignment was used for the reference con-sensus sequence of general C2H2 The extent of conser-vation is indicated by the size of letters and theconsensus sequences are aligned graphically (Figure 5)Classical C2H2 ZFs are known to have a consensussequence of ldquo(FY)xCx2CxFx7Lx2Hx4Hrdquo [1] ThetCWCH2 motifs also contain conserved phenylalanineand leucine residues between the cysteine and histidineresidues Phenylalanine and leucine are hydrophobicresidues that generally mediate a hydrophobic interac-tion within a single ZF (data not shown) These dataindicate that the general structure of a single ZF is wellconserved in the tCWCH2 motif (Figure 6A) and thatthere are some tCWCH2-specific structures apart fromthe conserved tryptophan residuesFirstly a position-specific bias to hydrophobic residues

was found in the C-terminal region adjacent to the firsthistidine residue in ZF1 and ZF2 (j1 j2 in Figure 56A) Valine leucine and isoleucine were frequentlyfound at j1 position (341 220 and 353 respec-tively) and valine leucine isoleucine and methioninewere frequently found at j2 position (143 143391 and 242 respectively) in contrast to those in a

Figure 3 Phylogenic tree of tCWCH2 (A) Tree indicating similarities among the tCWCH2 gene classes Statistical analysis was performed usingthree ZFs (tCWCH2 and a ZF in the C-terminal flanking region) (B) Sub-tree of ZicGliGlis class gene The alignment for this analysis is shown inAdditional file 11 BI NL and NJ analyses were carried out to construct the molecular phylogenetic trees (Additional files 12 13 and 14) The treepattern is based on the BI tree Scores for each branch indicate the statistical support values obtained in each phylogenetic tree constructionmethod BI (postprobability)ML (bootstrap value)NJ (bootstrap value) The absence of scores (-) indicates the branches that were notsupported by the corresponding methods

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 7 of 18

compilation of general C2H2 sequences (PDOC00028)(valine 42 leucine 95 isoleucine 53 methionine78 Figure 6B) The difference in the summed fre-quency of each position was statistically significant (j1P lt 1 times 10-100 j2 P lt 1 times 10-100 in a c2 test) Thesepositions were close to the tryptophan residues on theopposite side of the ZF as determined in the superim-posed 3D structures of Zic Gli and Zap1 tCWCH2s(Figure 6A) The biased residue frequency may reflectthe spatial restriction of amino acid choices Since theCWCH2 structure suggests the presence of four hydro-phobic residues in a compact space the residue pairedwith each tryptophan could not be a residue with alarge side chain such as phenylalanine or tryptophanSecondly the linker sequence of the two ZFs

in tCWCH2 was significantly longer (118 plusmn 44 aa aver-age plusmn standard deviation) than those of generalC2H2 sequences (Prosite PDOC00028 81 plusmn 50 aa)(P lt 1 times 10-100 Mann-Whitney U-test Figure 7) Therewere no canonical TGE(KR)P-like sequences in thetCWCH2 linker sequences whereas the TGE(KR)P lin-ker sequence was found in about 65 of the sequences inthe PDOC00028 alignment Accordingly conservation of

the tCWCH2 linker sequences was generally low (Fig-ure 5) and their lengths were more strongly divergentthan those of TGE(KR)P ZFs (P = 3 times 10-7 F-test) Elon-gation of the linker was commonly found in the Zap1ZafA (ZF1-2 162 plusmn 57 aa) Mizf (ZF1-2 192 plusmn 44 aaZF3-4 175 plusmn 23 aa) and Arid2Rsc9 (140 plusmn 69 aa)classesThirdly extra sequences were inserted between the

tryptophan and cysteine residues in tCWCH2 This typeof insertion was limited to the sequences from the firstZF of ZicGliGlis Arid2Rsc9 and Zap1ZafA and thesecond ZF of Arid2Rsc9 classes The number of aminoacid residues between the two cysteines of tCWCH2varied from four to 34 and was mostly four residues inother classes [15] This insertion forms an extra loopingstructure in ZIC3 (PDBID 2RPC Figure 1 6A) [13]

DiscussionThe tCWCH2 motif is a hallmark of inter-zinc fingerinteractionsThe tCWCH2 motif was originally proposed as an evo-lutionarily conserved sequence involved in the interac-tion between ZFs in human ZIC3 [1315] The presence

Figure 4 Distribution of tCWCH2-containing genes in a eukaryotic phylogenetic tree Tree pattern (gray curved lines) is based on [78-80]The distribution of tCWCH2 sequences is indicated by the black curved line The distribution of each tCWCH2 gene class is indicated by acolored area The Monosiga tCWCH2 sequence may be derived from the common ancestor of Arid2Rsc9 (See Results)

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 8 of 18

Figure 5 Sequence conservation among the classes of tCWCH2 sequence motifs Amino acid sequences were aligned and consensussequences were generated (see Methods) Mizf ZF1-2 Mizf ZF3-4 and Zap1ZafA ZF1-2 and ZF3-4 are indicated separately The PDOC00028C2H2 consensus is shown at the bottom as a general C2H2 consensus sequence (indicated by the tandem repeat of the consensus sequence)The short insertion sequences listed below were initially located at the sites indicated by the colored arrowheads in the alignments describedabove but were separated to allow more comprehensive analyses These sequences represent the longer linker sequences and the insertion ofextra sequences described in the Results section To achieve this analysis we omitted the four sequences AAWT01013938 (Schmidteamediterranea Aebp2) CAG05504 (Tetraodon nigroviridis Gli) XP_001602003 (Nasonia vitripennis Gli) and XP_785526 (Strongylocentrotus purpuratusGli) because these sequences contain exceptionally divergent sequences that disrupt the alignments

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 9 of 18

of the tCWCH2 motif in the Gli family Glis familyZap1 and PacC has been previously reported [13]Superimposition analysis revealed a tightly conservedstructure among ZIC3 GLI and Zap1 supporting theimportance of the tCWCH2 motif in structural termsStrong conservation of the tCWCH2 tryptophan resi-dues in each gene class suggests that there has beenstrong selective pressure for them during evolutionOur analysis revealed additional structural features of

tCWCH2 besides the conserved tryptophans which maybe explained by adaptation of its role in mediating theinter-ZF interaction (Figure 8) Firstly there is a prefer-ence for valine leucine or isoleucine in the j1 and j2position This can be regarded as an adaptive adjust-ment for forming hydrophobic cores of the appropriatesize Furthermore hydrophobic interaction between j1and j2 is likely because they are closely located within4Aring distance in the 3D structure models of ZIC3 GLIand Zap1 (Additional file 1 data not shown) This alsocan influence the amino acid preferences in j1 and j2position Secondly elongation of the linker sequencesmay also reflect a structural adaptation to allow themotif to face the two globular C2H2 units Linker lengthand sequence are important for the function of theDNA-binding C2H2 ZF because they determine the dis-tance of each ZF and the linker contacts to the DNAbase [60-62] However optimization of the canonical

linker length for DNA-binding may not be enough toyield opposed positioning of ZF units The bendingbetween the two ZFs may be larger than is dictated bythe DNA binding state of the ZFs as revealed in theGLI-DNA complex structure [7] Longer linker sizeseems to be a general feature of tCWCH2 but conserva-tion of the linker sequences within each class variesnotably Linker conservation in PacC is particularlyhigh despite the widespread distribution of this gene infungi thus suggesting that the linker in PacC is underevolutionary constraintAnother structural feature the addition of extra

sequences is plausible if we assume the reduction ofstructural constraint as a consequence of specializationin the inter-ZF interaction For example the conservationof tCWCH2-forming ZF1 is lower than that of ZF2-5 inZic proteins [15] In the case of the GLI-DNA complexthe tCWCH2-forming ZF1 does not directly participatein DNA binding [12] and the sequence conservation ofGli family ZF1 is also lower than in ZF2-5 (Figure 5unpublished observation) The extra sequences in theZicGliGlis Zap1ZafA (ZF3-4) and AridRsc9 classesshow poor conservation (Figure 5)It is unclear if tCWCH2 is the only structure required

for inter-ZF interactions The C2H2 ZFD is capable ofachieving protein-to-protein interactions in many cases[6] Interestingly in a previous study to design synthetic

Figure 6 j1 and j2 positions of tCWCH2 (A) Stereo view of the tCWCH2 sequence motif A wire model of the main chains is shown withZIC3 GLI1 and Zap1 in green and TFIIIA and Zif268 in red The side chains of tryptophan residues in tCWCH2 and those in the j1 j2 residuesare indicated by cylinders ZIC3 GLI1 and Zap1 are gray cylinders and the others are red cylinders The corresponding positions (j1 j2) areindicated by red boxes in Figure 5 (B) Frequency of amino acid appearance in j1 j2 and those of general C2H2 (PDOC00028) is shown Valineleucine and isoleucine appear frequently at both positions and methionine frequently appears in j2 This table summarizes all of the tCWCH2sequences Highlighted numbers indicate the preferential residues for j1 and j2 in tCWCH2

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 10 of 18

Figure 7 Comparison of linker length between the tCWCH2 sequence motif and general C2H2 ZF The bars indicate the mean lengths ofthe linker sequence between C2H2 motifs in the indicated groups We defined the linker length as the number of amino acids between the lasthistidine of a ZF and the first cysteine of the C-terminally flanking ZF The PDOC00028 (general C2H2) alignment contained 12326 sequencesand we found 10025 linkers in it The general C2H2 sequences was further divided into two groups based on the presence or absence of ldquoTGE(KR)Prdquo The sequence numbers of each group tCWCH2 614 general C2H2 10025 ldquoTGE(KP)rdquo 4640 ldquonon-TGE(KR)Prdquo 5385) Error bar standarddeviation P lt 1 times 10-100 in Mann-Whitney U-test

Figure 8 Evolution of the tCWCH2 domain We propose that the tCWCH2 patterns were generated from classical C2H2 sequencesconcurrently with the acquisition of the additional sequence features

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 11 of 18

ZFs utilizing in vitro-evolving protein-to-protein interac-tions [16] it was found that the hydrophobic residuesin zif268 that interacted between the artificial proteinsand the zif268 C2H2 ZF were similar to those involvedin the GLI inter-ZF interaction Superimpositionrevealed that the position corresponding to the trypto-phan in tCWCH2 is occupied by a valine suggestingthat either intramolecular or intermolecular protein-to-protein interactions mediated by hydrophobic residuesare generally utilized for C2H2 ZFs However the strongconservation of the tCWCH2 and accompanying struc-tural features suggest that tCWCH2 is a highly specia-lized C2H2 ZF for inter-ZF interactions between twoadjacent ZFs

Evolution of the tCWCH2 motifWe hypothesize that the original tCWCH2 motifs weregenerated from classical C2H2 ZFs (Figure 8) that occurwidely in eukaryotic organisms including Plantae Exca-vata Chromalveolata and Rhizaria tCWCH2 motifswere found in the vicinity of C2H2 ZF in severaltCWCH2 gene classes It is proposed that the tCWCH2sequence appeared in the common ancestor of Opistho-konta or Unikonta (OpisthokontaAmoebozoa) Theacquisition of the tryptophans and additional structuralfeatures of tCWCH2 may have occurred concurrently inthe course of evolution Whether the evolution oftCWCH2 occurred once or multiple times duringOpisthokontaAmoebozoa evolution remains unclearHowever its short sequence and some isolated distribu-tion in the plant supergroup and metazoan gene familiesmay favor the multiple-origins hypothesis Although thelineage relationship among the tCWCH2 classes remainsunclear we can presume the presence of the followingtCWCH2 ancestral genes an Arid2Rsc9 commonancestor which may have existed before the diversifica-tion of metazoa and fungi a ZicGliGlis commonancestor an Aebp2 ancestor and a Mizf ancestor in theearly metazoa a Fungl ancestor a PacC ancestor aFungl-4ZF ancestor and a Zap1ZafA ancestor in theearly fungi a Zfp106 ancestor in early bilaterians aTwincl ancestor in the founder of Aspergillus and itsclose siblings and a Clr1 ancestor in the founder ofDikarya althogh most Dikarya may have lost the geneOnce the tCWCH2 structures were established theymay have been more strongly conserved in these classes

Function and role of tCWCH2There are several cases that directly indicate the func-tional importance of the tCWCH2 motif in living organ-isms In the case of PacC mutation of the tryptophan oftCWCH2 induces a loss of its DNA binding activity[3132] In case of human ZIC3 the tryptophan muta-tion is proposed to be associated with the occurrence of

a familial congenital heart defect and biochemical analy-sis revealed that transcriptional activity and protein sta-bility are decreased in the mutant protein [1314] Inyeast Zap1 interaction between the two ZF where thetCWCH2 motif is located is required for the high-affi-nity binding of the zinc ion [45] Also it has beenshown that the inter-ZF interaction is influenced by theconcentration of zinc ion [46] suggesting that thetCWCH2 motif is involved in zinc-sensing in yeastApart from the instances referred to above the func-

tional significance of the tCWCH2 domain remainsunknown We propose that tCWCH2 may play rolesother than the direct DNA binding that is often exertedby classical C2H2 because the linker lengths oftCWCH2 are more highly divergent than those of thecanonical C2H2 ZFs in which linker length is a criticalparameter for DNA-C2H2 ZF interactions [60-63]Furthermore the structure of GLI ZIC3 and Zap1tCWCH2 domains in which the two adjacent ZFs form asingle globular structure [71347] is clearly differentfrom the structure of the DNA-binding C2H2 ZF domainwhere each globular ZF unit wraps around the majorgroove of the DNA [7-1164] (PDBID GLI 2GLI YY11UBD WT1 2RPT Zif268 1MEY 1P47 TFIIIA 1TF3)In the case of GLI tCWCH2 which is the only tCWCH2with a known DNA-binding 3D structure the N-terminalZF of GLI1 tCWCH2 does not contact DNA [7]Assuming that the tCWCH2 domain has a function

other than DNA binding what roles could tCWCH2play Firstly our domain structure analysis implies thattCWCH2 likely regulates its neighboring domains suchas canonical C2H2 ZF WD40 and LIL ZicGliGlisPacC Mizf Aebp2 Zap1ZafA Fungl Clr1 and Fungl-4ZF possess canonical ZF(s) in the C-terminal flankingregion of the tCWCH2 In the case of PacC analyses ofCWCH2 motif-containing tryptophan mutants suggeststhat DNA binding mediated by ZF2 and ZF3 may beinfluenced by tCWCH2 motifs in ZF1 and ZF2 [32] Itis possible that tCWCH2 motifs in the other membersof this group act similarly in the regulation of neighbor-ing canonical C2H2 domains In transcription factorsDNA binding domains such as the canonical C2H2 ZFcan define a target sequence and fix the protein position(angle distance direction) with respect to the DNAThe tCWCH2 motif could provide a lsquohingersquo or modula-tory junction that controls the relative positioning ofother functional domains Secondly tCWCH2 motifsmay provide a capping structure that isolates the struc-tural influence of N-terminal flanking region In thecase of ZIC3 mutations in its tCWCH2 motif increasethe random coil contents more than the zinc-free state[13] A unique terminal structure in a domain composedof repeating structural motifs is also seen in other con-served domains [65-67] In the case of small leucine-rich

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 12 of 18

repeats and ankyrin repeats N-capping structures areresponsible for protein stability [6667] ThirdlytCWCH2 binding to other molecules is possible and isobserved for many conserved domains Transcriptionalregulator and chromatin remodeling proteins form com-plexes with many different proteins in order to be func-tionally active [6869] Therefore the tCWCH2 ZF maynot be associated with DNA binding but may serve inprotein recognition similar to many C2H2 ZFs [5]The distribution and conservation of the tCWCH2

sequence motif in various gene classes suggests that thismotif has important biological roles The genes we iden-tified containing one or more tCWCH2 motifs areknown to be involved in various biological processesincluding chromatin remodeling (Arid2Rsc9) zinchomeostasis (Zap1ZafA) pH sensing (PacC) cell cycleregulation and transcriptional regulation (ZicGliGlisPacC Mizf Aebp2 Zap1ZafA) In addition given thatseveral gene classes remain to be functionally character-ized (including Fungl Zfp106 Twincl Clr1 Fungl-4ZFMonosiga gene DdHp) the known functional repertoireof the tCWCH2 motif is likely to increase We proposethat the tCWCH2 sequence motif is a widespread andfunctional protein structural motif and that furtherstructural and functional analyses are required to fullyunderstand its roles in biological processes

Conclusions(1) The 3D structure of tCWCH2 is highly conservedamong human ZIC3 human GLI1 and yeast Zap1(2) Current databases contained 587 tCWCH2

sequences that can be classified into 11 major geneclasses (ZicGliGlis Arid2Rsc9 PacC Mizf Aebp2Zap1ZafA Fungl Zfp106 Twincl Clr1 and Fungl-4ZF)and other minority classes The tCWCH2 motifs werefound in transcription regulatory factors chromatinremodeling factors and pHzinc homeostasis regulatingfactors(3) The tCWCH2 motif is mostly found in Opistho-

konta (metazoa fungi and choanoflagellates) andAmoebozoa (amoeba Dictyostelium discoideum)(4) The tCWCH2 motif contains three additional struc-

tural features (i) bias to hydrophobic residues in the j1and j2 positions (ii) longer linkers between the two C2H2motifs and (iii) insertion of extra sequences The struc-tural features suggest that the tCWCH2 motif is a specia-lized motif involved in inter-zinc finger interactions

MethodsProtein structural analysisProtein structures were obtained from the NCBI(National Center for Biotechnology Information) databaseas PDB format files and we used iMol httpwwwpirxcomiMol and Swiss-PdbViewer httpspdbvvital-itch

as the PDB viewer applications The latter was also usedfor superimposing multiple protein structures Magic fitoptions were selected for the superimposition of structuredata because the number of amino acid residues was dif-ferent between proteins Zif268 (PDBID 1P47) andTFIIIA (PDBID 2J7J) were used for the reference struc-ture of the classical C2H2 ZF For the distance analysis ofthe neighboring amino acid residues we used ldquoNeighborsof Selected Residue Side chainrdquo option of the Swiss-PdbViewer The software showed the list of the neighbor-ing residues with distances We dealt with the residues ifthe same residues were detected as the neighbors inmore than 10 models of ZIC3 and Zap1 PDB files thatinclude 20 models respectively

Database searchWe conducted web-based BLAST searches using PatternHit Initiated (PHI)-BLAST of NCBI databases httpblastncbinlmnihgovBlastcgi For the sequence pattern of thePHI-BLAST searches we used ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo The initialsequence pattern was deduced from the compilation ofthe Zic family sequences [1570] and subsequently theintervening sequence lengths were modified to the abovepattern until no additional sequences appeared in thesearch The seed sequences obtained by PHI-BLASTsearch were first classified into gene families based ontheir annotation and on phylogenetic tree analyses(see below) Then a TBLASTN search was performedagainst (1) all of the following databases GenBank+RefSeqNucleotides+EMBL+DDBJ+PDB excluding whole-genomeshotgun sequences and expressed sequence tags (nr optionin the NCBI TBLASTN search) and (2) whole genomeshotgun sequences of the organisms shown in Table 2The whole genome sequence TBLASTN search was per-formed using representative sequences for each genefamily (Table 3) initially on a eukaryote-wide basis andthen on a metazoa-and-fungus-wide basis For the meta-zoa-and-fungus-wide search sequences closely related tothe selected genes were searched in the following gene-species combinations Arid2Rsc9 PacC Zap1ZafAFungl Twincl Clr1 and Fungl-4ZF were searched infungi and ZicGliGlis Arid2Rsc9 Mizf Aebp2 andZfp106 were searched in metazoa The target organismsare listed in Table 2 These organisms were selected onthe basis of the following criteria (1) high genomesequence coverage (gt5times) (2) member of a major Eukar-yote taxon or considered to be in a phylogeneticallyimportant position The collected sequences were manu-ally checked for the presence of the sequence of thetCWCH2 motif In addition we performed proteinBLAST searches using representative sequences for 16gene families (Aebp2 Arid2 Clr1 Csr1 Fungl Fungl-4ZFGli Glis Mizf PacC Rsc9 Twincl ZafA Zap1 Zfp106

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 13 of 18

Table 2 Eukaryotic organisms used for the TBLASTN search

Organism Supergroup Lower taxon

Entamoeba W Amoebozoa Entamoebida

Dictyostelium W Amoebozoa Mycetozoa

Paramecium W Chromalveolata Ciliata

Tetrahymena W Chromalveolata Ciliata

Guillardia W Chromalveolata Cryptophyta

Cryptosporidium W Chromalveolata Dinoflagella

Giardia W Excavata Diplomonadia

Euglena gracilis W Excavata Euglenozoa

Trichomonas vaginalis W Excavata Parabasalidea

Monosiga W Opisthokonta Choanoflagellata

Batrachochytrium dendrobatidis MF Opisthokonta Fungi Chytridiomycota

Aspergillus W Opisthokonta Fungi Dikarya Ascomycota

Aspergillus fumigatus MF Opisthokonta Fungi Dikarya Ascomycota

Candida W Opisthokonta Fungi Dikarya Ascomycota

Candida albicans MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans W Opisthokonta Fungi Dikarya Ascomycota

Neurospora W Opisthokonta Fungi Dikarya Ascomycota

Neurospora crassa MF Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces cerevisiae MF Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces pombe MF Opisthokonta Fungi Dikarya Ascomycota

Yarrowia lipolytica MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans MF Opisthokonta Fungi Dikarya Basidiomycota

Ustilago maydis MF Opisthokonta Fungi Dikarya Basidiomycota

Antonospora locustae MF Opisthokonta Fungi Microsporidia

Encephalitozoon W Opisthokonta Fungi Microsporidia

Encephalitozoon cuniculi MF Opisthokonta Fungi Microsporidia

Rhizopus oryzae MF Opisthokonta Fungi Mucoromycotina

Monosiga brevicollis MF Opisthokonta Metazoa Choanoflagellida

Homo W Opisthokonta Metazoa Chordata

Nematostella W Opisthokonta Metazoa Cnidaria

Nematostella vectensis MF Opisthokonta Metazoa Cnidaria Anthozoa

Platynereis dumerilii MF Opisthokonta Metazoa Cnidaria Anthozoa

Hydra magnipapillata MF Opisthokonta Metazoa Cnidaria Hydrozoa

Hydra vulgaris MF Opisthokonta Metazoa Cnidaria Hydrozoa

Lottia gigantea MF Opisthokonta Metazoa Mollusca Gastropoda

Trichoplax W Opisthokonta Metazoa Placozoa

Trichoplax adhaerens MF Opisthokonta Metazoa Placozoa Trichoplax

Schmidtea mediterranea MF Opisthokonta Metazoa Platyhelminthes

Chlamydomonas W Plantae Chlorophyta

Arabidopsis W Plantae Tracheophyta

Oryza sativa W Plantae Tracheophyta

Bigelowiella W Rhizaria Cercozoa

Species name and taxa are indicated W = Eukaryote-wide search MF = metazoa-and-fungus-wide search

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 14 of 18

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 8: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

compilation of general C2H2 sequences (PDOC00028)(valine 42 leucine 95 isoleucine 53 methionine78 Figure 6B) The difference in the summed fre-quency of each position was statistically significant (j1P lt 1 times 10-100 j2 P lt 1 times 10-100 in a c2 test) Thesepositions were close to the tryptophan residues on theopposite side of the ZF as determined in the superim-posed 3D structures of Zic Gli and Zap1 tCWCH2s(Figure 6A) The biased residue frequency may reflectthe spatial restriction of amino acid choices Since theCWCH2 structure suggests the presence of four hydro-phobic residues in a compact space the residue pairedwith each tryptophan could not be a residue with alarge side chain such as phenylalanine or tryptophanSecondly the linker sequence of the two ZFs

in tCWCH2 was significantly longer (118 plusmn 44 aa aver-age plusmn standard deviation) than those of generalC2H2 sequences (Prosite PDOC00028 81 plusmn 50 aa)(P lt 1 times 10-100 Mann-Whitney U-test Figure 7) Therewere no canonical TGE(KR)P-like sequences in thetCWCH2 linker sequences whereas the TGE(KR)P lin-ker sequence was found in about 65 of the sequences inthe PDOC00028 alignment Accordingly conservation of

the tCWCH2 linker sequences was generally low (Fig-ure 5) and their lengths were more strongly divergentthan those of TGE(KR)P ZFs (P = 3 times 10-7 F-test) Elon-gation of the linker was commonly found in the Zap1ZafA (ZF1-2 162 plusmn 57 aa) Mizf (ZF1-2 192 plusmn 44 aaZF3-4 175 plusmn 23 aa) and Arid2Rsc9 (140 plusmn 69 aa)classesThirdly extra sequences were inserted between the

tryptophan and cysteine residues in tCWCH2 This typeof insertion was limited to the sequences from the firstZF of ZicGliGlis Arid2Rsc9 and Zap1ZafA and thesecond ZF of Arid2Rsc9 classes The number of aminoacid residues between the two cysteines of tCWCH2varied from four to 34 and was mostly four residues inother classes [15] This insertion forms an extra loopingstructure in ZIC3 (PDBID 2RPC Figure 1 6A) [13]

DiscussionThe tCWCH2 motif is a hallmark of inter-zinc fingerinteractionsThe tCWCH2 motif was originally proposed as an evo-lutionarily conserved sequence involved in the interac-tion between ZFs in human ZIC3 [1315] The presence

Figure 4 Distribution of tCWCH2-containing genes in a eukaryotic phylogenetic tree Tree pattern (gray curved lines) is based on [78-80]The distribution of tCWCH2 sequences is indicated by the black curved line The distribution of each tCWCH2 gene class is indicated by acolored area The Monosiga tCWCH2 sequence may be derived from the common ancestor of Arid2Rsc9 (See Results)

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 8 of 18

Figure 5 Sequence conservation among the classes of tCWCH2 sequence motifs Amino acid sequences were aligned and consensussequences were generated (see Methods) Mizf ZF1-2 Mizf ZF3-4 and Zap1ZafA ZF1-2 and ZF3-4 are indicated separately The PDOC00028C2H2 consensus is shown at the bottom as a general C2H2 consensus sequence (indicated by the tandem repeat of the consensus sequence)The short insertion sequences listed below were initially located at the sites indicated by the colored arrowheads in the alignments describedabove but were separated to allow more comprehensive analyses These sequences represent the longer linker sequences and the insertion ofextra sequences described in the Results section To achieve this analysis we omitted the four sequences AAWT01013938 (Schmidteamediterranea Aebp2) CAG05504 (Tetraodon nigroviridis Gli) XP_001602003 (Nasonia vitripennis Gli) and XP_785526 (Strongylocentrotus purpuratusGli) because these sequences contain exceptionally divergent sequences that disrupt the alignments

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 9 of 18

of the tCWCH2 motif in the Gli family Glis familyZap1 and PacC has been previously reported [13]Superimposition analysis revealed a tightly conservedstructure among ZIC3 GLI and Zap1 supporting theimportance of the tCWCH2 motif in structural termsStrong conservation of the tCWCH2 tryptophan resi-dues in each gene class suggests that there has beenstrong selective pressure for them during evolutionOur analysis revealed additional structural features of

tCWCH2 besides the conserved tryptophans which maybe explained by adaptation of its role in mediating theinter-ZF interaction (Figure 8) Firstly there is a prefer-ence for valine leucine or isoleucine in the j1 and j2position This can be regarded as an adaptive adjust-ment for forming hydrophobic cores of the appropriatesize Furthermore hydrophobic interaction between j1and j2 is likely because they are closely located within4Aring distance in the 3D structure models of ZIC3 GLIand Zap1 (Additional file 1 data not shown) This alsocan influence the amino acid preferences in j1 and j2position Secondly elongation of the linker sequencesmay also reflect a structural adaptation to allow themotif to face the two globular C2H2 units Linker lengthand sequence are important for the function of theDNA-binding C2H2 ZF because they determine the dis-tance of each ZF and the linker contacts to the DNAbase [60-62] However optimization of the canonical

linker length for DNA-binding may not be enough toyield opposed positioning of ZF units The bendingbetween the two ZFs may be larger than is dictated bythe DNA binding state of the ZFs as revealed in theGLI-DNA complex structure [7] Longer linker sizeseems to be a general feature of tCWCH2 but conserva-tion of the linker sequences within each class variesnotably Linker conservation in PacC is particularlyhigh despite the widespread distribution of this gene infungi thus suggesting that the linker in PacC is underevolutionary constraintAnother structural feature the addition of extra

sequences is plausible if we assume the reduction ofstructural constraint as a consequence of specializationin the inter-ZF interaction For example the conservationof tCWCH2-forming ZF1 is lower than that of ZF2-5 inZic proteins [15] In the case of the GLI-DNA complexthe tCWCH2-forming ZF1 does not directly participatein DNA binding [12] and the sequence conservation ofGli family ZF1 is also lower than in ZF2-5 (Figure 5unpublished observation) The extra sequences in theZicGliGlis Zap1ZafA (ZF3-4) and AridRsc9 classesshow poor conservation (Figure 5)It is unclear if tCWCH2 is the only structure required

for inter-ZF interactions The C2H2 ZFD is capable ofachieving protein-to-protein interactions in many cases[6] Interestingly in a previous study to design synthetic

Figure 6 j1 and j2 positions of tCWCH2 (A) Stereo view of the tCWCH2 sequence motif A wire model of the main chains is shown withZIC3 GLI1 and Zap1 in green and TFIIIA and Zif268 in red The side chains of tryptophan residues in tCWCH2 and those in the j1 j2 residuesare indicated by cylinders ZIC3 GLI1 and Zap1 are gray cylinders and the others are red cylinders The corresponding positions (j1 j2) areindicated by red boxes in Figure 5 (B) Frequency of amino acid appearance in j1 j2 and those of general C2H2 (PDOC00028) is shown Valineleucine and isoleucine appear frequently at both positions and methionine frequently appears in j2 This table summarizes all of the tCWCH2sequences Highlighted numbers indicate the preferential residues for j1 and j2 in tCWCH2

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 10 of 18

Figure 7 Comparison of linker length between the tCWCH2 sequence motif and general C2H2 ZF The bars indicate the mean lengths ofthe linker sequence between C2H2 motifs in the indicated groups We defined the linker length as the number of amino acids between the lasthistidine of a ZF and the first cysteine of the C-terminally flanking ZF The PDOC00028 (general C2H2) alignment contained 12326 sequencesand we found 10025 linkers in it The general C2H2 sequences was further divided into two groups based on the presence or absence of ldquoTGE(KR)Prdquo The sequence numbers of each group tCWCH2 614 general C2H2 10025 ldquoTGE(KP)rdquo 4640 ldquonon-TGE(KR)Prdquo 5385) Error bar standarddeviation P lt 1 times 10-100 in Mann-Whitney U-test

Figure 8 Evolution of the tCWCH2 domain We propose that the tCWCH2 patterns were generated from classical C2H2 sequencesconcurrently with the acquisition of the additional sequence features

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 11 of 18

ZFs utilizing in vitro-evolving protein-to-protein interac-tions [16] it was found that the hydrophobic residuesin zif268 that interacted between the artificial proteinsand the zif268 C2H2 ZF were similar to those involvedin the GLI inter-ZF interaction Superimpositionrevealed that the position corresponding to the trypto-phan in tCWCH2 is occupied by a valine suggestingthat either intramolecular or intermolecular protein-to-protein interactions mediated by hydrophobic residuesare generally utilized for C2H2 ZFs However the strongconservation of the tCWCH2 and accompanying struc-tural features suggest that tCWCH2 is a highly specia-lized C2H2 ZF for inter-ZF interactions between twoadjacent ZFs

Evolution of the tCWCH2 motifWe hypothesize that the original tCWCH2 motifs weregenerated from classical C2H2 ZFs (Figure 8) that occurwidely in eukaryotic organisms including Plantae Exca-vata Chromalveolata and Rhizaria tCWCH2 motifswere found in the vicinity of C2H2 ZF in severaltCWCH2 gene classes It is proposed that the tCWCH2sequence appeared in the common ancestor of Opistho-konta or Unikonta (OpisthokontaAmoebozoa) Theacquisition of the tryptophans and additional structuralfeatures of tCWCH2 may have occurred concurrently inthe course of evolution Whether the evolution oftCWCH2 occurred once or multiple times duringOpisthokontaAmoebozoa evolution remains unclearHowever its short sequence and some isolated distribu-tion in the plant supergroup and metazoan gene familiesmay favor the multiple-origins hypothesis Although thelineage relationship among the tCWCH2 classes remainsunclear we can presume the presence of the followingtCWCH2 ancestral genes an Arid2Rsc9 commonancestor which may have existed before the diversifica-tion of metazoa and fungi a ZicGliGlis commonancestor an Aebp2 ancestor and a Mizf ancestor in theearly metazoa a Fungl ancestor a PacC ancestor aFungl-4ZF ancestor and a Zap1ZafA ancestor in theearly fungi a Zfp106 ancestor in early bilaterians aTwincl ancestor in the founder of Aspergillus and itsclose siblings and a Clr1 ancestor in the founder ofDikarya althogh most Dikarya may have lost the geneOnce the tCWCH2 structures were established theymay have been more strongly conserved in these classes

Function and role of tCWCH2There are several cases that directly indicate the func-tional importance of the tCWCH2 motif in living organ-isms In the case of PacC mutation of the tryptophan oftCWCH2 induces a loss of its DNA binding activity[3132] In case of human ZIC3 the tryptophan muta-tion is proposed to be associated with the occurrence of

a familial congenital heart defect and biochemical analy-sis revealed that transcriptional activity and protein sta-bility are decreased in the mutant protein [1314] Inyeast Zap1 interaction between the two ZF where thetCWCH2 motif is located is required for the high-affi-nity binding of the zinc ion [45] Also it has beenshown that the inter-ZF interaction is influenced by theconcentration of zinc ion [46] suggesting that thetCWCH2 motif is involved in zinc-sensing in yeastApart from the instances referred to above the func-

tional significance of the tCWCH2 domain remainsunknown We propose that tCWCH2 may play rolesother than the direct DNA binding that is often exertedby classical C2H2 because the linker lengths oftCWCH2 are more highly divergent than those of thecanonical C2H2 ZFs in which linker length is a criticalparameter for DNA-C2H2 ZF interactions [60-63]Furthermore the structure of GLI ZIC3 and Zap1tCWCH2 domains in which the two adjacent ZFs form asingle globular structure [71347] is clearly differentfrom the structure of the DNA-binding C2H2 ZF domainwhere each globular ZF unit wraps around the majorgroove of the DNA [7-1164] (PDBID GLI 2GLI YY11UBD WT1 2RPT Zif268 1MEY 1P47 TFIIIA 1TF3)In the case of GLI tCWCH2 which is the only tCWCH2with a known DNA-binding 3D structure the N-terminalZF of GLI1 tCWCH2 does not contact DNA [7]Assuming that the tCWCH2 domain has a function

other than DNA binding what roles could tCWCH2play Firstly our domain structure analysis implies thattCWCH2 likely regulates its neighboring domains suchas canonical C2H2 ZF WD40 and LIL ZicGliGlisPacC Mizf Aebp2 Zap1ZafA Fungl Clr1 and Fungl-4ZF possess canonical ZF(s) in the C-terminal flankingregion of the tCWCH2 In the case of PacC analyses ofCWCH2 motif-containing tryptophan mutants suggeststhat DNA binding mediated by ZF2 and ZF3 may beinfluenced by tCWCH2 motifs in ZF1 and ZF2 [32] Itis possible that tCWCH2 motifs in the other membersof this group act similarly in the regulation of neighbor-ing canonical C2H2 domains In transcription factorsDNA binding domains such as the canonical C2H2 ZFcan define a target sequence and fix the protein position(angle distance direction) with respect to the DNAThe tCWCH2 motif could provide a lsquohingersquo or modula-tory junction that controls the relative positioning ofother functional domains Secondly tCWCH2 motifsmay provide a capping structure that isolates the struc-tural influence of N-terminal flanking region In thecase of ZIC3 mutations in its tCWCH2 motif increasethe random coil contents more than the zinc-free state[13] A unique terminal structure in a domain composedof repeating structural motifs is also seen in other con-served domains [65-67] In the case of small leucine-rich

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 12 of 18

repeats and ankyrin repeats N-capping structures areresponsible for protein stability [6667] ThirdlytCWCH2 binding to other molecules is possible and isobserved for many conserved domains Transcriptionalregulator and chromatin remodeling proteins form com-plexes with many different proteins in order to be func-tionally active [6869] Therefore the tCWCH2 ZF maynot be associated with DNA binding but may serve inprotein recognition similar to many C2H2 ZFs [5]The distribution and conservation of the tCWCH2

sequence motif in various gene classes suggests that thismotif has important biological roles The genes we iden-tified containing one or more tCWCH2 motifs areknown to be involved in various biological processesincluding chromatin remodeling (Arid2Rsc9) zinchomeostasis (Zap1ZafA) pH sensing (PacC) cell cycleregulation and transcriptional regulation (ZicGliGlisPacC Mizf Aebp2 Zap1ZafA) In addition given thatseveral gene classes remain to be functionally character-ized (including Fungl Zfp106 Twincl Clr1 Fungl-4ZFMonosiga gene DdHp) the known functional repertoireof the tCWCH2 motif is likely to increase We proposethat the tCWCH2 sequence motif is a widespread andfunctional protein structural motif and that furtherstructural and functional analyses are required to fullyunderstand its roles in biological processes

Conclusions(1) The 3D structure of tCWCH2 is highly conservedamong human ZIC3 human GLI1 and yeast Zap1(2) Current databases contained 587 tCWCH2

sequences that can be classified into 11 major geneclasses (ZicGliGlis Arid2Rsc9 PacC Mizf Aebp2Zap1ZafA Fungl Zfp106 Twincl Clr1 and Fungl-4ZF)and other minority classes The tCWCH2 motifs werefound in transcription regulatory factors chromatinremodeling factors and pHzinc homeostasis regulatingfactors(3) The tCWCH2 motif is mostly found in Opistho-

konta (metazoa fungi and choanoflagellates) andAmoebozoa (amoeba Dictyostelium discoideum)(4) The tCWCH2 motif contains three additional struc-

tural features (i) bias to hydrophobic residues in the j1and j2 positions (ii) longer linkers between the two C2H2motifs and (iii) insertion of extra sequences The struc-tural features suggest that the tCWCH2 motif is a specia-lized motif involved in inter-zinc finger interactions

MethodsProtein structural analysisProtein structures were obtained from the NCBI(National Center for Biotechnology Information) databaseas PDB format files and we used iMol httpwwwpirxcomiMol and Swiss-PdbViewer httpspdbvvital-itch

as the PDB viewer applications The latter was also usedfor superimposing multiple protein structures Magic fitoptions were selected for the superimposition of structuredata because the number of amino acid residues was dif-ferent between proteins Zif268 (PDBID 1P47) andTFIIIA (PDBID 2J7J) were used for the reference struc-ture of the classical C2H2 ZF For the distance analysis ofthe neighboring amino acid residues we used ldquoNeighborsof Selected Residue Side chainrdquo option of the Swiss-PdbViewer The software showed the list of the neighbor-ing residues with distances We dealt with the residues ifthe same residues were detected as the neighbors inmore than 10 models of ZIC3 and Zap1 PDB files thatinclude 20 models respectively

Database searchWe conducted web-based BLAST searches using PatternHit Initiated (PHI)-BLAST of NCBI databases httpblastncbinlmnihgovBlastcgi For the sequence pattern of thePHI-BLAST searches we used ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo The initialsequence pattern was deduced from the compilation ofthe Zic family sequences [1570] and subsequently theintervening sequence lengths were modified to the abovepattern until no additional sequences appeared in thesearch The seed sequences obtained by PHI-BLASTsearch were first classified into gene families based ontheir annotation and on phylogenetic tree analyses(see below) Then a TBLASTN search was performedagainst (1) all of the following databases GenBank+RefSeqNucleotides+EMBL+DDBJ+PDB excluding whole-genomeshotgun sequences and expressed sequence tags (nr optionin the NCBI TBLASTN search) and (2) whole genomeshotgun sequences of the organisms shown in Table 2The whole genome sequence TBLASTN search was per-formed using representative sequences for each genefamily (Table 3) initially on a eukaryote-wide basis andthen on a metazoa-and-fungus-wide basis For the meta-zoa-and-fungus-wide search sequences closely related tothe selected genes were searched in the following gene-species combinations Arid2Rsc9 PacC Zap1ZafAFungl Twincl Clr1 and Fungl-4ZF were searched infungi and ZicGliGlis Arid2Rsc9 Mizf Aebp2 andZfp106 were searched in metazoa The target organismsare listed in Table 2 These organisms were selected onthe basis of the following criteria (1) high genomesequence coverage (gt5times) (2) member of a major Eukar-yote taxon or considered to be in a phylogeneticallyimportant position The collected sequences were manu-ally checked for the presence of the sequence of thetCWCH2 motif In addition we performed proteinBLAST searches using representative sequences for 16gene families (Aebp2 Arid2 Clr1 Csr1 Fungl Fungl-4ZFGli Glis Mizf PacC Rsc9 Twincl ZafA Zap1 Zfp106

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 13 of 18

Table 2 Eukaryotic organisms used for the TBLASTN search

Organism Supergroup Lower taxon

Entamoeba W Amoebozoa Entamoebida

Dictyostelium W Amoebozoa Mycetozoa

Paramecium W Chromalveolata Ciliata

Tetrahymena W Chromalveolata Ciliata

Guillardia W Chromalveolata Cryptophyta

Cryptosporidium W Chromalveolata Dinoflagella

Giardia W Excavata Diplomonadia

Euglena gracilis W Excavata Euglenozoa

Trichomonas vaginalis W Excavata Parabasalidea

Monosiga W Opisthokonta Choanoflagellata

Batrachochytrium dendrobatidis MF Opisthokonta Fungi Chytridiomycota

Aspergillus W Opisthokonta Fungi Dikarya Ascomycota

Aspergillus fumigatus MF Opisthokonta Fungi Dikarya Ascomycota

Candida W Opisthokonta Fungi Dikarya Ascomycota

Candida albicans MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans W Opisthokonta Fungi Dikarya Ascomycota

Neurospora W Opisthokonta Fungi Dikarya Ascomycota

Neurospora crassa MF Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces cerevisiae MF Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces pombe MF Opisthokonta Fungi Dikarya Ascomycota

Yarrowia lipolytica MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans MF Opisthokonta Fungi Dikarya Basidiomycota

Ustilago maydis MF Opisthokonta Fungi Dikarya Basidiomycota

Antonospora locustae MF Opisthokonta Fungi Microsporidia

Encephalitozoon W Opisthokonta Fungi Microsporidia

Encephalitozoon cuniculi MF Opisthokonta Fungi Microsporidia

Rhizopus oryzae MF Opisthokonta Fungi Mucoromycotina

Monosiga brevicollis MF Opisthokonta Metazoa Choanoflagellida

Homo W Opisthokonta Metazoa Chordata

Nematostella W Opisthokonta Metazoa Cnidaria

Nematostella vectensis MF Opisthokonta Metazoa Cnidaria Anthozoa

Platynereis dumerilii MF Opisthokonta Metazoa Cnidaria Anthozoa

Hydra magnipapillata MF Opisthokonta Metazoa Cnidaria Hydrozoa

Hydra vulgaris MF Opisthokonta Metazoa Cnidaria Hydrozoa

Lottia gigantea MF Opisthokonta Metazoa Mollusca Gastropoda

Trichoplax W Opisthokonta Metazoa Placozoa

Trichoplax adhaerens MF Opisthokonta Metazoa Placozoa Trichoplax

Schmidtea mediterranea MF Opisthokonta Metazoa Platyhelminthes

Chlamydomonas W Plantae Chlorophyta

Arabidopsis W Plantae Tracheophyta

Oryza sativa W Plantae Tracheophyta

Bigelowiella W Rhizaria Cercozoa

Species name and taxa are indicated W = Eukaryote-wide search MF = metazoa-and-fungus-wide search

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 14 of 18

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 9: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

Figure 5 Sequence conservation among the classes of tCWCH2 sequence motifs Amino acid sequences were aligned and consensussequences were generated (see Methods) Mizf ZF1-2 Mizf ZF3-4 and Zap1ZafA ZF1-2 and ZF3-4 are indicated separately The PDOC00028C2H2 consensus is shown at the bottom as a general C2H2 consensus sequence (indicated by the tandem repeat of the consensus sequence)The short insertion sequences listed below were initially located at the sites indicated by the colored arrowheads in the alignments describedabove but were separated to allow more comprehensive analyses These sequences represent the longer linker sequences and the insertion ofextra sequences described in the Results section To achieve this analysis we omitted the four sequences AAWT01013938 (Schmidteamediterranea Aebp2) CAG05504 (Tetraodon nigroviridis Gli) XP_001602003 (Nasonia vitripennis Gli) and XP_785526 (Strongylocentrotus purpuratusGli) because these sequences contain exceptionally divergent sequences that disrupt the alignments

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 9 of 18

of the tCWCH2 motif in the Gli family Glis familyZap1 and PacC has been previously reported [13]Superimposition analysis revealed a tightly conservedstructure among ZIC3 GLI and Zap1 supporting theimportance of the tCWCH2 motif in structural termsStrong conservation of the tCWCH2 tryptophan resi-dues in each gene class suggests that there has beenstrong selective pressure for them during evolutionOur analysis revealed additional structural features of

tCWCH2 besides the conserved tryptophans which maybe explained by adaptation of its role in mediating theinter-ZF interaction (Figure 8) Firstly there is a prefer-ence for valine leucine or isoleucine in the j1 and j2position This can be regarded as an adaptive adjust-ment for forming hydrophobic cores of the appropriatesize Furthermore hydrophobic interaction between j1and j2 is likely because they are closely located within4Aring distance in the 3D structure models of ZIC3 GLIand Zap1 (Additional file 1 data not shown) This alsocan influence the amino acid preferences in j1 and j2position Secondly elongation of the linker sequencesmay also reflect a structural adaptation to allow themotif to face the two globular C2H2 units Linker lengthand sequence are important for the function of theDNA-binding C2H2 ZF because they determine the dis-tance of each ZF and the linker contacts to the DNAbase [60-62] However optimization of the canonical

linker length for DNA-binding may not be enough toyield opposed positioning of ZF units The bendingbetween the two ZFs may be larger than is dictated bythe DNA binding state of the ZFs as revealed in theGLI-DNA complex structure [7] Longer linker sizeseems to be a general feature of tCWCH2 but conserva-tion of the linker sequences within each class variesnotably Linker conservation in PacC is particularlyhigh despite the widespread distribution of this gene infungi thus suggesting that the linker in PacC is underevolutionary constraintAnother structural feature the addition of extra

sequences is plausible if we assume the reduction ofstructural constraint as a consequence of specializationin the inter-ZF interaction For example the conservationof tCWCH2-forming ZF1 is lower than that of ZF2-5 inZic proteins [15] In the case of the GLI-DNA complexthe tCWCH2-forming ZF1 does not directly participatein DNA binding [12] and the sequence conservation ofGli family ZF1 is also lower than in ZF2-5 (Figure 5unpublished observation) The extra sequences in theZicGliGlis Zap1ZafA (ZF3-4) and AridRsc9 classesshow poor conservation (Figure 5)It is unclear if tCWCH2 is the only structure required

for inter-ZF interactions The C2H2 ZFD is capable ofachieving protein-to-protein interactions in many cases[6] Interestingly in a previous study to design synthetic

Figure 6 j1 and j2 positions of tCWCH2 (A) Stereo view of the tCWCH2 sequence motif A wire model of the main chains is shown withZIC3 GLI1 and Zap1 in green and TFIIIA and Zif268 in red The side chains of tryptophan residues in tCWCH2 and those in the j1 j2 residuesare indicated by cylinders ZIC3 GLI1 and Zap1 are gray cylinders and the others are red cylinders The corresponding positions (j1 j2) areindicated by red boxes in Figure 5 (B) Frequency of amino acid appearance in j1 j2 and those of general C2H2 (PDOC00028) is shown Valineleucine and isoleucine appear frequently at both positions and methionine frequently appears in j2 This table summarizes all of the tCWCH2sequences Highlighted numbers indicate the preferential residues for j1 and j2 in tCWCH2

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 10 of 18

Figure 7 Comparison of linker length between the tCWCH2 sequence motif and general C2H2 ZF The bars indicate the mean lengths ofthe linker sequence between C2H2 motifs in the indicated groups We defined the linker length as the number of amino acids between the lasthistidine of a ZF and the first cysteine of the C-terminally flanking ZF The PDOC00028 (general C2H2) alignment contained 12326 sequencesand we found 10025 linkers in it The general C2H2 sequences was further divided into two groups based on the presence or absence of ldquoTGE(KR)Prdquo The sequence numbers of each group tCWCH2 614 general C2H2 10025 ldquoTGE(KP)rdquo 4640 ldquonon-TGE(KR)Prdquo 5385) Error bar standarddeviation P lt 1 times 10-100 in Mann-Whitney U-test

Figure 8 Evolution of the tCWCH2 domain We propose that the tCWCH2 patterns were generated from classical C2H2 sequencesconcurrently with the acquisition of the additional sequence features

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 11 of 18

ZFs utilizing in vitro-evolving protein-to-protein interac-tions [16] it was found that the hydrophobic residuesin zif268 that interacted between the artificial proteinsand the zif268 C2H2 ZF were similar to those involvedin the GLI inter-ZF interaction Superimpositionrevealed that the position corresponding to the trypto-phan in tCWCH2 is occupied by a valine suggestingthat either intramolecular or intermolecular protein-to-protein interactions mediated by hydrophobic residuesare generally utilized for C2H2 ZFs However the strongconservation of the tCWCH2 and accompanying struc-tural features suggest that tCWCH2 is a highly specia-lized C2H2 ZF for inter-ZF interactions between twoadjacent ZFs

Evolution of the tCWCH2 motifWe hypothesize that the original tCWCH2 motifs weregenerated from classical C2H2 ZFs (Figure 8) that occurwidely in eukaryotic organisms including Plantae Exca-vata Chromalveolata and Rhizaria tCWCH2 motifswere found in the vicinity of C2H2 ZF in severaltCWCH2 gene classes It is proposed that the tCWCH2sequence appeared in the common ancestor of Opistho-konta or Unikonta (OpisthokontaAmoebozoa) Theacquisition of the tryptophans and additional structuralfeatures of tCWCH2 may have occurred concurrently inthe course of evolution Whether the evolution oftCWCH2 occurred once or multiple times duringOpisthokontaAmoebozoa evolution remains unclearHowever its short sequence and some isolated distribu-tion in the plant supergroup and metazoan gene familiesmay favor the multiple-origins hypothesis Although thelineage relationship among the tCWCH2 classes remainsunclear we can presume the presence of the followingtCWCH2 ancestral genes an Arid2Rsc9 commonancestor which may have existed before the diversifica-tion of metazoa and fungi a ZicGliGlis commonancestor an Aebp2 ancestor and a Mizf ancestor in theearly metazoa a Fungl ancestor a PacC ancestor aFungl-4ZF ancestor and a Zap1ZafA ancestor in theearly fungi a Zfp106 ancestor in early bilaterians aTwincl ancestor in the founder of Aspergillus and itsclose siblings and a Clr1 ancestor in the founder ofDikarya althogh most Dikarya may have lost the geneOnce the tCWCH2 structures were established theymay have been more strongly conserved in these classes

Function and role of tCWCH2There are several cases that directly indicate the func-tional importance of the tCWCH2 motif in living organ-isms In the case of PacC mutation of the tryptophan oftCWCH2 induces a loss of its DNA binding activity[3132] In case of human ZIC3 the tryptophan muta-tion is proposed to be associated with the occurrence of

a familial congenital heart defect and biochemical analy-sis revealed that transcriptional activity and protein sta-bility are decreased in the mutant protein [1314] Inyeast Zap1 interaction between the two ZF where thetCWCH2 motif is located is required for the high-affi-nity binding of the zinc ion [45] Also it has beenshown that the inter-ZF interaction is influenced by theconcentration of zinc ion [46] suggesting that thetCWCH2 motif is involved in zinc-sensing in yeastApart from the instances referred to above the func-

tional significance of the tCWCH2 domain remainsunknown We propose that tCWCH2 may play rolesother than the direct DNA binding that is often exertedby classical C2H2 because the linker lengths oftCWCH2 are more highly divergent than those of thecanonical C2H2 ZFs in which linker length is a criticalparameter for DNA-C2H2 ZF interactions [60-63]Furthermore the structure of GLI ZIC3 and Zap1tCWCH2 domains in which the two adjacent ZFs form asingle globular structure [71347] is clearly differentfrom the structure of the DNA-binding C2H2 ZF domainwhere each globular ZF unit wraps around the majorgroove of the DNA [7-1164] (PDBID GLI 2GLI YY11UBD WT1 2RPT Zif268 1MEY 1P47 TFIIIA 1TF3)In the case of GLI tCWCH2 which is the only tCWCH2with a known DNA-binding 3D structure the N-terminalZF of GLI1 tCWCH2 does not contact DNA [7]Assuming that the tCWCH2 domain has a function

other than DNA binding what roles could tCWCH2play Firstly our domain structure analysis implies thattCWCH2 likely regulates its neighboring domains suchas canonical C2H2 ZF WD40 and LIL ZicGliGlisPacC Mizf Aebp2 Zap1ZafA Fungl Clr1 and Fungl-4ZF possess canonical ZF(s) in the C-terminal flankingregion of the tCWCH2 In the case of PacC analyses ofCWCH2 motif-containing tryptophan mutants suggeststhat DNA binding mediated by ZF2 and ZF3 may beinfluenced by tCWCH2 motifs in ZF1 and ZF2 [32] Itis possible that tCWCH2 motifs in the other membersof this group act similarly in the regulation of neighbor-ing canonical C2H2 domains In transcription factorsDNA binding domains such as the canonical C2H2 ZFcan define a target sequence and fix the protein position(angle distance direction) with respect to the DNAThe tCWCH2 motif could provide a lsquohingersquo or modula-tory junction that controls the relative positioning ofother functional domains Secondly tCWCH2 motifsmay provide a capping structure that isolates the struc-tural influence of N-terminal flanking region In thecase of ZIC3 mutations in its tCWCH2 motif increasethe random coil contents more than the zinc-free state[13] A unique terminal structure in a domain composedof repeating structural motifs is also seen in other con-served domains [65-67] In the case of small leucine-rich

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 12 of 18

repeats and ankyrin repeats N-capping structures areresponsible for protein stability [6667] ThirdlytCWCH2 binding to other molecules is possible and isobserved for many conserved domains Transcriptionalregulator and chromatin remodeling proteins form com-plexes with many different proteins in order to be func-tionally active [6869] Therefore the tCWCH2 ZF maynot be associated with DNA binding but may serve inprotein recognition similar to many C2H2 ZFs [5]The distribution and conservation of the tCWCH2

sequence motif in various gene classes suggests that thismotif has important biological roles The genes we iden-tified containing one or more tCWCH2 motifs areknown to be involved in various biological processesincluding chromatin remodeling (Arid2Rsc9) zinchomeostasis (Zap1ZafA) pH sensing (PacC) cell cycleregulation and transcriptional regulation (ZicGliGlisPacC Mizf Aebp2 Zap1ZafA) In addition given thatseveral gene classes remain to be functionally character-ized (including Fungl Zfp106 Twincl Clr1 Fungl-4ZFMonosiga gene DdHp) the known functional repertoireof the tCWCH2 motif is likely to increase We proposethat the tCWCH2 sequence motif is a widespread andfunctional protein structural motif and that furtherstructural and functional analyses are required to fullyunderstand its roles in biological processes

Conclusions(1) The 3D structure of tCWCH2 is highly conservedamong human ZIC3 human GLI1 and yeast Zap1(2) Current databases contained 587 tCWCH2

sequences that can be classified into 11 major geneclasses (ZicGliGlis Arid2Rsc9 PacC Mizf Aebp2Zap1ZafA Fungl Zfp106 Twincl Clr1 and Fungl-4ZF)and other minority classes The tCWCH2 motifs werefound in transcription regulatory factors chromatinremodeling factors and pHzinc homeostasis regulatingfactors(3) The tCWCH2 motif is mostly found in Opistho-

konta (metazoa fungi and choanoflagellates) andAmoebozoa (amoeba Dictyostelium discoideum)(4) The tCWCH2 motif contains three additional struc-

tural features (i) bias to hydrophobic residues in the j1and j2 positions (ii) longer linkers between the two C2H2motifs and (iii) insertion of extra sequences The struc-tural features suggest that the tCWCH2 motif is a specia-lized motif involved in inter-zinc finger interactions

MethodsProtein structural analysisProtein structures were obtained from the NCBI(National Center for Biotechnology Information) databaseas PDB format files and we used iMol httpwwwpirxcomiMol and Swiss-PdbViewer httpspdbvvital-itch

as the PDB viewer applications The latter was also usedfor superimposing multiple protein structures Magic fitoptions were selected for the superimposition of structuredata because the number of amino acid residues was dif-ferent between proteins Zif268 (PDBID 1P47) andTFIIIA (PDBID 2J7J) were used for the reference struc-ture of the classical C2H2 ZF For the distance analysis ofthe neighboring amino acid residues we used ldquoNeighborsof Selected Residue Side chainrdquo option of the Swiss-PdbViewer The software showed the list of the neighbor-ing residues with distances We dealt with the residues ifthe same residues were detected as the neighbors inmore than 10 models of ZIC3 and Zap1 PDB files thatinclude 20 models respectively

Database searchWe conducted web-based BLAST searches using PatternHit Initiated (PHI)-BLAST of NCBI databases httpblastncbinlmnihgovBlastcgi For the sequence pattern of thePHI-BLAST searches we used ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo The initialsequence pattern was deduced from the compilation ofthe Zic family sequences [1570] and subsequently theintervening sequence lengths were modified to the abovepattern until no additional sequences appeared in thesearch The seed sequences obtained by PHI-BLASTsearch were first classified into gene families based ontheir annotation and on phylogenetic tree analyses(see below) Then a TBLASTN search was performedagainst (1) all of the following databases GenBank+RefSeqNucleotides+EMBL+DDBJ+PDB excluding whole-genomeshotgun sequences and expressed sequence tags (nr optionin the NCBI TBLASTN search) and (2) whole genomeshotgun sequences of the organisms shown in Table 2The whole genome sequence TBLASTN search was per-formed using representative sequences for each genefamily (Table 3) initially on a eukaryote-wide basis andthen on a metazoa-and-fungus-wide basis For the meta-zoa-and-fungus-wide search sequences closely related tothe selected genes were searched in the following gene-species combinations Arid2Rsc9 PacC Zap1ZafAFungl Twincl Clr1 and Fungl-4ZF were searched infungi and ZicGliGlis Arid2Rsc9 Mizf Aebp2 andZfp106 were searched in metazoa The target organismsare listed in Table 2 These organisms were selected onthe basis of the following criteria (1) high genomesequence coverage (gt5times) (2) member of a major Eukar-yote taxon or considered to be in a phylogeneticallyimportant position The collected sequences were manu-ally checked for the presence of the sequence of thetCWCH2 motif In addition we performed proteinBLAST searches using representative sequences for 16gene families (Aebp2 Arid2 Clr1 Csr1 Fungl Fungl-4ZFGli Glis Mizf PacC Rsc9 Twincl ZafA Zap1 Zfp106

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 13 of 18

Table 2 Eukaryotic organisms used for the TBLASTN search

Organism Supergroup Lower taxon

Entamoeba W Amoebozoa Entamoebida

Dictyostelium W Amoebozoa Mycetozoa

Paramecium W Chromalveolata Ciliata

Tetrahymena W Chromalveolata Ciliata

Guillardia W Chromalveolata Cryptophyta

Cryptosporidium W Chromalveolata Dinoflagella

Giardia W Excavata Diplomonadia

Euglena gracilis W Excavata Euglenozoa

Trichomonas vaginalis W Excavata Parabasalidea

Monosiga W Opisthokonta Choanoflagellata

Batrachochytrium dendrobatidis MF Opisthokonta Fungi Chytridiomycota

Aspergillus W Opisthokonta Fungi Dikarya Ascomycota

Aspergillus fumigatus MF Opisthokonta Fungi Dikarya Ascomycota

Candida W Opisthokonta Fungi Dikarya Ascomycota

Candida albicans MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans W Opisthokonta Fungi Dikarya Ascomycota

Neurospora W Opisthokonta Fungi Dikarya Ascomycota

Neurospora crassa MF Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces cerevisiae MF Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces pombe MF Opisthokonta Fungi Dikarya Ascomycota

Yarrowia lipolytica MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans MF Opisthokonta Fungi Dikarya Basidiomycota

Ustilago maydis MF Opisthokonta Fungi Dikarya Basidiomycota

Antonospora locustae MF Opisthokonta Fungi Microsporidia

Encephalitozoon W Opisthokonta Fungi Microsporidia

Encephalitozoon cuniculi MF Opisthokonta Fungi Microsporidia

Rhizopus oryzae MF Opisthokonta Fungi Mucoromycotina

Monosiga brevicollis MF Opisthokonta Metazoa Choanoflagellida

Homo W Opisthokonta Metazoa Chordata

Nematostella W Opisthokonta Metazoa Cnidaria

Nematostella vectensis MF Opisthokonta Metazoa Cnidaria Anthozoa

Platynereis dumerilii MF Opisthokonta Metazoa Cnidaria Anthozoa

Hydra magnipapillata MF Opisthokonta Metazoa Cnidaria Hydrozoa

Hydra vulgaris MF Opisthokonta Metazoa Cnidaria Hydrozoa

Lottia gigantea MF Opisthokonta Metazoa Mollusca Gastropoda

Trichoplax W Opisthokonta Metazoa Placozoa

Trichoplax adhaerens MF Opisthokonta Metazoa Placozoa Trichoplax

Schmidtea mediterranea MF Opisthokonta Metazoa Platyhelminthes

Chlamydomonas W Plantae Chlorophyta

Arabidopsis W Plantae Tracheophyta

Oryza sativa W Plantae Tracheophyta

Bigelowiella W Rhizaria Cercozoa

Species name and taxa are indicated W = Eukaryote-wide search MF = metazoa-and-fungus-wide search

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 14 of 18

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 10: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

of the tCWCH2 motif in the Gli family Glis familyZap1 and PacC has been previously reported [13]Superimposition analysis revealed a tightly conservedstructure among ZIC3 GLI and Zap1 supporting theimportance of the tCWCH2 motif in structural termsStrong conservation of the tCWCH2 tryptophan resi-dues in each gene class suggests that there has beenstrong selective pressure for them during evolutionOur analysis revealed additional structural features of

tCWCH2 besides the conserved tryptophans which maybe explained by adaptation of its role in mediating theinter-ZF interaction (Figure 8) Firstly there is a prefer-ence for valine leucine or isoleucine in the j1 and j2position This can be regarded as an adaptive adjust-ment for forming hydrophobic cores of the appropriatesize Furthermore hydrophobic interaction between j1and j2 is likely because they are closely located within4Aring distance in the 3D structure models of ZIC3 GLIand Zap1 (Additional file 1 data not shown) This alsocan influence the amino acid preferences in j1 and j2position Secondly elongation of the linker sequencesmay also reflect a structural adaptation to allow themotif to face the two globular C2H2 units Linker lengthand sequence are important for the function of theDNA-binding C2H2 ZF because they determine the dis-tance of each ZF and the linker contacts to the DNAbase [60-62] However optimization of the canonical

linker length for DNA-binding may not be enough toyield opposed positioning of ZF units The bendingbetween the two ZFs may be larger than is dictated bythe DNA binding state of the ZFs as revealed in theGLI-DNA complex structure [7] Longer linker sizeseems to be a general feature of tCWCH2 but conserva-tion of the linker sequences within each class variesnotably Linker conservation in PacC is particularlyhigh despite the widespread distribution of this gene infungi thus suggesting that the linker in PacC is underevolutionary constraintAnother structural feature the addition of extra

sequences is plausible if we assume the reduction ofstructural constraint as a consequence of specializationin the inter-ZF interaction For example the conservationof tCWCH2-forming ZF1 is lower than that of ZF2-5 inZic proteins [15] In the case of the GLI-DNA complexthe tCWCH2-forming ZF1 does not directly participatein DNA binding [12] and the sequence conservation ofGli family ZF1 is also lower than in ZF2-5 (Figure 5unpublished observation) The extra sequences in theZicGliGlis Zap1ZafA (ZF3-4) and AridRsc9 classesshow poor conservation (Figure 5)It is unclear if tCWCH2 is the only structure required

for inter-ZF interactions The C2H2 ZFD is capable ofachieving protein-to-protein interactions in many cases[6] Interestingly in a previous study to design synthetic

Figure 6 j1 and j2 positions of tCWCH2 (A) Stereo view of the tCWCH2 sequence motif A wire model of the main chains is shown withZIC3 GLI1 and Zap1 in green and TFIIIA and Zif268 in red The side chains of tryptophan residues in tCWCH2 and those in the j1 j2 residuesare indicated by cylinders ZIC3 GLI1 and Zap1 are gray cylinders and the others are red cylinders The corresponding positions (j1 j2) areindicated by red boxes in Figure 5 (B) Frequency of amino acid appearance in j1 j2 and those of general C2H2 (PDOC00028) is shown Valineleucine and isoleucine appear frequently at both positions and methionine frequently appears in j2 This table summarizes all of the tCWCH2sequences Highlighted numbers indicate the preferential residues for j1 and j2 in tCWCH2

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 10 of 18

Figure 7 Comparison of linker length between the tCWCH2 sequence motif and general C2H2 ZF The bars indicate the mean lengths ofthe linker sequence between C2H2 motifs in the indicated groups We defined the linker length as the number of amino acids between the lasthistidine of a ZF and the first cysteine of the C-terminally flanking ZF The PDOC00028 (general C2H2) alignment contained 12326 sequencesand we found 10025 linkers in it The general C2H2 sequences was further divided into two groups based on the presence or absence of ldquoTGE(KR)Prdquo The sequence numbers of each group tCWCH2 614 general C2H2 10025 ldquoTGE(KP)rdquo 4640 ldquonon-TGE(KR)Prdquo 5385) Error bar standarddeviation P lt 1 times 10-100 in Mann-Whitney U-test

Figure 8 Evolution of the tCWCH2 domain We propose that the tCWCH2 patterns were generated from classical C2H2 sequencesconcurrently with the acquisition of the additional sequence features

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 11 of 18

ZFs utilizing in vitro-evolving protein-to-protein interac-tions [16] it was found that the hydrophobic residuesin zif268 that interacted between the artificial proteinsand the zif268 C2H2 ZF were similar to those involvedin the GLI inter-ZF interaction Superimpositionrevealed that the position corresponding to the trypto-phan in tCWCH2 is occupied by a valine suggestingthat either intramolecular or intermolecular protein-to-protein interactions mediated by hydrophobic residuesare generally utilized for C2H2 ZFs However the strongconservation of the tCWCH2 and accompanying struc-tural features suggest that tCWCH2 is a highly specia-lized C2H2 ZF for inter-ZF interactions between twoadjacent ZFs

Evolution of the tCWCH2 motifWe hypothesize that the original tCWCH2 motifs weregenerated from classical C2H2 ZFs (Figure 8) that occurwidely in eukaryotic organisms including Plantae Exca-vata Chromalveolata and Rhizaria tCWCH2 motifswere found in the vicinity of C2H2 ZF in severaltCWCH2 gene classes It is proposed that the tCWCH2sequence appeared in the common ancestor of Opistho-konta or Unikonta (OpisthokontaAmoebozoa) Theacquisition of the tryptophans and additional structuralfeatures of tCWCH2 may have occurred concurrently inthe course of evolution Whether the evolution oftCWCH2 occurred once or multiple times duringOpisthokontaAmoebozoa evolution remains unclearHowever its short sequence and some isolated distribu-tion in the plant supergroup and metazoan gene familiesmay favor the multiple-origins hypothesis Although thelineage relationship among the tCWCH2 classes remainsunclear we can presume the presence of the followingtCWCH2 ancestral genes an Arid2Rsc9 commonancestor which may have existed before the diversifica-tion of metazoa and fungi a ZicGliGlis commonancestor an Aebp2 ancestor and a Mizf ancestor in theearly metazoa a Fungl ancestor a PacC ancestor aFungl-4ZF ancestor and a Zap1ZafA ancestor in theearly fungi a Zfp106 ancestor in early bilaterians aTwincl ancestor in the founder of Aspergillus and itsclose siblings and a Clr1 ancestor in the founder ofDikarya althogh most Dikarya may have lost the geneOnce the tCWCH2 structures were established theymay have been more strongly conserved in these classes

Function and role of tCWCH2There are several cases that directly indicate the func-tional importance of the tCWCH2 motif in living organ-isms In the case of PacC mutation of the tryptophan oftCWCH2 induces a loss of its DNA binding activity[3132] In case of human ZIC3 the tryptophan muta-tion is proposed to be associated with the occurrence of

a familial congenital heart defect and biochemical analy-sis revealed that transcriptional activity and protein sta-bility are decreased in the mutant protein [1314] Inyeast Zap1 interaction between the two ZF where thetCWCH2 motif is located is required for the high-affi-nity binding of the zinc ion [45] Also it has beenshown that the inter-ZF interaction is influenced by theconcentration of zinc ion [46] suggesting that thetCWCH2 motif is involved in zinc-sensing in yeastApart from the instances referred to above the func-

tional significance of the tCWCH2 domain remainsunknown We propose that tCWCH2 may play rolesother than the direct DNA binding that is often exertedby classical C2H2 because the linker lengths oftCWCH2 are more highly divergent than those of thecanonical C2H2 ZFs in which linker length is a criticalparameter for DNA-C2H2 ZF interactions [60-63]Furthermore the structure of GLI ZIC3 and Zap1tCWCH2 domains in which the two adjacent ZFs form asingle globular structure [71347] is clearly differentfrom the structure of the DNA-binding C2H2 ZF domainwhere each globular ZF unit wraps around the majorgroove of the DNA [7-1164] (PDBID GLI 2GLI YY11UBD WT1 2RPT Zif268 1MEY 1P47 TFIIIA 1TF3)In the case of GLI tCWCH2 which is the only tCWCH2with a known DNA-binding 3D structure the N-terminalZF of GLI1 tCWCH2 does not contact DNA [7]Assuming that the tCWCH2 domain has a function

other than DNA binding what roles could tCWCH2play Firstly our domain structure analysis implies thattCWCH2 likely regulates its neighboring domains suchas canonical C2H2 ZF WD40 and LIL ZicGliGlisPacC Mizf Aebp2 Zap1ZafA Fungl Clr1 and Fungl-4ZF possess canonical ZF(s) in the C-terminal flankingregion of the tCWCH2 In the case of PacC analyses ofCWCH2 motif-containing tryptophan mutants suggeststhat DNA binding mediated by ZF2 and ZF3 may beinfluenced by tCWCH2 motifs in ZF1 and ZF2 [32] Itis possible that tCWCH2 motifs in the other membersof this group act similarly in the regulation of neighbor-ing canonical C2H2 domains In transcription factorsDNA binding domains such as the canonical C2H2 ZFcan define a target sequence and fix the protein position(angle distance direction) with respect to the DNAThe tCWCH2 motif could provide a lsquohingersquo or modula-tory junction that controls the relative positioning ofother functional domains Secondly tCWCH2 motifsmay provide a capping structure that isolates the struc-tural influence of N-terminal flanking region In thecase of ZIC3 mutations in its tCWCH2 motif increasethe random coil contents more than the zinc-free state[13] A unique terminal structure in a domain composedof repeating structural motifs is also seen in other con-served domains [65-67] In the case of small leucine-rich

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 12 of 18

repeats and ankyrin repeats N-capping structures areresponsible for protein stability [6667] ThirdlytCWCH2 binding to other molecules is possible and isobserved for many conserved domains Transcriptionalregulator and chromatin remodeling proteins form com-plexes with many different proteins in order to be func-tionally active [6869] Therefore the tCWCH2 ZF maynot be associated with DNA binding but may serve inprotein recognition similar to many C2H2 ZFs [5]The distribution and conservation of the tCWCH2

sequence motif in various gene classes suggests that thismotif has important biological roles The genes we iden-tified containing one or more tCWCH2 motifs areknown to be involved in various biological processesincluding chromatin remodeling (Arid2Rsc9) zinchomeostasis (Zap1ZafA) pH sensing (PacC) cell cycleregulation and transcriptional regulation (ZicGliGlisPacC Mizf Aebp2 Zap1ZafA) In addition given thatseveral gene classes remain to be functionally character-ized (including Fungl Zfp106 Twincl Clr1 Fungl-4ZFMonosiga gene DdHp) the known functional repertoireof the tCWCH2 motif is likely to increase We proposethat the tCWCH2 sequence motif is a widespread andfunctional protein structural motif and that furtherstructural and functional analyses are required to fullyunderstand its roles in biological processes

Conclusions(1) The 3D structure of tCWCH2 is highly conservedamong human ZIC3 human GLI1 and yeast Zap1(2) Current databases contained 587 tCWCH2

sequences that can be classified into 11 major geneclasses (ZicGliGlis Arid2Rsc9 PacC Mizf Aebp2Zap1ZafA Fungl Zfp106 Twincl Clr1 and Fungl-4ZF)and other minority classes The tCWCH2 motifs werefound in transcription regulatory factors chromatinremodeling factors and pHzinc homeostasis regulatingfactors(3) The tCWCH2 motif is mostly found in Opistho-

konta (metazoa fungi and choanoflagellates) andAmoebozoa (amoeba Dictyostelium discoideum)(4) The tCWCH2 motif contains three additional struc-

tural features (i) bias to hydrophobic residues in the j1and j2 positions (ii) longer linkers between the two C2H2motifs and (iii) insertion of extra sequences The struc-tural features suggest that the tCWCH2 motif is a specia-lized motif involved in inter-zinc finger interactions

MethodsProtein structural analysisProtein structures were obtained from the NCBI(National Center for Biotechnology Information) databaseas PDB format files and we used iMol httpwwwpirxcomiMol and Swiss-PdbViewer httpspdbvvital-itch

as the PDB viewer applications The latter was also usedfor superimposing multiple protein structures Magic fitoptions were selected for the superimposition of structuredata because the number of amino acid residues was dif-ferent between proteins Zif268 (PDBID 1P47) andTFIIIA (PDBID 2J7J) were used for the reference struc-ture of the classical C2H2 ZF For the distance analysis ofthe neighboring amino acid residues we used ldquoNeighborsof Selected Residue Side chainrdquo option of the Swiss-PdbViewer The software showed the list of the neighbor-ing residues with distances We dealt with the residues ifthe same residues were detected as the neighbors inmore than 10 models of ZIC3 and Zap1 PDB files thatinclude 20 models respectively

Database searchWe conducted web-based BLAST searches using PatternHit Initiated (PHI)-BLAST of NCBI databases httpblastncbinlmnihgovBlastcgi For the sequence pattern of thePHI-BLAST searches we used ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo The initialsequence pattern was deduced from the compilation ofthe Zic family sequences [1570] and subsequently theintervening sequence lengths were modified to the abovepattern until no additional sequences appeared in thesearch The seed sequences obtained by PHI-BLASTsearch were first classified into gene families based ontheir annotation and on phylogenetic tree analyses(see below) Then a TBLASTN search was performedagainst (1) all of the following databases GenBank+RefSeqNucleotides+EMBL+DDBJ+PDB excluding whole-genomeshotgun sequences and expressed sequence tags (nr optionin the NCBI TBLASTN search) and (2) whole genomeshotgun sequences of the organisms shown in Table 2The whole genome sequence TBLASTN search was per-formed using representative sequences for each genefamily (Table 3) initially on a eukaryote-wide basis andthen on a metazoa-and-fungus-wide basis For the meta-zoa-and-fungus-wide search sequences closely related tothe selected genes were searched in the following gene-species combinations Arid2Rsc9 PacC Zap1ZafAFungl Twincl Clr1 and Fungl-4ZF were searched infungi and ZicGliGlis Arid2Rsc9 Mizf Aebp2 andZfp106 were searched in metazoa The target organismsare listed in Table 2 These organisms were selected onthe basis of the following criteria (1) high genomesequence coverage (gt5times) (2) member of a major Eukar-yote taxon or considered to be in a phylogeneticallyimportant position The collected sequences were manu-ally checked for the presence of the sequence of thetCWCH2 motif In addition we performed proteinBLAST searches using representative sequences for 16gene families (Aebp2 Arid2 Clr1 Csr1 Fungl Fungl-4ZFGli Glis Mizf PacC Rsc9 Twincl ZafA Zap1 Zfp106

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 13 of 18

Table 2 Eukaryotic organisms used for the TBLASTN search

Organism Supergroup Lower taxon

Entamoeba W Amoebozoa Entamoebida

Dictyostelium W Amoebozoa Mycetozoa

Paramecium W Chromalveolata Ciliata

Tetrahymena W Chromalveolata Ciliata

Guillardia W Chromalveolata Cryptophyta

Cryptosporidium W Chromalveolata Dinoflagella

Giardia W Excavata Diplomonadia

Euglena gracilis W Excavata Euglenozoa

Trichomonas vaginalis W Excavata Parabasalidea

Monosiga W Opisthokonta Choanoflagellata

Batrachochytrium dendrobatidis MF Opisthokonta Fungi Chytridiomycota

Aspergillus W Opisthokonta Fungi Dikarya Ascomycota

Aspergillus fumigatus MF Opisthokonta Fungi Dikarya Ascomycota

Candida W Opisthokonta Fungi Dikarya Ascomycota

Candida albicans MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans W Opisthokonta Fungi Dikarya Ascomycota

Neurospora W Opisthokonta Fungi Dikarya Ascomycota

Neurospora crassa MF Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces cerevisiae MF Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces pombe MF Opisthokonta Fungi Dikarya Ascomycota

Yarrowia lipolytica MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans MF Opisthokonta Fungi Dikarya Basidiomycota

Ustilago maydis MF Opisthokonta Fungi Dikarya Basidiomycota

Antonospora locustae MF Opisthokonta Fungi Microsporidia

Encephalitozoon W Opisthokonta Fungi Microsporidia

Encephalitozoon cuniculi MF Opisthokonta Fungi Microsporidia

Rhizopus oryzae MF Opisthokonta Fungi Mucoromycotina

Monosiga brevicollis MF Opisthokonta Metazoa Choanoflagellida

Homo W Opisthokonta Metazoa Chordata

Nematostella W Opisthokonta Metazoa Cnidaria

Nematostella vectensis MF Opisthokonta Metazoa Cnidaria Anthozoa

Platynereis dumerilii MF Opisthokonta Metazoa Cnidaria Anthozoa

Hydra magnipapillata MF Opisthokonta Metazoa Cnidaria Hydrozoa

Hydra vulgaris MF Opisthokonta Metazoa Cnidaria Hydrozoa

Lottia gigantea MF Opisthokonta Metazoa Mollusca Gastropoda

Trichoplax W Opisthokonta Metazoa Placozoa

Trichoplax adhaerens MF Opisthokonta Metazoa Placozoa Trichoplax

Schmidtea mediterranea MF Opisthokonta Metazoa Platyhelminthes

Chlamydomonas W Plantae Chlorophyta

Arabidopsis W Plantae Tracheophyta

Oryza sativa W Plantae Tracheophyta

Bigelowiella W Rhizaria Cercozoa

Species name and taxa are indicated W = Eukaryote-wide search MF = metazoa-and-fungus-wide search

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 14 of 18

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 11: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

Figure 7 Comparison of linker length between the tCWCH2 sequence motif and general C2H2 ZF The bars indicate the mean lengths ofthe linker sequence between C2H2 motifs in the indicated groups We defined the linker length as the number of amino acids between the lasthistidine of a ZF and the first cysteine of the C-terminally flanking ZF The PDOC00028 (general C2H2) alignment contained 12326 sequencesand we found 10025 linkers in it The general C2H2 sequences was further divided into two groups based on the presence or absence of ldquoTGE(KR)Prdquo The sequence numbers of each group tCWCH2 614 general C2H2 10025 ldquoTGE(KP)rdquo 4640 ldquonon-TGE(KR)Prdquo 5385) Error bar standarddeviation P lt 1 times 10-100 in Mann-Whitney U-test

Figure 8 Evolution of the tCWCH2 domain We propose that the tCWCH2 patterns were generated from classical C2H2 sequencesconcurrently with the acquisition of the additional sequence features

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 11 of 18

ZFs utilizing in vitro-evolving protein-to-protein interac-tions [16] it was found that the hydrophobic residuesin zif268 that interacted between the artificial proteinsand the zif268 C2H2 ZF were similar to those involvedin the GLI inter-ZF interaction Superimpositionrevealed that the position corresponding to the trypto-phan in tCWCH2 is occupied by a valine suggestingthat either intramolecular or intermolecular protein-to-protein interactions mediated by hydrophobic residuesare generally utilized for C2H2 ZFs However the strongconservation of the tCWCH2 and accompanying struc-tural features suggest that tCWCH2 is a highly specia-lized C2H2 ZF for inter-ZF interactions between twoadjacent ZFs

Evolution of the tCWCH2 motifWe hypothesize that the original tCWCH2 motifs weregenerated from classical C2H2 ZFs (Figure 8) that occurwidely in eukaryotic organisms including Plantae Exca-vata Chromalveolata and Rhizaria tCWCH2 motifswere found in the vicinity of C2H2 ZF in severaltCWCH2 gene classes It is proposed that the tCWCH2sequence appeared in the common ancestor of Opistho-konta or Unikonta (OpisthokontaAmoebozoa) Theacquisition of the tryptophans and additional structuralfeatures of tCWCH2 may have occurred concurrently inthe course of evolution Whether the evolution oftCWCH2 occurred once or multiple times duringOpisthokontaAmoebozoa evolution remains unclearHowever its short sequence and some isolated distribu-tion in the plant supergroup and metazoan gene familiesmay favor the multiple-origins hypothesis Although thelineage relationship among the tCWCH2 classes remainsunclear we can presume the presence of the followingtCWCH2 ancestral genes an Arid2Rsc9 commonancestor which may have existed before the diversifica-tion of metazoa and fungi a ZicGliGlis commonancestor an Aebp2 ancestor and a Mizf ancestor in theearly metazoa a Fungl ancestor a PacC ancestor aFungl-4ZF ancestor and a Zap1ZafA ancestor in theearly fungi a Zfp106 ancestor in early bilaterians aTwincl ancestor in the founder of Aspergillus and itsclose siblings and a Clr1 ancestor in the founder ofDikarya althogh most Dikarya may have lost the geneOnce the tCWCH2 structures were established theymay have been more strongly conserved in these classes

Function and role of tCWCH2There are several cases that directly indicate the func-tional importance of the tCWCH2 motif in living organ-isms In the case of PacC mutation of the tryptophan oftCWCH2 induces a loss of its DNA binding activity[3132] In case of human ZIC3 the tryptophan muta-tion is proposed to be associated with the occurrence of

a familial congenital heart defect and biochemical analy-sis revealed that transcriptional activity and protein sta-bility are decreased in the mutant protein [1314] Inyeast Zap1 interaction between the two ZF where thetCWCH2 motif is located is required for the high-affi-nity binding of the zinc ion [45] Also it has beenshown that the inter-ZF interaction is influenced by theconcentration of zinc ion [46] suggesting that thetCWCH2 motif is involved in zinc-sensing in yeastApart from the instances referred to above the func-

tional significance of the tCWCH2 domain remainsunknown We propose that tCWCH2 may play rolesother than the direct DNA binding that is often exertedby classical C2H2 because the linker lengths oftCWCH2 are more highly divergent than those of thecanonical C2H2 ZFs in which linker length is a criticalparameter for DNA-C2H2 ZF interactions [60-63]Furthermore the structure of GLI ZIC3 and Zap1tCWCH2 domains in which the two adjacent ZFs form asingle globular structure [71347] is clearly differentfrom the structure of the DNA-binding C2H2 ZF domainwhere each globular ZF unit wraps around the majorgroove of the DNA [7-1164] (PDBID GLI 2GLI YY11UBD WT1 2RPT Zif268 1MEY 1P47 TFIIIA 1TF3)In the case of GLI tCWCH2 which is the only tCWCH2with a known DNA-binding 3D structure the N-terminalZF of GLI1 tCWCH2 does not contact DNA [7]Assuming that the tCWCH2 domain has a function

other than DNA binding what roles could tCWCH2play Firstly our domain structure analysis implies thattCWCH2 likely regulates its neighboring domains suchas canonical C2H2 ZF WD40 and LIL ZicGliGlisPacC Mizf Aebp2 Zap1ZafA Fungl Clr1 and Fungl-4ZF possess canonical ZF(s) in the C-terminal flankingregion of the tCWCH2 In the case of PacC analyses ofCWCH2 motif-containing tryptophan mutants suggeststhat DNA binding mediated by ZF2 and ZF3 may beinfluenced by tCWCH2 motifs in ZF1 and ZF2 [32] Itis possible that tCWCH2 motifs in the other membersof this group act similarly in the regulation of neighbor-ing canonical C2H2 domains In transcription factorsDNA binding domains such as the canonical C2H2 ZFcan define a target sequence and fix the protein position(angle distance direction) with respect to the DNAThe tCWCH2 motif could provide a lsquohingersquo or modula-tory junction that controls the relative positioning ofother functional domains Secondly tCWCH2 motifsmay provide a capping structure that isolates the struc-tural influence of N-terminal flanking region In thecase of ZIC3 mutations in its tCWCH2 motif increasethe random coil contents more than the zinc-free state[13] A unique terminal structure in a domain composedof repeating structural motifs is also seen in other con-served domains [65-67] In the case of small leucine-rich

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 12 of 18

repeats and ankyrin repeats N-capping structures areresponsible for protein stability [6667] ThirdlytCWCH2 binding to other molecules is possible and isobserved for many conserved domains Transcriptionalregulator and chromatin remodeling proteins form com-plexes with many different proteins in order to be func-tionally active [6869] Therefore the tCWCH2 ZF maynot be associated with DNA binding but may serve inprotein recognition similar to many C2H2 ZFs [5]The distribution and conservation of the tCWCH2

sequence motif in various gene classes suggests that thismotif has important biological roles The genes we iden-tified containing one or more tCWCH2 motifs areknown to be involved in various biological processesincluding chromatin remodeling (Arid2Rsc9) zinchomeostasis (Zap1ZafA) pH sensing (PacC) cell cycleregulation and transcriptional regulation (ZicGliGlisPacC Mizf Aebp2 Zap1ZafA) In addition given thatseveral gene classes remain to be functionally character-ized (including Fungl Zfp106 Twincl Clr1 Fungl-4ZFMonosiga gene DdHp) the known functional repertoireof the tCWCH2 motif is likely to increase We proposethat the tCWCH2 sequence motif is a widespread andfunctional protein structural motif and that furtherstructural and functional analyses are required to fullyunderstand its roles in biological processes

Conclusions(1) The 3D structure of tCWCH2 is highly conservedamong human ZIC3 human GLI1 and yeast Zap1(2) Current databases contained 587 tCWCH2

sequences that can be classified into 11 major geneclasses (ZicGliGlis Arid2Rsc9 PacC Mizf Aebp2Zap1ZafA Fungl Zfp106 Twincl Clr1 and Fungl-4ZF)and other minority classes The tCWCH2 motifs werefound in transcription regulatory factors chromatinremodeling factors and pHzinc homeostasis regulatingfactors(3) The tCWCH2 motif is mostly found in Opistho-

konta (metazoa fungi and choanoflagellates) andAmoebozoa (amoeba Dictyostelium discoideum)(4) The tCWCH2 motif contains three additional struc-

tural features (i) bias to hydrophobic residues in the j1and j2 positions (ii) longer linkers between the two C2H2motifs and (iii) insertion of extra sequences The struc-tural features suggest that the tCWCH2 motif is a specia-lized motif involved in inter-zinc finger interactions

MethodsProtein structural analysisProtein structures were obtained from the NCBI(National Center for Biotechnology Information) databaseas PDB format files and we used iMol httpwwwpirxcomiMol and Swiss-PdbViewer httpspdbvvital-itch

as the PDB viewer applications The latter was also usedfor superimposing multiple protein structures Magic fitoptions were selected for the superimposition of structuredata because the number of amino acid residues was dif-ferent between proteins Zif268 (PDBID 1P47) andTFIIIA (PDBID 2J7J) were used for the reference struc-ture of the classical C2H2 ZF For the distance analysis ofthe neighboring amino acid residues we used ldquoNeighborsof Selected Residue Side chainrdquo option of the Swiss-PdbViewer The software showed the list of the neighbor-ing residues with distances We dealt with the residues ifthe same residues were detected as the neighbors inmore than 10 models of ZIC3 and Zap1 PDB files thatinclude 20 models respectively

Database searchWe conducted web-based BLAST searches using PatternHit Initiated (PHI)-BLAST of NCBI databases httpblastncbinlmnihgovBlastcgi For the sequence pattern of thePHI-BLAST searches we used ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo The initialsequence pattern was deduced from the compilation ofthe Zic family sequences [1570] and subsequently theintervening sequence lengths were modified to the abovepattern until no additional sequences appeared in thesearch The seed sequences obtained by PHI-BLASTsearch were first classified into gene families based ontheir annotation and on phylogenetic tree analyses(see below) Then a TBLASTN search was performedagainst (1) all of the following databases GenBank+RefSeqNucleotides+EMBL+DDBJ+PDB excluding whole-genomeshotgun sequences and expressed sequence tags (nr optionin the NCBI TBLASTN search) and (2) whole genomeshotgun sequences of the organisms shown in Table 2The whole genome sequence TBLASTN search was per-formed using representative sequences for each genefamily (Table 3) initially on a eukaryote-wide basis andthen on a metazoa-and-fungus-wide basis For the meta-zoa-and-fungus-wide search sequences closely related tothe selected genes were searched in the following gene-species combinations Arid2Rsc9 PacC Zap1ZafAFungl Twincl Clr1 and Fungl-4ZF were searched infungi and ZicGliGlis Arid2Rsc9 Mizf Aebp2 andZfp106 were searched in metazoa The target organismsare listed in Table 2 These organisms were selected onthe basis of the following criteria (1) high genomesequence coverage (gt5times) (2) member of a major Eukar-yote taxon or considered to be in a phylogeneticallyimportant position The collected sequences were manu-ally checked for the presence of the sequence of thetCWCH2 motif In addition we performed proteinBLAST searches using representative sequences for 16gene families (Aebp2 Arid2 Clr1 Csr1 Fungl Fungl-4ZFGli Glis Mizf PacC Rsc9 Twincl ZafA Zap1 Zfp106

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 13 of 18

Table 2 Eukaryotic organisms used for the TBLASTN search

Organism Supergroup Lower taxon

Entamoeba W Amoebozoa Entamoebida

Dictyostelium W Amoebozoa Mycetozoa

Paramecium W Chromalveolata Ciliata

Tetrahymena W Chromalveolata Ciliata

Guillardia W Chromalveolata Cryptophyta

Cryptosporidium W Chromalveolata Dinoflagella

Giardia W Excavata Diplomonadia

Euglena gracilis W Excavata Euglenozoa

Trichomonas vaginalis W Excavata Parabasalidea

Monosiga W Opisthokonta Choanoflagellata

Batrachochytrium dendrobatidis MF Opisthokonta Fungi Chytridiomycota

Aspergillus W Opisthokonta Fungi Dikarya Ascomycota

Aspergillus fumigatus MF Opisthokonta Fungi Dikarya Ascomycota

Candida W Opisthokonta Fungi Dikarya Ascomycota

Candida albicans MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans W Opisthokonta Fungi Dikarya Ascomycota

Neurospora W Opisthokonta Fungi Dikarya Ascomycota

Neurospora crassa MF Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces cerevisiae MF Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces pombe MF Opisthokonta Fungi Dikarya Ascomycota

Yarrowia lipolytica MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans MF Opisthokonta Fungi Dikarya Basidiomycota

Ustilago maydis MF Opisthokonta Fungi Dikarya Basidiomycota

Antonospora locustae MF Opisthokonta Fungi Microsporidia

Encephalitozoon W Opisthokonta Fungi Microsporidia

Encephalitozoon cuniculi MF Opisthokonta Fungi Microsporidia

Rhizopus oryzae MF Opisthokonta Fungi Mucoromycotina

Monosiga brevicollis MF Opisthokonta Metazoa Choanoflagellida

Homo W Opisthokonta Metazoa Chordata

Nematostella W Opisthokonta Metazoa Cnidaria

Nematostella vectensis MF Opisthokonta Metazoa Cnidaria Anthozoa

Platynereis dumerilii MF Opisthokonta Metazoa Cnidaria Anthozoa

Hydra magnipapillata MF Opisthokonta Metazoa Cnidaria Hydrozoa

Hydra vulgaris MF Opisthokonta Metazoa Cnidaria Hydrozoa

Lottia gigantea MF Opisthokonta Metazoa Mollusca Gastropoda

Trichoplax W Opisthokonta Metazoa Placozoa

Trichoplax adhaerens MF Opisthokonta Metazoa Placozoa Trichoplax

Schmidtea mediterranea MF Opisthokonta Metazoa Platyhelminthes

Chlamydomonas W Plantae Chlorophyta

Arabidopsis W Plantae Tracheophyta

Oryza sativa W Plantae Tracheophyta

Bigelowiella W Rhizaria Cercozoa

Species name and taxa are indicated W = Eukaryote-wide search MF = metazoa-and-fungus-wide search

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 14 of 18

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 12: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

ZFs utilizing in vitro-evolving protein-to-protein interac-tions [16] it was found that the hydrophobic residuesin zif268 that interacted between the artificial proteinsand the zif268 C2H2 ZF were similar to those involvedin the GLI inter-ZF interaction Superimpositionrevealed that the position corresponding to the trypto-phan in tCWCH2 is occupied by a valine suggestingthat either intramolecular or intermolecular protein-to-protein interactions mediated by hydrophobic residuesare generally utilized for C2H2 ZFs However the strongconservation of the tCWCH2 and accompanying struc-tural features suggest that tCWCH2 is a highly specia-lized C2H2 ZF for inter-ZF interactions between twoadjacent ZFs

Evolution of the tCWCH2 motifWe hypothesize that the original tCWCH2 motifs weregenerated from classical C2H2 ZFs (Figure 8) that occurwidely in eukaryotic organisms including Plantae Exca-vata Chromalveolata and Rhizaria tCWCH2 motifswere found in the vicinity of C2H2 ZF in severaltCWCH2 gene classes It is proposed that the tCWCH2sequence appeared in the common ancestor of Opistho-konta or Unikonta (OpisthokontaAmoebozoa) Theacquisition of the tryptophans and additional structuralfeatures of tCWCH2 may have occurred concurrently inthe course of evolution Whether the evolution oftCWCH2 occurred once or multiple times duringOpisthokontaAmoebozoa evolution remains unclearHowever its short sequence and some isolated distribu-tion in the plant supergroup and metazoan gene familiesmay favor the multiple-origins hypothesis Although thelineage relationship among the tCWCH2 classes remainsunclear we can presume the presence of the followingtCWCH2 ancestral genes an Arid2Rsc9 commonancestor which may have existed before the diversifica-tion of metazoa and fungi a ZicGliGlis commonancestor an Aebp2 ancestor and a Mizf ancestor in theearly metazoa a Fungl ancestor a PacC ancestor aFungl-4ZF ancestor and a Zap1ZafA ancestor in theearly fungi a Zfp106 ancestor in early bilaterians aTwincl ancestor in the founder of Aspergillus and itsclose siblings and a Clr1 ancestor in the founder ofDikarya althogh most Dikarya may have lost the geneOnce the tCWCH2 structures were established theymay have been more strongly conserved in these classes

Function and role of tCWCH2There are several cases that directly indicate the func-tional importance of the tCWCH2 motif in living organ-isms In the case of PacC mutation of the tryptophan oftCWCH2 induces a loss of its DNA binding activity[3132] In case of human ZIC3 the tryptophan muta-tion is proposed to be associated with the occurrence of

a familial congenital heart defect and biochemical analy-sis revealed that transcriptional activity and protein sta-bility are decreased in the mutant protein [1314] Inyeast Zap1 interaction between the two ZF where thetCWCH2 motif is located is required for the high-affi-nity binding of the zinc ion [45] Also it has beenshown that the inter-ZF interaction is influenced by theconcentration of zinc ion [46] suggesting that thetCWCH2 motif is involved in zinc-sensing in yeastApart from the instances referred to above the func-

tional significance of the tCWCH2 domain remainsunknown We propose that tCWCH2 may play rolesother than the direct DNA binding that is often exertedby classical C2H2 because the linker lengths oftCWCH2 are more highly divergent than those of thecanonical C2H2 ZFs in which linker length is a criticalparameter for DNA-C2H2 ZF interactions [60-63]Furthermore the structure of GLI ZIC3 and Zap1tCWCH2 domains in which the two adjacent ZFs form asingle globular structure [71347] is clearly differentfrom the structure of the DNA-binding C2H2 ZF domainwhere each globular ZF unit wraps around the majorgroove of the DNA [7-1164] (PDBID GLI 2GLI YY11UBD WT1 2RPT Zif268 1MEY 1P47 TFIIIA 1TF3)In the case of GLI tCWCH2 which is the only tCWCH2with a known DNA-binding 3D structure the N-terminalZF of GLI1 tCWCH2 does not contact DNA [7]Assuming that the tCWCH2 domain has a function

other than DNA binding what roles could tCWCH2play Firstly our domain structure analysis implies thattCWCH2 likely regulates its neighboring domains suchas canonical C2H2 ZF WD40 and LIL ZicGliGlisPacC Mizf Aebp2 Zap1ZafA Fungl Clr1 and Fungl-4ZF possess canonical ZF(s) in the C-terminal flankingregion of the tCWCH2 In the case of PacC analyses ofCWCH2 motif-containing tryptophan mutants suggeststhat DNA binding mediated by ZF2 and ZF3 may beinfluenced by tCWCH2 motifs in ZF1 and ZF2 [32] Itis possible that tCWCH2 motifs in the other membersof this group act similarly in the regulation of neighbor-ing canonical C2H2 domains In transcription factorsDNA binding domains such as the canonical C2H2 ZFcan define a target sequence and fix the protein position(angle distance direction) with respect to the DNAThe tCWCH2 motif could provide a lsquohingersquo or modula-tory junction that controls the relative positioning ofother functional domains Secondly tCWCH2 motifsmay provide a capping structure that isolates the struc-tural influence of N-terminal flanking region In thecase of ZIC3 mutations in its tCWCH2 motif increasethe random coil contents more than the zinc-free state[13] A unique terminal structure in a domain composedof repeating structural motifs is also seen in other con-served domains [65-67] In the case of small leucine-rich

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 12 of 18

repeats and ankyrin repeats N-capping structures areresponsible for protein stability [6667] ThirdlytCWCH2 binding to other molecules is possible and isobserved for many conserved domains Transcriptionalregulator and chromatin remodeling proteins form com-plexes with many different proteins in order to be func-tionally active [6869] Therefore the tCWCH2 ZF maynot be associated with DNA binding but may serve inprotein recognition similar to many C2H2 ZFs [5]The distribution and conservation of the tCWCH2

sequence motif in various gene classes suggests that thismotif has important biological roles The genes we iden-tified containing one or more tCWCH2 motifs areknown to be involved in various biological processesincluding chromatin remodeling (Arid2Rsc9) zinchomeostasis (Zap1ZafA) pH sensing (PacC) cell cycleregulation and transcriptional regulation (ZicGliGlisPacC Mizf Aebp2 Zap1ZafA) In addition given thatseveral gene classes remain to be functionally character-ized (including Fungl Zfp106 Twincl Clr1 Fungl-4ZFMonosiga gene DdHp) the known functional repertoireof the tCWCH2 motif is likely to increase We proposethat the tCWCH2 sequence motif is a widespread andfunctional protein structural motif and that furtherstructural and functional analyses are required to fullyunderstand its roles in biological processes

Conclusions(1) The 3D structure of tCWCH2 is highly conservedamong human ZIC3 human GLI1 and yeast Zap1(2) Current databases contained 587 tCWCH2

sequences that can be classified into 11 major geneclasses (ZicGliGlis Arid2Rsc9 PacC Mizf Aebp2Zap1ZafA Fungl Zfp106 Twincl Clr1 and Fungl-4ZF)and other minority classes The tCWCH2 motifs werefound in transcription regulatory factors chromatinremodeling factors and pHzinc homeostasis regulatingfactors(3) The tCWCH2 motif is mostly found in Opistho-

konta (metazoa fungi and choanoflagellates) andAmoebozoa (amoeba Dictyostelium discoideum)(4) The tCWCH2 motif contains three additional struc-

tural features (i) bias to hydrophobic residues in the j1and j2 positions (ii) longer linkers between the two C2H2motifs and (iii) insertion of extra sequences The struc-tural features suggest that the tCWCH2 motif is a specia-lized motif involved in inter-zinc finger interactions

MethodsProtein structural analysisProtein structures were obtained from the NCBI(National Center for Biotechnology Information) databaseas PDB format files and we used iMol httpwwwpirxcomiMol and Swiss-PdbViewer httpspdbvvital-itch

as the PDB viewer applications The latter was also usedfor superimposing multiple protein structures Magic fitoptions were selected for the superimposition of structuredata because the number of amino acid residues was dif-ferent between proteins Zif268 (PDBID 1P47) andTFIIIA (PDBID 2J7J) were used for the reference struc-ture of the classical C2H2 ZF For the distance analysis ofthe neighboring amino acid residues we used ldquoNeighborsof Selected Residue Side chainrdquo option of the Swiss-PdbViewer The software showed the list of the neighbor-ing residues with distances We dealt with the residues ifthe same residues were detected as the neighbors inmore than 10 models of ZIC3 and Zap1 PDB files thatinclude 20 models respectively

Database searchWe conducted web-based BLAST searches using PatternHit Initiated (PHI)-BLAST of NCBI databases httpblastncbinlmnihgovBlastcgi For the sequence pattern of thePHI-BLAST searches we used ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo The initialsequence pattern was deduced from the compilation ofthe Zic family sequences [1570] and subsequently theintervening sequence lengths were modified to the abovepattern until no additional sequences appeared in thesearch The seed sequences obtained by PHI-BLASTsearch were first classified into gene families based ontheir annotation and on phylogenetic tree analyses(see below) Then a TBLASTN search was performedagainst (1) all of the following databases GenBank+RefSeqNucleotides+EMBL+DDBJ+PDB excluding whole-genomeshotgun sequences and expressed sequence tags (nr optionin the NCBI TBLASTN search) and (2) whole genomeshotgun sequences of the organisms shown in Table 2The whole genome sequence TBLASTN search was per-formed using representative sequences for each genefamily (Table 3) initially on a eukaryote-wide basis andthen on a metazoa-and-fungus-wide basis For the meta-zoa-and-fungus-wide search sequences closely related tothe selected genes were searched in the following gene-species combinations Arid2Rsc9 PacC Zap1ZafAFungl Twincl Clr1 and Fungl-4ZF were searched infungi and ZicGliGlis Arid2Rsc9 Mizf Aebp2 andZfp106 were searched in metazoa The target organismsare listed in Table 2 These organisms were selected onthe basis of the following criteria (1) high genomesequence coverage (gt5times) (2) member of a major Eukar-yote taxon or considered to be in a phylogeneticallyimportant position The collected sequences were manu-ally checked for the presence of the sequence of thetCWCH2 motif In addition we performed proteinBLAST searches using representative sequences for 16gene families (Aebp2 Arid2 Clr1 Csr1 Fungl Fungl-4ZFGli Glis Mizf PacC Rsc9 Twincl ZafA Zap1 Zfp106

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 13 of 18

Table 2 Eukaryotic organisms used for the TBLASTN search

Organism Supergroup Lower taxon

Entamoeba W Amoebozoa Entamoebida

Dictyostelium W Amoebozoa Mycetozoa

Paramecium W Chromalveolata Ciliata

Tetrahymena W Chromalveolata Ciliata

Guillardia W Chromalveolata Cryptophyta

Cryptosporidium W Chromalveolata Dinoflagella

Giardia W Excavata Diplomonadia

Euglena gracilis W Excavata Euglenozoa

Trichomonas vaginalis W Excavata Parabasalidea

Monosiga W Opisthokonta Choanoflagellata

Batrachochytrium dendrobatidis MF Opisthokonta Fungi Chytridiomycota

Aspergillus W Opisthokonta Fungi Dikarya Ascomycota

Aspergillus fumigatus MF Opisthokonta Fungi Dikarya Ascomycota

Candida W Opisthokonta Fungi Dikarya Ascomycota

Candida albicans MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans W Opisthokonta Fungi Dikarya Ascomycota

Neurospora W Opisthokonta Fungi Dikarya Ascomycota

Neurospora crassa MF Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces cerevisiae MF Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces pombe MF Opisthokonta Fungi Dikarya Ascomycota

Yarrowia lipolytica MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans MF Opisthokonta Fungi Dikarya Basidiomycota

Ustilago maydis MF Opisthokonta Fungi Dikarya Basidiomycota

Antonospora locustae MF Opisthokonta Fungi Microsporidia

Encephalitozoon W Opisthokonta Fungi Microsporidia

Encephalitozoon cuniculi MF Opisthokonta Fungi Microsporidia

Rhizopus oryzae MF Opisthokonta Fungi Mucoromycotina

Monosiga brevicollis MF Opisthokonta Metazoa Choanoflagellida

Homo W Opisthokonta Metazoa Chordata

Nematostella W Opisthokonta Metazoa Cnidaria

Nematostella vectensis MF Opisthokonta Metazoa Cnidaria Anthozoa

Platynereis dumerilii MF Opisthokonta Metazoa Cnidaria Anthozoa

Hydra magnipapillata MF Opisthokonta Metazoa Cnidaria Hydrozoa

Hydra vulgaris MF Opisthokonta Metazoa Cnidaria Hydrozoa

Lottia gigantea MF Opisthokonta Metazoa Mollusca Gastropoda

Trichoplax W Opisthokonta Metazoa Placozoa

Trichoplax adhaerens MF Opisthokonta Metazoa Placozoa Trichoplax

Schmidtea mediterranea MF Opisthokonta Metazoa Platyhelminthes

Chlamydomonas W Plantae Chlorophyta

Arabidopsis W Plantae Tracheophyta

Oryza sativa W Plantae Tracheophyta

Bigelowiella W Rhizaria Cercozoa

Species name and taxa are indicated W = Eukaryote-wide search MF = metazoa-and-fungus-wide search

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 14 of 18

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 13: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

repeats and ankyrin repeats N-capping structures areresponsible for protein stability [6667] ThirdlytCWCH2 binding to other molecules is possible and isobserved for many conserved domains Transcriptionalregulator and chromatin remodeling proteins form com-plexes with many different proteins in order to be func-tionally active [6869] Therefore the tCWCH2 ZF maynot be associated with DNA binding but may serve inprotein recognition similar to many C2H2 ZFs [5]The distribution and conservation of the tCWCH2

sequence motif in various gene classes suggests that thismotif has important biological roles The genes we iden-tified containing one or more tCWCH2 motifs areknown to be involved in various biological processesincluding chromatin remodeling (Arid2Rsc9) zinchomeostasis (Zap1ZafA) pH sensing (PacC) cell cycleregulation and transcriptional regulation (ZicGliGlisPacC Mizf Aebp2 Zap1ZafA) In addition given thatseveral gene classes remain to be functionally character-ized (including Fungl Zfp106 Twincl Clr1 Fungl-4ZFMonosiga gene DdHp) the known functional repertoireof the tCWCH2 motif is likely to increase We proposethat the tCWCH2 sequence motif is a widespread andfunctional protein structural motif and that furtherstructural and functional analyses are required to fullyunderstand its roles in biological processes

Conclusions(1) The 3D structure of tCWCH2 is highly conservedamong human ZIC3 human GLI1 and yeast Zap1(2) Current databases contained 587 tCWCH2

sequences that can be classified into 11 major geneclasses (ZicGliGlis Arid2Rsc9 PacC Mizf Aebp2Zap1ZafA Fungl Zfp106 Twincl Clr1 and Fungl-4ZF)and other minority classes The tCWCH2 motifs werefound in transcription regulatory factors chromatinremodeling factors and pHzinc homeostasis regulatingfactors(3) The tCWCH2 motif is mostly found in Opistho-

konta (metazoa fungi and choanoflagellates) andAmoebozoa (amoeba Dictyostelium discoideum)(4) The tCWCH2 motif contains three additional struc-

tural features (i) bias to hydrophobic residues in the j1and j2 positions (ii) longer linkers between the two C2H2motifs and (iii) insertion of extra sequences The struc-tural features suggest that the tCWCH2 motif is a specia-lized motif involved in inter-zinc finger interactions

MethodsProtein structural analysisProtein structures were obtained from the NCBI(National Center for Biotechnology Information) databaseas PDB format files and we used iMol httpwwwpirxcomiMol and Swiss-PdbViewer httpspdbvvital-itch

as the PDB viewer applications The latter was also usedfor superimposing multiple protein structures Magic fitoptions were selected for the superimposition of structuredata because the number of amino acid residues was dif-ferent between proteins Zif268 (PDBID 1P47) andTFIIIA (PDBID 2J7J) were used for the reference struc-ture of the classical C2H2 ZF For the distance analysis ofthe neighboring amino acid residues we used ldquoNeighborsof Selected Residue Side chainrdquo option of the Swiss-PdbViewer The software showed the list of the neighbor-ing residues with distances We dealt with the residues ifthe same residues were detected as the neighbors inmore than 10 models of ZIC3 and Zap1 PDB files thatinclude 20 models respectively

Database searchWe conducted web-based BLAST searches using PatternHit Initiated (PHI)-BLAST of NCBI databases httpblastncbinlmnihgovBlastcgi For the sequence pattern of thePHI-BLAST searches we used ldquoxCxWx(235)Cx(515)Hx(25)Hx(525)CxWx(13)Cx(517)Hx(25)Hxrdquo The initialsequence pattern was deduced from the compilation ofthe Zic family sequences [1570] and subsequently theintervening sequence lengths were modified to the abovepattern until no additional sequences appeared in thesearch The seed sequences obtained by PHI-BLASTsearch were first classified into gene families based ontheir annotation and on phylogenetic tree analyses(see below) Then a TBLASTN search was performedagainst (1) all of the following databases GenBank+RefSeqNucleotides+EMBL+DDBJ+PDB excluding whole-genomeshotgun sequences and expressed sequence tags (nr optionin the NCBI TBLASTN search) and (2) whole genomeshotgun sequences of the organisms shown in Table 2The whole genome sequence TBLASTN search was per-formed using representative sequences for each genefamily (Table 3) initially on a eukaryote-wide basis andthen on a metazoa-and-fungus-wide basis For the meta-zoa-and-fungus-wide search sequences closely related tothe selected genes were searched in the following gene-species combinations Arid2Rsc9 PacC Zap1ZafAFungl Twincl Clr1 and Fungl-4ZF were searched infungi and ZicGliGlis Arid2Rsc9 Mizf Aebp2 andZfp106 were searched in metazoa The target organismsare listed in Table 2 These organisms were selected onthe basis of the following criteria (1) high genomesequence coverage (gt5times) (2) member of a major Eukar-yote taxon or considered to be in a phylogeneticallyimportant position The collected sequences were manu-ally checked for the presence of the sequence of thetCWCH2 motif In addition we performed proteinBLAST searches using representative sequences for 16gene families (Aebp2 Arid2 Clr1 Csr1 Fungl Fungl-4ZFGli Glis Mizf PacC Rsc9 Twincl ZafA Zap1 Zfp106

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 13 of 18

Table 2 Eukaryotic organisms used for the TBLASTN search

Organism Supergroup Lower taxon

Entamoeba W Amoebozoa Entamoebida

Dictyostelium W Amoebozoa Mycetozoa

Paramecium W Chromalveolata Ciliata

Tetrahymena W Chromalveolata Ciliata

Guillardia W Chromalveolata Cryptophyta

Cryptosporidium W Chromalveolata Dinoflagella

Giardia W Excavata Diplomonadia

Euglena gracilis W Excavata Euglenozoa

Trichomonas vaginalis W Excavata Parabasalidea

Monosiga W Opisthokonta Choanoflagellata

Batrachochytrium dendrobatidis MF Opisthokonta Fungi Chytridiomycota

Aspergillus W Opisthokonta Fungi Dikarya Ascomycota

Aspergillus fumigatus MF Opisthokonta Fungi Dikarya Ascomycota

Candida W Opisthokonta Fungi Dikarya Ascomycota

Candida albicans MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans W Opisthokonta Fungi Dikarya Ascomycota

Neurospora W Opisthokonta Fungi Dikarya Ascomycota

Neurospora crassa MF Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces cerevisiae MF Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces pombe MF Opisthokonta Fungi Dikarya Ascomycota

Yarrowia lipolytica MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans MF Opisthokonta Fungi Dikarya Basidiomycota

Ustilago maydis MF Opisthokonta Fungi Dikarya Basidiomycota

Antonospora locustae MF Opisthokonta Fungi Microsporidia

Encephalitozoon W Opisthokonta Fungi Microsporidia

Encephalitozoon cuniculi MF Opisthokonta Fungi Microsporidia

Rhizopus oryzae MF Opisthokonta Fungi Mucoromycotina

Monosiga brevicollis MF Opisthokonta Metazoa Choanoflagellida

Homo W Opisthokonta Metazoa Chordata

Nematostella W Opisthokonta Metazoa Cnidaria

Nematostella vectensis MF Opisthokonta Metazoa Cnidaria Anthozoa

Platynereis dumerilii MF Opisthokonta Metazoa Cnidaria Anthozoa

Hydra magnipapillata MF Opisthokonta Metazoa Cnidaria Hydrozoa

Hydra vulgaris MF Opisthokonta Metazoa Cnidaria Hydrozoa

Lottia gigantea MF Opisthokonta Metazoa Mollusca Gastropoda

Trichoplax W Opisthokonta Metazoa Placozoa

Trichoplax adhaerens MF Opisthokonta Metazoa Placozoa Trichoplax

Schmidtea mediterranea MF Opisthokonta Metazoa Platyhelminthes

Chlamydomonas W Plantae Chlorophyta

Arabidopsis W Plantae Tracheophyta

Oryza sativa W Plantae Tracheophyta

Bigelowiella W Rhizaria Cercozoa

Species name and taxa are indicated W = Eukaryote-wide search MF = metazoa-and-fungus-wide search

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 14 of 18

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 14: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

Table 2 Eukaryotic organisms used for the TBLASTN search

Organism Supergroup Lower taxon

Entamoeba W Amoebozoa Entamoebida

Dictyostelium W Amoebozoa Mycetozoa

Paramecium W Chromalveolata Ciliata

Tetrahymena W Chromalveolata Ciliata

Guillardia W Chromalveolata Cryptophyta

Cryptosporidium W Chromalveolata Dinoflagella

Giardia W Excavata Diplomonadia

Euglena gracilis W Excavata Euglenozoa

Trichomonas vaginalis W Excavata Parabasalidea

Monosiga W Opisthokonta Choanoflagellata

Batrachochytrium dendrobatidis MF Opisthokonta Fungi Chytridiomycota

Aspergillus W Opisthokonta Fungi Dikarya Ascomycota

Aspergillus fumigatus MF Opisthokonta Fungi Dikarya Ascomycota

Candida W Opisthokonta Fungi Dikarya Ascomycota

Candida albicans MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans W Opisthokonta Fungi Dikarya Ascomycota

Neurospora W Opisthokonta Fungi Dikarya Ascomycota

Neurospora crassa MF Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Saccharomyces cerevisiae MF Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces W Opisthokonta Fungi Dikarya Ascomycota

Schizosaccharomyces pombe MF Opisthokonta Fungi Dikarya Ascomycota

Yarrowia lipolytica MF Opisthokonta Fungi Dikarya Ascomycota

Cryptococcus neoformans MF Opisthokonta Fungi Dikarya Basidiomycota

Ustilago maydis MF Opisthokonta Fungi Dikarya Basidiomycota

Antonospora locustae MF Opisthokonta Fungi Microsporidia

Encephalitozoon W Opisthokonta Fungi Microsporidia

Encephalitozoon cuniculi MF Opisthokonta Fungi Microsporidia

Rhizopus oryzae MF Opisthokonta Fungi Mucoromycotina

Monosiga brevicollis MF Opisthokonta Metazoa Choanoflagellida

Homo W Opisthokonta Metazoa Chordata

Nematostella W Opisthokonta Metazoa Cnidaria

Nematostella vectensis MF Opisthokonta Metazoa Cnidaria Anthozoa

Platynereis dumerilii MF Opisthokonta Metazoa Cnidaria Anthozoa

Hydra magnipapillata MF Opisthokonta Metazoa Cnidaria Hydrozoa

Hydra vulgaris MF Opisthokonta Metazoa Cnidaria Hydrozoa

Lottia gigantea MF Opisthokonta Metazoa Mollusca Gastropoda

Trichoplax W Opisthokonta Metazoa Placozoa

Trichoplax adhaerens MF Opisthokonta Metazoa Placozoa Trichoplax

Schmidtea mediterranea MF Opisthokonta Metazoa Platyhelminthes

Chlamydomonas W Plantae Chlorophyta

Arabidopsis W Plantae Tracheophyta

Oryza sativa W Plantae Tracheophyta

Bigelowiella W Rhizaria Cercozoa

Species name and taxa are indicated W = Eukaryote-wide search MF = metazoa-and-fungus-wide search

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 14 of 18

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 15: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

Zic) irrespective of the presence of the tCWCH2 motifThese searches revealed the extent of tCWCH2 motif con-servation in each gene family Redundant sequencesderived from a single gene in a single species wereremoved from the collected sequences except for onerepresentative sequence We obtained 587 non-redundantsequences containing the tCWCH2 sequence motifFor the re-evaluation of the initial PHI-BLAST pat-

tern we generated a tCWCH2 consensus sequenceof ldquoLVCKWDGCSEKLFDSPEELVDHVCEDHVGTQLEY-TCLWKGCDRFPFKSRYKLIRHIRSHTGEKPrdquo This

sequence was constructed using the following steps (1)alignment of tCWCH2 by ClustalX software [71] (2)elimination of positions where the frequency of theinserted space ldquo-rdquo was gt50 (3) selection of the aminoacid residues most frequently appearing in each positionWe performed PHI-BLAST again using four patterns(Additional file 3)Conserved protein domains in the collected sequences

were identified by searching the Conserved DomainDatabase httpwwwncbinlmnihgovStructurecddcddshtml that included domains imported from Pfamand SMART databases

Sequence analysisAmino acid sequences of the tCWCH2 motif and oneC2H2 ZF in the carboxy flanking were aligned usingClustalX software [71] with default parameter settingsThe final adjustment of the alignment was performedmanually For the phylogenic tree analyses MEGA4Neighbor-Joining tree (NJ JTT model alpha value wasdetermined by Tree-Puzzle [72]) httpwwwmegasoft-warenet [7374] RAxML Maximal Likelihood (MLWAG model) tree httpphylobenchvital-itchraxml-bb [75] and MrBayes312 Bayesian Inference (BIWAG model) tree [76] were used To generate theconsensus sequence figure the aligned sequence wassubjected to a web-based analysis WebLogo httpweblogoberkeleyedu[77] The PDOC00028 alignmentin the Prosite database (12328 sequences httpauexpasyorgprosite) was used for the reference sequencerepresenting the general C2H2 domain This alignmentwas entered directly into WebLogo and subjected to lin-ker length analysis We aligned all of the sequences andthen extracted the alignment of each taxon to generatethe hierarchically ordered alignment of the consensussequences The linker sequence length between the twoC2H2 ZF motifs was defined by the number of aminoacid residues between the C-terminal histidine residueof the first ZF and the N-terminal cysteine residue ofthe second ZFThe PDOC00028 sequence alignment contains the

gene name and species together with the start and endposition number (eg GLI1_HUMAN301-330) We cal-culated the linker length by subtracting the end-positionnumber from the start-position number for two ZFsWe excluded linker lengths that were too long(gt41) because the average length of general C2H2sequences (PDOC00028) was 2795 (n = 12328 standarddeviation = 139 max = 42 min = 10) and the maxi-mum linker length of tCWCH2 was 37

List of abbreviations3D three-dimensional tCWCH2 tandem CWCH2 ZFzinc finger ZFD zinc finger domain

Table 3 Query sequences for the TBLASTN search

Gene ID Organism Supergroup

AEBP2 EAW96398 Homo sapiens OpisthokontaMetazoa

ARID2 BAB13383 Homo sapiens OpisthokontaMetazoa

Clr1 NP_596236 Schizosaccharomycespombe

OpisthokontaFungi

DdHp XP_646336 Dictyosteliumdiscoideum

Amoebozoa

Fungl XP_752394 Aspergillus fumigatus OpisthokontaFungi

Fungl XM_572828 Cryptococcusneoformans

OpisthokontaFungi

GLI1 1405326A Homo sapiens OpisthokontaMetazoa

GLIS2 NP_115964 Homo sapiens OpisthokontaMetazoa

MIZF NP_945322 Homo sapiens OpisthokontaMetazoa

PacC Q00203 Aspergillus niger OpisthokontaFungi

REF6 XP_001771543 Physcomitrella patens Plantae

Rsc9 NP_596197 Schizosaccharomycespombe

OpisthokontaFungi

Twincl XP_751009 Aspergillus fumigatus OpisthokontaFungi

Twincl XP_370514 Magnaporthe grisea OpisthokontaFungi

Unclassified XP_001876539 Laccaria bicolor OpisthokontaFungi

ZafA ABJ98717 Aspergillus fumigatus OpisthokontaFungi

Zap1ZafA AAD26467 Candida albicans OpisthokontaFungi

Zap1ZafA XP_001797205 Phaeosphaerianodorum

OpisthokontaFungi

Zap1ZafA XP_001930819 Pyrenophora tritici-repentis

OpisthokontaFungi

Zap1 NP_012479 Saccharomycescerevisiae

OpisthokontaFungi

ZFP106 NP_071918 Homo sapiens OpisthokontaMetazoa

ZIC3 NP_003404 Homo sapiens OpisthokontaMetazoa

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 15 of 18

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 16: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

Additional file 1 Neighboring residues of the tryptophans intCWCH2 3D structure (A) Distances of the tCWCH2-forming residuesfrom the tCWCH2 ZF1 tryptophan (left) and the tCWCH2 ZF2 tryptophan(right) Black and red letters indicate residues in the ZF1 and the ZF2respectively Gothic letters zinc chelating residue j1 and j2 in Figure5 and 6 Note that both tryptophan residues were close to the residuesin the other C2H2 ZF unit (B) Stereo view of the ZIC3 GLI1 and Zap1tryptophan side chain adjacent residues within 5 Aring Tryptophan residueszinc chelating histidine j1 and j2 residues are labeled For ZIC3 andZap1 20 NMR models in wire model are superimposed For GLI1cylinder model are shown White carbon red oxygen blue nitrogenyellow sulfurClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S1PDF ]

Additional file 2 List of tandem CWCH2-containing genes Gene IDof NCBI or Ensembl httpwwwensemblorg or JGI Lottia giganteagenome httpgenomejgi-psforgLotgi1Lotgi1homehtml database andorganism names that we collected are listed and categorized accordingto gene group The EDV29271 Trichoplax adhaerens Zic amino acidsequence was manually added from the genome database and indicatesas ldquoEDV29271+editrdquo The inset in right bottom indicates the numbers ofnon-redundant sequences in each classClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S2PDF ]

Additional file 3 PHI-BLAST pattern validity Percentage occurrenceof the PHI-BLAST pattern in the tCWCH2 library (A) Most frequent aminoacid sequence for tCWCH2 in the PHI-BLAST result A loose searchpattern increases the number of BLAST hits and increases thepercentages of recovery of library (B) The tCWCH2 sequence motifcollection contains 551 sequences from the NCBI protein database and36 sequences from the NCBI nucleotide or other databasesClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S3PDF ]

Additional file 4 The sequence of the conserved domains of Arid2and Rsc9 The ARID domain (A) and the ZF domain (B) are conserved inArid2 and Rsc9 Core amino acid residues of the ARID domain areindicated by asterisks () in (A) The structure of the tCWCH2 motifs andflanking sequence are shown in (B) Conserved sequence [IxL (ST) AxL (IV) L (KR) N (IL) x(KR)] are indicated by asterisks () Hs Homo sapiensDr Danio rerio Bm Brugia malayi Dm Drosophila melanogaster NcNeurospora crassa Spo Schizosaccharomyces pombeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S4PDF ]

Additional file 5 Zinc finger domains of Mizf We found novel ZFsin Mizf A pseudo ZF structure was found between ZF4 and ZF5(indicated as ZF) Hs Homo sapiens Dr Danio rerio Ta Trichoplaxadhaerens Ci Ciona intestinalis Dm Drosophila melanogaster CeCaenorhabditis elegansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S5PDF ]

Additional file 6 Sequence alignment of Zap1 and ZafA Zinchomeostasis genes Zap1 and ZafA have structural similarity Conservationof ZF8 was weak Um Ustilago maydis Cn Cryptococcus neoformans AfuAspergillus fumigatus Mg Magnaporthe grisea Yl Yarrowia lipolytica CaCandida albicans Sc Saccharomyces cerevisiaeClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S6PDF ]

Additional file 7 Sequence alignment of zinc finger domain ofFungl human GLI3 and Fungl-4ZF genes We propose the commongene name Fungl Fungus Gli like Afu Aspergillus fumigatus NcNeurospora crassa Yl Yarrowia lipolytica Mgl Malasszia globosa UmUstilago maydis Ro Rhizopus oryzae Ecu Encephalitozoon cuniculi CaCandida albicans Lbi Laccaria bicolor Eb Enterocytozoon bieneusi AloAntonospora locustae Cn Cryptococcus neoformans Hs Homo sapiensBde Batrachochytrium dendrobatidisClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S7PDF ]

Additional file 8 Sequence alignment of Twincl zinc finger domainBased on its structural features we named this gene family Twincl(Tandem CWCH2 protein C-terminal) Afu Aspergillus fumigatus NfNeosartorya fischeri Cim Coccidioides immitis Pn Phaeosphaeria nodorumGz Gibberella zeae Mgr Magnaporthe grisea Nc Neurospora crassaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S8PDF ]

Additional file 9 Sequence alignment of Clr1 zinc finger domainSpo Schizosaccharomyces pombe Sj Schizosaccharomyces japonicus CnCryptococcus neoformansClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S9PDF ]

Additional file 10 Sequence alignment of Monosiga Dictyosteliumand plant gene (A) Zinc finger domain alignment of Dictyosteliumdiscoideum hypothetical protein (DdHp) and Monosiga brevicollis tCWCH2DdHp2 contains two tCWCH2 domains ZF12 (ZF1-ZF2) and ZF34 (ZF3-ZF4) (B) Plant REF6 Physcomitrella patens (XP_001771543) gene hastCWCH2 sequence It has strong homology to Arabidopsis REF6(NP_680116) Oryza sativa gene (NP_001045137) and Vitis vinifera(CAO64178) gene but the tryptophan residue is not conserved inArabidopsis and other plants An alignment of zinc finger domain ofthese genes is shown Amino acid identity between sequences is shownby a dot ()Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S10PDF ]

Additional file 11 Sequence alignment for phylogenic tree analysisConserved cysteine and histidine residues are indicated with an asterisk() Afr Artemia franciscana Afu Aspergillus fumigatus Aga Anophelesgambiae Alo Antonospora locustae Ape Asterina pectinifera BdeBatrachochytrium dendrobatidis Bfl Branchiostoma floridae Cin Cionaintestinalis Cn Cryptococcus neoformans Co Corbicula sp Da Dicyemaacuticephalum Dd Dictyostelium discoideum Dj Dugesia japonica DmDrosophila melanogaster Hr Halocynthia roretzi Hs Homo sapiens HvHydra vulgaris Lbl Loligo bleekeri Mms Mus musculus Nf Neosartoryafischeri Nve Nematostella vectensis Oo Octopus ocellatus Pi Pandinusimperator Ro Rhizopus oryzae Sc Saccharomyces cerevisiae SmSchistosoma mansoni Sp Strongylocentrotus purpuratus Sso Spisulasolidissima Ssu Scolionema suvaense Ta Trichoplax adhaerens Tt Tubifextubifex Um Ustilago maydis Xl Xenopus laevis Yl Yarrowia lipolyticaClick here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S11PDF ]

Additional file 12 BI tree of tCWCH2+1ZF BI tree analysis wasperformed on 8 times 106 generations using WAG model Average standarddeviation of split frequencies was lower than 0001 at the end of theanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S12PDF ]

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 16 of 18

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 17: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

Additional file 13 ML tree of tCWCH2+1ZF ML tree analysis wasperformed using WAG model with ldquoempirical base frequenciesrdquoldquomaximum likelihood searchrdquo and ldquoestimate proportion of invariablesitesrdquo options of RAxML [75] 100 replicates were set for the bootstrapanalysis Abbreviations are the same as those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S13PDF ]

Additional file 14 NJ tree of tCWCH2+1ZF NJ tree analysis wasperformed using JTT model and ldquopair wise deletionrdquo ldquoRate among site =different (gamma distribution = 062)rdquo options of MEGA4 (httpwwwmegasoftwarenet [7374]) The gamma distributions alpha parameterwas calculated by ldquoTree-Puzzlerdquo (httpwwwtree-puzzlede [72]) 1000replicates were set for the bootstrap analysis Abbreviations are the sameas those in Additional file 11Click here for file[ httpwwwbiomedcentralcomcontentsupplementary1471-2148-10-53-S14PDF ]

AcknowledgementsWe acknowledge Dr Hideko Urushihara (Tsukuba University) for valuablediscussions concerning the Dictyostelium genes This study was funded bythe RIKEN Brain Science Institute and Grant-in-Aid for ChallengingExploratory Research (21657034) from Japan Society for the Promotion ofScience

Author details1Laboratory for Behavioral and Developmental Disorders RIKEN Brain ScienceInstitute Wako-shi Saitama 351-0198 Japan 2Saitama University BrainScience Institute Saitama-shi Saitama 338-8570 Japan

Authorsrsquo contributionsMH carried out the sequence alignment 3D structure analysis and statisticalanalysis JA conceived and designed the study and analyzed the data MHand JA wrote the paper and approved the final manuscript

Received 3 September 2009Accepted 19 February 2010 Published 19 February 2010

References1 Wolfe SA Nekludova L Pabo CO DNA recognition by Cys2His2 zinc

finger proteins Annu Rev Biophys Biomol Struct 2000 29183-2122 Babu MM Iyer LM Balaji S Aravind L The natural history of the WRKY-

GCM1 zinc fingers and the relationship between transcription factorsand transposons Nucleic Acids Res 2006 346505-6520

3 Krishna SS Majumdar I Grishin NV Structural classification of zinc fingerssurvey and summary Nucleic Acids Res 2003 31532-550

4 Brown RS Zinc finger proteins getting a grip on RNA Curr Opin StructBiol 2005 1594-98

5 Brayer KJ Segal DJ Keep your fingers off my DNA protein-proteininteractions mediated by C2H2 zinc finger domains Cell Biochem Biophys2008 50111-131

6 Gamsjaeger R Liew CK Loughlin FE Crossley M Mackay JP Sticky fingerszinc-fingers as protein-recognition motifs Trends Biochem Sci 20073263-70

7 Pavletich NP Pabo CO Crystal structure of a five-finger GLI-DNAcomplex new perspectives on zinc fingers Science 1993 2611701-1707

8 Foster MP Wuttke DS Radhakrishnan I Case DA Gottesfeld JM Wright PEDomain packing and dynamics in the DNA complex of the N-terminalzinc fingers of TFIIIA Nat Struct Biol 1997 4605-608

9 Pavletich NP Pabo CO Zinc finger-DNA recognition crystal structure of aZif268-DNA complex at 21 Aring Science 1991 252809-817

10 Houbaviy HB Usheva A Shenk T Burley SK Cocrystal structure of YY1bound to the adeno-associated virus P5 initiator Proc Natl Acad Sci USA1996 9313577-13582

11 Stoll R Lee BM Debler EW Laity JH Wilson IA Dyson HJ Wright PEStructure of the Wilms tumor suppressor protein zinc finger domainbound to DNA J Mol Biol 2007 3721227-1245

12 Papworth M Kolasinska P Minczuk M Designer zinc-finger proteins andtheir applications Gene 2006 36627-38

13 Hatayama M Tomizawa T Sakai-Kato K Bouvagnet P Kose S Imamoto NYokoyama S Utsunomiya-Tate N Mikoshiba K Kigawa T Aruga JFunctional and structural basis of the nuclear localization signal in theZIC3 zinc finger domain Hum Mol Genet 2008 173459-3473

14 Chhin B Hatayama M Bozon D Ogawa M Schon P Tohmonda TSassolas F Aruga J Valard AG Chen SC Bouvagnet P Elucidation ofpenetrance variability of a ZIC3 mutation in a family with complex heartdefects and functional analysis of ZIC3 mutations in the first zinc fingerdomain Hum Mutat 2007 28563-570

15 Aruga J Kamiya A Takahashi H Fujimi TJ Shimizu Y Ohkawa K Yazawa SUmesono Y Noguchi H Shimizu T et al A wide-range phylogeneticanalysis of Zic proteins implications for correlations between proteinstructure conservation and body plan complexity Genomics 200687783-792

16 Wang BS Grant RA Pabo CO Selected peptide extension contactshydrophobic patch on neighboring zinc finger and mediatesdimerization on DNA Nat Struct Biol 2001 8589-593

17 Aruga J The role of Zic genes in neural development Mol Cell Neurosci2004 26205-221

18 Merzdorf CS Emerging roles for zic genes in early development Dev Dyn2007 236922-940

19 Matise MP Joyner AL Gli genes in development and cancer Oncogene1999 187852-7859

20 Koebernick K Pieler T Gli-type zinc finger proteins as bipotentialtransducers of Hedgehog signaling Differentiation 2002 7069-76

21 Zhang J Tang Q Zimmerman-Kordmann M Reutter W Fan H Activationof B lymphocytes by GLIS a bioactive proteoglycan from Ganodermalucidum Life Sci 2002 71623-638

22 Attanasio M Uhlenhaut NH Sousa VH OrsquoToole JF Otto E Anlag KKlugmann C Treier AC Helou J Sayer JA et al Loss of GLIS2 causesnephronophthisis in humans and mice by increased apoptosis andfibrosis Nat Genet 2007 391018-1024

23 Senee V Chelala C Duchatelet S Feng D Blanc H Cossec JC Charon CNicolino M Boileau P Cavener DR et al Mutations in GLIS3 areresponsible for a rare syndrome with neonatal diabetes mellitus andcongenital hypothyroidism Nat Genet 2006 38682-687

24 Sakai-Kato K Ishiguro A Mikoshiba K Aruga J Utsunomiya-Tate N CDspectra show the relational style between Zic- Gli- Glis-zinc fingerprotein and DNA Biochim Biophys Acta 2008 17841011-1019

25 Mizugishi K Aruga J Nakata K Mikoshiba K Molecular properties of Zicproteins as transcriptional regulators and their relationship to GLIproteins J Biol Chem 2001 2762180-2188

26 Beak JY Kang HS Kim YS Jetten AM Functional analysis of the zinc fingerand activation domains of Glis3 and mutant Glis3(NDH1) Nucleic AcidsRes 2008 361690-1702

27 Damelin M Simon I Moy TI Wilson B Komili S Tempst P Roth FPYoung RA Cairns BR Silver PA The genome-wide localization of Rsc9 acomponent of the RSC chromatin-remodeling complex changes inresponse to stress Mol Cell 2002 9563-573

28 Mohrmann L Verrijzer CP Composition and functional specificity of SWI2SNF2 class chromatin remodeling complexes Biochim Biophys Acta 2005168159-73

29 Zhang X Azhar G Zhong Y Wei JY Zipzapp200 is a novel zinc fingerprotein contributing to cardiac gene regulation Biochem Biophys ResCommun 2006 346794-801

30 Tilburn J Sarkar S Widdick DA Espeso EA Orejas M Mungroo JPenalva MA Arst HN Jr The Aspergillus PacC zinc finger transcriptionfactor mediates regulation of both acid- and alkaline-expressed genesby ambient pH Embo J 1995 14779-790

31 Fernandez-Martinez J Brown CV Diez E Tilburn J Arst HN Jr Penalva MAEspeso EA Overlap of nuclear localisation signal and specific DNA-binding residues within the zinc finger domain of PacC J Mol Biol 2003334667-684

32 Espeso EA Tilburn J Sanchez-Pulido L Brown CV Valencia A Arst HN JrPenalva MA Specific DNA recognition by the Aspergillus nidulans threezinc finger transcription factor PacC J Mol Biol 1997 274466-480

33 Vincent O Rainbow L Tilburn J Arst HN Jr Penalva MA YPXLI is a proteininteraction motif recognized by aspergillus PalA and its humanhomologue AIP1Alix Mol Cell Biol 2003 231647-1655

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 17 of 18

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References
Page 18: RESEARCH ARTICLE Open Access Characterization of … · RESEARCH ARTICLE Open Access Characterization of the ... the two N-terminal ZFs form a single ... Many classes of ZFs have

34 Sekimata M Homma Y Sequence-specific transcriptional repression byan MBD2-interacting zinc finger protein MIZF Nucleic Acids Res 200432590-597

35 Sekimata M Takahashi A Murakami-Sekimata A Homma Y Involvement ofa novel zinc finger protein MIZF in transcriptional repression byinteracting with a methyl-CpG-binding protein MBD2 J Biol Chem 200127642632-42638

36 Sekimata M Homma Y Regulation of Rb gene expression by an MBD2-interacting zinc finger protein MIZF during myogenic differentiationBiochem Biophys Res Commun 2004 325653-659

37 He GP Kim S Ro HS Cloning and characterization of a novel zinc fingertranscriptional repressor A direct role of the zinc finger motif inrepression J Biol Chem 1999 27414678-14684

38 Sedaghat Y Miranda WF Sonnenfeld MJ The jing Zn-finger transcriptionfactor is a mediator of cellular differentiation in the Drosophila CNSmidline and trachea Development 2002 1292591-2606

39 Cao R Zhang Y The functions of E(Z)EZH2-mediated methylation oflysine 27 in histone H3 Curr Opin Genet Dev 2004 14155-164

40 Cao R Zhang Y SUZ12 is required for both the histonemethyltransferase activity and the silencing function of the EED-EZH2complex Mol Cell 2004 1557-67

41 Kim H Kang K Kim J AEBP2 as a potential targeting protein forPolycomb Repression Complex PRC2 Nucleic Acids Res 2009 372940-2950

42 Liu Y Montell DJ Jing a downstream target of slbo required fordevelopmental control of border cell migration Development 2001128321-330

43 Zhao H Eide DJ Zap1p a metalloregulatory protein involved in zinc-responsive transcriptional regulation in Saccharomyces cerevisiae MolCell Biol 1997 175044-5052

44 Kim MJ Kil M Jung JH Kim J Roles of Zinc-responsive transcriptionfactor Csr1 in filamentous growth of the pathogenic Yeast Candidaalbicans J Microbiol Biotechnol 2008 18242-247

45 Bird AJ Zhao H Luo H Jensen LT Srinivasan C Evans-Galea M Winge DREide DJ A dual role for zinc fingers in both DNA binding and zincsensing by the Zap1 transcriptional activator Embo J 2000 193704-3713

46 Qiao W Mooney M Bird AJ Winge DR Eide DJ Zinc binding to aregulatory zinc-sensing domain monitored in vivo by using FRET ProcNatl Acad Sci USA 2006 1038674-8679

47 Wang Z Feng LS Matskevich V Venkataraman K Parasuram P Laity JHSolution structure of a Zap1 zinc-responsive domain provides insightsinto metalloregulatory transcriptional repression in Saccharomycescerevisiae J Mol Biol 2006 3571167-1183

48 Moreno MA Ibrahim-Granet O Vicentefranqueira R Amich J Ave P Leal FLatge JP Calera JA The regulation of zinc homeostasis by the ZafAtranscriptional activator is essential for Aspergillus fumigatus virulenceMol Microbiol 2007 641182-1197

49 Kim WI Lee WB Song K Kim J Identification of a putative DEAD-box RNAhelicase and a zinc-finger protein in Candida albicans by functionalcomplementation of the S cerevisiae rok1 mutation Yeast 200016401-409

50 Grasberger H Bell GI Subcellular recruitment by TSG118 and TSPYLimplicates a role for zinc finger protein 106 in a novel developmentalpathway Int J Biochem Cell Biol 2005 371421-1437

51 Grasberger H Ye H Mashima H Bell GI Dual promoter structure ofZFP106 regulation by myogenin and nuclear respiratory factor-1 Gene2005 344143-159

52 Zuberi AR Christianson GJ Mendoza LM Shastri N Roopenian DCPositional cloning and molecular characterization of animmunodominant cytotoxic determinant of the mouse H3 minorhistocompatibility complex Immunity 1998 9687-698

53 Feldman RM Correll CC Kaplan KB Deshaies RJ A complex of Cdc4pSkp1p and Cdc53pcullin catalyzes ubiquitination of the phosphorylatedCDK inhibitor Sic1p Cell 1997 91221-230

54 Nakayama J Saito M Nakamura H Matsuura A Ishikawa F TLP1 a geneencoding a protein component of mammalian telomerase is a novelmember of WD repeats family Cell 1997 88875-884

55 Renault L Nassar N Vetter I Becker J Klebe C Roth M Wittinghofer A The17 Aring crystal structure of the regulator of chromosome condensation(RCC1) reveals a seven-bladed propeller Nature 1998 39297-101

56 Thon G Klar AJ The clr1 locus regulates the expression of the crypticmating-type loci of fission yeast Genetics 1992 131287-296

57 James TY Kauff F Schoch CL Matheny PB Hofstetter V Cox CJ Celio GGueidan C Fraker E Miadlikowska J et al Reconstructing the earlyevolution of Fungi using a six-gene phylogeny Nature 2006 443818-822

58 Hibbett DS Binder M Bischoff JF Blackwell M Cannon PF Eriksson OEHuhndorf S James T Kirk PM Lucking R et al A higher-level phylogeneticclassification of the Fungi Mycol Res 2007 111509-547

59 Philippe H Zhou Y Brinkmann H Rodrigue N Delsuc F Heterotachy andlong-branch attraction in phylogenetics BMC Evol Biol 2005 550

60 Laity JH Dyson HJ Wright PE DNA-induced alpha-helix capping inconserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers J Mol Biol 2000 295719-727

61 Laity JH Dyson HJ Wright PE Molecular basis for modulation ofbiological function by alternate splicing of the Wilmsrsquo tumor suppressorprotein Proc Natl Acad Sci USA 2000 9711932-11935

62 Peisach E Pabo CO Constraints for zinc finger linker design as inferredfrom X-ray crystal structure of tandem Zif268-DNA complexes J Mol Biol2003 3301-7

63 Elrod-Erickson M Rould MA Nekludova L Pabo CO Zif268 protein-DNAcomplex refined at 16 Aring a model system for understanding zinc finger-DNA interactions Structure 1996 41171-1180

64 Kim CA Berg JM A 22 Aring resolution crystal structure of a designed zincfinger protein bound to DNA Nat Struct Biol 1996 3940-945

65 Kobe B Kajava AV The leucine-rich repeat as a protein recognition motifCurr Opin Struct Biol 2001 11725-732

66 Merz T Wetzel SK Firbank S Pluckthun A Grutter MG Mittl PR Stabilizingionic interactions in a full-consensus ankyrin repeat protein J Mol Biol2008 376232-240

67 McEwan PA Scott PG Bishop PN Bella J Structural correlations in thefamily of small leucine-rich repeat proteins and proteoglycans J StructBiol 2006 155294-305

68 Monahan BJ Villen J Marguerat S Bahler J Gygi SP Winston F Fissionyeast SWISNF and RSC complexes show compositional and functionaldifferences from budding yeast Nat Struct Mol Biol 2008 15873-880

69 Ishiguro A Ideta M Mikoshiba K Chen DJ Aruga J Zic2-dependenttranscriptional regulation is mediated by DNA-dependent proteinkinase Poly-ADP ribose polymerase and RNA helicase A J Biol Chem2007 2829983-9995

70 Aruga J Odaka YS Kamiya A Furuya H Dicyema Pax6 and Zic tool-kitgenes in a highly simplified bilaterian BMC Evol Biol 2007 7201

71 Thompson JD Gibson TJ Plewniak F Jeanmougin F Higgins DG TheCLUSTAL_X windows interface flexible strategies for multiple sequencealignment aided by quality analysis tools Nucleic Acids Res 1997254876-4882

72 Schmidt HA Strimmer K Vingron M von Haeseler A TREE-PUZZLEmaximum likelihood phylogenetic analysis using quartets and parallelcomputing Bioinformatics 2002 18502-504

73 Kumar S Nei M Dudley J Tamura K MEGA a biologist-centric softwarefor evolutionary analysis of DNA and protein sequences Brief Bioinform2008 9299-306

74 Tamura K Dudley J Nei M Kumar S MEGA4 Molecular EvolutionaryGenetics Analysis (MEGA) software version 40 Mol Biol Evol 2007241596-1599

75 Stamatakis A Hoover P Rougemont J A rapid bootstrap algorithm for theRAxML Web servers Syst Biol 2008 57758-771

76 Ronquist F Huelsenbeck JP MrBayes 3 Bayesian phylogenetic inferenceunder mixed models Bioinformatics 2003 191572-1574

77 Crooks GE Hon G Chandonia JM Brenner SE WebLogo a sequence logogenerator Genome Res 2004 141188-1190

78 Galagan JE Henn MR Ma LJ Cuomo CA Birren B Genomics of the fungalkingdom insights into eukaryotic biology Genome Res 2005151620-1631

79 Song J Xu Q Olsen R Loomis WF Shaulsky G Kuspa A Sucgang RComparing the Dictyostelium and Entamoeba genomes reveals anancient split in the Conosa lineage PLoS Comput Biol 2005 1e71

80 Archibald JM The puzzle of plastid evolution Curr Biol 2009 19R81-88

doi1011861471-2148-10-53Cite this article as Hatayama and Aruga Characterization of the tandemCWCH2 sequence motif a hallmark of inter-zinc finger interactions BMCEvolutionary Biology 2010 1053

Hatayama and Aruga BMC Evolutionary Biology 2010 1053httpwwwbiomedcentralcom1471-21481053

Page 18 of 18

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
      • Results
        • Three-dimensional structure of known tCWCH2 motifs
        • Database search for tCWCH2-containing sequences
        • Classification and domain structure of tCWCH2 sequence-containing proteins
        • Function structure and classification of the gene classes containing tCWCH2
          • ZicGliGlis
          • Arid2Rsc9
          • PacC
          • Mizf
          • Aebp2
          • Zap1ZafA
          • Fungus-Gli-like (Fungl)
          • Zfp106
          • Tandem-CWCH2-protein-C-terminal (Twincl)
          • Clr1
          • Fungus-Gli-like-4ZF (Fungl-4ZF)
          • Dictyostelium discoideum tCWCH2
          • Monosiga brevicollis tCWCH2
          • Unclassified sequences
          • Isolated tCWCH2 in metazoan gene families
          • tCWCH2 sequences in plant databases
            • Similarity among tCWCH2 gene classes
            • Distribution of the tCWCH2 sequences in the phylogenetic tree
            • Additional sequence features in the tCWCH2 motifs
              • Discussion
                • The tCWCH2 motif is a hallmark of inter-zinc finger interactions
                • Evolution of the tCWCH2 motif
                • Function and role of tCWCH2
                  • Conclusions
                  • Methods
                    • Protein structural analysis
                    • Database search
                    • Sequence analysis
                      • List of abbreviations
                      • Acknowledgements
                      • Author details
                      • Authors contributions
                      • References