sequence finishing and gene mapping for candida albicans and … · the degree of synteny that we...

14
Copyright © 2005 by the Genetics Society of America DOI: 10.1534/genetics.104.034652 Sequence Finishing and Gene Mapping for Candida albicans Chromosome 7 and Syntenic Analysis Against the Saccharomyces cerevisiae Genome Hiroji Chibana,* ,1 Nao Oka,* Hironobu Nakayama, Toshihiro Aoyama, B. B. Magee, P. T. Magee and Yuzuru Mikami 1 *Research Center for Pathogenic Fungi and Microbial Toxicoses, Chiba University, Chiba 260-8673, Japan, Suzuka National College of Technology, Suzuka 510-0294, Japan and Department of Genetics, Cell Biology, and Development, University of Minnesota, Saint Paul, Minnesota 55108 Manuscript received August 10, 2004 Accepted for publication April 23, 2005 ABSTRACT The size of the genome in the opportunistic fungus Candida albicans is 15.6 Mb. Whole-genome shotgun sequencing was carried out at Stanford University where the sequences were assembled into 412 contigs. C. albicans is a diploid basically, and analysis of the sequence is complicated due to repeated sequences and to sequence polymorphism between homologous chromosomes. Chromosome 7 is 1 Mb in size and the best characterized of the 8 chromosomes in C. albicans. We assigned 16 of the contigs, ranging in length from 7309 to 267,590 bp, to chromosome 7 and determined sequences of 16 regions. These regions included four gaps, a misassembled sequence, and two major repeat sequences (MRS) of 16 kb. The length of the continuous sequence attained was 949,626 bp and provided complete coverage of chromosome 7 except for telomeric regions. Sequence analysis was carried out and predicted 404 genes, 11 of which included at least one intron. A 7-kb indel, which might be caused by a retrotransposon, was identified as the largest difference between the homologous chromosomes. Synteny analysis revealed that the degree of synteny between C. albicans and Saccharomyces cerevisiae is too weak to use for completion of the genomic sequence in C. albicans. C ANDIDA albicans is an important pathogenic fun- making it impossible to join contigs that flank them. gus, causing diseases ranging from superficial Additionally, sequence polymorphisms were identified thrush and vaginal infections in overtly healthy humans between homologous chromosomes (Chibana et al. to systemic infections of immunocompromised hosts, 2000). For these reasons, it appeared that it would be such as patients who have undergone organ transplants difficult to complete the sequence of the entire C. albi- or those undergoing intensive chemotherapy. In this cans genome after whole-genome shotgun sequencing. diploid organism, mating is regulated by a mating-type- As part of characterizing the genome, a physical map- like locus (Hull et al. 2000; Magee and Magee 2000; ping project in C. albicans strain 1161 is underway at Tzung et al. 2001), but a perfect meiotic or haploid the University of Minnesota (Chibana et al. 1998). A phase has not been identified. It is supposed that repro- macro-restriction map was constructed using Sfi I, reveal- duction is primarily by clonal propagation (Pujol et al. ing that the chromosome number is 8 and that the size 1993; Graser et al. 1996; Lott et al. 1999; Xu et al. of the genome is 16 Mb (Chu et al. 1993). Chromosome 1999), which has led to sequence polymorphisms be- 7 was shown to be composed of four Sfi I fragments tween homologous chromosomes (Lott et al. 1999). named 7C, 7A, 7F, and 7G (Chu et al. 1993). A more Because meiotic analysis is not available in this fungus, accurate physical map for chromosome 7 was constructed the whole-genome sequence will facilitate efficient and using fosmid contiguous clones and random breakage expedient research advances. However, to attain it, we mapping (Chibana et al. 1998). In parallel, whole- had to overcome several obstacles. Major repeat se- genome shotgun sequencing and assembly for the C. quences (MRSs), the length of which are at least 16 kbp, albicans strain SC5314 genome was carried out at the are located on all chromosomes except chromosome 3 Stanford Genome Technology Center. The most recent (Chibana et al. 1998; Chindamporn et al. 1998), and assembly of the sequences, assembly 19, is composed of partial sequences of MRS are distributed in the genome, 412 supercontigs, including 146 homologous pairs (http://www-sequence.stanford.edu/group/candida/ index.html). For this study, one from each pair of the The entire chromosome 7 sequence has been deposited at DDBJ/ homologous contigs was arbitrarily discarded, and the EMBL/GenBank under the project accession no. AP006852. remaining contigs were used to create a reference hap- 1 Corresponding author: Research Center for Pathogenic Fungi and loid genome consisting of 266 supercontigs ( Jones et Microbial Toxicoses, Chiba University, 1-8-1 Inohana Chuo-ku, Chiba 260-8673, Japan. E-mail: [email protected] al. 2004). We used this haploid set of supercontigs as Genetics 170: 1525–1537 ( August 2005)

Upload: others

Post on 22-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sequence Finishing and Gene Mapping for Candida albicans and … · The degree of synteny that we Identification and mapping of supercontigs on chro-identified did not provide gene

Copyright © 2005 by the Genetics Society of AmericaDOI: 10.1534/genetics.104.034652

Sequence Finishing and Gene Mapping for Candida albicans Chromosome 7and Syntenic Analysis Against the Saccharomyces cerevisiae Genome

Hiroji Chibana,*,1 Nao Oka,* Hironobu Nakayama,† Toshihiro Aoyama,† B. B. Magee,‡

P. T. Magee‡ and Yuzuru Mikami1

*Research Center for Pathogenic Fungi and Microbial Toxicoses, Chiba University, Chiba 260-8673, Japan, †Suzuka NationalCollege of Technology, Suzuka 510-0294, Japan and ‡Department of Genetics, Cell Biology, and Development,

University of Minnesota, Saint Paul, Minnesota 55108

Manuscript received August 10, 2004Accepted for publication April 23, 2005

ABSTRACTThe size of the genome in the opportunistic fungus Candida albicans is 15.6 Mb. Whole-genome shotgun

sequencing was carried out at Stanford University where the sequences were assembled into 412 contigs.C. albicans is a diploid basically, and analysis of the sequence is complicated due to repeated sequencesand to sequence polymorphism between homologous chromosomes. Chromosome 7 is 1 Mb in size andthe best characterized of the 8 chromosomes in C. albicans. We assigned 16 of the contigs, ranging inlength from 7309 to 267,590 bp, to chromosome 7 and determined sequences of 16 regions. These regionsincluded four gaps, a misassembled sequence, and two major repeat sequences (MRS) of �16 kb. The lengthof the continuous sequence attained was 949,626 bp and provided complete coverage of chromosome 7except for telomeric regions. Sequence analysis was carried out and predicted 404 genes, 11 of whichincluded at least one intron. A 7-kb indel, which might be caused by a retrotransposon, was identified asthe largest difference between the homologous chromosomes. Synteny analysis revealed that the degreeof synteny between C. albicans and Saccharomyces cerevisiae is too weak to use for completion of the genomicsequence in C. albicans.

CANDIDA albicans is an important pathogenic fun- making it impossible to join contigs that flank them.gus, causing diseases ranging from superficial Additionally, sequence polymorphisms were identified

thrush and vaginal infections in overtly healthy humans between homologous chromosomes (Chibana et al.to systemic infections of immunocompromised hosts, 2000). For these reasons, it appeared that it would besuch as patients who have undergone organ transplants difficult to complete the sequence of the entire C. albi-or those undergoing intensive chemotherapy. In this cans genome after whole-genome shotgun sequencing.diploid organism, mating is regulated by a mating-type- As part of characterizing the genome, a physical map-like locus (Hull et al. 2000; Magee and Magee 2000; ping project in C. albicans strain 1161 is underway atTzung et al. 2001), but a perfect meiotic or haploid the University of Minnesota (Chibana et al. 1998). Aphase has not been identified. It is supposed that repro- macro-restriction map was constructed using SfiI, reveal-duction is primarily by clonal propagation (Pujol et al. ing that the chromosome number is 8 and that the size1993; Graser et al. 1996; Lott et al. 1999; Xu et al. of the genome is 16 Mb (Chu et al. 1993). Chromosome1999), which has led to sequence polymorphisms be- 7 was shown to be composed of four SfiI fragmentstween homologous chromosomes (Lott et al. 1999). named 7C, 7A, 7F, and 7G (Chu et al. 1993). A moreBecause meiotic analysis is not available in this fungus, accurate physical map for chromosome 7 was constructedthe whole-genome sequence will facilitate efficient and using fosmid contiguous clones and random breakageexpedient research advances. However, to attain it, we mapping (Chibana et al. 1998). In parallel, whole-had to overcome several obstacles. Major repeat se- genome shotgun sequencing and assembly for the C.quences (MRSs), the length of which are at least 16 kbp, albicans strain SC5314 genome was carried out at theare located on all chromosomes except chromosome 3 Stanford Genome Technology Center. The most recent(Chibana et al. 1998; Chindamporn et al. 1998), and assembly of the sequences, assembly 19, is composed ofpartial sequences of MRS are distributed in the genome, 412 supercontigs, including 146 homologous pairs

(http://www-sequence.stanford.edu/group/candida/index.html). For this study, one from each pair of the

The entire chromosome 7 sequence has been deposited at DDBJ/ homologous contigs was arbitrarily discarded, and theEMBL/GenBank under the project accession no. AP006852. remaining contigs were used to create a reference hap-

1Corresponding author: Research Center for Pathogenic Fungi and loid genome consisting of 266 supercontigs (Jones etMicrobial Toxicoses, Chiba University, 1-8-1 Inohana Chuo-ku, Chiba260-8673, Japan. E-mail: [email protected] al. 2004). We used this haploid set of supercontigs as

Genetics 170: 1525–1537 ( August 2005)

Page 2: Sequence Finishing and Gene Mapping for Candida albicans and … · The degree of synteny that we Identification and mapping of supercontigs on chro-identified did not provide gene

1526 H. Chibana et al.

Syntenic analysis: BLASTX on NCBI with default values wasour starting point and closed up the polymorphismsused to compare the whole sequence of chromosome 7 againstbetween the homologous pairs of the supercontigs.the S. cerevisiae genome. Sequences of open reading frames

Although the strain used for the mapping project is and also sequences of intergenic spaces were surveyed. If thedifferent from the strain used for the sequence project, score was �50, it was taken to indicate that the corresponding

sequence was an orthologous gene. If at least two alleles lo-major differences between these strains on the chromo-cated within 20 kb on C. albicans chromosome 7 were alsosome level have not been identified. In this work welocated within 20 kb in the genome of S. cerevisiae, then thatdemonstrate how the physical maps help with comple-region was defined as a synteny block.

tion of the chromosomal sequence, e.g., the completionof the whole sequence of chromosome 7. In anotheraspect of this work, we examine whether there are obvi- RESULTSous syntenic regions between the Saccharomyces cerevisiae

Identification and mapping of supercontigs on chro-and C. albicans genomes. The degree of synteny that wemosome 7: In previous work, 39 DNA probes were se-identified did not provide gene linkages useful for gapquenced and mapped to chromosome 7, and the chro-closing. This work is a pilot study for completion of themosomal location of 11 of these probes was determinedwhole-genome sequence of C. albicans.accurately using random breakage mapping (Chibanaet al. 1998). The sequences of these probes were usedto perform BLASTN searches against the Stanford Can-MATERIALS AND METHODSdida genome website (http://www-sequence.stanford.

DNA amplification for gap closing: PCR with each primeredu:8080/bncontigs19super.html) and 15 contigs werepair (shown in supplementary data at http://www.genetics.assigned to chromosome 7 (Figure 1). Of the supercon-org/supplemental/) was carried out with Ready-To-Go PCR

beads (Amersham Biosciences) using genomic DNA of C. albi- tigs, 19-10248 and 19-20248, 19-10187 and 19-20187, 19-cans SC5314 as a template DNA. PCR was carried out using a 10219 and 19-20219, 19-10110 and 19-20110, and 19-hotstart of 3 min at 94� followed by 35 cycles of 94� for 10 10253 and 19-20253 were identified as homologoussec, 50� for 10 sec, and 68� for 1 min, concluding with 68� for

pairs, and 19-10262, 19-2485, 19-2305, 19-2175, and 19-10 min. Long PCR was carried out with LA PCR kit ver.2.12506 were identified as haploid contigs. To fill the gaps(Takara, Tokyo). Conditions used were a hotstart of 3 min at

94� followed by 35 cycles of 98� for 10 sec and 68� for 20 min, between the supercontigs, a haploid set of the supercon-concluding with a final extension of 72� for 10 min. Genomic tigs (Jones et al. 2004), comprising 19-10248, 19-10187,DNA from C. albicans strain SC5314 (Fonzi and Irwin 1993) 19-10219, 19-10110, and 19-10253, was derived from thewas used for all sequence analysis in this work.

diploid set of supercontigs. Recognition sites for SfiIDetermination of DNA sequence: DNA products amplifiedwere identified on 19-10262, 19-2485, 19-10187, and 19-by long PCR were randomly fragmented into 1- to 1.5-kb

fragments with HydroShear (Genomachines). The fragmented 10219. The supercontigs were mapped precisely onDNA was blunted and ligated with pNot I linker (Takara), using chromosome 7 using the SfiI map (Chu et al. 1993) anda DNA blunting kit (Takara). The ligated product was digested an accurate physical map derived from fosmid contigwith Not I and ligated with pBlueScriptSK�. Determination of

and random breakage maps for chromosome 7 (Figuresequence was performed with primers pBlF, GAGCGGATAA1) (Chibana et al. 1998).CAATTTCACACAGGAAACAG; and pBlRN, CCCTCGAGGT

CGACGGTATC. DNA was labeled with BigDye terminator cy- Conserved orientation of SfiI fragment 7F: Two MRSscle sequencing kits version 1.1 (ABI), and the sequence was exist on chromosome 7, one between 7A and 7F andread using ABI3100 (ABI) sequencer. another between 7F and 7G. Since the sequences of

Test analyses for gene prediction: All the genome sequencesMRSs are highly conserved across the chromosomes,and coding sequence data in S. cerevisiae currently exhibited bythe process of assembling the contigs across the MRSsEMBL were used for evaluation of the gene-finding programs

Glimmer 2.10 and Critica version 1.05. We selected the set of would be error prone. Indeed, there are inconsistenciesopen reading frames (ORFs) that met the criteria of having in the sequences across the MRSs among the supercon-the start codon ATG and the termination codon as TAA, TAG, tigs. The orientation of fragment 7F was determinedor TGA, and that extracted a domain of 303 bases or greater

for strain 1161 (Chibana et al. 1998, 2000). However,(100 aa or greater) in any of the six reading frames.since 7F is flanked by two inverted MRSs, it is possibleGlimmer 2.10 (Delcher et al. 1999) has a function that

permits training of the program, using data from a known set that the entire 7F region could be inverted in SC5314of ORFs to set parameters for that species and allowing better on one or both homologs due to homologous recombi-predictions for related species. For training purposes, we used nation between the MRSs. To resolve the inconsistenciesthe S. cerevisiae data set of ORFs of 303 or greater bases (100

and confirm the orientation of 7F on chromosome 7,aa or greater). For the Critica version 1.05 (http://geta.life.long PCR amplification was performed with two pairsuiuc.edu/�gary/programs/CRITICA/critica105b/critica.

html) analysis, sequence data of release 1 of RefSeq of micro- of PCR primers based on unique sequences flankingbial and fungi minus the sequence data of S. cerevisiae were both the MRSs. The sequence of the products obtainedused as reference sequences for BLASTN. When the termina- from the PCR amplifications, which were �16 kb, wastion position predicted with the gene-domain prediction tools

determined. When the counterpart of the primer pairsand the termination position of the gene domain of S. cerevisiaewas swapped, the PCR band did not appear. Thus, theas registered in EMBL corresponded, it was judged that the

domain prediction was correct. results indicate that the orientation of 7F in strain

Page 3: Sequence Finishing and Gene Mapping for Candida albicans and … · The degree of synteny that we Identification and mapping of supercontigs on chro-identified did not provide gene

1527Finishing of C. albicans Chromosome 7

Figure 1.—Overview of gap closing.A physical map, previously publishedby Chibana et al. (1998), is presentedwith the shaded bar. Probes assignedto chromosome 7 are located on themap. Open boxes on the shaded bardescribe sites for the probes that aremapped with random breakage map-ping. Sfi I sites are described with verti-cal open bars on the shaded bar. Thelocation for ODP2 was corrected by theauthors after publication. Two arrowson the shaded bar describe MRS. Al-though the MRSs are represented witha single repetitive sequence (RPS)(Iwaguchi et al. 1992) in this map, RPSis a tandem repeated sequence in someother MRS. The total length of chro-mosome 7 is 995 kbp in this map. Su-percontigs assigned on chromosome7 are described with a closed arrowheaded by a solid bar. Probes with aknown DNA sequence and mapped onthe supercontigs are described with anopen box. R2B9, Sfi I, RBP1, ARS3,ARG4, and G2E10 were used as land-marks to assign the supercontigs on thephysical map. To adjust the position ofthe landmarks, supercontig 19-10262was cleaved with a spacer. A sequencedeletion was identified on supercontig19-20248 and indicated with a spacer.Gaps between the supercontigs and re-gion including ambiguous sequenceswere amplified with PCR and are indi-cated by numbers 1–16 in parentheses.Ca35A5 represents a whole sequenceof cosmid inserted DNA no. AL033396.The inserted DNA of the cosmid wasisolated from strain 1161 and the se-quence was determined using themethods described by Tait et al. (1997).

Page 4: Sequence Finishing and Gene Mapping for Candida albicans and … · The degree of synteny that we Identification and mapping of supercontigs on chro-identified did not provide gene

1528 H. Chibana et al.

SC5314 is the same as in strain 1161 and in other strains of the results. Class 1 ORFs were identified with bothGlimmer 2 and Critica and class 2 ORFs were identifiedfor which the orientation was determined by Chibana

et al. 1998, 2000). by only one of the programs (Table 1). The third groupcontained ORFs that were predicted by neither pro-PCR amplification and sequencing to fill the gaps

between the supercontigs and to elucidate ambiguous gram.On chromosome 7, 516 open reading frames encod-sequences in the supercontigs: To obtain the sequence

of the gaps between the supercontigs, appropriate PCR ing �100 amino acids were identified. According to ourmethod of ORF classification, 373 class 1 and 18 classprimers were designed near the ends of each supercon-

tig. PCR was carried out using those primers, and the 2 ORFs were identified. For these ORFs, the specificityis �95%. The remaining 125 ORFs were not suggestedsequence of the PCR products was determined. Since

supercontigs 19-10248, 19-2305, and 19-10187 over- by either program. Of this remainder, 107 ORFs werenot counted as coding sequences, since the complemen-lapped each other, they were assumed to contain assem-

bly errors. About 10 kb extending from the end of 19- tary chain encoded an overlapping gene predicted withhigher probability. It was not possible to confirm that10248 to the middle of 19-10187 was amplified by PCR

and the sequence was determined. Another assembly the other 18 ORFs encoded polypeptides. BLASTX anal-ysis showed that 2 ORFs �100 amino acids had a scoreerror, caused by repetitive sequences located at three

points, was found on 19-10253. The assembly errors �50 against S. cerevisiae or Schizosaccharomyces pombe. Atotal of 20 ORFs were classified as class 3 genes. At 16were corrected, and the gaps between 19-10110 and 19-

10253 and between 19-10253 and 19-2335 were closed. sites a coding region was possibly divided by an intronor sequence error on the supercontigs. For these sites,Although supercontig 19-2335 was not identified with

the BLASTN search, it was assigned between 19-10253 sequence determination and intron identification wereperformed, and an intron was suggested in 9 cordingand 19-2506 because it shares homologous sequences

with 19-10253. sequences (CDSs) and two introns in each of 2 otherCDSs. In the remaining 5 sites, sequencing errors wereUndetermined sequences, which were depicted in the

Stanford assembly 19 using the letter n, ranged in length identified and corrected. A gene encoding a Leu-tRNAwas also identified. A total of 404 CDSs were predictedfrom a single nucleotide to 360 nucleotides and had a

total length of 659 bp distributed across seven regions as genes (Table 2, Figure 2). The gene maps, to whichBLAST and Pfam information are attached, are availableon the supercontigs. The unread regions were recov-

ered from the GenBank database, once the sequences as supplementary data at http://www.genetics.org/supplemental/.had been integrated with assembly 19 (Jones et al.

2004). These regions were amplified with flanking prim- Synteny between C. albicans chromosome 7 and thegenome of S. cerevisiae : In previous analyses of syntenyers, and their sequences were determined in this work.

Gene prediction of chromosome 7 in C. albicans: Gene- between C. albicans chromosome 7 and the S. cerevisiaegenome, few syntenic regions were identified becausefinding tools Glimmer 2.10 (Salzberg et al. 1998) and

Critica 1.05 work with high reliability for finding genes of the low resolution of sequence for the chromosome.The gene map was based on the fosmid contig map,within sequences, provided that the sequences do not

include introns. Thus, these tools have been used for which is composed of a tiling set of fosmid clones. Theaverage length of the fosmid DNA insert was 40 kbp,sequence analyses on prokaryotic genomes (Aggarwal

and Ramaswamy 2002; McHardy et al. 2004). Unlike and only 39 probes were mapped with their sequenceinformation (Chibana et al. 1998). The continuoussome other fungal species, only a small fraction of genes

in the genome of C. albicans carry introns (Braun et al. DNA sequence information covering chromosome 7 al-lowed us to perform syntenic analysis at a higher resolu-2005). The Candida intron structure is generally similar

to that of S. cerevisiae (Jones et al. 2004). For these tion. The gene arrangement of the 282 C. albicans ORFsidentified by sequence similarity to the genome of S.reasons, Glimmer 2 and Critica were employed to begin

gene finding. To apply these programs to gene finding cerevisiae was compared in the two fungi. A group ofORFs with close linkage in both fungi was called a syn-in fungal genomes, their reliability first has to be evalu-

ated. The genomic sequence of S. cerevisiae was surveyed teny block. A total of 32 synteny blocks were identifiedbetween chromosome 7 in C. albicans and the genomeusing these tools. In this test evaluation, ORFs were

classified into three groups according to the robustness in S. cerevisiae. The number of ORFs found in shared

�Figure 2.—Gene maps for C. albicans chromosome 7. Class 1 and class 2 ORFs are shown as rectangles with thick and thin

outlines, respectively. Class 3 ORFs are shown as rectangles with no outline. Each ORF was colored the same color as the best-hit species: red, blue, green, and yellow for S. cerevisiae, S. pombe, N. crassa, and Homo sapiens, respectively. Solid arrowheadspointing toward the coding sequence indicate the position of the intron. CDSs, which are reorganized in this work, are representedby asterisks. ORF number of assembly 19 is assigned in the column of C. albicans. Results of BlastX, Pfam, and more informationare available as supplementary data at http://www.genetics.org/supplemental/

Page 5: Sequence Finishing and Gene Mapping for Candida albicans and … · The degree of synteny that we Identification and mapping of supercontigs on chro-identified did not provide gene

1529Finishing of C. albicans Chromosome 7

Page 6: Sequence Finishing and Gene Mapping for Candida albicans and … · The degree of synteny that we Identification and mapping of supercontigs on chro-identified did not provide gene

1530 H. Chibana et al.

Figure 2.—Continued.

Page 7: Sequence Finishing and Gene Mapping for Candida albicans and … · The degree of synteny that we Identification and mapping of supercontigs on chro-identified did not provide gene

1531Finishing of C. albicans Chromosome 7

Figure 2.—Continued.

Page 8: Sequence Finishing and Gene Mapping for Candida albicans and … · The degree of synteny that we Identification and mapping of supercontigs on chro-identified did not provide gene

1532 H. Chibana et al.

Figure 2.—Continued.

Page 9: Sequence Finishing and Gene Mapping for Candida albicans and … · The degree of synteny that we Identification and mapping of supercontigs on chro-identified did not provide gene

1533Finishing of C. albicans Chromosome 7

Figure 2.—Continued.

Page 10: Sequence Finishing and Gene Mapping for Candida albicans and … · The degree of synteny that we Identification and mapping of supercontigs on chro-identified did not provide gene

1534 H. Chibana et al.

Figure 2.—Continued.

Page 11: Sequence Finishing and Gene Mapping for Candida albicans and … · The degree of synteny that we Identification and mapping of supercontigs on chro-identified did not provide gene

1535Finishing of C. albicans Chromosome 7

TABLE 1

Evaluation of the gene prediction programs using genomic data of S. cerevisiae

Class 2: Glimmer 2or Critica but not Predicted by neither

Class 1: Glimmer 2 Glimmer 2 Glimmer 2 norORF �100 aa and Critica and Critica Critica

No. of prediction 7467 (100) 4702 (77.1) 759 (12.4) 2006 (10.5)(sensitivity)

Correct prediction 5831 (78.1) 4498 (95.7) 721 (95.0) 612 (30.5)(specificity)

Numbers in parentheses are percentages.

synteny blocks ranged from 2 to 8 ORFs. The average The supercontigs released by Jones et al. (2004) wereassigned to chromosome 7, and the gap closing andof the number of ORFs per synteny block was 2.68 ORFs.

A synteny block composed of eight ORFs is the largest sequence correction of chromosome 7 in C. albicans wascarried out in this work. The length of the continuousarea of synteny between chromosome 7 and the S. cerevis-

iae genome. However, the order and direction of the sequence is 949,626 bp and covers the entire chromo-some 7 except for telomeric and subtelomeric regions.ORFs were jumbled, with at least three indels and four

inversions being found in this region (Figure 3). The total length of the missing sequences was only 289bp in the gaps and 659 bp in continuous sequence ofthe supercontigs. Thus, the coverage of the supercontigs

DISCUSSIONin assembly 19 was 99.9% of the continuous sequencesof chromosome 7.In previous work, probes G2E10 and R2B9 were as-

signed the edges of chromosome 7 (Chibana et al. The differences in length between the homologouspairs of the supercontigs of chromosome 7 are �1 kb,1998). G2E10 was assigned 35 kbp away from one end

of chromosome 7. The sequence of G2E10 was identi- except between 19-10248 and 19-20248. A region caus-ing the difference was identified and is depicted in Fig-fied 29 kb away from the end of contig 19-10262. This

indicates that the end of contig 19-10262 is 6 kb from ure 4. The difference in length is 7249 bp, due to asection in 19-20248 that contains three defective ORFsthe end of chromosome 7. Using similar reasoning, we

suggest that the end of contig 19-2506 reaches to within flanked by long terminal repeats (LTRs) of the classdescribed by Goodwin and Poulter (2000) as LTR �.20 kb of the other end of chromosome 7. The total

length of the continuous sequence composed of contigs Because there were no LTR sequences on the corre-sponding region in 19-20248, this region has been in-and filled gaps comes to 950 kb. The remaining se-

quences, including telomeres and subtelomeres, amount serted on only one homolog. Three ORFs included asequence homologous to pol- and gag-like elements andto 26 kb. The telomere sequence in C. albicans is com-

posed of 23 bp repeated sequences (Sadhu et al. 1991). to STA1, respectively. STA1 is associated with initiationof the developmental programs of pseudohyphal forma-Long PCR was carried out using PCR primers based

on the sequence of the telomere and supercontigs to tion and invasive growth response in S. cerevisiae (Vivieret al. 1997). Although the similarity is not high betweenamplify the telomeres and subtelomeres, but the ex-

pected product was not detected (data not shown). The the ORF on chromosome 7 and STA1 in S. cerevisiae, thisretrotransposon-like element might contribute to cellsame problem is likely to happen in other chromosome

gap closings. Other approaches, e.g., cloning into a cos- morphology polymorphism and pathogenesis in C. albi-cans. We suggest that this region should be investigatedmid vector, are needed to determine the sequence of

these regions. further.

TABLE 2

Gene prediction on chromosome 7

Class 3

ORF �100 aa Class 1 Class 2 ORF �100 aa ORF �100 aaa t-RNA

No. of prediction 516 372 17 17 2 1

a BLASTX analysis showed that two ORFs of �100 amino acids had a score �50.

Page 12: Sequence Finishing and Gene Mapping for Candida albicans and … · The degree of synteny that we Identification and mapping of supercontigs on chro-identified did not provide gene

1536 H. Chibana et al.

mosome 7 in C. albicans shared with the S. cerevisiaegenome. The reason for the high conservation of thisregion is unknown, although genes located within theregion are likely related. YCL002, which is at the rightend of the map (Figure 3), is only 3 kb from CEN3 inS. cerevisiae. It turns out that the centromere on chromo-some 7 exists in a completely different region (Sanyalet al. 2004). This implies that a centromere existed near

Figure 3.—The longest syntenic region between the ge- this domain in the common ancestor of the fungi ofnome in S. cerevisiae and chromosome 7 in C. albicans. The the Saccharomycetales group, including C. albicans.region between DCC1 and PGS1 includes 11 ORFs encoding It is clear that the degree of synteny between C. albi-polypeptides with �100 aa. Of these, 8 ORFs were identified

cans and S. cerevisiae is too weak to enable linkage infor-as sharing similarity in amino acid sequence with genes in S.mation in the S. cerevisiae genome to be useful for thecerevisiae with a score �50 in BLASTX results. Although link-

age of these genes is conserved in C. albicans and S. cerevisiae, completion of genomic sequence in C. albicans.at least four inversions and three indels were identified.

This work was supported by a grant-in-aid for Scientific Researchfor the Priority Areas “Infection and Host Response” and “GenomeBiology” and for “Frontier Studies in Pathogenic Fungi and Actinomy-

The greatest synteny block between chromosome 7 in cetes” through the Special Coordination Funds for Promoting Scienceand Technology from the Ministry of Education, Culture, Sports,C. albicans and all chromosomes in S. cerevisiae includesScience and Technology of Japan, The Sumitomo Foundation, andLEU2 and NFS1 (Figure 3). Interestingly, that regionthe Hokuto Foundation for Bioscience. The sequences of all thehas received attention from other groups. It was pub- contigs were provided by the Stanford Genome Technology Center at

lished that close linkage of these genes was conserved http://www-sequence.stanford.edu/group/candida. The sequencingin Ashbya gossypii, C. albicans, C. maltosa, C. rugosa, Pichia of C. albicans at the Stanford Genome Technology Center was accom-

plished with the support of the National Institute of Dental Researchanomala, S. cerevisiae, S. servazzi, Yamadazyma ohmeri, andand the Burroughs Welcome Fund. The gene-finding analyses usingZygosaccharomyces rouxii (Keogh et al. 1998; De la RosaGlimmer 2 and Critica was carried out by Kouzuki Tokio (Xanagen,et al. 2001). We surveyed the linkage of both genes in Kawasaki, Japan).

other fungi. The linkage of the genes was not conservedin S. pombe (Wood et al. 2002) (http://www.sanger.ac.uk/Projects/S_pombe/) or in Neurospora crassa (Gala-

LITERATURE CITEDgan et al. 2003) (http://www-genome.wi.mit.edu/anno-Aggarwal, G., and R. Ramaswamy, 2002 Ab initio gene identifica-tation/fungi/neurospora/index.html). The linkage of

tion: prokaryote genome annotation with GeneScan and GLIM-these genes is thus likely conserved only in closely re-MER. J. Biosci. 27: 7–14.

lated Ascomycetes species. In this work, it was revealed Braun, B. R., M. Hoog, C. Enfert, M. Martchenko, J. Dungan etal., 2005 A human-curated annotation of the Candida albicansthat the conserved region extends beyond LEU2 andgenome. Plos Genet. 1: e1.NFS1 and comprises the longest syntenic block in chro-

Chibana, H., B. B. Magee, S. Grindle, Y. Ran, S. Scherer et al.,1998 A physical map of chromosome 7 of Candida albicans.Genetics 149: 1739–1752.

Chibana, H., J. L. Beckerman and P. T. Magee, 2000 Fine-resolu-tion physical mapping of genomic diversity in Candida albicans.Genome Res. 10: 1865–1877.

Chindamporn, A., Y. Nakagawa, I. Mizuguchi, H. Chibana, M. Doiet al., 1998 Repetitive sequences (RPSs) in the chromosomesof Candida albicans are sandwiched between two novel stretches,HOK and RB2, common to each chromosome. Microbiology 144(Pt. 4): 849–857.

Chu, W. S., B. B. Magee and P. T. Magee, 1993 Construction ofan Sfi I macrorestriction map of the Candida albicans genome. J.Bacteriol. 175: 6637–6651.

De la Rosa, J. M., J. A. Perez, F. Gutierrez, J. M. Gonzalez, T.Ruiz et al., 2001 Cloning and sequence analysis of the LEU2homologue gene from Pichia anomala. Yeast 18: 1441–1448.

Figure 4.—Sequence indel between 19-10248 and 19- Delcher, A. L., D. Harmon, S. Kasif, O. White and S. L. Salzberg,1999 Improved microbial gene identification with GLIMMER.20248. The length of this indel is 7249 bp. LTR � (GoodwinNucleic Acids Res. 27: 4636–4641.and Poulter 2000) flanks this region. The best hits of three

Fonzi, W. A., and M. Y. Irwin, 1993 Isogenic strain constructionORFs are YLT1, hypothetical protein YLT1 (TrEMBL no.and gene mapping in Candida albicans. Genetics 134: 717–728.Q9COUO) suggested to encode a fungal retroelement pol

Galagan, J. E., S. E. Calvo, K. A. Borkovich, E. U. Selker, N. D.polyprotein in Yarrowia lypolytica (1e-44); YIL8, putative pro-Read et al., 2003 The genome sequence of the filamentoustein (TrEMBL no. Q94LP8) suggested to encode Gag or cap- fungus Neurospora crassa. Nature 422: 859–868.

sid-like proteins from LTR retrotransposons Oryza sativa (1e- Goodwin, T. J., and R. T. Poulter, 2000 Multiple LTR-retrotranspo-25); and STA1, putative protein (TrEMBL no. PO8640) sug- son families in the asexual yeast Candida albicans. Genome Res.gested to encode glucoamylase S1/S2 precursor (EC 3.2.1.3) 10: 174–191.(glucan 1,4-�-glucosidase) (1,4-�-d-glucan) in S. cerevisiae Graser, Y., M. Volovsek, J. Arrington, G. Schonian, W. Presber

et al., 1996 Molecular markers reveal that population structure(9e-06).

Page 13: Sequence Finishing and Gene Mapping for Candida albicans and … · The degree of synteny that we Identification and mapping of supercontigs on chro-identified did not provide gene

1537Finishing of C. albicans Chromosome 7

of the human pathogen Candida albicans exhibits both clonality Sadhu, C., M. J. McEachern, E. P. Rustchenko-Bulgac, J. Schmid,D. R. Soll et al., 1991 Telomeric and dispersed repeat se-and recombination. Proc. Natl. Acad. Sci. USA 93: 12473–12477.

Hull, C. M., R. M. Raisner and A. D. Johnson, 2000 Evidence for quences in Candida yeasts and their use in strain identification.J. Bacteriol. 173: 842–850.mating of the “asexual” yeast Candida albicans in a mammalian

host. Science 289: 307–310. Salzberg, S. L., A. L. Delcher, S. Kasif and O. White, 1998 Micro-bial gene identification using interpolated Markov models. Nu-Iwaguchi, S., M. Homma, H. Chibana and K. Tanaka, 1992 Isola-

tion and characterization of a repeated sequence (RPS1) of Can- cleic Acids Res. 26: 544–548.Sanyal, K., M. Baum and J. Carbon, 2004 Centromeric DNA se-dida albicans. J. Gen. Microbiol. 138: 1893–1900.

Jones, T., N. A. Federspiel, H. Chibana, J. Dungan, S. Kalman et quences in the pathogenic yeast candida albicans are all differentand unique. Proc. Natl. Acad. Sci. 101: 11374–11379.al., 2004 The diploid genome sequence of Candida albicans.

Proc. Natl. Acad. Sci. USA 101: 443–457. Tait, E., M. C. Simon, S. King, A. J. Brown, N. A. Gow et al., 1997A Candida albicans genome project: cosmid contigs, physical map-Keogh, R. S., C. Seoighe and K. H. Wolfe, 1998 Evolution of gene

order and chromosome number in Saccharomyces, Kluyveromyces ping, and gene isolation. Fungal Genet. Biol. 21: 308–314.Tzung, K. W., R. M. Williams, S. Scherer, N. Federspiel, T. Jonesand related fungi. Yeast 14: 443–457.

Lott, T. J., B. P. Holloway, D. A. Logan, R. Fundyga and J. Arnold, et al., 2001 Genomic evidence for a complete sexual cycle inCandida albicans. Proc. Natl. Acad. Sci. USA 98: 3249–3253.1999 Towards understanding the evolution of the human com-

mensal yeast Candida albicans. Microbiology 145 (Pt. 5): 1137– Vivier, M. A., M. G. Lambrechts and I. S. Pretorius, 1997 Coregu-1143. lation of starch degradation and dimorphism in the yeast Saccha-

Magee, B. B., and P. T. Magee, 2000 Induction of mating in Candida romyces cerevisiae. Crit. Rev. Biochem. Mol. Biol. 32: 405–435.albicans by construction of MTLa and MTLalpha strains. Science Wood, V., R. Gwilliam, M. A. Rajandream, M. Lyne, R. Lyne et al.,289: 310–313. 2002 The genome sequence of Schizosaccharomyces pombe. Nature

McHardy, A. C., A. Goesmann, A. Puhler and F. Meyer, 2004 De- 415: 871–880.velopment of joint application strategies for two microbial gene Xu, J., T. G. Mitchell and R. Vilgalys, 1999 PCR-restriction frag-finders. Bioinformatics 20: 1622–1631. ment length polymorphism (RFLP) analyses reveal both extensive

Pujol, C., J. Reynes, F. Renaud, M. Raymond, M. Tibayrenc et al., clonality and local genetic differences in Candida albicans. Mol.1993 The yeast Candida albicans has a clonal mode of reproduc- Ecol. 8: 59–73.tion in a population of infected human immunodeficiency virus-positive patients. Proc. Natl. Acad. Sci. USA 90: 9456–9459. Communicating editor: M. Sachs

Page 14: Sequence Finishing and Gene Mapping for Candida albicans and … · The degree of synteny that we Identification and mapping of supercontigs on chro-identified did not provide gene