

Proc. Natl. Acad. Sci. USA Vol. 89, pp. 1695-1699, March 1992
Biochemistry

Highly conserved repetitive DNA sequences are present at human centromeres

DEBORAH L. GRADY, ROBERT L. RATLIFF, DONNA L. ROBINSON, ERIN C. MCCANLIES, JULIANNE MEYNE, AND ROBERT K. MOYZIS*

Center for Human Genome Studies and Life Sciences Division, Los Alamos National Laboratory, University of California, Los Alamos, NM 87545

Communicated by Alexander Rich, November 5, 1991

ABSTRACT Highly conserved repetitive DNA sequence clones, largely consisting of (GGAAT)n repeats, have been isolated from a human recombinant repetitive DNA library by high-stringency hybridization with rodent repetitive DNA. This sequence, the predominant repetitive sequence in human satellites II and III, is similar to the essential core DNA of the Saccharomyces cerevisiae centromere, centromere DNA element (CDE) III. In situ hybridization to human telophase and Drosophila polytene chromosomes shows localization of the (GGAAT)n sequence to centromeric regions. Hyperchromicity studies indicate that the (GGAAT)n sequence exhibits unusual hydrogen bonding properties. The purine-rich strand alone has the same thermal stability as the duplex. Hyperchromicity studies of synthetic DNA variants indicate that all sequences with the composition (AATGN)n exhibit this unusual thermal stability. DNA-mobility-shift assays indicate that specific HeLa-cell nuclear proteins recognize this sequence with a relative affinity >10⁵. The extreme evolutionary conservation of this DNA sequence, its centromeric location, its unusual hydrogen bonding properties, its high affinity for specific nuclear proteins, and its similarity to functional centromeres isolated from yeast suggest that this sequence may be a component of the functional human centromere.

Up to 10% of the DNA of human chromosomes consists of tandem arrays of repetitive sequences localized at the centromere (1). These DNA arrays are known to consist of various copy numbers of α satellite (2), β satellite (3), and the three classic satellites I, II, and III (4). Although some or all of these repetitive sequences may be involved in centromeric function, there is no evidence, as yet, that the functional human centromere has been isolated.

Evolutionary conservation of a DNA sequence is a likely indication of functional importance. The human telomere sequence (TTAGGG)n was identified and cloned by screening for evolutionarily conserved repetitive DNA sequences (5). Further work on the human telomere indicated (i) its extreme conservation, present at least throughout the vertebrates (i.e., >400 million years old) (6); (ii) its occasional amplification, often at chromosome fusion points (7); and (iii) its ability to form unusual DNA structures (8). By analogy with the telomere sequence, we reasoned that other important DNA regions, such as those involved in centromere function, would be conserved. We report here the identification of another class of highly conserved human repetitive DNA sequences that may be a component of the functional human centromere.

MATERIALS AND METHODS

Construction of a Human Repetitive DNA Library, Library Screening, Sequencing, Oligomer Synthesis, Thermal Hyperchromicity, and in Situ Hybridization. All methods have been described (1, 5, 7, 9, 10).

DNA-Mobility-Shift Assays. Preparation of protein extracts and DNA-mobility-shift assays were conducted as described by Strauss and Varshavsky (11). HeLa-cell 0.35 M NaCl nuclear extracts (~3 μg) and ³²P-end-labeled DNA (~0.14 ng) were incubated in 50 mM NaCl at 37°C for 1 h to allow binding prior to electrophoresis on a low-ionic-strength 6% polyacrylamide gel. Nonspecific protein binding was controlled by the addition of sheared Escherichia coli DNA or poly[d(I-C)]. DNA-mobility-shift-gel autoradiographs were quantitated using a Visage 110 image analysis system.

RESULTS

Clone Isolation. A search for additional highly conserved repetitive DNA sequences was conducted using the methods used to identify the human telomere sequence (TTAGGG)n (5). The pHuR library (for plasmid human repeat) (5, 9) was screened with either hamster or mouse repetitive DNA, under conditions allowing only 85-100% identical sequences (depending on length) to cross-hybridize (5). Positive clones were counter-screened with radiolabeled (GT)25 and (TTAGGG)7 oligomers to eliminate clones containing these previously identified conserved repeats (5). Three clones were identified by screening with hamster repetitive DNA; one of these (pHuR98) has been reported (9). An additional five clones were identified by high-stringency hybridization to mouse, rather than hamster, repetitive DNA (GenBank accession nos. M77215-M77221).

The common sequence, shared by all eight clones, is the 5-nucleotide repeat (GGAAT)n and diverged related sequences. This sequence has been reported to be the core component of human satellites II and III (4). In addition, perfect and diverged CATCATCGA(A/G)T and CAACCCGA(A/G)T repeats, interspersed components of satellites II and III, respectively (4), are present in some of the clones (9). Zoo-blot analysis using clone pHuR98 or synthetic consensus oligomers indicated that cross-hybridizing sequences are present in all higher eukaryotic DNAs examined (Fig. 1), including vertebrates, insects, and plants.
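The clones described above are built from perfect (GGAAT) units mixed with diverged copies. As a rough illustration of how such a clone might be summarized, the sketch below counts perfect and one-mismatch copies of a 5-nt unit along one strand. It is not the authors' procedure: the function names, the rule of keeping the best of the five reading phases, and the definition of "diverged" as exactly one mismatch are assumptions made for this example, and the test string is hypothetical.

```python
# Minimal sketch: tally perfect and one-mismatch copies of a short tandem
# repeat unit. Assumptions (not from the paper): non-overlapping windows,
# best of the k possible reading phases, "diverged" = Hamming distance 1.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_repeat_units(seq: str, unit: str = "GGAAT") -> tuple[int, int]:
    """Return (perfect, one_mismatch) unit counts for the best reading phase."""
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    best = (0, 0)
    for phase in range(k):
        perfect = diverged = 0
        for i in range(phase, len(seq) - k + 1, k):
            d = hamming(seq[i:i + k], unit)
            if d == 0:
                perfect += 1
            elif d == 1:
                diverged += 1
        if perfect + diverged > sum(best):
            best = (perfect, diverged)
    return best


if __name__ == "__main__":
    # Hypothetical test string: three GGAAT units and one GGAGT variant.
    example = "GGAATGGAATGGAGTGGAAT"
    print(count_repeat_units(example))
    # The complementary strand carries the same array as ATTCC units.
    print(count_repeat_units(reverse_complement(example), "ATTCC"))
```

Both calls print (3, 1): the purine-rich strand read as GGAAT units and the complementary strand read as ATTCC units describe the same array.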

Interestingly, this conserved satellite sequence is similar to the central region of the yeast centromere sequence CDE III (Fig. 2). CDE III is the most critical component of the yeast centromere, based on sequence homology and directed mutational analysis (12, 13). Point mutations of the cytidines indicated in Fig. 2 abolish mitotic function (13). Nine nucleotides of the critical region identified by mutational experiments are shown in Fig. 2, aligned with the similar regions of human satellites II and III. Of these nine core nucleotides, eight are identical, with only a single thymidine missing in the human sequence. The probability of this short similarity occurring by chance is 6.1 × 10⁻⁵, or ~5000 times in human DNA. What is intriguing is that these sequences are located at human centromeric regions (see below) and are present in ~5000 times the expected abundance [1.2 × 10⁸ base pairs (bp) (4)]. It should be noted that all three classic human satellites have sequence similarities to yeast CDEs (Fig. 2) and that yeast CDE sequence similarities to other eukaryotic satellite DNAs were described after their initial identification (12).

Abbreviation: CDE, centromere DNA element.
*To whom reprint requests should be addressed.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

[Fig. 1 image: blot panels for the species named in the legend; size markers at 0.5, 1.0, 2.0, 4.0, and 8.0 kb]

FIG. 1. Conservation of the (GGAAT)n repetitive sequence. Representative DNAs from a variety of eukaryotic species were hybridized to a ³²P-labeled synthetic CAACCCGAGT(GGAAT)6 deoxyoligonucleotide [Sat III consensus sequence (4)]. E. coli DNA was used as carrier DNA to prevent competition with conserved repetitive sequences (5). Primate DNA filters were placed in separate hybridization bags to avoid intrafilter competition. Hybridization conditions were 15-20°C below the melting temperature for perfectly matched duplexes (5, 9). Positive hybridization was obtained with all 16 species tested, except Saccharomyces and E. coli. Pictured are hybridizations to human, orangutan, chicken, maize, Drosophila, sea urchin (Strongylocentrotus), and yeast (Saccharomyces) DNAs cut with either Sau3AI (left lane of each pair) or RsaI (right lane of each pair), electrophoresed through a 1% agarose gel, and blotted to nitrocellulose (5, 9). Exposure time was 4 h for primate DNA filters and 24 h for all other filters. Similar results were obtained with radiolabeled pHuR98 DNA (9) or (GGAATCAT)5 or (GGAAT)6 oligomers. kb, kilobase(s).

Chromosomal Localization of the Highly Conserved Repetitive Sequence. In situ hybridization with biotinylated satellite III consensus sequence oligomers gave prominent hybridization to the centromeric regions of human chromosomes, in addition to the adjacent heterochromatin regions of chromosomes 1, 9, 16, and Y (Fig. 3). By using synchronized cell populations, a greater fraction of telophase chromosomes was produced. As can be seen in Fig. 3A, the hybridization signals are directly at the centromeric constriction. Approximately 80% of the centromeres give distinct signals on metaphase or telophase chromosomes, similar to the efficiency obtained for human telomere sequences (5). Whether this represents random in situ hybridization efficiency, clearly the case for (TTAGGG)6 hybridization (5), or variable copy numbers of satellite II or III sequences on different human chromosomes is yet to be investigated. Previous in situ hybridization studies, using less-sensitive autoradiographic detection, indicated that at least half of the human chromosomes contain centromeric satellite II- or III-related sequences (15).

A biotinylated (ATTCC)6 oligomer hybridized weakly to the chromocenter (centromeric) region of Drosophila polytene chromosomes (data not shown). PCR amplification of Drosophila DNA using (ATTCC)6 as an oligomeric primer yielded a number of discrete bands. Hybridization of these PCR-amplified bands to Drosophila polytene chromosomes gave distinct hybridization to the chromocenter (Fig. 3B).

[Fig. 2 diagram: the yeast centromere (CDE I; CDE II, 78-86 bp, 87-95% AT; CDE III) aligned with similar human satellite II, satellite II-related, and satellite III sequences]

FIG. 2. Diagrammatic representation of the Saccharomyces centromere and similar human repetitive DNA sequences. The 111- to 119-bp consensus yeast centromere region is diagrammed, as originally determined by sequence similarity (12, 13). Three centromere DNA elements, designated CDE I (8 bp), CDE II (78-86 bp), and CDE III (25 bp), are shown, aligned along human repetitive DNAs with similar DNA sequences. Functional mutational analysis of the yeast centromere has indicated that CDE I plays a minor role in mitotic stability (13). Mutations in CDE I (open arrows) reduce mitotic stability <10-fold (i.e., 10⁻⁵ to 10⁻⁴). Only a portion of the 25-bp CDE III region, defined by sequence similarity, is essential for mitotic function (13). This central "core" region is (T/A)TG(A/T)TTTCCGAA, similar to the originally defined CDE III element (12). The remaining "conserved" nucleotides outside this "core" region appear to be functionally less important (13). Mutations in the two cytidine residues totally (large arrow) or significantly (>1000-fold, small arrow) eliminate mitotic function.

Thermal Stability. The G/C-strand asymmetry in (GGAAT)n is reminiscent of telomeric repeats that are capable of forming stable G·G base pairs (5, 8). Melting curves of the (GGAAT)6 repeat exhibit unusual properties (Fig. 4 and Table 1). The purine-rich strand alone has the same thermal stability as the duplex (Fig. 4). Gel electrophoresis studies indicate that the (GGAAT)6 oligomer migrates between the single-strand (ATTCC)6 oligomer and the (GGAAT)6·(ATTCC)6 duplex, suggesting that a fold-back or multistrand structure is present (data not shown). Oligodeoxynucleotides with substitutions of alternative nucleotides in the (GGAAT) repeating unit were synthesized, and melting curves were determined (Table 1). For most oligomers, either no melting curve or a gradual increase in absorbance as the temperature was increased, due to purine intrastrand unstacking, was observed (Table 1 and Fig. 4). The only substituted oligomers that exhibit similar thermal stabilities were (GCAAT)6, (GAAAT)6, (GTAAT)6, (CGAAT)4, and (GGCAT)6 (Table 1), the latter two being expected to be stably self-complementary utilizing normal G·C and A·T base pairs. Interestingly, these variants represent the most frequent variants actually found in cloned human satellite DNAs (GenBank release 66), accounting for >70% of the single-base variants. This observed frequency of base changes in satellite II and III sequences is very unlikely to be random (χ² test; P = 0.001).
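The substitution series above amounts to testing the single-base neighborhood of the GGAAT unit. The short sketch below enumerates that neighborhood (5 positions × 3 alternative bases = 15 units) and flags the five substituted units that the text reports as melting like (GGAAT)6. It is a generic illustration only: the function and set names are hypothetical, and it does not reproduce the oligomer lengths, the measured Tm values, or the χ² analysis.

```python
# Sketch: enumerate all single-base substitutions of a repeat unit and mark
# those the text reports as thermally stable. STABLE_IN_TEXT is copied from
# the Thermal Stability paragraph; everything else is illustrative.

BASES = "ACGT"

# Substituted units reported to melt like (GGAAT)6.
STABLE_IN_TEXT = {"GCAAT", "GAAAT", "GTAAT", "CGAAT", "GGCAT"}


def single_base_variants(unit: str) -> list[tuple[int, str]]:
    """Return (1-based position, variant) for every single-base change."""
    variants = []
    for i, original in enumerate(unit):
        for base in BASES:
            if base != original:
                variants.append((i + 1, unit[:i] + base + unit[i + 1:]))
    return variants


if __name__ == "__main__":
    variants = single_base_variants("GGAAT")
    print(len(variants), "single-base variants")
    for pos, v in variants:
        note = "reported stable" if v in STABLE_IN_TEXT else ""
        print(f"position {pos}: {v} {note}")
```

In the printed list, all three possible substitutions at position 2 are flagged, which is the observation that motivates the mixed (GNAAT)6 oligomer described next.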

Since (GCAAT)6, (GAAAT)6, and (GTAAT)6 exhibit high thermal stability, a mixed oligomer (GNAAT)6 was synthesized and its thermal stability was determined. This mixed oligomer exhibited the unusual stability originally obtained for (GGAAT)6 (Table 1). All combinations of base mismatch and pairing at the N position in the (AATGN)n repeat appear to be compatible with the observed thermal stability. A tandem array of the conserved unit of this repeat (AATG) was synthesized [i.e., (GAAT)8 (Table 1)]. It also exhibits high thermal stability.

A possible structure for (AATGN)n, consistent with these data, involves G·A base pairing between nucleotides on one or multiple strands. Base pairing in a helical turn would involve G·A base pairs bracketing two A·T base pairs, as follows:

5'-AATGGAATGG-3'
3'-GTAAGGTAAG-5'

The stability of the (GAAT)8 repeat (Table 1) can be adequately explained by the presence of such stable G·A base pairs, since oligomers with adjacent alternating G·A base pairs have been shown to be as stable as normal Watson-Crick duplexes (16). Whether this structure or more exotic alternatives are correct remains to be determined.

FIG. 3. In situ hybridization to human telophase and Drosophila polytene chromosomes. In situ hybridization was conducted in 2× standard saline citrate/30% (vol/vol) formamide at 37°C as described (5, 6, 9). Chromosomes were counterstained with propidium iodide (orange) after incubation with fluorescein-labeled avidin and one amplification with avidin antibody, to detect the biotinylated DNA (yellow). (A) Hybridization of biotinylated (GGAAT)6 to human telophase chromosomes. (B) Hybridization of biotinylated Drosophila DNA PCR products primed with (ATTCC)6 to Drosophila polytene chromosomes. PCR conditions were 30 cycles at 94°C for 1 min, 55°C for 2 min, and 72°C for 2 min with 5 μM primer, using standard protocols (GeneAmp Kit, Perkin-Elmer). The annealing temperature (55°C) was ~10°C below the melting temperature for perfectly matched duplexes (Table 1) and ~15°C above that of the heteroduplex of this primer with known Drosophila centromeric satellite sequences [i.e., (GAGAA)n; Table 1 and ref. 14].
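To make the proposed pairing register explicit, the sketch below lines up the two strands exactly as printed in the pairing diagram of the Thermal Stability section (bottom strand written 3'→5') and labels each column. The strand strings are copied from the text; the labels and function names are assumptions of this example. Read this way, the diagram gives four G·A columns and four Watson-Crick A·T columns, with G·G juxtapositions at columns 5 and 10; the text does not say how those two positions are accommodated.

```python
# Sketch: label each column of the antiparallel pairing diagram proposed for
# (AATGN)n. Strand strings are copied from the text; the bottom strand is
# written 3'->5' so column i of each string faces column i of the other.

TOP_5_TO_3 = "AATGGAATGG"
BOTTOM_3_TO_5 = "GTAAGGTAAG"

WATSON_CRICK = {frozenset("AT"), frozenset("GC")}


def classify(a: str, b: str) -> str:
    """Return 'WC' for A.T/G.C, 'GA' for a G.A pair, else e.g. 'G.G'."""
    pair = frozenset((a, b))
    if pair in WATSON_CRICK:
        return "WC"
    if pair == frozenset("GA"):
        return "GA"
    return f"{a}.{b}"


def tally(top: str, bottom: str) -> dict[str, int]:
    """Count column labels for two equal-length, aligned strands."""
    if len(top) != len(bottom):
        raise ValueError("strands must be the same length")
    counts: dict[str, int] = {}
    for a, b in zip(top, bottom):
        label = classify(a, b)
        counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    for col, (a, b) in enumerate(zip(TOP_5_TO_3, BOTTOM_3_TO_5), start=1):
        print(col, a, b, classify(a, b))
    print(tally(TOP_5_TO_3, BOTTOM_3_TO_5))
```

The final line prints {'GA': 4, 'WC': 4, 'G.G': 2}; the per-column output shows the G·A, A·T, T·A, G·A pattern repeating in each 5-nt unit.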

[Fig. 4 plot: A/A0 (1.0 to 1.3) versus temperature (°C), 40 to 80°C]

FIG. 4. Thermal hyperchromicity profiles of synthetic oligodeoxynucleotides. Hyperchromicity profiles of (GGAAT)6 (solid line), (ATTCC)6 (dashed and dotted line), (GGAAT)6·(ATTCC)6 duplexes (dashed line), and (GGATT)6 (dotted line) in 50 mM NaCl are shown (6). A/A0 is the ratio of observed absorption to initial absorption at 260 nm.

DNA-Mobility-Shift Assays. The clone pHuR98 consists of three divergent CAACCCGA(G/A)T sequences interspersed with GGAAT repeats (9). DNA-mobility-shift assays were conducted to determine whether this sequence binds nuclear proteins in a sequence-specific manner. Two discrete DNA-protein complexes were observed (Fig. 5). Evidence that these shifted DNA bands result from DNA-protein interactions included the following: (i) incubation with decreasing amounts of extract led to decreased signals, and (ii) digestion with proteinase K prior to DNA-protein binding eliminated the shifted bands (data not shown).

Kinetic analysis of the formation of these two DNA-mobility-shift complexes can be seen in Fig. 5A. After a 30-sec incubation, only the lower DNA-protein band has formed. With time, formation of the higher molecular weight band occurs. These results, as well as the salt and temperature dependence of complex formation (Fig. 5), suggest that the slower-migrating complex may be a multimer of the faster-migrating complex.

The relative formation of these specific DNA-protein complexes in the presence of increasing amounts of E. coli or poly[d(I-C)] DNA can be seen in Fig. 5B. The HeLa nuclear protein(s) responsible for the observed DNA mobility shift has a 10,400-fold greater affinity for the pHuR98 DNA sequence. If there is 1 to a maximum of 25 binding sites per pHuR98 DNA (assuming each GGAAT repeat is capable of binding a single protein) (9) and if there are no pseudosites in E. coli DNA, then the actual relative affinity is greater than 10⁵ to 2 × 10⁶ (17). Competition experiments using another clone consisting primarily of GGAAT repeats (pHuR94) show that this sequence can compete for the pHuR98 binding protein(s). Cloned Alu (1), α satellite (2), or telomere (5) repetitive DNA sequences do not compete for this protein(s) (data not shown).

Table 1. Hyperchromicity of synthetic oligodeoxynucleotides

Oligomer               Sequence                              Tm, °C
(GGAAT)6               Human satellite DNA                   65 (Fig. 4)
(ATTCC)6               Human satellite DNA                   - (Fig. 4)
(GGAAT)6·(ATTCC)6      Human satellite DNA duplex            65 (Fig. 4)
(AATGG)6               Human satellite DNA                   65
(GcAAT)6               Human satellite DNA variant           65
(GAAAT)6               Human satellite DNA variant           60
(GTAAT)6               Human satellite DNA variant           56
(GNAAT)6               Mixed oligomer                        62
(GNAAT)6·(ATTNC)6      Mismatched duplex                     58
(GGAAA)6
(GGAAc)6
(GGAAG)6
(GGAGT)6
(GGATT)6                                                     - (Fig. 4)
(GGAcT)6
(GGGAT)6
(GGTAT)6
(GGcAT)6
(GGTAA)6
(GGAAT)4               Human satellite DNA
(AGAAT)4
(AcAAT)4
(ATAAT)4
(AAAAT)4
(cGAAT)4
(CAAAT)4
(ccAAT)4
(CTAAT)4
(TGAAT)4
(TcAAT)4
(ttAAT)4
(TAAAT)4
(GGAA)8                                                      38
(GGAA)8·(TTCC)8        Duplex                                66
(GAAT)8                                                      56
(GATA)8                                                      32
(GTAG)8
(GGTA)8
(GAGAA)6               Drosophila satellite DNA
(GAGAA)6·(ATTCC)6      Drosophila-human mismatched duplex    40
(GGAAAT)5              Yeast CDE III polymer                 46
(GGAAAT)5·(ATTTCC)5    Yeast CDE III duplex                  62

The remaining Tm values printed for the (GGAAA)6 through (TAAAT)4 rows (44, 52, 69, 65, and 58) cannot be matched to individual rows in this copy.

Oligodeoxynucleotides 18-51 nucleotides long were synthesized, hybridized, and denatured in 50 mM NaCl. The thermal denaturation temperature (Tm) is taken at the last linear height increase in hyperchromicity (see Fig. 4). Nucleotide variations from the conserved (GGAAT) repeat are represented in small capital letters.

[Fig. 5 image: (A) gel lanes A-H; (B) relative amount of DNA-protein complex (0 to 1.5) versus mass excess of competitor DNA (100 to 100,000)]

DISCUSSION

Highly conserved human repetitive DNA sequences were isolated from a library constructed from randomly sheared and reassociated DNA (5, 9). This library contains a greater representation of sequences that are cut infrequently with restriction enzymes, such as centromeric repeats (9). Other than the (GT)n and (TTAGGG)n sequences reported previously (5), eight clones consisting predominantly of (GGAAT)n repeats were also obtained. These clones were isolated at high stringency with either mouse or hamster DNA, indicating that related sequences are present in these two rodent genomes. Southern blot, PCR, and in situ hybridization analyses confirmed the conservation of this sequence among diverse species (Figs. 1 and 3). These results indicate that this or closely related sequences are >1 billion years old, the last branch point between lineages of the organisms examined. This is the most conserved DNA component found at the human centromere, since α satellite sequences are present only in primates (18).
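The library is described as enriched for sequences that restriction enzymes cut infrequently, and the zoo blots in Fig. 1 use Sau3AI and RsaI digests. As a quick check of why a pure (GGAAT)n array would resist both enzymes, the sketch below searches for their recognition sequences (Sau3AI: GATC; RsaI: GTAC). It is an illustration under stated assumptions: it looks for exact sites only, ignores methylation sensitivity, relies on both sites being palindromic so that one strand suffices, and uses hypothetical test arrays rather than the cloned sequences.

```python
# Sketch: locate restriction-enzyme recognition sites in a DNA string.
# Assumptions: exact matches only; methylation effects ignored. Both sites
# below are palindromic, so scanning one strand finds every site.

SITES = {"Sau3AI": "GATC", "RsaI": "GTAC"}


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every (overlapping) match of site."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


if __name__ == "__main__":
    # Hypothetical arrays: a pure (GGAAT)40 tract, and the Sat III consensus
    # probe from Fig. 1, CAACCCGAGT(GGAAT)6.
    arrays = {
        "(GGAAT)40": "GGAAT" * 40,
        "CAACCCGAGT(GGAAT)6": "CAACCCGAGT" + "GGAAT" * 6,
    }
    for name, seq in arrays.items():
        for enzyme, site in SITES.items():
            print(name, enzyme, find_sites(seq, site))
```

Every line prints an empty list. Diverged copies in real arrays can create sites, so this says nothing about the specific fragment sizes seen in Fig. 1.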

In situ hybridization analysis localized the major clusters of this sequence to the centromeric region of human and Drosophila chromosomes (Fig. 3). In addition to the small signals at centromeric regions, major hybridization signals at the large heterochromatic blocks on human chromosomes 1, 9, 16, and Y were observed (Fig. 3). These regions are known to consist of large amplified blocks of the (GGAAT)n repeat (9). Some of these heterochromatic regions are recent amplifications during primate speciation, not present in homologous chimpanzee and gorilla chromosomes (19). Telomeric (TTAGGG)n repeats can undergo such periodic amplifications without apparent disruption of cellular function (7). A "core" of functional sequences is still present on each chromosome, even in the presence of presumably nonfunctional amplified blocks of the same sequence (7). Therefore, large variations in DNA copy number between species (or chromosomes) cannot be used as an argument for a lack of biological function, as has been suggested (20). It is interesting to note that the heterochromatic regions that contain amplified degenerate (GGAAT)n repeats (9, 14, 19) are both adjacent to the centromeric constriction, which also contains this sequence (Fig. 3), and found on chromosomes with a high incidence of nondisjunction and other mitotic/meiotic abnormalities (21). If the (GGAAT)n sequence is a component of the functional human centromere, then sporadic anomalous amplification might be expected to lead to these two results. Without an efficient mechanism to remove such amplifications (clearly lacking in the case of telomeric repeats; ref. 7), the cell might tolerate such "rusting hulks," especially since DNA "bulk" surrounding the centromere seems to be necessary for higher eukaryotic chromosome function (22, 23).

The unusual thermal stability of the (GGAAT)n repeat may be the result of unusually stable G·A base pairs (Fig. 4). The hyperchromicity experiments with variants of the (GGAAT)n repeat (Table 1) favor this interpretation. Stable G·A base pairing in other oligodeoxynucleotides has been reported (16, 24, 25) and is a common structural motif participating in the three-dimensional structure of tRNA (26). How stretches of this sequence can faithfully replicate, with the natural duplex having comparable stability to the purine-rich strand (Fig. 4), and whether a true duplex even exists under all physiological conditions are questions yet to be answered. Like telomeric sequences (5), this unusual DNA structure may represent another "code" utilized for an important biological function.

The ability of nuclear proteins to recognize this sequence with high specificity is consistent with this speculation (Fig. 5). The relative specificity observed in crude HeLa nuclear extracts (>10⁵) is comparable to other highly selective protein-DNA interactions, such as the lac repressor-operator DNA interaction (17) or the binding of the yeast CBF3 protein complex to CDE III (27). Whether the DNA sequence itself or a potentially unusual DNA structure is being recognized in these interactions is yet to be determined.

FIG. 5. DNA-mobility-shift analysis. ³²P-end-labeled pHuR98 insert DNA (158 nucleotides long) was incubated with 0.35 M NaCl extracts of HeLa-cell nuclear proteins (11). In A, increasing incubation times (lanes: A, 0 time; B, 30 s; C, 1.5 min; D, 3 min; E, 7.5 min; F, 15 min; G, 30 min; H, 1 h) yielded two discrete bands shifted to higher molecular weight. Salt concentrations >50 mM reduced the amount of the lower band with a concomitant increase of the upper band. Temperatures <37°C had the opposite effect. In B, the amount of DNA-protein complex at various competitor DNA concentrations was quantitated and normalized to the optimum conditions shown in A. •, E. coli DNA competitor; ○, poly[d(I-C)] competitor.

[Fig. 6 diagram: centromere domains (pairing, central, kinetochore) and the corresponding DNA domains]

Fig. 6 is a schematic diagram of the centromeric region of human chromosomes (23). It should be noted that a number of functional domains are present in this region, and each may contain hundreds of thousands of base pairs of DNA. For example, the kinetochore domain stretches along the entire outer surface of the centromere region (Fig. 6 and refs. 23 and 28). It is not known how the chromatin domains in the centromeric region are coiled or folded. It is thought, however, that only a component of the chromatin fiber extends out of the centromere region to the kinetochore plate (22). While DNA sequences responsible for interacting with the kinetochore may be interspersed with other "spacer" DNA present in the central region (Fig. 6 Upper right and ref. 28), it is also possible that sequences adjacent to the spacer DNA can be folded in such a manner as to be on the outside of the centromere region (Fig. 6 Lower right). Positioned as such, they would be ideally located to interact with kinetochore proteins. The latter model is consistent with the known linear molecular DNA organization of α satellite and classical satellite sequences, yet accounts for the observed concordant metaphase in situ hybridization patterns of these arrays (Fig. 3 and refs. 9 and 23).

Alternatively, small clusters of satellite sequences, difficult to detect by hybridization in the presence of large amplified satellite blocks, may be interspersed in α-satellite arrays. Such simple-sequence satellites would not be cut by enzymes that free the α-satellite sequences as discrete blocks. Both models predict that the bulk of DNA sequences present in the centromeric region may only be important to "space" the kinetochore and pairing DNA domains in the correct orientation. Either model is consistent with the rapid turnover of most DNA sequences at the centromere [i.e., α satellite at human centromeres (2), mouse satellite at mouse centromeres (29), telomeric (TTAGGG)n tracts at many other vertebrate centromeres (7), etc.], yet proposes that a "core" of conserved sequences is maintained that represents the actual kinetochore (and possibly pairing) domains (Fig. 6). We propose that the conserved (GGAAT)n repeat and/or its interspersed CATCATCGA(A/G)T and CAACCCGA(A/G)T sequences may represent such a component.

FIG. 6. Diagrammatic representation of human centromere domains. DNA regions responsible for interacting with the kinetochore may be either interspersed with central-domain DNA (Upper right) or adjacent to central-domain DNA (Lower right). In either case, chromatin folding in the centromeric region could place these DNA regions on the outer surface of the chromosome, in place to interact with kinetochore proteins.

This work was supported by grants from the U.S. Department ofEnergy to R.K.M.

1. Moyzis, R. K., Torney, D. C., Meyne, J., Buckingham, J. M., Wu, J.-R., Burks, C., Sirotkin, K. M. & Goad, W. B. (1989) Genomics 4, 273-289.
2. Willard, H. F. & Waye, J. S. (1987) Trends Genet. 3, 192-198.
3. Waye, J. S. & Willard, H. F. (1989) Proc. Natl. Acad. Sci. USA 86, 6250-6254.
4. Prosser, J., Frommer, J., Paul, C. & Vincent, P. C. (1986) J. Mol. Biol. 187, 145-155.
5. Moyzis, R. K., Buckingham, J. M., Cram, L. S., Dani, M., Deaven, L. L., Jones, M. D., Meyne, J., Ratliff, R. L. & Wu, J.-R. (1988) Proc. Natl. Acad. Sci. USA 85, 6622-6626.
6. Meyne, J., Ratliff, R. L. & Moyzis, R. K. (1989) Proc. Natl. Acad. Sci. USA 86, 7049-7053.
7. Meyne, J., Baker, R. J., Hobart, H. H., Hsu, T. C., Ryder, O. A., Ward, O. G., Wiley, J. E., Wurster-Hill, D. H., Yates, T. L. & Moyzis, R. K. (1990) Chromosoma 99, 3-10.
8. Williamson, J. R., Raghuraman, M. K. & Cech, T. R. (1989) Cell 59, 871-880.
9. Moyzis, R. K., Albright, K. L., Bartholdi, M. F., Cram, L. S., Deaven, L. L., Hildebrand, C. E., Joste, N. E., Longmire, J. L., Meyne, J. & Schwarzacher-Robinson, T. (1987) Chromosoma 95, 375-386.
10. Riethman, H. C., Moyzis, R. K., Meyne, J., Burke, D. T. & Olson, M. V. (1989) Proc. Natl. Acad. Sci. USA 86, 6240-6244.
11. Strauss, F. & Varshavsky, A. (1984) Cell 37, 889-901.
12. Fitzgerald-Hayes, M., Clarke, L. & Carbon, J. (1982) Cell 29, 235-244.
13. Carbon, J. & Clarke, L. (1990) New Biol. 2, 10-19.
14. Lohe, A. R. & Brutlag, D. L. (1986) Proc. Natl. Acad. Sci. USA 83, 696-700.
15. Gosden, J. R., Mitchell, A. R., Buckland, R. A., Clayton, R. P. & Evans, H. J. (1975) Exp. Cell Res. 92, 148-158.
16. Li, Y., Zon, G. & Wilson, W. D. (1991) Biochemistry 30, 7566-7572.
17. von Hippel, P. H. & Berg, O. G. (1986) Proc. Natl. Acad. Sci. USA 83, 1608-1612.
18. Maio, J. J., Brown, F. L. & Musich, P. R. (1981) Chromosoma 83, 103-125.
19. Miller, D. A. (1977) Science 198, 1116-1124.
20. John, B. & Miklos, G. L. G. (1979) Int. Rev. Cytol. 58, 1-114.
21. Boué, A., Boué, J. & Gropp, A. (1984) Adv. Hum. Genet. 14, 1-57.
22. Rattner, J. B. (1987) Chromosoma 95, 175-181.
23. Pluta, A. F., Cooke, C. A. & Earnshaw, W. C. (1990) Trends Biochem. Sci. 15, 181-185.
24. Privé, G. G., Heinemann, U., Chandrasegaran, S., Kan, L. S., Kopka, M. L. & Dickerson, R. E. (1987) Science 238, 498-504.
25. Li, Y., Zon, G. & Wilson, W. D. (1991) Proc. Natl. Acad. Sci. USA 88, 26-30.
26. Rich, A. & RajBhandary, U. L. (1976) Annu. Rev. Biochem. 45, 805-860.
27. Lechner, J. & Carbon, J. (1991) Cell 64, 717-725.
28. Zinkowski, R. P., Meyne, J. & Brinkley, B. R. (1991) J. Cell Biol. 113, 1091-1110.
29. Pardue, M. L. & Gall, J. G. (1970) Science 168, 1356-1358.
