

Proc. Natl. Acad. Sci. USA Vol. 86, pp. 9389-9393, December 1989 Genetics

Single-sperm typing: Determination of genetic distance between the Gγ-globin and parathyroid hormone loci by using the polymerase chain reaction and allele-specific oligomers

(genetic recombination/linkage)

XIANGFENG CUI*, HONGHUA LI*, TUSHAR M. GORADIA†‡, KENNETH LANGE†, HAIG H. KAZAZIAN, JR.§, DAVID GALAS*, AND NORMAN ARNHEIM*¶

*Molecular Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089-1340; †Department of Biomathematics, University of California School of Medicine, Los Angeles, CA 90024-1766; ‡Division of Health Sciences and Technology, Harvard University-Massachusetts Institute of Technology, Cambridge, MA 02139; and §Department of Pediatrics, Johns Hopkins University, School of Medicine, Baltimore, MD 21205

Communicated by Elizabeth F. Neufeld, August 8, 1989 (received for review June 14, 1989)

ABSTRACT The frequency of recombination between the Gγ-globin (HBG2) and parathyroid hormone (PTH) loci on the short arm of human chromosome 11 was estimated by typing >700 single-sperm samples from two males. The sperm-typing technique employed involves the polymerase chain reaction and allele-specific oligonucleotide hybridization. Our maximum likelihood recombination fraction estimate of 0.16 (95% confidence interval, 0.13-0.19) falls well within previous estimates based on family studies. With current technology and a sample size of 1000 sperm, recombination fractions down to ≈0.009 can be estimated with statistical reliability; with a sample size of 5000 sperm, this value drops to about 0.004. Reasonable technological improvements could result in the detection of recombination frequencies <0.001.

The determination of genetic linkage between polymorphic markers in humans has traditionally depended upon pedigree analysis. Such studies have a number of inherent difficulties when compared to linkage analysis in experimental organisms. Humans of the appropriate genotypes cannot be crossed at will. In those crosses that are available, linkage phase is often unknown; this uncertainty blurs the distinction between recombinants and nonrecombinants. Finally, the difficulty in obtaining large numbers of offspring from families informative for a particular set of genetic markers effectively limits statistically reliable recombination frequency estimates to 0.01 or 0.02 (i.e., 1 or 2 centimorgans).

A radically different approach to measuring recombination frequency has been suggested (1) that has a number of advantages when compared to traditional family studies. This approach has, as its basis, the direct molecular analysis of DNA sequences in single meiotic products. The method uses the polymerase chain reaction (PCR) (2-4) to amplify polymorphic DNA sequences at two or more genetic loci in a single sperm. Analysis of single human sperm by this typing procedure has already demonstrated Mendelian segregation of alleles at each of two loci and independent assortment of genes on nonhomologous chromosomes (1). With sperm typing, a direct count of recombinant and nonrecombinant meiotic products can be made and used to estimate recombination frequency. Since a single human semen sample contains >300 million sperm, this approach is not limited by sample-size considerations and is capable of high-resolution genetic mapping.

Before embarking on studies to measure very small recombination fractions using the sperm-typing approach, we thought it prudent to estimate the frequency of recombination between two loci whose genetic distance from one another had already been determined by pedigree analysis. We have, therefore, estimated by sperm typing the recombination fraction between the parathyroid hormone gene locus (PTH) and the Gγ-globin gene locus (HBG2) on the short arm of human chromosome 11. In these experiments, we examined 708 sperm samples from two males. We have also developed a mathematical approach for analyzing sperm-typing data and discuss the level of gene-mapping resolution currently made possible using this approach.

MATERIALS AND METHODS

Sperm Isolation. Sperm were purified from semen samples and single sperm were isolated manually by micromanipulation using methods described (1) with the following modifications: (i) Pyrex (9800) tubes rather than plastic tubes appeared to decrease sperm adherence to the tube; (ii) 25 µl of human semen was diluted 1:8 in water; (iii) fractionation was carried out on a step-gradient consisting of 300 µl of 50%, 150 µl of 40%, and 150 µl of 30% (wt/vol) sucrose; and (iv) centrifugation was for 10 min at room temperature at 250 × g followed by collection of the sperm band in the 50% fraction.

Polymorphic Markers. Restriction fragment length polymorphisms have been found in both the PTH and Gγ-globin loci. At the PTH locus, there is a Taq I restriction fragment length polymorphism located in the second intron due to a G → A transition (ref. 5 and T. Igarashi and H. Kronenberg, personal communication). At the Gγ-globin locus, a HindIII restriction fragment length polymorphism caused by a T → G transversion (6) exists 82 base pairs upstream of the third exon. The PCR primers for PTH amplification were 5'-GATCTCTTCCTGGGAAGAAG-3' and 5'-GATACCTGCAAAAGACATGG-3'. The primers for the Gγ-globin locus were 5'-AGTGACTAGTGCTGCAAGAA-3' and 5'-CTCTGCATCATGGGCAGTGA-3'. The allele-specific oligonucleotide probes for the PTH locus were 5'-TCCCCACTTCGAAATGATA-3' (allele A) and 5'-TCCCCACTTTGAAATGATA-3' (allele a), and the probes for Gγ-globin were 5'-TTCTGGGTGGAAGCTTGGT-3' (allele B) and 5'-TTCTGGGTGGAAGCTGGGT-3' (allele b).

Sperm Lysis and PCR Conditions. Some major modifications have been made to our original sperm-typing procedure (1). Instead of lysing sperm with proteinase K and SDS, we now carry out lysis of the sample (about 1 µl) with 2.5 µl of 200 mM KOH/50 mM dithiothreitol for 10 min at 65°C. After the addition of 2.5 µl of 300 mM KCl/900 mM Tris·HCl, pH 8.3/200 mM HCl, the neutralized sample is brought to 25 µl with PCR reagents. The final concentration of all components is: 1× PCR buffer [50 mM KCl/2.5 mM MgCl2/gelatin (100 µg/ml)/100 mM Tris·HCl, pH 8.3]; all four dNTPs, each at 187.5 µM; the four primers (two sets), each at 1 µM; Escherichia coli DNA (optional) (1 ng/µl); and 0.5 unit of Thermus aquaticus (Taq) polymerase. After 20 cycles (95°C, 15-sec denaturation/60°C, 15-sec annealing/72°C, 1-min extension) of PCR, each sample was divided into two 12.5-µl portions and to each aliquot was added 37.5 µl containing 1× PCR buffer; all four dNTPs, each at 250 µM; one of the two sets of primers, each primer at 1 µM; and 1 unit of Taq polymerase. All reaction mixtures were overlaid with mineral oil, and 50 cycles of PCR were carried out as before except that different annealing temperatures (PTH at 54°C and Gγ-globin at 65°C) were used. After PCR amplification, each sperm sample was typed with each of the four allele-specific probes (two for each locus) as described (1), except that the final washing temperatures were: 58°C for allele A, 56°C for allele a, 61°C for allele B, and 63°C for allele b.

Abbreviations: PCR, polymerase chain reaction; df, degree(s) of freedom; SE, standard error.
¶To whom correspondence should be sent.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

MODEL DESCRIPTION

By assuming fully efficient PCR amplification so that every allele present in a sample is detected, exactly one normal haploid sperm per sample, and no contamination by exogenous DNA, we would expect to observe only sperm of the two parental or two recombinant types. Because these assumptions can fail in practice, there are 2^4 = 16 possible typing outcomes corresponding to a sample reacting positively or negatively with each of the four allelic probes. If the first locus has alleles A and a and the second locus has alleles B and b, then, for example, the nonstandard outcome AaB-, representing the detection of alleles A, a, and B but not b, is possible when two sperm having genotypes AB and aB enter a tube. The actual outcomes we observed are shown in Table 1.

To account for the full complexity of our data, we postulate a general model (model 1) containing 14 parameters: (i) the recombination fraction θ between the loci; (ii) the probability γ_n of n sperm present in a tube, n = 0, 1, 2, 3, 4; (iii) the probability that a particular polymorphic region will be amplified to a detectable level (four parameters α_A, α_a, α_B, and α_b); and (iv) the probability of a contamination event (four parameters β_A, β_a, β_B, and β_b). Appendix I illustrates how the probabilities of each of the 16 outcomes can be calculated as functions of these parameters. These 16 probabilities are necessarily complicated because some errors can be masked by a second compensating error.

Table 1. Typing 708 single sperm samples at the PTH and Gγ-globin loci using four allele-specific oligonucleotide probes

                        Sperm samples, no.
Observed sperm type   Donor 1   Donor 2   Total
 1 (----)                47        42       89
 2 (---b)                 4         5        9
 3 (--B-)                 6         5       11
 4 (--Bb)                 0         1        1
 5 (-a--)                15         3       18
 6 (-a-b)                18        15       33
 7 (-aB-)               103       124      227
 8 (-aBb)                 4        11       15
 9 (A---)                11        10       21
10 (A--b)                89       107      196
11 (A-B-)                26        21       47
12 (A-Bb)                 3         6        9
13 (Aa--)                 0         0        0
14 (Aa-b)                 3         3        6
15 (AaB-)                 4         6       10
16 (AaBb)                 8         8       16

The probe sequences corresponding to the PTH alleles (A and a) and Gγ-globin alleles (B and b) are given in Materials and Methods.

Finally, we made the additional plausible assumption that each sampled tube is statistically independent of the other tubes. This means the sperm-typing data are multinomial, and standard maximum likelihood techniques can be used to estimate parameters and test hypotheses. One can also test the overall validity of the general model and submodels by the usual χ² statistics.
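The 16 typing outcomes described above can be enumerated mechanically. The following is a minimal Python sketch, not part of the original analysis; the function name `typing_outcomes` is illustrative, and the ordering is chosen so that it matches the category numbering of Table 1.

```python
from itertools import product

ALLELES = ("A", "a", "B", "b")  # PTH alleles A/a, G-gamma-globin alleles B/b

def typing_outcomes():
    """Return the 2**4 = 16 outcome labels, e.g. 'AaB-' (A, a, B detected; b not)."""
    labels = []
    # Each probe either reacts (True) or does not (False) with the sample.
    for detected in product((False, True), repeat=len(ALLELES)):
        labels.append("".join(x if hit else "-" for x, hit in zip(ALLELES, detected)))
    return labels

outcomes = typing_outcomes()
for number, label in enumerate(outcomes, start=1):
    print(number, label)
```

Under the multinomial assumption, the observed counts in these 16 cells are all that the likelihood depends on.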

RESULTS

We have elected to maximize likelihoods by a variant of Fisher's scoring algorithm that employs iteratively reweighted least squares (7, 8). Asymptotic standard errors (SEs) of the parameter estimates are then immediately available as a by-product of inverting the expected information matrix (9). Table 2 presents the maximum likelihood estimates and their asymptotic SEs for the data from each donor separately and for the combined data under model 1.

Altogether, we analyzed 708 sperm samples: 341 from donor 1 (35 years old) and 367 from donor 2 (50 years old). The recombination fractions estimated for donors 1 and 2 are 0.1923 and 0.1361 with SEs of 0.0253 and 0.0216, respectively. In the combined data the estimated recombination fraction is 0.1618 with an SE of 0.0168. The average efficiency (α) of detecting an allele actually present in a tube is ≈0.95 in the combined data. This is considerably higher than the efficiency reported in the original data of Li et al. (1) and is most probably the result of the modified lysis and PCR procedures.

Table 2. Model 1 maximum likelihood estimates of the parameters θ, α, β, and γ for the donors both individually and combined

Parameter   Donor 1             Donor 2             Donors 1 + 2
L           -678.584            -699.174            -1387.079
α_A         0.9650 (0.0272)     0.9773 (0.0255)     0.9731 (0.0187)
α_a         0.9581 (0.0291)     0.9743 (0.0223)     0.9679 (0.0181)
α_B         0.8989 (0.0340)     0.9947 (0.0175)     0.9483 (0.0205)
α_b         0.8920 (0.0343)     0.9112 (0.0309)     0.9012 (0.0243)
β_A         0.0                 0.02701 (0.0209)    0.01403 (0.0151)
β_a         0.0                 0.01180 (0.0158)    0.007239 (0.0131)
β_B         0.01385 (0.0224)    0.03877 (0.0238)    0.02839 (0.0169)
β_b         0.004351 (0.0205)   0.06328 (0.0253)    0.03698 (0.0170)
θ           0.1923 (0.0253)     0.1361 (0.0216)     0.1618 (0.0168)
γ_0         0.1374 (0.0196)     0.1304 (0.0203)     0.1352 (0.0145)
γ_1         0.7663 (0.0286)     0.8070 (0.0282)     0.7878 (0.0218)
γ_2         0.09624 (0.0246)    0.06257 (0.0244)    0.07705 (0.0205)
γ_3         0.0                 0.0                 0.0
γ_4         0.0                 0.0                 0.0

Entries are maximum likelihood estimates. Values in parentheses are SE. L, maximum log likelihood.

Our data are consistent with little sample contamination. The average maximum likelihood estimate for a contamination parameter in our combined data is only 0.022. Moreover, three out of four of the 95% confidence intervals estimated for these β values overlap 0.0. (We form confidence intervals by taking 2 SDs on either side of the maximum likelihood estimates.) Direct experiments to estimate these contamination rates detected no signals with any of the four allele-specific oligonucleotide probes in a series of 63 tubes that received all the PCR reagents with the exception that water was added instead of a sperm sample. These control experiments provide statistically significant evidence of an overall contamination rate of <0.05.
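The <0.05 bound from the blank-tube controls can be checked with one line of arithmetic. The sketch below assumes, as an interpretation not spelled out in the text, that the bound refers to the per-tube probability of any contaminating signal and that the 63 blank tubes are independent; the variable names are illustrative.

```python
# If the per-tube contamination rate were as high as 0.05, how likely is it
# that all 63 independent blank tubes would show no signal at all?
n_blank_tubes = 63
rate = 0.05
p_all_clean = (1.0 - rate) ** n_blank_tubes
print(f"P(all {n_blank_tubes} tubes clean | rate = {rate}) = {p_all_clean:.4f}")
```

Because this probability falls below 0.05, a contamination rate of 0.05 or more would be rejected at the 5% level, which is consistent with the statement above.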

Analysis of the combined data also indicates that the vast majority of the samples (0.79) received a single sperm after micromanipulation. Among the remainder were tubes with zero or two sperm at frequencies of 0.13 and 0.08, respectively. No evidence was found for tubes with more than two sperm. Thus, parameters γ_3 and γ_4 were dropped when additional models were considered. Sperm chromosome aneuploidy will have little effect on the frequency of tubes that appear to have zero or two sperm since the estimated average rate for any particular chromosome is ≈0.001 (10).

The difference between the estimated recombination fractions of donors 1 and 2 is probably compatible with their SEs. A formal assessment of heterogeneity between the donors can be made using a likelihood ratio test. The statistic

$$2(L_1 + L_2 - L_{1+2}) = 2(-678.6 - 699.2 + 1387.1) = 18.6,$$

based on the maximum log likelihoods given in Table 2, follows an approximate χ²_df distribution with degrees of freedom (df) = 12 + 12 - 12 = 12 determined by the difference in numbers of independent parameters. The borderline P value of 0.10 for this test owes as much to the discrepant contamination parameters (β) as it does to the difference in estimated recombination fractions (θ). We tentatively conclude that no significant heterogeneity exists between the donors. Other, more extensive, experiments would be needed to confirm or refute this claim.

Appendix II attempts to reduce model 1 to the most parsimonious submodel consistent with the combined data from both donors. The submodels described there are derived from the general model by imposing various constraints on the α and β parameters. For instance, it may be adequate to assume that the efficiency parameters (α) are locus specific (α_A = α_a and α_B = α_b). We consider six models and conclude that model 5, which assumes α_A = α_a, α_B = α_b, and β_A = β_a = β_B = β_b, is the most parsimonious submodel consistent with the data. Since every parameter except θ is a nuisance parameter, the necessity of choosing between the general model and a more parsimonious submodel becomes an issue only if it affects either the estimate θ̂ of θ or its SE. Fortunately, under model 5, θ̂ is 0.1609 with SE = 0.0166; this barely differs from the θ̂ of 0.1618 with SE = 0.0168 for model 1. Thus, in these data it is a matter of indifference whether the general model or one of the simpler consistent submodels is used.

In fact, a rough analysis of the data agrees with the results of both models 1 and 5. By disregarding sperm categories that are explicitly erroneous, we can estimate θ very simply. In Table 1, all categories except 6, 7, 10, and 11 correspond to tubes with obvious errors. Tubes in categories 6, 7, 10, and 11 ostensibly possess one allele at each locus. In the absence of errors, the usual maximum likelihood counting estimate for θ is θ̂ = (n_ab + n_AB)/n = 80/503 = 0.159, where, for example, n_ab is the number of observed sperm having the recombinant haplotype ab and n is the total number of sperm attributed to all four categories. In the absence of any errors, the SE of this estimate is [θ̂(1 - θ̂)/n]^(1/2) = 0.0163. Such a close match between the above crude estimate and the more sophisticated estimates of models 1 and 5 should not necessarily be expected in other data sets.
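The crude counting estimate is easy to reproduce. The following is a short Python sketch using the combined totals for categories 6, 7, 10, and 11 of Table 1; the variable names are illustrative, and the calculation ignores all error processes, exactly as the rough analysis does.

```python
import math

# Combined (donor 1 + donor 2) counts for the four apparently error-free categories.
counts = {
    "-a-b": 33,   # category 6, recombinant haplotype ab
    "-aB-": 227,  # category 7, nonrecombinant haplotype aB
    "A--b": 196,  # category 10, nonrecombinant haplotype Ab
    "A-B-": 47,   # category 11, recombinant haplotype AB
}

n = sum(counts.values())
n_recombinant = counts["-a-b"] + counts["A-B-"]

theta_hat = n_recombinant / n                        # counting estimate of theta
se = math.sqrt(theta_hat * (1.0 - theta_hat) / n)    # binomial standard error

print(f"n = {n}, recombinants = {n_recombinant}")
print(f"theta_hat = {theta_hat:.3f}, SE = {se:.4f}")
```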

DISCUSSION

Our estimates of the recombination fraction can be compared to those obtained by others using family studies. After correction and submission for locus-ordering studies by Genetics Analysis Workshop IV (19), the original PTH-β-globin linkage data of Antonarakis et al. (11) gave an estimated male recombination fraction around 0.12 with an SE of 0.05 [for instance, see Wong et al. (12)]. These estimates are entirely consistent with our estimated male recombination fraction of 0.161. The abstract by Leppert et al. (13) also gives a preliminary map of the chromosome 11p region. This map depicts a male recombination fraction of 0.17 for the PTH-β-globin interval, with no SE listed. Thus, the known pedigree data are in good agreement with the sperm-typing data. The obvious advantage of the sperm-typing data is the increased precision afforded by the small SE (0.016). This additional precision stems from the larger number of meiotic products analyzed and the more certain knowledge of phase.

It is clear from our experiments and data analysis that high-resolution measurements of recombination frequency simultaneously depend upon high efficiencies of detecting an allele if present in a tube (α), on low frequencies of contamination (β), and on a low frequency of tubes with zero or more than one sperm (γ). If these errors did not exist, then estimates of the recombination fraction between genetic loci could be made with a high degree of statistical accuracy that would solely be a function of the number of sperm as predicted by large sample statistical theory. The fact that experimental errors can occur demands that more than the expected number of samples be studied to achieve the same statistical reliability for a given level of recombination. How much larger the sample size has to be depends upon the frequency of errors. Fig. 1 plots the relationship between θ and the asymptotic SE of θ̂ for a sample size of 1000 sperm under a number of different conditions. Curve A indicates this relationship if there were no errors of the kind we have discussed. Three additional curves are shown. In curve C the parameters are set at the maximum likelihood estimates derived from our combined data. Curve B reflects the relationship between θ and the SE of θ̂ under conditions where we have reduced the contamination and multiple sperm error rates and increased the probability of detection to levels that we feel will be achievable with some incremental technological advances. Curve D demonstrates what the relationship would be if the various error rates were markedly increased.

[Fig. 1 plot not reproduced. Horizontal axis: recombination fraction (on log scale), 0.0001 to 1. Curves A-D.]

FIG. 1. The relationship between θ and the SE of θ̂ based on the frequency of various errors and a sample size of 1000 sperm. All calculations presuppose model 5. Curve A assumes no errors; curve B assumes a common efficiency rate α = 0.98, a common contamination rate β = 0.002, and sperm distribution probabilities γ_0 = 0.01, γ_1 = 0.98, and γ_2 = 0.01; curve C assumes the model 5 maximum likelihood estimates from the combined data, namely efficiency rates α_A = α_a = 0.97, α_B = α_b = 0.94, a common contamination rate β = 0.02, and sperm distribution probabilities γ_0 = 0.14, γ_1 = 0.79, and γ_2 = 0.07; and curve D assumes a common efficiency rate α = 0.85, a common contamination rate β = 0.10, and sperm distribution probabilities γ_0 = 0.25, γ_1 = 0.55, and γ_2 = 0.20.
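Only curve A (no errors) has a simple closed form, namely the binomial SE already used for the counting estimate. The sketch below is a hedged illustration of that error-free case only; the function names are hypothetical, and curves B-D would require the full model 5 information matrix, which is not reproduced here. It also computes the smallest θ whose two-SE interval (the convention adopted in Results) excludes zero, which follows from θ - 2·SE > 0.

```python
import math

def se_no_error(theta, n):
    """Error-free (curve A) standard error of the counting estimate."""
    return math.sqrt(theta * (1.0 - theta) / n)

def smallest_resolvable_theta(n, z=2.0):
    """Smallest theta with theta - z*SE > 0 in the error-free case.

    theta > z*sqrt(theta*(1-theta)/n)  <=>  theta > z**2 / (n + z**2).
    """
    return z * z / (n + z * z)

for n in (1000, 5000):
    t = smallest_resolvable_theta(n)
    print(f"n = {n}: threshold theta = {t:.5f}, SE at threshold = {se_no_error(t, n):.5f}")
```

These error-free thresholds are lower bounds on what any real experiment can achieve; with the error rates actually observed, the resolvable recombination fraction is necessarily larger.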


In general, the asymptotic SE of θ̂ is inversely proportional to n^(1/2), where n is sample size. With the levels of error experienced in this study and a sperm sample size of 1000, we should be able to measure recombination rates down to 0.009 in the sense that the 95% confidence interval for a true recombination fraction of 0.009 just barely excludes 0.0. Flow cytometry for sperm isolation will facilitate the analysis of larger sample sizes. For a sperm sample size of 5000, we should be able to measure recombination rates down to 0.004. Incremental technological improvements that further lower the errors could result in an even higher level of resolution. For example, substituting flow cytometry for micromanipulation should reduce the number of sperm samples containing more than one sperm. Additional refinements of the lysis procedure and the PCR conditions as well as enhanced detection of PCR product will increase the α values. Thus, if the parameters given in Fig. 1, curve B, could be realized, then recombination rates down to 0.001 could be measured with a sample size of 5000 sperm.

The sperm-typing approach provides a unique tool for the study of many problems in human genetics currently considered intractable. Sperm typing will allow investigation of the relationship between recombination frequency and physical distance for specific chromosomal regions. In conjunction with gel electrophoresis procedures for large DNA fragments and chromosome-walking data, it will be possible to measure the frequency of recombination between genetic markers whose physical distance apart is known exactly; this will be especially useful in the precise localization of recombination hot spots.

Sperm typing will also provide a unique approach to determining the chromosomal order of DNA polymorphisms that are so tightly linked that they cannot be resolved by pedigree analysis. Because larger numbers of meiotic products can be examined, gene order can readily be determined by three-point crosses. This approach does not demand extremely high-resolution estimates of recombination fractions since gene order is deduced by determining which gamete class represents double crossovers. Optimal mapping strategies using sperm typing are considered in refs. 14 and 15.

Because it is now possible to obtain data on recombination frequencies from a single individual, the question of whether different males have the same or different rates of recombination for the same chromosomal interval and whether the rate for a specific interval changes with age can be addressed. There is very little information available on these effects (16-18). Such information could be valuable for genetic counseling. Although our experiments were not designed to specifically study these questions, our results suggest that sperm typing should make it possible to obtain definitive answers.

In conclusion, the ability to analyze DNA sequences in single sperm can provide unique approaches to a number of fundamental problems in human genetics.

APPENDIX ITo perform maximum likelihood, it is necessary to computethe probability of each outcome category. Each category orcell corresponds to a particular subset of the four alleles (A,a, B, and b) being detected. Because different errors cansometimes compensate for one another, determination of acell probability is not entirely straightforward. We proceedby the roundabout path of defining indicator random varia-bles that signal the nondetection of a particular allele in atube. Specifically, let

f 0 if allele A is detected in the tube)A = 11 otherwise,

and define Xa, XB, and Xb similarly.At this point we focus on the typical outcome category

AaB-. This allows us to illustrate the general procedurewithout tedious repetition of detail. By standard inclusion-exclusion arguments, we have the relationship

Prob{detect A, a, and B, but not b}

= E[(1 - XA)(1 - Xa)(1 -XB)Xb]

= E(xb) - E(VAXb) - E(Vab) - E(XBXb)

+ E(,AXaXb) + E(XAXBXb) + E(XaXBXb)

- E(XAXaXBxb),

involving expectations (E) of products of the indicator ran-dom variables. Since our model postulates that the contam-ination process is independent of the ordinary nondetectionof alleles, the expectations in Eqs. 1 can be decomposed byfactoring out the contamination events and conditioning onthe number of sperm present in the tube. This leads to, forexample,

E(XAXaXb)=Prob{no contamination by A, a, b}

X E(XAXaXblno contamination by A, a, b)

= (1 - /3A)(1 - 13a)(1 - Pb)4

x Z [E(CVpy.aXbln sperm in tube, no contamination)n=O

x Prob{n sperm in tube}]

4= (1 -PA)(1 -_ 3a)(1 -8b) > (PAab)'Yn

n=O

= (1 - 1A)(1 - /a)(1 - Pb)G(PAab),

where PAab is the probability that a single sperm shows no A,a, or b alleles in the absence of contamination, and whereG(s) = X4O ys", 0 C S C 1, is the generating function for therandom number of sperm that enter the tube.

Calculating Prob{detect A, a, and B, but not b}, therefore,reduces to calculating probabilities such as PAab. This isstraightforward if we condition on the true haplotype of thesingle sperm involved. For example, with Ab and aB as thetwo nonrecombinant haplotypes,

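$$\begin{aligned}
p_{Aab} &= (1 - \alpha_A)(1/2)\theta + (1 - \alpha_A)(1 - \alpha_b)(1/2)(1 - \theta) \\
&\quad + (1 - \alpha_a)(1/2)(1 - \theta) + (1 - \alpha_a)(1 - \alpha_b)(1/2)\theta.
\end{aligned}$$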

All of the other expectations in Eqs. 1 can be computed in like manner. Since every other cell probability has a decomposition similar to Eqs. 1, this completes our capsule description of how to compute the cell probabilities.
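The whole procedure of this appendix can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the authors' program: the function names and all parameter values (`alpha`, `beta`, `gamma`) are hypothetical, and `theta` is set to the recombination fraction estimate quoted in the abstract. The sketch conditions on the haplotype to get single-sperm probabilities, applies the contamination and generating-function factorization, and sums the inclusion-exclusion terms for any cell.

```python
from itertools import combinations, product

ALLELES = ("A", "a", "B", "b")


def p_none(alleles, alpha, theta):
    """Probability that one sperm shows none of `alleles`, absent contamination.
    Assumes Ab and aB are the nonrecombinant haplotypes, as in the text."""
    haplotypes = {("A", "B"): theta / 2, ("A", "b"): (1 - theta) / 2,
                  ("a", "B"): (1 - theta) / 2, ("a", "b"): theta / 2}
    total = 0.0
    for haplotype, prob in haplotypes.items():
        miss = 1.0
        for allele in haplotype:
            if allele in alleles:
                miss *= 1.0 - alpha[allele]  # a carried allele goes undetected
        total += prob * miss
    return total


def expect_product(alleles, alpha, beta, gamma, theta):
    """E(product of X_k over `alleles`) = prod(1 - beta_k) * G(p)."""
    no_contamination = 1.0
    for k in alleles:
        no_contamination *= 1.0 - beta[k]
    p = p_none(alleles, alpha, theta)
    return no_contamination * sum(g * p ** n for n, g in enumerate(gamma))


def cell_probability(detected, alpha, beta, gamma, theta):
    """Inclusion-exclusion over subsets of the detected alleles."""
    missing = [k for k in ALLELES if k not in detected]
    total = 0.0
    for r in range(len(detected) + 1):
        for subset in combinations(detected, r):
            total += (-1) ** r * expect_product(
                missing + list(subset), alpha, beta, gamma, theta)
    return total


# Hypothetical parameter values (assumptions for illustration only).
alpha = {"A": 0.92, "a": 0.90, "B": 0.88, "b": 0.85}   # detection efficiencies
beta = {"A": 0.01, "a": 0.01, "B": 0.02, "b": 0.02}    # contamination rates
gamma = [0.10, 0.80, 0.07, 0.02, 0.01]                 # Prob{n sperm}, n = 0..4
theta = 0.16

print(cell_probability(("A", "a", "B"), alpha, beta, gamma, theta))  # cell AaB-

# The 16 cell probabilities should sum to 1.
cells = [tuple(k for k, d in zip(ALLELES, flags) if d)
         for flags in product((False, True), repeat=4)]
print(sum(cell_probability(c, alpha, beta, gamma, theta) for c in cells))
```

Summing over all 16 cells is a useful sanity check, because the inclusion-exclusion terms must cancel to exactly 1 whenever `gamma` is a proper probability distribution.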

APPENDIX II

It is natural to simplify the general model by assuming either (i) that efficiencies are locus-specific ($\alpha_A = \alpha_a$, $\alpha_B = \alpha_b$) or (ii) that contamination rates are locus-specific ($\beta_A = \beta_a$, $\beta_B = \beta_b$). These hypotheses are supported by forming approximate 95% confidence intervals from the combined data estimates of Table 2. Therefore, we consider the hierarchy of models listed in Table 3. A likelihood ratio test can be used to compare model i with the general model, model 1. Briefly, under the null hypothesis that model 1 reduces to model i, the statistic $2(L_1 - L_i)$ has an approximate $\chi^2$ distribution $\chi^2_{df}$ with df determined by the difference in numbers of independent parameters. As is evident from Table 3, none of the simpler models 2-5 offers a significant departure from the general model. These successive reductions of the general model may merely reflect an insufficient sample size to detect intrinsic differences among parameters. Model 6 is borderline significant.

Table 3. Comparison of model 1 with the various submodels

| Model | Constraints | L | Likelihood ratio test value (df) | Likelihood ratio test P | Pearson's $\chi^2$ value (df) | Pearson's $\chi^2$ P |
|---|---|---|---|---|---|---|
| 1 | No constraints | -1387.079 | | | 6.136 (3) | 0.11 |
| 2 | $\alpha_A = \alpha_a$, $\alpha_B = \alpha_b$ | -1387.938 | 1.718 (2) | 0.42 | 7.792 (5) | 0.17 |
| 3 | $\beta_A = \beta_a$, $\beta_B = \beta_b$ | -1387.183 | 0.208 (2) | 0.90 | 6.308 (5) | 0.28 |
| 4 | $\alpha_A = \alpha_a$, $\alpha_B = \alpha_b$, $\beta_A = \beta_a$, $\beta_B = \beta_b$ | -1388.328 | 2.498 (4) | 0.64 | 0.9213 (1) | 0.34 |
| 5 | $\alpha_A = \alpha_a$, $\alpha_B = \alpha_b$, $\beta_A = \beta_a = \beta_B = \beta_b$ | -1389.948 | 5.738 (5) | 0.33 | 3.871 (2) | 0.14 |
| 6 | $\alpha_A = \alpha_a = \alpha_B = \alpha_b$, $\beta_A = \beta_a$, $\beta_B = \beta_b$ | -1392.057 | 9.956 (5) | 0.08 | 8.222 (2) | 0.02 |

L, maximum log likelihood. Values in parentheses are df. The likelihood ratio test was calculated as $2 \times (L_1 - L_i)$.
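The likelihood ratio comparisons can be recomputed from the maximum log likelihoods in Table 3. The Python sketch below is illustrative rather than the authors' code: the helper `chi2_sf` is a hypothetical name for a closed-form upper-tail probability of the $\chi^2$ distribution with integer df, written with the standard library only, and the log likelihoods and df are copied from Table 3.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df,
    using the closed-form finite series for even and odd df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range((df - 1) // 2):
        total += term
        term *= x / (2 * k + 3)
    return total


# Maximum log likelihoods and df from Table 3 (model 1 is the general model).
L1 = -1387.079
submodels = {2: (-1387.938, 2), 3: (-1387.183, 2), 4: (-1388.328, 4),
             5: (-1389.948, 5), 6: (-1392.057, 5)}

results = {}
for model, (Li, df) in submodels.items():
    statistic = 2.0 * (L1 - Li)
    results[model] = (statistic, chi2_sf(statistic, df))
    print(f"model {model}: 2(L1 - Li) = {statistic:.3f}, df = {df}, "
          f"P = {results[model][1]:.2f}")
```

The same helper could be applied to the Pearson statistics in Table 3, using the df shown in parentheses there.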

Likelihood ratio tests can only be employed to compare a submodel to a general model. Pearson's $\chi^2$ statistic for categorical data provides a test for the overall validity of either the general model or a consistent submodel. This statistic is also asymptotically distributed as $\chi^2_{df}$ under the null hypothesis, where df = number of categories - number of independent parameters - 1. The Pearson statistics displayed in Table 3 show that models 1, 2, and 3 fit the data, although the P value for model 1 is slightly suspect. Pearson's statistic is best approximated by the $\chi^2_{df}$ distribution when the expected frequency of observations in each of the categories is large. Thus, in applying Pearson's statistic to models 4-6, we lump categories by exploiting the symmetries entailed in assuming that the efficiencies $\alpha$ and contamination rates $\beta$ are locus-specific rather than allele-specific. This means, for example, that the probability of detecting A-Bb (A, B, b, but not a) equals the probability of detecting -aBb. Systematically tallying these symmetries leads to the following minimal list of lumped categories: (i) ----, (ii) ---b and --B-, (iii) --Bb, (iv) -a-- and A---, (v) -a-b and A-B-, (vi) -aB- and A--b, (vii) -aBb and A-Bb, (viii) Aa--, (ix) Aa-b and AaB-, and (x) AaBb. The Pearson statistics for models 4-6 using these lumped categories are displayed in Table 3. Models 4 and 5 fit the data, but model 6 does not. Thus, we tentatively identify model 5 as the most parsimonious submodel consistent with the data.
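The lumping of categories described above can be reproduced mechanically. The following Python sketch is an illustration, not the authors' procedure as implemented: the function names are hypothetical, and the only assumption is the symmetry stated in the text, namely that exchanging A with a and B with b leaves cell probabilities unchanged when parameters are locus-specific.

```python
from itertools import product

ALLELES = "AaBb"


def pattern(detected):
    """Render a detection pattern, e.g. 'A-Bb' for A, B, b detected but not a."""
    return "".join(c if d else "-" for c, d in zip(ALLELES, detected))


def swap(detected):
    """Exchange A with a and B with b (the locus-specific symmetry)."""
    d_A, d_a, d_B, d_b = detected
    return (d_a, d_A, d_b, d_B)


lumped = {}
for detected in product((False, True), repeat=4):
    # A pattern and its mirror image share one canonical key.
    key = min(detected, swap(detected))
    lumped.setdefault(key, []).append(pattern(detected))

groups = sorted(lumped.values())
for group in groups:
    print(" and ".join(group))
print(len(groups), "lumped categories")
```

Four patterns are their own mirror image and the remaining twelve pair up, which gives the ten lumped categories listed in the text.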

We are grateful for Andy Arnold's help in obtaining the polymorphic PTH sequence data. We also thank Michael Boehnke for reading the first draft of our manuscript. This research was supported in part by Grants GM36745 (N.A.), 5T32GM0753-10 (T.M.G.), and DK13983 (H.H.K.) from the National Institutes of Health, Grant CA16042 (K.L.) from the U.S. Public Health Service, and Grant DE-FG03-87ER60548 (K.L.) from the U.S. Department of Energy. The Perkin-Elmer/Cetus thermal cycler was a generous gift from Cetus Corporation.

1. Li, H., Gyllensten, U. B., Cui, X., Saiki, R. K., Erlich, H. A. & Arnheim, N. (1988) Nature (London) 335, 414-417.

2. Saiki, R. K., Scharf, S., Faloona, F., Mullis, K. B., Horn, G. T., Erlich, H. A. & Arnheim, N. (1985) Science 230, 1350-1354.

3. Mullis, K. B. & Faloona, F. A. (1987) Methods Enzymol. 155, 335-350.

4. Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R., Horn, G. T., Mullis, K. B. & Erlich, H. A. (1988) Science 239, 487-491.

5. Vasicek, T. J., McDevitt, B. E., Freeman, M. W., Fennick, B. J., Hendy, G. N., Potts, J. T., Jr., Rich, A. & Kronenberg, H. M. (1983) Proc. Natl. Acad. Sci. USA 80, 2127-2131.

6. Jeffreys, A. J. (1979) Cell 18, 1-10.

7. Lange, K., Weeks, D. & Boehnke, M. (1988) Genet. Epidemiol. 5, 471-472.

8. Jennrich, R. I. & Moore, R. H. (1975) Proceedings of the Statistical Computing Section of the American Statistical Association (American Statistical Association, Alexandria, VA), pp. 57-65.

9. Rao, C. R. (1973) Linear Statistical Inference and Its Applications (Wiley, New York), pp. 398-420.

10. Brandriff, B., Gordon, L., Ashworth, L. K., Watchmaker, G. & Carrano, A. V. (1986) in Detection of Chromosomal Abnormalities in Human Sperm: Genetic Toxicology of Environmental Chemicals, Part B: Genetic Effects and Applied Mutagenesis (Liss, New York), pp. 469-476.

11. Antonarakis, S. E., Phillips, J. A., III, Mallonee, R. L., Kazazian, H. H., Jr., Fearon, E. R., Waber, P. G., Kronenberg, H. M., Ullrich, A. & Meyers, D. A. (1983) Proc. Natl. Acad. Sci. USA 80, 6615-6619.

12. Wong, L. F., Lange, K., Petersen, G. M., Jing, J. S. & Rotter, J. I. (1986) Genet. Epidemiol. 3, Suppl. 1, 185-190.

13. Leppert, M., O'Connell, P., Nakamura, Y., Lathrop, G. M., Maslen, C., Litt, M., Cartwright, P., Lalouel, J. M. & White, R. (1987) Cytogenet. Cell Genet. 46, 648.

14. Boehnke, M., Arnheim, N., Li, H. & Collins, F. C. (1989) Am. J. Hum. Genet. 45, 21-32.

15. Goradia, T. M. & Lange, K. (1990) Ann. Hum. Genet., in press.

16. Weitkamp, L. R., van Rood, J. J., Thorsby, E., Bias, W., Fotino, M., Lawler, S. D., Dausset, J., Mayr, W. R., Bodmer, J., Ward, F. E., Seignalet, J., Payne, R., Kissmeyer-Nielsen, F., Gatti, R. A., Sachs, J. A. & Lamm, L. U. (1973) Hum. Hered. 23, 197-205.

17. Elston, R. C., Lange, K. & Namboodiri, K. K. (1976) Am. J. Hum. Genet. 28, 69-76.

18. Lange, K., Page, B. & Elston, R. C. (1975) Am. J. Hum. Genet. 27, 410-418.

19. Bishop, D. T., Falk, C. T. & MacCluer, J. W., eds. (1986) Genet. Epidemiol. 3, Suppl. 1-406.
