

Proc. Natl. Acad. Sci. USA
Vol. 93, pp. 3471-3476, April 1996
Medical Sciences

Logarithm of odds (lods) for linkage in complex inheritance (polygenic linkage)

N. E. MORTON
Human Genetics, Princess Anne Hospital, Coxford Road, Southampton SO16 5YA, United Kingdom

Contributed by N. E. Morton, December 4, 1995

ABSTRACT Lod scores provide a method to unify linkage tests based on identity by descent and identity in marker state while permitting selection of the most informative individuals through their disease-related phenotypes and markers in relatives. After parametric lods are reviewed, a nonparametric approach that depends on a single logistic parameter $\beta$ is introduced. Lods for parents tested or unknown are derived, multiple pairwise mapping is presented, and power is shown to be good even for moderately small values of $\beta$. Comparison of parametric and nonparametric approaches (yet to be made) will provide for polygenes the efficiency and reliability that lod scores gave to mapping of major loci 40 years ago.

Logarithm of odds (lod) scores are the standard for linkage mapping of major genes. "Clearly this method has several advantages, among them its reliability in small as well as large samples, its dependence solely on elementary laws of probability, and the ease with which all kinds of families and pedigrees may be combined" (1). It is conservative when the sample size is fixed (2), but it minimizes the number of observations required for significance in the sequential test (3). In large sample theory it gives $\chi^2$, it provides a Bayesian solution when that is appropriate, and the expected value (ELOD) measures information about linkage. The method proposed here gives greatest weight to the most informative data and is a function of a single parameter, in contrast to other methods that use two parameters and weight all observations equally. I shall now extend lods to complex inheritance with no restriction on whether the trait is quantitative or an ordered polychotomy, whether marker testing is restricted to informative phenotypes, or whether parents or other relatives are examined.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Parametric Tests

A general statement of the theory can be made in terms of two vectors, $\psi_i$ and $M_i$, and a matrix $S_i$, where

$\psi_i$ = disease-related phenotypes in the $i$th set of individuals (pairs of relatives, nuclear family, or pedigree),

$M_i$ = marker genotypes in $i$, and

$S_i$ = pedigree structure of $i$.

The likelihood is a function of a scalar $\theta$, the recombination fraction, and a vector $\Omega$ of genetic parameters. In particular, $\Omega_0$ denotes the parameters or their estimates under the null hypothesis $H_0$ ($\theta = \theta_0 = 0.5$) and $\Omega_1$ denotes the parameters or their estimates under the alternative hypothesis $H_1$ ($\theta = \theta_1$), in each case allowing for $A_i$, the ascertainment measure of $i$. Under incomplete ascertainment this measure is different for $\Omega_0$ and $\Omega_1$. For sex-specific mapping the recombination in males and females ($\theta_m$, $\theta_f$) must be distinguished, but this degrades the small sample properties of lods (2, 3) and is generally avoided for linkage detection. Therefore the lod for the $i$th set is

$$z_i = \log \frac{P(\psi_i, M_i \mid A_i, S_i; \theta_1, \Omega_1)}{P(\psi_i, M_i \mid A_i, S_i; 0.5, \Omega_0)} \qquad [1]$$

where $P$ denotes probability. Gene frequencies, dominance, and penetrance (or displacement) for each designated locus that contributes to $\psi_i$ are subsumed by $\Omega$. Under the mixed model the genes of small effect are pooled as heritability (4-6). Most linkage studies substitute $\Omega_0$ for $\Omega_1$ (which loses power) or conversely (which inflates the type 1 error).

An alternative to Eq. 1 estimates the parameters $\Omega$ from phenotypes alone under a one-locus or two-locus disease model, ignoring markers, and then incorporates markers under the admixture model of Smith (7), where $\alpha$ is the proportion of families with recombination $\theta_1$, the remainder having $\theta_0$. The ascertainment measure is the same for $H_0$ and $H_1$ and may be neglected once $\Omega$ has been estimated, giving

$$z_i = \log \frac{P(\psi_i, M_i \mid S_i; \theta_1, \alpha, \Omega)}{P(\psi_i, M_i \mid S_i; 0.5, 0, \Omega)} \qquad [2]$$

The hypothesis $\theta_1 = 0$ is especially interesting, but in general both $\alpha$ and $\theta_1$ must be estimated, which can easily be done with existing programs by interpolating in the bivariate table. The small sample properties of lods are lost, but significance may be tested as $\chi^2 = (2 \ln 10) \sum z_i$ (8) or by the method of Faraway (9). MacLean et al. (10) proposed an admixture test to increase power under model misspecification. Whether or not this is useful to detect linkage when the number of informative individuals per family is small, it gives a test of heterogeneity.

However the parametric test for linkage in complex inheritance is formulated, a single-locus model without admixture is far from the truth, whether specified a priori or estimated from the data. A two-locus (or mixed) model may be adequate (11). The implementation by Morton et al. (12) for combined segregation and linkage analysis of a quantitative trait or ordered polychotomy includes gametic disequilibrium as coupling frequencies but is restricted to nuclear families with pointers. For lack of statistical comparisons there is no consensus about parametric models to detect linkage in complex inheritance or whether constraints on the parameters $\theta$, $\Omega$ are advantageous.
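The arithmetic behind Eqs. 1 and 2 and the large-sample test can be sketched as follows. This is a minimal illustration, not an implementation from the paper: the family likelihoods are hypothetical numbers, the function names are invented here, and the admixture lod assumes the usual mixture form in which a family's likelihood is the proportion-weighted sum of its likelihoods at the linked and unlinked recombination values.

```python
import math


def lod(lik_h1, lik_h0):
    """Lod for one set of relatives: log10 of the likelihood ratio (Eq. 1)."""
    return math.log10(lik_h1 / lik_h0)


def admixture_lod(likelihood_ratio, alpha):
    """Lod under admixture (assumed mixture form of Eq. 2).

    likelihood_ratio is L(theta_1) / L(0.5) for the family and alpha is the
    proportion of linked families, so the numerator of the odds is
    alpha * L(theta_1) + (1 - alpha) * L(0.5).
    """
    return math.log10(alpha * likelihood_ratio + (1.0 - alpha))


def chi_square_from_lods(lods):
    """Large-sample statistic: chi-square = (2 ln 10) * sum of lods."""
    return 2.0 * math.log(10.0) * sum(lods)


# Hypothetical (H1, H0) likelihood pairs for three families.
family_likelihoods = [(0.012, 0.004), (0.30, 0.25), (0.05, 0.06)]
z = [lod(h1, h0) for h1, h0 in family_likelihoods]
total_lod = sum(z)
chi2 = chi_square_from_lods(z)
```

Because lods are logarithms of likelihood ratios, families are combined by simple addition, which is the property the large-sample statistic relies on.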

Abbreviations: ML, maximum likelihood; lod, logarithm of odds; ELOD, expected value of a lod.

Nonparametric Tests

In genetic epidemiology the term "nonparametric" has acquired a specialized meaning, that genetic parameters ($\theta$ and $\Omega$) are not specified but parameters with no precise biological interpretation (which may be incompletely specified functions of $\theta$ and $\Omega$) are admitted. This approach is commonly used to simplify complex inheritance (13). Advocates of parametric analysis argue that it must be optimal under the correct model and may be nearly optimal under an approximation. Advocates of regressive models assume that they are adequate approximations of genetic models. Advocates of nonparametric analysis insist that the parametric approximation is poor and gives misleading tests of hypotheses. Only statistical trials on real data can resolve this controversy, although simulation can be heuristic.

Since genetic parameters are not specified and simplicity is the stated goal, we assume that the $i$th set consists of a proband with phenotype $\psi_{1i}$ and marker genotype $M_{1i}$, a case with phenotype $\psi_{2i}$ and marker genotype $M_{2i}$, and other information $R_i$ about markers and pedigree structure which together with $M_{1i}$ permit calculation of the probability of $M_{2i}$, given that 1 and 2 have $k$ marker alleles identical by descent ($k$ = 0, 1, 2). I shall not consider here the extension to multiple markers. Let $c_k$ be the probability of $k$ under the null hypothesis of no linkage between phenotypes and markers (14). Under the alternative hypothesis of an effect on the logistic scale the corresponding probability is assumed to be $\zeta_k = c_k e^{k\beta f} / \sum_k c_k e^{k\beta f}$, where $\beta > 0$ is a constant measuring the effect of linkage and $f$ is a function of $\psi_1$ and $\psi_2$. Then the corresponding lod is

$$z_i = \log \frac{P(M_{2i} \mid M_{1i}, R_i; f, \beta)}{P(M_{2i} \mid M_{1i}, R_i; f, 0)} \qquad [3]$$

Since the probability is conditional on phenotypes, ascertainment may be neglected so long as it is through the trait and not the marker of the case, although selection through markers of the proband and other relatives is permitted. Only the most informative sets need be typed for the marker if the sample is so large that relatively uninformative sets may be neglected (15). Under $H_0$ the denominator simplifies to $P(M_2 \mid M_1, R_i) = \sum_k c_k P(M_2 \mid k, M_1, R_i)$. Under $H_1$ the numerator becomes $\sum_k \zeta_k P(M_2 \mid k, M_1, R_i)$. A general solution requires a computer program to calculate $P(M_2 \mid k, M_1, R_i)$. Here I give some simple examples.
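A minimal sketch of Eq. 3 may help fix the notation. It assumes the probabilities of the case's marker genotype given each ibd state have already been computed elsewhere (the paper notes that a general solution needs a dedicated program). The function names, the value of the logistic parameter, and the example inputs are hypothetical.

```python
import math

C_SIBS = (0.25, 0.5, 0.25)  # null ibd probabilities c_0, c_1, c_2 for sibs


def zeta(c, beta, f):
    """Return zeta_k = c_k e^{k beta f} / sum_k c_k e^{k beta f} for k = 0, 1, 2."""
    weights = [ck * math.exp(k * beta * f) for k, ck in enumerate(c)]
    total = sum(weights)
    return [w / total for w in weights]


def pair_lod(p_m2_given_k, c, beta, f):
    """Lod of Eq. 3: log10 of sum_k zeta_k P_k over sum_k c_k P_k,
    where P_k = P(M2 | k, M1, R) is supplied by the caller."""
    z = zeta(c, beta, f)
    numerator = sum(zk * pk for zk, pk in zip(z, p_m2_given_k))
    denominator = sum(ck * pk for ck, pk in zip(c, p_m2_given_k))
    return math.log10(numerator / denominator)


# Hypothetical inputs: beta = 0.5, f = 1, and marker data that fix k exactly.
lod_two_ibd = pair_lod((0.0, 0.0, 1.0), C_SIBS, beta=0.5, f=1.0)
lod_zero_ibd = pair_lod((1.0, 0.0, 0.0), C_SIBS, beta=0.5, f=1.0)
# Marker data that carry no information about k give a lod of zero.
lod_uninformative = pair_lod((1.0, 1.0, 1.0), C_SIBS, beta=0.5, f=1.0)
```

With a positive value of f, a pair sharing two alleles ibd contributes a positive lod and a pair sharing none contributes a negative one; the signs reverse when f is negative.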

Pairs of Sibs

In the absence of inbreeding, sibs have $c_0 = 1/4$, $c_1 = 1/2$, $c_2 = 1/4$, and therefore $\sum_k c_k e^{k\beta f} = (1 + e^{\beta f})^2/4$. Probabilities when parental genotypes are known are given in Table 1 under $H_0$ as functions of the $c_k$ only. Each combination of mating and proband satisfies the relation

$$\sum_{M_2} P(M_2 \mid M_1, R) = c_0 + c_1 + c_2 = 1.$$

Table 1. Pairs of sibs, parental markers known

Mating type                   Parents   Proband $M_1$   Case $M_2$   $P(M_2 \mid M_1, R)$
Intercross, 3 or 4 alleles    ab, cd    ac              ac           $c_2$
                                                        ad           $c_1/2$
                                                        bd           $c_0$
                                                        bc           $c_1/2$
Backcross, 2 or 3 alleles     aa, bc    ab              ab           $c_2 + c_1/2$
                                                        ac           $c_0 + c_1/2$
Intercross, 2 alleles         ab, ab    aa              aa           $c_2$
                                                        ab           $c_1$
                                                        bb           $c_0$
                                        ab              aa           $c_1/2$
                                                        ab           $c_0 + c_2$
                                                        bb           $c_1/2$

From these probabilities the corresponding lod may easily be derived. For example, the lod for an ab case with ab parents and probands is

$$z = \log \frac{c_0 + e^{2\beta f} c_2}{(c_0 + c_1 e^{\beta f} + c_2 e^{2\beta f})(c_0 + c_2)} = \log \frac{2(1 + e^{2\beta f})}{1 + 2e^{\beta f} + e^{2\beta f}}$$

For small $\beta$ this lod is virtually zero, showing that ab, ab sibs from a diallelic intercross are (almost) uninformative.

Gene frequencies and an assumption about population structure must be introduced if the marker genotype is unknown for either parent. For this illustration we assume codominance, random mating, known marker allele frequencies $p_j$, and no informative other relatives ($R_i = 0$). Table 2 gives probabilities derived from the three conditional transmission matrices (16). Each proband type satisfies the relation $\sum_{M_2} P(M_2 \mid M_1) = 1$. Each term is a quadratic function of frequencies of genes observed in the case and the complement of the pooled frequencies of genes in the proband. Therefore $P(M_2 \mid M_1, R)$ is not in general the same as $P(M_1 \mid M_2, R)$. It is unnecessary to specify frequencies of alleles not observed in the proband, since the complement of alleles observed in the case cancels in the lod.

Finally, we consider one typed parent under the same assumptions of codominance, random mating, known gene frequencies $p_j$, and no other informative relatives (Table 3). Each combination of proband and parent satisfies the relation $\sum_{M_2} P(M_2 \mid M_1, R) = 1$, and each term is a linear function of frequencies of genes observed in the parent and proband and the complement of these frequencies.
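The Table 2 entries (parental markers unknown) can be coded directly, which also makes the normalization easy to check numerically. This is a sketch under the table's stated assumptions of codominance and random mating. The function names and allele frequencies are hypothetical, and the lod is obtained by substituting the linkage-weighted ibd probabilities for the null ones, which is valid because every entry is linear in them.

```python
import math


def sib_probs_parents_unknown(proband, pa, pb=0.0, c=(0.25, 0.5, 0.25)):
    """P(M2 | M1) for each case genotype, following Table 2.

    'A' pools all alleles absent from the proband; pa and pb are the
    population frequencies of the proband's alleles a and b.
    """
    c0, c1, c2 = c
    if proband == "aa":
        q = 1.0 - pa
        return {
            "aa": pa**2 * c0 + pa * c1 + c2,
            "aA": 2 * pa * q * c0 + q * c1,
            "AA": q**2 * c0,
        }
    if proband == "ab":
        q = 1.0 - pa - pb
        return {
            "aa": pa**2 * c0 + pa * c1 / 2,
            "ab": 2 * pa * pb * c0 + (pa + pb) * c1 / 2 + c2,
            "aA": 2 * pa * q * c0 + q * c1 / 2,
            "AA": q**2 * c0,
            "bb": pb**2 * c0 + pb * c1 / 2,
            "bA": 2 * pb * q * c0 + q * c1 / 2,
        }
    raise ValueError("proband must be 'aa' or 'ab'")


def lod_parents_unknown(proband, case, pa, pb, beta, f):
    """Lod for one sib pair: Table 2 evaluated at zeta_k over Table 2 at c_k."""
    c = (0.25, 0.5, 0.25)
    weights = [ck * math.exp(k * beta * f) for k, ck in enumerate(c)]
    z = tuple(w / sum(weights) for w in weights)
    numerator = sib_probs_parents_unknown(proband, pa, pb, z)[case]
    denominator = sib_probs_parents_unknown(proband, pa, pb, c)[case]
    return math.log10(numerator / denominator)


# Hypothetical allele frequencies p_a = 0.1 and p_b = 0.2.
row_sum_aa = sum(sib_probs_parents_unknown("aa", 0.1).values())
row_sum_ab = sum(sib_probs_parents_unknown("ab", 0.1, 0.2).values())
lod_rare_shared = lod_parents_unknown("aa", "aa", 0.1, 0.0, beta=0.5, f=1.0)
```

Sharing a rare genotype is evidence for identity by descent, so with positive f the pair of aa sibs in the example yields a positive lod.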

Table 2. Pairs of sibs, parental markers unknown

Proband $M_1$   Case $M_2$   $P(M_2 \mid M_1)$
aa              aa           $p_a^2 c_0 + p_a c_1 + c_2$
                aA           $2p_a(1 - p_a)c_0 + (1 - p_a)c_1$
                AA           $(1 - p_a)^2 c_0$
ab              aa           $p_a^2 c_0 + p_a c_1/2$
                ab           $2p_a p_b c_0 + (p_a + p_b)c_1/2 + c_2$
                aA           $2p_a(1 - p_a - p_b)c_0 + (1 - p_a - p_b)c_1/2$
                AA           $(1 - p_a - p_b)^2 c_0$
                bb           $p_b^2 c_0 + p_b c_1/2$
                bA           $2p_b(1 - p_a - p_b)c_0 + (1 - p_a - p_b)c_1/2$

A stands for alleles absent from the proband.

Table 3. Sib pairs with one parent typed

Proband   Parent   Case   $P(M_2 \mid M_1, R)$
aa        aa       aa     $p_a + (1 - p_a)(c_2 + c_1/2)$
                   aA     $(1 - p_a)(c_0 + c_1/2)$
aa        ab       aa     $p_a c_1/2 + c_2$
                   ab     $p_a c_0 + (1 + p_b)c_1/2$
                   aA     $(1 - p_a - p_b)c_1/2$
                   bb     $p_b c_0$
                   bA     $(1 - p_a - p_b)c_0$
ab        aa       aa     $p_a(c_0 + c_1/2)$
                   ab     $p_b + (1 - p_b)(c_2 + c_1/2)$
                   aA     $(1 - p_a - p_b)(c_0 + c_1/2)$
ab        ab       aa     $p_a[p_a c_0 + (1 + p_b)c_1/2]/(p_a + p_b)$
                   ab     $c_2 + [2p_a p_b c_0 + (p_a^2 + p_b^2)c_1/2]/(p_a + p_b)$
                   bb     $p_b[p_b c_0 + (1 + p_a)c_1/2]/(p_a + p_b)$
                   aA     $(1 - p_a - p_b)(p_a c_0 + p_b c_1/2)/(p_a + p_b)$
                   bA     $(1 - p_a - p_b)(p_b c_0 + p_a c_1/2)/(p_a + p_b)$
ab        ac       aa     $p_a c_1/2$
                   ab     $p_b c_1/2 + c_2$
                   ac     $p_a c_0 + p_c c_1/2$
                   bc     $p_b c_0 + c_1/2$
                   aA     $(1 - p_a - p_b - p_c)c_1/2$
                   cA     $(1 - p_a - p_b - p_c)c_0$
                   cc     $p_c c_0$

A stands for alleles absent from the proband and parent.

Choice of f

So far we have assumed that $f$ is an unspecified function of $\psi_1$ and $\psi_2$, neglecting phenotypes of other relatives for lack of a genetic model which alone would give a relation with crossing-over. Under this constraint it is reasonable to take $f = X_1 X_2$, where $X_1$ and $X_2$ are deviations of proband and case from the population mean. Readers who find this formulation unacceptably arbitrary should use parametric tests, because no known principle can give nonparametric tests stronger credentials.

Since $X$ is positive if above the population mean and negative otherwise, $f$ increases for phenotypically similar pairs. Risch and Zhang (15) gave examples where extreme discordance is more informative than extreme concordance, which is difficult to incorporate in a nonparametric model. Tentatively I propose that $X_1$, $X_2$ be taken from a standardized normal distribution when the phenotype is quantitative but that a different score be used for an ordered polychotomy, such as the deciles favored by Risch and Zhang. In that case suppose that an individual falls in class $j$ with frequency $F_j - F_{j-1}$, where $F_j$ is the cumulative frequency to its upper threshold. Construct for each individual normal deviates $X$ corresponding to $F_j$, $F_{j-1}$, and $(F_j + F_{j-1})/2$, omitting indeterminate values in the first and last class. Take the smallest value of $X_1 X_2$, unless proband and case both fall in the class that includes the median, then take the smallest absolute value (signed). This rule assigns greatest weight to extremely discordant pairs. For comparability to affected pairs (below), these values of $X_1 X_2$ may be standardized by division with the mean value for affected pairs so that ±1 is nearly an extremum.
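The scoring rule for an ordered polychotomy can be sketched as below. This is one reading of the rule, not code from the paper: the helper names are invented, a class is treated as containing the median when its cumulative range spans 0.5 (an assumption where 0.5 is itself a class boundary), and the final standardization against affected pairs is left out.

```python
from itertools import product
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def class_deviates(f_lower, f_upper):
    """Normal deviates at the lower bound, midpoint, and upper bound of a class,
    omitting the indeterminate (infinite) deviates at cumulative frequency 0 or 1."""
    points = (f_lower, (f_lower + f_upper) / 2.0, f_upper)
    return [_STANDARD_NORMAL.inv_cdf(p) for p in points if 0.0 < p < 1.0]


def contains_median(f_lower, f_upper):
    """Assumed convention: the class spans cumulative frequency 0.5."""
    return f_lower < 0.5 <= f_upper


def polychotomy_score(class1, class2):
    """Score f = X1 * X2 for a proband in class1 and a case in class2,
    each given as (F_{j-1}, F_j)."""
    products = [
        x1 * x2
        for x1, x2 in product(class_deviates(*class1), class_deviates(*class2))
    ]
    if contains_median(*class1) and contains_median(*class2):
        return min(products, key=abs)  # smallest absolute value, sign kept
    return min(products)               # smallest (most negative) product


# Deciles: top versus bottom class (discordant) and top versus top (concordant).
f_discordant = polychotomy_score((0.9, 1.0), (0.0, 0.1))
f_concordant = polychotomy_score((0.9, 1.0), (0.9, 1.0))
```

Taking the smallest product uses the class midpoints for extremely discordant pairs but only the inner class boundaries for extremely concordant pairs, so the discordant score has the larger absolute value.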

Pairs of Affected Relatives

Suppose that the sample is restricted to pairs of affected relatives not classified by severity. Then without loss of generality we may take $f = 1$. Under $H_0$ imagine a population that differs from the actual one only in having no linkage to the marker. For a given relationship $r$ and marker locus $M$ the ratio of recurrence risks in the real and idealized population is $\lambda_r = \sum_k c_k e^{k\beta}$. In particular the relative risk for sibs is $\lambda_s = (1 + 2\lambda_o + \lambda_m)/4$, where $\lambda_o$ is the relative risk in children or parents and $\lambda_m$ is the relative risk in monozygous twins. Since $\lambda_s = [(1 + e^{\beta})/2]^2$, our model assumes that each allele acts independently (multiplicatively) on relative risk. A parent heterozygous for a linked marker transmits the same allele to a pair of affected children with probability $e^{\beta}/(1 + e^{\beta})$ and different alleles with probability $1/(1 + e^{\beta})$. Therefore the nonparametric linkage test is also a transmission test and can be generalized by assigning $\beta_m$ to the male parent and $\beta_f$ to the female parent, analogous to $\theta_m$, $\theta_f$ in parametric linkage. I shall not make this extension, believing parametric tests to be appropriate once linkage is detected.

In terms of relative risks, the identity coefficients specifying $k$ = 0, 1, 2 marker alleles identical by descent (ibd) for sibs are $\zeta_0 = 1/4\lambda_s$, $\zeta_1 = \lambda_o/2\lambda_s$, and $\zeta_2 = \lambda_m/4\lambda_s$, taking values $c_0 = 1/4$, $c_1 = 1/2$, $c_2 = 1/4$ if $\beta = 0$. These values fall into the "possible triangle" of Holmans (17), since $\zeta_1$ has a maximum of 1/2 at $\beta = 0$ and declines monotonically to zero, whereas $\zeta_2$ has a minimum of 1/4 at $\beta = 0$ and increases monotonically to 1. These are different from expectations under additive penetrance, a mathematically convenient but biologically implausible model if genes act on a liability or logistic scale (18, 19). However, for realistically small values of $\beta$ there is little difference (Fig. 1). As $\beta$ approaches zero, the following approximations hold:

$$\lambda_s \approx 1 + \beta, \quad \lambda_o \approx 1 + \beta, \quad \lambda_m \approx 1 + 2\beta$$

and therefore $\lambda_m \approx 2\lambda_s - 1$, $\zeta_1 \approx 1/2$ as for additive penetrance.
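The relations among the relative risks and the identity coefficients for affected sib pairs can be checked numerically. The sketch below takes the offspring and twin relative risks to be the first and second powers of the exponentiated logistic parameter, as the multiplicative model in this section implies. The function names are hypothetical.

```python
import math


def relative_risks(beta):
    """Relative risks under the multiplicative model with f = 1:
    offspring/parent, monozygous twin, and sib."""
    lam_o = math.exp(beta)
    lam_m = math.exp(2.0 * beta)
    lam_s = (1.0 + 2.0 * lam_o + lam_m) / 4.0
    return lam_o, lam_m, lam_s


def sib_identity_coefficients(beta):
    """Probabilities that an affected sib pair shares 0, 1, 2 alleles ibd."""
    lam_o, lam_m, lam_s = relative_risks(beta)
    return (1.0 / (4.0 * lam_s), lam_o / (2.0 * lam_s), lam_m / (4.0 * lam_s))


# Walk a hypothetical grid of beta values and record the three coefficients.
grid = [0.0, 0.1, 0.25, 0.5, 1.0, 2.0]
coefficients = {beta: sib_identity_coefficients(beta) for beta in grid}
```

On this grid the middle coefficient never exceeds 1/2 and the last never falls below 1/4, which is the "possible triangle" behavior described in the text.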

FIG. 1. Coefficients of identity for sibs.

Multiple Markers

Let $Z_i$ be the maximal lod for the $i$th marker at location $w_i$ centimorgans (cM) with coefficient $\beta_i$, and $\beta_{max}$ be the value of $\beta$ at which $\theta = 0$. Then $(\beta_i/\beta_{max})^2 \propto Z_{max} \times (\theta - 0.5)^2$, where $Z_{max}$ is the lod at $\beta_{max}$. But $\beta_i/\beta_{max} = 0$ corresponds to $\theta = 0.5$, while $\beta_i/\beta_{max} = 1$ corresponds to $\theta = 0$. It follows that the maximum likelihood (ML) estimate of $\theta_i$ is $(1 - \beta_i/\beta_{max})/2$, where $\beta_{max}$ may be found by quadratic interpolation on the $\beta_i$, $w_i$. Evidence of linkage is tested by $\sum Z_i(E)$, where $Z_i(E)$ is the value of $Z_i$ corresponding to the expected value of $\theta_i$. Although this procedure maps the susceptibility locus to the point where $\beta = \beta_{max}$, caution should be used in testing significance. There is general concern that multiple pairwise analysis may exaggerate significance, but this can be avoided by neglecting markers other than the flanking pair. Linkage maps created by multiple pairwise analysis seldom give $\chi^2$ appreciably greater than the nominal degrees of freedom. Considering the dependence of sib pairs, errors in gene frequency estimation, ignorance of the exact mode of inheritance, and other sources of error, concern about pairwise dependency of loci may be exaggerated, but this must be tested, using the fact that any multipoint linkage probability expressed in terms of $\zeta_k$ can be transformed in terms of $\beta$.
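The interpolation step can be sketched as follows. Fitting a parabola through exactly three neighboring markers is one simple way to read "quadratic interpolation" and is an assumption here, as are the function names and the marker positions and coefficients in the example.

```python
def quadratic_peak(points):
    """Fit y = a x^2 + b x + c through three (x, y) points and return the
    location and height of the vertex."""
    (x1, y1), (x2, y2), (x3, y3) = points
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3**2 * (y1 - y2) + x2**2 * (y3 - y1) + x1**2 * (y2 - y3)) / denom
    c = (x2 * x3 * (x2 - x3) * y1
         + x3 * x1 * (x3 - x1) * y2
         + x1 * x2 * (x1 - x2) * y3) / denom
    x_peak = -b / (2.0 * a)
    return x_peak, a * x_peak**2 + b * x_peak + c


def theta_hat(beta_i, beta_max):
    """ML estimate of the recombination fraction for marker i."""
    return (1.0 - beta_i / beta_max) / 2.0


# Hypothetical markers: (location in cM, estimated beta).
markers = [(10.0, 0.20), (20.0, 0.30), (30.0, 0.25)]
location, beta_max = quadratic_peak(markers)
thetas = [theta_hat(beta_i, beta_max) for _, beta_i in markers]
```

The vertex location estimates the map position of the susceptibility locus, and each marker's recombination estimate then follows from the ratio of its coefficient to the interpolated maximum.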

Extension to More Complex Cases

The joint effects of recombination, relationship, and epistasis cannot be precisely reflected in nonparametric analysis. Eq. 3 gives a general solution for a single-marker locus that includes other relatives and/or inbreeding, retaining the "ease with which all kinds of families and pedigrees may be combined" that is one of the main advantages of lods. In the absence of information on the rest of the pedigree, pairs of relatives other than sibs may be scored by assigning appropriate values of $c_k$ and $\zeta_k$ in Table 2.

A more difficult problem is that pairs of relatives are not independent, yet we cannot handle larger sets without a parametric model (Eq. 1). However, if markers are determined only for extreme phenotypes and their parents as suggested by Risch and Zhang (15), there will be few cases per pedigree. Then dependence of related pairs may not be a serious problem (20). With some loss of information the dependence may be reduced by confining analyses to pairs with a single proband (15) or even to one pair per family. In any case, confirmation should be sought in other samples. Lods lend themselves to such sequential analysis (1).

Sources of Error

The utility of lods for mapping major loci depends on the fact that tests significant at the conventional level ($Z > 3$) are rarely type 1 errors (21). This indicates robustness (given reasonable care) with respect to population structure and gene frequency estimation. We may hope that this will be true for lods under complex inheritance, although an alternative test (Affected Pedigree Member) that applies an arbitrary weighting function to mean similarity is exquisitely sensitive to misspecification of marker allele frequencies and weights (22). The null hypothesis depends only on mendelian transmission of the marker. Misspecification of the alternative hypothesis reduces power but does not affect the type 1 error. The posterior probability that a significant linkage is a type 1 error depends on the level of significance, the prior probability of linkage, and the power to detect linkage if present (1). The last two factors are unspecified for complex inheritance, but perhaps the larger number of contributory loci compensates for reduced power to detect any one of them.

Conduct of the Nonparametric Test

If $f$ lies in the interval ±1, a standard lod table for $0 < \beta < 1$ should be adequate, analogous to a standard lod table. Values of $\beta$ greater than 1 indicate a major gene for which parametric analysis should be better. In any case, the ML estimate of $\beta$ can be obtained by conventional iterative methods for finding a root. Let $Z$ be the corresponding lod for all the data. Haldane and Smith (2) gave the fixed sample size test

$$P(Z > A \mid H_0) < 10^{-A},$$

which is true even in small samples under the assumptions of the test (random mating, no misclassification of the marker, known gene frequencies, and independent pairs). The corresponding large-sample test is given by $\chi^2_1 = 2(\ln 10)Z$, which is less conservative and therefore less trustworthy in small samples.
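As a sketch of the root-finding step, consider the simplest case: affected sib pairs with f = 1 whose ibd state is known exactly, so that the data reduce to counts of pairs sharing 0, 1, or 2 alleles. That restriction, the bisection solver, the function names, and the counts are all assumptions made for illustration.

```python
import math


def total_lod(beta, counts):
    """Total lod for counts (n0, n1, n2) of pairs sharing 0, 1, 2 alleles ibd.

    Each pair contributes log10(zeta_k / c_k) = log10(e^{k beta} / lambda_s).
    """
    n0, n1, n2 = counts
    log_lambda_s = 2.0 * math.log((1.0 + math.exp(beta)) / 2.0)
    return ((n1 + 2 * n2) * beta - (n0 + n1 + n2) * log_lambda_s) / math.log(10.0)


def score(beta, counts):
    """Derivative of the natural-log likelihood with respect to beta."""
    n0, n1, n2 = counts
    shared = n1 + 2 * n2
    return shared - 2.0 * (n0 + n1 + n2) * math.exp(beta) / (1.0 + math.exp(beta))


def ml_beta(counts, upper=10.0, tol=1e-10):
    """ML estimate of beta by bisection on the score, constrained to beta >= 0."""
    if score(0.0, counts) <= 0.0:
        return 0.0
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid, counts) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def haldane_smith_bound(threshold):
    """Upper bound 10^{-A} on P(Z > A | H0) for a fixed sample size."""
    return 10.0 ** (-threshold)


counts = (40, 100, 60)                        # hypothetical ibd counts
beta_hat = ml_beta(counts)
z_max = total_lod(beta_hat, counts)
chi2_1 = 2.0 * math.log(10.0) * z_max         # large-sample statistic, 1 df
```

In this special case the score equation has a closed-form root, the log odds of the proportion of alleles shared, which gives a convenient check on the iterative solution.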

In sequential analysis the data form a succession of samples (3). The test continues with no decision so long as $B < Z < A$. Linkage is declared significant when $Z > A$ and is rejected when $Z < B$. Conventional use of sequential analysis requires preassignment of $\beta$, which simplifies the calculation of expected sample sizes and power in theoretical work. For applications these desirable properties are less important than efficient detection of linkage by using the ML estimates of $\beta$. The lod test remains valid and highly efficient (23). For major loci $B$ is conventionally taken as -2 and $A$ as 3. The latter is at least as strong as a significance level of 0.001. However, the Bayesian argument that supports these choices for major loci (1) is not convincing for polygenes, and arguments have been made in favor of a smaller value of $A$ to increase power at the expense of reliability* or a larger value based on a model in which all significant tests are type 1 errors (24). Ignorance of the numbers and effects of contributing loci makes any departure from convention hard to justify. There has been advocacy of staged sampling, with $A$ increasing with the number of samples and $B$ not specified (25). This is sequential analysis stripped of its elegance and efficiency. Once linkage is judged significant, parametric and heterogeneity tests should be performed. If they support the evidence an attempt to map the linked locus will be made.

The estimate of $\beta$ depends on gene frequencies, displacement, dominance, recombination and sampling error. If there is linkage the marker with the highest estimate of $\beta$ is likely to be near the contributory locus.
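The stopping rule can be written in a few lines. The sketch assumes a preassigned logistic parameter, so that the total lod is a running sum of per-sample lods; the function name, the returned labels, and the lod values are hypothetical.

```python
def sequential_test(lods, upper=3.0, lower=-2.0):
    """Accumulate lods sample by sample and stop at the first boundary crossed.

    Returns (decision, number of samples used, total lod).
    """
    total = 0.0
    for n, z in enumerate(lods, start=1):
        total += z
        if total > upper:
            return "linkage", n, total
        if total < lower:
            return "no linkage", n, total
    return "continue sampling", len(lods), total


# Hypothetical successive sample lods.
decision, n_used, z_total = sequential_test([0.8, -0.3, 1.1, 0.9, 0.7, 0.4])
```

The defaults are the conventional boundaries for major loci quoted in the text; as the text notes, their justification is weaker for polygenes.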

Operating Characteristics of the Test

If the maximum value of $f$ is 1, the lods from Table 1 for the most informative mating type ab × cd are

$$z(ac, ac) = \log[4e^{2\beta}/(1 + e^{\beta})^2] \quad \text{for sibs 2 ibd}$$

$$z(ac, ad) = \log[4e^{\beta}/(1 + e^{\beta})^2] \quad \text{for sibs 1 ibd}$$

$$z(ac, bd) = \log[4/(1 + e^{\beta})^2] \quad \text{for sibs 0 ibd.}$$

Under the null hypothesis the ELOD is

$$E_0(z) = c_2 z(ac, ac) + c_1 z(ac, ad) + c_0 z(ac, bd).$$

Under the alternative hypothesis the ELOD is

$$E_1(z) = \zeta_2 z(ac, ac) + \zeta_1 z(ac, ad) + \zeta_0 z(ac, bd).$$

ELODs for other pairs can be expressed in terms of equivalent numbers of sib pairs (1). Fig. 2 gives the average sample number for a sequential test under alternative hypotheses about $\beta$. Values of $\beta$ as small as 0.25 can be detected with good power in a sample of a few hundred sib pairs, and average sample size is an order of magnitude smaller for $\beta = 1$. In practice larger samples are required because not all pairs are fully informative and the efficiency of sequential analysis is diminished for samples of fixed size, perhaps by a factor of 2 (1). On the other hand, the conventional values of $A$ and $B$ are conservative, and reasonably strong evidence can be obtained in smaller samples.
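The ELODs for fully informative sib pairs follow directly from the three lods above. The sketch also divides the upper boundary by the ELOD under linkage, a crude Wald-style approximation to the average sample number that ignores overshoot and the lower boundary. That approximation is an assumption of this example and is not necessarily how Fig. 2 was computed.

```python
import math

C_SIBS = (0.25, 0.5, 0.25)


def intercross_lods(beta):
    """Lods for sibs sharing k = 0, 1, 2 alleles ibd (index = k), with f = 1."""
    lambda_s = ((1.0 + math.exp(beta)) / 2.0) ** 2
    return [math.log10(math.exp(k * beta) / lambda_s) for k in range(3)]


def elod(beta, under_h1=True):
    """Expected lod per fully informative sib pair under H1 or under H0."""
    z = intercross_lods(beta)
    if under_h1:
        lambda_s = ((1.0 + math.exp(beta)) / 2.0) ** 2
        probs = [ck * math.exp(k * beta) / lambda_s for k, ck in enumerate(C_SIBS)]
    else:
        probs = C_SIBS
    return sum(p * zk for p, zk in zip(probs, z))


def approximate_sample_number(beta, upper=3.0):
    """Crude approximation: pairs needed for the expected total lod to reach A."""
    return upper / elod(beta, under_h1=True)


pairs_small_effect = approximate_sample_number(0.25)
pairs_large_effect = approximate_sample_number(1.0)
```

Under these assumptions the two sample numbers differ by roughly an order of magnitude, in line with the comparison drawn in the text.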

FIG. 2. Operating characteristics of the $\beta$ model. Average sample number in sequential tests and ELOD for sib pairs from fully informative matings ($A$ = 3, $B$ = -2).

*Elston, R. C. (1992) Proceedings of the 6th International Biometric Conference, December 7-11, Hamilton, New Zealand, pp. 39-57.

Generalization of the β Model

The $\beta$ model may be generalized by replacing $e^{2\beta}$ with $e^{\beta+\Delta}$, where $\Delta = \beta$ under the simpler model. More generally, $\Delta$ is constrained to the "possible triangle" for which $\beta \geq 0$ and $\zeta_1 \leq 1/2$ (17, 18). The latter condition requires $2e^{\beta} \leq 1 + e^{\beta+\Delta}$, or $\Delta \geq \ln(2 - e^{-\beta})$. This $\Delta$ model admits dominance on the liability scale. For example, under random mating a rare dominant gene with frequency $q$ has a recurrence risk in sibs that approaches $[2q + 2(1/2) + 1]/4$. Therefore $\beta = \ln(1/4q)$ and $\Delta = \ln 2$, which is much less than its value under the $\beta$ model, and $\zeta$ approaches (0, 0.5, 0.5), which is not compatible with the $\beta$ model. On the other hand, the recurrence risk for a rare recessive under panmixia is $(q^2 + 2q + 1)/4$, and so $\beta = \Delta = \ln(1/q)$, which conforms to the $\beta$ model. It seems to us unlikely that dominance on the liability scale is important for polygenes.
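The generalized model and its constraint can be sketched as follows, using the rare dominant and rare recessive examples from the text. The function names and the gene frequency are hypothetical.

```python
import math


def zeta_generalized(beta, delta):
    """Sib identity coefficients when e^{2 beta} is replaced by e^{beta + delta}."""
    weights = (1.0, 2.0 * math.exp(beta), math.exp(beta + delta))
    total = sum(weights)
    return tuple(w / total for w in weights)


def minimum_delta(beta):
    """Smallest delta that keeps the one-allele coefficient at or below 1/2."""
    return math.log(2.0 - math.exp(-beta))


def in_possible_triangle(beta, delta, tol=1e-12):
    """Check beta >= 0 and the one-allele coefficient <= 1/2."""
    return beta >= 0.0 and zeta_generalized(beta, delta)[1] <= 0.5 + tol


q = 0.001  # hypothetical frequency of a rare disease gene

# Rare dominant: beta = ln(1/4q), delta = ln 2.
dominant = zeta_generalized(math.log(1.0 / (4.0 * q)), math.log(2.0))

# Rare recessive: beta = delta = ln(1/q), i.e. the simpler one-parameter model.
recessive = zeta_generalized(math.log(1.0 / q), math.log(1.0 / q))
```

For the dominant example the coefficients sit close to (0, 0.5, 0.5), near the edge of the possible triangle, whereas the recessive example puts almost all the probability on sharing two alleles.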

Discussion

The search for a method to detect linkage of major loci took 25 years, beginning with Bernstein (26) and ending with lod scores (1). In the interval Weiner, Hogben, Haldane, Fisher, Finney, Penrose, and Smith contributed useful ideas (27), but the efficiency and reliability of lods even in small samples and complex pedigrees made them the method of choice, by which hundreds of disease loci have been localized in dense maps and therefore many of them have been subsequently cloned and sequenced.

Polygenic linkage has proven to be a more difficult problem. The seminal paper of Penrose (28) considered categorical traits of unspecified inheritance determined in pairs of sibs of unspecified parentage, classified as "like" or "unlike" for each trait. All possible pairs were formed and treated as independent in testing significance by χ². Later Penrose (29) extended this approach to a quantitative trait. The next advance was by Haseman and Elston (30), who introduced parental information on a codominant marker sufficient to estimate marker kinship φ for each sib pair and regressed the squared quantitative difference on the coefficient of relationship 2φ. This is inefficient, since among similar pairs it does not distinguish pairs near the mean from uncommon pairs.

Other approaches of bewildering diversity were developed as DNA polymorphisms provided a basis for mapping polygenes. Dependence of sib pairs was explored, leading to the conclusion that full efficiency requires use of all pairs, as first shown by Fisher (31), but nominal significance should be confirmed by simulation and/or a parametric test on families (20). Following the direction set by Carey and Williamson (32), Risch and Zhang (15) showed that the number of sib pairs required to detect linkage can be dramatically reduced by typing phenotypically extreme pairs, since pairs near the median are (almost) uninformative. This argues strongly against the Haseman and Elston method, which weights each pair equally. In our model the absolute value of f is a weight consistent with the observations of Risch and Zhang (15). More generally, the likelihood that determines a lod implies a weight that is accurately reflected in the ELOD.
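The Haseman and Elston regression discussed above can be sketched as an ordinary least-squares fit of the squared sib-pair difference on the marker relationship 2φ, with a negative slope suggesting linkage. This is a minimal illustration under stated assumptions, not the published procedure in full: the function name and the four sib pairs are hypothetical, and no significance test is attempted.

```python
def haseman_elston_slope(trait_a, trait_b, relationship):
    """Least-squares slope of the squared sib-pair trait difference
    on the marker-specific coefficient of relationship (2 * kinship)."""
    y = [(a - b) ** 2 for a, b in zip(trait_a, trait_b)]
    n = len(y)
    x_mean = sum(relationship) / n
    y_mean = sum(y) / n
    sxy = sum((x - x_mean) * (yi - y_mean) for x, yi in zip(relationship, y))
    sxx = sum((x - x_mean) ** 2 for x in relationship)
    return sxy / sxx


# Hypothetical data: four sib pairs with marker relationship 0, 0.5, 0.5, 1
relationship = [0.0, 0.5, 0.5, 1.0]
trait_a = [2.0, 1.0, 3.0, 1.5]
trait_b = [0.0, 0.0, 2.0, 1.5]

slope = haseman_elston_slope(trait_a, trait_b, relationship)

# Moving every pair into the tail of the distribution leaves the slope unchanged,
# because only the within-pair difference enters the regression.
shifted_a = [a + 10.0 for a in trait_a]
shifted_b = [b + 10.0 for b in trait_b]
shifted_slope = haseman_elston_slope(shifted_a, shifted_b, relationship)

print(slope, shifted_slope)
```

The second call makes the stated inefficiency concrete: a concordant pair near the mean and an equally concordant pair far out in the tail contribute identically, and every pair carries the same weight.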

Risch (18) was the first to appreciate the superiority of nonparametric lods over means and regression, but his method was restricted to affected pairs. He used probabilities z_k for k alleles identical by descent under the hypothesis of linkage, for which we substitute ζ_k to avoid confusion with lods. Since Σζ_k = 1, there are two free parameters that have been constrained in various ways. Lod theory is degraded if both are estimated, and not all combinations of the ζ_k are biologically meaningful (for example, ζ = (0.5, 0, 0.5)). Therefore Holmans (17) introduced the "possible triangle method" that confines estimates to biological possibility, although there are still two independent (but constrained) parameters and so this modification is only asymptotically valid. It is not applicable to relatives other than sibs, to a quantitative trait (except by truncation), or to unaffected individuals, and therefore it cannot use the extreme discordant pairs that Risch and Zhang (15) showed to be most informative. Even for affected pairs it does not consider severity, which in our model is described by f. Affection is particularly uninformative for traits like atopy or asthma, which are common and inherently quantitative.
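The restriction to biologically possible sharing probabilities can be illustrated with a short membership test. This sketch assumes the commonly stated form of the possible-triangle constraints for affected sib pairs, 2ζ_0 ≤ ζ_1 ≤ 1/2 with the ζ_k nonnegative and summing to 1; that form is not spelled out in the text above, and the function name and tolerance are hypothetical.

```python
def in_possible_triangle(zeta0, zeta1, zeta2, tol=1e-12):
    """Return True if (zeta0, zeta1, zeta2) satisfies the assumed constraints
    2*zeta0 <= zeta1 <= 1/2, with nonnegative entries summing to 1."""
    if min(zeta0, zeta1, zeta2) < -tol:
        return False
    if abs(zeta0 + zeta1 + zeta2 - 1.0) > tol:
        return False
    return zeta1 <= 0.5 + tol and 2.0 * zeta0 <= zeta1 + tol


# Null hypothesis of no linkage for sib pairs
print(in_possible_triangle(0.25, 0.5, 0.25))
# The combination cited in the text as not biologically meaningful
print(in_possible_triangle(0.5, 0.0, 0.5))
```

Under these assumed constraints the null point (0.25, 0.5, 0.25) is a vertex of the triangle, while (0.5, 0, 0.5) fails the condition 2ζ_0 ≤ ζ_1.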

Limitations of earlier methods are removed when lods are expressed in terms of conditional probabilities that are functions of a single parameter β that accounts for nearly all of the noncentrality parameter of the possible triangle. Although the definition of f and utilization of multipoint markers and information from relatives may be improved, we are not far from a synthesis that will provide for polygenes the efficiency and reliability that lod scores gave to mapping of major loci 40 years ago.

1. Morton, N. E. (1955) Am. J. Hum. Genet. 7, 277-318.
2. Haldane, J. B. S. & Smith, C. A. B. (1947) Ann. Eugen. 14, 10-31.
3. Wald, A. (1947) Sequential Analysis (Wiley, New York).
4. Morton, N. E. (1967) Am. J. Hum. Genet. 19, 23-34.


5. Morton, N. E. & MacLean, C. J. (1974) Am. J. Hum. Genet. 26, 489-503.
6. Lalouel, J. M. & Morton, N. E. (1981) Hum. Hered. 31, 312-321.
7. Smith, C. A. B. (1963) Ann. Hum. Genet. 27, 175-182.
8. Risch, N. (1989) Genet. Epidemiol. 6, 473-480.
9. Faraway, J. L. (1993) Genet. Epidemiol. 10, 75-83.
10. MacLean, C. J., Ploughman, L. M., Diehl, S. R. & Kendler, K. S. (1992) Am. J. Hum. Genet. 50, 1259-1266.
11. MacLean, C. J. & Morton, N. E. (1984) Comput. Biomed. Res. 17, 471-480.
12. Morton, N. E., Shields, D. C. & Collins, A. (1991) Ann. Hum. Genet. 55, 301-314.
13. Bonney, G. E. (1986) Biometrics 42, 611-625.
14. Cotterman, C. W. (1941) Sci. Monthly 53, 227-234.
15. Risch, N. & Zhang, H. (1995) Science 268, 1584-1589.
16. Li, C. C. & Sacks, L. (1954) Biometrics 10, 347-360.
17. Holmans, P. (1993) Am. J. Hum. Genet. 52, 362-374.
18. Risch, N. (1990) Am. J. Hum. Genet. 46, 242-253.
19. Kruglyak, L. & Lander, E. S. (1995) Am. J. Hum. Genet. 57, 439-454.
20. Collins, A. & Morton, N. E. (1995) Hum. Hered. 45, 311-318.
21. Rao, D. C., Keats, B. J. B., Morton, N. E., Yee, S. & Lew, R. (1978) Am. J. Hum. Genet. 30, 516-529.
22. Babron, M-C., Martinez, M., Bonaiti-Pellie, C. & Clerget-Darpoux, F. (1993) Genet. Epidemiol. 10, 389-394.
23. Collins, A. & Morton, N. E. (1991) Ann. Hum. Genet. 55, 39-41.
24. Lander, E. S. & Schork, N. J. (1994) Science 265, 2037-2048.
25. Brown, D. L., Gorin, M. B. & Weeks, D. E. (1994) Am. J. Hum. Genet. 54, 544-552.
26. Bernstein, F. (1931) Z. Indukt. Abstamm. Vererbungsl. 57, 113-138.
27. Morton, N. E. (1995) Genetics 140, 7-12.
28. Penrose, L. S. (1935) Ann. Eugen. 6, 133-138.
29. Penrose, L. S. (1938) Ann. Eugen. 8, 233-237.
30. Haseman, J. K. & Elston, R. C. (1972) Behav. Genet. 2, 3-19.
31. Fisher, R. A. (1935) Ann. Eugen. 6, 187-201.
32. Carey, G. & Williamson, J. (1991) Am. J. Hum. Genet. 49, 786-796.
