
Alternative parameterization of polychotomous models: Theory and application to matched case-control studies

STATISTICS IN MEDICINE, VOL. 11, 557-560 (1992)

LETTERS TO THE EDITOR

ALTERNATIVE PARAMETERIZATION OF POLYCHOTOMOUS MODELS: THEORY AND APPLICATION TO MATCHED CASE-CONTROL STUDIES

by H. Becher, Statistics in Medicine, 10, 375-382 (1991)

From: Andrew L. Baughman, Division of Immunization, Center for Prevention Services, 1600 Clifton Road N.E., Mailstop E-OS, Atlanta, GA 30333, U.S.A.

Becher provides an interesting and practical parameterization of polychotomous logistic regression models for matched case-control studies with multiple case or control groups. This alternative parameterization requires augmenting one's original data set so that an equivalent conditional likelihood can be obtained and maximized with standard software for conditional logistic regression. For example, in a case-control study with m controls matched to each case, the alternative parameterization recasts the matched (m + 1)-tuplets as a study of matched (m + 1)!-tuplets, in which each record of a tuplet represents one permutation of the (m + 1) observed covariate vectors. Table I in Becher shows the augmented data structure containing the (2 + 1)! = 6 permutations for a case-control study with two controls matched to each case.
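The augmentation step can be sketched in a few lines of Python. This is a minimal illustration, not Becher's program: it assumes that each pseudo-record carries the covariate vectors occupying every position except one reference position (whose coefficients are fixed at zero), and that the observed arrangement is labelled as the 'case'. The function name `augment_matched_set` and the choice of reference position are hypothetical, and the column layout of Becher's Table I may differ.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Record = Tuple[int, Vector]  # (case/control status, augmented covariates)


def augment_matched_set(x: Sequence[Vector], reference: int = 0) -> List[Record]:
    """Expand one matched (m + 1)-tuplet into (m + 1)! pseudo-records.

    x[k] is the observed covariate vector of the member belonging to group k.
    For each permutation pi, x[pi[k]] is placed in position k, and the
    pseudo-record concatenates the vectors in every position except
    `reference` (hypothetical choice: that group's coefficients are fixed at 0).
    Only the identity permutation, i.e. the observed arrangement, gets status 1.
    """
    n = len(x)
    identity = tuple(range(n))
    records: List[Record] = []
    for pi in permutations(range(n)):
        z = tuple(value for k in range(n) if k != reference for value in x[pi[k]])
        records.append((1 if pi == identity else 0, z))
    return records


if __name__ == "__main__":
    # One case (group 0) with two matched controls (groups 1 and 2), one covariate each.
    matched_set = [(1.0,), (0.0,), (2.0,)]
    for status, z in augment_matched_set(matched_set):
        print(status, z)
```

For a case with two matched controls the sketch returns 3! = 6 records, exactly one of them the 'case'; each matched set then forms one stratum for a conditional logistic regression routine. Because the stratum size is (m + 1)!, the approach is practical only for small m.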

I think Dr. Becher's paper illustrates nicely how the alternative parameterization works, but I would like to make three points:

1. The parameterization has been described and illustrated previously by Risch and Tibshirani.1 As in Becher, for a study with two distinct control groups matched to a single case group, Risch and Tibshirani1 describe the equivalent conditional likelihood in terms of the differences between the permuted covariate vectors for the controls and the covariate vectors actually observed for the controls. In the augmented data set for the study of matched (m + 1)!-tuplets, the case has two covariate vectors of 0s, and each of the (m + 1)! - 1 controls has two covariate vectors that are the differences between its permuted covariate vectors and those actually observed for the two controls. In this formulation, the augmented data structure for the first set (triplet) in Table I of Becher would look like that below.

Set    Case/control status    Covariates

Aside from a change in sign of the estimated regression coefficients, the Becher and the Risch and Tibshirani formulations of the equivalent conditional likelihood will give identical results.
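The equivalence rests on a property of the conditional likelihood: subtracting the same vector from every record in a matched set multiplies numerator and denominator by the same factor. The Python sketch below checks this numerically for one 1:2 matched set. It assumes a layout in which each record carries the covariates of the members placed in the two control positions; `augment`, `difference` and `clogit_loglik` are illustrative names, and the sketch covers only the differencing step, not the choice of parameterization behind the sign change.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Record = Tuple[int, Vector]


def augment(x: Sequence[Vector]) -> List[Record]:
    """Hypothetical layout: covariates of the members in control positions 1..m;
    the observed arrangement (identity permutation) is the case."""
    n = len(x)
    identity = tuple(range(n))
    return [
        (1 if pi == identity else 0, tuple(v for k in range(1, n) for v in x[pi[k]]))
        for pi in permutations(range(n))
    ]


def difference(records: List[Record]) -> List[Record]:
    """Risch-Tibshirani-style records: subtract the case's vector from every record,
    so that the case row becomes all zeros."""
    z_case = next(z for status, z in records if status == 1)
    return [(status, tuple(a - b for a, b in zip(z, z_case))) for status, z in records]


def clogit_loglik(records: List[Record], beta: Sequence[float]) -> float:
    """Log conditional likelihood of one stratum: z_case.beta - log sum_j exp(z_j.beta)."""
    scores = [sum(b * v for b, v in zip(beta, z)) for _, z in records]
    case_score = next(s for (status, _), s in zip(records, scores) if status == 1)
    top = max(scores)  # log-sum-exp shift for numerical stability
    return case_score - (top + math.log(sum(math.exp(s - top) for s in scores)))


if __name__ == "__main__":
    raw = augment([(1.0,), (0.0,), (2.0,)])
    diff = difference(raw)
    for beta in [(0.3, -0.7), (1.2, 0.4)]:
        print(beta, clogit_loglik(raw, beta), clogit_loglik(diff, beta))
```

Both versions give the same value because each differenced score equals the original score minus the case's score, a constant within the stratum that cancels in the ratio.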

2. The alternative parameterization can also be used to implement the 'pairwise' approach to estimating and testing multiple risk estimates from the polychotomous model, and the parameterization for both approaches handles missing controls.2-4

3. A third method, which uses the full polytomous conditional likelihood function, can be used to fit the polychotomous model.4 This method handles incomplete matched sets, including those missing the case but not the controls. Levin has provided an APL program that implements the algorithm.5
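Because the full polytomous conditional likelihood treats each stratum as a whole, incomplete sets need no special handling. A brute-force Python sketch for one stratum follows, assuming the usual form in which the observed assignment of outcome groups is compared with every distinct reassignment that keeps the group sizes fixed. It enumerates permutations and is therefore practical only for small strata; it is not Levin's APL program, and `stratum_loglik` and the coefficient values in the example are hypothetical.

```python
import math
from itertools import permutations
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, ...]


def stratum_loglik(x: Sequence[Vector], y: Sequence[int], beta: Dict[int, Vector]) -> float:
    """Log conditional likelihood of one stratum under a polytomous logistic model.

    x[i]: covariates of subject i; y[i]: its outcome group (0 = reference group,
    whose coefficients are fixed at zero); beta[g]: coefficients of group g != 0.
    The observed labelling is compared with every distinct relabelling of the
    same subjects that keeps the number of subjects in each group fixed, so a
    stratum missing the case or a control is handled in the same way.
    """
    def score(labels: Sequence[int]) -> float:
        return sum(
            sum(b * v for b, v in zip(beta[g], xi))
            for g, xi in zip(labels, x)
            if g != 0
        )

    alternatives = [score(labels) for labels in set(permutations(y))]
    top = max(alternatives)  # log-sum-exp shift for numerical stability
    return score(y) - (top + math.log(sum(math.exp(s - top) for s in alternatives)))


if __name__ == "__main__":
    beta = {1: (0.5,), 2: (0.8,)}  # hypothetical: group 1 = cases, group 2 = second control group
    print(stratum_loglik([(1.0,), (0.0,), (2.0,)], [1, 0, 2], beta))  # complete set: 6 terms
    print(stratum_loglik([(1.0,), (0.0,)], [0, 2], beta))  # case missing: 2 terms
```

A stratum with a single subject has only one possible labelling and contributes zero to the log-likelihood, as it should.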


REFERENCES

1. Risch, H. A. and Tibshirani, R. J. Re: ‘Polychotomous logistic regression methods for matched case-control studies with multiple case or control groups’ (Letter), American Journal of Epidemiology, 128, 446-448 (1988).
2. Liang, K. Y. and Stewart, W. F. ‘Polychotomous logistic regression methods for matched case-control studies with multiple case or control groups’, American Journal of Epidemiology, 125, 720-730 (1987).
3. Liang, K. Y. and Stewart, W. F. Re: ‘Polychotomous logistic regression methods for matched case-control studies with multiple case or control groups’ (Reply), American Journal of Epidemiology, 128, 448-449 (1988).
4. Levin, B. Re: ‘Polychotomous logistic regression methods for matched case-control studies with multiple case or control groups’ (Letter), American Journal of Epidemiology, 128, 445-446 (1988).
5. Levin, B. ‘Conditional likelihood analysis in stratum-matched retrospective studies with polytomous disease states’, Communications in Statistics, B16, 699-718 (1987).

AUTHOR’S REPLY

I thank A. L. Baughman for his thoughtful comments on my paper and for pointing out that a similar method was outlined in a letter by Risch and Tibshirani, which had escaped my attention. In addition, I would like to mention two other applications for which a data augmentation similar to that of Table I in my paper is useful.

Suppose that in a matched case-control study one control ($Y = 3$) is individually matched to two cases, one with disease A ($Y = 1$) and one with disease B ($Y = 2$). Suppose further that there are two independent risk factors $X_1$ and $X_2$ with $\mathrm{RR}(X_2 \mid Y = 1) = \mathrm{RR}(X_2 \mid Y = 2)$ and $\mathrm{RR}(X_1 \mid Y = 1) \neq \mathrm{RR}(X_1 \mid Y = 2)$. Fitting a (bivariate) logistic model as for a 2:1 matched design is incorrect, because only one regression coefficient is estimated for factor $X_1$, and it is an estimate of neither $\log \mathrm{RR}(X_1 \mid Y = 1)$ nor $\log \mathrm{RR}(X_1 \mid Y = 2)$.

Fitting the full polychotomous model gives four regression coefficients (one for each covariable/disease combination). This approach is correct but not efficient. To take the above assumption into account, one can fit a model to the data set augmented as follows:

Original data set

Set    Status (Y)    Covariates
1      1             x111, x121
1      3             x113, x123
1      2             x112, x122

Augmented data set

Set    Case/control    Covariates
1      0               x111, x121, x112, x122
1      0               x112, x122, x111, x121
1      0               x111, x121, x113, x123
1      0               x112, x122, x113, x123
1      0               x113, x123, x111, x121
1      1               x113, x123, x112, x122

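Using the notation of the original paper, $z_{16}$ is the ‘case’, and the corresponding logistic model is equivalent to equation (6) in my paper, except that the second component of the parameter vector $\beta_3$ is set to zero, $\beta_3 = (\beta_{31}, 0)$. We then obtain $-\beta_{21}$, $-\beta_{21} + \beta_{31}$ and $-\beta_{22}$ as estimates of $\log \mathrm{RR}(X_1 \mid Y = 1)$, $\log \mathrm{RR}(X_1 \mid Y = 2)$ and $\log \mathrm{RR}(X_2)$, respectively. By the same reasoning, the difference $\log \mathrm{RR}(X_2 \mid Y = 2) - \log \mathrm{RR}(X_2 \mid Y = 1)$ equals $\beta_{32}$, so fixing $\beta_{32} = 0$ is what makes the relative risk for $X_2$ common to both diseases.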
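Fixing a coefficient at zero is the same as deleting its covariate, so the constrained model can be fitted with ordinary conditional logistic regression software on three columns. The Python sketch below builds the six records of one set in the row order of the augmented table and maps fitted coefficients to the three log relative risks stated above. The assignment of the first covariate pair to $\beta_2$ and the second pair to $\beta_3$ is our reading of the table, not something the reply states, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[float, float]  # (X1, X2) of one subject


def reply_design(members: Dict[int, Pair]) -> List[Tuple[int, Tuple[float, float, float]]]:
    """Six augmented records of one set, following our reading of the reply's table.

    members[y] holds (X1, X2) of the subject with outcome y
    (1 = disease A, 2 = disease B, 3 = control). Each record pairs a 'first'
    and a 'second' member, as in the table's covariate columns; the case record
    is (control, disease B). Because beta_32 = 0, the X2 column of the second
    member is dropped, leaving covariates for (beta_21, beta_22, beta_31).
    """
    row_order = [(1, 2), (2, 1), (1, 3), (2, 3), (3, 1), (3, 2)]  # as in the table
    records = []
    for first, second in row_order:
        status = 1 if (first, second) == (3, 2) else 0
        x1_first, x2_first = members[first]
        x1_second, _ = members[second]  # X2 of the second member is not needed
        records.append((status, (x1_first, x2_first, x1_second)))
    return records


def log_rr_estimates(b21: float, b22: float, b31: float) -> Dict[str, float]:
    """Map fitted coefficients to the log relative risks given in the reply."""
    return {
        "log RR(X1 | Y=1)": -b21,
        "log RR(X1 | Y=2)": -b21 + b31,
        "log RR(X2)": -b22,
    }
```

Any routine that fits conditional logistic regression with one stratum per matched set can then estimate the three coefficients from these records.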

Another application concerns adjustment for bias in a subset of risk factors in one control group. For example, assume that each case is individually matched to two different controls (one hospital control and one population control). If, in one control group, the distribution of a subset of the risk factors, say occupational exposure, does not represent that of the underlying population, the resulting estimate will be biased if a standard conditional logistic regression model (1:2 matching) is fitted. Another paper1 describes in detail how a polychotomous logistic model can be applied to adjust for this bias. The data augmentation is also helpful in this case.

H. BECHER, Institute of Epidemiology and Biometry, German Cancer Research Centre, Heidelberg, Germany


REFERENCE

1. Becher, H. and Jöckel, K.-H. ‘Bias adjustment with polychotomous logistic regression in matched case-control studies with two control groups’, Biometrical Journal, 32, 801-816 (1990).

ESTIMATING STANDARDIZED PARAMETERS FROM GENERALIZED LINEAR MODELS

by S. Greenland, Statistics in Medicine, 10, 1069-1074 (1991)

From: J. A. Nelder, Department of Mathematics, Imperial College, London SW7 2BZ, U.K.

A recent paper by Greenland deals with the estimation of standardized rates when the data have been analysed using generalized linear models (GLMs). In a paper in Biometrics,1 Lane and I gave a general framework for prediction following model fitting with GLMs. Prediction was used in the sense of providing derived quantities that answer what-if questions of the form ‘what would the total mortality be if the age distribution in this city were the same as in the country as a whole?’ We showed that many kinds of standardization, the analysis of covariance, and questions answered by calibration can be regarded as special cases of the general process of prediction. The method given by Greenland for computing smoothed standardized rates appears to be identical to our marginal prediction. Users of these techniques may be interested to know that our prediction methods have been implemented in Genstat under the PREDICT directive. So far as I know, they are not available in any other package.
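Marginal prediction in this sense can be illustrated with a small Python sketch: fitted rates from a GLM are averaged over a standard covariate distribution to answer the ‘what if’ question above. The log-linear rate model, its coefficient values and the function name are hypothetical, and the sketch does not reproduce Genstat's PREDICT directive.

```python
import math
from typing import Dict


def standardized_rate(intercept: float, age_effects: Dict[str, float],
                      city_effect: float, standard: Dict[str, float]) -> float:
    """Model-based standardized rate for one city.

    Assumes a hypothetical log-linear (Poisson) rate model
        log rate(age, city) = intercept + age_effects[age] + city_effect,
    and averages the fitted rates over the age distribution `standard`
    of the reference population (weights need not sum to 1).
    """
    total = sum(standard.values())
    return sum(
        w * math.exp(intercept + age_effects[age] + city_effect)
        for age, w in standard.items()
    ) / total


if __name__ == "__main__":
    ages = {"under 65": 0.0, "65+": math.log(3.0)}  # hypothetical fitted age effects
    national = {"under 65": 0.6, "65+": 0.4}  # hypothetical standard age distribution
    print(standardized_rate(math.log(0.01), ages, math.log(1.5), national))
```

With a main-effects log-linear model, the standardized rates of two cities differ only by the factor exp of the difference in their city effects; an age-by-city interaction would break that proportionality, which is where averaging over the standard distribution matters.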

REFERENCE

1. Lane, P. W. and Nelder, J. A. ‘Analysis of covariance and standardization as instances of prediction’, Biometrics, 38, 613-621 (1982).

AUTHOR’S REPLY

I must apologize to Lane and Nelder for overlooking their lucid exposition.1 The model-smoothed point estimates in my paper2 are indeed identical to Lane and Nelder’s marginal predictions. My paper adds some general variance and covariance formulae to accompany these marginal predictions, as well as some Mantel-Haenszel-based estimates of standardized rates and rate differences. I would now add the following comment.
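One generic way to attach a variance to such a marginal prediction is the delta method. The Python sketch below is that textbook approach for a log-linear model, not necessarily the formulae in Greenland's paper; the function name and argument layout are assumptions.

```python
import math
from typing import Sequence, Tuple


def standardized_rate_se(beta: Sequence[float], cov: Sequence[Sequence[float]],
                         rows: Sequence[Sequence[float]],
                         weights: Sequence[float]) -> Tuple[float, float]:
    """Standardized rate R = sum_a w_a * exp(x_a . beta) and its delta-method SE.

    rows[a] is the design row x_a of covariate pattern a (e.g. one age group in
    the target city), weights[a] its standard-population weight, and cov the
    estimated covariance matrix of beta. The gradient of R with respect to beta
    is g = sum_a w_a * exp(x_a . beta) * x_a, and var(R) is approximated by g' cov g.
    """
    p = len(beta)
    fitted = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in rows]
    rate = sum(w * f for w, f in zip(weights, fitted))
    grad = [sum(w * f * row[k] for w, f, row in zip(weights, fitted, rows)) for k in range(p)]
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(p) for j in range(p))
    return rate, math.sqrt(var)
```

An interval computed on the log scale, log R ± 1.96 SE/R, keeps the lower limit positive, since var(log R) is approximately var(R)/R².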

The objective of model-based smoothing is to improve the accuracy of estimation. For complex models with more than a few covariates, further accuracy gains are possible by adding a second stage to the model, as in empirical-Bayes models. Here, the original (first-stage) regression parameters are themselves regressed on prior covariates in a second-stage generalized linear model for the prior expectations of the first-stage parameters. The resulting empirical-Bayes estimates of the first-stage parameters can replace the ordinary maximum-likelihood estimates in either covariate-specific (conditional) or standardized (marginal) estimation. The empirical-Bayes literature is vast (see Maritz and Lwin3); some recent expositions with epidemiologic-regression examples are Louis4 and Greenland.5 It would be valuable to have empirical-Bayes methods available in an accessible computer package.
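The two-stage idea can be illustrated in its simplest form, with an intercept-only second stage so that every first-stage coefficient is shrunk toward a common prior mean. The Python sketch below rests on several assumptions (a normal second stage, a crude moment estimator of the prior variance, positive first-stage variances, a hypothetical function name) and is not the estimator of any of the cited references; with prior covariates, the common mean would be replaced by a fitted second-stage regression.

```python
from typing import List, Sequence


def empirical_bayes(b: Sequence[float], v: Sequence[float]) -> List[float]:
    """Shrink first-stage estimates b[j] (variances v[j] > 0) toward a common prior mean.

    Assumed second stage: b[j] ~ N(mu, tau2 + v[j]) with an intercept-only prior
    model. tau2 is estimated crudely by the method of moments,
    max(0, sample variance of b - mean of v), and mu by a precision-weighted mean.
    """
    j = len(b)
    mean_b = sum(b) / j
    sample_var = sum((bj - mean_b) ** 2 for bj in b) / (j - 1)
    tau2 = max(0.0, sample_var - sum(v) / j)
    weights = [1.0 / (tau2 + vj) for vj in v]
    mu = sum(w * bj for w, bj in zip(weights, b)) / sum(weights)
    # Each estimate keeps the fraction tau2 / (tau2 + v[j]) of its deviation from mu.
    return [mu + tau2 / (tau2 + vj) * (bj - mu) for bj, vj in zip(b, v)]
```

Imprecise first-stage estimates (large v[j]) are pulled most strongly toward the prior mean, and when the estimated prior variance is zero every coefficient collapses to the common mean.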

SANDER GREENLAND, Department of Epidemiology, UCLA School of Public Health, Los Angeles, CA 90024-1772, U.S.A.

REFERENCES

1. Lane, P. W. and Nelder, J. A. ‘Analysis of covariance and standardization as instances of prediction’, Biometrics, 38, 613-621 (1982).
2. Greenland, S. ‘Estimating standardized parameters from generalized linear models’, Statistics in Medicine, 10, 1069-1074 (1991).
3. Maritz, J. S. and Lwin, T. Empirical Bayes Methods, 2nd edn, Chapman and Hall, London, 1989.