Interpretation of the strength of relations between variables in log-linear analysis*



FRANÇOIS BÉLAND
Université Laval

The analysis of multidimensional contingency tables is not limited to studying the significance levels of several models. The model selected by log-linear analysis proposes that a certain number of associations and interactions are significant. The coefficients of log-linear equations are, however, difficult to interpret. Standardizing the configurations of associations and interactions of a log-linear model yields reduced tables whose cells contain statistics interpretable as probabilities. These statistics are simple to read, and the importance of the relations revealed by the model is easily interpreted.

The analysis of multi-dimensional contingency tables does not stop with significance tests, yet the estimates of the parameters of log-linear equations are difficult to interpret. We propose herein that the configurations of associations and interactions of the chosen model be standardized and used to interpret the strength of relations between variables. This procedure gives simple criteria for weighing the associations and interactions of the chosen model. Collapsed tables are readily obtained, and the statistics given by the standardization procedure are interpretable as probabilities.

Standardization of estimated frequencies can be a useful tool for understanding the structure of the significant associations and interactions in a log-linear model. The standardized frequencies can be used either as statistics measuring the strength of the relationship between variables, or as a device for understanding more fully the meaning of the coefficients in a log-linear model.

Log-linear analysis is becoming widely known through recent publications (Davis, 1974; Schuessler, 1978; Goodman, 1972a, b, c, 1973, 1979; Haberman, 1978, 1979; for a different view see Gillespie, 1977). Examples of the use of this method, in particular for the analysis of social mobility tables, can be found in the current sociological literature (Baron, 1979; Hauser, 1978, 1980; Featherman and Hauser, 1978). Nevertheless, even in the simplest and most straightforward use of log-linear models, problems exist in the interpretation of significant associations and interactions between variables (Swafford, 1980). The coefficients of the

* I acknowledge with pleasure the support of the ASOPE research team from Université Laval. I wish to thank an anonymous associate editor of the CRSA for his editorial help.

This paper was received in January, 1981 and accepted July, 1981.

Rev. canad. Soc. & Anth. / Canad. Rev. Soc. & Anth. 20(2) 1983


TABLE I

DISTRIBUTION OF RESPONSES ON THREE VARIABLES (L, H, R) FOR THREE RANDOM SAMPLES OF ELDERLY PEOPLE

                   S
L   R   H      1      2      3

1   1   1    104     96     97
1   1   2     42     88     55
1   2   1     24     13     37
1   2   2     18     29     27
2   1   1     62     54     41
2   1   2     50     57     34
2   2   1      5     15     17
2   2   2     27     28     20

NOTE: L = functional characteristics of the dwellings (1, functional; 2, non-functional); R = isolation in the home (1, living with others; 2, living alone); H = wish to stay home (1, wants to stay; 2, wants to move); S = samples (1 and 2, middle-sized cities; 3, neighbourhood in a metropolitan area).

log-linear equation, functions of the cross-product ratios, are not easily interpreted, even in the bivariate case. We propose herein the use of a concrete representation of the strength of the significant associations or interactions in a non-saturated log-linear model. Only the simplest case of multivariate contingency tables, with counts of five or more in each cell, will be examined (see Table I). Used in different contexts by different authors (Bishop et al., 1975; Mosteller, 1968; Smith, 1976), standardization of table marginals can be easily extended to the multivariate case and to the higher-order interaction patterns found to be significant in a log-linear model. The standardization is made using the estimated frequencies obtained under a non-saturated model.

A NOTE ABOUT STANDARDIZATION

Standardization of table marginals consists of fitting them to arbitrary sets of values. Setting the marginal frequencies equal to one another isolates the interaction structure of the table. A simple iterative algorithm can be used to obtain the standardized frequencies (Bishop et al., 1975; Mosteller, 1968; Smith, 1976).¹ When independence exists in the table, the values of those frequencies are equal. Differences between the values of the standardized frequencies can be interpreted as evidence of association between variables.
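The iterative algorithm can be sketched in a few lines. The version below is a minimal illustration, assuming that every row total and every column total is fitted to an equal share of the table total; the function name and the stopping rule are illustrative choices and are not taken from ANOMHI. It is applied to the R × H counts obtained by collapsing Table I over L and S.

```python
def standardize(table, n_iter=500, tol=1e-12):
    """Rescale rows and columns in turn until all row totals are equal
    and all column totals are equal (iterative proportional fitting)."""
    t = [[float(x) for x in row] for row in table]
    n_rows, n_cols = len(t), len(t[0])
    total = sum(map(sum, t))
    row_target, col_target = total / n_rows, total / n_cols
    for _ in range(n_iter):
        # Fit the row totals.
        for i in range(n_rows):
            s = sum(t[i])
            t[i] = [x * row_target / s for x in t[i]]
        # Fit the column totals.
        for j in range(n_cols):
            s = sum(t[i][j] for i in range(n_rows))
            for i in range(n_rows):
                t[i][j] *= col_target / s
        # Stop once the row totals are also numerically on target.
        if all(abs(sum(row) - row_target) < tol for row in t):
            break
    return t


# R x H table collapsed over L and S from Table I
# (rows R = 1, 2; columns H = 1, 2).
observed_rh = [[454, 326], [111, 149]]
std_rh = standardize(observed_rh)
for row in std_rh:
    print([round(x, 2) for x in row])
```

The standardized cells should come out close to the 300.34 and 219.66 reported in Table VI, with every row and column total equal to 520.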

In the multivariate case, standardization of observed frequencies will retain the complex interactive structure of the table. The usefulness of this procedure, therefore, is limited. Fienberg has argued that ‘standardization is basically a descriptive technique that has been made obsolete’ by the introduction of log-linear analysis (1977: 5). Or is it?


A log-linear model is defined by a set of sufficient configurations (Bishop et al., 1975), interpretable as the set of sub-tables from which the observed frequencies can be reproduced with sufficient accuracy if the null hypothesis is not rejected by the G² criterion. When this is the case, it is tempting to analyse the set of sub-tables using a simple statistic, such as percentages. But an analysis of the set of sub-tables is legitimate only in some cases, because it is subject to what has been called Simpson’s paradox (Fienberg, 1977: 44-6; Simpson, 1951). Simply stated: in multivariate analysis, the association between two variables (say A and B) is always conditional on the other variables included in the analysis (say C, D, ..., N). Even if there is no interaction between the variables (say no A × B × C interaction), only in some cases will the association between A and B be the same in the collapsed (A × B) table and in the original table (A × B × C × ... × N).
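The difference between a conditional and a collapsed association can be seen directly in the counts of Table I. The sketch below computes the R × H cross-product ratio within each L × S layer and then in the table collapsed over L and S. The dictionary layout and the function name are illustrative choices, and the ratios are those of the observed counts, not of any fitted model.

```python
# Table I counts, keyed by (L, R, H); each value lists samples S = 1, 2, 3.
table_1 = {
    (1, 1, 1): (104, 96, 97),
    (1, 1, 2): (42, 88, 55),
    (1, 2, 1): (24, 13, 37),
    (1, 2, 2): (18, 29, 27),
    (2, 1, 1): (62, 54, 41),
    (2, 1, 2): (50, 57, 34),
    (2, 2, 1): (5, 15, 17),
    (2, 2, 2): (27, 28, 20),
}


def odds_ratio(n11, n12, n21, n22):
    """Cross-product ratio of a 2 x 2 table."""
    return (n11 * n22) / (n12 * n21)


# R x H cross-product ratio within each (L, S) layer.
conditional = {}
for l in (1, 2):
    for s in (0, 1, 2):
        conditional[(l, s + 1)] = odds_ratio(
            table_1[(l, 1, 1)][s], table_1[(l, 1, 2)][s],
            table_1[(l, 2, 1)][s], table_1[(l, 2, 2)][s],
        )

# R x H table collapsed over L and S, and its cross-product ratio.
collapsed = [
    [sum(sum(table_1[(l, r, h)]) for l in (1, 2)) for h in (1, 2)]
    for r in (1, 2)
]
marginal = odds_ratio(collapsed[0][0], collapsed[0][1],
                      collapsed[1][0], collapsed[1][1])

for key in sorted(conditional):
    print("L=%d, S=%d: %.3f" % (key[0], key[1], conditional[key]))
print("collapsed:", collapsed, round(marginal, 3))
```

The six conditional ratios differ from one another and from the collapsed ratio, which is why percentages read off a collapsed sub-table cannot in general stand for the conditional association.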

The reading of percentages or proportions in the sub-tables defined by the sufficient configurations would be an easy way out of the complex interpretation of the estimated parameters of a log-linear model. The standardization of the estimated full-table frequencies will yield, for any parameter included in the model, a sub-table easily interpreted in terms of proportions or probabilities. These sub-tables are not subject to Simpson’s paradox since the conditional feature of the association between variables in multivariate table analysis is retained.

AN EXAMPLE

The best way to explain the use and the advantages of this method is by giving an example.

Table I presents data from an evaluation study of the impact of home care on the desire of the elderly (65 years of age and over) to live at home rather than in an institution (Béland, 1980a and b). Random samples were drawn from three elderly populations: two were from middle-sized cities and one was from a poor neighbourhood in a metropolitan area. The first variable (L) is a dichotomous representation of the functional status of the interviewees’ housing.² Functional housing is represented by 1, non-functional housing by 2. The second variable (R) splits interviewees into two groups: those living with others (1), and those living alone (2). The samples (S) are in columns. The first two samples are from the middle-sized cities, the third is from the neighbourhood in the metropolitan area.

The study seeks to explain the interviewees’ desire or lack of desire to move out of their usual dwelling place to a prosthetic residence for the elderly; it also seeks to infer from the samples the implications for the general population. We can, thus, use a logit model (Bishop, 1969) and apply a G² statistic. The dichotomous variable H can be understood as a dependent variable, and the L and R variables are the independent variables. The proportion of the ‘stayers’ to the ‘would-be movers’ is, thus, to be explained by the functional characteristics of the actual dwelling (L) and by the social isolation of the interviewees in their home (R). The uniformity of the effects of L and R on H is tested on the three samples (S) simultaneously. Logit models are special cases of log-linear models (Bishop et al., 1975; Goodman, 1970; Swafford, 1980). The conclusions reached here apply to log-linear models in general. Logit analysis can be considered as an analogue to regression analysis (Goodman, 1972b). It gives rise to a family of log-linear models


TABLE II

FOUR MODELS

Models                        G²     Degrees of freedom    χ² (.05)

1/ [LRS], LH, RH, HS        31.88            9               16.9
2/ [LRS], LHS, RH            9.68            5               11.1
3/ [LRS], LRH, LHS           8.73            4                9.49
4/ [LRS], LRH, LHS, RHS      4.13            2                5.99

which stresses some variables’ capacity to explain the distribution of observations in the categories of another dichotomous variable (for a slightly different approach to logit models, see Theil, 1970).

Only the relevant models are included in Table II. The term LRS has been included in each of the models to reproduce the observed frequencies in each of the L × R × S cells in Table I, as suggested by Goodman (1972b), although Bishop (1969) and Fienberg (1977) argue for a less stringent procedure. Model 2 seems to be the most parsimonious given the G² criterion. The path leading from model 4 to model 2 by way of model 3 drops one term at each step, and the difference in G² between two successive models tests the term left out. In the first place, the RHS term is excluded, as $G^2_{RHS}$ = 8.73 - 4.13 = 4.60 with 4 - 2 = 2 degrees of freedom. In the second place, the LRH term is excluded, as $G^2_{LRH}$ = 9.68 - 8.73 = .95 with 1 degree of freedom. Comparing model 2 with model 1 shows that the LHS term is highly significant. Model 2 shows that isolation in the home has a similar effect, in each sample, on the wish to stay home, but that the effect of the functional characteristics of the dwelling on that wish varies from sample to sample, introducing some intricacies into the model.
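The model comparisons above are differences of G² values referred to a chi-square distribution. The sketch below redoes that arithmetic with the G² values and degrees of freedom printed in Table II. The upper-tail probabilities are computed with closed forms that hold only for 1 degree of freedom and for even degrees of freedom, which is all this example needs; the helper names are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for df = 1 or any even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        return math.exp(-half) * sum(
            half ** k / math.factorial(k) for k in range(df // 2)
        )
    raise ValueError("only df = 1 or even df are handled in this sketch")


# (G2, degrees of freedom) as printed in Table II.
models = {1: (31.88, 9), 2: (9.68, 5), 3: (8.73, 4), 4: (4.13, 2)}


def compare(restricted, fuller):
    """G2 difference, df difference and p-value for two nested models."""
    g2 = models[restricted][0] - models[fuller][0]
    df = models[restricted][1] - models[fuller][1]
    return round(g2, 2), df, chi2_sf(g2, df)


steps = {
    "RHS": compare(3, 4),           # dropping RHS from model 4
    "LRH": compare(2, 3),           # dropping LRH from model 3
    "model 1 vs 2": compare(1, 2),  # what model 2 adds to model 1
}
for name, (g2, df, p) in steps.items():
    print("%-13s G2 = %5.2f, df = %d, p = %.4f" % (name, g2, df, p))
```

Neither of the first two differences reaches the .05 level, while the difference between models 1 and 2 is far beyond it, which is the pattern described in the text.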

Standardizing for the RH term yields a 2 × 2 table (Table III). The results obtained take into account the fact that variable L is present in Table I and that the model is tested on three samples simultaneously. In short, as we shall see later, it is not the bivariate sub-table formed by the R and H variables that has been standardized.

The interpretation of Table III is straightforward. There are about 58 chances in a hundred that an interviewee not living alone will want to stay home. In the general case, the standardized proportions can be interpreted as conditional probabilities or as transition probabilities where appropriate (Mosteller, 1968). Here the proportions have been computed on the rows of Table III. With such a presentation of the results, the effect of the independent variable R on the dependent variable H is readily seen.³

In model 2, the effect of L on H varies for each sample. Thus standardizing for the LH term yields three tables, one for each sample. The effect of L on H is almost negligible in sample 2, but very important in sample 1. Sample 3 lies somewhere in between. Here again, in the general case, the standardized proportions can be interpreted as conditional probabilities or transition probabilities as appropriate.
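Reading the standardized tables as conditional probabilities amounts to taking row proportions. The sketch below does this for the standardized frequencies of Tables III and IV; the variable names and the "gap" summary (the difference between the two rows in the probability of wanting to stay) are illustrative choices, not statistics proposed in the text.

```python
# Standardized frequencies: rows are categories of the 'independent'
# variable, columns are H = 1 (wants to stay) and H = 2 (wants to move).
table_3 = [[303.10, 216.90], [216.90, 303.10]]   # R by H
table_4 = {                                      # L by H, one table per sample
    1: [[106.15, 67.18], [67.18, 106.15]],
    2: [[88.37, 84.97], [84.97, 88.37]],
    3: [[95.44, 77.89], [77.89, 95.44]],
}


def row_proportions(table):
    """Divide each cell by its row total."""
    return [[x / sum(row) for x in row] for row in table]


# P(H = 1 | R = 1) and P(H = 1 | R = 2).
p_stay_given_r = [row[0] for row in row_proportions(table_3)]

# Difference P(H = 1 | L = 1) - P(H = 1 | L = 2) in each sample.
lh_gap = {}
for sample, table in table_4.items():
    props = row_proportions(table)
    lh_gap[sample] = props[0][0] - props[1][0]

print([round(p, 4) for p in p_stay_given_r])
print({s: round(g, 4) for s, g in lh_gap.items()})
```

The gap is largest in sample 1, smallest in sample 2 and intermediate in sample 3, matching the ordering described above.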

The interpretation of the effects pattern is thus straightforward in Tables III and IV. The same results will be obtained with polytomous variables. Instead of the


TABLE III

STANDARDIZED FREQUENCIES AND PROPORTIONS FOR THE RH TERMS

                              Variable H
Variable R           1                  2                Total

1/            303.10 (58.29)     216.90 (41.71)     520 (100.00)
2/            216.90 (41.71)     303.10 (58.29)     520 (100.00)

NOTE: proportions in parentheses

TABLE IV

STANDARDIZED FREQUENCIES AND PROPORTIONS FOR THE LH TERMS

A/ Sample 1 - Variable H

Variable L           1                  2                 Total

1/            106.15 (61.24)      67.18 (38.76)     173.33 (100.00)
2/             67.18 (38.76)     106.15 (61.24)     173.33 (100.00)

B/ Sample 2 - Variable H

Variable L           1                  2                 Total

1/             88.37 (50.98)      84.97 (49.02)     173.33 (100.00)
2/             84.97 (49.02)      88.37 (50.98)     173.33 (100.00)

C/ Sample 3 - Variable H

Variable L           1                  2                 Total

1/             95.44 (55.06)      77.89 (44.94)     173.33 (100.00)
2/             77.89 (44.94)      95.44 (55.06)     173.33 (100.00)

complicated pattern of odds in the logit model (Swafford, 1980: 681), standardization produces variation of the conditional probabilities of being in one category of the ‘dependent’ variable, given a category of the ‘independent’ variable.

COMPARISON OF THE STANDARDIZED FREQUENCIES AND THE COEFFICIENTS OF A LOGIT EQUATION

Model 2 in Table II can be written in equation form where the logits (i.e. the logarithm of the proportion of the ‘stayers’ to the ‘would-be movers’) are explained by variable R similarly in each sample and by variable L differently for each sample:

$$\Phi_{ijl} = \gamma^{HS}_{1l} + \gamma^{LHS}_{i1l} + \gamma^{RH}_{j1} \qquad (1)$$

where $\Phi_{ijl}$ are logits to be explained, $\gamma^{HS}_{1l}$ are indicators of the proportions of


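TABLE V

VALUES FOR THE COEFFICIENTS OF EQUATIONS 1 AND 7

Equation 1                         Some coefficients in equation 7

$\gamma^{HS}_{11}$      .0637      $\gamma^{LHS'}_{111}$     .1121
$\gamma^{HS}_{12}$     -.1619      $\gamma^{LHS'}_{112}$    -.0970
$\gamma^{HS}_{13}$      .0805      $\gamma^{LHS'}_{113}$    -.0151
$\gamma^{LHS}_{111}$    .2287
$\gamma^{LHS}_{112}$    .0196
$\gamma^{LHS}_{113}$    .1016
$\gamma^{RH}_{11}$      .1673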

‘stayers’ (k = 1) to ‘would-be movers’ (k = 2) in the three samples (l = 1, 2 or 3), $\gamma^{LHS}_{i1l}$ are coefficients indicating the effect of L (i = 1 or 2) on H for each sample and $\gamma^{RH}_{j1}$ are coefficients giving the strength of the effect of R (j = 1 or 2) on H. The values of the coefficients are given in Table V. There are simple and straightforward relations between the standardized proportions or frequencies of Tables III and IV and the coefficients of Table V. In this example, $\gamma^{RH}_{11}$ is a function of the cross-product ratio computed from Table III:

$$\frac{1}{4}\ln\left[\frac{303.10 \times 303.10}{216.90 \times 216.90}\right] = .1673 = \gamma^{RH}_{11} \qquad (2)$$

In the same manner, each of the $\gamma^{LHS}_{i1l}$ coefficients can be obtained from Tables IVa, b and c:

$$\frac{1}{4}\ln\left[\frac{106.15 \times 106.15}{67.18 \times 67.18}\right] = .2287 = \gamma^{LHS}_{111} \qquad (3)$$

$$\frac{1}{4}\ln\left[\frac{88.37 \times 88.37}{84.97 \times 84.97}\right] = .0196 = \gamma^{LHS}_{112} \qquad (4)$$

$$\frac{1}{4}\ln\left[\frac{95.44 \times 95.44}{77.89 \times 77.89}\right] = .1016 = \gamma^{LHS}_{113} \qquad (5)$$

Thus the standardized frequencies are directly used to obtain the coefficients in a logit model, or more generally, in any log-linear model.
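The link between the standardized frequencies and the coefficients can be checked numerically. The sketch below takes one quarter of the log cross-product ratio of each standardized 2 × 2 table, using the frequencies of Tables III and IV; the function name is an illustrative choice.

```python
import math


def gamma_from_2x2(table):
    """One quarter of the log cross-product ratio of a 2 x 2 table."""
    (a, b), (c, d) = table
    return 0.25 * math.log((a * d) / (b * c))


# Table III: standardized R x H frequencies.
gamma_rh = gamma_from_2x2([[303.10, 216.90], [216.90, 303.10]])

# Table IV: standardized L x H frequencies, one table per sample.
table_4 = {
    1: [[106.15, 67.18], [67.18, 106.15]],
    2: [[88.37, 84.97], [84.97, 88.37]],
    3: [[95.44, 77.89], [77.89, 95.44]],
}
gamma_lhs = {s: gamma_from_2x2(t) for s, t in table_4.items()}

print(round(gamma_rh, 4))
print({s: round(g, 4) for s, g in gamma_lhs.items()})
```

Up to rounding, the results agree with the values .1673, .2287, .0196 and .1016 listed in Table V.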

These results seem to contradict common knowledge about the relation of variables in log-linear or logit models. The relationship between the variables of some sub-set $Z_{(1)}$ of a set of variables Z is conditional on the remaining variables $Z_{(2)}$ not included in $Z_{(1)}$. In this way, a bivariate relation (say RH) cannot generally be reproduced by the collapsed R × H table obtained from the original table (say the L × R × H × S table). However, here, with the standardization procedure, the relations between


TABLE VI

STANDARDIZED FREQUENCIES OBTAINED FROM THE R × H TABLE

                      Variable H
Variable R          1                2

1/            300.34 (454)     219.66 (326)
2/            219.66 (111)     300.34 (149)

NOTE: Observed frequencies in parentheses

the variables are reproduced by a collapsed table: a bivariate table for the effect of R on H and three bivariate tables for the effect of L on H . This strictly depends on the way the standardization is made. As the conditional feature of the relations between variables is retained in the process, the coefficients in a log-linear model can be transformed into simple cross-classification tables.

In Table VI, the collapsed R × H table has been standardized. The equivalent of $\gamma^{RH}_{11}$ in equation 2 is computed:

$$\frac{1}{4}\ln\left[\frac{300.34 \times 300.34}{219.66 \times 219.66}\right] = .1564 \neq \gamma^{RH}_{11} \qquad (6)$$

It can be seen that the result diverges from the $\gamma^{RH}_{11}$ coefficient.
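A short check makes the divergence concrete. Because standardization leaves the cross-product ratio of a table unchanged, the same value is obtained from the observed collapsed counts and from their standardized version in Table VI; both differ from the value derived from Table III. The sketch assumes the collapsed counts 454, 326, 111 and 149 shown in parentheses in Table VI.

```python
import math


def quarter_log_cpr(table):
    """One quarter of the log cross-product ratio of a 2 x 2 table."""
    (a, b), (c, d) = table
    return 0.25 * math.log((a * d) / (b * c))


observed_collapsed = [[454, 326], [111, 149]]          # Table VI, in parentheses
standardized_collapsed = [[300.34, 219.66], [219.66, 300.34]]   # Table VI
standardized_model = [[303.10, 216.90], [216.90, 303.10]]       # Table III

g_observed = quarter_log_cpr(observed_collapsed)
g_collapsed = quarter_log_cpr(standardized_collapsed)
g_model = quarter_log_cpr(standardized_model)

print(round(g_observed, 4), round(g_collapsed, 4), round(g_model, 4))
```

The first two values agree at about .1564, while the third is about .1673: the collapsed table does not carry the conditional RH association estimated under the model.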

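Equation 1 can be written in another form. The $\gamma^{LHS}_{ikl}$ coefficients are the $\gamma^{LH}_{ik}$ coefficients obtained from each sample. It can be shown (Goodman, 1973) that $\gamma^{LHS}_{ikl} = \gamma^{LHS'}_{ikl} + \gamma^{LH}_{ik}$ and that $\gamma^{HS}_{kl} = \gamma^{HS'}_{kl} + \gamma^{H}_{k}$. Thus, equation 1 becomes, for the same logits $\Phi_{ijl}$:

$$\Phi_{ijl} = \gamma^{H}_{1} + \gamma^{HS'}_{1l} + \gamma^{LH}_{i1} + \gamma^{LHS'}_{i1l} + \gamma^{RH}_{j1} \qquad (7)$$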

In this way the equation commonly used in logit models emerges. The model takes on the familiar look of a hierarchical model. Of course, the interpretation of the $\gamma^{LHS'}_{ikl}$ coefficients differs from that given to the $\gamma^{LHS}_{ikl}$ coefficients. The $\gamma^{LHS'}_{ikl}$ are indicators of the differences in the effect of L on H between samples, rather than a measure of the LH effect in each sample. Comparing Table VIIa with Table VIIb illustrates the meaning of each of the coefficients and gives an idea of how to interpret second-order interactions in log-linear models in general. Table VIIa reproduces Tables IVa, b and c. The statistics in Table VIIb can be interpreted as indicators of the variation of LH between samples. The statistics in Table VIIa represent the variation of LH in each sample. In Table VIIb, the third sample shows almost no variation, contrary to the results obtained in Table VIIa. This indicates that the effect of L on H in that sample is near the mean effect of L on H for the three samples under study. There are more chances in the second sample than in the two other samples to observe high frequencies in cell L1 × H2 (the cell for category 1 of variable L and category 2 of variable H), as both Tables VIIa and VIIb show. Finally, the first sample’s results exceed the overall sample means for cells L1 × H1 and L2 × H2, as shown in both Tables VIIa and VIIb.

Tables VIIa and VIIb provide different readings of the same data. Table VIIa gives


TABLE VII

COMPARISON OF STANDARDIZATION OF LH GIVEN S AND OF LHS

A/ Standardization of LH given S

                 S
L   H       1         2         3

1   1    106.15     88.37     95.44
1   2     67.18     84.97     77.89
2   1     67.18     84.97     77.89
2   2    106.15     88.37     95.44

B/ Standardization of LHS

                 S
L   H       1         2         3

1   1     96.34     78.29     85.37
1   2     76.99     95.04     87.97
2   1     76.99     95.04     87.97
2   2     96.34     78.29     85.37

the actual standardized frequencies of L × H variables for each sample. Table VIIb shows the differences in the effect of L on H observed between samples.

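There is also a direct relationship between the $\gamma^{LHS'}_{i1l}$ coefficients (Table V) of equation 7 and the standardized frequencies in Table VIIb. For sample 1:

$$\frac{1}{4}\ln\left[\frac{96.34 \times 96.34}{76.99 \times 76.99}\right] = .1121 = \gamma^{LHS'}_{111} \qquad (8)$$

For sample 2:

$$\frac{1}{4}\ln\left[\frac{78.29 \times 78.29}{95.04 \times 95.04}\right] \approx -.0970 = \gamma^{LHS'}_{112} \qquad (9)$$

For sample 3:

$$\frac{1}{4}\ln\left[\frac{85.37 \times 85.37}{87.97 \times 87.97}\right] \approx -.0151 = \gamma^{LHS'}_{113} \qquad (10)$$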

Once again, the standardized frequencies give a concrete image of the significant effects found in a model.
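The passage between the two readings can be retraced from the coefficients alone. The sketch below starts from the three sample-specific L on H coefficients of Table V, splits them into their mean and the deviations from it, and converts each coefficient back into the diagonal cell of a symmetric 2 × 2 table with row total 520/3. The conversion formula follows from the quarter log cross-product ratio used in the equations above; the function and variable names are illustrative.

```python
import math

# Sample-specific L on H coefficients of equation 1 (Table V).
gamma_lhs = {1: 0.2287, 2: 0.0196, 3: 0.1016}

# Mean L on H effect and the deviations of each sample from it.
gamma_lh = sum(gamma_lhs.values()) / len(gamma_lhs)
gamma_lhs_dev = {s: g - gamma_lh for s, g in gamma_lhs.items()}

ROW_TOTAL = 520 / 3  # row total of each sample's 2 x 2 table (173.33)


def diagonal_cell(gamma, row_total=ROW_TOTAL):
    """Diagonal cell of a symmetric 2 x 2 table whose quarter log
    cross-product ratio equals gamma."""
    odds = math.exp(2 * gamma)   # diagonal cell / off-diagonal cell
    return row_total * odds / (1 + odds)


table_7a = {s: diagonal_cell(g) for s, g in gamma_lhs.items()}
table_7b = {s: diagonal_cell(g) for s, g in gamma_lhs_dev.items()}

print(round(gamma_lh, 4))
print({s: round(g, 4) for s, g in gamma_lhs_dev.items()})
print({s: round(x, 2) for s, x in table_7a.items()})
print({s: round(x, 2) for s, x in table_7b.items()})
```

Because the inputs are rounded to four decimals, the recovered cells agree with the L1 × H1 rows of Tables VIIa and VIIb only to within a few hundredths.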

SOME OTHER COMPARISONS

The $\gamma^{RH}_{11}$ coefficient in equation 1 can be computed from the estimated frequencies $F_{ijkl}$ using formula 6 in Goodman (1973).


TABLE VIII

THE CONDITIONAL CROSS-PRODUCT RATIO R × H

                                          Sample
                                1                 2                 3
R   H                      L = 1   L = 2     L = 1   L = 2     L = 1   L = 2

1   1                     104.39   56.13     94.30   54.71    101.53   42.95
1   2                      41.62   55.87     89.70   56.29     50.47   32.06
2   1                      23.61   10.87     14.70   14.29     32.47   15.05
2   2                      18.38   21.13     27.30   28.71     31.53   21.94

Ratios                      1.95    1.95      1.95    1.95      1.95    1.95
$\gamma^{RH}_{11} = \ln (1.95)^{1/4}$    .17     .17       .17     .17       .17     .17

In Table VIII, the cross-product ratios, $(F_{11} \times F_{22})/(F_{12} \times F_{21})$, are computed for each level of variable L and each sample. From the cross-product ratios, the value of $\gamma^{RH}_{11}$ is easily obtained. This shows a fundamental characteristic of the coefficients in log-linear models or logit models in the multivariate case: the values of the coefficients are the same in each of the categories of the variables left out of the coefficient but included in the original table of counts. It is this characteristic which is the basis of the results of the standardization of marginals.

For the R on H effect, the categories of L in each of the samples are standardized. For the L × S on H effect, the categories of R are standardized. The standardization procedure keeps only the interaction components in a contingency table. When they are equal along some variables, the table of standardized frequencies is collapsed. Thus, in Table VIII each of the rows will be equal once the categories of the variable L are standardized for each sample. Shrinking along L and S occurs and the results in Table III are obtained.
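The shrinking step can be retraced from the fitted frequencies of Table VIII. The sketch below computes the R × H cross-product ratio in each of the six L × S layers, standardizes each layer to equal margins, and sums the layers. Two entries for sample 2, L = 2 are taken as 56.29 and 28.71, the values implied by the fitted L × R × S and L × H × S margins; this is an assumption where the printed table is unclear. The closed form used for a 2 × 2 layer (diagonal share equal to the square root of the ratio over one plus that square root) replaces the iterative algorithm and is valid only for 2 × 2 tables.

```python
import math

# Fitted frequencies under model 2, keyed by (sample, L);
# each value is [[F(R1,H1), F(R1,H2)], [F(R2,H1), F(R2,H2)]].
fitted = {
    (1, 1): [[104.39, 41.62], [23.61, 18.38]],
    (1, 2): [[56.13, 55.87], [10.87, 21.13]],
    (2, 1): [[94.30, 89.70], [14.70, 27.30]],
    (2, 2): [[54.71, 56.29], [14.29, 28.71]],   # 56.29 and 28.71 assumed
    (3, 1): [[101.53, 50.47], [32.47, 31.53]],
    (3, 2): [[42.95, 32.06], [15.05, 21.94]],
}


def cross_product_ratio(table):
    (a, b), (c, d) = table
    return (a * d) / (b * c)


ratios = {key: cross_product_ratio(t) for key, t in fitted.items()}
gamma_rh = sum(math.log(r) for r in ratios.values()) / (4 * len(ratios))

N = 1040                       # total count in Table I
layer_total = N / len(fitted)  # each layer gets an equal share


def standardize_2x2(table, total):
    """Symmetric 2 x 2 table with equal margins and the same ratio."""
    root = math.sqrt(cross_product_ratio(table))
    diagonal = (total / 2) * root / (1 + root)
    off_diagonal = total / 2 - diagonal
    return [[diagonal, off_diagonal], [off_diagonal, diagonal]]


# Sum the standardized layers over L and S.
shrunk = [[0.0, 0.0], [0.0, 0.0]]
for t in fitted.values():
    layer = standardize_2x2(t, layer_total)
    for r in range(2):
        for h in range(2):
            shrunk[r][h] += layer[r][h]

print({k: round(v, 2) for k, v in ratios.items()})
print(round(gamma_rh, 4))
print([[round(x, 2) for x in row] for row in shrunk])
```

Every layer gives a ratio close to 1.95, and the summed table is close to the 303.10 and 216.90 of Table III.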

ONE POSSIBLE DIFFICULTY

We have seen that when a second-order interaction has to be interpreted (say LHS), the effect of one variable (say L) on the other (say H) varies for each level of the third (here, the samples S). Either the associations between two variables given the categories of a third variable, or the differences between those associations, are observed. In some cases it might be difficult to interpret the association between two variables given the categories of a third variable. An example will illustrate the case.

Let us say that model [LRS], LHS, LRH had been retained instead of model 2 of Table II. Clearly the effect of L on H varies for each sample and under each of the categories of variable R, even though those variations are independent of each other. Standardization of the effect of L on H would produce an L × R × H × S table which is quite difficult to read. Nothing has been gained in the process, as a table similar in size to the original must be interpreted. In this case it might be better to interpret the standardized frequencies of LHS and LRH in terms of the differences in the effect of L on H between samples and between levels of R.


TABLE IX

STANDARDIZED FREQUENCIES OF THE EFFECT OF L ON H IN MODEL [LRS], LHS, LRH

                                 Samples
                  1                 2                 3
L   H        R1      R2        R1      R2        R1      R2

1   1      52.40   55.40     43.48   46.47     46.70   49.85
1   2      34.27   31.27     43.18   40.00     39.97   36.82
2   1      34.27   31.27     43.18   40.00     39.97   36.82
2   2      52.40   55.40     43.48   46.47     46.70   49.85

Also, for the researcher who wants to gain a good grasp of his data, standardization of LH and of LHS for comparison with the logit equation coefficients might be helpful. Table IX gives the standardized frequencies for the effect of L on H in model [LRS], LHS, LRH. From this table all of the relations between the variables which are useful for understanding the L × H effect can be reproduced.

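The effect of L × H for each level of R is easily obtained, conditional on the sample distribution. For sample 1:

$$\frac{1}{8}\ln\left[\frac{52.40 \times 31.27 \times 31.27 \times 52.40}{55.40 \times 34.27 \times 34.27 \times 55.40}\right] = -.0368 = \gamma^{LRH}_{111} \qquad (12)$$

For sample 2:

$$\frac{1}{8}\ln\left[\frac{43.48 \times 40.00 \times 40.00 \times 43.48}{46.47 \times 43.18 \times 43.18 \times 46.47}\right] = -.0368 = \gamma^{LRH}_{111} \qquad (13)$$

For sample 3:

$$\frac{1}{8}\ln\left[\frac{46.70 \times 36.82 \times 36.82 \times 46.70}{49.85 \times 39.97 \times 39.97 \times 49.85}\right] = -.0368 = \gamma^{LRH}_{111} \qquad (14)$$

In the same manner, the effect of L on H for each sample can be obtained. Computing $\gamma^{LHS}_{111}$ only: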

Contrary to the $\gamma^{LHS}_{111}$ coefficient from equation 1, which requires only four statistics (standardized frequencies) for computation, the $\gamma^{LHS}_{111}$ coming from model [LRS], LHS, LRH requires eight statistics. The four additional statistics come from the fact that LH is also conditional on R. It is seen from equation 15 that the statistics 52.4 and 55.4 are included in the computation of $\gamma^{LHS}_{111}$. The statistic 52.4 is the standardized frequency of cell L1 × H1 for sample 1, conditional on R1, and 55.4 is the standardized frequency for the same cell, same sample, but conditional on R2.
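The eight-statistic computations can be sketched from the sample 1 columns of Table IX. The first quantity below contrasts the L × H cross-product ratios at the two levels of R and should reproduce the -.0368 reported for sample 1. The second averages them over R; it is offered as one plausible reading of the eight-statistic computation described above, an assumption rather than a reproduction of the printed equation, and its value is this sketch's own arithmetic.

```python
import math

# Table IX, sample 1: standardized L x H frequencies at each level of R.
# Keys are (L, H); values are (frequency given R = 1, frequency given R = 2).
sample_1 = {
    (1, 1): (52.40, 55.40),
    (1, 2): (34.27, 31.27),
    (2, 1): (34.27, 31.27),
    (2, 2): (52.40, 55.40),
}


def log_cpr(level):
    """Log cross-product ratio of the L x H table at one level of R
    (level 0 is R = 1, level 1 is R = 2)."""
    return math.log(
        sample_1[(1, 1)][level] * sample_1[(2, 2)][level]
        / (sample_1[(1, 2)][level] * sample_1[(2, 1)][level])
    )


# Contrast between the two levels of R (eight cells, hence the 1/8).
gamma_lrh = (log_cpr(0) - log_cpr(1)) / 8

# Assumed reading: L on H effect in sample 1 averaged over the levels of R.
gamma_lh_sample_1 = (log_cpr(0) + log_cpr(1)) / 8

print(round(gamma_lrh, 4), round(gamma_lh_sample_1, 4))
```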


TABLE X

STANDARDIZED FREQUENCIES FOR THE [CAS], CH, AH, SH MODEL

C × H effect
                                               H
C                                     1       2       3       4     Total

1/ without functional disability    .253    .298    .255    .195    1.001
2/ with functional disability       .247    .202    .245    .305    0.999

A × H effect
                                               H
A                                     1       2       3       4     Total

1/ has help at home                 .283    .271    .218    .229    1.001
2/ does not have help at home       .217    .229    .282    .271    0.999

Differences between categories of H for each S
                                               H
S                                     1       2       3       4     Total

1                                   .284    .259    .233    .225    1.001
2                                   .184    .242    .234    .341    1.001
3                                   .283    .250    .283    .184    1.000

Model: [CAS], CH, AH, SH    G²: 31.36    d.f.: 21    P[G²]: .06

AN EXAMPLE WITH A POLYTOMOUS DEPENDENT VARIABLE

Let’s say that three variables have been observed in three samples of elderly people: the physical functional disability (C), the help received at home (A), and the wish (H) either to stay at home (category 1), to live in a high-rise apartment building for the aged (category 2), or to be institutionalized either in a nursing home (category 3) or in a hospital (category 4).

The results of the analysis of the contingency table are displayed in Table X. The preferred living arrangement is predicted by the stated functional disability and by the help received at home. But the distributions of the preferred arrangements vary by sample, though the effects of each of the C and A variables are the same in each of those samples.

Each of these effects is ascertained in Table X. It is seen that functional disability does not have an effect on the wish to stay at home or to go into a nursing home, though the wish to be institutionalized in a hospital is about a third lower in the absence of functional disability (.195/.305 ≈ .64). The effect of A on H is seen to distinguish between the home and the apartment living arrangement on the one hand, and the two forms of institutionalization on the other hand. As for the differences of the


preferred living arrangements between samples, the hospital is more popular in the second sample than elsewhere.

For simplicity’s sake the associative feature of the C × H and A × H effects and the S × H differences can be stated in terms of high or low, big or small effects. But more precise statements could be made. For example, the probability of being in state H4 given state C2 is .305, but this probability is .195 for those in state C1.
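Such statements can be read off Table X with a few lines of arithmetic. The sketch below uses the standardized proportions of the C × H and A × H panels; the grouping of H into "home or apartment" and "institution" follows the contrast described above, and the variable names are illustrative.

```python
# Table X: standardized proportions, columns H = 1, 2, 3, 4.
c_by_h = {
    1: [0.253, 0.298, 0.255, 0.195],   # without functional disability
    2: [0.247, 0.202, 0.245, 0.305],   # with functional disability
}
a_by_h = {
    1: [0.283, 0.271, 0.218, 0.229],   # has help at home
    2: [0.217, 0.229, 0.282, 0.271],   # does not have help at home
}

# Hospital (H = 4): probability with disability relative to without.
hospital_ratio = c_by_h[2][3] / c_by_h[1][3]

# Home or apartment (H = 1, 2) versus institution (H = 3, 4), by help at home.
home_or_apartment = {a: row[0] + row[1] for a, row in a_by_h.items()}
institution = {a: row[2] + row[3] for a, row in a_by_h.items()}

print(round(hospital_ratio, 2))
print({a: round(p, 3) for a, p in home_or_apartment.items()})
print({a: round(p, 3) for a, p in institution.items()})
```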

Thus, the analysis of the contingency table is simplified using standardized frequencies. A concrete representation of the associative and interactive structure of the table is achieved using the technique described here.

SUMMARY

Standardization of estimated frequencies is a useful procedure for studying associations and interactions in log-linear analysis. The standardized frequencies give a simple, straightforward and almost intuitive meaning to the otherwise intractable log-linear coefficients. They can be applied both to general log-linear models and to logit analysis. Polytomous variables are as easy to handle as dichotomous ones. Also, though not developed here, tables with structural zeros are admissible, happily so since coefficients are not produced by the usual iterative proportional fitting algorithm for estimation of log-linear models.

NOTES

1 The algorithm for the computation of the standardized frequencies is widely available (Bishop et al., 1975; Mosteller, 1968; Smith, 1976). The results of the analysis in our tables have been obtained with ANOMHI (Béland, 1980c). ANOMHI is an interactive program written in APL. It is available at the computer facilities of Université Laval through the semi-public library 350. A Load 350 ANOMHI will give access to the program. A booklet of instructions is available from LABRAPS, Faculté des Sciences de l’Education, Université Laval, G1K 7P4, for a nominal fee. The program covers almost all of the material included in Bishop et al., Discrete Multivariate Analysis (1975), and in Haberman’s two volumes, Analysis of Qualitative Data, I and II (1978 and 1979).

2 Functional housing does not contain obstacles to the circulation of residents, such as an indoor staircase, provides enough heating in winter, has enough room and is equipped with full kitchen and toilet facilities.

3 In a 2 × 2 table, the standardized proportions will always add to a hundred on both rows and columns. This will not be the case if the variables have different numbers of categories.

REFERENCES

Baron, J.N.
1979 ‘Indianapolis and beyond: a structural model of occupational mobility across generations.’ American Journal of Sociology 85: 815-39

Béland, F.
1980a Méthodologie pour l’évaluation des programmes socio-sanitaires: le cas des services à domicile pour personnes âgées. Québec, Université Laval, Laboratoire de recherches sociologiques, cahier 15
1980b Une enquête sur les personnes âgées de trois villes du Québec. Québec: Gouvernement du Québec
1980c ANOHMI, un programme en APL pour l’analyse log-linéaire des tableaux de contingence. Québec, Les Cahiers d’ASOPE, volume VIII, LABRAPS, Université Laval

Bishop, Y.M.M.
1969 ‘Full contingency tables, logits and split contingency tables.’ Biometrics 25: 119-28

Bishop, Y.M.M., S.E. Fienberg, and P.W. Holland
1975 Discrete Multivariate Analysis: Theory and Practice. Cambridge: MIT Press

Davis, J.A.
1974 ‘Hierarchical models for significance tests in multivariate contingency tables: an exegesis of Goodman’s recent papers.’ Pp. 189-231 in M. Costner, Sociological methodology 1973-74. San Francisco: Jossey-Bass

Featherman, D.L. and R.M. Hauser
1978 Opportunity and change. New York: Academic Press

Fienberg, S.
1977 The analysis of cross-classified categorical data. Cambridge: MIT Press

Gillespie, M.W.
1977 ‘Log-linear techniques and the regression analysis of dummy dependent variables.’ Sociological methods and research 6: 103-22

Goodman, L.A.
1970 ‘The multivariate analysis of qualitative data: interactions among multiple classifications.’ Journal of the American Statistical Association 65: 226-56
1972a ‘A general model for the analysis of surveys.’ American Journal of Sociology 77: 1035-86
1972b ‘A modified regression approach to the analysis of dichotomous variables.’ American Sociological Review 37: 28-45
1972c ‘Some multiplicative models for the analysis of cross-classified data.’ Pp. 649-96 in Proceedings of the sixth Berkeley Symposium on mathematical statistics and probability. Berkeley: University of California Press
1973 ‘Causal analysis of data from panel studies and other kinds of surveys.’ American Journal of Sociology 78: 1135-91
1979 ‘Multiplicative models for the analysis of occupational mobility tables and other kinds of cross-classification tables.’ American Journal of Sociology 84: 804-19

Haberman, S.J.
1978 Analysis of qualitative data, vol. I. New York: Academic Press
1979 Analysis of qualitative data, vol. II. New York: Academic Press

Hauser, R.M.
1978 ‘A structural model of the mobility table.’ Social forces 56: 919-53
1979 ‘Some exploratory methods for modeling mobility tables and other cross-classified data.’ Pp. 413-58 in K.F. Schuessler, Sociological methodology 1980. San Francisco: Jossey-Bass

Mosteller, F.
1968 ‘Association and estimation in contingency tables.’ Journal of the American Statistical Association 64: 1-28

Schuessler, K.F.
1978 Sociological methodology 1979. San Francisco: Jossey-Bass

Simpson, E.H.
1951 ‘The interpretation of interaction in contingency tables.’ Journal of the Royal Statistical Society, Series B, 13: 238-41

Smith, K.W.
1976 ‘Marginal standardization and table shrinking: aids in the traditional analysis of contingency tables.’ Social forces 54: 669-93

Swafford, M.
1980 ‘Three parametric techniques for contingency table analysis: a non-technical commentary.’ American Sociological Review 45: 664-90

Theil, H.
1970 ‘On the estimation of relationships involving qualitative variables.’ American Journal of Sociology 76: 103-54