Confounders and Interactions: An Introduction
Manoranjan Pal, Indian Statistical Institute, Kolkata, India
[email protected] [email protected]


Page 1: Confounders and Interactions: An Introduction

Confounders and Interactions: An Introduction

Manoranjan Pal, Indian Statistical Institute
Kolkata, India
[email protected]
[email protected]

Page 2: Confounders and Interactions: An Introduction

An Example

• Data were collected from some students in a department of a university on the following variables:
– No. of times the student visited the theatre per month (z)
– Scores in the final examination (y)

• The simple correlation coefficient (ryz) between y and z was calculated to be 0.20, which was significant because the sample size was moderately large.

• The same experiment was repeated for other departments in the university. Every time the correlation was positive and significant.

• Interpretation: as you visit the theatre more and more, your result will improve. This interpretation was hard to believe.

Page 3: Confounders and Interactions: An Introduction

An Example (Continued)

• Statisticians were puzzled. After a long investigation they found that the students who visited the theatre more were the more intelligent ones, so they needed less time to study and thus could spend more time on other things.

• For the same set of students in the department, experiments were carried out to find the IQ of the students (x). The results of the computation were as follows:

rxy = 0.8, ryz = 0.2 and rxz = 0.6.

• Still the paradox was not solved.

Page 4: Confounders and Interactions: An Introduction

Solution - 1

• One statistician suggested the following:
– Let us fix IQ and take the correlation coefficient between y and z for each IQ.

• It was not practicable as such. The sample size was too small for such an experiment.

• The sample size was increased and the correlation coefficient between y and z was found for each IQ.

• Each time the value was negative, but the values differed across IQ levels.

Page 5: Confounders and Interactions: An Introduction

Solution - 2

• The effect of x was eliminated from both y and z, and the correlation coefficient between y and z was then found. It was negative.

• How do we eliminate the effect of x?
– We assume that linear relations exist between these variables, i.e., y = a + b x and z = c + d x (apart from the errors in the equations). The regressions were fitted, the residuals of y and z were found, and then the correlation was computed between the residuals. This is the correlation coefficient between y and z after eliminating the effect of x, and it was negative.

– This is known as the partial correlation coefficient.

Page 6: Confounders and Interactions: An Introduction

Discussions

• Fortunately, it is not necessary to do all these steps to find the partial correlation coefficient. We can use the following formula:

ryz.x = (ryz – rxy rxz) / √[(1 – rxy²)(1 – rxz²)].

• The result is ryz.x ≈ –0.58, clearly a negative value.

• Solution 1 gives different values of the estimates of the correlation coefficients.

• If we assume that the correlation coefficient is the same for each stratum (i.e., for each fixed value of x), then the estimates will be close to one another, and close to –0.58 for this example.

• If (x, y, z) follows a trivariate normal distribution, then theoretically the value of the correlation coefficient will be the same for each x.

• Thus Solution 1 does not need any distributional assumptions but gives multiple answers, whereas Solution 2 is unique but valid under restrictive assumptions.
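As a quick check, here is a minimal Python sketch of the partial-correlation formula, using the three correlations quoted above:

import numpy as np

def partial_corr(r_yz, r_xy, r_xz):
    # r_yz.x: correlation of y and z after eliminating the effect of x
    return (r_yz - r_xy * r_xz) / np.sqrt((1 - r_xy**2) * (1 - r_xz**2))

print(partial_corr(r_yz=0.2, r_xy=0.8, r_xz=0.6))  # about -0.583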

Page 7: Confounders and Interactions: An Introduction

Partial Correlation to Regression

• Correlations and regression coefficients are related. In the equation y = a + b x, b is positive if and only if rxy is positive. Testing for the significance of b is the same as testing for the significance of rxy.

• In the equation y = a + b x + c z, c is positive if and only if ryz.x is positive. Testing for the significance of c is the same as testing for the significance of ryz.x.

• If we want to find the relation between y and z, and the variable x has an effect on both, then we should take both x and z as regressors and proceed.

• This is why the regression coefficients in a multiple linear regression are known as partial regression coefficients.

• x is called the confounding variable. Not all such variables are confounding variables. A confounding variable should be a true cause of variation of the explained variable.
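A small simulation (hypothetical coefficients, not the data of the example) showing how a positive simple correlation between y and z can coexist with a negative partial regression coefficient once the confounder x enters the regression:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                                 # IQ (confounder)
z = 0.6 * x + rng.normal(scale=0.8, size=n)            # theatre visits rise with IQ
y = 0.8 * x - 0.3 * z + rng.normal(scale=0.5, size=n)  # exam scores

print(np.corrcoef(y, z)[0, 1])     # positive simple correlation
fit = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()
print(fit.params)                  # the partial coefficient on z is negative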

Page 8: Confounders and Interactions: An Introduction

Another Illustration of Confounding

• Diabetes is associated with hypertension.
• Does diabetes cause hypertension?
• Does hypertension cause diabetes?

• Another way in which diabetes and hypertension may be related is when both variables are caused by FACTOR X. For hypertension and diabetes, Factor X might be obesity.

• We should not conclude that diabetes causes hypertension. In fact, the two may have no true causal relationship. We should rather say that:

• The relationship between hypertension and diabetes is confounded by obesity. Obesity would be termed a confounding variable in this relationship.

Page 9: Confounders and Interactions: An Introduction

Confounders are true causes of disease.

Page 10: Confounders and Interactions: An Introduction

Definition of Confounding

• A confounder:
– 1) Is associated with exposure
– 2) Is associated with disease
– 3) Is NOT a consequence of exposure (i.e., not occurring between exposure and disease)

Page 11: Confounders and Interactions: An Introduction

Mediating Variable (synonym: intervening variable)

Exposure → Mediator → Disease

An exposure that precedes a mediator in a causal chain is called an antecedent variable.

Page 12: Confounders and Interactions: An Introduction

Mediation

• A mediation effect occurs when the third variable (mediator, M) carries the influence of a given independent variable (X) to a given dependent variable (Y).

• Mediation models explain how an effect occurred by hypothesizing a causal sequence.


Page 13: Confounders and Interactions: An Introduction

Confounding Vs. Mediation

• Exposure occurs first, then the mediator, and then the outcome; the sequence conceptually follows an experimental design.

• Confounders are often demographic variables that typically cannot be changed in an experimental design. Mediators are by definition capable of being changed and are often selected based on flexibility.

Page 14: Confounders and Interactions: An Introduction

Another Example: No Confounding

Page 15: Confounders and Interactions: An Introduction

A Different Example

• A group of scientists wanted to find the effect of IQ and of the time spent studying for an examination on the result of the examination. The linear model taken by them was

yt = α + βxt + γzt + et.

• They fitted the data and the fit was good. However, one of the scientists noticed that the residuals did not show a random pattern when the data were arranged in increasing order of IQ. They then started investigating the behaviour of the data more closely. They could do so because the sample size was large.

• They fixed the value of IQ at different points and plotted the scatter diagram of result against study hours. Every time the scatter diagram showed a linear relation, but the slope changed as the value of IQ changed. And surprisingly, the slope increased systematically as the value of IQ increased.

Page 16: Confounders and Interactions: An Introduction

The Revised Model

• Now look at the model again:

yt = α + βxt + γzt + et.

• We interpret β as the change in the value of y, on the average, as the value of x is increased by one unit keeping the value of z fixed. But why should the value of β change as the value of z is increased to some other fixed value? Ideally the intercept parameter, α, should absorb γzt, so the intercept term should change and not the slope parameter.

• It means that the selection of the model was wrong. If β changes/increases as z increases, then β is not a constant. We may change β to (β + δzt) and get

yt = α + (β + δzt)xt + γzt + et,

i.e., yt = α + βxt + γzt + δxtzt + et.

• This phenomenon is known as the interaction effect between x and z. It is symmetric: one may arrive at the same model by varying the coefficient of zt appropriately.
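A minimal sketch of fitting this interaction model on simulated data (the data and coefficients below are hypothetical):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
x = rng.normal(100, 15, n)                          # IQ
z = rng.uniform(0, 6, n)                            # study hours
y = 5 + 0.1 * x + 2 * z + 0.05 * x * z + rng.normal(0, 3, n)

df = pd.DataFrame({"y": y, "x": x, "z": z})
fit = smf.ols("y ~ x + z + x:z", data=df).fit()     # 'x:z' adds the delta*xt*zt term
print(fit.params)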

Page 17: Confounders and Interactions: An Introduction

No Interaction Vs. Interaction

• No Interaction: Disease increases with age, and this association is the same for both males and females.

• Interaction: Gender interacts with age if the effect of age on disease is not the same in each gender.

Page 18: Confounders and Interactions: An Introduction

Examples

• Aspirin protects against heart attacks, but only in men and not in women. We say then that gender moderates the relationship between aspirin and heart attacks, because the effect is different in the different sexes. We can also say that there is an interaction between sex and aspirin in the effect of aspirin on heart disease.

• In individuals with high cholesterol levels, smoking produces a higher relative risk of heart disease than it does in individuals with low cholesterol levels. Smoking interacts with cholesterol in its effects on heart disease.

Page 19: Confounders and Interactions: An Introduction

The Implications

• The implication is that, when x or z is increased, there is an additional change in the expected value of y apart from the linear effect.

• If x is increased by one unit for fixed z, then the change in y is β + δzt instead of only β; and if z is increased by one unit for fixed x, then the change in y is γ + δxt. If both x and z are increased by one unit, then the change in y is β + γ + δxt + δzt + δ.

• For binary variables taking only the values 0 and 1, the corresponding changes in y are β, γ and β + γ + δ respectively, assuming that x and z both start at 0. This is clear from the following table:

Expected values of y at different values of x and z

z \ x    0          1
0        α          α + β
1        α + γ      α + β + γ + δ

Page 20: Confounders and Interactions: An Introduction

The Implications

• Since y measures the effect (the disease, say) of the exposures x and/or z, the number of cases of y in each cell will reflect this. The odds ratios will be different.

• Interaction between two variables (with respect to a response variable) is said to exist when the association between one of these variables (may be called the exposure variable) and the response variable (generally measured by the odds ratio or relative risk) is different at different levels of the other exposure variable.

• For example, the odds ratio that measures the association between cigarette smoking and lung cancer may be smaller among individuals who consume large quantities of beta carotene in their food when compared to the analogous odds ratio among persons who consume little or no beta carotene in their food.

Page 21: Confounders and Interactions: An Introduction

The interacting or effect-modifying variable is also known as a moderator variable.

Exposure → Disease, with the Moderator acting on this relationship.

A moderator variable is one that moderates or modifies the way in which the exposure and the disease are related. When an exposure has different effects on disease at different values of a variable, that variable is called a modifier.

Page 22: Confounders and Interactions: An Introduction

Methods to reduce confounding

– During study design:
• Randomization
• Restriction
• Matching

– During study analysis:
• Stratified analysis
• Mathematical regression

Page 23: Confounders and Interactions: An Introduction

Randomized controlled trial

• Randomized controlled trial: A method where the study population is divided randomly in order to mitigate the chances of self-selection by participants or bias by the study designers. Before the experiment begins, the testers will assign the members of the participant pool to their groups, using a randomization process such as the use of a random number generator.

• For example, in a study on the effects of exercise, the conclusions would be less valid if participants were given a choice of whether to belong to the control group, which would not exercise, or the intervention group, which would take part in an exercise program. The study would then capture other variables besides exercise, such as pre-experiment health levels and motivation to adopt healthy activities. From the observer's side, the experimenter may choose candidates who are more likely to show the results the study wants to see, or may interpret subjective results (more energetic, positive attitude) in a way favorable to their desires.

Page 24: Confounders and Interactions: An Introduction

Case-Control Studies

• In a case-control study the researcher retrospectively determines which individuals were exposed to the agent or treatment, or the prevalence of a variable in each of the study groups. The researcher assigns confounders to both groups, cases and controls, equally. For example, if somebody wanted to study the cause of myocardial infarct and thinks that age is a probable confounding variable, each 67-year-old infarct patient would be matched with a healthy 67-year-old "control" person. In case-control studies, the matched variables most often are age and sex.

• Drawback: Case-control studies are feasible only when it is easy to find controls, i.e., persons whose status vis-à-vis all known potential confounding factors is the same as that of the case patient. Suppose a case-control study attempts to find the cause of a given disease in a person who is 1) 45 years old, 2) African-American, 3) from Alaska, 4) an avid football player, 5) vegetarian, and 6) working in education. A theoretically perfect control would be a person who, in addition to not having the disease being investigated, matches all these characteristics and has no diseases that the patient does not also have; but finding such a control would be an enormous task.

Page 25: Confounders and Interactions: An Introduction

A Hypothetical Example

Page 26: Confounders and Interactions: An Introduction

Cohort Studies

• Cohort studies: A group of people is chosen who do not have the outcome of interest (for example, myocardial infarction). The investigator then measures a variety of variables that might be relevant to the development of the condition. Over a period of time the people in the sample are observed to see whether they develop the outcome of interest (that is, myocardial infarction).

– Internal Controls: In single cohort studies those people who do not develop the outcome of interest are used as internal controls.

– External Controls: Where two cohorts are used, one group has been exposed to or treated with the agent of interest and the other has not, thereby acting as an external control.

• A degree of matching is also possible in cohort studies, creating a cohort of people who share similar characteristics so that the cohorts are comparable in regard to the possible confounding variable. For example, if age and sex are thought to be confounders, only 40- to 50-year-old males would be involved in a cohort study assessing the myocardial infarct risk in cohorts that are either physically active or inactive.

• Drawback: In cohort studies, the over-exclusion of input data may lead researchers to define too narrowly the set of similarly situated persons for whom they claim the study to be useful. Similarly, "over-stratification" of input data within a study may reduce the sample size in a given stratum to the point where reliable inference is no longer possible.

Page 27: Confounders and Interactions: An Introduction

Double blinding

• Double blinding conceals from both the trial population and the observers the experiment-group membership of the participants. By preventing the participants from knowing whether they are receiving treatment or not, the placebo effect should be the same for the control and treatment groups. By preventing the observers from knowing the participants' group membership, there should be no bias from researchers treating the groups differently or from interpreting the outcomes differently.

Page 28: Confounders and Interactions: An Introduction

Stratification

• Stratification: As in the example above, physical activity is thought to be a behaviour that protects from myocardial infarct, and age is assumed to be a possible confounder. The sampled data are then stratified by age group; this means the association between activity and infarct is analyzed within each age group. If the different age groups (or age strata) yield markedly different risk ratios, age must be viewed as a confounding variable. There exist statistical tools, among them Mantel–Haenszel methods, that account for stratification of data sets.

Page 29: Confounders and Interactions: An Introduction

Stratification of Confounding Variable

• While ascertaining the association between two factors, we have exposure and disease:
– Both discrete, 2 levels of exposure/disease: 2×2 table
– Both discrete, more levels of exposure/disease: r × c table
– Level of disease continuous and exposure discrete or continuous: usual regression
– Level of disease discrete and exposure discrete or continuous: regression, but needs special attention

• A third variable is considered: it may be taken as an additional regressor variable, or one may use stratification:
– Repeat the analysis within every level of that variable
– E.g., gender, age, breed, farm, etc.

• Stratification addresses the problem of confounding as well as interaction.

Page 30: Confounders and Interactions: An Introduction

The Problem with Stratification as a Solution to Confounding

• Stratification sometimes may cause bias. Consider the situation of a pair of dice, die A and die B. Of course, you know that they must be independent. In other words, if you roll one, it tells you nothing about the roll of the other. What if we stratify upon the sum of the dice?

• What happens if we stratify? Let's look in the stratum where the sum is, for example, 7. In this stratum, if we know A, then we know B: if A is 3, B must be 4.

• Earlier, we said that A and B were independent. Now, however, once we stratify upon the sum, if we know A, we know B. We have induced a relationship between A and B that otherwise did not exist.

Page 31: Confounders and Interactions: An Introduction

Holding the Extraneous Variable Constant

• For example, if you want to control for gender using this strategy, you would include only females in your research study (or only males). If there is still a relationship between the variables, say motivation and test grades, you will be able to tell that the relationship is not due to gender, because you have made gender a constant (by including only one gender in your study).

Page 32: Confounders and Interactions: An Introduction

Statistical Control

• Statistical Control: It's based on the following logic: examine the relationship between the variables at each level of the control/extraneous variable; actually, the computer will do it for you, but that’s what it does.

• One type of statistical control is called partial correlation. This technique shows the correlation between two quantitative variables after statistically controlling for one or more quantitative control/extraneous variables.

• A second type of statistical control is called ANCOVA (or analysis of covariance). This technique shows the relationship between the variables after statistically controlling for one or more quantitative control/extraneous variables.

Page 33: Confounders and Interactions: An Introduction

LOGISTIC REGRESSION

A Note Compiled by

MANORANJAN PAL

ECONOMIC RESEARCH UNIT
INDIAN STATISTICAL INSTITUTE
203 BARRACKPUR TRUNK ROAD
KOLKATA – 700 108

Page 34: Confounders and Interactions: An Introduction

Characteristics

– Qualitative (Attribute)
• Dichotomous → Binary Variable (0 or 1) (Dummy Variable)
• Polychotomous → Set of Binary Variables

– Quantitative (Variable)
• Discrete
• Continuous

Page 35: Confounders and Interactions: An Introduction

Binary Dependent Variable

• In this case the dependent variable takes only one of two values for each unit/individual.

• Often an individual economic agent must choose one out of two alternatives, as follows:
– A household must decide whether to buy or rent a suitable dwelling;
– A consumer must choose which of two types of shopping areas to visit;
– A person must choose one of two modes of transportation available;
– A person must decide whether or not to attend college.

Page 36: Confounders and Interactions: An Introduction

The Linear Probability Model (LPM)

yi = 1 if an event A occurs
   = 0 if the event does not occur.

Suppose the probability that it occurs is Pi. Then

E(yi) = 1×Pi + 0×(1 – Pi) = Pi.

We assume that Pi depends on the explanatory variable xi, which is a vector. Thus

yi = Pi + ei = xi'β + ei, i = 1, 2, …, T,   …(01)

where T is the size of the sample. For a given xi, we now have:

yi    ei           Pr(ei)
1     1 – xi'β     xi'β           …(02)
0     –xi'β        1 – xi'β

Page 37: Confounders and Interactions: An Introduction

Problems with LPMs

• E(yi) = Pi = xi'β may not be within the unit interval.

• Var(ei) = (–xi'β)²(1 – xi'β) + (1 – xi'β)²(xi'β) = (xi'β)(1 – xi'β) = E(yi)(1 – E(yi)) → introduces heteroscedasticity.

• ei takes only the two values –xi'β and 1 – xi'β → the normality assumption is violated.

However,

• E(ei) = (1 – xi'β)(xi'β) + (–xi'β)(1 – xi'β) = 0 → the only solace.

Page 38: Confounders and Interactions: An Introduction

GLS Estimation of LPM

Thus all T observations are written as

y = Xβ + e.

It follows that the covariance matrix of e is

Cov(e) = E(ee') = Φ,

where Φ is a diagonal matrix with ith diagonal element E(yi)(1 – E(yi)). Suppose the number of choice outcomes yi observed for each xi', say ni, is just one, that is, ni = 1. In that case, feasible GLS can be carried out by estimating β by OLS, which, though inefficient, is consistent, and constructing Φ̂ as a diagonal matrix with ith element ŷi(1 – ŷi), where ŷi = xi'b is the OLS prediction.

Since Φ is diagonal, feasible GLS is easily applied using WLS (Weighted Least Squares). That is, multiplying each observation on the dependent and independent variables by the square root of the reciprocal of the variance of the error yields a transformed model, OLS estimation of which produces feasible GLS estimates. Caution: the transformed model in this case does not have an intercept term.
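A minimal sketch of this feasible GLS (WLS) procedure on simulated data with hypothetical coefficients; the truncation of fitted probabilities to (0.01, 0.99) follows the suggestion on the next slide:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(200, 2)))
beta_true = np.array([0.5, 0.15, -0.1])
y = (rng.uniform(size=200) < X @ beta_true).astype(float)

ols = sm.OLS(y, X).fit()                      # consistent but inefficient
p_hat = np.clip(ols.predict(X), 0.01, 0.99)   # keep fitted values inside (0, 1)
w = 1.0 / (p_hat * (1.0 - p_hat))             # inverse-variance weights
wls = sm.WLS(y, X, weights=w).fit()           # feasible GLS estimates
print(wls.params)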

Page 39: Confounders and Interactions: An Introduction

The Problem with GLS Estimation of LPM

While this estimation procedure is consistent, an obvious difficulty exists. If xi'β falls outside the (0,1) interval, the matrix Φ̂ has negative or undefined elements on its diagonal. If this occurs, one must modify Φ̂, either by deleting the observations for which the problem occurs or by setting the value of xi'β to 0.01 or 0.99, say, and proceeding accordingly. While this does not affect the asymptotic properties of the feasible GLS procedure, it is clearly an awkward position to be in, especially since predictions based on the feasible GLS estimates, ŷi = xi'β̂, may also fall outside the (0,1) interval.

Page 40: Confounders and Interactions: An Introduction

The Case of Repeated Observations

Let ni > 1. The sample proportion of the number of occurrences of the event is pi = yi/ni, where yi is the number of successes out of ni. Since E(pi) = Pi = xi'β, the model can be rewritten as

pi = Pi + ei = xi'β + ei, i = 1, 2, …, T,

where ei is now the difference between pi and its expectation Pi. The full set of T observations is then written as p = Xβ + e.

Since the sample proportions pi are related to the true proportions Pi by pi = Pi + ei, i = 1, 2, …, T, the error term ei has zero mean and variance Pi(1 – Pi)/ni, the same as that of a sample proportion based on ni Bernoulli trials.

Page 41: Confounders and Interactions: An Introduction

Estimation Under the Case of Repeated Observations

The covariance matrix of e is

Cov(e) = Φ = diag(P1(1 – P1)/n1, …, PT(1 – PT)/nT),

and the appropriate estimator for β is the GLS estimator

β̂ = (X'Φ⁻¹X)⁻¹X'Φ⁻¹p.

If the true proportions Pi are not known, then a feasible GLS estimator is obtained by replacing each Pi in Φ with the corresponding sample proportion pi.

Page 42: Confounders and Interactions: An Introduction

Some Alternative Estimations

Page 43: Confounders and Interactions: An Introduction

Questionable Value of R2 as a Measure of Goodness of Fit

• The conventionally computed R2 is of limited value in dichotomous response models. To see why, consider the following figure. Corresponding to a given X, Y is either 0 or 1. Therefore, all the Y values will lie either along the X axis or along the line corresponding to 1, and generally no LPM is expected to fit such a scatter well. As a result, the conventionally computed R2 is likely to be much lower than 1 for such models. In most practical applications it ranges between 0.2 and 0.6. R2 in such models will be high, say in excess of 0.8, only when the actual scatter is very closely clustered around two points A and B (say), for in that case it is easy to fix the straight line by joining the two points. In that case the predicted yi will be very close to either 0 or 1.

• Thus, use of the coefficient of determination as a summary statistic should be avoided in models with a qualitative dependent variable.

Page 44: Confounders and Interactions: An Introduction

LPM: The case of High R2

Page 45: Confounders and Interactions: An Introduction

The difficulty with the linear probability model

Unfortunately, the predictor obtained from feasible GLS estimation can fall outside the zero-one interval. To ensure that the predicted proportion of successes falls within the unit interval, at least over a range of xi of interest, one may employ inequality restrictions of the form 0 ≤ xi'β ≤ 1, or the number of repetitions ni must be large enough so that the sample proportion pi is a reliable estimate of the probability Pi. The situation is illustrated in the following figure for the case when xi'β = β1 + β2xi2.

Figure 1: Linear and non-linear probability models.

Page 46: Confounders and Interactions: An Introduction

The difficulty with the linear probability model

• As we have seen, the LPM is plagued by several problems, such as (1) non-normality of ui, (2) heteroscedasticity of ui, (3) the possibility of fitted values lying outside the 0-1 range, and (4) generally lower R2 values. Some of these problems are surmountable. For example, we can use WLS to resolve the heteroscedasticity problem or increase the sample size to minimize the non-normality problem. By resorting to restricted least squares or mathematical programming techniques we can even make the estimated probabilities lie in the 0-1 interval.

• But even then the fundamental problem with the LPM is that it is not logically a very attractive model, because it assumes that Pi = P(y=1|x) increases linearly with x, that is, the marginal or incremental effect of x remains constant throughout. This seems patently unrealistic. In reality one would expect that Pi is nonlinearly related to xi.

Page 47: Confounders and Interactions: An Introduction

Alternatives to LPM

As an alternative to the linear probability model, the probabilities Pi may be assumed to be a nonlinear function of the explanatory variables. In the next sections two particular nonlinear probability models are discussed: those based on the cumulative distribution functions of normal and logistic random variables. Two kinds of estimation procedures are applied: feasible GLS when repeated observations are available, and ML when ni = 1 or is small.

Page 48: Confounders and Interactions: An Introduction

Probit and Logit Models

Two choices of the nonlinear function Pi = g(xi) are the cumulative distribution functions (CDFs) of normal and logistic random variables. The former gives rise to the probit model and the latter to the logit model. The logit model is thus based on the logistic CDF.
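A short sketch comparing the two link CDFs; the probit uses the standard normal CDF and the logit uses the logistic CDF, 1/(1 + exp(–x'β)):

import numpy as np
from scipy.stats import norm, logistic

xb = np.linspace(-4, 4, 9)     # a grid of values of x'beta
print(norm.cdf(xb))            # probit response probabilities
print(logistic.cdf(xb))        # logit response probabilities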

Page 49: Confounders and Interactions: An Introduction

The Logit Model

Page 50: Confounders and Interactions: An Introduction

The Logit Model

Page 51: Confounders and Interactions: An Introduction

The Logit Model

Page 52: Confounders and Interactions: An Introduction

An Interpretive Note

Finally, we note the interpretation of the estimated coefficients in the logit model. The estimated coefficients do not indicate the increase in the probability of the event occurring given a one-unit increase in the corresponding independent variable. Rather, the coefficients reflect the effect of a change in an independent variable upon ln(pi/(1 – pi)) for the logit model. The amount of the increase in the probability depends upon the original probability and thus upon the initial values of all the independent variables and their coefficients. This is true since pi = F(xi'β) and ∂pi/∂xij = f(xi'β)βj, where f(·) is the pdf associated with F(·). For the logit model, f(xi'β) = F(xi'β)[1 – F(xi'β)], so that ∂pi/∂xij = pi(1 – pi)βj.

Page 53: Confounders and Interactions: An Introduction

ML Estimator of Logit Model

When the number of repeated observations on the choice experiment, ni, is small and pi cannot be reliably estimated using the sample proportion, ML estimation of the logit model can be carried out. If pi is the probability that the event A occurs on the ith trial of the experiment, then the random variable yi, which is one if the event occurs and zero otherwise, has the probability function

f(yi) = pi^yi (1 – pi)^(1–yi), yi = 0, 1.

Consequently, if T observations are available, then the likelihood function is

L = Π pi^yi (1 – pi)^(1–yi), the product running over i = 1, …, T.

Page 54: Confounders and Interactions: An Introduction

ML Estimator of Logit Models

The logit model arises when pi is specified to be given by the logistic CDF evaluated at xi'β. If F(xi'β) denotes this CDF, then the likelihood function (L) for the model is

L = Π F(xi'β)^yi [1 – F(xi'β)]^(1–yi),

and the log L is

ln L = Σ { yi ln F(xi'β) + (1 – yi) ln[1 – F(xi'β)] }.

The first-order conditions of the maximum are non-linear, so ML estimates must be obtained numerically.
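A minimal sketch of ML estimation of the logit model on simulated data (coefficients below are hypothetical); statsmodels maximizes the log-likelihood above numerically:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(500, 2)))
beta = np.array([-0.5, 1.0, 0.8])
p = 1.0 / (1.0 + np.exp(-X @ beta))     # logistic CDF at x'beta
y = (rng.uniform(size=500) < p).astype(int)

fit = sm.Logit(y, X).fit()              # Newton-type numerical ML
print(fit.params)                       # estimates close to beta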

Page 55: Confounders and Interactions: An Introduction

ML Estimator of Logit Model

Page 56: Confounders and Interactions: An Introduction

ML Estimator of Logit Model

Page 57: Confounders and Interactions: An Introduction

ML Estimator of Logit Model

Page 58: Confounders and Interactions: An Introduction

ML Estimator of Logit Models

Using these derivatives and the recursive relation, ML estimates can be obtained given some initial estimates. The choice of the initial estimates does not matter, since it can be shown (Dhrymes, P. J. (1978), Introductory Econometrics, NY, Springer-Verlag, pp. 344-347) that the matrix of second partials ∂²lnL/∂β∂β′ is negative definite for all values of β. Consequently, the NR (Newton-Raphson) procedure will converge, ultimately, to the unique ML estimates regardless of the initial estimates. Computationally, of course, the choice does matter, since the better the initial estimates, the fewer the iterations required to attain the maximum of the LF. While several alternatives for initial estimates exist, one can simply use the OLS estimates of β obtained by regressing yi on the explanatory variables.

Page 59: Confounders and Interactions: An Introduction

Tests of Hypothesis

Usual tests about individual coefficients, and confidence intervals, can be constructed from the estimate of the asymptotic covariance matrix, the negative of the inverse of the matrix of second partials evaluated at the ML estimates, relying on the asymptotic normality of the ML estimator.

The hypothesis H0: β2 = β3 = … = βk = 0

can easily be tested using the likelihood ratio (LR) procedure, since the value of the log LF under the hypothesis is easily obtained analytically. If n is the number of successes (yi = 1) observed in the T observations, then the maximum value of the log LF under the null hypothesis H0 is

ln L(ω) = n ln(n/T) + (T – n) ln((T – n)/T).

Page 60: Confounders and Interactions: An Introduction

Tests of Hypothesis

If the hypothesis is true, then asymptotically

LR = 2[ln L – ln L(ω)]

has a χ² distribution with k – 1 degrees of freedom, where ln L is the value of the log LF evaluated at the ML estimates β̂. Acceptance of this hypothesis would, of course, imply that the explanatory variables have no effect on the probability of A occurring. In this case the probability that yi = 1 is estimated by

p̃ = n/T,

which is simply the sample proportion.

Page 61: Confounders and Interactions: An Introduction

Measuring Goodness of Fit

There is a problem with the use of conventional R²-type measures when the explained variable y takes only two values. The predicted values are probabilities, while the actual values of y are either 0 or 1. For the linear probability model and the logit model we have Σyi = Σŷi, as with the linear regression model, if a constant term is also estimated. For the probit model there is no such exact relationship.

Page 62: Confounders and Interactions: An Introduction

Measuring Goodness of Fit

Page 63: Confounders and Interactions: An Introduction

SUMMARY AND CONCLUSIONS

The purpose of this presentation is to show how qualitative, or dummy, variables taking values of 1 and 0 can be introduced into regression models alongside quantitative variables. The dummy variables are essentially a data-classifying device in that they divide a sample into various subgroups based on qualities, or attributes (sex, marital status, race, religion, etc.).

We have considered a model for situations in which the outcome of an experiment, the dependent variable, takes only two values.

For the binary choice model the appropriate estimation technique depends upon the nature of the sample data that are available. If repeated observations exist on individual decision makers, a feasible GLS estimation procedure can be used. If only one or a few observations exist for each decision maker, ML estimation is possible, using models that relate the choice probabilities to the unknown parameters in a nonlinear way.

Page 64: Confounders and Interactions: An Introduction

Logistic and Poisson Regression Models

Manoranjan Pal, Indian Statistical Institute

Page 65: Confounders and Interactions: An Introduction

An Example

              Exposed     Non-exposed
Deaths        18,000      9,500
Person-years  900,000     950,000

The Incidence Rates are:

I1 = 18,000/900,000 = 0.02 deaths per person-year.

I0 = 9,500/950,000 = 0.01 deaths per person-year.

RR = I1/I0 = 2.00.

The incidence rate in the exposed group is double that in the non-exposed group.

Page 66: Confounders and Interactions: An Introduction

The Regression Model

• We can achieve the same result by using a regression model. We define a dichotomous exposure variable (X1) as:

X1 = 0 if non-exposed
X1 = 1 if exposed

Rate (I)   Exposure (X1)
0.01       0
0.02       1

We want to model the rate (I) as a function of exposure (X1). One possibility is:

I = b0 + b1X1 (+ e),

but this is less convenient statistically, because the predicted value of I may fall outside the range [0,1], and so on.

Page 67: Confounders and Interactions: An Introduction

An Alternative Regression Model

It is more convenient to fit the model:

ln(I) = b0 + b1X1 (+ e).

We could fit the model using simple linear regression (least squares). However, the least-squares approach does not handle Poisson or dichotomous outcome variables well, as they are not normally distributed. Instead, the model parameters are estimated by the method of maximum likelihood.

Page 68: Confounders and Interactions: An Introduction

Estimation of RR from the Model

The Equation: ln(I) = b0 + b1X1 (+ e).

Exposed: E(ln(I) | X1=1) = ln(I1) = b0 + b1.

Non-exposed: E(ln(I) | X1=0) = ln(I0) = b0.

ln(I1) – ln(I0) = ln(I1/I0) = (b0 + b1) – b0 = b1.

RR = I1/I0 = e^b1.

b1 = ln(RR): the regression coefficient gives the log of the RR.
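A minimal statsmodels sketch reproducing this estimate from the aggregated data above; the log person-years enter as an offset, which turns the count model into a rate model:

import numpy as np
import statsmodels.api as sm

deaths = np.array([9500, 18000])           # non-exposed, exposed
pyears = np.array([950000, 900000])
X = sm.add_constant(np.array([0.0, 1.0]))  # exposure indicator X1

fit = sm.GLM(deaths, X, family=sm.families.Poisson(),
             offset=np.log(pyears)).fit()
print(np.exp(fit.params[1]))               # RR = e^b1 = 2.0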

Page 69: Confounders and Interactions: An Introduction

Estimation of Confidence Interval

The 95% CI for ln(RR) is:

ln(RR) ± 1.96 SE(ln(RR)) = b1 ± 1.96 SE(b1).

If b1 = 0.693 and SE(b1) = 0.124, then

RR = e^0.693 = 2.00,
95% lower confidence limit = e^(0.693 – 1.96×0.124) ≈ 1.57, and
95% upper confidence limit = e^(0.693 + 1.96×0.124) ≈ 2.55.
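The same arithmetic in a couple of lines of Python:

import numpy as np

b1, se = 0.693, 0.124
print(np.exp(b1))                                  # RR = 2.00
print(np.exp(b1 + np.array([-1.96, 1.96]) * se))   # about (1.57, 2.55)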

Page 70: Confounders and Interactions: An Introduction

Discussions

• This general approach can be used in a variety of situations.

• For cohort studies, we fit the Poisson model

ln(I) = b0 + b1X.

This is Poisson (count) data, and we use Poisson regression to estimate the rate ratio.

• For case-control studies we fit the logistic model

ln(p/(1 – p)) = b0 + b1X.

This is logit data, and we use logistic regression to estimate the odds ratio.

Page 71: Confounders and Interactions: An Introduction

Confounding

• We can use the same approach to control for potential confounding variables:

ln(I) = b0 + b1X1 + b2X2.

where X1 = 0 if non-exposed, = 1 if exposed,

and X2 = 0 if Age < 50, = 1 if Age ≥ 50.

Page 72: Confounders and Interactions: An Introduction

Confounding

• Then in the exposed group, E(ln(I) | X1=1) = ln(I1) = b0 + b1 + b2X2,

• and in the non-exposed group, E(ln(I) | X1=0) = ln(I0) = b0 + b2X2.

• Thus, ln(I1/I0) = (b0 + b1 + b2X2) – (b0 + b2X2) = b1, so RR = I1/I0 = e^b1,

• and we proceed as before.

Page 73: Confounders and Interactions: An Introduction

Multiple Levels

• We can also represent multiple categories of exposure (or a confounder): Suppose we have four levels of exposure: none, low, medium and high.

• We need three variables to represent four levels of exposure:

ln(I) = b0 + b1X1 + b2X2 + b3X3,

where X1 = 1 if low exposure, = 0 otherwise;
X2 = 1 if medium exposure, = 0 otherwise;
X3 = 1 if high exposure, = 0 otherwise.

• We can thus estimate the risk for each level relative to the lowest level of exposure.

Page 74: Confounders and Interactions: An Introduction

Interaction (Joint Effects)

• Suppose that we wish to derive the effect of smoking and the use of asbestos on the incidence of cancer.

• The usual model (without an interaction term) is:

• ln(I) = b0 + b1X1 + b2X2,

where X1 and X2 stand for asbestos and smoking respectively. However, to obtain the joint-effects table shown on the next slide, we need to fit the following model:

• ln(I) = b0 + b1X1 + b2X2 + b3X1X2.
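A sketch of fitting this interaction model, using hypothetical aggregated data (the cell counts below are invented purely for illustration):

import numpy as np
import statsmodels.api as sm

# four cells: neither, asbestos only, smoking only, both
cases  = np.array([50, 120, 180, 600])
pyears = np.array([100000, 80000, 90000, 70000])
x1 = np.array([0, 1, 0, 1])          # asbestos
x2 = np.array([0, 0, 1, 1])          # smoking
X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))

fit = sm.GLM(cases, X, family=sm.families.Poisson(),
             offset=np.log(pyears)).fit()
print(np.exp(fit.params))            # baseline rate, e^b1, e^b2, e^b3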

Page 75: Confounders and Interactions: An Introduction

The Joint Effect

• This can be used to derive the following:

Group          X1   X2   ln(I) from model     RR
Asbestos only  1    0    b0 + b1              e^b1
Smoking only   0    1    b0 + b2              e^b2
Both           1    1    b0 + b1 + b2 + b3    e^(b1+b2+b3)

• Thus, the joint effect is obtained as e^(b1+b2+b3).

Page 76: Confounders and Interactions: An Introduction

Testing the Joint Effect

• Note that if b3 = 0 then the joint effect is just e^(b1+b2). Thus, b3 provides a test for interaction. However, it is important to emphasize that b3 only provides a test for a departure from the multiplicative assumptions of the model. It does not test for a departure from additivity.

The confidence interval for the joint effect can be calculated from the variance of (b1 + b2 + b3), namely Var(b1) + Var(b2) + Var(b3) + 2Cov(b1,b2) + 2Cov(b1,b3) + 2Cov(b2,b3), taken from the estimated variance-covariance matrix.

Page 77: Confounders and Interactions: An Introduction

An Alternative Model

• There is a much easier way to get the same results. Just define three new variables as follows:

X1 = 1 if asbestos but not smoking, = 0 otherwise
X2 = 1 if smoking but not asbestos, = 0 otherwise
X3 = 1 if both asbestos and smoking, = 0 otherwise

• Then fit ln(I) = b0 + b1X1 + b2X2 + b3X3.

• This will give us the separate and joint effects directly, without any need to consider the variance-covariance matrix.

Page 78: Confounders and Interactions: An Introduction

Cohort Study Vs. Case Control Study

                  Cohort Study                     Case-Control Study
Numerator         Cases                            Cases
Denominator       Person-Years                     Controls
Effect Estimate   Rate Ratio                       Odds Ratio
Modeling          Poisson Regression               Logistic Regression
Model             ln(I) = b0 + b1X1 + b2X2 + …     ln(p/(1–p)) = b0 + b1X1 + b2X2 + …

Page 79: Confounders and Interactions: An Introduction

POISSON REGRESSION WITH MULTIPLE EXPLANATORY VARIABLES

Manoranjan Pal, Indian Statistical Institute

Page 80: Confounders and Interactions: An Introduction

Poisson Regression Model

• The Poisson regression model is a technique used to describe count data as a function of a set of predictor variables. In the last two decades it has been extensively used in both human and veterinary epidemiology to investigate the incidence and mortality of chronic diseases. Among its numerous applications, Poisson regression has mainly been applied to compare exposed and unexposed cohorts and to evaluate the clinical course of ill subjects.

Page 81: Confounders and Interactions: An Introduction

Introduction

• Poisson regression analysis is a technique which allows one to model dependent variables that describe count data. It is often applied to study the occurrence of a small number of counts or events as a function of a set of predictor variables, in experimental and observational studies in many disciplines, including economics, demography, psychology, biology and medicine.

Page 82: Confounders and Interactions: An Introduction

Applications

• The Poisson regression model may be used as an alternative to the Cox model for survival analysis, when hazard rates are approximately constant during the observation period and the risk of the event under study is small (e.g., the incidence of rare diseases). For example, in ecological investigations, where data are available only in an aggregated form (typically as a count), the Poisson regression model usually replaces the Cox model, which cannot easily be applied to aggregated data.

• Finally, some variants of the Poisson regression model have been proposed to take into account the extra-variability (overdispersion) observed in actual data, mainly due to the presence of spatial clusters or other sources of autocorrelation.

Page 83: Confounders and Interactions: An Introduction

Measures of Occurrence in Cohort Studies: Risk and Rate

• The definition of rate may be derived from the general relationship linking the risk to the follow-up time t:

Risk = 1 – e^(–λt).   … (1)

• The variable λ represents the rate of the outcome onset in the cohort, and it may be considered a measure of the "speed" of occurrence. In many instances, especially for rare diseases in observational cohorts, λ may be considered approximately constant. Moreover, when the rate is small, the following useful approximation may be applied:

Risk ≈ λt.

Page 84: Confounders and Interactions: An Introduction

Risk and Rate

• It may be noted that for low values of λt, λ represents a mean rate, while λ(t) represents an instantaneous rate, often called the hazard rate.

• λ may be estimated by the ratio between the observed events O and the corresponding sum of follow-up times m, named "person-time at risk": λ̂ = O/m.

• An RR estimate may be obtained from the corresponding rate ratio as follows:

RR = λ1/λ2,

• where λ1 and λ2 represent the rates estimated in the exposed and unexposed sub-cohorts, respectively.

Page 85: Confounders and Interactions: An Introduction

Poisson Distribution

• The variability of a rate estimate, and the comparison between rates, need some assumptions about the probability distribution which is assumed to generate the observed rates. When rare events are considered, a Poisson distribution may be assumed:

P(O = k) = e^(–μ) μ^k / k!,

• where μ is an unknown parameter that may be estimated by the observed events O. In the Poisson distribution, the parameter μ represents both the expected number of events and the variance of their estimate. Accordingly, the variance of a rate estimate may be obtained as follows:

Var(λ̂) = Var(O)/m² = μ/m², estimated by O/m².

Page 86: Confounders and Interactions: An Introduction

Variance of Rate Ratio

• Under the null hypothesis of no association between the outcome (events) and the factor under study (exposure, medications, etc.), an RR estimate may be assumed to follow approximately a log-normal distribution with expected value 1. Accordingly, statistical inference about a rate ratio may be performed via the estimated variance of its logarithm, which needs the separate estimates of the variances of the two rates:

Var(ln RR) = Var(ln λ1) + Var(ln λ2).

• Applying the Delta method, such an estimate may be obtained by the following equation:

Var(ln RR) ≈ 1/O1 + 1/O2.

Page 87: Confounders and Interactions: An Introduction

Confidence Interval

Confidence intervals of an RR estimate, obtained via a rate ratio, may be obtained by the following equation:

exp[ ln(RR) ± Zα/2 √(1/O1 + 1/O2) ],

where O1 and O2 are the observed events in the two sub-cohorts and Zα/2 = 1.96 for α = 0.05 (useful to obtain 95% confidence intervals).
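A quick numerical check of this interval, using the data of Table 1 on the next slide:

import numpy as np

o1, m1 = 108, 44870     # exposed: cases, person-years
o2, m2 = 51, 21063      # unexposed
rr = (o1 / m1) / (o2 / m2)
ci = np.exp(np.log(rr) + np.array([-1.96, 1.96]) * np.sqrt(1/o1 + 1/o2))
print(rr, ci)           # about 0.99 and (0.71, 1.39)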

Page 88: Confounders and Interactions: An Introduction

• In the exposed sub-cohort the estimated rate is:

λ̂1 = 108/44870 ≈ 0.00241 per person-year,

• while the corresponding estimate for the unexposed is:

λ̂2 = 51/21063 ≈ 0.00242 per person-year.

Table 1. Results of a Hypothetical Observational Cohort Study

Exposure    Number of Cases    Person-years
Exposed     108                44870
Unexposed   51                 21063

Page 89: Confounders and Interactions: An Introduction

Results of a Hypothetical Observational Cohort Study

Finally, the estimate of RR is:

RR = λ̂1/λ̂2 ≈ 0.99.

The 95% confidence interval of the estimated RR will be:

(0.71; 1.39).

The confidence interval includes the expected value under the null hypothesis of no association (i.e., RR = 1), so in the cohort under study no evidence emerges of an association between the exposure and the risk of the disease onset (p > 0.05). A similar result may be obtained by the Poisson regression model.

Page 90: Confounders and Interactions: An Introduction

Rate Ratio Estimate via Poisson Regression Model

• As above briefly illustrated, the numerator of a rate for a rare disease may be considered as a realization of a Poisson variable with an unknown parameter μ. As a consequence, the relation between the rate and the variable under study (e.g., exposures or treatments) may be investigated by a Poisson model, which is a regression model belonging to the GLM class (Generalized Linear Models).

where:

and g is called “the link function”.

Page 91: Confounders and Interactions: An Introduction

Table 2: An Example of Confounding in an Observational Cohort Study

A simple example of confounding by a dichotomous variable (gender) is illustrated in Table 2, using the same data reported in aggregated form in Table 1.

             All individuals (pooled cohort)   Stratum 1 - Males        Stratum 2 - Females
             Cases    Person-years             Cases   Person-years     Cases   Person-years
Exposed      108      44870                    30      3218             78      41652
Unexposed    51       21063                    44      11699            7       9364

gRRT = 0.99 (0.71; 1.4)    gRR1 = 2.5 (1.6; 3.9)    gRR2 = 2.5 (1.2; 5.4)

Page 92: Confounders and Interactions: An Introduction

Table 3: Example of Effect Modifying or Interaction in an Observational Cohort Study

A simple example of interaction between a variable of exposure and an effect modifier, both expressed on a dichotomous scale, is provided in Table 3.

             All individuals (pooled cohort)   Stratum 1 - Males        Stratum 2 - Females
             Cases    Person-years             Cases   Person-years     Cases   Person-years
Exposed      391      769309                   189     478383           202     290926
Unexposed    119      358341                   78      242043           41      116298

gRRT = 1.5 (1.2; 1.9)    gRR1 = 1.2 (0.94; 1.6)    gRR2 = 2.0 (1.4; 2.8)

Page 93: Confounders and Interactions: An Introduction

Discussions

• In the pooled cohort (Table 3), an association between the exposure and the risk of the disease onset seems to emerge, the corresponding RR being statistically significantly higher than 1, as is evident from the corresponding 95% confidence interval, which does not include that value. However, after stratifying by gender, different RRs emerge for males and females (RR = 1.2 and RR = 2.0, respectively). In conclusion, the data in Table 3 suggest an interaction between sex and exposure, indicating that females are probably more susceptible than males to the exposure effect.

Page 94: Confounders and Interactions: An Introduction

Interaction in Poisson Regression Model

• In the presence of interaction, separate estimates of RR for each group (stratum) of the effect modifier should be produced. However, different RRs may be observed, especially in small cohorts, simply due to sampling variability. To check for the presence of interaction, some formal statistical tests have been developed, including the use of Poisson regression models with (at least) one interaction variable among the predictors:

ln(rate) = β0 + β1E + β2M + β3(E×M),

• where M is the effect modifier and E is the exposure, both considered as binary variables for didactic purposes.

Page 95: Confounders and Interactions: An Introduction

Estimation of RR in Poisson Model with Interaction

The two RR estimates in each M stratum may be obtained from the above equation. In fact, when M = 0:

RR = e^β1,

and when M = 1:

RR = e^(β1 + β3).

It may be noted that when β3 equals 0, the two RR estimates by M stratum are equal, and then M cannot be considered an effect modifier. As a consequence, interaction may be checked by testing the statistical significance of the β3 coefficient with a test commonly employed for GLMs (likelihood ratio, Wald or score test).
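A sketch of this interaction test, fitted to the Table 3 counts with statsmodels:

import numpy as np
import statsmodels.api as sm

# Table 3 cells: (M=0 males, M=1 females) x (E=0 unexposed, E=1 exposed)
cases  = np.array([78, 189, 41, 202])
pyears = np.array([242043, 478383, 116298, 290926])
E = np.array([0, 1, 0, 1])
M = np.array([0, 0, 1, 1])
X = sm.add_constant(np.column_stack([E, M, E * M]))

fit = sm.GLM(cases, X, family=sm.families.Poisson(),
             offset=np.log(pyears)).fit()
print(np.exp(fit.params[1]))                   # RR for males, about 1.2
print(np.exp(fit.params[1] + fit.params[3]))   # RR for females, about 2.0
print(fit.pvalues[3])                          # Wald test of beta3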

Page 96: Confounders and Interactions: An Introduction

Negative Binomial

Manoranjan Pal, Indian Statistical Institute

Page 97: Confounders and Interactions: An Introduction

• This part of the presentation has been taken from: "Poisson-Based Regression Analysis of Aggregate Crime Rates", by D. Wayne Osgood, Journal of Quantitative Criminology, Vol. 16, No. 1, pp. 21-43, 2000.

Page 98: Confounders and Interactions: An Introduction

Poisson

• The Poisson distribution characterizes the probability of observing any discrete number of events (i.e., 0, 1, 2, …), given an underlying mean count or rate of events, assuming that the timing of the events is random and independent. For instance, the Poisson distribution for a mean count of 4.5 robberies would describe the proportion of times that we should expect to observe any specific count of robberies (0, 1, 2, …) in a neighbourhood, if the "true" (and unchanging) annual rate for the neighbourhood were 4.5, if the occurrence of one robbery had no impact on the likelihood of the next, and if we had an unlimited number of years to observe.

Page 99: Confounders and Interactions: An Introduction

Limiting Cases of Poisson Distribution

• When the mean arrest count is low, as is likely for a small population, the Poisson distribution is skewed, with only a small range of counts having a meaningful probability of occurrence.

• As the mean count grows, the Poisson distribution increasingly approximates the normal. The Poisson distribution has a variance equal to the mean count. Therefore, as the mean count increases, the probability of observing any specific number of events declines and a broader range of values has a meaningful probability of being observed.

Page 100: Confounders and Interactions: An Introduction

An Example

• If our interest is in per capita crime rates, say, rather than in counts of offenses, then we have to translate the Poisson distribution of crime counts into distributions of crime rates. Given a constant underlying mean rate of 500 crimes per 100,000 population, population sizes of 200, 600, 2000, and 10,000 would produce the mean crime counts of 1, 3, 10, and 50. For the population of 200, only a very limited number of crime rates are probable (i.e., increments of 500 per 100,000), but those probable rates comprise an enormous range. As the population base increases, the range of likely crime rates decreases, even though the range of likely crime counts increases. The standard deviation around the mean rate shrinks as the population size increases.

Page 101: Confounders and Interactions: An Introduction

The Basic Poisson Regression Model

• The basic Poisson regression model is:

ln(λi) = β0 + β1x1i + … + βkxki,   (1)

Pr(Yi = yi) = e^(–λi) λi^yi / yi!.   (2)

• Equation (1) is a regression equation relating the natural logarithm of the mean or expected number of events for case i to a linear function of explanatory variables. Equation (2) indicates that the probability of the observed outcome for this case follows the Poisson distribution (the right-hand side of the equation) for the mean count from Eq. (1). Thus, the expected distribution of crime counts, and the corresponding distribution of regression residuals, depends on the fitted mean count. The role of the natural logarithm in Eq. (1) is comparable to the logarithmic transformation of the dependent variable that is common in analyses of aggregate crime rates. In both cases, the regression coefficients reflect proportional differences in rates.

Page 102: Confounders and Interactions: An Introduction

Altering the Basic Poisson Regression Model

• Next we must alter the basic Poisson regression model so that it provides an analysis of per capita crime rates rather than counts of crimes. If λi is the expected number of crimes in a given aggregate unit, then λi/ni would be the corresponding per capita crime rate, where ni is the population size for that unit. With a bit of algebra, we can derive a variation of Eq. (1) that is a model of per capita crime rates:

ln(λi/ni) = β0 + β1x1i + … + βkxki, i.e., ln(λi) = ln(ni) + β0 + β1x1i + … + βkxki.   (3)

• Thus, by adding the natural logarithm of the size of the population at risk to the regression model of Eq. (1), and by giving that variable a fixed coefficient of one, Poisson regression becomes an analysis of rates of events per capita, rather than an analysis of counts of events.

Page 103: Confounders and Interactions: An Introduction

Poisson Regression Vs. OLS Regression

• In the expected distribution of observed crime rates around the fitted mean crime rates produced by Eq. (3), the standard deviation is inversely proportional to the square root of the population size. Thus, Poisson regression analysis explicitly addresses the heterogeneous residual variance that presented a problem for OLS regression analysis of crime rates.

Page 104: Confounders and Interactions: An Introduction

Overdispersion and Variations on the Basic Poisson Regression Model

• Reason 1: The basic Poisson regression model is appropriate only if the probability model of Eq. (2) matches the data. Equation (2) requires that the residual variance be equal to the fitted values, λi, which is plausible only if the assumptions underlying the Poisson distribution are fully met by the data. One assumption is that λi is the true rate for each case, which implies that the explanatory variables account for all of the meaningful variation among the aggregate units. If not, the differences between the fitted and true rates will inflate the variance of the residuals. It is very unlikely that this assumption will be valid, for there is no more reason to expect that a Poisson regression will explain all of the variation in the true crime rates than to expect that an OLS regression would explain all variance other than error of measurement.

Page 105: Confounders and Interactions: An Introduction

Overdispersion and Variations on the Basic Poisson Regression Model

• Reason 2: Residual variance will also be greater than λi if the assumption of independence among individual crime events is inaccurate. Dependence will arise if the occurrence of one offense generates a short-term increase in the probability of another occurring. For aggregate crime data, there are many potential sources of dependence, such as an individual offending at a high rate over a brief period until being incarcerated, multiple offenders being arrested for the same incident, and offenders being influenced by one another’s behavior. These types of dependence would increase the year-to-year variability in crime rates for a community beyond λi , even if the underlying crime rate were constant.

Page 106: Confounders and Interactions: An Introduction

A Way Out

• For these two reasons, ‘‘overdispersion’’ in which residual variance exceeds λi is ubiquitous in analyses of crime data. Applying the basic Poisson regression model to such data can produce a substantial underestimation of standard errors of the b’s, which in turn leads to highly misleading significance tests.

• We use the negative binomial regression model, which is the best known and most widely available Poisson-based regression model that allows for overdispersion. Negative binomial regression combines the Poisson distribution of event counts with a gamma distribution of the unexplained variation in the underlying or true mean event counts, λi. This combination produces the negative binomial distribution, which replaces the Poisson distribution of Eq. (2).

Page 107: Confounders and Interactions: An Introduction

The Negative Binomial Distribution

• The formula for the negative binomial is

Pr(Y = y) = [Γ(y + φ) / (Γ(φ) y!)] (φ/(φ + λ))^φ (λ/(φ + λ))^y,

• where Γ is the gamma function (a continuous version of the factorial function), and φ is the reciprocal of the residual variance of the underlying mean counts, α (that is, φ = 1/α).

• With α equal to zero, we have the original Poisson distribution. As α increases, the distribution becomes more decidedly skewed as well as more broadly dispersed. Even for a moderate α of 0.75, the change from the Poisson is dramatic: from 5.0% of cases having zero crimes and 1.2% having eight or more crimes when α = 0, the figures would increase to 20.8% and 8.8% of cases respectively when α = 0.75.
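A quick check of the quoted percentages, assuming a mean count of 3 (which reproduces the 5.0% zero-count figure for the Poisson); scipy's negative binomial is parameterized with n = φ and p = φ/(φ + λ):

import numpy as np
from scipy.stats import nbinom, poisson

lam, alpha = 3.0, 0.75
phi = 1.0 / alpha                     # phi is the reciprocal of alpha
p = phi / (phi + lam)
print(poisson.pmf(0, lam), 1 - poisson.cdf(7, lam))       # about 0.050, 0.012
print(nbinom.pmf(0, phi, p), 1 - nbinom.cdf(7, phi, p))   # about 0.208, 0.088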

Page 108: Confounders and Interactions: An Introduction

Poisson Vs. Negative Binomial Regression

• In negative binomial regression (as in almost all Poisson-based regression models), the substantive portion of the regression model remains Eq. (1) for crime counts or Eq. (3) for per capita crime rates. Thus, though the response probabilities associated with the fitted values differ from the basic Poisson regression model, the interpretation of the regression coefficients does not.

Page 109: Confounders and Interactions: An Introduction

An Example

• Table I presents descriptive statistics for all measures. During this 5-year period, there were 1212 arrests of juveniles for robbery in this sample of counties. The distribution of arrest rates is highly skewed, with zero robbery arrests of juveniles recorded in 52% of the counties, while the highest annual arrest rates were slightly less than 400 per 100,000.

Page 110: Confounders and Interactions: An Introduction

Example (Contd.)

• Poisson-based models do not assume homogeneity of variance. Instead, residual variance is expected to be a function of the predicted number of offenses, which is in turn a function of population size. Furthermore, even though a logarithmic transformation is inherent in Poisson-based regression, observed crime rates of zero present no problem. Unlike the preceding OLS analyses of log crime rates, Poisson-based regression analyses do not require taking the logarithm of the dependent variable. Instead, estimation for these models involves computing the probability of the observed count of offenses, based on the fitted value for the mean count.

Page 111: Confounders and Interactions: An Introduction

Conclusions

• Using Poisson-based regression models of offense counts to analyze per capita offense rates is an important advance for research on aggregate crime data. Standard analytical approaches require that data be highly aggregated across either offense types or population units. Otherwise offense counts are too small to generate per capita rates that have appropriate distributions and sufficient accuracy to justify least-squares analysis.

• Poisson-based regression models give researchers an appropriate means for more fine-grained analysis. Poisson-based models are built on the assumption that the underlying data take the form of non-negative integer counts of events. This is the case for crime rates, which are computed as offense counts divided by population size. In our example analysis of juvenile arrest rates for robbery, the Poisson-based negative binomial model provides a very good fit to the data, while OLS analyses produce outliers and require arbitrary choices that have a striking impact on results.

Page 112: Confounders and Interactions: An Introduction

Conclusions

• Poisson-based regression models enable researchers to investigate a much broader range of aggregate data.

• The reason they are appropriate is that they recognize the limited amount of information in small offense counts. The price one must pay in this trade-off is that the smaller the offense counts, the larger the sample of aggregate units needed to achieve adequate statistical power.

Page 113: Confounders and Interactions: An Introduction

Thank You