multiple regression 3 sociology 5811 lecture 24 copyright © 2005 by evan schofer do not copy or...

Multiple Regression 3

Sociology 5811 Lecture 24

Copyright © 2005 by Evan Schofer

Do not copy or distribute without permission

Announcements

• Schedule: – Today: More multiple regression

• Interaction terms, nested models

– Next Class: Multiple regression assumptions and diagnostics

• Reminder: Paper deadline coming up soon!• Questions about the paper?

Review: Dummy Variables

• Question: How can we incorporate nominal variables (e.g., race, gender) into regression?

• Option 1: Analyze each sub-group separately– Generates different slope, constant for each group

• Option 2: Dummy variables– “Dummy” = a dichotomous variables coded to

indicate the presence or absence of something– Absence coded as zero, presence coded as 1.

Review: Dummy Variables• Strategy: Create a separate dummy variable for

all nominal categories• Example: Race

– Coded 1=white, 2=black, 3=other (like GSS)

• Make 3 dummy variables:– “DWHITE” is 1 for whites, 0 for everyone else– “DBLACK” is 1 for Af. Am., 0 for everyone else– “DOTHER” is 1 for “others”, 0 for everyone else

• Then, include two of the three variables in the multiple regression model.

Dummy Variables: Interpretation• Visually: Women = blue, Men = red

INCOME

100000800006000040000200000

HA

PP

Y

10

9

8

7

6

5

4

3

2

1

0

Overall slope for all data points

Note: Line for men, women have same slope… but one is

high other is lower. The constant differs!

If women=1, men=0: The constant (a) reflects

men only. Dummy coefficient (b) reflects

increase for women (relative to men)

Coefficientsa

9.666 1.672 5.780 .000

2.476 .111 .517 22.271 .000

6.282E-02 .397 .004 .158 .874

-2.666 1.117 -.055 -2.388 .017

1.114 1.777 .014 .627 .531

(Constant)

EDUC

INCOM16

DBLACK

DOTHER

Model1

B Std. Error

UnstandardizedCoefficients

Beta

Standardized

Coefficients

t Sig.

Dependent Variable: PRESTIGEa.

Dummy Variables: Interpretation

• Ex: Job Prestige

• DBLACK coefficient of -2.66 indicates that African Americans have, on average, 2.6 points less job prestige compared to the reference group (in this case, white respondents).

Dummy Variables

• Comments:

• 1. Dummy coefficients shouldn’t be called slopes– Referring to the “slope” of gender doesn’t make sense– Rather, it is the difference in the constant (or “level”)

• 2. The contrast is always with the nominal category that was left out of the equation– If DFEMALE is included, the contrast is with males– If DBLACK, DOTHER are included, coefficients

reflect difference in constant compared to whites.

Interaction Terms

• Question: What if you suspect that a variable has a totally different slope for two different sub-groups in your data?

• Example: Income and Happiness– Perhaps men are more materialistic -- an extra dollar

increases their happiness a lot– If women are less materialistic, each dollar has a

smaller effect on income (compared to men)

• Issue isn’t men = “more” or “less” than women– Rather, the slope of a variable (income) differs across

groups

Interaction Terms

• Issue isn’t men = “more” or “less” than women– Rather, the slope of a variable coefficient (for income)

differs across groups

• Again, we want to specify a different regression line for each group– We want lines with different slopes, not parallel lines

that are higher or lower.

Interaction Terms• Visually: Women = blue, Men = red

INCOME

100000800006000040000200000

HA

PP

Y

10

9

8

7

6

5

4

3

2

1

0

Overall slope for all data points

Note: Here, the slope for men and women

differs.

The effect of income on happiness (X1 on Y)

varies with gender (X2). This is called an

“interaction effect”

Interaction Terms

• Examples of interaction:– Effect of education on income may interact with type

of school attended (public vs. private)• Private schooling has bigger effect on income

– Effect of aspirations on educational attainment interacts with poverty

• Aspirations matter less if you don’t have money to pay for college

• Question: Can you think of examples of two variables that might interact?

• Either from your final project? Or anything else?

Interaction Terms

• Interaction effects: Differences in the relationship (slope) between two variables for each category of a third variable

• Option #1: Analyze each group separately• Look for different sized slope in each group

• Option #2: Multiply the two variables of interest: (DFEMALE, INCOME) to create a new variable– Called: DFEMALE*INCOME– Add that variable to the multiple regression model.

Interaction Terms

• Consider the following regression equation:

iiii eINCDFEMbINCOMEbaY *21

• Question: What if the case is male?

• Answer: DFEMALE is 0, so b2(DFEM*INC) drops out of the equation– Result: Males are modeled using the ordinary

regression equation: a + b1X + e.

Interaction Terms

• Consider the following regression equation:

iiii eINCDFEMbINCOMEbaY *21

• Question: What if the case is female?

• Answer: DFEMALE is 1, so b2(DFEM*INC) becomes b2*INCOME, which is added to b1

– Result: Females are modeled using a different regression line: a + (b1+b2) X + e

– Thus, the coefficient of b2 reflects difference in the slope of INCOME for women.

Interpreting Interaction Terms

• Interpreting interaction terms:

• A positive b for DFEMALE*INCOME indicates the slope for income is higher for women vs. men– A negative effect indicates the slope is lower– Size of coefficient indicates actual difference in slope

• Example: DFEMALE*INCOME. Observed b’s:– Income: b = .5– DFEMALE * INCOME: b = -.2

• Interpretation: Slope is .5 for men, .3 for women.

Coefficientsa

8.855 1.744 5.076 .000

2.541 .118 .531 21.563 .000

6.636E-02 .396 .004 .167 .867

4.293 4.193 .088 1.024 .306

-.576 .332 -.149 -1.735 .083

(Constant)

EDUC

INCOM16

DBLACK

BL_EDUC

Model1

B Std. Error

UnstandardizedCoefficients

Beta

Standardized

Coefficients

t Sig.

Dependent Variable: PRESTIGEa.

Interpreting Interaction Terms• Example: Interaction of Race and Education

affecting Job Prestige:

DBLACK*EDUC has a negative effect (nearly significant). Coefficient of -.576 indicates that the slope of education and job

prestige is .576 points lower for Blacks than for non-blacks.

Continuous Interaction Terms

• Two continuous variables can also interact

• Example: Effect of education and income on happiness– Perhaps highly educated people are less materialistic– As education increases, the slope between between

income and happiness would decrease

• Simply multiply Education and Income to create the interaction term “EDUCATION*INCOME”

• And add it to the model.


• How do you interpret continuous variable interactions?

• Example: EDUCATION*INCOME: Coefficient = 2.0

• Answer: For each unit change in education, the slope of income vs. happiness increases by 2– Note: coefficient is symmetrical: For each unit

change in income, education slope increases by 2

• Dummy interactions effectively estimate 2 slopes: one for each group

• Continuous interactions result in many slopes: Each value of education*income yields a different slope.=


• Interaction terms alters the interpretation of “main effect” coefficients

• Including “EDUC*INCOME changes the interpretation of EDUC and of INCOME

• See Allison p. 166-9

– Specifically, coefficient for EDUC represents slope of EDUC when INCOME = 0

• Likewise, INCOME shows slope when AGE=0

– Thus, main effects are like “baseline” slopes• And, the interaction effect coefficient shows how the slope

grows (or shrinks) for a given unit change.

Dummy Interactions

• It is also possible to construct interaction terms based on two dummy variables– Instead of a “slope” interaction, dummy interactions

show difference in constants• Constant (not slope) differs across values of a third variable

– Example: Effect of of race on school success varies by gender

• African Americans do less well in school; but the difference is much larger for black males.

Dummy Interactions

• Strategy for dummy interaction is the same: Multiply both variables– Example: Multiply DBLACK, DMALE to create

DBLACK*DMALE– Then, include all 3 variables in the model– Effect of DBLACK*DMALE reflects difference in

constant (level) for black males, compared to white males and black females

• You would observe a negative coefficient, indicating that black males fare worse in schools than black females or white males.

Interaction Terms

• Comments:

• 1. If you make an interaction you should also include the component variables in the model:– A model with “DFEMALE * INCOME” should also

include DFEMALE and INCOME• There are rare exceptions. But when in doubt, include them

• 2. Sometimes interaction terms are highly correlated with its components

• That can cause problems (multicollinearity – which we’ll discuss next week).

Interaction Terms

• 3. Make sure you have enough cases in each group for your interaction terms– Interaction terms involve estimating slopes based on

sub-groups in your data (e.g., black females). • If you there are hardly any black females in the dataset, you

can have problems.

Interaction Terms

• 4. Interaction terms are confusing at first… but they are VERY important– Example: Race, class, gender. – Most sociologists argue that they operate interactively

• The experience of black lower-class females is different from black upper-class females or white lower-class females

• Interaction terms are a powerful way of identifying such intersections in quantitative data

– In short: Make the effort to consider how variables interact… it is a very useful way of thinking.

Nested Models

• It is common to conduct a series of multiple regressions, not just one– Each model adds variables or sets of variables

• This is useful for several reasons:– 1. To show how coefficients change when new

variables are added• Which suggests which variables mediate others

– 2. To examine whether additional variables increase model fit (r-square).

Nested Models

• Question: Do the new variables substantially improve the model?

• Idea #1: Look for increase in Adjusted R-Square• That gives you a sense of whether new variables “improve”

the model

• Idea #2: Conduct a formal F-test– Recall that F-tests allow comparisons of variance

(e.g., SSbetween to SSwithin)– This allows you to test the hypothesis that the added

variables improve the model overall.

Nested Models

• F-tests require “nested models”– Models are the same, except for addition of new

variables– You can’t compare totally different models this way

)1()1(

)()(

222

122

122

)1)(( 212

KNR

KKRRF KNKK

• Tests following Hypotheses:

• H0: Two models have the same R-square

• H1: Two models have different R-square

Nested Models

• SPSS can conduct an F-test between two regression models

• A significant F-test indicates:– The second model (with additional variables) is a

significant improvement (in R-square) compared to the first.

multiple regression 3 sociology 5811 lecture 24 copyright © 2005 by evan schofer do not copy or...

Documents

variable income

dummy variablesstrategy

dummy variablescomments

dummy variablesdummy

dummy variablesquestion

nominal variables

separate dummy variable

dummy coefficient b