Analysis of Interaction Effects

James Jaccard, New York University

Overview

Will cover the basics of interaction analysis, highlighting multiple-regression-based strategies

Will discuss advanced issues and complications in interaction analysis. This treatment will be somewhat superficial but hopefully informative

Conceptual Foundations of Interaction Analysis

Causal Theories

Most (but not all) theories rely heavily on the concept of causality, i.e., we seek to identify the determinants of a behavior or mental state and/or the consequences of a behavior or environmental/mental state

I am going to ground interaction analysis in a causal framework

Causal theories can be complicated, but at their core, there are five types of causal relationships in causal theories

Direct Causal Relationships

A direct causal relationship is when a variable, X, has a direct causal influence on another variable, Y:

X → Y

Example: Frustration → Aggression (+)

Example: Quality of Relationship with Mother → Adolescent Drug Use (-)

Indirect Causal Relationships


An indirect causal relationship is when a variable, X, has a causal influence on another variable, Y, through an intermediary variable, M:

X → M → Y

Example: Quality of Relationship with Mother → Adolescent School Work Ethic → Adolescent Drug Use

Spurious Relationship

A spurious relationship is one where two variables that are not causally related share a common cause:

X ← C → Y

Bidirectional Causal Relationships


A bidirectional causal relationship is when a variable, X, has a causal influence on another variable, Y, and that effect, Y, has a “simultaneous” impact on X:

X ⇄ Y

Example: Quality of Relationship with Mother ⇄ Adolescent Drug Use

Moderated Causal Relationships


A moderated causal relationship is when the impact of a variable, X, on another variable, Y, differs depending on the value of a third variable, Z

X → Y, with Z acting on the X → Y path

Example: Treatment vs. No Treatment → Depression, moderated by Gender

Example: Exp Negative Peers → Drug Use, moderated by Quality of Parent-Adolescent Relationship

The variable that “moderates” the relationship is called a moderator variable.

Causal Theories

We put all these ideas together to build complex theories of phenomena. Here is one example:

[Diagram with the variables: Quality of Relationship with Mother, Adolescent School Work Ethic, Adolescent Drug Use, Time Mother Spends with Child, Gender]

Interaction Analysis

Interactions, when translated into causal analysis, focus on moderated relationships

When I encounter an interaction effect, I think:

X → Y, with Z acting on the X → Y path

A key step in interaction analysis is to identify the focal independent variable and the moderator variable.

Sometimes it is obvious, such as with the analysis of the effect of a treatment for depression on depression as moderated by gender:

Treat vs Control → Depression, moderated by Gender

Sometimes it is not obvious, such as an analysis of the effects of gender and ethnicity on the amount of time an adolescent spends with his or her mother:

Ethnicity → Time Spent, moderated by Gender (one possible assignment of roles)

Statistically, it matters not which variables take on which role. Conceptually, it does.

The Statistical Analysis of Interactions

Some Common Practices

Omnibus tests – I do not use these

Hierarchical regression – I use sparingly

Focus on unstandardized coefficients – we tend to stay away from standardized coefficients in interaction analysis because they can be misleading and they do not have “clean” mathematical properties

A “Trick” We Will Use: Linear Transformations

Y = a + b1 X + e

Satisfaction = a + b1 Grade + e

Satisfaction = 12 + -.50 Grade + e

Satisfaction = 9 + -.50 (Grade – 6) + e

Subtracting 6 from Grade leaves the slope unchanged and makes the intercept the predicted satisfaction at Grade 6: 12 + (-.50)(6) = 9

“Mean centering” is when we subtract the mean
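The effect of such a transformation can be checked numerically. The sketch below is only an illustration: it takes the coefficients shown above (intercept 12, slope -.50), assumes hypothetical error-free data for grades 6 through 12, and uses numpy rather than the R programs used in the talk.

```python
import numpy as np

def fit_line(x, y):
    """Return (intercept, slope) from an ordinary least squares fit of y on x."""
    design = np.column_stack([np.ones_like(x), x])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coefs[0]), float(coefs[1])

# Hypothetical, error-free data that follow Satisfaction = 12 - .50 * Grade.
grade = np.arange(6.0, 13.0)            # grades 6..12 (an assumption)
satisfaction = 12.0 - 0.50 * grade

raw = fit_line(grade, satisfaction)                       # intercept = prediction at Grade 0
shifted = fit_line(grade - 6.0, satisfaction)             # intercept = prediction at Grade 6
centered = fit_line(grade - grade.mean(), satisfaction)   # intercept = prediction at mean grade

print("raw:      a = %.2f, b1 = %.2f" % raw)
print("grade-6:  a = %.2f, b1 = %.2f" % shifted)
print("centered: a = %.2f, b1 = %.2f" % centered)
```

In each fit the slope stays the same; only the point at which the intercept is evaluated moves.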

Assume you know the basics of multiple regression and dummy variables in multiple regression

Will focus on four cases (IV = focal independent variable, MV = moderator variable):

Categorical IV and Categorical MV

Continuous IV and Categorical MV

Categorical IV and Continuous MV

Continuous IV and Continuous MV


Y = Relationship satisfaction (0 to 10)

X = Gender (female = 1, male = 0)

Z = Grade (6th = 1, 7th = 0)

         6th    7th
Female   8.0    7.0
Male     7.0    4.0

Three questions:

Is there a gender difference for 6th graders?

Is there a gender difference for 7th graders?

Are these gender effects different?


Gender effect for 6th grade: 8 – 7 = 1

Gender effect for 7th grade: 7 – 4 = 3

Interaction contrast: (8 – 7) – (7 – 4) = -2

Y = a + b1 Gender + b2 Grade + b3 (Gender)(Grade)

Y = 4.0 + 3.0 Gender + b2 Grade + -2.0 (Gender)(Grade)

With 7th grade as the reference group (Grade = 0), the intercept is the mean for 7th grade males, b1 is the gender effect for 7th graders, and b3 is the interaction contrast.

Flipped (6th = 0, 7th = 1): Y = 7.0 + 1.0 Gender + b2 Grade + 2.0 (Gender)(Grade)

With 6th grade as the reference group, b1 is the gender effect for 6th graders.
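The link between the cell means and the regression coefficients can be verified directly. The following is a minimal sketch, assuming numpy and fitting the saturated model to the four cell means from the table (not to raw data); the function name `fit_2x2` is hypothetical.

```python
import numpy as np

def fit_2x2(cell_means, grade_code):
    """Fit Y = a + b1*Gender + b2*Grade + b3*Gender*Grade to four cell means.

    cell_means maps (gender, grade_label) -> mean, with female = 1, male = 0.
    grade_code maps grade_label -> 0/1 dummy code.
    """
    rows, y = [], []
    for (gender, grade_label), mean in cell_means.items():
        grade = grade_code[grade_label]
        rows.append([1.0, gender, grade, gender * grade])
        y.append(mean)
    coefs = np.linalg.solve(np.array(rows), np.array(y))
    # "+ 0.0" turns a possible -0.0 into 0.0 for cleaner printing.
    return {name: round(float(c), 6) + 0.0 for name, c in zip(["a", "b1", "b2", "b3"], coefs)}

means = {(1, "6th"): 8.0, (1, "7th"): 7.0, (0, "6th"): 7.0, (0, "7th"): 4.0}

original = fit_2x2(means, {"6th": 1, "7th": 0})   # 7th grade is the reference group
flipped = fit_2x2(means, {"6th": 0, "7th": 1})    # 6th grade is the reference group

print("7th grade as reference:", original)
print("6th grade as reference:", flipped)
```

Flipping the reference group changes which simple effect b1 reports and reverses the sign of b3, while the fitted cell means are identical.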

Extend to groups > 2 (add 8th grade)



Inclusion of covariates

How to generate means and tables



X = Time spent together (in hours)

Z = Gender (female = 1, male = 0)





Three questions:

Effect of time for females: b = 0.33

Effect of time for males: b = 0.20

Are the effects different: 0.33 – 0.20 = 0.13

Y = a + b1 Gender + 0.20 Time + 0.13 (Gender)(Time)

Flipped (male = 1, female = 0): Y = a + b1 Gender + 0.33 Time + -0.13 (Gender)(Time)


Do not estimate slopes separately; use flipped reference group strategy
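The flipped reference group strategy can be sketched as follows. This is an illustration only: the slopes (0.33 and 0.20) come from the slides, while the intercepts, the range of hours, and the error-free data are hypothetical, and numpy stands in for the R programs used in the talk.

```python
import numpy as np

def fit_moderated(time, gender, y):
    """OLS fit of Y = a + b1*Gender + b2*Time + b3*Gender*Time; returns [a, b1, b2, b3]."""
    design = np.column_stack([np.ones_like(time), gender, time, gender * time])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

# Hypothetical, error-free data: slopes from the slides, intercepts made up.
hours = np.tile(np.arange(0.0, 11.0), 2)      # 0..10 hours for each gender
female = np.repeat([1.0, 0.0], 11)            # female = 1, male = 0
y = np.where(female == 1, 5.0 + 0.33 * hours, 4.0 + 0.20 * hours)

male_ref = fit_moderated(hours, female, y)          # b2 = slope for males
female_ref = fit_moderated(hours, 1.0 - female, y)  # flipped: b2 = slope for females

print("male reference:   b2 = %.2f, b3 = %.2f" % (male_ref[2], male_ref[3]))
print("female reference: b2 = %.2f, b3 = %.2f" % (female_ref[2], female_ref[3]))
```

With real data, each fit of the single pooled model also supplies a standard error for the reference group's slope, which is one reason to prefer it over estimating the slopes in separate regressions.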

Extend to groups > 2 (use grade as example)


Study conducted in Miami with bi-lingual Latinos



Ad language: Half shown ad in Spanish (0) and half in English (1)

Latino identity: 1 = not at all, 7 = strongly identify

Outcome = Attitude toward product (1 = unfavorable, 7 = favorable)

Hypothesized moderated relationship

Common Analysis Form: Median Split

Many researchers are not sure how to analyze this, so they apply a median split to the continuous moderator variable and conduct an ANOVA

Why this is bad practice...


Identity   Mean English – Mean Spanish
1           1.50
2           1.00
3           0.50
4           0.00
5          -0.50
6          -1.00
7          -1.50

Y = a + b1 Ad language + b2 Identity + b3 (Ad language)(Identity)

To make the intercept meaningful, 1 was subtracted from the Latino identity measure, so it ranges from 0 to 6


Mean attitude for Spanish ad for Latino ID = 1 is 3.215 (the intercept, a)

Mean difference for Latino ID = 1 is 1.707 (p < 0.05) (the coefficient b1)

Mean attitude for English ad for Latino ID = 1 is 4.922 (3.215 + 1.707)


Identity   Mean English   Mean Spanish   Difference
1          4.922          3.215           1.707*
2          4.915          3.662           1.253*
3          4.908          4.108           0.800*
4          4.901          4.555           0.346*
5          4.895          5.002          -0.107
6          4.888          5.449          -0.561*
7          4.882          5.896          -1.014*

* p < 0.05
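Each row of such a table can be produced by re-centering the identity measure at the value of interest and refitting, so that b1 becomes the English minus Spanish difference at that value. The sketch below is a hedged illustration: a = 3.215 and b1 = 1.707 are stated on the slides, but b2 and b3 are back-calculated here from the first two table rows, so the results match the table only up to rounding. The data are error-free predicted means, and numpy is an assumption.

```python
import numpy as np

# a and b1 are from the slides; b2 and b3 are back-calculated (approximate):
# b2 = 3.662 - 3.215, b3 = 1.253 - 1.707.
A, B1, B2, B3 = 3.215, 1.707, 0.447, -0.454

identity = np.tile(np.arange(1.0, 8.0), 2)     # original 1..7 scale
english = np.repeat([0.0, 1.0], 7)             # Spanish = 0, English = 1
attitude = A + B1 * english + B2 * (identity - 1) + B3 * english * (identity - 1)

def ad_effect_at(value):
    """Refit with identity re-centered at `value`; b1 is the English - Spanish gap there."""
    z = identity - value
    design = np.column_stack([np.ones_like(z), english, z, english * z])
    coefs, *_ = np.linalg.lstsq(design, attitude, rcond=None)
    return float(coefs[1])

effects = {v: round(ad_effect_at(v), 3) for v in range(1, 8)}
for value, gap in effects.items():
    print("Identity = %d: English - Spanish = %.3f" % (value, gap))
```

With real data, the same refit would also give the standard error and significance test for the difference at each identity value.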

(Common practice, Mean = 3, SD = 1.2; Show R program)


Y: Child anxiety (0 to 20)

X: Parent anxiety (0 to 20)

Z: Parenting behavior: Control (0 to 20)

Control   b for Y onto X
7         .10
8         .20
9         .30
10        .40
11        .50
12        .60
13        .70

Y = a + b1 Control + 0.10 PA + 0.10 (Control)(PA)

Here PA is parent anxiety (X). The 0.10 coefficient for PA matches the slope at Control = 7, which implies that Control was re-centered at 7 (Control – 7) in this equation. The 0.10 coefficient for the product term is the change in that slope per one-unit increase in Control.
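The table of slopes follows from the two coefficients in the equation. A small sketch, assuming (as the table suggests, but the slide does not state) that Control is scored so that zero corresponds to a raw value of 7:

```python
# Coefficients from the slide. CENTER = 7 is an assumption inferred from the table:
# the 0.10 coefficient for PA equals the tabled slope at Control = 7.
B_PA = 0.10      # coefficient for parent anxiety (PA)
B_INT = 0.10     # coefficient for the (Control)(PA) product term
CENTER = 7       # raw value of Control that is scored as zero (assumption)

def simple_slope(control):
    """Slope of child anxiety on PA at a raw value of Control."""
    return B_PA + B_INT * (control - CENTER)

slopes = {control: round(simple_slope(control), 2) for control in range(7, 14)}
for control, slope in slopes.items():
    print(control, slope)
```

This reproduces the slope at each value of Control shown in the table.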

(Common practice versus regions of significance)

(Why we include component parts)

Advanced Topics

Three Way Interactions


Identify focal independent variable

Identify first order moderator variable

Identify second order moderator variable

Gender → Satisfaction, moderated by Grade, with that moderation in turn moderated by Ethnicity


European American

         Grade 7   Grade 8
Female   6.0       6.0
Male     5.0       4.0

IC = (6-5) – (6-4) = -1

Latinos

         Grade 7   Grade 8
Female   6.0       6.0
Male     6.0       6.0

IC = (6-6) – (6-6) = 0

TW = [(6-5) – (6-4)] - [(6-6) – (6-6)] = -1


European American (1)

             G7 (1)   G8 (0)
Female (1)   6.0      6.0
Male (0)     5.0      4.0

IC = (6-5) – (6-4) = -1

Latinos (0)

             G7 (1)   G8 (0)
Female (1)   6.0      6.0
Male (0)     6.0      6.0

IC = (6-6) – (6-6) = 0

TW = [(6-5) – (6-4)] - [(6-6) – (6-6)] = -1

Y = 6.0 + 0 Gender + b2 Grade + b3 Ethnic + 0 (Gender)(Grade) + b5 (Gender)(Ethnic) + b6 (Grade)(Ethnic) + -1 (Gender)(Grade)(Ethnic)
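The three-way coefficients can be recovered from the eight cell means by fitting the saturated model. This is a minimal sketch that assumes numpy and works on the cell means rather than raw data; the dummy codes follow the tables (female = 1, G7 = 1, European American = 1).

```python
import numpy as np
from itertools import product

# Cell means keyed by (gender, grade, ethnic) using the dummy codes from the tables.
means = {
    (1, 1, 1): 6.0, (1, 0, 1): 6.0, (0, 1, 1): 5.0, (0, 0, 1): 4.0,   # European American
    (1, 1, 0): 6.0, (1, 0, 0): 6.0, (0, 1, 0): 6.0, (0, 0, 0): 6.0,   # Latinos
}

names = ["a", "Gender", "Grade", "Ethnic", "Gender*Grade",
         "Gender*Ethnic", "Grade*Ethnic", "Gender*Grade*Ethnic"]

rows, y = [], []
for g, gr, e in product([0, 1], repeat=3):
    rows.append([1, g, gr, e, g * gr, g * e, gr * e, g * gr * e])
    y.append(means[(g, gr, e)])

coefs = np.linalg.solve(np.array(rows, dtype=float), np.array(y))
# "+ 0.0" turns a possible -0.0 into 0.0 for cleaner printing.
model = {name: round(float(c), 6) + 0.0 for name, c in zip(names, coefs)}

for name in names:
    print("%-20s %5.1f" % (name, model[name]))
```

The Gender*Grade coefficient is the interaction contrast for the group coded 0 on Ethnic (Latinos), and the three-way coefficient is the difference between the two interaction contrasts.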

Modeling Non-Linear Interactions

Y = α + β1 X + β2 Z + ε

β1 = α’ + β3 Z + β4 Z^2

Substitute right hand side for β1:

Y = α + (α’ + β3 Z + β4 Z^2) X + β2 Z + ε

Expand:

Y = α + α’X + β3 XZ + β4 XZ^2 + β2 Z + ε

Re-arrange terms:

Y = α + α’X + β2 Z + β3 XZ + β4 XZ^2 + ε

Re-label and you have your model:

Y = α + β1 X + β2 Z + β3 XZ + β4 XZ^2 + ε

Use centering strategy to isolate effect of X on Y (β1 ) at any given value of Z; also consider modeling intercept
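The centering strategy for this model can be sketched as follows. All coefficient values and the error-free data grid are hypothetical, and numpy is an assumption. Re-centering Z at a value z0 and refitting makes the coefficient on X equal to the effect of X on Y at Z = z0, which is β1 + β3 z0 + β4 z0^2 in terms of the uncentered model.

```python
import numpy as np

# Hypothetical coefficients for Y = a + b1*X + b2*Z + b3*X*Z + b4*X*Z^2 (no error term).
A, B1, B2, B3, B4 = 2.0, 0.50, 0.30, 0.20, -0.05

x, z = np.meshgrid(np.arange(0.0, 6.0), np.arange(0.0, 6.0))
x, z = x.ravel(), z.ravel()
y = A + B1 * x + B2 * z + B3 * x * z + B4 * x * z**2

def effect_of_x_at(z0):
    """Refit with Z re-centered at z0; the coefficient on X is the effect of X at Z = z0."""
    zc = z - z0
    design = np.column_stack([np.ones_like(x), x, zc, x * zc, x * zc**2])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coefs[1])

for z0 in (0, 2, 4):
    analytic = B1 + B3 * z0 + B4 * z0**2
    print("Z = %d: refit slope = %.3f, analytic = %.3f" % (z0, effect_of_x_at(z0), analytic))
```

Because the effect of X is a quadratic function of Z, it can rise and then fall across the range of the moderator, which a single product term cannot capture.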

Exploratory Interaction Analysis

Use program in R

Y = Tenured or not (using MLPM)

X = Number of articles published

Z = Number of years since hired

Y = α + β1 X + ε

X COEFFICIENT AND M VALUES

N     M Value   X Slope
478   1.000     .000
475   2.000     .002
457   3.000     .007
408   4.000     .007
330   5.000     .009
246   6.000     .008
166   7.000     .005
115   8.000     .009
74    9.000     .011
48    10.000    .001

Regression Mixture Modeling

BI = α + β1 Aact + β2 PN + β3 PBC + ε

Mixture Regression

When we regress Y onto a set of predictors, we assume that people are drawn from a single population with common linear coefficients

But, in reality, we probably are mixing heterogeneous population segments with different coefficients characterizing the segments

With “mixed” populations, the overall regression analysis can characterize neither segment very well and lead to sub-optimal inferences and intervention strategies
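A toy illustration of this point, under loudly stated assumptions: two hypothetical, equally sized segments with error-free data, one in which the predictor matters and one in which it does not. This is not a mixture model, only a demonstration of what pooling does to the coefficient; numpy is assumed.

```python
import numpy as np

def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    design = np.column_stack([np.ones_like(x), x])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coefs[1])

# Two hypothetical segments measured at the same predictor values.
x = np.arange(1.0, 8.0)
y_segment_1 = 1.0 + 0.50 * x     # predictor matters in segment 1
y_segment_2 = 3.0 + 0.00 * x     # predictor is irrelevant in segment 2

pooled = slope(np.concatenate([x, x]), np.concatenate([y_segment_1, y_segment_2]))

print("segment 1 slope: %.2f" % slope(x, y_segment_1))
print("segment 2 slope: %.2f" % slope(x, y_segment_2))
print("pooled slope:    %.2f" % pooled)
```

The pooled slope lies between the two segment slopes and describes neither segment.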

Another Example of Aggregation Bias

[Diagram with the variables: Aact, SN, PBC, Intention, Latent Class X]

Mixture Model for Heavy Episodic Drinking

A four class model fits data best (entries are linear coefficients)

                  Aact   SN    DN    PBC
Segment 1 (42%)   .33    .02   .01   -.01
Segment 2 (17%)   .10    .29   .30    .01
Segment 3 (21%)   .30    .29   .05    .04
Segment 4 (20%)   .48    .09   .25   -.03

Interaction Analysis and Establishing Generalizability

It is common for people to conclude that an effect “generalizes” in the absence of a statistically significant interaction effect

Problem is that we can never accept the null hypothesis of a zero interaction contrast

Generalizability

Example with RCT of obesity treatment and gender

Solution: Adopt the framework of equivalence testing

Step 1: Specify a threshold value that will be used to define functional equivalence

Step 2: Specify the range of functional equivalence

Step 3: Calculate the 95% CI for the interaction contrast

Step 4: Determine if the CI is completely within the range of functional equivalence
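The four steps can be sketched in a few lines. The numbers below are hypothetical, the function name `equivalence_check` is made up for illustration, and the 1.96 critical value assumes a large-sample normal approximation.

```python
def equivalence_check(contrast, std_error, threshold, critical_value=1.96):
    """Return the CI for an interaction contrast and whether it lies inside +/- threshold.

    critical_value = 1.96 gives a large-sample 95% CI (an assumption); with small
    samples a t-based critical value would be the usual choice.
    """
    # Steps 1 and 2: the threshold defines the equivalence range (-threshold, +threshold).
    lower = contrast - critical_value * std_error    # Step 3: 95% CI
    upper = contrast + critical_value * std_error
    inside = (-threshold < lower) and (upper < threshold)   # Step 4
    return (lower, upper), inside

# Hypothetical numbers: the treatment effect differs by 0.4 units between genders
# (SE = 0.5), and differences under 2 units are deemed functionally equivalent.
ci, generalizes = equivalence_check(contrast=0.4, std_error=0.5, threshold=2.0)
print("95%% CI: (%.2f, %.2f), within range: %s" % (ci[0], ci[1], generalizes))

# A stricter threshold of 1 unit no longer contains the whole interval.
_, strict = equivalence_check(contrast=0.4, std_error=0.5, threshold=1.0)
print("within stricter range:", strict)
```

The conclusion depends on the threshold chosen in Step 1, so that value needs a substantive justification.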

Measurement Error

It is well known that measurement error can bias parameter estimates in multiple regression. This holds with vigor for interaction analysis

One approach to dealing with measurement error in general is to use latent variable modeling

Measurement Error

[Diagram: latent variable Depression measured first by a single indicator, D1, with error e1, then by three indicators, D1, D2, D3, with errors e1, e2, e3]

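Latent Variable Regression

[Diagram: latent X (indicators X1, X2, X3), latent Z (indicators Z1, Z2, Z3), and a latent product term (indicators X1Z1, X2Z2, X3Z3) predicting latent Y (indicators Y1, Y2); measurement errors e1 to e11, disturbance d3; one element is labeled Support]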

There are about half a dozen approaches to modeling latent variable interactions (e.g., quasi-maximum likelihood; Bayesian). I recommend the approach developed by Herbert Marsh as a good balance between utility and complexity, coupled with Huber-White sandwich estimators for robustness

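Latent Variable Regression

Latent variable regression using multiple group analysis

[Diagram: latent X (indicators X1, X2, X3) and latent Z (indicators Z1, Z2, Z3) predicting latent Y (indicators Y1, Y2); measurement errors e1 to e8, disturbance d3]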

Multi-Group Modeling in SEM

Assumption Violations

If assumptions of normality or variance homogeneity are suspect, use approaches with robust standard errors

Huber-White sandwich estimators

Bootstrapping

Be careful of outlier resistant robust methods

Rand Wilcox's work with smoothers
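For reference, the basic Huber-White estimator can be written out in a few lines. This is a sketch of the HC0 variant only (small-sample refinements such as HC1 to HC3 exist), using numpy and hypothetical data with a product term; the function name `ols_with_hc0` is made up for illustration.

```python
import numpy as np

def ols_with_hc0(design, y):
    """OLS coefficients plus Huber-White (HC0) standard errors.

    Sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1, where e are the OLS residuals.
    """
    design = np.asarray(design, dtype=float)
    y = np.asarray(y, dtype=float)
    bread = np.linalg.inv(design.T @ design)
    coefs = bread @ design.T @ y
    residuals = y - design @ coefs
    meat = design.T @ (design * residuals[:, None] ** 2)
    covariance = bread @ meat @ bread
    return coefs, np.sqrt(np.diag(covariance))

# Hypothetical data for a model with a product term: Y on X, Z, and XZ.
x = np.array([0, 1, 2, 3, 0, 1, 2, 3], dtype=float)
z = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
y = np.array([1.0, 1.4, 1.7, 2.3, 1.1, 2.2, 2.8, 4.1])
X = np.column_stack([np.ones_like(x), x, z, x * z])

coefs, robust_se = ols_with_hc0(X, y)
for name, b, se in zip(["a", "X", "Z", "XZ"], coefs, robust_se):
    print("%-3s b = %6.3f, robust SE = %.3f" % (name, b, se))
```

In practice a statistical package's built-in robust covariance option would normally be used instead of hand-rolled code.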

Thank God It Has Ended!
