
Effect Estimation with Latent Variables

Bengt Muthén
Professor Emeritus, UCLA
Mplus

&

Tihomir Asparouhov
Mplus

Presentation at the Data Science in the Social and Behavioral Sciences Virtual Opening Workshop on SEMs, DAGs, and Causal Inference, January 11, 2021

We thank Noah Hastings for expert assistance.

Bengt Muthén & Tihomir Asparouhov Latent Variable Effect Estimation 1/ 18


Outline

Quick overview: a roadmap for further reading (slides posted at the Mplus website, www.statmodel.com):

Mediation analysis in the Mplus software using counterfactually-defined effects

Focus on effect estimation with latent variables:

- Latent continuous constructs (factors) as mediators
- Latent categorical variables (finite mixture modeling) used for complier-average causal effects (CACE)
- Latent centering and latent variable interactions in multilevel mediation modeling
- Random effects used for propensity scores, moderators, and mediators in multilevel time series analysis of intensive longitudinal data

Thoughts on dissemination of new techniques and further steps


Latent Variable Modeling in the Mplus Software

Mplus - Statistical Analysis with Latent Variables (since 1998)

Not an SEM program per se but often used for SEM

Many variable types: Continuous, binary, ordinal, nominal, count, censored, skew-normal, skew-t

Latent variables as a common theme:

- Continuous latent variables: Factors, random effects
- Categorical latent variables: Mixtures

In addition to factor analysis (EFA, CFA) and SEM, the following analysis types are integrated using a general SEM latent variable model structure:

- Finite mixture modeling (e.g., latent class analysis, hidden Markov modeling, latent transition analysis)
- Multilevel modeling (twolevel, threelevel, cross-classified)
- Survival analysis (discrete- and continuous-time)
- Time series analysis (N=1, multilevel)

Effort to make complex methods easy to understand and use while allowing flexible analysis in a general modeling framework

Mediation Analysis: Counterfactual Effects in Mplus

Options for counterfactually-defined effects in mediation analysis:

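- Effects: CDE, TNDE, PNDE, TNIE, PNIE (controlled direct effect; total and pure natural direct effects; total and pure natural indirect effects)
  - VanderWeele & Vansteelandt (2009). Statistics and Its Interface
  - Valeri & VanderWeele (2013). Psychological Methods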

- Mediators: Continuous, binary, ordinal, nominal, and latent
- Outcomes: Continuous, binary, count, two-part, and latent
- Parametric models: Linear, probit, logit, Poisson, negative binomial
- Estimators: ML, Weighted Least Squares, Bayes
- Sensitivity analysis for mediator-outcome confounding (Imai, 2010)

The counterfactually-defined effects are causal only when a set of assumptions is fulfilled; a strength of the approach is that it tells us how to define the effects

Muthén & Asparouhov (2015). Causal effects in mediation modeling: An introduction with applications to latent variables. Structural Equation Modeling: A Multidisciplinary Journal

Muthén, Muthén & Asparouhov (2016). Regression and Mediation Analysis Using Mplus. www.statmodel.com/Mediation
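For a continuous mediator and a continuous outcome, the five effects have closed forms. The sketch below is a minimal illustration, assuming linear models $Y_i = \beta_0 + \beta_1 M_i + \beta_2 X_i + \beta_3 M_i X_i + \beta_4 C_i + \varepsilon_{yi}$ and $M_i = \gamma_0 + \gamma_1 X_i + \gamma_2 C_i + \varepsilon_{mi}$, no unmeasured confounding, and a comparison of $X = x$ with $X = x^*$ at covariate value $c$. It is not Mplus code, and the function names are hypothetical.

```python
def mediator_mean(x, c, g0, g1, g2):
    """E[M | X = x, C = c] under M = g0 + g1*X + g2*C + error."""
    return g0 + g1 * x + g2 * c


def mediation_effects(x, x_star, c, b, g):
    """Counterfactually-defined effects of changing X from x_star to x.

    b = (b0, b1, b2, b3, b4): Y = b0 + b1*M + b2*X + b3*M*X + b4*C + error
    g = (g0, g1, g2):         M = g0 + g1*X + g2*C + error
    Assumes linear models and no unmeasured confounding.
    """
    b0, b1, b2, b3, b4 = b
    g0, g1, g2 = g
    dx = x - x_star
    m_x = mediator_mean(x, c, g0, g1, g2)          # E[M(x)]
    m_star = mediator_mean(x_star, c, g0, g1, g2)  # E[M(x*)]
    pnde = (b2 + b3 * m_star) * dx    # direct effect with M at its x* distribution
    tnde = (b2 + b3 * m_x) * dx       # direct effect with M at its x distribution
    pnie = (b1 + b3 * x_star) * g1 * dx
    tnie = (b1 + b3 * x) * g1 * dx
    return {
        "PNDE": pnde, "TNDE": tnde, "PNIE": pnie, "TNIE": tnie,
        "TE": pnde + tnie,  # total effect = PNDE + TNIE = TNDE + PNIE
    }


def controlled_direct_effect(x, x_star, m, b):
    """CDE(m): effect of X when M is set to the same value m for everyone."""
    _, _, b2, b3, _ = b
    return (b2 + b3 * m) * (x - x_star)


# Hypothetical parameter values
b = (0.0, 0.5, 0.3, 0.2, 0.1)
g = (0.0, 0.8, 0.4)
print(mediation_effects(x=1, x_star=0, c=0.0, b=b, g=g))
print(controlled_direct_effect(1, 0, m=0.0, b=b))
```

With $\beta_3 = 0$ the pure and total versions coincide and the indirect effect reduces to the familiar product $\beta_1 \gamma_1$; the X-M interaction is what makes the five effects differ.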


Effect Estimation with Strongly Non-Normal Outcomes: Strong Floor Effect, Two Types of Effects

[Figure: histogram of WITHDRAW (x-axis from 1.15 to 6.85) against Count (y-axis from 0 to 90), annotated 30%]

Two-part (semi-continuous) modeling (Duan et al., 1983 in JBES; Olsen & Schafer, 2001 in JASA; Muthén, Muthén & Asparouhov, 2016)

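Define a binary variable $U_i$, where $U_i = 1$ refers to being above the zero floor, and let $\pi_i$ be its probability:

$$\operatorname{probit}(\pi_i) = \kappa_0 + \kappa_1 M_i + \kappa_2 X_i + \kappa_3 M_i X_i + \kappa_4 C_i, \qquad (1)$$

$$\log Y_i \mid (U_i = 1) = \beta_0 + \beta_1 M_i + \beta_2 X_i + \beta_3 M_i X_i + \beta_4 C_i + \varepsilon_{yi}, \qquad (2)$$

$$M_i = \gamma_0 + \gamma_1 X_i + \gamma_2 C_i + \varepsilon_{mi}. \qquad (3)$$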

The binary part need not be rare, unlike in VanderWeele & Vansteelandt (2010, in AJE), whose approximate odds-ratio effects require a rare binary outcome
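Under the two-part model (1) to (3), a counterfactual mean on the original Y scale multiplies the probability of being above the floor by the mean of the positive part and averages over the mediator distribution. A minimal sketch, assuming a normal mediator residual with standard deviation `sigma_m`, a normal log-scale residual with standard deviation `sigma_y` (so that $E[Y_i \mid U_i = 1] = \exp(\mu_i + \sigma_y^2/2)$), and no unmeasured confounding; the integral over M uses probabilists' Gauss-Hermite quadrature. This is an illustration, not Mplus's implementation, and all function and parameter names are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def std_normal_cdf(z):
    """Phi(z) applied elementwise, using math.erf."""
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in np.ravel(z)])


def counterfactual_mean(x, x_m, c, kappa, beta, gamma, sigma_y, sigma_m, n_nodes=40):
    """E[Y(x, M(x_m))] under the two-part model (1)-(3).

    kappa = (k0..k4): probit part; beta = (b0..b4): log-normal positive part;
    gamma = (g0, g1, g2): mediator model. Normal residuals are assumed.
    """
    k0, k1, k2, k3, k4 = kappa
    b0, b1, b2, b3, b4 = beta
    g0, g1, g2 = gamma
    # Nodes/weights for the weight function exp(-z^2/2); the weights sum to sqrt(2*pi)
    z, w = hermegauss(n_nodes)
    m = g0 + g1 * x_m + g2 * c + sigma_m * z                              # M under X = x_m
    p_pos = std_normal_cdf(k0 + k1 * m + k2 * x + k3 * m * x + k4 * c)    # P(U = 1 | m)
    mean_pos = np.exp(b0 + b1 * m + b2 * x + b3 * m * x + b4 * c + 0.5 * sigma_y ** 2)
    return float(np.sum(w * p_pos * mean_pos) / math.sqrt(2.0 * math.pi))


def two_part_effects(x, x_star, c, **params):
    """PNDE, TNIE, and total effect on the original (untransformed) Y scale."""
    y_xx = counterfactual_mean(x, x, c, **params)
    y_x_star = counterfactual_mean(x, x_star, c, **params)
    y_star_star = counterfactual_mean(x_star, x_star, c, **params)
    return {"PNDE": y_x_star - y_star_star,
            "TNIE": y_xx - y_x_star,
            "TE": y_xx - y_star_star}


# Hypothetical parameter values
params = dict(kappa=(0.2, 0.4, 0.3, 0.0, 0.1), beta=(1.0, 0.3, 0.2, 0.0, 0.1),
              gamma=(0.0, 0.5, 0.2), sigma_y=0.6, sigma_m=1.0)
print(two_part_effects(1, 0, 0.0, **params))
```

Because both parts of the model depend on M, an indirect effect can arise through the probability of being above the floor, through the level of the positive part, or both.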


Latent Mediator: Factor Measured by Multiple Indicators

[Figure: SEM path diagram with variables x, z, and xz, the factor fm measured by harms, breaks, takes, and fights, and the outcome juv]

SEM model diagram convention: boxes are observed variables, circles are latent variables, and arrows are regressions, residuals, and covariances.

Muthén, Muthén & Asparouhov (2016).

Randomized intervention trial in Baltimore public schools: X is a Grade 1 intervention aimed at reducing aggressive-disruptive behavior in the classroom

Mediator FM: measurement error in the mediator is avoided by using a factor measured in Grade 5 by teacher-rated items: Harms others, Breaks things, Takes property, Fights

Outcome Y: Juvenile court record (binary variable; not a rare event)

Sensitivity to model specifications? Continuous or ordinal factor indicators, normality assumption, probit or logit, conditional independence, X-M interaction, M-Y confounder?
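With a probit model for the binary outcome and a normally distributed mediator such as the factor FM, the counterfactual probabilities have a closed form, because a normal mediator term added to the normal probit residual is still normal. A sketch, assuming $Y^* = \beta_0 + \beta_1 M + \beta_2 X + \beta_3 M X + \varepsilon$ with $\varepsilon \sim N(0, 1)$, $M = \gamma_0 + \gamma_1 X + \zeta$ with $\zeta \sim N(0, \sigma^2)$, no covariates, and no unmeasured confounding. Then $P(Y(x, M(x^*)) = 1) = \Phi\big((\beta_0 + \beta_2 x + (\beta_1 + \beta_3 x)(\gamma_0 + \gamma_1 x^*)) / \sqrt{(\beta_1 + \beta_3 x)^2 \sigma^2 + 1}\big)$. The function names are hypothetical.

```python
import math


def phi_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_y(x, x_m, beta, gamma, sigma2):
    """P(Y(x, M(x_m)) = 1) for a probit outcome and a normal mediator.

    beta = (b0, b1, b2, b3) for Y* = b0 + b1*M + b2*X + b3*M*X + eps, Var(eps) = 1
    gamma = (g0, g1) for M = g0 + g1*X + zeta, Var(zeta) = sigma2
    """
    b0, b1, b2, b3 = beta
    g0, g1 = gamma
    slope_m = b1 + b3 * x                         # effect of M on Y* when X = x
    mean = b0 + b2 * x + slope_m * (g0 + g1 * x_m)
    sd = math.sqrt(slope_m ** 2 * sigma2 + 1.0)   # slope_m*zeta + eps is normal
    return phi_cdf(mean / sd)


def probit_mediation(beta, gamma, sigma2, x=1, x_star=0):
    """Effects on the probability scale (risk differences)."""
    p11 = prob_y(x, x, beta, gamma, sigma2)
    p10 = prob_y(x, x_star, beta, gamma, sigma2)
    p00 = prob_y(x_star, x_star, beta, gamma, sigma2)
    return {"PNDE": p10 - p00, "TNIE": p11 - p10, "TE": p11 - p00}


# Hypothetical parameter values
print(probit_mediation(beta=(-0.5, 0.4, -0.1, 0.0), gamma=(0.0, -0.3), sigma2=1.0))
```

Switching to logit, or to non-normal factor scores, removes this closed form, which is one reason the sensitivity questions above matter.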


Latent Moderator: Randomized Trials with Non-Compliance and Complier-Average Causal Effects (CACE)

[Figure: mixture model diagram with the latent class c, its indicator u, and variables z, x, and y]

Mplus: 2-class mixture analysis of compliers and non-compliers using ML

Latent class membership is known in the treatment group (binary U observed; perfect indicator) and unknown for controls (U missing)

Treatment (assignment) effect is zero in the non-compliance class (exclusion restriction)

CACE compares treatment and control group outcomes within the complier class

- Angrist, Imbens & Rubin (1996). JASA: IV
- Balke & Pearl (1997). JASA: Bounds
- Little & Yau (1998). Psychological Methods: ML mixtures
- Jo (2002). Journal of Educational and Behavioral Statistics: ML mixtures
- Sobel & Muthén (2012). Biometrics: ML mixtures
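A minimal sketch of the ML mixture idea for one-sided non-compliance, fitted by EM with a normal outcome and no covariates. Assumptions beyond those listed above: the complier proportion is the same in both arms (randomization), both classes share one outcome variance, and, by the exclusion restriction, non-compliers have the same mean in both arms. Function and argument names are hypothetical; Mplus's own ML mixture estimator also handles covariates and other outcome types.

```python
import numpy as np


def normal_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def cace_em(y, z, u, max_iter=1000, tol=1e-10):
    """ML estimate of the complier-average causal effect via EM.

    y: outcomes; z: 1 = assigned to treatment, 0 = control;
    u: 1 = observed complier (only used in the treatment arm).
    Returns (cace, complier_proportion).
    """
    y, z, u = np.asarray(y, float), np.asarray(z, int), np.asarray(u, int)
    y_c1 = y[(z == 1) & (u == 1)]   # treated compliers: class known
    y_n1 = y[(z == 1) & (u == 0)]   # treated non-compliers: class known
    y_0 = y[z == 0]                 # controls: class unknown (U missing)
    n = len(y)

    mu_c1 = y_c1.mean()             # closed form; does not change during EM
    pi = len(y_c1) / (len(y_c1) + len(y_n1))
    mu_n, mu_c0, var = y_n1.mean(), y_0.mean(), y.var()

    for _ in range(max_iter):
        # E-step: posterior probability that each control is a complier
        f_c = pi * normal_pdf(y_0, mu_c0, var)
        f_n = (1.0 - pi) * normal_pdf(y_0, mu_n, var)
        tau = f_c / (f_c + f_n)

        # M-step: class proportion, class means, and a common variance
        new_pi = (len(y_c1) + tau.sum()) / n
        new_mu_c0 = np.sum(tau * y_0) / tau.sum()
        new_mu_n = (y_n1.sum() + np.sum((1 - tau) * y_0)) / (len(y_n1) + np.sum(1 - tau))
        ss = (np.sum((y_c1 - mu_c1) ** 2) + np.sum((y_n1 - new_mu_n) ** 2)
              + np.sum(tau * (y_0 - new_mu_c0) ** 2)
              + np.sum((1 - tau) * (y_0 - new_mu_n) ** 2))
        change = abs(new_pi - pi) + abs(new_mu_c0 - mu_c0) + abs(new_mu_n - mu_n)
        pi, mu_c0, mu_n, var = new_pi, new_mu_c0, new_mu_n, ss / n
        if change < tol:
            break

    return mu_c1 - mu_c0, pi


# Simulated trial (hypothetical values): 60% compliers, true CACE = 2.0
rng = np.random.default_rng(1)
n = 4000
z = rng.integers(0, 2, n)
complier = rng.random(n) < 0.6
y = np.where(complier, 2.0 * z, 1.0) + rng.normal(0.0, 1.0, n)
u = (complier & (z == 1)).astype(int)   # compliance observed only when treated
cace, pi_hat = cace_em(y, z, u)
print(round(cace, 2), round(pi_hat, 2))
```

The mixture uses the outcome distribution among controls, whereas the IV (Wald) estimator divides the intention-to-treat difference by the observed compliance rate; under these assumptions both target the same CACE.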


Multilevel Mediation Analysis

Causal inference oriented papers:

- Hong & Raudenbush (2006). Evaluating Kindergarten retention policy: A case study of causal inference for multilevel observational data. JASA
  - Propensity score approach
- VanderWeele (2010). Direct and indirect effects for neighborhood-based clustered and longitudinal data. Sociological Methods & Research
  - Counterfactually-defined effects

SEM oriented papers:

- Preacher, Zyphur & Zhang (2010-2016): papers on SEM multilevel mediation in Psychological Methods and Structural Equation Modeling
- Asparouhov & Muthén (2019). Latent variable centering of predictors and mediators in multilevel and time-series models. Structural Equation Modeling
- Asparouhov & Muthén (2020). Bayesian estimation of single and multilevel models with latent variable interactions. Structural Equation Modeling

Multilevel Modeling with Latent Variables

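Multilevel regression ($i$ indexes individuals, $j$ clusters):

Observed variable centering:

$$Y_{ij} = \alpha_j + \beta_w (X_{ij} - \bar{X}_{\cdot j}) + \varepsilon_{ij},$$

$$\alpha_j = \beta_0 + \beta_b \bar{X}_{\cdot j} + \delta_j$$

Contextual effect: $\beta_b - \beta_w$ (Raudenbush & Bryk, 2002, p. 140)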

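Latent variable centering: replace the observed cluster mean $\bar{X}_{\cdot j}$ by $X_{bj}$ from the latent variable decomposition $X_{ij} = X_{wij} + X_{bj}$, $Y_{ij} = Y_{wij} + Y_{bj}$, to avoid bias. The observed cluster mean contains sampling error, which biases $\beta_b$, and hence the contextual effect, when clusters are small: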

Within level: $Y_{wij} = \beta_w X_{wij} + \varepsilon_{ij}$,

Between level: $Y_{bj} = \beta_0 + \beta_b X_{bj} + \delta_j$

Mediation analysis example: latent variable centering in a 2-1-1 model where $T_j$ is a treatment variable on the between level and $M_{ij} = M_{wij} + M_{bj}$:

Within level: $Y_{wij} = \beta_1 M_{wij} + \beta_2 M_{wij} T_j + \varepsilon_{ij}$

Between level: $Y_{bj} = \beta_{0y} + \beta_3 M_{bj} + \beta_4 M_{bj} T_j + \beta_5 T_j + \delta_{yj}$

$M_{bj} = \beta_{0m} + \beta_6 T_j + \delta_{mj}$

This model is discussed in VanderWeele (2010), but without centering: there is no distinction between $M_w$ and $M_b$, and hence no contextual effect
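The bias from observed-mean centering can be checked with a small simulation. A sketch under assumed values: X has normal within and between components with variances $\sigma^2$ and $\tau^2$, all clusters have size $n$, and the observed-centered model is fitted by OLS. A short derivation shows that the within slope is recovered, but the between slope converges to $\beta_w + (\beta_b - \beta_w)\lambda$ with reliability $\lambda = \tau^2 / (\tau^2 + \sigma^2/n)$, so it is pulled toward $\beta_w$ and the contextual effect is attenuated. Latent variable centering itself (ML or Bayes estimation, as in Mplus) is not implemented here, and the function names are hypothetical.

```python
import numpy as np


def simulate(n_clusters, cluster_size, beta_w, beta_b, tau2=1.0, sigma2=1.0, seed=0):
    """X_ij = X_bj + X_wij; Y_ij = beta_w*X_wij + beta_b*X_bj + noise."""
    rng = np.random.default_rng(seed)
    x_b = rng.normal(0.0, np.sqrt(tau2), n_clusters)                     # latent cluster means
    x_w = rng.normal(0.0, np.sqrt(sigma2), (n_clusters, cluster_size))   # within deviations
    x = x_b[:, None] + x_w
    y = beta_w * x_w + beta_b * x_b[:, None] + rng.normal(0.0, 1.0, x.shape)
    return x, y


def observed_centering_fit(x, y):
    """OLS of Y on (X minus observed cluster mean) and the observed cluster mean."""
    x_bar = x.mean(axis=1, keepdims=True)
    design = np.column_stack([
        np.ones(x.size),
        (x - x_bar).ravel(),
        np.broadcast_to(x_bar, x.shape).ravel(),
    ])
    coef, *_ = np.linalg.lstsq(design, y.ravel(), rcond=None)
    return coef[1], coef[2]   # (beta_w estimate, beta_b estimate)


beta_w, beta_b, n = 0.2, 1.0, 5
x, y = simulate(20000, n, beta_w, beta_b)
bw_hat, bb_hat = observed_centering_fit(x, y)
reliability = 1.0 / (1.0 + 1.0 / n)        # tau2 / (tau2 + sigma2 / n) with tau2 = sigma2 = 1
print(round(bw_hat, 3), round(bb_hat, 3), round(beta_w + (beta_b - beta_w) * reliability, 3))
```

With these values the between slope lands near 0.87 instead of 1.0; the gap shrinks as cluster size grows, which is why the bias matters most for small clusters.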


Multilevel Modeling with Latent Variables: Returning to the Baltimore Randomized Intervention Trial

Multilevel: students are nested within classrooms

The intervention is classroom-based, that is, a between-level variable

Outcome Y: a juvenile court record (binary variable)

The binary-Y case is not covered in the multilevel counterfactual literature

Mediation analysis: latent variable centering in a 2-1-1 model where $T_j$ is a treatment variable on the classroom level and $Y$ is a binary variable:

Student level: $Y^*_{wij} = \beta_1 M_{wij} + \beta_2 M_{wij} T_j + \varepsilon_{ij}$ (probit regression)

Classroom level: $Y_{bj} = \beta_{0y} + \beta_3 M_{bj} + \beta_4 M_{bj} T_j + \beta_5 T_j + \delta_{yj}$

$M_{bj} = \beta_{0m} + \beta_6 T_j + \delta_{mj}$

Counterfactual effect formulas use probit expressions similar to the single-level case, but with a different variance


Multilevel Time Series Analysis of Intensive Longitudinal Data

Frequent observations, large T: Daily diary data, ecological momentary assessments, experience sampling methods, wearables

Within level = time, between level = individual. Variation in within-level parameters across individuals can be characterized by many random effects (continuous latent variables): Mean (level), variance, autocorrelation, slopes, amplitude

- Asparouhov, Hamaker & Muthén (2018). Dynamic structural equation models. Structural Equation Modeling: A Multidisciplinary Journal (introduced in Mplus 2017)
- Hamaker et al. (2018). At the frontiers of modeling intensive longitudinal data... Multivariate Behavioral Research
- Schultzberg & Muthén (2018). Number of subjects and time points needed for multilevel time series analysis: A simulation study of dynamic structural equation modeling. Structural Equation Modeling
- McNeish & Hamaker (2020). A primer on two-level dynamic structural equation models for intensive longitudinal data in Mplus. Psych Methods
- More references and short course videos at: http://www.statmodel.com/TimeSeries

Opportunities for causal inference


A Multilevel Time Series Application: VAR Modeling

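[Figure: VAR(1) path diagram. Decomposition: $Y_t$ and $Z_t$ split into between components $Y_b$, $Z_b$ and within components $Y^w_t$, $Z^w_t$. Within: $Y^w_t$ and $Z^w_t$ regressed on $Y^w_{t-1}$ and $Z^w_{t-1}$ with coefficients $\phi_{YY}$, $\phi_{YZ}$, $\phi_{ZY}$, $\phi_{ZZ}$. Between: $Y_b$, $Z_b$, and the random coefficients $\phi_{YY}$, $\phi_{YZ}$, $\phi_{ZY}$, $\phi_{ZZ}$]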

Vector autoregressive modeling (lag 1)

Dynamic structural equation modeling (DSEM; Asparouhov et al., 2018) in Mplus uses a latent variable decomposition, which avoids the dynamic panel bias known as Nickell’s bias (Nickell, 1981)

“Decomposing into within and between ensures that
- at the within level we no longer have to worry about time invariant between person confounding
- at the between level we do not have to worry about confounding due to temporal within person fluctuations” (Hamaker et al., 2018, 2021)
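Nickell’s bias can be reproduced with a small simulation. The sketch assumes a single-variable AR(1) process with person-specific means and estimates the autoregression after centering on each person’s observed mean (the classic within, or fixed-effects, estimator). With few time points the estimate falls well below the true $\phi$; the bias shrinks roughly in proportion to $1/T$. DSEM’s latent decomposition is not implemented here, and names and values are hypothetical.

```python
import numpy as np


def simulate_ar1_panel(n_persons, n_times, phi, seed=0, burn_in=50):
    """Y_it = mu_i + phi * (Y_i,t-1 - mu_i) + e_it with person-specific means mu_i."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, 1.0, n_persons)            # between-level means
    y = np.empty((n_persons, n_times + burn_in))
    y[:, 0] = mu
    for t in range(1, n_times + burn_in):
        y[:, t] = mu + phi * (y[:, t - 1] - mu) + rng.normal(0.0, 1.0, n_persons)
    return y[:, burn_in:]                            # drop burn-in so series are stationary


def within_ar1_estimate(y):
    """Pooled autoregression after centering on each person's observed mean."""
    y_now, y_lag = y[:, 1:], y[:, :-1]
    y_now_c = y_now - y_now.mean(axis=1, keepdims=True)
    y_lag_c = y_lag - y_lag.mean(axis=1, keepdims=True)
    return np.sum(y_now_c * y_lag_c) / np.sum(y_lag_c ** 2)


phi = 0.5
for n_times in (10, 200):
    est = within_ar1_estimate(simulate_ar1_panel(2000, n_times, phi))
    print(n_times, round(est, 3))
```

With 10 time points the estimate is roughly 0.3 rather than 0.5, even with 2000 persons; more persons do not fix it, only more time points do.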


VAR Modeling Continued


Time series version of the RI-CLPM (random-intercept cross-lagged panel model; Hamaker et al., 2015 in Psych Methods), but with more random effects; see also Usami et al. (2019) in Psych Methods

Applications: dyadic interactions among couples (Bolger & Laurenceau, 2013; book on ILD), stress and alcohol consumption (Lieu & West, 2015 in JP), religion and mental health (VanderWeele et al., 2016 in Soc Psychiatry Epi), positive and negative affect (Hamaker et al., 2018 in MBR; N≈200, T≈100)

A causal interpretation is tempting given the patterns seen in long time series with closely spaced measurements, but there are threats to causal inference (VanderWeele et al., 2016); approaches include marginal structural models (Robins, 1999) and inverse-probability weighting
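Inverse-probability weighting is the estimation device behind marginal structural models. The sketch shows only the simplest point-treatment case, assuming a single binary confounder L whose stratum-specific treatment shares serve as propensity scores; with time-varying treatments, as in intensive longitudinal data, such weights would be multiplied over occasions. All data-generating values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
L = rng.integers(0, 2, n)                                     # binary confounder
A = (rng.random(n) < np.where(L == 1, 0.8, 0.2)).astype(int)  # treatment depends on L
Y = 1.0 * A + 2.0 * L + rng.normal(0.0, 1.0, n)               # true causal effect of A is 1.0

# Propensity score P(A = 1 | L): the treated share within each stratum of L
p1 = A[L == 1].mean()
p0 = A[L == 0].mean()
e = np.where(L == 1, p1, p0)

# Inverse-probability weights; normalized (Hajek) weighted means in each arm
w = np.where(A == 1, 1.0 / e, 1.0 / (1.0 - e))
ipw = (np.sum(w * A * Y) / np.sum(w * A)
       - np.sum(w * (1 - A) * Y) / np.sum(w * (1 - A)))
naive = Y[A == 1].mean() - Y[A == 0].mean()
print(round(naive, 2), round(ipw, 2))
```

The naive contrast is inflated to about 2.2 because treated units are more often in L = 1, while the weighted contrast recovers the effect of about 1.0 by reweighting each arm to the full-sample distribution of L.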


Effect Estimation in Multilevel Time Series Analysis: Electricity Tariffs, Propensity Score Matching

[Figure: two time-series panels, "B2a_Y1, mean" and "B2a_Y2, mean (autocorr = 0.821 (0.030))"; x-axis: time points 1 to 1051; y-axis: -1.2 to 1.4]

Propensity scores estimated from pre-intervention random effects to evaluate post-intervention outcomes (Schultzberg, 2019).

[Figure: two-level diagram with Within (time): $y_t$ and $y_{t-1}$; Between (firms): tx]
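The matching step can be sketched as follows. The sketch assumes that each firm’s pre-intervention random effects (here a mean level and an autocorrelation) have already been estimated, for example by DSEM, and treats them as known covariates; estimation uncertainty in the random effects is ignored, which is a simplification. It fits a logistic propensity model by Newton-Raphson and does 1:1 nearest-neighbor matching with replacement to estimate the average effect on the treated. All data and names are hypothetical.

```python
import numpy as np


def fit_logistic(X, t, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (t - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        b += np.linalg.solve(hess, grad)
    return b


def att_nearest_neighbor(ps, t, y):
    """1:1 nearest-neighbor matching on the propensity score, with replacement."""
    treated = np.flatnonzero(t == 1)
    controls = np.flatnonzero(t == 0)
    gaps = np.abs(ps[treated][:, None] - ps[controls][None, :])
    matches = controls[np.argmin(gaps, axis=1)]
    return np.mean(y[treated] - y[matches])


# Hypothetical firms: pre-intervention random effects (mean level, autocorrelation)
rng = np.random.default_rng(3)
n = 2000
pre_mean = rng.normal(0.0, 1.0, n)
pre_phi = rng.uniform(0.3, 0.9, n)
logit = -0.5 + 1.0 * pre_mean + 2.0 * (pre_phi - 0.6)
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
y_post = 0.8 * pre_mean + 1.0 * pre_phi + 1.0 * t + rng.normal(0.0, 0.5, n)  # true effect 1.0

X = np.column_stack([np.ones(n), pre_mean, pre_phi])
ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))
att = att_nearest_neighbor(ps, t, y_post)
naive = y_post[t == 1].mean() - y_post[t == 0].mean()
print(round(naive, 2), round(att, 2))
```

Matching on the propensity score rather than on both random effects directly keeps the matching one-dimensional, at the cost of relying on a correctly specified propensity model.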


Effect Estimation in Multilevel Time Series Analysis: Randomized Studies

Experience sampling method (ESM): T = 60 pre-intervention and 60 post-intervention observations (each period has 10 beeps/day via a digital wristwatch for 6 days), N = 119

Geschwind et al. (2011). Mindfulness training increases momentary positive emotions and reward experience in adults vulnerable to depression: A randomized controlled trial. Journal of Consulting and Clinical Psychology

Muthén et al. (2020). In preparation: DSEM with daily cycles in positive affect modeled by a sine-cosine curve with random effects
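A sine-cosine curve describes a daily cycle by a mean level, an amplitude, and a peak time, which can become person-specific random effects. As a simplified single-person sketch (not DSEM, and ignoring autocorrelation), the curve $a + b \sin(2\pi h/24) + c \cos(2\pi h/24)$ in hour of day $h$ is linear in $a$, $b$, $c$, so it can be fitted by least squares; the amplitude is $\sqrt{b^2 + c^2}$. The 24-hour period, the beep times, and all names are assumptions.

```python
import numpy as np


def fit_daily_cycle(hours, y, period=24.0):
    """Least-squares fit of y = a + b*sin(2*pi*h/period) + c*cos(2*pi*h/period)."""
    angle = 2.0 * np.pi * np.asarray(hours, float) / period
    design = np.column_stack([np.ones_like(angle), np.sin(angle), np.cos(angle)])
    (a, b, c), *_ = np.linalg.lstsq(design, np.asarray(y, float), rcond=None)
    amplitude = np.hypot(b, c)
    # b*sin + c*cos = amplitude * cos(angle - theta0), with theta0 = atan2(b, c)
    peak_hour = (np.arctan2(b, c) * period / (2.0 * np.pi)) % period
    return a, amplitude, peak_hour


# Hypothetical person: 10 beeps/day over 6 days, positive affect peaking at 15:00
rng = np.random.default_rng(4)
hours = np.concatenate([np.sort(rng.uniform(8, 22, 10)) for _ in range(6)])
true_pa = 4.0 + 0.8 * np.cos(2 * np.pi * (hours - 15.0) / 24.0)
level, amp, peak = fit_daily_cycle(hours, true_pa + rng.normal(0.0, 0.3, hours.size))
print(round(level, 2), round(amp, 2), round(peak, 1))
```

Writing the cycle as a sine plus a cosine, rather than as amplitude times a shifted cosine, is what makes the model linear in its parameters.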


Effect Estimation in Multilevel Time Series Analysis: Randomized Studies Cont’d

Intervention interacting with pre-intervention random effects in influencing post-intervention random effects (for whom is the intervention effective?)

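[Figure: Within level with $y_{t-1}$ and $y_t$ in the pre-intervention period and $y_{t+1}$ and $y_{t+2}$ in the post-intervention period; Between level with tx]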

Random effects as treatment effect moderators and outcomes
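As an assumed illustration (not the authors’ specification), let $\eta^{pre}_i$ and $\eta^{post}_i$ be a person’s pre- and post-intervention random effects (for example, mean level or autocorrelation) and $tx_i$ the randomized treatment. A between-level regression with a treatment-by-random-effect interaction gives a treatment effect that varies with $\eta^{pre}_i$:

$$\eta^{post}_i = \alpha + \beta_1\, tx_i + \beta_2\, \eta^{pre}_i + \beta_3\, tx_i\, \eta^{pre}_i + \zeta_i,$$

$$E[\eta^{post}_i \mid tx_i = 1, \eta^{pre}_i] - E[\eta^{post}_i \mid tx_i = 0, \eta^{pre}_i] = \beta_1 + \beta_3\, \eta^{pre}_i.$$

Here $\beta_3$ addresses the question of for whom the intervention is effective; because $\eta^{pre}_i$ is latent, the product term is a latent variable interaction.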


Effect Estimation in Multilevel Time Series Analysis: Random Effects as Mediators

Smoking cessation study (Shiffman), N=230, T ≈ 150

Smoking urge measured over time; random prompts 5 times/day for a month; binary final outcome: Quit

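[Figure: between-level diagram with covariates female and age, random effects urge, logv, phi, and s, and the outcome quit; paths annotated with + and - signs]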

Higher smoking urge level gives lower quit probability

Higher smoking urge autocorrelation gives higher quit probability

Higher smoking urge residual variance gives lower quit probability

Higher smoking urge trend slope gives lower quit probability


Conclusions

The talk has focused on modeling and estimation, not effect identification, DAGs, or evaluation of causal assumptions

Opportunities for more flexible modeling such as using splines

Opportunities for more causal inference research

Bringing it all together: will the resulting technology be easy enough for substantive researchers with limited methodological training to understand and apply, or will there be a need to rely on statistical experts/consultants?

How do researchers respond to and master new methods and software? Mplus support questions have provided 22 years of experience
