
‘Structural equation modeling can best be described as a class of methodologies that seeks to represent hypotheses about summary statistics derived from empirical measurements in terms of a smaller number of “structural” parameters defined by a hypothesized underlying model. ... Structural equation modeling represents a melding of factor analysis and path analysis into one comprehensive statistical methodology.’

(Kaplan, 2009; pp. 1, 3)

Why Structural Equation Modeling?

• Measurement moves from exploratory to confirmatory factor analysis

• Testing of complex “path” models

• Simultaneous consideration of both measurement and prediction

(Kelloway, 1998; pp. 2-3)

Structural Equation Model[l]ing (SEM)

Approach to SEM (Kaplan, 2009; p. 9)

[Flow diagram: Theory → Model Specification → Sample and Measures → Estimation → Assessment of fit → Modification → Discussion]

A Typical SEM ‘Problem’

[Path diagram. Food Quality with indicators Appearance, Taste and Portion Size; Service Quality with indicators Friendly Employees, Competent Employees and Courteous Employees; outcome: Satisfaction.]

Legend: directly observable variable; latent variable (also called unobservable, construct, concept or factor). Path types shown: correlation, regression weights, loadings.

Latent and Observable Variables

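[Diagrams: a reflective latent variable (RLV) and a formative latent variable (FLV), each linked to indicators X1 to X4 by paths labelled 1 to 4.]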

Reflective latent variables (RLVs): The indicators are considered to be influenced, affected or caused by the underlying LV: a change in the LV will be reflected in a change in all indicators. There is a correspondence between the LV and its indicators (i.e., the indicators are seen as empirical surrogates for the LV).

The underlying assumption is that the LV theoretically exists, rather than being constructed, and it is manifested through its indicators. High correlation is expected amongst the indicators.

For an RLV, the path coefficients (labelled 1 to 4 in the diagram) are correlations.

Formative latent variables (FLVs): The indicators are viewed as causing rather than being caused by the underlying LV: a change in the LV is not necessarily accompanied by a change in all its indicators; rather, if any one of the indicators changes, then the latent variable would also change.

FLVs represent emergent constructs that are formed from a set of indicators that may or may not have a theoretical rationale. Inter-dependence amongst the indicators is not desired. Low correlation is expected amongst the indicators.

For an FLV, the path coefficients (labelled 1 to 4 in the diagram) are regression weights.
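The contrast between the two measurement logics can be illustrated with a small simulation. This is only a sketch under stated assumptions: the sample size, the loading of 0.8 and the formative weights are hypothetical values chosen for illustration, and the helper names (`pearson`, `mean_inter_indicator_corr`) are not from the slides.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def mean_inter_indicator_corr(indicators):
    """Average pairwise correlation across a list of indicator columns."""
    pairs = list(combinations(indicators, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 2000                 # hypothetical sample size
loading = 0.8            # hypothetical standardised loading

# Reflective: the LV exists first; each indicator is loading * LV + noise.
lv = [rng.gauss(0, 1) for _ in range(n)]
noise_sd = sqrt(1 - loading ** 2)
reflective = [[loading * v + rng.gauss(0, noise_sd) for v in lv] for _ in range(4)]

# Formative: independent indicators exist first; the LV is their weighted sum.
formative = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(4)]
weights = [0.4, 0.3, 0.2, 0.1]  # hypothetical regression weights
flv = [sum(w * col[i] for w, col in zip(weights, formative)) for i in range(n)]

r_reflective = mean_inter_indicator_corr(reflective)
r_formative = mean_inter_indicator_corr(formative)
print(f"mean inter-indicator r, reflective: {r_reflective:.2f}")
print(f"mean inter-indicator r, formative:  {r_formative:.2f}")
```

In the reflective case the indicators share the LV as a common cause and so correlate strongly with one another; in the formative case nothing forces them to correlate at all.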


Property | RLVs | FLVs

Validity (the attribute exists and variations in the attribute produce variation in the measurement; measures what it is supposed to measure)
Content validity | √ | √
Face validity | √ | √
Convergent validity | √ | x
Discriminant validity | √ | x
Criterion, concurrent and predictive validity | √ | √

Reliability (degree to which a measure is free from random error; consistency)
Test-retest reliability, alternative form and scorer | √ | √
Internal consistency (split half and Cronbach’s alpha) | √ | x
Composite reliability | √ | x
Confirmatory FA | √ | x

Multicollinearity | Desirable | Not desirable

Exogenous (independent) and Endogenous (dependent) Variables

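[Diagram: exogenous latent variable 1 with indicators x1, x2, x3; exogenous latent variable 2 with indicators x4, x5, x6; endogenous latent variable 1 with indicators y1, y2, y3; endogenous latent variable 2 with indicators y4, y5, y6.]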

Structural or Inner model: “Theoretically” grounded relationships between LVs

Measurement or Outer model: Relationship between the observed variables and their LVs

[Diagram labels: error1 and error2, each with a path fixed to 1.]

Mediation and Moderation

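[Three path diagrams relating Education, Income and Satisfaction. Two of the diagrams carry two error terms (error 1 and error 2) and one carries a single error term (error 1); each error path is fixed to 1. Panel labels as extracted: “Partial Moderation”, “Mediation”, “Full Moderation”.]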

Covariance (CB-SEM) and Partial Least Squares (PLS-SEM)

“ML [CBSEM] is theory-oriented and emphasises the transition from exploratory to confirmatory analysis. PLS is primarily intended for causal-predictive analysis in situations of high complexity but low theoretical information”

(Jöreskog & Wold, 1982; p. 270)

“Such covariance-based SEM (CBSEM) focuses on estimating a set of model parameters so that the theoretical covariance matrix implied by the system of structural equations is as close as possible to the empirical covariance matrix observed within the estimation sample. ... Unlike CBSEM, PLS analysis does not work with latent variables, and estimates model parameters to maximize the variance explained for all endogenous constructs in the model through a series of ordinary least squares (OLS) regressions.”

(Reinartz, Haenlein & Henseler, 2009; p. 332)
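For orientation, “as close as possible” is usually made precise through a discrepancy function. The standard maximum likelihood version is shown below; it is the general textbook form and is not taken from the slides. The symbols are assumptions of this sketch: S is the empirical covariance matrix, Σ(θ) the covariance matrix implied by the model parameters θ, p the number of observed variables and N the sample size.

```latex
F_{ML}(\theta) = \ln\lvert\Sigma(\theta)\rvert
               + \operatorname{tr}\!\left(S\,\Sigma(\theta)^{-1}\right)
               - \ln\lvert S\rvert - p ,
\qquad
\chi^2 = (N-1)\,\min_{\theta} F_{ML}(\theta)
```

F_ML is zero when the implied and empirical matrices coincide, and the scaled minimum is the χ² statistic used to assess overall fit.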

Criterion | CBSEM (LISREL, AMOS, EQS, Mplus) | PLS (SmartPLS, PLS Graph, XLSTAT)
Theory | Strong | ‘Flexible’
Distribution assumptions | Multivariate normality | Non-parametric
Sample size | Large (at least 200) | Small (30-100)
Analytical focus | Confirming theoretically assumed relationships | Prediction and/or identification of relationships between constructs
Number of indicators per construct | Depending on aggregation (or parceling); ideally 4+ | One or more (see consistency at large)
Indicators to construct | Mainly reflective (can use MIMIC for formative) | Both reflective and formative
Improper solutions/factor indeterminacy (unique solution) | Depends on model | Always identified
Type of measurement | Interval or ratio (otherwise need PRELIS) | Categorical to ratio
Complexity of model | Large models (>100 indicators) problematic | Can deal with large models
Parameter estimates | Consistent if no estimation problems and confirmation of assumptions | Consistency at large (when indicators of each construct and sample size reach infinity)
Correlations between constructs | Can be modeled | Cannot be modeled
Correlations between errors | Can be modeled | Cannot be modeled
Assessment of measurement model | Available | Available
Estimation | Structural model independent of measurement model | Structural and measurement model estimated simultaneously
Goodness-of-fit measures | Available both for the overall model and for individual constructs | Limited for overall model but available for individual constructs
Statistical testing of estimates | Benchmarks from normal distribution | Inference requires re-sampling (jackknife or bootstrap), thus the term ‘soft’ modeling
LV scores | Not directly estimated | Part of the analytical approach
Higher order constructs | Possible to test | Possible to test
Response based segmentation | Available | Available

HBAT DATA: CUSTOMER SURVEY

HBAT is a manufacturer of paper products. Data from 100 randomly selected customers were collected on the following variables.

Classification Variables/Emporographics

X1 - Customer Type: Length of time a particular customer has been buying from HBAT (1 = Less than 1 year; 2 = Between 1 and 5 years; 3 = Longer than 5 years)
X2 - Industry Type: Type of industry that purchases HBAT’s paper products (0 = Magazine industry; 1 = Newsprint industry)
X3 - Firm Size: Employee size (0 = Small firm, fewer than 500 employees; 1 = Large firm, 500 or more employees)
X4 - Region: Customer location (0 = USA/North America; 1 = Outside North America)
X5 - Distribution System: How paper products are sold to customers (0 = Sold indirectly through a broker; 1 = Sold directly)

Perceptions of HBAT

Each respondent’s perceptions of HBAT on a set of business functions were measured on a graphic rating scale, where a 10 centimetre line was drawn between the endpoints labelled “Poor” (for 0) and “Excellent” (for 10).

X6 - Product quality
X7 - E-Commerce activities/Web site
X8 - Technical support
X9 - Complaint resolution
X10 - Advertising
X11 - Product line
X12 - Salesforce image
X13 - Competitive pricing
X14 - Warranty and claims
X15 - New products
X16 - Ordering and billing
X17 - Price flexibility
X18 - Delivery speed

Outcome/Relationship Variables

For variables X19 to X21 a scale similar to the one above was employed, with appropriate anchors.

X19 - Satisfaction
X20 - Likelihood of recommendation
X21 - Likelihood of future purchase
X22 - Percentage of current purchase/usage level from HBAT
X23 - Future relationship with HBAT (0 = Would not consider; 1 = Would consider strategic alliance or partnership)

Conceptual Model

[Conceptual model diagram. Indicators: Delivery speed, Complaint resolution, Order & billing, Advertising, Product quality, E-commerce, Sales image, Competitive price, Price flexibility, Product line. Latent constructs: Customer interface, Value for money, Market presence. Outcome variables: Purchase level, Satisfaction, Recommend, Re-purchase.]

CB-SEM

Method ... Maximum Likelihood (ML) ... Generalised Least Squares (GLS) ... Asymptotically Distribution Free (ADF) ...

Sample size ... Multivariate normality ... Outliers ... Influential cases ... Missing cases ...

Computer programs ... LISREL ... AMOS ... EQS ... MPlus ... Stata ... SAS ...


Confirmatory Factor Analysis

Normality and Influential Cases

The χ² goodness-of-fit statistic compares the observed and estimated covariance matrices [H0: there is no significant difference between the two matrices; HA: there is a significant difference between the two matrices]. Ideally, therefore, we want to retain the null hypothesis.

Measurement Model

Unstandardised estimates (loadings): the critical ratio (C.R.) can be used to test whether an estimate is significant [H0: the estimate is not significantly different from zero; HA: the estimate is significantly higher (or lower, depending on theory) than zero].

Standardised estimates (loadings): the size of the estimate provides an indication of convergent validity; it should be at least 0.50 and ideally above 0.70.

Reliability

The most commonly reported test is Cronbach’s alpha, but owing to a number of concerns about it, composite reliability is now often reported instead.

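Composite reliability is computed as

$$\rho_c = \frac{\left(\sum \lambda\right)^2}{\left(\sum \lambda\right)^2 + \sum \operatorname{var}(\varepsilon)}$$

where λ is the standardised estimate (loading) and var(ε) = 1 − λ².

Customer interface: (Σλ)² = (.949 + .919 + .799)² and Σvar(ε) = (1 − .949²) + (1 − .919²) + (1 − .799²), which results in a composite reliability of 0.92.

Value for money = 0.69

Market presence = 0.83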
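The customer interface calculation can be reproduced in a few lines. This is a minimal sketch assuming standardised loadings, so that each error variance is 1 − λ²; the function name `composite_reliability` is illustrative only.

```python
def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    # For standardised loadings each error variance is 1 - lambda^2.
    error_variance = sum(1 - lam ** 2 for lam in loadings)
    return squared_sum / (squared_sum + error_variance)


# Standardised loadings for customer interface, as given on the slide.
customer_interface = [0.949, 0.919, 0.799]
rho_c = composite_reliability(customer_interface)
print(f"composite reliability = {rho_c:.2f}")  # 0.92
```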

Validity

Convergent validity is tested by the Average Variance Extracted (AVE), with a benchmark of 0.50:

$$AVE = \frac{\sum \lambda^2}{\sum \lambda^2 + \sum \operatorname{var}(\varepsilon)}$$

Customer interface = 0.79; Value for money = 0.40; Market presence = 0.64

Discriminant validity: the off-diagonal bivariate correlations should be notably lower than the diagonal, which represents the square root of the AVE.

 | Cust.Int. | VFM | Market Pr.
Cust.Int. | .889 | |
VFM | .616 | .632 |
Market Pr. | .270 | -.061 | .800
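Both validity checks can be expressed as short functions. The sketch below assumes standardised loadings (so the AVE reduces to the mean squared loading) and uses the AVE values and correlations reported on the slide; comparing the square root of the AVE with the construct correlations is commonly referred to as the Fornell-Larcker criterion, a name the slides do not use. Function and variable names are illustrative.

```python
from math import sqrt


def ave(loadings):
    """AVE for standardised loadings: the mean of the squared loadings."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


def discriminant_check(ave_values, correlations):
    """For each construct: is sqrt(AVE) above every |correlation| involving it?"""
    result = {}
    for name, value in ave_values.items():
        related = [abs(r) for pair, r in correlations.items() if name in pair]
        result[name] = sqrt(value) > max(related)
    return result


# Values reported on the slide.
ave_values = {"Cust.Int.": 0.79, "VFM": 0.40, "Market Pr.": 0.64}
correlations = {
    ("Cust.Int.", "VFM"): 0.616,
    ("Cust.Int.", "Market Pr."): 0.270,
    ("VFM", "Market Pr."): -0.061,
}

convergent = {name: value >= 0.50 for name, value in ave_values.items()}
discriminant = discriminant_check(ave_values, correlations)

print("AVE from loadings:", round(ave([0.949, 0.919, 0.799]), 2))
print("convergent validity:", convergent)
print("discriminant validity:", discriminant)
```

With these figures value for money misses the 0.50 AVE benchmark, and its square-root AVE (.632) clears its correlation with customer interface (.616) only narrowly.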

Goodness of Fit

There are many indices; some of the more commonly reported are listed below (sample size and the number of variables in the model should also be considered):

• The χ² goodness-of-fit statistic (ideally the test should not be significant; however, this is rarely the case and it is therefore often overlooked)

• Absolute measures of fit: GFI > .90 and RMSEA < .08

• Incremental measures of fit (compared to a baseline model, usually the null model, which assumes all variables are uncorrelated): AGFI > .80, TLI > .90 and CFI > .90

• Parsimonious fit (relates model fit to model complexity and is conceptually similar to adjusted R²): normed χ² (χ²:df) of 3:1 or less, higher values of PGFI and PNFI, and smaller values of the Akaike information criterion (AIC)
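Two of these indices can be computed directly from χ², its degrees of freedom and the sample size. The sketch below uses the usual RMSEA point estimate with N − 1 in the denominator (some programs use N). The χ² and df pairs are the ones quoted in the model re-specification slide; combining them with N = 100 (the HBAT sample size) is an assumption made here for illustration.

```python
from math import sqrt


def normed_chi2(chi2, df):
    """Normed chi-square: the chi-square to degrees-of-freedom ratio."""
    return chi2 / df


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


n_obs = 100  # assumption: the HBAT sample size applies to both models
models = {"before modification": (15.84, 7), "after modification": (5.07, 6)}

fit = {}
for label, (chi2, df) in models.items():
    fit[label] = (normed_chi2(chi2, df), rmsea(chi2, df, n_obs))
    ratio, error = fit[label]
    print(f"{label}: normed chi2 = {ratio:.2f}, RMSEA = {error:.3f}")
```

Under these assumptions both models meet the 3:1 guideline, but only the modified model falls below the RMSEA benchmark of .08.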


Summary solution after removing x17

Structural Model

Model Re-specification/Modification

Look at residuals (benchmark: 2.5).

Look at modification indices (MI): each flags a relationship not in the model that, if added, would improve the overall model χ² value; MI > 4 indicates an improvement.

Testing improvement in goodness of fit: Δχ² = 15.84 − 5.07 = 10.77; Δdf = 7 − 6 = 1. With 1 degree of freedom the critical χ² value at the .05 level is 3.84, so the improvement is significant.
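The χ² difference test for this case can be checked without statistical tables. The sketch below relies on the fact that a χ² variable with 1 degree of freedom is a squared standard normal, so its upper-tail probability is erfc(√(x/2)); it therefore covers only Δdf = 1, and the function name is illustrative.

```python
from math import erfc, sqrt


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2.0))


delta_chi2 = 15.84 - 5.07  # chi-square difference between the two models
delta_df = 7 - 6           # difference in degrees of freedom
p_value = chi2_sf_1df(delta_chi2)

print(f"delta chi2 = {delta_chi2:.2f}, delta df = {delta_df}, p = {p_value:.4f}")
print("significant improvement" if p_value < 0.05 else "no significant improvement")
```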

Higher Order

PLS-SEM

Method ... Ordinary Least Squares (OLS) ... Sample size ... Multivariate normality ... Bootstrap ... Jackknife ... Missing cases ...

Computer programs ... SmartPLS ... PLS-GUI ... VisualPLS ... XLSTAT-PLS ... WarpPLS ... SPAD-PLS ...

Measurement and Structural Models

An indicator should load high with the hypothesised latent variable and low with the other latent variables

For each block in the model with more than one manifest variable the quality of the measurement model is assessed by means of the communality index. The communality of a latent variable is interpreted as the average variance explained by its indicators (similar to R2).

The redundancy index computed for each endogenous block, measures the portion of variability of the manifest variables connected to an endogenous latent variable explained by the latent variables directly connected to the block.
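The two indices can be sketched numerically. The formulas used here are the ones common in the PLS path modelling literature and are an assumption of this sketch, since the slides give only verbal definitions: communality as the mean squared loading of a block, and redundancy as that communality multiplied by the R² of the endogenous latent variable. The loadings and R² below are hypothetical.

```python
def communality(loadings):
    """Mean squared (standardised) loading of a block of manifest variables."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


def redundancy(loadings, r_squared):
    """Communality of an endogenous block scaled by the R^2 of its latent variable."""
    return communality(loadings) * r_squared


block_loadings = [0.85, 0.80, 0.75]  # hypothetical loadings of an endogenous block
block_r2 = 0.60                      # hypothetical R^2 of that latent variable

com = communality(block_loadings)
red = redundancy(block_loadings, block_r2)
print(f"communality = {com:.3f}, redundancy = {red:.3f}")
```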

Testing significance

Since PLS makes no assumptions (e.g., normality) about the distribution of the parameters, one of the following re-sampling approaches is employed.

Bootstrapping -

• k samples of size n are created in order to obtain k estimates for each parameter.

• Each sample is created by sampling with replacement from the original data set.

• For each of the k samples, calculate the pseudo-bootstrapping value.

• Calculate the mean of the pseudo-bootstrapping values as a proxy for the overall “population” mean.

• Treat the pseudo-bootstrapping values as independent and randomly distributed and calculate their standard deviation and standard error.

• Use the bootstrapping t-statistic with k − 1 degrees of freedom (k = number of samples) to test the null hypothesis (significance of loadings, weights and paths).

Jack-knifing -

• Calculate the parameter using the whole sample.

• Partition the sample into sub-samples according to the deletion number d.

• A process similar to bootstrapping is followed to test the null hypotheses.

Recommendation – Use bootstrapping with the ‘Individual sign changes’ option, n = number of valid observations and k = 500+ samples.
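The bootstrapping steps can be sketched as follows. Several simplifications are assumptions of this sketch rather than the procedure of any particular PLS program: a simple correlation stands in for a PLS path coefficient, the standard deviation of the k resampled estimates is used directly as the standard error, and the data are simulated. All names are illustrative.

```python
import random
from math import sqrt
from statistics import stdev


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def bootstrap_t(x, y, k=500, seed=1):
    """Return (estimate, bootstrap standard error, t) for the x-y correlation."""
    rng = random.Random(seed)
    n = len(x)
    estimate = pearson(x, y)  # parameter from the whole sample
    replicates = []
    for _ in range(k):        # k resamples, each of size n
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        replicates.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    se = stdev(replicates)    # spread of the k estimates
    return estimate, se, estimate / se


# Hypothetical data: 100 cases with a true standardised path of 0.6.
data_rng = random.Random(7)
x = [data_rng.gauss(0, 1) for _ in range(100)]
y = [0.6 * xi + data_rng.gauss(0, 0.8) for xi in x]

estimate, se, t_stat = bootstrap_t(x, y)
print(f"estimate = {estimate:.3f}, SE = {se:.3f}, t = {t_stat:.2f}")
```

The resulting t value would then be compared with the critical value of the t distribution on k − 1 = 499 degrees of freedom.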

Model Evaluation

Predictive Power - R2: The interpretation is similar to that employed under traditional multiple regression analysis, i.e. indicates the amount of variance explained by the model. Examination of the change in R2 can help to determine whether a LV has a substantial effect (significant) on a particular dependent LV.

The following expression provides an estimate of the effect size f². Using the guidelines provided by Cohen (1988), f² values of .02, .15 and .35 are interpreted as representing small, medium and large effects respectively.

$$f^2 = \frac{R^2_{included} - R^2_{excluded}}{1 - R^2_{included}}$$

Removing the customer interface → purchase pathway results in an R² for purchase of .585, so f² = (.664 − .585)/(1 − .664) ≈ 0.235, which is a medium to large effect.
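The effect size calculation is easy to script. This sketch uses the two R² values quoted for the purchase construct; the function names and the "negligible" label for values below .02 are illustrative additions, and the cut-offs are Cohen's .02, .15 and .35.

```python
def f_squared(r2_included, r2_excluded):
    """Effect size of a predictor: change in R^2 relative to unexplained variance."""
    return (r2_included - r2_excluded) / (1 - r2_included)


def cohen_label(f2):
    """Classify f^2 against the .02 / .15 / .35 cut-offs."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"


f2 = f_squared(r2_included=0.664, r2_excluded=0.585)
label = cohen_label(f2)
print(f"f2 = {f2:.3f} ({label})")
```

On a strict threshold reading the value falls in the medium band, roughly midway between the medium and large cut-offs.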

In addition to significance, we should also examine the relevance of pathways: a coefficient above .20 is the benchmark.

Predictive Relevance - Q2 [Stone, 1974; Geisser, 1975]: This relates to the predictive sample reuse technique (PSRT) that represents a synthesis of cross-validation and function fitting. In PLS this can be achieved through a blindfolding procedure that “… omits a part of the data for a particular block of indicators during initial parameter estimation and then attempts to estimate the omitted part of the data by using the estimated parameters”.

$$Q^2 = 1 - \frac{\sum E}{\sum O}$$

where ΣE = sum of squares of the prediction error, Σ(y − ŷ)², for the omitted data (ŷ being the estimated value) and ΣO = sum of squares of the observed error, Σ(y − ȳ)², for the remaining data (ȳ being the mean).

Q2 > 0 implies that the model has predictive relevance while Q2 < 0 indicates a lack of predictive relevance.
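The Q² ratio itself can be sketched for a single blindfolding round. The data below are hypothetical, and the sketch assumes the baseline prediction is the mean of the data that were not omitted; it does not reproduce the full blindfolding procedure of any PLS program.

```python
def q_squared(observed, predicted, baseline_mean):
    """Q^2 = 1 - (sum of squared prediction errors / sum of squared baseline errors)."""
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    sso = sum((y - baseline_mean) ** 2 for y in observed)
    return 1 - sse / sso


omitted = [3.0, 5.0, 7.0]      # hypothetical omitted data points
good_model = [2.5, 5.5, 6.5]   # hypothetical model predictions for them
poor_model = [7.0, 3.0, 5.0]   # hypothetical predictions from a poor model
mean_of_remaining = 4.0        # hypothetical mean of the data not omitted

q2_good = q_squared(omitted, good_model, mean_of_remaining)
q2_poor = q_squared(omitted, poor_model, mean_of_remaining)
print(f"Q2 (good model) = {q2_good:.3f}")
print(f"Q2 (poor model) = {q2_poor:.3f}")
```

A model that predicts the omitted points better than the mean gives Q² > 0; one that predicts them worse gives Q² < 0.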

Higher Order

If the numbers of indicators for your two constructs are approximately equal, the method of repeated manifest variables can be used. The higher-order factor that represents the two first-order constructs is created by using all the indicators of the two first-order constructs.

Readings

Arbuckle, J.L. and Wothke, W. (1999), Amos 4.0 User’s Guide, Chicago: SmallWaters Corporation

Byrne, B.M. (2010), Structural Equation Modeling with AMOS, 2nd ed., New York: Routledge

Hair, J.F., Anderson, R.E., Tatham, R.L. and Black, W.C. (1998), Multivariate Data Analysis, 5th ed., New Jersey: Prentice Hall

Hair, J.F., Hult, G.T.M., Ringle, C.M. and Sarstedt, M. (2014), A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM), London: Sage Publ.

Kaplan, D. (2009), Structural Equation Modeling: Foundations and Extensions, 2nd ed., London: Sage Publ.

Vinzi, V.E., Chin, W.W., Henseler, J. and Wang, H. (eds) (2010), Handbook of Partial Least Squares, London: Springer
