‘Structural equation modeling can best be described as a class of methodologies that seeks to represent hypotheses about summary statistics derived from empirical measurements in terms of a smaller number of “structural” parameters defined by a hypothesized underlying model. ... Structural equation modeling represents a melding of factor analysis and path analysis into one comprehensive statistical methodology.’
(Kaplan, 2009; p. 1 & 3)
Why Structural Equation Modeling?
• Measurement moves from exploratory to confirmatory factor analysis
• Testing of complex “path” models
• Simultaneous consideration of both measurement and prediction
(Kelloway, 1998; p. 2-3)
Structural Equation Model[l]ing (SEM)
Approach to SEM (Kaplan, 2009; p. 9)
Theory → Model Specification → Sample and Measures → Estimation → Assessment of fit → Modification → Discussion
A Typical SEM ‘Problem’
[Figure: path diagram. Indicators Appearance, Taste and Portion Size belong to Food Quality; indicators Friendly Employees, Competent Employees and Courteous Employees belong to Service Quality; the third construct is Satisfaction. Legend: one symbol = directly observable variable; the other = latent or unobservable variable (construct, concept or factor). Arrows are labelled Correlation, Regression weights and Loadings.]
Latent and Observable Variables
[Figure: a reflective latent variable (RLV) and a formative latent variable (FLV), each linked to indicators X1 to X4 by coefficients numbered 1 to 4.]
Reflective LV (RLV): The indicators are considered to be influenced, affected or caused by the underlying LV … a change in the LV will be reflected in a change in all indicators … there is a correspondence between the LV and its indicators (i.e., the indicators are seen as empirical surrogates for a LV).
The underlying assumption is that the LV theoretically exists, rather than being constructed, and it is manifested through its indicators. High correlation is expected amongst the indicators.
The coefficients (numbered 1 to 4 in the diagram) are correlations
Formative LV (FLV): The indicators are viewed as causing rather than being caused by the underlying LV … a change in the LV is not necessarily accompanied by a change in all its indicators; rather, if any one of the indicators changes, then the latent variable would also change.
FLVs represent emergent constructs that are formed from a set of indicators that may or may not have a theoretical rationale. Inter-dependence amongst the indicators is not desired. Low correlation is expected amongst the indicators.
The coefficients (numbered 1 to 4 in the diagram) are regression weights
Criterion | RLVs | FLVs
Validity (the attribute exists and variations in the attribute produce variation in the measurement; measures what it is supposed to measure):
Content validity | √ | √
Face validity | √ | √
Convergent validity | √ | x
Discriminant validity | √ | x
Criterion, concurrent and predictive validity | √ | √
Reliability (degree to which a measure is free from random error; consistency):
Test retest reliability, alternative form and scorer | √ | √
Internal consistency (split half and Cronbach’s alpha) | √ | x
Composite reliability | √ | x
Confirmatory FA | √ | x
Multicollinearity | Desirable | Not desirable
Exogenous (independent) and Endogenous (dependent) Variables
[Figure: two exogenous LVs (1 and 2) with indicators x1 to x3 and x4 to x6, and two endogenous LVs (1 and 2) with indicators y1 to y3 and y4 to y6.]
Structural or Inner model: “Theoretically” grounded relationships between LVs
Measurement or Outer model: Relationship between the observed variables and their LVs
[Figure labels: error terms error1 and error2, each shown with a 1 on its path.]
Mediation and Moderation
[Figure: three path diagrams relating Education, Income and Satisfaction, with panels titled ‘Partial Moderation’, ‘Mediation’ and ‘Full Moderation’. Error terms (error 1, error 2), each shown with a 1 on its path, appear in the diagrams.]
Covariance (CB-SEM) and Partial Least Squares (PLS-SEM)
“ML [CBSEM] is theory-oriented and emphasises the transition from exploratory to confirmatory analysis. PLS is primarily intended for causal-predictive analysis in situations of high complexity but low theoretical information”
(Jöreskog & Wold, 1982; p. 270)
“Such covariance-based SEM (CBSEM) focuses on estimating a set of model parameters so that the theoretical covariance matrix implied by the system of structural equations is as close as possible to the empirical covariance matrix observed within the estimation sample. ... Unlike CBSEM, PLS analysis does not work with latent variables, and estimates model parameters to maximize the variance explained for all endogenous constructs in the model through a series of ordinary least squares (OLS) regressions.”
(Reinartz, Haenlein & Henseler, 2009; p. 332)
Criterion | CBSEM (LISREL, AMOS, EQS, Mplus) | PLS (SmartPLS, PLS Graph, XLSTAT)
Theory | Strong | ‘Flexible’
Distribution assumptions | Multivariate normality | Non-parametric
Sample size | Large (at least 200) | Small (30-100)
Analytical focus | Confirming theoretically assumed relationships | Prediction and/or identification of relationships between constructs
Number of indicators per construct | Depending on aggregation (or parceling); ideally 4+ | One or more (see consistency at large)
Indicators to construct | Mainly reflective (can use MIMIC for formative) | Both reflective and formative
Improper solutions/factor indeterminacy (unique solution) | Depends on model | Always identified
Type of measurement | Interval or ratio (otherwise need PRELIS) | Categorical to ratio
Complexity of model | Large models (>100 indicators) problematic | Can deal with large models
Parameter estimates | Consistent if no estimation problems and confirmation of assumptions | Consistency at large (when indicators of each construct and sample size reach infinity)
Correlations between constructs | Can be modeled | Cannot be modeled
Correlations between errors | Can be modeled | Cannot be modeled
Assessment of measurement model | Available | Available
Estimation | Structural model independent of measurement model | Structural and measurement model estimated simultaneously
Goodness-of-fit measures | Available both for the overall model and for individual constructs | Limited for overall model but available for individual constructs
Statistical testing of estimates | Benchmarks from normal distribution | Inference requires re-sampling (jackknife or bootstrap), thus the term ‘soft’ modeling
LV scores | Not directly estimated | Part of the analytical approach
Higher order constructs | Possible to test | Possible to test
Response based segmentation | Available | Available
HBAT DATA: CUSTOMER SURVEY
HBAT is a manufacturer of paper products. Data from 100 randomly selected customers were collected on the following variables.
Classification Variables/Emporographics
X1 - Customer Type: Length of time a particular customer has been buying from HBAT (1 = Less than 1 year; 2 = Between 1 and 5 years; 3 = Longer than 5 years)
X2 - Industry Type: Type of industry that purchases HBAT’s paper products (0 = Magazine industry; 1 = Newsprint industry)
X3 - Firm Size: Employee size (0 = Small firm, fewer than 500 employees; 1 = Large firm, 500 or more employees)
X4 - Region: Customer location (0 = USA/North America; 1 = Outside North America)
X5 - Distribution System: How paper products are sold to customers (0 = Sold indirectly through a broker; 1 = Sold directly)
Perceptions of HBAT
Each respondent’s perceptions of HBAT on a set of business functions were measured on a graphic rating scale, where a 10 centimetre line was drawn between the endpoints labelled “Poor” (for 0) and “Excellent” (for 10).
X6 - Product quality
X7 - E-Commerce activities/Web site
X8 - Technical support
X9 - Complaint resolution
X10 - Advertising
X11 - Product line
X12 - Salesforce image
X13 - Competitive pricing
X14 - Warranty and claims
X15 - New products
X16 - Ordering and billing
X17 - Price flexibility
X18 - Delivery speed
Outcome/Relationship Variables
For variables X19 to X21 a scale similar to the above was employed, with appropriate anchors.
X19 - Satisfaction
X20 - Likelihood of recommendation
X21 - Likelihood of future purchase
X22 - Percentage of current purchase/usage level from HBAT
X23 - Future relationship with HBAT (0 = Would not consider; 1 = Would consider strategic alliance or partnership)
Conceptual Model
[Figure: conceptual model. Indicators: Delivery speed, Complaint resolution, Order & billing, Advertising, Product quality, E-commerce, Sales image, Competitive price, Price flexibility, Product line. Latent variables: Customer interface, Value for money, Market presence. Outcome variables: Purchase level, Satisfaction, Recommend, Re-purchase.]
CB-SEM
Method ... Maximum Likelihood (ML) ... Generalised Least Squares (GLS) ... Asymptotically Distribution Free (ADF) ...
Sample size ... Multivariate normality ... Outliers ... Influential cases ... Missing cases ...
Computer programs ... LISREL ... AMOS ... EQS ... MPlus ... Stata ... SAS ...
Confirmatory Factor Analysis
Normality and Influential Cases
The χ2 goodness of fit statistic compares the observed and estimated covariance matrices [H0: there is no significant difference between the two matrices; HA: there is a significant difference between the two matrices] … therefore, ideally we want to retain the null hypothesis …
Measurement Model
Unstandardised estimates (loadings) … can use the critical ratio (C.R.) to test whether the estimate is significant [H0: the estimate is not significantly different from zero; HA: the estimate is significantly higher (or lower, depending on theory) than zero]
Standardised estimates (loadings) … the size of the estimate provides an indication of convergent validity … should be at least > 0.50 and ideally > 0.70
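As a rough illustration of the C.R. test: the critical ratio is the unstandardised estimate divided by its standard error, and it is judged against the normal distribution (beyond ±1.96 for a two-sided 5% test). The estimate and standard error below are hypothetical values chosen for illustration, not output from the HBAT model.

```python
import math

def critical_ratio(estimate, std_error):
    """Return the critical ratio (estimate / SE) and its two-sided normal p-value."""
    cr = estimate / std_error
    # Two-sided tail probability of a standard normal variate
    p_two_sided = math.erfc(abs(cr) / math.sqrt(2))
    return cr, p_two_sided

# Hypothetical unstandardised loading and standard error (illustration only)
cr, p = critical_ratio(0.84, 0.12)
print(f"C.R. = {cr:.2f}, p = {p:.4f}")
```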
Reliability
The most commonly reported test is Cronbach’s alpha ... but, owing to a number of concerns about it, composite reliability is now often reported instead ...
Customer interface: (Σλ)² = (.949 + .919 + .799)²
Σvar(ε) = (1 − .949²) + (1 − .919²) + (1 − .799²)
which results in a composite reliability of 0.92
Value for money = 0.69
Market presence = 0.83
λ is the standardised estimate (loading); var(ε) = 1 − λ²
$$\rho_c = \frac{\left(\sum \lambda\right)^2}{\left(\sum \lambda\right)^2 + \sum \mathrm{var}(\varepsilon)}$$
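The Customer interface figure can be checked with a few lines of Python. This is a minimal sketch that takes var(ε) = 1 − λ² for standardised loadings; the function name is illustrative only.

```python
def composite_reliability(loadings):
    """rho_c = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    sum_sq = sum(loadings) ** 2
    # Error variance of each standardised indicator: var(e) = 1 - lambda^2
    error_var = sum(1 - lam ** 2 for lam in loadings)
    return sum_sq / (sum_sq + error_var)

# Standardised loadings reported for Customer interface
customer_interface = [0.949, 0.919, 0.799]
rho_c = composite_reliability(customer_interface)
print(round(rho_c, 2))
```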
Validity
Convergent validity ... tested by Average Variance Extracted with a benchmark of 0.50 ... Customer interface = 0.79; Value for money = 0.40; Market presence = 0.64
Discriminant validity ... off-diagonal bivariate correlations should be notably lower than the diagonal, which represents the square root of AVE
$$AVE = \frac{\sum \lambda^2}{\sum \lambda^2 + \sum \mathrm{var}(\varepsilon)}$$
 | Cust. Int. | VFM | Market Pr.
Cust. Int. | .889 | |
VFM | .616 | .632 |
Market Pr. | .270 | -.061 | .800
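A short sketch of both validity checks, using the figures reported in this section. With var(ε) = 1 − λ², the AVE denominator equals the number of indicators, so AVE reduces to the mean squared loading. The function names and dictionary layout are illustrative assumptions.

```python
import math

def ave(loadings):
    """Average variance extracted for standardised loadings (mean squared loading)."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)

def fornell_larcker(ave_by_construct, correlations):
    """For each pair, check that |r| is below the square root of AVE of both constructs."""
    checks = {}
    for (a, b), r in correlations.items():
        limit = min(math.sqrt(ave_by_construct[a]), math.sqrt(ave_by_construct[b]))
        checks[(a, b)] = abs(r) < limit
    return checks

# AVE for Customer interface from its standardised loadings
ave_ci = ave([0.949, 0.919, 0.799])

# Reported AVE values and inter-construct correlations
reported_ave = {"Cust. Int.": 0.79, "VFM": 0.40, "Market Pr.": 0.64}
correlations = {
    ("Cust. Int.", "VFM"): 0.616,
    ("Cust. Int.", "Market Pr."): 0.270,
    ("VFM", "Market Pr."): -0.061,
}
checks = fornell_larcker(reported_ave, correlations)
print(round(ave_ci, 2), checks)
```

Note that Value for money passes the discriminant check only narrowly (.616 against .632) and falls below the 0.50 AVE benchmark.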
Goodness of Fit
There are many indices ... some of the more commonly reported are (should also consider sample size and number of variables in the model):
• The χ2 goodness of fit statistic (ideally the test should not be significant … however, this is rarely the case and it is therefore often overlooked)
• Absolute measures of fit: GFI > .90 and RMSEA < .08
• Incremental measures of fit (compared to a baseline model, which is usually the null model that assumes all variables are uncorrelated): AGFI > .80, TLI > .90 and CFI > .90
• Parsimonious fit (relates model fit to model complexity and is conceptually similar to adjusted R2): normed χ2 (χ2:df) of 3:1 or less; higher values of PGFI and PNFI; smaller values of the Akaike information criterion (AIC)
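The rule-of-thumb benchmarks can be collected into a small checker. This is only a sketch: the cut-offs are those listed here, while the function name and the index values passed in are hypothetical.

```python
def assess_fit(chi2, df, gfi, rmsea, agfi, tli, cfi):
    """Compare fit indices with the rule-of-thumb benchmarks; True means the benchmark is met."""
    return {
        "normed chi2 <= 3": chi2 / df <= 3,
        "GFI > .90": gfi > 0.90,
        "RMSEA < .08": rmsea < 0.08,
        "AGFI > .80": agfi > 0.80,
        "TLI > .90": tli > 0.90,
        "CFI > .90": cfi > 0.90,
    }

# Hypothetical index values, for illustration only
fit = assess_fit(chi2=15.84, df=7, gfi=0.95, rmsea=0.06, agfi=0.85, tli=0.93, cfi=0.96)
for benchmark, met in fit.items():
    print(f"{benchmark}: {'met' if met else 'not met'}")
```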
Summary solution after removing x17
Structural Model
Model Re-specification/Modification
Look at standardised residuals (benchmark: absolute values above 2.5)
Look at modification indices (relationship not in the model that if added will improve overall model χ2 value ... MI > 4 indicate improvement)
Testing improvement in goodness of fit: Δχ2 = 15.84 – 5.07 = 10.77; Δdf = 7 – 6 = 1. The difference exceeds 3.84, the critical χ2 value for 1 degree of freedom at the 5% level, so the re-specified model fits significantly better.
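The χ2 difference test can be reproduced with the standard library. The sketch below relies on the fact that a χ2 variate with 1 degree of freedom is a squared standard normal, so it covers only the Δdf = 1 case; the function name is an assumption.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

chi2_original, df_original = 15.84, 7
chi2_modified, df_modified = 5.07, 6

delta_chi2 = chi2_original - chi2_modified
delta_df = df_original - df_modified
p_value = chi2_sf_df1(delta_chi2)  # valid because delta_df == 1

print(f"delta chi2 = {delta_chi2:.2f}, delta df = {delta_df}, p = {p_value:.4f}")
```

For Δdf greater than 1 a general χ2 distribution function would be needed instead.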
Higher Order
PLS-SEM
Method ... Ordinary Least Squares (OLS) ... Sample size ... Multivariate normality ... Bootstrap ... Jackknife ... Missing cases ...
Computer programs ... SmartPLS ... PLS-GUI ... VisualPLS ... XLSTAT-PLS ... WarpPLS ... SPAD-PLS ...
Measurement and Structural Models
An indicator should load high with the hypothesised latent variable and low with the other latent variables
For each block in the model with more than one manifest variable the quality of the measurement model is assessed by means of the communality index. The communality of a latent variable is interpreted as the average variance explained by its indicators (similar to R2).
The redundancy index computed for each endogenous block, measures the portion of variability of the manifest variables connected to an endogenous latent variable explained by the latent variables directly connected to the block.
Testing significance
Since PLS makes no assumptions (e.g., normality) about the distribution of the parameters, one of the following re-sampling approaches is employed.
Bootstrapping -
• k samples of size n are created in order to obtain k estimates for each parameter.
• Each sample is created by sampling with replacement from the original data set.
• For each of the k samples calculate the pseudo-bootstrapping value.
• Calculate the mean of the pseudo-bootstrapping values as a proxy for the overall “population” mean.
• Treat the pseudo-bootstrapping values as independent and randomly distributed and calculate their standard deviation and standard error.
• Use the bootstrapping t-statistic with k-1 degrees of freedom (k = number of samples) to test the null hypothesis (significance of loadings, weights and paths).
Jack-knifing -
• Calculate the parameter using the whole sample.
• Partition the sample into sub-samples according to the deletion number d.
• A process similar to bootstrapping is followed to test the null hypotheses.
Recommendation – Use bootstrapping with the ‘Individual sign changes’ option, n = number of valid observations and k = 500+
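The bootstrapping steps can be sketched in plain Python. As a simplifying assumption, the "parameter" here is the correlation between two simulated variables (which equals the standardised path in a one-predictor model) rather than a full PLS estimate; the data, seeds and function names are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_t(xs, ys, k=500, seed=0):
    """Draw k resamples of size n with replacement; return estimate, mean, SE and t."""
    rng = random.Random(seed)
    n = len(xs)
    original = pearson_r(xs, ys)
    estimates = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        estimates.append(pearson_r([xs[i] for i in idx], [ys[i] for i in idx]))
    mean_est = sum(estimates) / k
    # Standard deviation of the bootstrap estimates serves as the standard error
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (k - 1))
    return original, mean_est, se, original / se

# Hypothetical data: 100 cases with a positive relationship between x and y
data_rng = random.Random(42)
x = [data_rng.gauss(0, 1) for _ in range(100)]
y = [0.6 * xi + data_rng.gauss(0, 0.8) for xi in x]

estimate, boot_mean, boot_se, t_stat = bootstrap_t(x, y, k=500)
print(f"estimate = {estimate:.3f}, SE = {boot_se:.3f}, t = {t_stat:.2f} (df = 499)")
```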
Model Evaluation
Predictive Power - R2: The interpretation is similar to that employed under traditional multiple regression analysis, i.e. indicates the amount of variance explained by the model. Examination of the change in R2 can help to determine whether a LV has a substantial effect (significant) on a particular dependent LV.
The following expression provides an estimate of the effect size f2. Using the guidelines provided by Cohen (1988), an f2 of .02, .15 and .35 is interpreted as a small, medium and large effect respectively.
$$f^2 = \frac{R^2_{\text{included}} - R^2_{\text{excluded}}}{1 - R^2_{\text{included}}}$$
Removing the customer interface → purchase pathway results in an R2 of purchase of .585 ... f2 = (.664 - .585)/(1 - .664) ≈ 0.24, which is a medium to large effect.
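The effect size calculation is easy to script. A minimal sketch using the two R2 values from the example and Cohen's thresholds; the function names and the "negligible" label for values below .02 are assumptions.

```python
def f_squared(r2_included, r2_excluded):
    """Effect size of a path: change in R2 relative to the unexplained variance."""
    return (r2_included - r2_excluded) / (1 - r2_included)

def cohen_label(f2):
    """Classify f2 with Cohen's (1988) thresholds of .02, .15 and .35."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"

# R2 of purchase with and without the customer interface pathway
f2 = f_squared(0.664, 0.585)
print(round(f2, 3), cohen_label(f2))
```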
In addition to significance, we should also examine the relevance of pathways ... path coefficients > .20
Predictive Relevance - Q2 [Stone, 1974; Geisser, 1975]: This relates to the predictive sample reuse technique (PSRT) that represents a synthesis of cross-validation and function fitting. In PLS this can be achieved through a blindfolding procedure that “… omits a part of the data for a particular block of indicators during initial parameter estimation and then attempts to estimate the omitted part of the data by using the estimated parameters”.
Q2 = 1 – ΣE/ΣO
Where: ΣE = sum of squares of prediction error [Σ(y – ye)2] for omitted data and ΣO = sum of squares of observed error [Σ(y – ȳ)2, with ȳ the mean of y] for remaining data
Q2 > 0 implies that the model has predictive relevance while Q2 < 0 indicates a lack of predictive relevance.
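The Q2 expression itself can be written as a small function. This sketch covers only the final ratio, not the blindfolding procedure that produces the predictions; the observed values, predictions and benchmark mean are hypothetical.

```python
def q_squared(observed, predicted, benchmark_mean):
    """Q2 = 1 - (sum of squared prediction errors) / (sum of squared deviations from the mean)."""
    sse = sum((y - ye) ** 2 for y, ye in zip(observed, predicted))
    sso = sum((y - benchmark_mean) ** 2 for y in observed)
    return 1 - sse / sso

# Hypothetical omitted data points and their model predictions
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
q2 = q_squared(observed, predicted, benchmark_mean=2.5)
print(round(q2, 2), "predictive relevance" if q2 > 0 else "no predictive relevance")
```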
Higher Order
If the numbers of indicators for your two constructs are approximately equal, the method of repeated manifest variables can be used. The higher order factor that represents the two first order constructs is created by using all the indicators used for the two first order constructs.
Readings
Arbuckle, J.L. and Wothke, W. (1999), Amos 4.0 User’s Guide, Chicago: SmallWaters Corporation.
Byrne, B.M. (2010), Structural Equation Modeling with AMOS, 2nd ed., New York: Routledge.
Hair, J.F., Anderson, R.E., Tatham, R.L. and Black, W.C. (1998), Multivariate Data Analysis, 5th ed., New Jersey: Prentice Hall.
Hair, J.F., Hult, G.T.M., Ringle, C.M. and Sarstedt, M. (2014), A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM), London: Sage Publ.
Kaplan, D. (2009), Structural Equation Modeling: Foundations and Extensions, 2nd ed., London: Sage Publ.
Vinzi, V.E., Chin, W.W., Henseler, J. and Wang, H. (eds) (2010), Handbook of Partial Least Squares, London: Springer.