Applied Multivariate Analysis
Seppo Pynnonen
Department of Mathematics and Statistics, University of Vaasa, Finland
Spring 2017
Seppo Pynnonen Applied Multivariate Analysis
The model Model Evaluation
Confirmatory Factor Analysis (CFA)
1 The model
2 Model Evaluation
Chi-square Test
Some Other Statistics
In exploratory factor analysis (EFA) the aim is to find, for a set of observed variables x1, ..., xp, a set of underlying latent factors f1, ..., fm, where m < p.
The model is of the form
x = Λf + δ, (1)
where δ is the error term vector.
The factors are supposed to account for the inter-correlations of the observed variables.
When m > 1, the factor solution is not unique (not identified). Factor axes can be rotated to find a "simple structure".
In confirmatory factor analysis, the investigator is supposed to know the number of underlying factors.
In addition, he/she is supposed to have prior knowledge that allows at least m² independent conditions to be specified on Λ (the loadings) and Φ (the factor covariance matrix) in
Σ = ΛΦΛ′ + Θδ (2)
such that the remaining parameters can be solved uniquely. In (2), Θδ is the diagonal matrix with the variances of the error terms δi, i = 1, ..., p, on the diagonal.
In such a case, we say that the model is identified.
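A short derivation shows where this covariance structure comes from. It assumes, as is standard in factor analysis although not stated explicitly above, that Cov(f) = Φ, Cov(δ) = Θδ, and that f and δ are uncorrelated:

```latex
\begin{aligned}
\Sigma = \operatorname{Cov}(x) &= \operatorname{Cov}(\Lambda f + \delta) \\
&= \Lambda \operatorname{Cov}(f) \Lambda' + \operatorname{Cov}(\delta) \\
&= \Lambda \Phi \Lambda' + \Theta_\delta .
\end{aligned}
```

Since Σ is p × p, Λ is p × m and Φ is m × m, the product has to be taken in the order ΛΦΛ′ for the dimensions to agree.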
Most of the restrictions come from the modeling constraints.
In addition, because factors do not have natural scales, one common approach is to define Φ as a correlation matrix.
A popular alternative is to fix one loading in each column of Λ equal to one.
Essentially this implies that the scale of the corresponding factor is fixed according to the corresponding variable.
These technical constraints impose m restrictions.
Since at least m² conditions are required in total, we need at least m² − m = m(m − 1) more restrictions.
Usually it suffices that zeros are distributed over the rows of Λ such that the columns remain linearly independent (Λ has full column rank).
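As a rough illustration, the counting rule can be checked for a hypothetical loading pattern with p = 6 indicators and m = 3 factors in which each indicator loads on exactly one factor. The pattern, the names `PATTERN` and `column_rank`, and the use of the 0/1 pattern as a stand-in for Λ (reasonable here only because the nonzero loadings sit in disjoint rows) are assumptions made for this sketch:

```python
from fractions import Fraction

# Hypothetical pattern of Lambda: 1 = loading is free or fixed to one for
# scaling, 0 = loading restricted to zero. Rows are indicators, columns factors.
PATTERN = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]


def column_rank(matrix):
    """Rank of a matrix by Gaussian elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next(
            (r for r in range(rank, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


m = len(PATTERN[0])
required = m * m                      # at least m^2 conditions in total
scale_restrictions = m                # one loading per factor fixed to one
zero_restrictions = sum(row.count(0) for row in PATTERN)
total = scale_restrictions + zero_restrictions

print("required:", required)                  # required: 9
print("imposed:", total)                      # imposed: 15
print("rank of pattern:", column_rank(PATTERN))  # rank of pattern: 3
```

With 12 zero loadings, the pattern supplies more than the m(m − 1) = 6 additional restrictions, and its columns are linearly independent. Counting alone is a necessary check, not a proof of identification.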
Example 1
The following data are from a study in which the relationship between performance and job satisfaction was investigated [a].
The variables are:
amm1: Achievement motivation measure 1,
amm2: Achievement motivation measure 2,
tssem1: Task specific self esteem measure 1,
tssem2: Task specific self esteem measure 2,
jsm1: Job satisfaction measure 1,
jsm2: Job satisfaction measure 2.
In addition, the data include
vim: Verbal intelligence measure,
performance: Performance (measured in hundreds of dollars).
[a] Bagozzi, R.P. (1980). Performance and satisfaction in an industrial sales force: An examination of their antecedents and simultaneity. Journal of Marketing 44, 65–77.
Here we investigate whether the achievement measures (amm1, amm2),
task specific measures (tssem1, tssem2), and job satisfaction measures
(jsm1, jsm2) measure the concepts they are intended to measure.
The model (path diagram): three latent factors, each with two indicators:
SAT → jsm1, jsm2
ACH → amm1, amm2
TSEM → tssem1, tssem2
Each indicator has its own error term (δ1, ..., δ6).
The correlation matrix, means, and standard deviations based on a sample of n = 122 observations are the following:
         perform     js1     js2     am1     am2   tssm1   tssm2  verbal
perform    1.000
js1         .418   1.000
js2         .394    .627   1.000
am1         .129    .202    .266   1.000
am2         .189    .284    .208    .365   1.000
tssm1       .544    .281    .324    .201    .161   1.000
tssm2       .507    .225    .314    .172    .174    .546   1.000
verbal     -.357   -.156   -.038   -.199   -.277   -.294   -.174   1.000
mean      720.86   15.54   18.46   14.90   14.35   19.57   24.16   21.36
std         2.09    3.43    2.81    1.95    2.06    2.16    2.06    3.65
In SAS, the model is estimated as follows. First, read in the correlation matrix:
/* Job satisfaction example */
data jobsat(type=corr);
infile cards missover; /* jumps over the (missing) symmetric part of the correlation matrix */
input _type_ $ _name_ $ performm jobsatm1 jobsatm2 achvm1 achvm2 sestm1 sestm2 intlm;
datalines;
corr performm 1.000
corr jobsatm1 .418 1.000
corr jobsatm2 .394 .627 1.000
corr achvm1 .129 .202 .266 1.000
corr achvm2 .189 .284 .208 .365 1.000
corr sestm1 .544 .281 .324 .201 .161 1.000
corr sestm2 .507 .225 .314 .172 .174 .546 1.000
corr intlm -.357 -.156 -.038 -.199 -.277 -.294 -.174 1.000
n . 122 122 122 122 122 122 122 122
std . 2.09 3.43 2.81 1.95 2.06 2.16 2.06 3.65
;
run;
Use PROC CALIS to run confirmatory factor analysis:
proc calis data = jobsat;
/* path specifications */
path
Jsat --> jobsatm1 = 1, /* identification constraint */
Jsat --> jobsatm2, /* free parameter to be estimated */
Achiev --> achvm1 = 1, /* fixed to 1 */
Achiev --> achvm2, /* free parameter */
Selfes --> sestm1 = 1, /* fixed to 1 */
Selfes --> sestm2, /* free */
/* Variances of the error terms of observed indicator variables */
<--> jobsatm1, /* freely estimated */
<--> jobsatm2,
<--> achvm1,
<--> achvm2,
<--> sestm1,
<--> sestm2,
/* latent variable variances and covariances */
<--> Jsat Achiev Selfes /* freely estimated variances and covariances of the latent variables */
;
/* generate path diagram */
pathdiagram diagram=[init standardized] /* shows initial and standardized solutions */
exogcov /* shows correlations/covariances between factors */
title = "CFA for Job Satisfaction";
run;
Initial path diagram produced by CALIS
Estimated model (standardized solution) with model fit summary
Remark 1
Many SEM packages set initial identification constraints automatically,
typically by fixing one loading for each factor equal to one. As discussed
earlier, this implies that the scale of the latent variable is fixed to that
of the particular variable. Also the coefficients of the error term paths are
fixed to one.
The chi-square goodness of fit test statistic has the value 3.92, which with 6 degrees of freedom has a p-value of 0.69, so the hypothesis that the model fits the data is not rejected.
On the basis of this short analysis our quick conclusion is that the
measures seem to be indicators of the concepts they are supposed to measure.
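The 6 degrees of freedom can be checked by counting: the number of distinct variances and covariances of the six indicators minus the number of freely estimated parameters in the PROC CALIS specification above. The following is a minimal sketch of that bookkeeping; the dictionary labels are only descriptive names chosen here:

```python
p = 6  # observed indicators in Example 1

# Distinct elements of the p x p covariance matrix (variances + covariances).
distinct_moments = p * (p + 1) // 2

# Free parameters as specified in the path statement of Example 1.
free_parameters = {
    "loadings": 3,            # one loading per factor is fixed to one
    "error variances": 6,     # one per indicator
    "factor variances": 3,
    "factor covariances": 3,  # all pairs among the three factors
}
n_free = sum(free_parameters.values())

df = distinct_moments - n_free
print(distinct_moments, n_free, df)  # 21 15 6
```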
2 Model Evaluation
Model evaluation is an important step in empirical analysis.
The model extremes are:
Saturated model: no restrictions are imposed on the population moments.
Independence model: the variables are uncorrelated.
Modeling the population moments means imposing some restrictions, implying that our proposed model is somewhere between these extremes.
Simplicity: models with relatively few parameters are preferred (the principle of parsimony).
At the same time, a well-fitting model is preferable to a poorly fitting one.
Empirically, the question is how well the model-implied covariance matrix
Σ = ΛΦΛ′ + Θδ (3)
matches the sample covariance matrix S.
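To make the model-implied covariance matrix concrete, the sketch below evaluates ΛΦΛ′ + Θδ for a small two-factor model with four indicators. All numerical values are hypothetical and chosen only so that the implied matrix is a correlation matrix; they are not estimates from Example 1, and the helper names are made up for this illustration:

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Hypothetical parameter values (4 indicators, 2 factors).
LAMBDA = [
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.6],
    [0.0, 0.9],
]
PHI = [
    [1.0, 0.5],
    [0.5, 1.0],
]
THETA_DIAG = [0.36, 0.51, 0.64, 0.19]  # error variances on the diagonal

# Sigma = Lambda * Phi * Lambda' + Theta
common = matmul(matmul(LAMBDA, PHI), transpose(LAMBDA))
sigma = [
    [common[i][j] + (THETA_DIAG[i] if i == j else 0.0) for j in range(4)]
    for i in range(4)
]

for row in sigma:
    print(["%.2f" % v for v in row])
```

Indicators of the same factor are tied together by the product of their loadings (0.8 × 0.7 = 0.56), while indicators of different factors are additionally scaled by the factor correlation (0.8 × 0.5 × 0.6 = 0.24). Model fit is about how close such a matrix can be brought to S.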
To assess the empirical fit there are dozens of statistics.
These measures can be classified into different categories:
measures of parsimony, minimum sample discrepancy measures, measures based on population discrepancy, information-theoretic measures, comparison to baseline model measures, parsimony adjusted measures, goodness of fit indexes, etc.
Chi-square Test
The chi-square (χ², CMIN in AMOS) statistic reported in the examples is perhaps one of the most popular goodness of fit statistics.
It is classified as a measure of sample discrepancy.
Strictly speaking, the null hypothesis it tests is
H0 : x ∼ N(µ, Σ), (4)
where Σ is of the form (2), i.e., the data are generated according to our hypothesized model.
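The slides do not spell out the statistic itself. Assuming maximum likelihood estimation, the standard form is the following, where θ collects the free parameters, t is their number, and S is the sample covariance matrix (some software scales by n rather than n − 1):

```latex
\begin{aligned}
F_{ML}(S, \Sigma(\theta)) &= \ln\lvert\Sigma(\theta)\rvert
  + \operatorname{tr}\!\left(S\,\Sigma(\theta)^{-1}\right)
  - \ln\lvert S\rvert - p, \\
\chi^2 &= (n - 1)\, F_{ML}\!\left(S, \Sigma(\hat\theta)\right), \qquad
df = \frac{p(p+1)}{2} - t .
\end{aligned}
```

Under H0 the statistic is asymptotically chi-square distributed with df degrees of freedom. F_ML is zero exactly when the model-implied matrix reproduces S, so large values of χ² signal discrepancy.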
The p-value indicates how compatible the data are with this hypothesis.
A small p-value indicates discrepancy. A usual threshold is 5%, i.e., p < 0.05 is an indication that our model is not really consistent with the data.
In Example 1 we found χ² = 3.92, which with 6 degrees of freedom produces a p-value of 0.69, suggesting that the model fits the data well.
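The reported p-value can be reproduced without a statistics package. For an even number of degrees of freedom the upper tail of the chi-square distribution has a closed form, exp(−x/2) Σ (x/2)^k / k! with k running from 0 to df/2 − 1. The function name below is made up for this sketch, and it deliberately handles only even df:

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(3.92, 6)
print(round(p_value, 2))  # 0.69
```

Since 0.69 is far above 0.05, the hypothesized model is not rejected.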
A derived measure is χ²/df (chi-square divided by the degrees of freedom).
The rule is that the ratio should be close to one.
In particular, a "large" value (the cut-off used in practice seems to be somewhere between 2 and 5) represents an inadequate fit.
In Example 1 above, χ²/df = 3.92/6 ≈ 0.65 < 1.
Some Other Statistics
(a) Normed Fit Index (NFI)
The closer to 1 the better the fit (1 = perfect fit, 0 = no fit).
Should be > 0.90 (e.g. AMOS manual).
(b) Goodness of Fit Index (GFI)
The closer to 1 the better the fit (1 = perfect fit, 0 = no fit).
Threshold ?
(c) Root Mean Square Error of Approximation (RMSEA)
Should be ≤ 0.08. If > 0.1, the model should be improved.
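The slides do not give formulas for these indexes. For orientation, commonly used definitions of NFI and RMSEA are shown below, where the subscript M refers to the proposed model and B to the baseline (independence) model. Software packages may differ in details, for example in using n instead of n − 1:

```latex
\begin{aligned}
\mathrm{NFI} &= \frac{\chi^2_B - \chi^2_M}{\chi^2_B}, \\
\mathrm{RMSEA} &= \sqrt{\frac{\max\!\left(\chi^2_M - df_M,\; 0\right)}{df_M\,(n - 1)}} .
\end{aligned}
```

Under this definition, Example 1 gives RMSEA = 0, because χ² = 3.92 is smaller than df = 6 and the numerator is truncated at zero. NFI cannot be computed from the figures quoted here, since the baseline chi-square is not reported.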