applied statistics lecture_5
TRANSCRIPT
Introduction to applied statistics
& applied statistical methods
Prof. Dr. Chang Zhu
Overview
• Constructs vs. variables
• Validity and reliability concepts
• Reliability analysis
• Factor analysis (theoretical ground)
• Practice: factor and reliability analysis
construct vs. variables
• Constructs are usually defined as unobservable latent
variables.
• Example: the construct of teaching effectiveness.
Several variables are used to allow the measurement of
such construct (usually several scale items are used)
because the construct may include several dimensions.
• Unlike variables directly measured such as speed,
height, weight, etc., some variables such as egoism,
creativity, happiness, satisfaction, learning conceptions,
learning styles, teaching styles, self-regulation, etc. are not
a single measurable entity.
construct vs. variables
• In science, theoretical constructs are often
unobservable things.
• Even when things are observable, measurement
error often means that “summary” variables need
to be calculated.
• A good test/instrument/questionnaire should have:
– validity
– reliability
validity and reliability
• Validity refers to a test's accuracy. A test is
valid when it measures what it is intended to
measure.
• Reliability is the extent to which an item, scale,
or instrument will yield the same score when
administered at different times, in different
locations, or to different populations, when the
two administrations do not differ in relevant
variables.
validity: types
• Content validity: obtain information about an
examinee’s familiarity with a particular content or
behavior domain
• Criterion-related validity: measure standing or
performance on an external criterion
• Construct/factorial validity: determine the extent to
which an examinee possesses a particular
hypothetical trait, including:
– discriminant validity
– convergent validity
(assessed by factor analysis)
construct : convergent and
discriminant validity
Do items in the test have
• high correlations with measures of the same
trait (convergent validity)? and
• low correlations with measures of unrelated
traits (discriminant validity)?
construct validity
Construct validity is the most theory-laden of
the methods of test validation.
Designing a test or instrument to measure a
construct begins with a theory about the
nature of the construct.
reliability
• When a test is reliable, it provides dependable,
consistent results. The term consistency is often
given as a synonym for reliability (e.g., Anastasi,
1988).
Consistency = Reliability
• The degree of consistency between two
measures of the same thing (Mehrens and
Lehman, 1987).
• The measure of how stable, dependable,
trustworthy, and consistent a test is in measuring
the same thing each time (Worthen et al., 1993)
reliability
• Reliability analysis asks how much of the scores reflects
"truth" and how much reflects error. A reliability
coefficient provides us with an
estimate of the proportion of variability in
examinees' obtained scores that is due to true
differences among examinees on the attribute(s)
measured by a test.
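In classical test theory notation, the idea above can be sketched as follows. The symbols are not from the slides and are introduced only for illustration: X is the obtained score, T the true score, E the error, and T and E are assumed to be uncorrelated.

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Under this reading, a reliability of .80 would mean that roughly 80% of the variability in obtained scores reflects true differences among examinees and 20% reflects error.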
reliability
[Figure: intrinsic motivation and extrinsic motivation]
reliability analysis
• Reliability analysis allows you to study the
properties of measurement scales and the
items that make them up.
• It tests the extent to which the items in your
questionnaire are related to each other.
• Cronbach’s alpha is the most commonly used
measure of reliability (internal consistency).
• The commonly accepted minimum value of α is .7
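As a rough illustration of what Cronbach’s alpha measures, the sketch below computes it from item scores with the usual formula α = k/(k−1) · (1 − Σ item variances / variance of total score). The function name cronbach_alpha and the response data are made up for this example; SPSS is the tool actually used in this lecture.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    items: list of score lists, one inner list per item, all of the
    same length (one score per respondent).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total scale score per respondent: sum across items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance(totals)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical responses of 5 people to 3 items on a 1-5 scale (made-up data).
scores = [
    [4, 5, 3, 4, 2],  # item 1
    [3, 5, 3, 4, 1],  # item 2
    [4, 4, 2, 5, 2],  # item 3
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

With these made-up scores alpha comes out at about .92, which would be above the .7 guideline.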
Factor Analysis
• The assumption of factor analysis is that
underlying dimensions (factors) can be used to
explain complex phenomena.
• Observed correlations between variables result
from their sharing of factors.
Factor Analysis
• Factor analysis measures constructs that are not directly
observable by measuring several of their underlying
dimensions.
• The identification of such underlying dimensions
(factors) simplifies the understanding and description
of complex constructs.
• From this angle, factor analysis is viewed as a data-
reduction technique as it reduces a large number of
overlapping variables to a smaller set of factors that
reflect construct(s) or different dimensions of
construct(s).
Factor Analysis
• A major goal of factor analysis is to represent
relationships among sets of variables
parsimoniously while keeping the factors meaningful.
• A good factor solution is both simple and
interpretable.
• When factors can be interpreted, new insights
are possible.
Factor Analysis
• Factor analysis is commonly used in:
– Data reduction
– Scale development
– The evaluation of the psychometric quality of a measure, and
– The assessment of the dimensionality of a set of variables.
An example: a questionnaire of 30 items
5 factors are identified for the 30-item questionnaire
Application of Factor Analysis
• Examine three common applications of factor
analysis:
– Defining indicators of constructs (1)
– Defining dimensions for an existing measure (2)
– Selecting items or scales to be included in a
measure (3)
Application of Factor Analysis (1)
Defining indicators of constructs:
– Ideally, 4 or more measures should be chosen
to represent each construct of interest.
– The choice of measures should, as much as
possible, be guided by theory, previous
research, and logic.
Application of Factor Analysis (1)
• Why do you go to college?
Which indicators measure intrinsic motivation:
1. Honestly, I don’t know; I feel that I am wasting
time.
2. Because I experience satisfaction when learning
new things.
3. For the pleasure I experience in broadening my
knowledge about subjects that appeal to me.
4. For the pleasure I experience when I discover
new things never seen before.
Application of Factor Analysis (2)
• Defining dimensions for an existing measure:
– In this case, the variables to be analyzed are
chosen by the initial researcher.
– Factor analysis is performed on a predetermined
set of items/scales.
• Results of factor analysis may not always be
satisfactory:
– The items or scales may be poor indicators of
the construct or constructs.
– There may be too few items or scales to
represent each underlying dimension.
Application of Factor Analysis (3)
• Selecting items or scales to be included in a
measure:
– Factor analysis may be conducted to
determine which items or scales should be
included in or excluded from a measure.
– Results of the analysis should not be used
alone in making decisions about inclusion or
exclusion. Decisions should be taken in
conjunction with the theory and what is known
about the construct(s) that the items or scales
assess.
Steps in Factor Analysis
• Factor analysis usually proceeds in four steps:
– 1st step: evaluate the sampling adequacy based
on the correlation matrix
– 2nd step: factor extraction
– 3rd step: factor rotation
– 4th step: make final decisions about the
number of underlying factors
Factor analysis
Step 1: The Correlation Matrix
– Generate a correlation matrix for all variables
– Identify variables not related to other variables
– If the correlations between variables are small, it is
unlikely that they share common factors (variables
must be related to each other for the factor model to
be appropriate).
– Think of correlations in absolute value.
– Correlation coefficients greater than 0.3 in absolute
value are indicative of acceptable correlations.
– Visually examine the appropriateness of the factor
model.
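The screening described above can be sketched in a few lines. This is only an illustration: the helper names pearson and weakly_related and the three variables q1 to q3 are made up, and the 0.3 threshold is the guideline from this slide.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def weakly_related(data, threshold=0.3):
    """Names of variables whose correlations with all other variables
    are at or below the threshold in absolute value."""
    names = list(data)
    flagged = []
    for a in names:
        others = [abs(pearson(data[a], data[b])) for b in names if b != a]
        if all(r <= threshold for r in others):
            flagged.append(a)
    return flagged


# Made-up scores of 6 respondents on 3 variables.
data = {
    "q1": [1, 2, 3, 4, 5, 6],
    "q2": [2, 1, 4, 3, 6, 5],
    "q3": [3, 1, 2, 3, 1, 2],
}
flagged = weakly_related(data)
print(flagged)
```

In this made-up data q1 and q2 correlate strongly, while q3 correlates only about -.24 with each of them, so q3 is the variable that would be flagged as unlikely to share a common factor with the others.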
Factor analysis
Step 1: The Correlation Matrix
In SPSS:
• The Kaiser-Meyer-Olkin measure of sampling adequacy
(KMO) should be greater than .5 to be
acceptable.
• Bartlett’s test of sphericity should be significant, indicating
that the variables are sufficiently correlated with one
another (the correlation matrix is not an identity matrix).
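A rough sketch of what these two statistics compute, assuming a correlation matrix R and a sample size n are available. The function names kmo and bartlett_sphericity are hypothetical, the 3 x 3 matrix is made up, and NumPy is an assumption of this example (the lecture itself uses SPSS). KMO compares squared correlations with squared partial correlations; Bartlett's statistic is −(n − 1 − (2p + 5)/6) · ln|R| with p(p − 1)/2 degrees of freedom.

```python
import numpy as np


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure for a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)          # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)    # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)


def bartlett_sphericity(R, n):
    """Chi-square statistic and degrees of freedom of Bartlett's test."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi2, df


# Made-up correlation matrix: three variables, all pairwise r = .5.
R_example = np.array([
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
])
kmo_value = kmo(R_example)
chi2, df = bartlett_sphericity(R_example, n=200)
print(round(kmo_value, 3), round(chi2, 2), df)
```

The p-value would then come from the chi-square distribution with df degrees of freedom, which this sketch leaves out. For 10 items the formula gives df = 10 · 9 / 2 = 45.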
Factor analysis
Step 2: Factor extraction
• The primary objective of this stage is to
determine the factors.
• Initial decisions can be made here about the
number of factors underlying a set of measured
variables.
• Estimates of initial factors are obtained using
principal components analysis.
• Principal components analysis is the most
commonly used extraction method.
Factor analysis
Step 2: Factor extraction
• In principal components analysis, linear combinations of
the observed variables are formed.
• The 1st principal component is the combination that
accounts for the largest amount of variance in the
sample (1st extracted factor).
• The 2nd principal component accounts for the next
largest amount of variance and is uncorrelated with the
first (2nd extracted factor).
• Successive components explain progressively smaller
portions of the total sample variance, and all are
uncorrelated with each other.
Factor analysis
Step 2: Factor extraction
• To decide how many factors we need to
represent the data, we use 2 statistical criteria:
– Eigenvalues, and
– The scree plot
Factor analysis
Step 2: Factor extraction
• The number of factors is usually determined by
considering only factors with eigenvalues
greater than 1.
• Factors with a variance less than 1 are no better than a
single variable, since each (standardized) variable
has a variance of 1.
Total Variance Explained
            Initial Eigenvalues                  Extraction Sums of Squared Loadings
Component   Total   % of Variance  Cumulative %  Total   % of Variance  Cumulative %
1           3.046   30.465         30.465        3.046   30.465         30.465
2           1.801   18.011         48.476        1.801   18.011         48.476
3           1.009   10.091         58.566        1.009   10.091         58.566
4           .934    9.336          67.902
5           .840    8.404          76.307
6           .711    7.107          83.414
7           .574    5.737          89.151
8           .440    4.396          93.547
9           .337    3.368          96.915
10          .308    3.085          100.000
Extraction Method: Principal Component Analysis.
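The columns of the Total Variance Explained table follow directly from the eigenvalues: with standardized variables the eigenvalues sum to the number of variables (10 here), so each percentage is the eigenvalue divided by that sum. A small sketch using the eigenvalues printed in the table; the function names variance_table and kaiser_count are made up for this example.

```python
# Eigenvalues as printed in the Total Variance Explained table (3 decimals).
eigenvalues = [3.046, 1.801, 1.009, 0.934, 0.840,
               0.711, 0.574, 0.440, 0.337, 0.308]


def variance_table(eigs):
    """Rows of (component, eigenvalue, % of variance, cumulative %)."""
    total = sum(eigs)  # equals the number of variables for a correlation matrix
    rows, cumulative = [], 0.0
    for i, ev in enumerate(eigs, start=1):
        pct = 100 * ev / total
        cumulative += pct
        rows.append((i, ev, round(pct, 2), round(cumulative, 2)))
    return rows


def kaiser_count(eigs):
    """Number of components with an eigenvalue greater than 1."""
    return sum(1 for ev in eigs if ev > 1)


rows = variance_table(eigenvalues)
n_factors = kaiser_count(eigenvalues)
for row in rows:
    print(row)
print("components with eigenvalue > 1:", n_factors)
```

Three components pass the eigenvalue-greater-than-1 criterion. The percentages can differ from the table in the last decimal because the printed eigenvalues are rounded.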
Factor analysis
Step 2: Factor extraction
• The scree plot provides a visual of the total variance
associated with each factor.
• The steep slope shows the large factors.
• The gradual trailing off (scree) shows the rest of the
factors, usually with eigenvalues lower than 1.
• In choosing the number of factors, in addition to the
statistical criteria, one should make initial decisions
based on conceptual and theoretical grounds.
• At this stage, the decision about the number of
factors is not final.
Factor analysis
Step 2: Factor extraction
Component Matrix using Principal Component Analysis
Component Matrix (a)
Component
  1      2      3     Item
 .771  -.271   .121   I discussed my frustrations and feelings with person(s) in school
 .545   .530   .264   I tried to develop a step-by-step plan of action to remedy the problems
 .580  -.311   .265   I expressed my emotions to my family and close friends
 .398   .356  -.374   I read, attended workshops, or sought some other educational approach to correct the problem
 .436   .441  -.368   I tried to be emotionally honest with myself about the problems
 .705  -.362   .117   I sought advice from others on how I should solve the problems
 .594   .184  -.537   I explored the emotions caused by the problems
 .074   .640   .443   I took direct action to try to correct the problems
 .752  -.351   .081   I told someone I could trust about how I felt about the problems
 .225   .576   .272   I put aside other activities so that I could work to solve the problems
Extraction Method: Principal Component Analysis.
a. 3 components extracted.
Factor analysis
Step 3: Factor rotation
• In this step, factors are rotated.
• Unrotated factors are typically not very interpretable
(most factors are correlated with many variables).
• Factors are rotated to make them more meaningful and
easier to interpret (each variable is associated with a
minimal number of factors).
• Different rotation methods may result in the identification
of somewhat different factors.
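To see why unrotated factors can be hard to interpret, the sketch below lists, for each item in the Component Matrix above, the components on which its loading reaches .4 in absolute value. The .4 cutoff is an assumption here (it is the value used for suppressing small coefficients in the SPSS practice of this lecture), and the helper name salient_components is made up.

```python
# Unrotated loadings copied from the Component Matrix above,
# items numbered 1-10 in table order.
loadings = [
    (0.771, -0.271, 0.121),
    (0.545, 0.530, 0.264),
    (0.580, -0.311, 0.265),
    (0.398, 0.356, -0.374),
    (0.436, 0.441, -0.368),
    (0.705, -0.362, 0.117),
    (0.594, 0.184, -0.537),
    (0.074, 0.640, 0.443),
    (0.752, -0.351, 0.081),
    (0.225, 0.576, 0.272),
]


def salient_components(row, cutoff=0.4):
    """1-based component numbers whose |loading| reaches the cutoff."""
    return [j for j, value in enumerate(row, start=1) if abs(value) >= cutoff]


salient = {item: salient_components(row)
           for item, row in enumerate(loadings, start=1)}
cross_loading = [item for item, comps in salient.items() if len(comps) > 1]
no_salient = [item for item, comps in salient.items() if not comps]
print("load on more than one component:", cross_loading)
print("no loading at or above the cutoff:", no_salient)
```

Before rotation, items 2, 5, 7 and 8 load on two components each and item 4 reaches the cutoff on none, which is the kind of pattern rotation is meant to simplify.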
Factor analysis
Step 3: Factor rotation
• The most popular rotation method is varimax
rotation (factors are theoretically independent).
• Varimax uses orthogonal rotations, yielding
uncorrelated factors/components.
• Varimax attempts to minimize the number of
variables that have high loadings on a factor.
This enhances the interpretability of the factors.
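For readers curious about what a varimax rotation does numerically, here is a minimal sketch of a commonly used SVD-based iteration, written with NumPy (an assumption of this example, not something the slides use). The function name varimax and the 4 x 2 loading matrix are made up. It is not SPSS's implementation: SPSS applies Kaiser normalization with varimax by default, which this sketch omits, so results would not match SPSS output exactly.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a (variables x factors) loading matrix orthogonally.

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)          # start from the unrotated solution
    d_old = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        col_ss = np.sum(rotated ** 2, axis=0)   # sum of squares per factor
        # Gradient-like matrix of the varimax criterion.
        G = L.T @ (rotated ** 3 - rotated * col_ss / p)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                               # nearest orthogonal matrix
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
        d_old = d
    return L @ R, R


# Made-up unrotated loadings: 4 variables, 2 factors.
unrotated = np.array([
    [0.8, 0.4],
    [0.7, 0.5],
    [0.3, -0.6],
    [0.4, -0.7],
])
rotated, R = varimax(unrotated)
print(np.round(rotated, 3))
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged; only the way the variance is distributed across factors changes.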
Factor analysis
Step 4: Making final decisions
• 4th step: making final decisions
– The final number of factors is the number in the rotated solution that is most interpretable.
– To identify factors, group variables that have large loadings on the same factor.
– Plots of loadings provide a visual for variable clusters.
– Interpret factors according to the meaning of the variables.
• This decision should be guided by:
– A priori conceptual beliefs about the number of factors from past research or theory.
– Eigenvalues computed in step 2.
– The relative interpretability of rotated solutions computed in step 3.
Practice
Practice: conduct factor and
reliability analyses
• A researcher has generated a new questionnaire which
is designed to measure happiness. The questionnaire
that she has generated has 10 items on it and she has
collected responses from 200 respondents.
• The questionnaire is measured on a five point scale
where 1 = strongly disagree and 5 = strongly agree.
• The data file is named Happy_measure.sav
• This example is taken from:
http://wps.pearsoned.co.uk/ema_uk_he_dancey_statsmath_4/84/21627/5536653.cw/content/index.html
In SPSS: Factor analysis
• Analyze > Dimension Reduction > Factor
• Move all the variables to the Variables list
In SPSS: Descriptives options
• Select all the options in the Descriptives dialog box
In SPSS: Extraction method
• Method: Principal components
• Analyze: Correlation matrix
• Display: Scree plot
• Extract: eigenvalues greater than 1
In SPSS: Rotation method
• Choose Varimax as the rotation method
In SPSS: Factor scores
• Choose Anderson-Rubin as the method of calculating factor scores
In SPSS: Options
• Choose Exclude cases listwise for missing values
• Suppress small coefficients, absolute value below: .4
Preliminary analysis
• The first table we should look at is labeled
KMO and Bartlett’s Test. The KMO value is
.79 (above .5) and Bartlett’s test is
significant (p < .001), which indicates that
the sample is adequate for factor analysis.
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy       .790
Bartlett's Test of Sphericity    Approx. Chi-Square   819.746
                                 df                   45
                                 Sig.                 .000
How many factors to extract?
• Eigenvalues
• Scree plot
Total Variance Explained
            Initial Eigenvalues                  Extraction Sums of Squared Loadings   Rotation Sums of Squared Loadings
Component   Total   % of Variance  Cumulative %  Total   % of Variance  Cumulative %   Total   % of Variance  Cumulative %
1           3.186   31.862         31.862        3.186   31.862         31.862         3.170   31.699         31.699
2           2.928   29.279         61.140        2.928   29.279         61.140         2.944   29.442         61.140
3           .757    7.569          68.710
4           .658    6.583          75.293
5           .637    6.369          81.662
6           .522    5.220          86.882
7           .429    4.290          91.171
8           .380    3.801          94.973
9           .316    3.155          98.128
10          .187    1.872          100.000
Extraction Method: Principal Component Analysis.
interpretation
• Examine the underlying theme
Rotated Component Matrix
(loadings below .4 are suppressed)
Component
  1      2     Item
 .892          Q8_I want to go out and party
 .837          Q7_I want to contact friends & family
 .779          Q9_The people at work inspire me
 .754          Q2_I have lots of friends
 .694          Q3_I love meeting people
        .825   Q6_I have a lot to look forward to
        .802   Q10_I feel excited at the start of each day
        .801   Q4_I feel full of energy
        .748   Q1_I feel enthusiastic
        .647   Q5_I have lots of interesting things to do
In SPSS: Reliability analysis
Based on the factor analysis, we have 2 factors
extracted or 2 sub-scales and the respective items as
below:
• Sub-scale 1 (sociability): Q2, 3, 7, 8, and 9
• Sub-scale 2 (positive feeling): Q1, 4, 5, 6, 10
We will calculate the Cronbach’s α for sub-scale 1 first.
In SPSS: Reliability analysis
• Analyze > Scale > Reliability Analysis
• Move the variables Q2, Q3, Q7, Q8, and Q9 to the Items list
• In the output, the table Reliability Statistics tells us that
the internal consistency of the 5 items is
α = .851 (which is high).
Reporting the results
• Description of the analysis
• Table of factor loadings
(practical guideline page 8 and 9)
Reporting the results
• A principal component analysis (PCA) was conducted on the 10 items with orthogonal rotation (varimax). The Kaiser-Meyer-Olkin measure verified the sampling adequacy for the analysis: KMO = .79, which is good according to Field (2009). All KMO values for individual items are well above the acceptable limit of .50 (Field, 2009). Bartlett’s test of sphericity, χ² (45) = 819.746, p < .001, indicated that correlations between items were sufficiently large for PCA. Two components had eigenvalues over Kaiser’s criterion of 1 and in combination explained 61.14% of the variance. The scree plot also supports a two-factor structure. Table 1 shows the factor loadings after rotation. The items that cluster on the same factors suggest that factor 1 represents sociability and factor 2 positive feeling.
Analysis description
Assignment 5
• Detail:
Lecture 5_practical guidelines_assignment(p. 9)
Deadline: November 24, 2014
• Questions?