applied statistics lecture_5

Introduction to applied statistics

& applied statistical methods

Prof. Dr. Chang Zhu

Overview

• Constructs vs. variables

• Validity and reliability concepts

• Reliability analysis

• Factor analysis (theoretical ground)

• Practice: factor and reliability analysis

construct vs. variables

• Constructs are usually defined as unobservable latent

variables.

• Example: the construct of teaching effectiveness. Because the construct may include several dimensions, several variables (usually several scale items) are used to measure it.

• Unlike directly measured variables such as speed, height, or weight, constructs such as egoism, creativity, happiness, satisfaction, learning conceptions, learning styles, teaching styles, and self-regulation are not single measurable entities.

construct vs. variables

• In science, theoretical constructs are often

unobservable things.

• Even when things are observable, measurement error often makes it necessary to calculate “summary” variables.

→ A good test/instrument/questionnaire should have:

• validity

• reliability

validity and reliability

• Validity refers to a test's accuracy. A test is valid when it measures what it is intended to measure.

• Reliability is the extent to which an item, scale, or instrument yields the same score when administered at different times, in different locations, or to different populations, provided the administrations do not differ in relevant variables.

validity: types

• Content validity: the test obtains information about an examinee’s familiarity with a particular content or behavior domain.

• Criterion-related validity: the test measures standing or performance on an external criterion.

• Construct/factorial validity: the test determines the extent to which an examinee possesses a particular hypothetical trait. It includes:

– discriminant validity

– convergent validity → assessed by factor analysis

construct: convergent and discriminant validity

Do items in the test have

• high correlations with measures of the same

trait (convergent validity)? and

• low correlations with measures of unrelated

traits (discriminant validity)?

construct validity

Construct validity is the most theory-laden of the methods of test validation.

Designing a test or instrument to measure a construct begins with a theory about the nature of the construct.

reliability

• When a test is reliable, it provides dependable,

consistent results. The term consistency is often

given as a synonym for reliability (e.g., Anastasi,

1988).

Consistency = Reliability

• The degree of consistency between two

measures of the same thing (Mehrens and

Lehman, 1987).

• The measure of how stable, dependable,

trustworthy, and consistent a test is in measuring

the same thing each time (Worthen et al., 1993)

reliability

• Reliability analysis asks how much of a score reflects "truth" and how much reflects error. A reliability coefficient estimates the proportion of variability in examinees' obtained scores that is due to true differences among examinees on the attribute(s) measured by a test.
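A worked form of this proportion, assuming the classical test theory model (the slide does not name a framework, so this is an assumption): an observed score X is a true score T plus an error E that is uncorrelated with T.

```latex
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2

\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2}
                   = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

For instance, with a true-score variance of 8 and an error variance of 2, reliability is 8 / 10 = .80.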

reliability

[Figure: intrinsic motivation and extrinsic motivation]

reliability analysis

• Reliability analysis allows you to study the properties of measurement scales and the items that make them up.

• It tests the extent to which the items in your questionnaire are related to each other.

• Cronbach’s alpha is the most commonly used measure of reliability (internal consistency).

• A commonly accepted minimum value of α is .7.
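A minimal sketch of how Cronbach's α can be computed by hand, using the usual formula α = k/(k−1) · (1 − Σ item variances / variance of the total score). The function name and the response data are hypothetical and only serve as illustration; SPSS reports the same statistic in its Reliability Statistics table.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*rows))                       # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])   # variance of summed scale scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses: 5 respondents x 3 items on a 1-5 scale.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Sample variances (n − 1 in the denominator) are used for both the items and the total score; using population variances throughout would give the same α, because the ratio is unchanged.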

Factor Analysis

• The assumption of factor analysis is that

underlying dimensions (factors) can be used to

explain complex phenomena.

• Observed correlations between variables result

from their sharing of factors.

Factor Analysis

• Factor analysis measures constructs that are not directly observable by measuring several of their underlying dimensions.

• The identification of such underlying dimensions (factors) simplifies the understanding and description of complex constructs.

• From this angle, factor analysis is viewed as a data-reduction technique, as it reduces a large number of overlapping variables to a smaller set of factors that reflect construct(s) or different dimensions of construct(s).

Factor Analysis

• A major goal of factor analysis is to represent

relationships among sets of variables

parsimoniously yet keeping factors meaningful.

• A good factor solution is both simple and

interpretable.

• When factors can be interpreted, new insights

are possible.

Factor Analysis

• Factor analysis is commonly used in:

– Data reduction

– Scale development

– The evaluation of the psychometric quality of a measure, and

– The assessment of the dimensionality of a set of variables.

An example: a questionnaire of 30 items

5 factors are identified for the 30-item questionnaire.

Application of Factor Analysis

• Examine three common applications of factor

analysis:

– Defining indicators of constructs (1)

– Defining dimensions for an existing measure (2)

– Selecting items or scales to be included in a

measure (3)

Application of Factor Analysis (1)

Defining indicators of constructs:

• Ideally, 4 or more measures should be chosen to represent each construct of interest.

• The choice of measures should, as much as possible, be guided by theory, previous research, and logic.


• Why do you go to college?

Which indicators measure intrinsic motivation?

1. Honestly, I don’t know; I feel that I am wasting time.

2. Because I experience satisfaction when learning new things.

3. For the pleasure I experience in broadening my knowledge about subjects that appeal to me.

4. For the pleasure I experience when I discover new things never seen before.


• Defining dimensions for an existing measure:

– In this case, the variables to be analyzed are chosen by the initial researcher.

– Factor analysis is performed on a predetermined set of items/scales.

– Results of factor analysis may not always be satisfactory:

o The items or scales may be poor indicators of the construct or constructs.

o There may be too few items or scales to represent each underlying dimension.

• Selecting items or scales to be included in a measure:

o Factor analysis may be conducted to determine which items or scales should be included in or excluded from a measure.

o Results of the analysis should not be used alone in making decisions about inclusion or exclusion. Decisions should be taken in conjunction with the theory and what is known about the construct(s) that the items or scales assess.

Steps in Factor Analysis

• Factor analysis usually proceeds in four steps:

– 1st step: evaluate the sample adequacy based on the correlation matrix

– 2nd step: factor extraction

– 3rd step: factor rotation

– 4th step: make final decisions about the number of underlying factors

Factor analysis

Step 1: The Correlation Matrix

– Generate a correlation matrix for all variables

– Identify variables not related to other variables

– If the correlations between variables are small, it is unlikely that they share common factors (variables must be related to each other for the factor model to be appropriate).

– Think of correlations in absolute value.

– Correlation coefficients greater than 0.3 in absolute value are indicative of acceptable correlations.

– Examine visually the appropriateness of the factor model.
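The screening described in Step 1 can be sketched as follows. This is an illustration only: the variable names and scores are hypothetical, and the 0.3 cut-off is the rule of thumb quoted above, not a formal test.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(data):
    """data: dict mapping variable name -> list of scores."""
    names = list(data)
    return {a: {b: pearson(data[a], data[b]) for b in names} for a in names}


def weakly_related(data, threshold=0.3):
    """Variables whose |r| with every other variable is at or below the threshold."""
    r = correlation_matrix(data)
    return [a for a in r
            if all(abs(r[a][b]) <= threshold for b in r if b != a)]


# Hypothetical scores for four items answered by six respondents.
data = {
    "q1": [1, 2, 3, 4, 5, 6],
    "q2": [2, 1, 4, 3, 6, 5],
    "q3": [6, 5, 4, 3, 2, 1],
    "q4": [3, 1, 2, 2, 1, 3],
}
flagged = weakly_related(data)
print(flagged)
```

Note that q3 correlates negatively with q1; because correlations are judged in absolute value, it still counts as related and is not flagged.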

Factor analysis

Step 1: The Correlation Matrix

In SPSS:

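• The Kaiser-Meyer-Olkin measure of sampling adequacy (KMO) should be greater than .5 to be acceptable.

• Bartlett’s test of sphericity should be significant, indicating that the correlation matrix differs from an identity matrix, i.e. that the variables are sufficiently correlated with one another for factor analysis.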
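For reference, the statistic behind Bartlett's test of sphericity, in its standard textbook form (it is not shown on the slides): R is the correlation matrix of the p variables and n is the number of respondents.

```latex
\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln\lvert R \rvert,
\qquad
df = \frac{p(p - 1)}{2}
```

When the variables are uncorrelated, |R| is close to 1 and the statistic is close to 0; strong correlations push |R| toward 0 and the statistic upward. For a 10-item questionnaire, df = 10 · 9 / 2 = 45.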

Factor analysis

Step 2: Factor extraction

• The primary objective of this stage is to determine the factors.

• Initial decisions can be made here about the number of factors underlying a set of measured variables.

• Estimates of initial factors are obtained using principal components analysis.

• Principal components analysis is the most commonly used extraction method.

Factor analysis

• In principal components analysis, linear combinations of the observed variables are formed.

• The 1st principal component is the combination that accounts for the largest amount of variance in the sample (1st extracted factor).

• The 2nd principal component accounts for the next largest amount of variance and is uncorrelated with the first (2nd extracted factor).

• Successive components explain progressively smaller portions of the total sample variance, and all are uncorrelated with each other.
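A minimal worked case of how principal components split the total variance, assuming just two standardized variables z1 and z2 with correlation r > 0 (an illustration, not an example from the lecture):

```latex
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix},
\qquad
\lambda_1 = 1 + r, \quad \lambda_2 = 1 - r,
\qquad
\lambda_1 + \lambda_2 = 2

C_1 = \frac{z_1 + z_2}{\sqrt{2}},
\qquad
C_2 = \frac{z_1 - z_2}{\sqrt{2}}
```

With r = .6, the first component carries 1.6 / 2 = 80% of the total variance and the second 20%, and the two components are uncorrelated. The eigenvalues always sum to the number of variables, which is why an eigenvalue of 1 corresponds to the variance of a single standardized variable.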


Factor analysis

• To decide how many factors we need to represent the data, we use 2 statistical criteria:

– Eigenvalues, and

– The scree plot


Factor analysis

• The number of factors is usually determined by considering only factors with eigenvalues greater than 1.

• Factors with a variance less than 1 are no better than a single variable, since each standardized variable has a variance of 1.

Total Variance Explained

            Initial Eigenvalues                     Extraction Sums of Squared Loadings
Component   Total    % of Variance   Cumulative %   Total    % of Variance   Cumulative %
1           3.046    30.465           30.465        3.046    30.465          30.465
2           1.801    18.011           48.476        1.801    18.011          48.476
3           1.009    10.091           58.566        1.009    10.091          58.566
4            .934     9.336           67.902
5            .840     8.404           76.307
6            .711     7.107           83.414
7            .574     5.737           89.151
8            .440     4.396           93.547
9            .337     3.368           96.915
10           .308     3.085          100.000

Extraction Method: Principal Component Analysis.
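The arithmetic behind the Total Variance Explained table can be reproduced in a few lines. The eigenvalues are copied from the table; the function names are hypothetical. Because the table's eigenvalues are rounded to three decimals, the percentages can differ from the SPSS output in the last digit.

```python
# Eigenvalues copied from the "Total Variance Explained" table (10 variables).
eigenvalues = [3.046, 1.801, 1.009, 0.934, 0.840,
               0.711, 0.574, 0.440, 0.337, 0.308]


def variance_explained(eigs):
    """Return (percent, cumulative percent) lists, one entry per component."""
    total = sum(eigs)  # for a correlation matrix this equals the number of variables
    percent = [100 * e / total for e in eigs]
    cumulative = []
    running = 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return percent, cumulative


def kaiser_count(eigs, cutoff=1.0):
    """Number of components with an eigenvalue above the cutoff (Kaiser's criterion)."""
    return sum(1 for e in eigs if e > cutoff)


percent, cumulative = variance_explained(eigenvalues)
n_factors = kaiser_count(eigenvalues)

for i, (e, p, c) in enumerate(zip(eigenvalues, percent, cumulative), start=1):
    print(f"{i:>2}  {e:6.3f}  {p:6.2f}  {c:7.2f}")
print("components retained:", n_factors)
```

The third eigenvalue (1.009) clears the cut-off only marginally, which is one reason the eigenvalue rule is combined with the scree plot and theory rather than applied mechanically.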


Factor analysis

• The scree plot provides a visual of the total variance associated with each factor.

• The steep slope shows the large factors.

• The gradual trailing off (scree) shows the rest of the factors, which usually have eigenvalues lower than 1.

• In choosing the number of factors, in addition to the statistical criteria, one should make initial decisions based on conceptual and theoretical grounds.

• At this stage, the decision about the number of factors is not final.


Factor analysis

Component Matrix (Principal Component Analysis)

Component
  1       2       3     Item
 .771   -.271    .121   I discussed my frustrations and feelings with person(s) in school
 .545    .530    .264   I tried to develop a step-by-step plan of action to remedy the problems
 .580   -.311    .265   I expressed my emotions to my family and close friends
 .398    .356   -.374   I read, attended workshops, or sought some other educational approach to correct the problem
 .436    .441   -.368   I tried to be emotionally honest with myself about the problems
 .705   -.362    .117   I sought advice from others on how I should solve the problems
 .594    .184   -.537   I explored the emotions caused by the problems
 .074    .640    .443   I took direct action to try to correct the problems
 .752   -.351    .081   I told someone I could trust about how I felt about the problems
 .225    .576    .272   I put aside other activities so that I could work to solve the problems

a. 3 components extracted.


Factor analysis

Step 3: Factor rotation

• In this step, factors are rotated.

• Unrotated factors are typically not very interpretable (most factors are correlated with many variables).

• Factors are rotated to make them more meaningful and easier to interpret (each variable is associated with a minimal number of factors).

• Different rotation methods may result in the identification of somewhat different factors.

Factor analysis

Step 3: Factor rotation

• The most popular rotation method is varimax (factors are theoretically independent).

• Varimax uses orthogonal rotations, yielding uncorrelated factors/components.

• Varimax attempts to minimize the number of variables that have high loadings on a factor. This enhances the interpretability of the factors.
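To make the idea of an orthogonal rotation concrete, here is a deliberately simplified sketch for the two-factor case, where a rotation is a single angle. It grid-searches the angle that maximizes the raw varimax criterion (the sum, over factors, of the variance of the squared loadings). This is not the algorithm SPSS uses (which iterates over pairs of factors and applies Kaiser normalization by default); the loadings and function names are hypothetical.

```python
from math import cos, sin, pi


def rotate(loadings, theta):
    """Orthogonally rotate two-factor loadings by angle theta (radians)."""
    c, s = cos(theta), sin(theta)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def varimax_criterion(loadings):
    """Raw varimax criterion: sum over factors of the variance of squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(2):
        sq = [row[j] ** 2 for row in loadings]
        mean_sq = sum(sq) / p
        total += sum((v - mean_sq) ** 2 for v in sq) / p
    return total


def varimax_two_factors(loadings, steps=900):
    """Grid-search the angle in [0, pi/2) that maximizes the criterion."""
    angles = (k * (pi / 2) / steps for k in range(steps))
    best = max(angles, key=lambda t: varimax_criterion(rotate(loadings, t)))
    return best, rotate(loadings, best)


# Hypothetical unrotated loadings for four items on two factors.
unrotated = [(0.70, 0.50), (0.65, 0.55), (0.60, -0.55), (0.70, -0.45)]
theta, rotated = varimax_two_factors(unrotated)

communalities_before = [a * a + b * b for a, b in unrotated]
communalities_after = [a * a + b * b for a, b in rotated]

for (a, b), (ra, rb) in zip(unrotated, rotated):
    print(f"{a:6.3f} {b:6.3f}  ->  {ra:6.3f} {rb:6.3f}")
```

Two things are worth noticing in the output: each item ends up with one large and one small loading, and every item's communality (the sum of its squared loadings) is unchanged, because an orthogonal rotation only redistributes explained variance between factors.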

Factor analysis

Step 4: Making final decisions

• 4th step: making final decisions

– The final number of factors is the number in the rotated solution that is most interpretable.

– To identify factors, group variables that have large loadings for the same factor.

– Plots of loadings provide a visual for variable clusters.

– Interpret factors according to the meaning of the variables.

• This decision should be guided by:

– A priori conceptual beliefs about the number of factors from past research or theory.

– Eigenvalues computed in step 2.

– The relative interpretability of rotated solutions computed in step 3.

Practice

Practice: conduct factor and

reliability analyses

• A researcher has generated a new questionnaire which

is designed to measure happiness. The questionnaire

that she has generated has 10 items on it and she has

collected responses from 200 respondents.

• Items are rated on a five-point scale where 1 = strongly disagree and 5 = strongly agree.

• The data file is named Happy_measure.sav

• This example is taken from:

http://wps.pearsoned.co.uk/ema_uk_he_dancey_statsmath_4/84/21627/5536653.cw/content/index.html

In SPSS: Factor analysis

• Analyze > Dimension Reduction > Factor

• Move all the variables to the Variables list

In SPSS: Descriptives options

• Select all the options in the Descriptives dialog box

In SPSS: Extraction method

• Method: Principal components

• Analyze: Correlation matrix

• Display: Scree plot

• Extract: Eigenvalues greater than 1

In SPSS: Rotation method

• Choose Varimax as the rotation method

In SPSS: Factor scores

• Choose Anderson-Rubin as the method of calculating factor scores

In SPSS: Options

• Choose Exclude cases listwise for missing values

• Suppress small coefficients: Absolute value below .4

Preliminary analysis

• The first table we should look at is labeled KMO and Bartlett’s Test. The KMO value is .79 (above .5) and Bartlett’s test is significant (p < .001), which indicates that the sample is adequate for factor analysis.

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy         .790
Bartlett's Test of Sphericity    Approx. Chi-Square     819.746
                                 df                     45
                                 Sig.                   .000

How many factors to extract?

• eigenvalues

• scree plot

Total Variance Explained

            Initial Eigenvalues                    Extraction Sums of Squared Loadings    Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %   Total   % of Variance   Cumulative %   Total   % of Variance   Cumulative %
1           3.186   31.862           31.862        3.186   31.862          31.862         3.170   31.699          31.699
2           2.928   29.279           61.140        2.928   29.279          61.140         2.944   29.442          61.140
3            .757    7.569           68.710
4            .658    6.583           75.293
5            .637    6.369           81.662
6            .522    5.220           86.882
7            .429    4.290           91.171
8            .380    3.801           94.973
9            .316    3.155           98.128
10           .187    1.872          100.000


interpretation

• Examine the underlying theme

Rotated Component Matrix

                                               Component
Item                                           1       2
Q8_I want to go out and party                  .892
Q7_I want to contact friends & family          .837
Q9_The people at work inspire me               .779
Q2_I have lots of friends                      .754
Q3_I love meeting people                       .694
Q6_I have a lot to look forward to                     .825
Q10_I feel excited at the start of each day            .802
Q4_I feel full of energy                               .801
Q1_I feel enthusiastic                                 .748
Q5_I have lots of interesting things to do             .647
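Reading the rotated component matrix amounts to grouping items by the component on which they load most strongly. A minimal sketch of that step, using the loadings shown above: `None` stands for a loading that SPSS suppressed because it was below .4, and the function name is hypothetical.

```python
# Rotated loadings from the table; None marks a loading suppressed below .4.
rotated_loadings = {
    "Q8": (0.892, None), "Q7": (0.837, None), "Q9": (0.779, None),
    "Q2": (0.754, None), "Q3": (0.694, None),
    "Q6": (None, 0.825), "Q10": (None, 0.802), "Q4": (None, 0.801),
    "Q1": (None, 0.748), "Q5": (None, 0.647),
}


def group_by_component(loadings, cutoff=0.4):
    """Assign each item to the component where |loading| is largest and >= cutoff."""
    groups = {}
    for item, values in loadings.items():
        shown = [(abs(v), idx) for idx, v in enumerate(values, start=1)
                 if v is not None and abs(v) >= cutoff]
        if not shown:
            continue  # item loads on no component; review it by hand
        _, component = max(shown)
        groups.setdefault(component, []).append(item)
    return groups


groups = group_by_component(rotated_loadings)
for component, items in sorted(groups.items()):
    print(component, items)
```

The grouping is mechanical; naming the components (here, by looking for the theme the items in each group share) remains a judgment guided by theory.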

In SPSS: Reliability analysis

Based on the factor analysis, we have 2 factors

extracted or 2 sub-scales and the respective items as

below:

• Sub-scale 1 (sociability): Q2, 3, 7, 8, and 9

• Sub-scale 2 (positive feeling): Q1, 4, 5, 6, 10

We will calculate the Cronbach’s α for sub-scale 1 first.

In SPSS: Reliability analysis

• Analyze > Scale > Reliability Analysis

• Move the variables Q2, 3, 7, 8, and 9 to the Items list

• In the output, the table Reliability Statistics tells us that

the internal consistency of the 5 items is measured with

α = .851 (which is high).

Reporting the results

• Description of the analysis

• Table of factor loadings

(practical guideline page 8 and 9)

Reporting the results

• A principal component analysis (PCA) was conducted on the 10 items with orthogonal rotation (varimax). The Kaiser-Meyer-Olkin measure verified the sampling adequacy for the analysis: KMO = .79, which is good according to Field (2009). All KMO values for individual items are well above the acceptable limit of .50 (Field, 2009). Bartlett’s test of sphericity, χ²(45) = 819.746, p < .001, indicated that correlations between items were sufficiently large for PCA. Two components had eigenvalues over Kaiser’s criterion of 1 and in combination explained 61.14% of the variance. The scree plot also supports a two-factor structure. Table 1 shows the factor loadings after rotation. The items that cluster on the same factors suggest that factor 1 represents sociability and factor 2 positive feeling.

Analysis description

Assignment 5

• Detail:

Lecture 5_practical guidelines_assignment (p. 9)

Deadline: November 24, 2014

• Questions?
