Principal Components Analysis with SPSS

Karl L. Wuensch

Dept of Psychology

East Carolina University

When to Use PCA

• You have a set of p continuous variables.

• You want to repackage their variance into m components.

• You will usually want m to be < p, but not always.

Components and Variables

• Each component is a weighted linear combination of the variables:

$C_i = W_{i1}X_1 + W_{i2}X_2 + \cdots + W_{ip}X_p$

• Each variable is a weighted linear combination of the components:

$X_j = A_{1j}C_1 + A_{2j}C_2 + \cdots + A_{mj}C_m$
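The two linear combinations above can be illustrated with a minimal NumPy sketch. The data are made up, and the names `W` (score weights) and `A` (loadings) are chosen here to echo the symbols in the equations; the array layout (variables in rows, components in columns) is an assumption of this sketch, not something the slides specify.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 cases on p = 4 correlated variables.
raw = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)  # standardized variables

R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

A = eigvecs * np.sqrt(eigvals)   # loadings: row j = variable, column i = component
W = eigvecs / np.sqrt(eigvals)   # weights that build each component from the variables

C = Z @ W                        # each component = weighted sum of the variables
Z_rebuilt = C @ A.T              # each variable = weighted sum of all m = p components
```

With all m = p components retained, `Z_rebuilt` reproduces `Z`; retaining fewer columns of `C` and `A` gives only an approximation.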

Factors and Variables

• In Factor Analysis, we exclude from the solution any variance that is unique, not shared by the variables.

• U_j is the unique variance for X_j:

$X_j = A_{1j}F_1 + A_{2j}F_2 + \cdots + A_{mj}F_m + U_j$

Goals of PCA and FA

• Data reduction.

• Discover and summarize the pattern of intercorrelations among variables.

• Test theory about the latent variables underlying a set of measurement variables.

• Construct a test instrument.

• There are many other uses of PCA and FA.

Data Reduction

• Ossenkopp and Mazmanian (Physiology and Behavior, 34: 935-941).

• 19 behavioral and physiological variables.

• A single criterion variable, physiological response to four hours of cold-restraint.

• Extracted five factors.

• Used multiple regression to develop a model for predicting the criterion from the five factors.

Exploratory Factor Analysis

• Want to discover the pattern of intercorrelations among variables.

• Wilt et al., 2005 (thesis).

• Variables are items on the SOIS at ECU.

• Found two factors, one evaluative, one on difficulty of course.

• Compared face-to-face (FTF) students to distance education (DE) students, on structure and means.

Confirmatory Factor Analysis

• Have a theory regarding the factor structure for a set of variables.

• Want to confirm that the theory describes the observed intercorrelations well.

• Thurstone: Intelligence consists of seven independent factors rather than one global factor.

• Often done with SEM software

Construct A Test Instrument

• Write a large set of items designed to test the constructs of interest.

• Administer the survey to a sample of persons from the target population.

• Use FA to help select those items that will be used to measure each of the constructs of interest.

• Use Cronbach's alpha to check the reliability of the resulting scales.
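As an illustration of the reliability check in the last step, here is a minimal sketch of the usual Cronbach's alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score). The function name `cronbach_alpha` and the response data are made up for this example.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_persons, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the scale total
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Hypothetical responses: 5 persons by 3 items on one scale.
scores = np.array([[4, 5, 4],
                   [2, 3, 2],
                   [5, 5, 4],
                   [1, 2, 2],
                   [3, 3, 3]])
alpha = cronbach_alpha(scores)
```

In practice one would compute alpha separately for the items assigned to each construct.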

An Unusual Use of PCA

• Poulson, Braithwaite, Brondino, and Wuensch (1997, Journal of Social Behavior and Personality, 12, 743-758).

• Simulated jury trial, seemingly insane defendant killed a man.

• Criterion variable = recommended verdict

  – Guilty

  – Guilty But Mentally Ill

  – Not Guilty By Reason of Insanity

• Predictor variables = jurors’ scores on 8 scales.

• Discriminant function analysis.

• Problem with multicollinearity.

• Used PCA to extract eight orthogonal components.

• Predicted recommended verdict from these 8 components.

• Transformed results back to the original scales.
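The idea of predicting from orthogonal components and then transforming the weights back to the original scales can be sketched as follows. This is not the authors' analysis: it swaps in ordinary least-squares regression with a continuous criterion in place of discriminant function analysis, and the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 8
# Hypothetical, highly intercorrelated predictors (one shared factor plus noise).
shared = rng.normal(size=(n, 1))
X = shared + 0.3 * rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized predictors
yc = y - y.mean()

eigvals, V = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
C = Z @ V                                   # p mutually orthogonal components
gamma = (C.T @ yc) / (C ** 2).sum(axis=0)   # one simple regression per component
b = V @ gamma                               # weights expressed on the original scales

# For comparison: regression directly on the standardized predictors.
b_direct, *_ = np.linalg.lstsq(Z, yc, rcond=None)
```

Because all eight components are kept, `b` matches `b_direct`; the components simply give an orthogonal route to the same weights.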

A Simple, Contrived Example

• Consumers rate importance of seven characteristics of beer.

  – low Cost

  – high Size of bottle

  – high Alcohol content

  – Reputation of brand

  – Color

  – Aroma

  – Taste

• FACTBEER.SAV at http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-Data.htm .

• Analyze, Data Reduction, Factor.

• Scoot beer variables into box.

• Click Descriptives and then check Initial Solution, Coefficients, KMO and Bartlett’s Test of Sphericity, and Anti-image. Click Continue.

• Click Extraction and then select Principal Components, Correlation Matrix, Unrotated Factor Solution, Scree Plot, and Eigenvalues Over 1. Click Continue.

• Click Rotation. Select Varimax and Rotated Solution. Click Continue.

• Click Options. Select Exclude Cases Listwise and Sorted By Size. Click Continue.

• Click OK, and SPSS completes the Principal Components Analysis.

Checking for Unique Variables 1

• Check the correlation matrix.

• If there are any variables not well correlated with some others, might as well delete them.

Checking for Unique Variables 2

Correlation Matrix

           cost    size    alcohol  reputat  color   aroma   taste
cost       1.00    .832    .767     -.406    .018    -.046   -.064
size       .832    1.00    .904     -.392    .179    .098    .026
alcohol    .767    .904    1.00     -.463    .072    .044    .012
reputat    -.406   -.392   -.463    1.00     -.372   -.443   -.443
color      .018    .179    .072     -.372    1.00    .909    .903
aroma      -.046   .098    .044     -.443    .909    1.00    .870
taste      -.064   .026    .012     -.443    .903    .870    1.00

Checking for Unique Variables 3

• Bartlett’s test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix, but it does not help identify individual variables that are not well correlated with the others.

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy            .665
Bartlett's Test of Sphericity    Approx. Chi-Square        1637.921
                                 df                        (value missing from the transcript)
                                 Sig.                      .000
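For readers who want to see where the chi-square comes from, here is a minimal sketch of the standard formula for Bartlett's test, chi-square = −(n − 1 − (2p + 5)/6) × ln|R| with p(p − 1)/2 degrees of freedom. The function name is made up, and the sample size `n` must be supplied by the user; it is not stated on these slides.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Chi-square statistic and df for Bartlett's test of sphericity.

    R is a p x p correlation matrix; n is the number of cases.
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)               # ln|R|, computed stably
    chi_square = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi_square, df

# Hypothetical two-variable example with r = .5 and n = 100.
chi_sq, df = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n=100)
```

An identity matrix has determinant 1, so ln|R| = 0 and the statistic is zero; the more the variables intercorrelate, the smaller the determinant and the larger the chi-square.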

Checking for Unique Variables 4

• For each variable, check R² between it and the remaining variables.

• SPSS reports these as the initial communalities when you do a principal axis factor analysis.

• Delete any variable with a low R².
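These R² values can be obtained directly from the inverse of the correlation matrix, since R²_j = 1 − 1/(R⁻¹)_jj. The sketch below applies that identity to the beer correlation matrix as printed on the earlier slide (rounded to three decimals, so results are approximate); the variable names in `names` follow the slide.

```python
import numpy as np

names = ["cost", "size", "alcohol", "reputat", "color", "aroma", "taste"]
R = np.array([
    [1.000,  .832,  .767, -.406,  .018, -.046, -.064],
    [ .832, 1.000,  .904, -.392,  .179,  .098,  .026],
    [ .767,  .904, 1.000, -.463,  .072,  .044,  .012],
    [-.406, -.392, -.463, 1.000, -.372, -.443, -.443],
    [ .018,  .179,  .072, -.372, 1.000,  .909,  .903],
    [-.046,  .098,  .044, -.443,  .909, 1.000,  .870],
    [-.064,  .026,  .012, -.443,  .903,  .870, 1.000],
])

# Squared multiple correlation of each variable with all the others.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))

for name, value in zip(names, smc):
    print(f"{name:8s} {value:.3f}")
```

A variable whose value is much lower than the rest shares little variance with the other variables and is a candidate for deletion.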

Checking for Unique Correlations

• Look at partial correlations – pairs of variables with large partial correlations share variance with one another but not with the remaining variables – this is problematic.

• Kaiser’s MSA will tell you, for each variable, how much of this problem exists.

• The smaller the MSA, the greater the problem.

Checking for Unique Correlations 2

• An MSA of .9 is marvelous, .5 miserable.

• Variables with small MSAs should be deleted.

• Or additional variables added that will share variance with the troublesome variables.

Checking for Unique Correlations 3

Anti-image Matrices: Anti-image Correlation

           cost    size    alcohol  reputat  color   aroma   taste
cost       .779a   -.543   .105     .256     .100    .135    -.105
size       -.543   .550a   -.806    -.109    -.495   .061    .435
alcohol    .105    -.806   .630a    .226     .381    -.060   -.310
reputat    .256    -.109   .226     .763a    -.231   .287    .257
color      .100    -.495   .381     -.231    .590a   -.574   -.693
aroma      .135    .061    -.060    .287     -.574   .801a   -.087
taste      -.105   .435    -.310    .257     -.693   -.087   .676a

a. Measures of Sampling Adequacy (MSA) on main diagonal. Off-diagonal entries are partial correlations × −1.
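The quantities in this table can be sketched from the correlation matrix alone. The code below uses the usual definitions (partial correlations from the scaled inverse of R; MSA as the ratio of summed squared correlations to summed squared correlations plus summed squared partial correlations). It works from the three-decimal correlations printed earlier, so its output should be expected to differ somewhat from the SPSS values.

```python
import numpy as np

R = np.array([
    [1.000,  .832,  .767, -.406,  .018, -.046, -.064],
    [ .832, 1.000,  .904, -.392,  .179,  .098,  .026],
    [ .767,  .904, 1.000, -.463,  .072,  .044,  .012],
    [-.406, -.392, -.463, 1.000, -.372, -.443, -.443],
    [ .018,  .179,  .072, -.372, 1.000,  .909,  .903],
    [-.046,  .098,  .044, -.443,  .909, 1.000,  .870],
    [-.064,  .026,  .012, -.443,  .903,  .870, 1.000],
])

inv = np.linalg.inv(R)
scale = np.sqrt(np.diag(inv))
anti_image = inv / np.outer(scale, scale)   # off-diagonal = partial correlation x -1
partial = -anti_image                       # partial r, controlling all other variables

off = ~np.eye(len(R), dtype=bool)           # mask selecting off-diagonal cells
r2 = np.where(off, R ** 2, 0.0).sum(axis=1)
q2 = np.where(off, partial ** 2, 0.0).sum(axis=1)

msa = r2 / (r2 + q2)                        # per-variable measure of sampling adequacy
kmo = r2.sum() / (r2.sum() + q2.sum())      # overall Kaiser-Meyer-Olkin measure
```

Large partial correlations inflate `q2` and pull the MSA toward zero, which is why small MSAs flag the problem described above.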

Extracting Principal Components 1

• From p variables we can extract p components.

• Each of p eigenvalues represents the amount of standardized variance that has been captured by one component.

• The first component accounts for the largest possible amount of variance.

• The second captures as much as possible of what is left over, and so on.

• Each is orthogonal to the others.

Extracting Principal Components 2

• Each variable has standardized variance = 1.

• The total standardized variance in the p variables = p.

• The sum of the m = p eigenvalues = p.

• All of the variance is extracted.

• For each component, the proportion of variance extracted = eigenvalue / p.
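These properties are easy to check numerically. The sketch below takes the eigenvalues of the beer correlation matrix as printed on the earlier slide (three-decimal rounding, so the eigenvalues are approximate) and converts them to proportions of variance.

```python
import numpy as np

R = np.array([
    [1.000,  .832,  .767, -.406,  .018, -.046, -.064],
    [ .832, 1.000,  .904, -.392,  .179,  .098,  .026],
    [ .767,  .904, 1.000, -.463,  .072,  .044,  .012],
    [-.406, -.392, -.463, 1.000, -.372, -.443, -.443],
    [ .018,  .179,  .072, -.372, 1.000,  .909,  .903],
    [-.046,  .098,  .044, -.443,  .909, 1.000,  .870],
    [-.064,  .026,  .012, -.443,  .903,  .870, 1.000],
])
p = R.shape[0]

eigvals = np.linalg.eigvalsh(R)[::-1]   # eigenvalues, largest first
proportion = eigvals / p                # proportion of variance per component
cumulative = np.cumsum(proportion)      # running total; reaches 1 at component p

for i, (ev, pr, cu) in enumerate(zip(eigvals, proportion, cumulative), start=1):
    print(f"{i}  {ev:6.3f}  {100 * pr:7.3f}  {100 * cu:8.3f}")
```

The eigenvalues sum to the trace of R, which is p because every standardized variable contributes a variance of 1.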

Extracting Principal Components 3

• For our beer data, here are the eigenvalues and proportions of variance for the seven components:

Initial Eigenvalues

Component   Total    % of Variance   Cumulative %
1           3.313    47.327          47.327
2           2.616    37.369          84.696
3           .575     8.209           92.905
4           .240     3.427           96.332
5           .134     1.921           98.252
6           9.E-02   1.221           99.473
7           4.E-02   .527            100.000

Extraction Method: Principal Component Analysis.

How Many Components to Retain

• From p variables we can extract p components.

• We probably want fewer than p.

• Simple rule: Keep as many as have eigenvalues ≥ 1.

• A component with eigenvalue < 1 captured less than one variable’s worth of variance.

• Visual Aid: Use a Scree Plot.

• Scree is rubble at base of cliff.

• For our beer data:

[Figure: Scree Plot. Eigenvalue (vertical axis, 0.0 to 3.5) plotted against Component Number (horizontal axis, 1 to 7).]

• Only the first two components have eigenvalues greater than 1.

• Big drop in eigenvalue between component 2 and component 3.

• Components 3-7 are scree.

• Try a 2 component solution.

• Should also look at solution with one fewer and with one more component.

Less Subjective Methods

• Parallel Analysis and Velicer’s MAP test.

• SAS, SPSS, and Matlab scripts are available at https://people.ok.ubc.ca/brioconn/nfactors/nfactors.html

Parallel Analysis

• How many components account for more variance than do components derived from random data?

• Create 1,000 or more sets of random data.

• Each with same number of cases and variables as your data set.

• For each set, find the eigenvalues.

• For the eigenvalues from the random sets, find the 95th percentile for each component.

• Retain as many components as have eigenvalues in your data that exceed the corresponding 95th percentile from the random data sets.
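The procedure just described can be sketched in a few lines. This is an illustrative implementation, not the script distributed at the URL above: it draws normally distributed random data, and the function name, defaults, and demonstration data are all assumptions of this example.

```python
import numpy as np

def parallel_analysis(data, n_sets=1000, percentile=95, seed=0):
    """Return (number of components to retain, observed eigenvalues, thresholds)."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    rng = np.random.default_rng(seed)
    random_eigs = np.empty((n_sets, p))
    for s in range(n_sets):
        noise = rng.normal(size=(n, p))   # same number of cases and variables
        random_eigs[s] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]

    threshold = np.percentile(random_eigs, percentile, axis=0)
    exceeds = observed > threshold
    # Count leading components that beat their random-data percentile.
    n_retain = p if exceeds.all() else int(np.argmin(exceeds))
    return n_retain, observed, threshold

# Hypothetical data: 200 cases, six variables driven by two independent factors.
rng = np.random.default_rng(42)
factors = rng.normal(size=(200, 2))
pattern = np.array([[.9, 0], [.9, 0], [.9, 0], [0, .9], [0, .9], [0, .9]])
demo = factors @ pattern.T + rng.normal(scale=np.sqrt(1 - .81), size=(200, 6))

n_retain, observed, threshold = parallel_analysis(demo, n_sets=200)
```

Increasing `n_sets` stabilizes the percentile estimates at the cost of run time.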

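Random Data Eigenvalues

Root        Prcntyle
1.000000    1.344920
2.000000    1.207526
3.000000    1.118462
4.000000    1.038794
5.000000    .973311
6.000000    .907173
7.000000    .830506

• Our data yielded eigenvalues of 3.313, 2.616, and 0.575.

• The first two exceed their random-data percentiles (1.345 and 1.208); the third (0.575) falls below its percentile (1.118).

• Retain two components.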

Velicer’s MAP Test

• Step by step, extract increasing numbers of components.

• At each step, determine how much common variance is left in the residuals.

• Retain all steps up to and including that producing the smallest residual common variance.
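A minimal sketch of these steps, assuming the usual form of the test: after partialling out the first m components, average the squared off-diagonal partial correlations, and choose the m with the smallest average. The function name is made up, and the loop stops at p − 2 components to avoid dividing by near-zero residual variances. It is applied to the beer correlation matrix printed earlier, which is rounded to three decimals.

```python
import numpy as np

def velicer_map(R):
    """Return (number of components, average squared partial r for m = 0..p-2)."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    off = ~np.eye(p, dtype=bool)

    avg_sq = []
    for m in range(p - 1):                  # m components partialled out
        A = loadings[:, :m]
        resid = R - A @ A.T                 # residual (partial) covariance matrix
        d = np.sqrt(np.diag(resid))
        partial = resid / np.outer(d, d)    # residual correlations
        avg_sq.append((partial[off] ** 2).mean())
    avg_sq = np.array(avg_sq)
    return int(np.argmin(avg_sq)), avg_sq

R = np.array([
    [1.000,  .832,  .767, -.406,  .018, -.046, -.064],
    [ .832, 1.000,  .904, -.392,  .179,  .098,  .026],
    [ .767,  .904, 1.000, -.463,  .072,  .044,  .012],
    [-.406, -.392, -.463, 1.000, -.372, -.443, -.443],
    [ .018,  .179,  .072, -.372, 1.000,  .909,  .903],
    [-.046,  .098,  .044, -.443,  .909, 1.000,  .870],
    [-.064,  .026,  .012, -.443,  .903,  .870, 1.000],
])
n_components, avg_sq = velicer_map(R)
```

With m = 0 the average is simply the mean squared correlation among the original variables.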

Velicer's Minimum Average Partial (MAP) Test

Velicer's Average Squared Correlations

Step        Average squared correlation
.000000     .266624
1.000000    .440869
2.000000    .129252
3.000000    .170272
4.000000    .331686
5.000000    .486046
6.000000    1.000000

The smallest average squared correlation is .129252

The number of components is 2

Which Test to Use?

• Parallel analysis tends to overextract.

• MAP tends to underextract.

• If they disagree, increase the number of random sets in the parallel analysis and inspect carefully the two smallest values from the MAP test.

• May need to apply the meaningfulness criterion.

Loadings, Unrotated and Rotated

• loading matrix = factor pattern matrix = component matrix.

• Each loading is the Pearson r between one variable and one component.

• Since the components are orthogonal, each loading is also a β weight from predicting X from the components.

• Here are the unrotated loadings for our 2 component solution:

• All variables load well on first component, economy and quality vs. reputation.

• Second component is more interesting, economy versus quality.

Component Matrix(a)

            Component
            1        2
COLOR       .760     -.576
AROMA       .736     -.614
REPUTAT     -.735    -.071
TASTE       .710     -.646
COST        .550     .734
ALCOHOL     .632     .699
SIZE        .667     .675

Extraction Method: Principal Component Analysis.

a. 2 components extracted.

• Rotate these axes so that the two dimensions pass more nearly through the two major clusters (COST, SIZE, ALCOHOL and COLOR, AROMA, TASTE).

• The number of degrees by which I rotate the axes is the angle PSI. For these data, rotating the axes -40.63 degrees has the desired effect.
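The rotation can be reproduced by post-multiplying the unrotated loadings by a 2 × 2 rotation matrix. The sketch below uses the unrotated loadings from the Component Matrix and the stated angle of -40.63 degrees; the sign convention of the matrix `T` is an assumption, chosen here because it yields loadings that line up with the two clusters.

```python
import numpy as np

names = ["COLOR", "AROMA", "REPUTAT", "TASTE", "COST", "ALCOHOL", "SIZE"]
unrotated = np.array([
    [ .760, -.576],
    [ .736, -.614],
    [-.735, -.071],
    [ .710, -.646],
    [ .550,  .734],
    [ .632,  .699],
    [ .667,  .675],
])

psi = np.deg2rad(-40.63)                      # rotation angle PSI, in radians
T = np.array([[np.cos(psi), -np.sin(psi)],
              [np.sin(psi),  np.cos(psi)]])   # orthogonal rotation matrix
rotated = unrotated @ T

for name, (c1, c2) in zip(names, rotated):
    print(f"{name:8s} {c1:6.3f} {c2:6.3f}")
```

After rotation, COLOR, AROMA, and TASTE load almost entirely on the first component and COST, SIZE, and ALCOHOL on the second, while REPUTAT still loads negatively on both.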

• Component 1 = Quality versus reputation.

• Component 2 = Economy (or cheap drunk) versus reputation.

Rotated Component Matrix(a)

            Component
            1        2
TASTE       .960     -.028
AROMA       .958     1.E-02
COLOR       .952     6.E-02
SIZE        7.E-02   .947
ALCOHOL     2.E-02   .942
COST        -.061    .916
REPUTAT     -.512    -.533

Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.

a. Rotation converged in 3 iterations.

Number of Components in the Rotated Solution

• Try extracting one fewer component, try one more component.

• Which produces the more sensible solution?

• Error = difference in obtained structure and true structure.

• Overextraction (too many components) produces less error than underextraction.

• If there is only one true factor and no unique variables, can get “factor splitting.”

• In this case, the first unrotated factor approximates the true factor.

• But rotation splits the factor, producing an imaginary second factor and corrupting the first.

• Can avoid this problem by including a garbage variable that will be removed prior to the final solution.

Explained Variance

• Square the loadings and then sum them across variables.

• This gives, for each component, the amount of variance explained: the sum of squared loadings (SSL).

• Prior to rotation, these are the eigenvalues.

• Here are the SSL for our data, after rotation:

• After rotation the two components together account for (3.02 + 2.91) / 7 = 85% of the total variance.

Total Variance Explained: Rotation Sums of Squared Loadings

Component   Total    % of Variance   Cumulative %
1           3.017    43.101          43.101
2           2.912    41.595          84.696

Extraction Method: Principal Component Analysis.
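The sums of squared loadings can be checked against the rotated loadings reported earlier. This sketch squares the rotated loadings and sums down each column; because the printed loadings are rounded, the results agree with the SPSS totals only to about two decimals.

```python
import numpy as np

# Rotated loadings as printed in the Rotated Component Matrix
# (rows: TASTE, AROMA, COLOR, SIZE, ALCOHOL, COST, REPUTAT).
rotated = np.array([
    [ .960, -.028],
    [ .958,  .01],
    [ .952,  .06],
    [ .07,   .947],
    [ .02,   .942],
    [-.061,  .916],
    [-.512, -.533],
])
p = rotated.shape[0]

ssl = (rotated ** 2).sum(axis=0)       # sum of squared loadings per component
pct_variance = 100 * ssl / p           # percent of total variance per component
total_pct = pct_variance.sum()         # both components together
```

Rotation redistributes variance between the two components, but their combined share is the same as before rotation.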

• If the last component has a small SSL, one should consider dropping it.

• If SSL = 1, the component has extracted one variable’s worth of variance.

• If only one variable loads well on a component, the component is not well defined.

• If only two load well, it may be reliable, if the two variables are highly correlated with one another but not with other variables.

Naming Components

• For each component, look at how it is correlated with the variables.

• Try to name the construct represented by that factor.

• If you cannot, perhaps you should try a different solution.

• I have named our components “aesthetic quality” and “cheap drunk.”

Communalities

• For each variable, sum the squared loadings across components.

• This gives you the R² for predicting the variable from the components, which is the proportion of the variable’s variance that has been extracted by the components.

• Here are the communalities for our beer data. “Initial” is with all 7 components, “Extraction” is for our 2 component solution.

Communalities

            Initial   Extraction
COST        1.000     .842
SIZE        1.000     .901
ALCOHOL     1.000     .889
REPUTAT     1.000     .546
COLOR       1.000     .910
AROMA       1.000     .918
TASTE       1.000     .922

Extraction Method: Principal Component Analysis.
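The "Extraction" column can be recovered by squaring the loadings and summing across the two components for each variable. The sketch below uses the unrotated loadings printed in the Component Matrix; the rotated loadings would give the same communalities, since an orthogonal rotation does not change the row sums of squares.

```python
import numpy as np

names = ["COLOR", "AROMA", "REPUTAT", "TASTE", "COST", "ALCOHOL", "SIZE"]
unrotated = np.array([
    [ .760, -.576],
    [ .736, -.614],
    [-.735, -.071],
    [ .710, -.646],
    [ .550,  .734],
    [ .632,  .699],
    [ .667,  .675],
])

# Communality of each variable: sum of its squared loadings across components.
communality = (unrotated ** 2).sum(axis=1)

for name, h2 in zip(names, communality):
    print(f"{name:8s} {h2:.3f}")
```

REPUTAT has by far the lowest value, meaning the two components capture only a little over half of its variance.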

Orthogonal Rotations

• Varimax -- minimizes the complexity of the components by making the large loadings larger and the small loadings smaller within each component.

• Quartimax -- makes large loadings larger and small loadings smaller within each variable.

• Equamax -- a compromise between these two.
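To make the varimax idea concrete, here is a sketch of a commonly used SVD-based iteration for raw varimax. It is not SPSS's routine: it omits Kaiser normalization, so its result on the beer loadings should be close to, but not identical with, the rotated matrix shown earlier. The function names are made up for this example.

```python
import numpy as np

def varimax(A, max_iter=100, tol=1e-8):
    """Raw varimax rotation; returns (rotated loadings, rotation matrix)."""
    A = np.asarray(A, dtype=float)
    p, k = A.shape
    T = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        L = A @ T
        # Gradient-like target matrix for the varimax criterion.
        G = A.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                      # nearest orthogonal matrix to G
        if s.sum() < objective * (1 + tol):
            break                       # no meaningful improvement
        objective = s.sum()
    return A @ T, T

def simplicity(L):
    """Varimax criterion: summed variance of squared loadings within components."""
    return ((L ** 2).var(axis=0)).sum()

unrotated = np.array([
    [ .760, -.576], [ .736, -.614], [-.735, -.071], [ .710, -.646],
    [ .550,  .734], [ .632,  .699], [ .667,  .675],
])
rotated, T = varimax(unrotated)
```

Comparing `simplicity(rotated)` with `simplicity(unrotated)` shows how much the rotation spreads the squared loadings apart within each component.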

Oblique Rotations

• Axes drawn through the two clusters in the upper right quadrant would not be perpendicular.

• May better fit the data with axes that are not perpendicular, but at the cost of having components that are correlated with one another.

• More on this later.