Factor Analysis
Sociology 229, Class 10
Copyright © 2010 by Evan Schofer. Do not copy or distribute without permission
Announcements
• Assignment #6 due
• Assignments 4 & 5 handed back
• Today’s class:
– Guest: Andrew Penner: Quantile Regression
– Break
– Factor analysis
Factor Analysis
• Factor analysis is an exploratory tool
– Often called “Exploratory Factor Analysis”
• Helps identify simple patterns that underlie complex multivariate data
– Not about hypothesis testing
– Rather, it is more like data mining
• Also helps us understand some principles of SEM (structural equation modeling)
– Note: “factor analysis” is informally used to refer to two different methods:
• Factor analysis (FA)
• Principal component analysis (PCA)
• Differences aren’t critical here
– I will focus on FA
– Most of the lecture will also apply to PCA
Factor Analysis
• The basic idea: FA seeks to identify a small number of “underlying variables” that effectively summarize multivariate data
• Ex: Suppose we have many political opinion variables
– Approval of president; environmental views; etc.
• Perhaps one unmeasured “factor” accounts for people’s positions on all those variables…
– Ex: Liberalism vs. conservatism…
• FA seeks to identify common patterns
– But it is up to the researcher to determine what the underlying pattern really means…
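The basic idea can be written as the common factor model. This is a minimal statement under standard textbook assumptions that the slide does not spell out: standardized variables, uncorrelated unit-variance factors, and unique parts uncorrelated with the factors and with each other. Here λ_jk is the loading of variable j on factor k, and ψ_j is the unique variance of variable j.

```latex
\begin{aligned}
x_j &= \lambda_{j1}F_1 + \lambda_{j2}F_2 + \cdots + \lambda_{jm}F_m + \varepsilon_j,
  \qquad j = 1,\dots,p,\quad m \ll p \\
\operatorname{Var}(x_j) &= 1 = \underbrace{\textstyle\sum_{k=1}^{m}\lambda_{jk}^2}_{\text{communality}}
  + \underbrace{\psi_j}_{\text{uniqueness}} \\
\operatorname{Corr}(x_i, x_j) &= \textstyle\sum_{k=1}^{m}\lambda_{ik}\lambda_{jk} \quad (i \neq j),
  \qquad \operatorname{Corr}(x_j, F_k) = \lambda_{jk}
\end{aligned}
```

Under these assumptions, a few factors (m) can account for the correlations among many variables (p).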
Factor Analysis: ‘Depression’
• Suppose we believe in a theoretical construct such as “depression”.
• There is no single variable that perfectly measures it… but we believe it exists
• Hypothetical questions:
– HAPPY: How happy are you? (1-10)
– WORLDGOOD: How much do you agree with the statement that “The world is a good place”? (1-5)
– HOPELESS: Do you often feel hopeless? (1-5)
– SAD: Do you often feel sad? (1-5)
– TIRED: Do you often feel tired or discouraged? (1-10)
Example: ‘Depression’
• Strategy 1: We could ask many questions & create an index that combines all measures
– Note: we would have to flip signs on some measures
– “Happy” (and “WorldGood”) would have to be reversed to effectively measure ‘depression’
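Strategy 1 can be sketched in a few lines. The sketch reverse-codes HAPPY and WORLDGOOD, standardizes every item so that 1-5 and 1-10 scales get equal weight, and averages the results. The responses are made-up numbers for four hypothetical respondents. The helper names `reverse_code` and `zscore` are illustrative, not from any package.

```python
import numpy as np

def reverse_code(x, low, high):
    """Flip a scale so the lowest value becomes the highest (e.g., 1..10 -> 10..1)."""
    return (low + high) - np.asarray(x, dtype=float)

def zscore(x):
    """Standardize to mean 0, standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Hypothetical responses for four respondents (illustration only)
happy     = np.array([9, 2, 5, 7])   # 1-10; high = NOT depressed, so reverse
worldgood = np.array([5, 1, 3, 4])   # 1-5;  high = NOT depressed, so reverse
hopeless  = np.array([1, 5, 3, 2])   # 1-5
sad       = np.array([1, 5, 2, 2])   # 1-5
tired     = np.array([2, 9, 6, 3])   # 1-10

items = [reverse_code(happy, 1, 10), reverse_code(worldgood, 1, 5), hopeless, sad, tired]
# Average of standardized items: higher = more 'depressed'
depression_index = np.mean([zscore(v) for v in items], axis=0)
print(np.round(depression_index, 2))
```

This equal-weight index treats every item as equally good; factor analysis (Strategy 2) instead lets the data suggest the weights.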
• Strategy 2: We could ask many questions and then conduct a factor analysis
• To see if answers to questions exhibit an underlying pattern (which we could label “depression”).
Factor Analysis: Depression
• Hypothetical results from a factor analysis:
Factor Loadings
Variable      Factor 1   Factor 2
Happy           -.86        …
WorldGood       -.75        …
Hopeless         .92        …
Sad              .95        …
Tired            .71        …
A factor is a variable that explains a lot of the variance among the variables being analyzed (Happy, Sad, Hopeless, etc.)
Loadings are the correlations between each variable and the unobserved factor (for standardized variables and uncorrelated factors)…
The loadings tell you a lot about patterns of variation among cases… Notably, people who score high on “sad” & “hopeless” & “tired” tend to score very low on “happy” and “worldgood”, and vice versa…
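To see why loadings can be read as correlations, here is a small simulation. It assumes a one-factor model with standardized items and uses the hypothetical loadings from the table above. Items are generated from a known factor F, and each item's correlation with F is compared to its loading. The sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
# Hypothetical loadings from the table: Happy, WorldGood, Hopeless, Sad, Tired
loadings = np.array([-0.86, -0.75, 0.92, 0.95, 0.71])

F = rng.standard_normal(n)                  # the unobserved factor (variance 1)
unique_sd = np.sqrt(1 - loadings**2)        # unique part sized so each item has variance 1
X = F[:, None] * loadings + rng.standard_normal((n, loadings.size)) * unique_sd

# Each item's correlation with the factor approximately recovers its loading
corr_with_F = np.array([np.corrcoef(X[:, j], F)[0, 1] for j in range(loadings.size)])
print(np.round(corr_with_F, 2))

# Two items correlate with each other roughly as the product of their loadings
r_hopeless_sad = np.corrcoef(X[:, 2], X[:, 3])[0, 1]
print(round(r_hopeless_sad, 2))             # close to 0.92 * 0.95
```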
Factor Analysis: Depression
• Issue: It is wholly up to the researcher to interpret the factors
– We are just data mining…
– Ascribing meaning to factors requires much careful thought, ideally informed by theory…
Factor 1
Happy -.86
WorldGood -.75
Hopeless .92
Sad .95
Tired .71
What might factor 1 represent?
Does it seem like it captures “Depression”? Might it mean something else?
Factor Analysis: Depression
• Factor analysis is agnostic to the direction (sign) of factor variables… results might look like this:
Factor 1
Happy .86
WorldGood .75
Hopeless -.92
Sad -.95
Tired -.71
For all intents & purposes, these results are identical… but flipped
The factor is capturing the inverse of depression… (happiness?)
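A quick check that the two solutions really are equivalent. With one factor, the model reproduces the correlations only through the products of loadings (L Lᵀ). Negating every loading leaves those products unchanged, and the factor scores simply change sign as well. This is a minimal sketch using the hypothetical loadings from the slides.

```python
import numpy as np

L = np.array([[-0.86], [-0.75], [0.92], [0.95], [0.71]])  # Happy, WorldGood, Hopeless, Sad, Tired
L_flipped = -L                                            # the "flipped" solution

# Implied correlations between items (off-diagonal entries of L @ L.T)
implied = L @ L.T
implied_flipped = L_flipped @ L_flipped.T
print(np.allclose(implied, implied_flipped))              # both fit the data equally well
```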
Factor Analysis
• Things you can do with factor analysis:
• 1. Examine factor loadings
– Use them to interpret factors that are identified in the data
• 2. Plot factor loadings
– Vividly describe which variables “go together” (people who score high on one tend to score high on another, or vice versa)
• 3. Compute factor scores
– Estimate how individual cases score on underlying factors
– How depressed is each case?
• 4. Determine variation explained by factors
– See which factors account for the major patterns in your data
• 5. “Rotate” the factors
– Modify them to enhance interpretability… Will discuss later.
FA Example: Civic Engagement
• How do people participate in politics?
• Do people vary systematically in civic participation?
• Is there such a thing as “civic engagement”?
– A common pattern of behavior that appears in empirical data?
– World Values Survey data for the USA:
• Membership in civic groups
• Volunteering
• Participation in demonstrations
• Participation in strikes
• Participation in boycotts
• Signing petitions
• Occupying buildings
FA Example: Civic Engagement
• Factor analysis of US civic participation
. factor member volunteer petition boycott demonstrate strike occupybldg

Factor analysis/correlation                 Number of obs    =      1110
    Method: principal factors               Retained factors =         3
    Rotation: (unrotated)                   Number of params =        18

--------------------------------------------------------------------------
     Factor  |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
    Factor1  |      1.51105      0.71238            0.8319       0.8319
    Factor2  |      0.79867      0.67994            0.4397       1.2717
    Factor3  |      0.11872      0.20190            0.0654       1.3370
    Factor4  |     -0.08318      0.04249           -0.0458       1.2912
    Factor5  |     -0.12567      0.05446           -0.0692       1.2221
    Factor6  |     -0.18013      0.04305           -0.0992       1.1229
    Factor7  |     -0.22318            .           -0.1229       1.0000
--------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(21) = 1405.19 Prob>chi2 = 0.0000
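The initial output describes the process of factor extraction: identifying factors within the data. Stata identifies many factors (all possible patterns, one per variable, seven here) until it runs out of variation. But only factors with large eigenvalues explain a lot… By default, Stata retains the factors with positive eigenvalues, hence “Retained factors = 3”.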
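The Proportion and Cumulative columns can be reproduced from the eigenvalues alone. Each eigenvalue is divided by the sum of all seven. With principal factors, the diagonal of the correlation matrix is replaced by communality estimates, and that reduced matrix can have negative eigenvalues. This is why some proportions are negative and the cumulative share passes 1 before ending at exactly 1. A minimal sketch using the numbers from the output:

```python
import numpy as np

# Eigenvalues copied from the Stata output (principal factors, 7 variables)
eig = np.array([1.51105, 0.79867, 0.11872, -0.08318, -0.12567, -0.18013, -0.22318])

proportion = eig / eig.sum()        # share of the total (sum of all eigenvalues)
cumulative = np.cumsum(proportion)  # running total; ends at 1
retained = int((eig > 0).sum())     # number of factors with positive eigenvalues

for k, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"Factor{k}: proportion {p:.4f}, cumulative {c:.4f}")
print("retained:", retained)
```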
FA Example: Civic Engagement
• Output (cont’d)
Factor loadings (pattern matrix) and unique variances
-----------------------------------------------------------
    Variable |  Factor1   Factor2   Factor3 |   Uniqueness
-------------+------------------------------+--------------
      member |   0.7111   -0.5941    0.0984 |      0.1316
   volunteer |   0.6689   -0.6450    0.0939 |      0.1278
    petition |   0.3485    0.2288   -0.6927 |      0.3464
     boycott |   0.6350    0.3756   -0.2149 |      0.4095
 demonstrate |   0.6210    0.4021   -0.1098 |      0.4406
      strike |   0.4035    0.4387    0.4021 |      0.4830
  occupybldg |   0.2698    0.4038    0.5597 |      0.4509
-----------------------------------------------------------
Next, Stata reports the main factors it finds. Factor 1 explains the most variation; the others explain less…
Factor 1 correlates with ALL measures of civic participation. In other words, people tend to be high on all measures or low on all.
Is this “civic engagement”?
Factor 2: Some people are LOW on membership & volunteering and moderately high on demonstrations/strikes. Others are the converse…
Maybe some people are alienated or active in social movements?
Factor 3 finds that some people engage in strikes/occupation of buildings but do not sign petitions.
A bit hard to interpret… Focus your energies on the first few factors, which have big eigenvalues…
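The Uniqueness column follows from the loadings. For orthogonal factors, a variable's communality is the sum of its squared loadings, and its uniqueness is 1 minus that. The sketch below reproduces Stata's column, up to rounding, from the unrotated loadings above.

```python
import numpy as np

variables = ["member", "volunteer", "petition", "boycott", "demonstrate", "strike", "occupybldg"]
# Unrotated loadings on Factors 1-3, copied from the Stata output
loadings = np.array([
    [0.7111, -0.5941,  0.0984],
    [0.6689, -0.6450,  0.0939],
    [0.3485,  0.2288, -0.6927],
    [0.6350,  0.3756, -0.2149],
    [0.6210,  0.4021, -0.1098],
    [0.4035,  0.4387,  0.4021],
    [0.2698,  0.4038,  0.5597],
])
communality = (loadings**2).sum(axis=1)  # variance shared with the retained factors
uniqueness = 1 - communality             # variance the factors leave unexplained
for name, u in zip(variables, uniqueness):
    print(f"{name:12s} {u:.4f}")
```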
FA Example: Civic Engagement
• A visual representation of factor loadings
[Figure: “Factor loadings” plot, Factor 1 (x-axis) vs. Factor 2 (y-axis); points labeled member, volunteer, petition, boycott, demonstrate, strike, occupybldg]
Command: “loadingplot” (run after factor analysis)
Descriptive patterns emerge from the data
Membership & volunteering go together… but are far from strikes, protests, etc.
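One rough way to make “go together” concrete is to measure how far apart two variables sit in the Factor 1 × Factor 2 plane of the loading plot. This is a heuristic sketch. It uses the unrotated loadings from the output above and ignores Factor 3, so it only reflects what the two-dimensional plot shows. The helper `distance` is illustrative.

```python
import numpy as np

# (Factor 1, Factor 2) unrotated loadings from the output
points = {
    "member": (0.7111, -0.5941),
    "volunteer": (0.6689, -0.6450),
    "petition": (0.3485, 0.2288),
    "boycott": (0.6350, 0.3756),
    "demonstrate": (0.6210, 0.4021),
    "strike": (0.4035, 0.4387),
    "occupybldg": (0.2698, 0.4038),
}

def distance(a, b):
    """Euclidean distance between two variables in the Factor 1 x Factor 2 plane."""
    return float(np.hypot(*np.subtract(points[a], points[b])))

for other in points:
    if other != "member":
        print(f"member - {other}: {distance('member', other):.3f}")
```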
Factor Rotation
• Factors can be “rotated”
• Rotation = recalculating the factors so they are more distinct from one another (each variable tends to load strongly on only one factor)
• This can improve interpretability of factors
Rotated factor loadings (pattern matrix) and unique variances
-----------------------------------------------------------
    Variable |  Factor1   Factor2   Factor3 |   Uniqueness
-------------+------------------------------+--------------
      member |   0.8061    0.0974    0.0139 |      0.3405
   volunteer |   0.8055    0.0377   -0.0087 |      0.3497
    petition |   0.0615    0.3130   -0.1456 |      0.8771
     boycott |   0.1504    0.5724    0.0165 |      0.6494
 demonstrate |   0.1358    0.5614    0.0671 |      0.6619
      strike |   0.0371    0.3536    0.2421 |      0.8150
  occupybldg |  -0.0030    0.2439    0.2501 |      0.8780
-----------------------------------------------------------
Here, we see a clearer pattern… Factors 1 & 2 are more distinct. Factor 1 = civic membership; Factor 2 = protest/social movements, etc…
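For readers curious about what varimax does, here is a minimal sketch of the widely used SVD-based iteration. It is raw varimax with no Kaiser normalization, applied to the unrotated loadings from the output. It is not Stata's exact implementation, so the numbers need not match Stata's `rotate` output digit for digit. The function name `varimax` and its parameters are illustrative.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Raw orthogonal varimax rotation. Returns (rotated loadings, rotation matrix R)."""
    p, k = loadings.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient-like term of the varimax criterion (variance of squared loadings per factor)
        grad = loadings.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt                     # nearest orthogonal matrix, so R stays a pure rotation
        crit = s.sum()
        if crit_old != 0 and crit < crit_old * (1 + tol):
            break                      # criterion stopped improving: converged
        crit_old = crit
    return loadings @ R, R

# Unrotated loadings (Factors 1-3) from the Stata output
unrotated = np.array([
    [0.7111, -0.5941,  0.0984],   # member
    [0.6689, -0.6450,  0.0939],   # volunteer
    [0.3485,  0.2288, -0.6927],   # petition
    [0.6350,  0.3756, -0.2149],   # boycott
    [0.6210,  0.4021, -0.1098],   # demonstrate
    [0.4035,  0.4387,  0.4021],   # strike
    [0.2698,  0.4038,  0.5597],   # occupybldg
])
rotated, R = varimax(unrotated)
print(np.round(rotated, 3))
```

Because R is orthogonal, each variable's communality (and therefore its uniqueness) is unchanged by the rotation. Only the way its shared variance is split across factors changes.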
FA Example: Civic Engagement
• Let’s plot the rotated factor loadings:
Pattern is similar to unrotated… but rotation moves variables closer to the axes
[Figure: “Factor loadings” plot of rotated loadings, Factor 1 (x-axis) vs. Factor 2 (y-axis); points labeled member, volunteer, petition, boycott, demonstrate, strike, occupybldg. Rotation: orthogonal varimax; Method: principal factors]
Factor Scores
• Factors = variables…
– We can compute their value for a given case…
– Ex: How high do I score on F1 (depression)?
• Stata syntax: “predict f1 f2 f3…”
– If you only want scores from the first 2 factors, just list 2 variable names…
– Note: If done after rotation, scores will be based on the rotated factor loadings! Results will differ from unrotated scores
• This is a powerful way to create index variables…
– Ex: Depression. You could sum several variables to create an index…
– Or do a factor analysis and compute scores for a factor that appeared to reflect depression…
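A sketch of how regression-method factor scores can be computed. Stata's output reports “regression scoring assumed” for `predict`. In the regression method, the scoring coefficients are R⁻¹Λ, where R is the item correlation matrix and Λ the loadings, and scores are the standardized data times those coefficients. The data are simulated from one hypothetical factor. As a simplification, the loadings come from the leading eigenvector of R (PCA-style) rather than principal factors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000
true_loadings = np.array([0.8, 0.7, 0.9, 0.6, 0.75])   # hypothetical
F = rng.standard_normal(n)                              # the "true" factor, known only because we simulated it
X = F[:, None] * true_loadings + rng.standard_normal((n, 5)) * np.sqrt(1 - true_loadings**2)

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each item
R = np.corrcoef(Z, rowvar=False)           # 5 x 5 item correlation matrix

# Simplification: one-factor loadings from the leading eigenvector of R
eigvals, eigvecs = np.linalg.eigh(R)       # eigenvalues in ascending order
loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
if loadings.sum() < 0:                     # the sign is arbitrary; make loadings positive
    loadings = -loadings

# Regression scoring: coefficients = R^{-1} @ loadings, scores = Z @ coefficients
weights = np.linalg.solve(R, loadings)
scores = Z @ weights
print(round(np.corrcoef(scores, F)[0, 1], 3))   # how well scores track the true factor
```

In real data the true factor is unknown, so this last check is only possible in a simulation like this one.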
FA Example: Civic Engagement
• Factor scores from some sample cases:
. predict f1 f2 f3
(regression scoring assumed)

Scoring coefficients (method = regression; based on varimax rotated factors)

. list member volunteer f1 f2

     +-------------------------------------------+
     | member   volunt~r         f1          f2  |
     |-------------------------------------------|
  1. |      3          2   .3280279    .4303528  |
  2. |      1          0  -.6338809    -.305814  |
  3. |      3          3    .575327   -.8480528  |
  4. |      5          5    1.52282    .3150256  |
  5. |      7          3   1.450748    .4064942  |
  6. |      4          4   1.044003   -.4640276  |
  8. |      0          0  -.8484179    .5083777  |
  9. |      5          5   1.523822   -.9253936  |
 12. |      2          2   .1134908    1.244545  |
 13. |      1          0  -.6204671    .5076937  |
 14. |      5          4   1.276523     .353012  |
 15. |      7          5   1.956463   -.4956342  |
 16. |      9          1   1.374107   -.3197608  |
     +-------------------------------------------+
Cases that are high on membership & volunteering score very high on factor 1
FA Example: Civic Engagement
• Factor scores can also be plotted
This is most useful when you have a small number of cases… Ex: countries, which can be labeled on the plot
[Figure: “Score variables (factor)” plot, scores for factor 1 (x-axis) vs. scores for factor 2 (y-axis). Rotation: orthogonal varimax; Method: principal factors]
Stata: Loadingplots & scoreplots
• Notes:
• 1. Plots can be done of all factors…
– I’ve only shown the first two… to keep things simple
– Syntax: loadingplot, factors(3)
• 2. Case labels can be useful on scoreplots
– scoreplot, mlabel(countryid)
– Jitter can sometimes be useful, too…
• 3. Some software allows “biplots”
– Plotting loadings & scores together
– Helps uncover patterns in data
Example: Biplot
• Cross-national data on civic participation
[Figure: Biplot (axes F1 and F2: 74.71 %); F1 (58.36 %) on the x-axis, F2 (16.35 %) on the y-axis; labeled points for countries (e.g., united states, great britain, france, sweden, japan, East Germany, West Germany) and for variables doccupy, ddemon, dstrike, dboycott, dpetition, wtot, mtot]
Note that France falls near activities like “strikes”
The US is nearer to mtot (membership)
Factor Analysis: Methods
• There are MANY algorithms to extract & rotate factors
• A thorough discussion is beyond the scope of this class
• Some defaults (if you don’t choose):
– SPSS: Principal components extraction, varimax rotation
– Stata: Principal factors extraction; varimax rotation (when you run rotate)
• Results can vary if you use different methods…
– In practice, few people are skilled in choosing among methods… people mainly use defaults
– I recommend trying multiple methods to ensure that results are robust…
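One way to follow the “try multiple methods” advice is to extract loadings two ways and compare them. The sketch uses simulated one-factor data with hypothetical loadings and contrasts two methods. The first is principal-component loadings. The second is a one-step principal-factor analogue that puts squared multiple correlations on the diagonal; it is not Stata's exact implementation. The comparison uses Tucker's congruence coefficient, the cosine between two loading vectors. PCA loadings tend to be larger because PCA treats all variance, including unique variance, as shared.

```python
import numpy as np

def pca_loadings(R, m):
    """Principal-component loadings: top-m eigenvectors of R scaled by sqrt(eigenvalue)."""
    vals, vecs = np.linalg.eigh(R)
    idx = np.argsort(vals)[::-1][:m]
    return vecs[:, idx] * np.sqrt(vals[idx])

def principal_factor_loadings(R, m):
    """One-step principal factors: diagonal of R replaced by squared multiple correlations."""
    smc = 1 - 1 / np.diag(np.linalg.inv(R))       # initial communality estimates
    reduced = R.copy()
    np.fill_diagonal(reduced, smc)
    vals, vecs = np.linalg.eigh(reduced)
    idx = np.argsort(vals)[::-1][:m]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))

def congruence(a, b):
    """Tucker's congruence coefficient (cosine between two loading vectors)."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rng = np.random.default_rng(2)
lam = np.array([0.8, 0.7, 0.6, 0.5, 0.7, 0.6])      # hypothetical one-factor loadings
F = rng.standard_normal(3_000)
X = F[:, None] * lam + rng.standard_normal((3_000, lam.size)) * np.sqrt(1 - lam**2)
R = np.corrcoef(X, rowvar=False)

pc = pca_loadings(R, 1)[:, 0]
pf = principal_factor_loadings(R, 1)[:, 0]
print(np.round(pc, 2), np.round(pf, 2), round(abs(congruence(pc, pf)), 3))
```

A congruence near 1 suggests the pattern is robust to the choice of method, even when the sizes of the loadings differ.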
Wrap Up
• Discuss Factor Analysis reading, if time remains…