green acre 2007 mc a col
Correspondence analysis and Related Methods – Part 2
1. What is multiple correspondence analysis (MCA)?
2. Why is MCA so useful as a method of visualizing questionnaire data?
3. How is MCA implemented in XLSTAT?
• “Classical” or “simple” CA analyses the relationships between two variables, although the method is extended to analyse different forms of tabular data, for example the product–attribute data shown previously, as well as ratings and preferences, on an individual or aggregate level.
• Multiple CA analyses several categorical variables where we are interested in all the relationships within the set of variables, not between one set and another.
• The best way to understand the difference is to see the different data format for the MCA program in XLSTAT: these are individual-level responses to several questions.
[Figure: data layout, showing responses to four questions concerning working women alongside demographic categories]
Source: Family & Changing Gender Roles Survey, ISSP (1994)
Between-set versus within-set
• “Between-set” means that there are two sets of variables and we are interested in the relationships between them, e.g., between demographics and the question responses.
• “Within-set” means that there is one set of variables and we are interested in the relationships amongst them, e.g., amongst the question responses. This is the multiple correspondence analysis (MCA) case.
• Questions: Should a woman work full-time, work part-time or stay at home, or missing data [4 response categories]: (Q1) before she has children; (Q2) when she has a pre-school child; (Q3) when children are still at school; (Q4) when all children have left home.
• Demographics: Country [24], Sex [2], Age group [6]
Between-set example: Simple CA
Q3: Should a woman with a child at school work full-time, part-time or stay at home?
W = work full-time, w = work part-time, H = stay at home, ? = DK/unsure/missing

COUNTRY              W       w      H      ?   Total
AUS                256    1156    176    191    1779
DW                 101    1394    581    248    2324
DE                 278     691     62     66    1097
GB                 161     646     70    107     984
NIRL               126     394     75     52     647
USA                482     686    107    172    1447
A                   84     632    202     59     977
H                  285     736    447     32    1500
I                  171     670    167     10    1018
IRL                223     424    209     82     938
NL                 539    1205    143     81    1968
N                  487    1242    205    153    2087
S                  295     833     39    105    1272
CZ                 228     585    198     13    1024
SLO                341     428    222     41    1032
PL                 431     425    589    152    1597
BG                 270     427    335     94    1126
RUS                175    1154    550    119    1998
NZ                 120     754     72    101    1047
CDN                566     497    108    269    1440
IL                 468     664     92     63    1287
J                  203     671    313    120    1307
E                  738    1012    514    230    2494
RP                 243     448    484     25    1200
Total             7271   17774   5960   2585   33590
Average profile  0.216   0.529  0.177  0.077       1

Source: Family & Changing Gender Roles Survey, ISSP (1994)
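As an illustration of what simple CA computes from such a table, here is a minimal NumPy sketch. It assumes the usual formulation of CA as a singular value decomposition of the standardized residuals; the function name `simple_ca` is hypothetical and this is not XLSTAT's implementation. The counts are the Q3 country table from the slide.

```python
import numpy as np

# Q3 counts for the 24 countries; columns are W, w, H, ? (from the slide's table).
counts = np.array([
    [256, 1156, 176, 191], [101, 1394, 581, 248], [278, 691, 62, 66],
    [161, 646, 70, 107], [126, 394, 75, 52], [482, 686, 107, 172],
    [84, 632, 202, 59], [285, 736, 447, 32], [171, 670, 167, 10],
    [223, 424, 209, 82], [539, 1205, 143, 81], [487, 1242, 205, 153],
    [295, 833, 39, 105], [228, 585, 198, 13], [341, 428, 222, 41],
    [431, 425, 589, 152], [270, 427, 335, 94], [175, 1154, 550, 119],
    [120, 754, 72, 101], [566, 497, 108, 269], [468, 664, 92, 63],
    [203, 671, 313, 120], [738, 1012, 514, 230], [243, 448, 484, 25],
], dtype=float)

def simple_ca(table):
    """Return row masses, column masses and principal inertias of a two-way table."""
    P = table / table.sum()                              # correspondence matrix
    r = P.sum(axis=1)                                    # row masses
    c = P.sum(axis=0)                                    # column masses = average row profile
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    singular_values = np.linalg.svd(S, compute_uv=False)
    return r, c, singular_values ** 2                    # squared singular values = principal inertias

r, c, inertias = simple_ca(counts)
total_inertia = inertias.sum()
print("average profile:", np.round(c, 3))
print("principal inertias:", np.round(inertias, 4))
print("percentages of inertia:", np.round(100 * inertias / total_inertia, 1))
```

The printed principal inertias and percentages can be compared with the axis labels of the map on the next slide.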
Simple CA
Should a woman with a child at school work full-time, part-time or stay at home?
[Figure: simple CA map of the 24 countries and the four response categories (W, w, H, ?). Principal inertias: 0.0737 (50.6%) and 0.0532 (36.5%); 87.1% of inertia explained.]
Simple CA of multiway tables
Should a woman with a child at school work full-time, part-time or stay at home?
W = work full-time, w = work part-time, H = stay at home, ? = DK/unsure/missing

COUNTRY              W       w      H      ?   Total
AUSm               117     596    114     82     909
AUSf               138     559     60    109     866
DWm                 43     675    357    123    1198
DWf                 58     719    224    125    1126
. . .
RPm                347     445    294    111    1197
RPf                390     566    218    118    1292
Total             7271   17774   5960   2585   33590
Average profile  0.216   0.529  0.177  0.077       1

• Each country is split by gender: 24×2 country-gender groups. We say the variables country and gender are interactively coded.
• The average profile stays the same, so the definition of the centre and the geometric distance remain identical to the previous map; all that has been done is to split each country point into two profiles.
Simple CA of multiway tables
Should a woman with a child at school work full-time, part-time or stay at home?
[Figure: simple CA map with each country split into a male (m) and a female (f) point, e.g. AUSm and AUSf, together with the response categories W, w, H, ?. Principal inertias: 0.0797 (51.5%) and 0.0546 (35.3%); 86.8% of inertia explained.]
• Ireland (IRL) has the largest M–F difference.
• Bulgaria (BG) is the only country with a reverse M–F difference.
• Inertia before: 0.1456
• Inertia with M–F split: 0.1546
• 5.8% of the inertia is due to M–F differences: (0.1546 − 0.1456) / 0.1546 ≈ 0.058.
Simple CA of multiway tables
Should a woman with a child at school work full-time, part-time or stay at home?
• Interactive coding of country (24), gender (2) and age (6), giving 288 combinations.
• Points tend to lie in a curved pattern (called an arch or horseshoe).
• Points that lie inside the arch are polarized, e.g. PLm26-35: 32% W, 22% w, 32% H, but NZm>66: 7% W, 73% w, 15% H. Average: 22% W, 53% w, 18% H.
[Figure: simple CA map of the 288 country-gender-age groups and the response categories W, w, H, ?. Labelled points include CDNf<25, Hm>66, PLm>66, NZm>66, DEm<25 and PLm26-35. Principal inertias: 0.1301 (54.3%) and 0.0791 (33.0%); 87.3% of inertia explained.]
Stacked tables
Should a woman with a child at school work full-time, part-time or stay at home?
• Each variable is separately cross-tabulated with the question and the tables are then stacked one on top of another.
• Since the column margins of each table are identical (and the same as those of the interactively coded tables before), the basic geometry remains the same. It is just the detail that is sacrificed here: all the information is collapsed into “main effects”.
[Diagram: six tables stacked row-wise, each with columns W, w, H, ?: Country (24), Gender (2), Age (6), Education (7), Marital status (5), Social class (8)]
• Inertia of the stacked table is the average of the inertias of its subtables.
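The averaging property can be checked numerically. The sketch below uses made-up respondents (not the ISSP data) and two hypothetical demographics, and assumes every respondent appears in each subtable, so that all subtables share the same column margins.

```python
import numpy as np

def total_inertia(table):
    """Total inertia (chi-square statistic divided by the table total)."""
    P = table / table.sum()
    expected = np.outer(P.sum(axis=1), P.sum(axis=0))
    return ((P - expected) ** 2 / expected).sum()

def crosstab(row_codes, col_codes, n_rows, n_cols):
    """Contingency table of two integer-coded (0-based) categorical variables."""
    table = np.zeros((n_rows, n_cols))
    np.add.at(table, (row_codes, col_codes), 1)
    return table

rng = np.random.default_rng(0)
n = 2000
answer = rng.integers(0, 4, size=n)                 # hypothetical question, 4 categories
gender = rng.integers(0, 2, size=n)                 # hypothetical demographic, 2 categories
age = (answer + rng.integers(0, 4, size=n)) % 6     # 6 categories, made to depend on the answer

subtables = [crosstab(gender, answer, 2, 4), crosstab(age, answer, 6, 4)]
stacked = np.vstack(subtables)                      # stack the tables row-wise

average_of_subtables = np.mean([total_inertia(t) for t in subtables])
print(total_inertia(stacked), average_of_subtables)
```

The two printed values agree because the expected counts of each cell are unchanged by stacking, while the grand total is multiplied by the number of subtables.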
Stacked tables
• Tables can be stacked row-wise and column-wise, adding additional questions as columns.
[Diagram: row blocks Country (24), Gender (2), Age (6), Education (7), Marital status (5), Social class (8); four column blocks, each with W, w, H, ?, one per question: “Should a (married) woman before having children...”, “... with a preschool child...”, “... with a child at school...”, “... when her children have left home work full-time, part-time or stay at home?”]
• 24 contingency tables in a 6 × 4 pattern; tables in the same row block share row margins and tables in the same column block share column margins.
• Inertia of the stacked table is the average of the inertias of its subtables.
Stacked tables
Women in the workplace and 6 demographic variables
• Relationships between each demographic variable and each question are displayed jointly.
• Relationships within questions and relationships within demographics are not displayed explicitly.
• Join the categories of an ordinal variable to see trends, for example age.
[Figure: CA map of the stacked table, showing the question categories 1W to 4?, the 24 countries, gender (M, F), age groups (A1 to A6), marital status (ma, wi, di, se, si), education (E1 to E7) and social class (S0 to S6, S*). Principal inertias: 0.0188 (49.1%) and 0.0084 (21.9%); 71.0% of inertia explained.]
Multiple correspondence analysis (MCA)
Women in the workplace – 4 questions
West & East German samples only
• N rows, Q questions, the q-th question has J_q categories, and the total number of categories is J (N = 3415, Q = 4, J_q = 4 for all q, J = 16).
• Response data is recoded as dummy variables.
• One definition of MCA is that it is the CA of the indicator matrix.

Original data    Indicator matrix
Questions        Qu. 1     Qu. 2     Qu. 3     Qu. 4
1 2 3 4          W w H ?   W w H ?   W w H ?   W w H ?
-----------------------------------------------------
1 3 2 2          1 0 0 0   0 0 1 0   0 1 0 0   0 1 0 0
2 3 3 2          0 1 0 0   0 0 1 0   0 0 1 0   0 1 0 0
4 3 3 2          0 0 0 1   0 0 1 0   0 0 1 0   0 1 0 0
4 4 4 4          0 0 0 1   0 0 0 1   0 0 0 1   0 0 0 1
4 4 4 4          0 0 0 1   0 0 0 1   0 0 0 1   0 0 0 1
1 3 2 1          1 0 0 0   0 0 1 0   0 1 0 0   1 0 0 0
. . .            . . . and so on for 3415 rows
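The recoding into dummy variables can be sketched in a few lines of NumPy. The helper `indicator_matrix` is a hypothetical name, and the six response patterns are the ones listed on the slide (codes 1 to 4 standing for W, w, H, ?).

```python
import numpy as np

# The six response patterns listed on the slide, one row per respondent.
responses = np.array([
    [1, 3, 2, 2],
    [2, 3, 3, 2],
    [4, 3, 3, 2],
    [4, 4, 4, 4],
    [4, 4, 4, 4],
    [1, 3, 2, 1],
])

def indicator_matrix(codes, n_categories):
    """Recode an N x Q matrix of 1-based codes as an N x (Q * n_categories) 0/1 matrix."""
    n, q = codes.shape
    Z = np.zeros((n, q * n_categories), dtype=int)
    rows = np.repeat(np.arange(n), q)                            # row index of every answer
    cols = (np.arange(q) * n_categories + (codes - 1)).ravel()   # column of the chosen category
    Z[rows, cols] = 1
    return Z

Z = indicator_matrix(responses, n_categories=4)
print(Z)
```

Each row of `Z` contains exactly Q = 4 ones, one per question, which is what makes the total inertia of the indicator matrix depend only on J and Q.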
MCA: XLSTAT initial output
Total inertia: 3
Eigenvalues and percentages of inertia:

                          F1      F2      F3      F4      F5   ...  F12
Eigenvalue             0.692   0.513   0.365   0.307   0.218
Inertia (%)           23.061  17.108  12.156  10.248   7.254
Cumulative %          23.061  40.169  52.325  62.573  69.827
Adjusted Inertia       0.347   0.123   0.023   0.006
Adjusted Inertia (%)  66.152  23.482   4.456   1.118
Cumulative %          66.152  89.634  94.090  95.208

Total inertia in MCA of indicator matrix Z = $\frac{J - Q}{Q} = \frac{16 - 4}{4} = 3$
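The total inertia of 3 does not depend on the responses themselves: for an indicator matrix in which every category is observed, it equals (J − Q)/Q. A minimal check on randomly generated answers (hypothetical data, with Q = 4 questions of 4 categories each as on the slides) might look like this:

```python
import numpy as np

rng = np.random.default_rng(1)
N, Q, cats = 500, 4, 4                        # hypothetical sample size; Q and cats as in the slides
codes = rng.integers(0, cats, size=(N, Q))    # random 0-based answers, not the ISSP data

# Indicator matrix: one block of `cats` dummy columns per question.
Z = np.zeros((N, Q * cats))
Z[np.repeat(np.arange(N), Q), (np.arange(Q) * cats + codes).ravel()] = 1

# CA of Z: principal inertias are the squared singular values of the standardized residuals.
P = Z / Z.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
eigenvalues = np.linalg.svd(S, compute_uv=False) ** 2

J = Z.shape[1]
print("total inertia:", eigenvalues.sum(), "expected:", (J - Q) / Q)
print("non-zero dimensions:", int((eigenvalues > 1e-10).sum()))
```

With J − Q = 12 non-zero dimensions sharing a total of 3, the average eigenvalue is 1/Q = 0.25, which is why percentages of inertia in the indicator analysis look so low.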
Multiple correspondence analysis (MCA)
Women in the workplace – 4 questions
• If Z (N×J) is the indicator matrix, then the Burt matrix B (J×J) is B = Z^T Z.
• The Burt matrix is the stacked matrix of all two-way contingency tables, including each variable with itself.
• An alternative definition of MCA is that it is the CA of the Burt matrix.

Burt matrix
        1W    1w    1H    1?    2W    2w    2H    2?    3W    3w    3H    3?    4W    4w    4H    4?
1W    2500     0     0     0   172  1107  1130    91   355  1709   345    91  1766   537    40   157
1w       0   476     0     0     7   129   335     5    16   261   181    18   128   293    17    38
1H       0     0    79     0     1     6    72     0     1    17    61     0    14    21    38     6
1?       0     0     0   360     1    57   108   194     7    96    55   202    51    45     2   262
2W     172     7     1     1   181     0     0     0   127    48     4     2   165    15     0     1
2w    1107   129     6    57     0  1299     0     0   219   997    61    22   972   239    13    75
2H    1130   335    72   108     0     0  1645     0    24   988   573    60   760   615    84   186
2?      91     5     0   194     0     0     0   290     9    50     4   227    62    27     0   201
3W     355    16     1     7   127   219    24     9   379     0     0     0   360    14     1     4
3w    1709   261    17    96    48   997   988    50     0  2083     0     0  1348   566    23   146
3H     345   181    61    55     4    61   573     4     0     0   642     0   202   286    73    81
3?      91    18     0   202     2    22    60   227     0     0     0   311    49    30     0   232
4W    1766   128    14    51   165   972   760    62   360  1348   202    49  1959     0     0     0
4w     537   293    21    45    15   239   615    27    14   566   286    30     0   896     0     0
4H      40    17    38     2     0    13    84     0     1    23    73     0     0     0    97     0
4?     157    38     6   262     1    75   186   201     4   146    81   232     0     0     0   463
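The link between the two definitions of MCA can be illustrated numerically: the principal inertias of the Burt matrix are the squares of those of the indicator matrix. The sketch below uses randomly generated answers (hypothetical data) and a hypothetical helper `principal_inertias`.

```python
import numpy as np

def principal_inertias(table):
    """Principal inertias of the CA of a non-negative table (via SVD of standardized residuals)."""
    P = table / table.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.linalg.svd(S, compute_uv=False) ** 2

rng = np.random.default_rng(2)
N, Q, cats = 500, 4, 4                         # hypothetical sample, same layout as the slides
codes = rng.integers(0, cats, size=(N, Q))     # random 0-based answers

Z = np.zeros((N, Q * cats))                    # indicator matrix
Z[np.repeat(np.arange(N), Q), (np.arange(Q) * cats + codes).ravel()] = 1
B = Z.T @ Z                                    # Burt matrix: all two-way cross-tabulations

lam_Z = principal_inertias(Z)                  # eigenvalues of the indicator analysis
lam_B = principal_inertias(B)                  # eigenvalues of the Burt analysis
print(np.round(lam_Z[:3], 4), np.round(lam_B[:3], 4))
```

The diagonal blocks of `B` are diagonal matrices holding the category counts of each question, which is the source of the inflated total inertia discussed later.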
MCA (Burt matrix version)
Women in the workplace – 4 questions
• Relationships amongst (within) the set of questions are displayed jointly.
• Missing value categories have a strong association.
• Results are the same for the Burt matrix; only the principal inertias change.
[Figure: MCA map of the 16 response categories 1W to 4?. Principal inertias of the Burt matrix: 0.479 (41.9%) and 0.263 (23.0%); 64.9% of inertia explained (only 40.2% if the indicator matrix is analysed).]
Multiple correspondence analysis (MCA)
Women in the workplace – 4 questions

Burt matrix – inertias of each subtable
         Qu. 1   Qu. 2   Qu. 3   Qu. 4
Qu. 1    3.000   0.363   0.424   0.644
Qu. 2    0.363   3.000   0.892   0.345
Qu. 3    0.424   0.892   3.000   0.480
Qu. 4    0.644   0.345   0.480   3.000

• Total inertia of the Burt matrix is the average of the inertias of its submatrices = 1.143.
• Since the diagonal inertias are so high, this inflates the average, hence the low percentages.
• The percentage of variance explained is actually much higher: in MCA the overall inertia is inflated by the diagonal tables in the Burt matrix, and the percentage is actually about 90%.
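The averaging can be reproduced directly from the 4 × 4 grid of subtable inertias quoted on the slide. This is only arithmetic on the published values; the variable names are illustrative.

```python
import numpy as np

# Inertias of the 4 x 4 grid of subtables of the Burt matrix, as quoted on the slide.
subtable_inertias = np.array([
    [3.000, 0.363, 0.424, 0.644],
    [0.363, 3.000, 0.892, 0.345],
    [0.424, 0.892, 3.000, 0.480],
    [0.644, 0.345, 0.480, 3.000],
])
Q = subtable_inertias.shape[0]

total_burt = subtable_inertias.mean()                        # average over all 16 subtables
off_diagonal = subtable_inertias[~np.eye(Q, dtype=bool)]     # the 12 tables between different questions
average_off_diagonal = off_diagonal.mean()

print("total inertia of Burt matrix:", round(total_burt, 4))
print("average off-diagonal inertia:", round(average_off_diagonal, 4))
```

Each diagonal table cross-tabulates a question with itself and has inertia J_q − 1 = 3, far larger than any off-diagonal table, so the average over all 16 tables is dominated by the diagonal.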
Adjustment of principal inertias (eigenvalues)
We can rescale an existing MCA solution in order to best fit the off-diagonal tables. All we need is the total inertia of the Burt matrix, inertia(B), and the principal inertias $\lambda_k^2$ of the Burt matrix in the solution space.
If we have computed the solution on the indicator matrix Z (as in the MCA module of XLSTAT), the eigenvalues calculated are $\lambda_k$, so the squares of all the principal inertias of Z need to be summed in order to get inertia(B). If you have analysed the Burt matrix B, inertia(B) is the total inertia.
Here are the steps to rescale the solution:
1. Calculate the average off-diagonal inertia:
   $\text{average off-diagonal inertia} = \frac{Q}{Q-1}\left(\text{inertia}(B) - \frac{J-Q}{Q^2}\right)$
2. Calculate the adjusted principal inertias:
   $\text{adjusted principal inertias} = \left(\frac{Q}{Q-1}\right)^2\left(\lambda_k - \frac{1}{Q}\right)^2$ for only $\lambda_k > \frac{1}{Q}$
3. Calculate the adjusted percentages of inertia:
   $\text{adjusted percentages of inertia} = \frac{\text{adjusted principal inertias}}{\text{average off-diagonal inertia}}$
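The three rescaling steps can be sketched as a small function, applied here to the rounded eigenvalues from the XLSTAT output (0.692, 0.513, ...) with Q = 4, J = 16 and inertia(B) = 1.143. The function name `adjust_inertias` is hypothetical, and because the inputs are rounded the results match the XLSTAT figures only approximately.

```python
import numpy as np

def adjust_inertias(indicator_eigenvalues, Q, J, burt_inertia):
    """Adjusted principal inertias and percentages from indicator-matrix eigenvalues."""
    lam = np.asarray(indicator_eigenvalues, dtype=float)
    # Step 1: average off-diagonal inertia = Q/(Q-1) * (inertia(B) - (J-Q)/Q^2)
    average_off_diagonal = Q / (Q - 1) * (burt_inertia - (J - Q) / Q ** 2)
    # Step 2: (Q/(Q-1))^2 * (lambda_k - 1/Q)^2, only for lambda_k > 1/Q
    keep = lam > 1 / Q
    adjusted = (Q / (Q - 1)) ** 2 * (lam[keep] - 1 / Q) ** 2
    # Step 3: percentages relative to the average off-diagonal inertia
    percentages = 100 * adjusted / average_off_diagonal
    return adjusted, percentages

eigenvalues = [0.692, 0.513, 0.365, 0.307, 0.218]   # F1..F5 of the indicator analysis
adjusted, percentages = adjust_inertias(eigenvalues, Q=4, J=16, burt_inertia=1.143)
print(np.round(adjusted, 3))
print(np.round(percentages, 1))
```

The fifth eigenvalue (0.218) is below 1/Q = 0.25 and is therefore dropped, which is consistent with the XLSTAT table listing adjusted inertias for F1 to F4 only.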
MCA (adjusted)
Women in the workplace – 4 questions
[Figure: MCA map of the 16 response categories 1W to 4? with adjusted principal inertias 0.347 (66.2%) and 0.123 (23.5%); 89.7% inertia explained.]
MCA (Burt matrix version)
Women in the workplace – 4 questions
[Figure: MCA map of the 16 response categories 1W to 4? from the Burt matrix analysis. Principal inertias: 0.479 (41.9%) and 0.263 (23.0%); 64.9% inertia explained.]
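MCA
Women in the workplace – supplementary demographic groups
[Figure: MCA map with supplementary points for the two samples (DW, DE), gender (M, F), age groups (A1 to A6), education (E1 to E6, E*) and marital status (ma, wi, di, se, si); both axes run from -0.5 to 0.5.]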
Related topics
1. Subset correspondence analysis
   • restricting analysis to a subset of categories (e.g. all substantive responses excluding missing categories, or missing categories by themselves, or “middle” categories)
2. Square asymmetric tables
   • mobility tables, brand-switching, migration...
3. Recoding of data before applying CA
   • ratings, preferences, paired comparisons, continuous-scale data (ratio and interval)
4. Stability and inference
   • concentration ellipses, convex hulls, permutation tests
5. Canonical correspondence analysis (CCA)
   • CA with explanatory variables (combination of dimension reduction and regression)
Subset correspondence analysis
For example, analysing the women working data but ignoring the missing values. This is NOT just a CA of the table without the missing value columns: the masses and metric of the complete matrix are maintained. In XLSTAT’s MCA program you are given a menu for selecting which categories you want to retain or omit.
[Figure: subset MCA map of the 12 substantive response categories 1W to 4H, with the missing categories omitted. Principal inertias: 0.1240 (70.0%) and 0.0241 (13.5%).]
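The idea of keeping the masses and metric of the complete matrix can be sketched as follows: compute the standardized residuals from the full indicator matrix, then decompose only the columns of interest. This is an illustration on randomly generated answers (hypothetical data, with the last category of each question playing the role of “?”), not XLSTAT's implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
N, Q, cats = 400, 4, 4                         # hypothetical sample, same layout as the slides
codes = rng.integers(0, cats, size=(N, Q))     # random 0-based answers

Z = np.zeros((N, Q * cats))                    # complete indicator matrix
Z[np.repeat(np.arange(N), Q), (np.arange(Q) * cats + codes).ravel()] = 1

# Standardized residuals use the margins of the COMPLETE matrix.
P = Z / Z.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# Treat the last category of every question as the "missing" category.
is_missing = (np.arange(Q * cats) % cats) == cats - 1

# Subset analyses: decompose only the selected columns of S, margins unchanged.
substantive_inertias = np.linalg.svd(S[:, ~is_missing], compute_uv=False) ** 2
missing_inertias = np.linalg.svd(S[:, is_missing], compute_uv=False) ** 2

total_inertia = (S ** 2).sum()
print(substantive_inertias.sum(), missing_inertias.sum(), total_inertia)
```

Because the margins are not recomputed, the inertia of the substantive subset and the inertia of the missing subset add up to the total inertia of the complete analysis, which would not hold for a CA of the table with the missing columns simply deleted.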
Canonical correspondence analysis (CCA)
This has the same objective as CA but restricts the CA solution to be (linearly) related to external predictor variables. For example, we want to find the best low-dimensional view of the responses which is related to age (either age group or the original age variable).
Canonical correspondence analysis (restricted to age group differences)
[Figure: CCA map of the response categories Q1-1 to Q4-4 and the age groups agegp-1 to agegp-6. Axis labels: 0.685 (63.5%) and 0.465 (18.4%).]
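One common way to impose the linear restriction is to project the standardized residuals onto the space spanned by the (weighted, centred) predictors before taking the SVD. The sketch below assumes that formulation, uses randomly generated respondents (hypothetical data) and dummy-coded age groups as predictors, and is not a description of XLSTAT's implementation.

```python
import numpy as np

rng = np.random.default_rng(4)
N, Q, cats, n_age = 400, 4, 4, 6
age_group = rng.integers(0, n_age, size=N)     # hypothetical age-group codes 0..5
# Random answers, shifted for the older groups so that age has some effect.
codes = (rng.integers(0, cats, size=(N, Q)) + (age_group[:, None] >= 3)) % cats

Z = np.zeros((N, Q * cats))                    # indicator matrix of the responses
Z[np.repeat(np.arange(N), Q), (np.arange(Q) * cats + codes).ravel()] = 1

P = Z / Z.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))    # standardized residuals

X = np.eye(n_age)[age_group]                   # dummy-coded age groups (N x 6)
Xw = np.sqrt(r)[:, None] * (X - r @ X)         # centre with row masses, then weight
projector = Xw @ np.linalg.pinv(Xw)            # orthogonal projector onto the predictor space

S_constrained = projector @ S                  # part of S linearly related to age group
constrained_inertias = np.linalg.svd(S_constrained, compute_uv=False) ** 2

total_inertia = (S ** 2).sum()
print("constrained inertia:", constrained_inertias.sum(), "of total", total_inertia)
```

With six age groups the constrained solution has at most five dimensions, and its inertia is the part of the total inertia that can be attributed to age group differences.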