cross tabulation statistical analysis of categorical variables
Post on 28-Dec-2015
Cross Tabulation
Statistical Analysis of Categorical Variables
To date….
• We have examined statistical tests for differences of means, proportions, regression coefficients and correlation coefficients.
• These statistics are all measured at the interval level.
New Test…
• Now we wish to examine statistical tests for questions involving nominal and ordinal variables. To do so we introduce the Chi Square Test.
Cross Tabulation
• We are interested in counting the number of cases in each category of one variable within each category of a second variable, and….
• Implicitly, we are asking if there are differences in the patterns of the counts….
Cross Tabulation and Chi Square Test
A cross tabulation cross classifies one variable by another variable. Below is a cross classification of occupational groups and wards for the Simon data for 1905.
Frequencies
OCC$ (rows) by WARD (columns)
                  14      18      20      22     Total
           +-----------------------------------+
profcler   |       9      55      30      57   |   151
prop       |      27      33      54      45   |   159
skilled    |      90      16     149     114   |   369
skillpart  |      13      12      40      26   |    91
unskilled  |     175      12      71      46   |   304
           +-----------------------------------+
Total            314     128     344     288      1074
Cross Tabulation and Chi Square Test
We count the number of cases in each occupational category for each ward. At the edges of the table we total the rows and columns.
Graphic Illustration of the Counts of Occupational Groups by Ward
The Simplest Example….a 2 by 2 Table
• Do the opinions of men and women differ on the War in Iraq?
• Do the opinions of men and women differ on the importance of capturing Osama Bin Laden?
• Data: September 2006 ABC News Poll on the War on Terror, a sample of about 1,000 respondents.
Q12 War worth fighting NET
                                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Worth fighting NET              430      42.9         43.8              43.8
          Not worth fighting NET          552      55.0         56.2             100.0
          Total                           982      97.9        100.0
Missing   System Missing                    1        .1
          DK/No opinion                    20       2.0
          Total                            21       2.1
Total                                    1003     100.0
Basic Frequencies
Basic Frequencies Broken Down by Gender
Bars show counts
Another Graphical Illustration: 4 Bins of Counts
Tabular Data
Q12 War worth fighting NET * Q921 GENDER Crosstabulation
Count
                                 Q921 GENDER
Q12 War worth fighting NET       Male   Female   Total
Worth fighting NET                213      217     430
Not worth fighting NET            245      307     552
Total                             458      524     982
Some Terms and Assumptions
• Cell frequency: number in the body of the table
• Marginal total: total of the row or the column
• Row percent: the proportion of cases in the cell for the particular row
• Column percent: the proportion of cases in the cell for the particular column
• Expected frequency: the number of cases expected based upon the marginal proportions
• Deviation: the difference between the expected frequency and the actual frequency
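Tabular Data: Cell Frequency and Marginals
Q12 War worth fighting NET * Q921 GENDER Crosstabulation
Count
                                 Q921 GENDER
Q12 War worth fighting NET       Male   Female   Total
Worth fighting NET                213      217     430
Not worth fighting NET            245      307     552
Total                             458      524     982
• Cell Frequency: the four counts in the body of the table (213, 217, 245, 307)
• Marginals: the row totals (430, 552), the column totals (458, 524), and the grand total (982)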
Table Counts and Graph
[Graph of the four cell counts from the Q12 by gender crosstabulation: 213, 245, 217, 307]
Row Percents
Q12 War worth fighting NET * Q921 GENDER Crosstabulation
% within Q12 War worth fighting NET
                                 Q921 GENDER
Q12 War worth fighting NET       Male    Female    Total
Worth fighting NET               49.5%    50.5%   100.0%
Not worth fighting NET           44.4%    55.6%   100.0%
Total                            46.6%    53.4%   100.0%
Column Percents
Q12 War worth fighting NET * Q921 GENDER Crosstabulation
% within Q921 GENDER
                                 Q921 GENDER
Q12 War worth fighting NET       Male    Female    Total
Worth fighting NET               46.5%    41.4%    43.8%
Not worth fighting NET           53.5%    58.6%    56.2%
Total                           100.0%   100.0%   100.0%
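The row and column percents in these tables can be reproduced from the four cell counts. The following is a minimal sketch in plain Python; the variable names are illustrative and are not part of the original SPSS output.

```python
# Cell counts from the Q12 (rows) by gender (columns) crosstabulation.
row_labels = ["Worth fighting", "Not worth fighting"]
col_labels = ["Male", "Female"]
counts = [[213, 217],
          [245, 307]]

# Marginal totals: one per row, one per column, plus the grand total.
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)

# Row percent: the cell count as a share of its row total.
row_pct = [[100 * cell / row_totals[i] for cell in row]
           for i, row in enumerate(counts)]

# Column percent: the cell count as a share of its column total.
col_pct = [[100 * cell / col_totals[j] for j, cell in enumerate(row)]
           for row in counts]

for i, label in enumerate(row_labels):
    for j, col in enumerate(col_labels):
        print(f"{label:20s} {col:7s} count={counts[i][j]:4d} "
              f"row%={row_pct[i][j]:5.1f} col%={col_pct[i][j]:5.1f}")
```

Row percents answer "of those who gave this opinion, what share are men or women?", while column percents answer "of the men (or women), what share gave this opinion?".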
Frequencies, Row and Column Percents
Q12 War worth fighting NET * Q921 GENDER Crosstabulation
                                            Male    Female    Total
Worth fighting NET        Count              213      217      430
                          Row percent       49.5%    50.5%   100.0%
                          Column percent    46.5%    41.4%    43.8%
Not worth fighting NET    Count              245      307      552
                          Row percent       44.4%    55.6%   100.0%
                          Column percent    53.5%    58.6%    56.2%
Total                     Count              458      524      982
                          Row percent       46.6%    53.4%   100.0%
                          Column percent   100.0%   100.0%   100.0%
Q12 War worth fighting NET * Q921 GENDER Crosstabulation
                                            Male    Female    Total
Worth fighting NET        Count              213      217      430
                          Expected Count    200.5    229.5    430.0
Not worth fighting NET    Count              245      307      552
                          Expected Count    257.5    294.5    552.0
Total                     Count              458      524      982
                          Expected Count    458.0    524.0    982.0
New Concept: Expected Frequencies
• What would the counts in the cells be if gender had no impact on attitudes toward the Iraq War?
• The marginal proportions alone would determine the cell counts.
Expected Frequencies
• Row Total * Column Total/ Grand Total
• Or…
• Row Proportion * Column Total
• Or…
• Column Proportion * Row Total
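The three formulas above are algebraically the same calculation. The sketch below applies each one to the Iraq War by gender counts; the function names are made up for this illustration.

```python
# Observed counts: rows are "worth fighting" / "not worth fighting",
# columns are male / female.
counts = [[213, 217],
          [245, 307]]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)

def expected_from_totals(i, j):
    # Row Total * Column Total / Grand Total
    return row_totals[i] * col_totals[j] / n

def expected_from_row_proportion(i, j):
    # Row Proportion * Column Total
    return (row_totals[i] / n) * col_totals[j]

def expected_from_col_proportion(i, j):
    # Column Proportion * Row Total
    return (col_totals[j] / n) * row_totals[i]

expected = [[expected_from_totals(i, j) for j in range(len(col_totals))]
            for i in range(len(row_totals))]

for row in expected:
    print([round(value, 1) for value in row])
```

Rounded to one decimal place, the results should match the Expected Count rows of the crosstabulation (200.5, 229.5, 257.5, 294.5).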
Another Example: The Importance of Capturing Osama Bin Laden
Frequencies by Gender
Q.28 Do you think (the United States has to capture or kill Osama bin Laden for the war on terrorism to be a success), or do you think (the war on terrorism can be a success without Osama bin Laden being killed or captured)? * Q921 GENDER Crosstabulation
Count
                                                            Male   Female   Total
U.S. must capture/kill Osama bin Laden                       173      246     419
War can be a success without capturing/killing bin Laden     277      255     532
Total                                                        450      501     951
Row Percents
Q.28 (capture or kill Osama bin Laden) * Q921 GENDER Crosstabulation
% within Q.28
                                                            Male    Female    Total
U.S. must capture/kill Osama bin Laden                      41.3%    58.7%   100.0%
War can be a success without capturing/killing bin Laden    52.1%    47.9%   100.0%
Total                                                       47.3%    52.7%   100.0%
Column Percents
Q.28 (capture or kill Osama bin Laden) * Q921 GENDER Crosstabulation
% within Q921 GENDER
                                                            Male    Female    Total
U.S. must capture/kill Osama bin Laden                      38.4%    49.1%    44.1%
War can be a success without capturing/killing bin Laden    61.6%    50.9%    55.9%
Total                                                      100.0%   100.0%   100.0%
Expected Frequencies
Q.28 (capture or kill Osama bin Laden) * Q921 GENDER Crosstabulation
                                                                              Male    Female    Total
U.S. must capture/kill Osama bin Laden                      Count              173      246      419
                                                            Expected Count    198.3    220.7    419.0
War can be a success without capturing/killing bin Laden    Count              277      255      532
                                                            Expected Count    251.7    280.3    532.0
Total                                                       Count              450      501      951
                                                            Expected Count    450.0    501.0    951.0
Actual Frequencies, Expected Frequencies, and Deviations (Residual)
Q.28 (capture or kill Osama bin Laden) * Q921 GENDER Crosstabulation
                                                                              Male    Female    Total
U.S. must capture/kill Osama bin Laden                      Count              173      246      419
                                                            Expected Count    198.3    220.7    419.0
                                                            Residual          -25.3     25.3
War can be a success without capturing/killing bin Laden    Count              277      255      532
                                                            Expected Count    251.7    280.3    532.0
                                                            Residual           25.3    -25.3
Total                                                       Count              450      501      951
                                                            Expected Count    450.0    501.0    951.0
Chi Square
• Chi Square = Sum over all cells of [ (Expected – Observed)² / Expected Frequency ]
• Chi Square Table: http://www.uwm.edu/~renlex/chisquare.html
Examples of Chi Square Distribution
Degrees of Freedom for Chi Square
• Degrees of Freedom = (r-1)* (c-1)
• So, 2 by 2 table has 1 degree of freedom
• 3 by 2 table has (3-1)(2-1)= 2 degrees of freedom
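The degrees-of-freedom rule and the table lookup can be sketched as follows. The critical values are the standard upper-tail .05 values from a chi-square table; the dictionary holds only a few rows, and the helper names are assumptions made for this illustration.

```python
# Upper-tail critical values of the chi-square distribution at the .05 level,
# keyed by degrees of freedom (a partial copy of a standard table).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488,
               5: 11.070, 6: 12.592, 12: 21.026}

def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for an r by c table: (r - 1) * (c - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def is_significant_05(chi_square, n_rows, n_cols):
    """True if chi_square exceeds the .05 critical value for the table shape."""
    df = degrees_of_freedom(n_rows, n_cols)
    return chi_square > CRITICAL_05[df]

print(degrees_of_freedom(2, 2))   # 2 by 2 table
print(degrees_of_freedom(3, 2))   # 3 by 2 table
print(degrees_of_freedom(5, 4))   # 5 occupations by 4 wards
```

A computed chi-square is then compared with the critical value for its degrees of freedom: a statistic above the critical value leads to rejecting the null hypothesis of no relationship at the .05 level.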
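Calculations: Catching Osama bin Laden by Gender
• Each residual is ±25.3, so (Observed – Expected)² = 25.3² = 640.09 in every cell
• 640.09/198.3 = 3.23
• 640.09/220.7 = 2.90
• 640.09/251.7 = 2.54
• 640.09/280.3 = 2.28
• Chi Square (SUM) = 10.95
• (statistically significant at the .05 level: the critical value for 1 degree of freedom is 3.84)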
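The same sum can be computed directly from the observed counts. This is a sketch, not the SPSS procedure itself; `chi_square` is a name chosen for the illustration. Because it keeps the expected frequencies unrounded, the result comes out slightly below the hand-computed sum, which uses the residual rounded to 25.3.

```python
def chi_square(table):
    """Pearson chi-square: sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency from the marginals.
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total

# Rows: must capture/kill, can succeed without; columns: male, female.
osama = [[173, 246],
         [277, 255]]

chi2_osama = chi_square(osama)
print(round(chi2_osama, 2))
```

The function works for any number of rows and columns, since the number of terms in the sum always equals the number of cells.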
Attitudes toward Iraq War by Gender
Q12 War worth fighting NET * Q921 GENDER Crosstabulation
                                            Male    Female    Total
Worth fighting NET        Count              213      217      430
                          Expected Count    200.5    229.5    430.0
                          Residual           12.5    -12.5
Not worth fighting NET    Count              245      307      552
                          Expected Count    257.5    294.5    552.0
                          Residual          -12.5     12.5
Total                     Count              458      524      982
                          Expected Count    458.0    524.0    982.0
Calculations: Attitudes toward Iraq War by Gender
• 156.25/200.5 = .78
• 156.25/229.5 = .68
• 156.25/257.5 = .61
• 156.25/294.5 = .53
• Chi Square (SUM) = 2.60
• (not statistically significant at the .05 level: with 1 degree of freedom the critical value is 3.84)
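Instead of comparing with a table value, the probability of a chi-square this large under the null hypothesis can be computed. The sketch below relies on an identity that holds only for 1 degree of freedom (a chi-square with 1 degree of freedom is the square of a standard normal variable), so it applies to 2 by 2 tables only; `p_value_1df` is a name invented for this example.

```python
import math

def p_value_1df(chi_square):
    """Upper-tail probability for a chi-square statistic with 1 degree of freedom.

    Uses P(X > x) = erfc(sqrt(x / 2)), which is valid only when df = 1.
    """
    return math.erfc(math.sqrt(chi_square / 2.0))

# Chi-square for attitudes toward the Iraq War by gender (hand calculation above).
p_iraq = p_value_1df(2.60)
print(round(p_iraq, 3))

# Sanity check: the tabled critical value 3.841 should give a probability near .05.
print(round(p_value_1df(3.841), 3))
```

A probability above .05 is another way of stating that the statistic falls short of the critical value.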
Chi Square Test
• For a larger table, the calculation is the same, but the number of terms increases. The number of terms is equal to the number of cells.
Concentration of Occupational Groups by Ward
Cross Tabulation
• Are the occupational patterns different in the four wards?
• Or….are the patterns a result of chance? (null hypothesis)
• How would we decide?
Illustration: Frequencies and Marginals
Frequencies
OCC$ (rows) by WARD (columns)
                  14      18      20      22     Total
           +-----------------------------------+
profcler   |       9      55      30      57   |   151
prop       |      27      33      54      45   |   159
skilled    |      90      16     149     114   |   369   (Marginals)
skillpart  |      13      12      40      26   |    91
unskilled  |     175      12      71      46   |   304
           +-----------------------------------+
Total            314     128     344     288      1074
                       (Marginals)
Row and Column Percents

Row percents
OCC$ (rows) by WARD (columns)
                  14        18        20        22       Total       N
           +-------------------------------------------+
profcler   |     5.960    36.424    19.868    37.748   |  100.000    151
prop       |    16.981    20.755    33.962    28.302   |  100.000    159
skilled    |    24.390     4.336    40.379    30.894   |  100.000    369
skillpart  |    14.286    13.187    43.956    28.571   |  100.000     91
unskilled  |    57.566     3.947    23.355    15.132   |  100.000    304
           +-------------------------------------------+
Total           29.236    11.918    32.030    26.816      100.000
N                  314       128       344       288                1074

Column percents
OCC$ (rows) by WARD (columns)
                  14        18        20        22       Total       N
           +-------------------------------------------+
profcler   |     2.866    42.969     8.721    19.792   |   14.060    151
prop       |     8.599    25.781    15.698    15.625   |   14.804    159
skilled    |    28.662    12.500    43.314    39.583   |   34.358    369
skillpart  |     4.140     9.375    11.628     9.028   |    8.473     91
unskilled  |    55.732     9.375    20.640    15.972   |   28.305    304
           +-------------------------------------------+
Total          100.000   100.000   100.000   100.000      100.000
N                  314       128       344       288                1074
Expected and Actual Frequencies

Frequencies
OCC$ (rows) by WARD (columns)
                  14      18      20      22     Total
           +-----------------------------------+
profcler   |       9      55      30      57   |   151
prop       |      27      33      54      45   |   159
skilled    |      90      16     149     114   |   369
skillpart  |      13      12      40      26   |    91
unskilled  |     175      12      71      46   |   304
           +-----------------------------------+
Total            314     128     344     288      1074

Expected values
OCC$ (rows) by WARD (columns)
                  14        18        20        22
           +-----------------------------------------+
profcler   |     44.15     18.00     48.36     40.49 |
prop       |     46.49     18.95     50.93     42.64 |
skilled    |    107.88     43.98    118.19     98.95 |
skillpart  |     26.61     10.85     29.15     24.40 |
unskilled  |     88.88     36.23     97.37     81.52 |
           +-----------------------------------------+
Deviates: (Observed-Expected)
OCC$ (rows) by WARD (columns)
14 18 20 22
+-----------------------------------------+
profcler | -35.147 37.004 -18.365 16.508 |
prop | -19.486 14.050 3.073 2.363 |
skilled | -17.883 -27.978 30.810 15.050 |
skillpart | -13.605 1.155 10.853 1.598 |
unskilled | 86.121 -24.231 -26.371 -35.520 |
+-----------------------------------------+
Deviates
Case number OCC$ WARD FREQUENCY EXPECTED RESIDUAL CHITERM
1 profcler 14.000 9.000 44.147 -35.147 27.982
2 profcler 18.000 55.000 17.996 37.004 76.087
3 profcler 20.000 30.000 48.365 -18.365 6.973
4 profcler 22.000 57.000 40.492 16.508 6.730
5 prop 14.000 27.000 46.486 -19.486 8.168
6 prop 18.000 33.000 18.950 14.050 10.418
7 prop 20.000 54.000 50.927 3.073 0.185
8 prop 22.000 45.000 42.637 2.363 0.131
9 skilled 14.000 90.000 107.883 -17.883 2.964
10 skilled 18.000 16.000 43.978 -27.978 17.799
11 skilled 20.000 149.000 118.190 30.810 8.032
12 skilled 22.000 114.000 98.950 15.050 2.289
13 skillpart 14.000 13.000 26.605 -13.605 6.957
14 skillpart 18.000 12.000 10.845 1.155 0.123
15 skillpart 20.000 40.000 29.147 10.853 4.041
16 skillpart 22.000 26.000 24.402 1.598 0.105
17 unskilled 14.000 175.000 88.879 86.121 83.449
18 unskilled 18.000 12.000 36.231 -24.231 16.205
19 unskilled 20.000 71.000 97.371 -26.371 7.142
20 unskilled 22.000 46.000 81.520 -35.520 15.477
Calculations
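The twenty rows of the calculation table can be regenerated from the observed counts alone. This sketch builds the expected value, residual, and chi-square term for each cell and sums the terms; the tuple layout and variable names are choices made for the example. The CHITERM column sums to roughly 301, far above the .05 critical value of 21.03 for 12 degrees of freedom.

```python
occupations = ["profcler", "prop", "skilled", "skillpart", "unskilled"]
wards = [14, 18, 20, 22]
counts = [[9, 55, 30, 57],
          [27, 33, 54, 45],
          [90, 16, 149, 114],
          [13, 12, 40, 26],
          [175, 12, 71, 46]]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)

# One entry per cell: (occupation, ward, observed, expected, residual, chi term).
terms = []
for i, occ in enumerate(occupations):
    for j, ward in enumerate(wards):
        observed = counts[i][j]
        expected = row_totals[i] * col_totals[j] / n
        residual = observed - expected
        terms.append((occ, ward, observed, expected, residual,
                      residual ** 2 / expected))

chi_square = sum(term[-1] for term in terms)
df = (len(occupations) - 1) * (len(wards) - 1)

# The cell that contributes most to the statistic.
largest = max(terms, key=lambda term: term[-1])

print(f"chi-square = {chi_square:.2f} with {df} degrees of freedom")
print(f"largest term: {largest[0]} in ward {largest[1]} ({largest[-1]:.3f})")
```

Sorting the cells by their chi-square term shows which combinations of occupation and ward depart most from what the marginals alone would predict.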
Review: Terms and Assumptions
• Cell frequency: number in the body of the table
• Marginal total: total of the row or the column
• Row percent: the proportion of cases in the cell for the particular row
• Column percent: the proportion of cases in the cell for the particular column
• Expected frequency: the number of cases expected based upon the marginal proportions
• Deviation: the difference between the expected frequency and the actual frequency
Strength of Relationships
• Phi: Square root of (Chi Square / N)
• Cramer’s V: Square root of (Chi Square / (N * min(r-1, c-1)))
• Contingency Coefficient: Square root of (Chi Square / (Chi Square + N))
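The three measures can be sketched as small functions. The example plugs in the hand-computed chi-square of 2.60 and N = 982 from the Iraq War by gender table; the function names are illustrative.

```python
import math

def phi(chi_square, n):
    """Phi: square root of chi-square divided by the number of cases."""
    return math.sqrt(chi_square / n)

def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V: rescales by the smaller of (rows - 1) and (columns - 1)."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))

def contingency_coefficient(chi_square, n):
    """Contingency coefficient: sqrt(chi-square / (chi-square + N))."""
    return math.sqrt(chi_square / (chi_square + n))

phi_iraq = phi(2.60, 982)
v_iraq = cramers_v(2.60, 982, 2, 2)
c_iraq = contingency_coefficient(2.60, 982)

print(round(phi_iraq, 3), round(v_iraq, 3), round(c_iraq, 3))
```

For a 2 by 2 table, min(r-1, c-1) is 1, so Cramer's V equals Phi. Values near zero, as here, indicate a weak relationship.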