wf ed 540, class meeting 6, contingency tables, 2016
TRANSCRIPT
Contingency Tables
DATA ANALYSIS
27 SEPTEMBER 2016
A refresher: Hypothesis testing
THEORY, PROPOSITIONS, LOGIC
Language of hypothesis testing…
Hypotheses are “tested”
Hypotheses are never “proved”
Hypotheses only are “rejected”
Theories are built and verified by testing hypotheses
Decision-by-truth table

                       Truth
Decision               Ho true         Ho false
Fail to reject Ho      (no error)      Type 2 error
Reject Ho              Type 1 error    (no error)

TRADITIONALLY, probability of Type 1 error set at .05
Minimize Type 2 error by increasing sample size
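To make the sample-size point concrete, here is a minimal sketch using power.prop.test() from base R's stats package. The two proportions (0.13 and 0.17) and the per-group sample sizes are made-up values for illustration and do not come from the course data. Power is the probability of rejecting Ho when it is false, so the Type 2 error rate is 1 minus power.

```r
# Hypothetical scenario: two groups whose true proportions are 0.13 and 0.17.
# These values and the sample sizes are assumptions for illustration only.
sample_sizes <- c(100, 400, 1600)   # per-group sample sizes

# Power of a two-sample test of proportions at the traditional alpha of .05
power_at_n <- sapply(sample_sizes, function(n) {
  power.prop.test(n = n, p1 = 0.13, p2 = 0.17, sig.level = 0.05)$power
})

# Type 2 error rate: probability of failing to reject Ho when Ho is false
type2_error <- 1 - power_at_n

data.frame(n_per_group = sample_sizes,
           power       = round(power_at_n, 3),
           type2_error = round(type2_error, 3))
```

With the Type 1 error rate held at .05, the Type 2 error rate in this sketch shrinks as the per-group sample size grows.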
Contingency tables
Also known as CROSSTABULATIONS
What is a contingency table?
A contingency table is a table of counts.
A two-dimensional contingency table is formed by classifying subjects by two variables.
One variable identifies the row categories; the other variable defines the column categories.
The combinations of row and column categories are called cells.
Structure of rows-by-column contingency table…

            Column 1    Column 2
Row 1       R1; C1      R1; C2      R1 tot
Row 2       R2; C1      R2; C2      R2 tot
            C1 tot      C2 tot      Total
Data from NLSY79
Example of contingency table…

                                 Male                    Female
2005 Household Not in Poverty    Males not in poverty    Females not in poverty    Nopov total
2005 Household In Poverty        Males in poverty        Females in poverty        Pov total
                                 Male total              Female total              Total

Marginals: the row totals (right-hand column) and the column totals (bottom row)
R analysis for cell counts
R script:
Console output:
Example of contingency table…

                                 Male          Female
2005 Household Not in Poverty    3,086         3,039           Nopov total
2005 Household In Poverty        443           623             Pov total
                                 Male total    Female total    Total
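The script that produced these cell counts is not part of this transcript. As a stand-in, the sketch below enters the four counts from the slide by hand. The object name counts and the dimension labels are assumptions chosen for readability. With respondent-level NLSY79 data, the same table would more typically come from calling table() on the poverty and gender variables.

```r
# Cell counts typed in from the slide, row by row.
# The object name and dimension labels are illustrative assumptions.
counts <- matrix(c(3086, 3039,
                    443,  623),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Poverty = c("Not in poverty", "In poverty"),
                                 Gender  = c("Male", "Female")))

# Convert to a table object so it prints like a crosstabulation
counts <- as.table(counts)
counts
```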
R analysis for marginal counts
R script:
Console output:
Example of contingency table…

                                 Male     Female
2005 Household Not in Poverty    3,086    3,039     6,125
2005 Household In Poverty          443      623     1,066
                                 3,529    3,662     7,191
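The marginal counts can be reproduced from the four cell counts alone. The sketch below is one possible way to do it with base R; it is not the script shown in class, and the object names are assumptions.

```r
# Cell counts from the slide (object name and labels are assumptions)
counts <- matrix(c(3086, 3039,
                    443,  623),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Poverty = c("Not in poverty", "In poverty"),
                                 Gender  = c("Male", "Female")))

row_totals  <- rowSums(counts)   # poverty-status marginals
col_totals  <- colSums(counts)   # gender marginals
grand_total <- sum(counts)       # total number of respondents in the table

# addmargins() appends the marginals to the table in one step
addmargins(as.table(counts))
```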
Research question

                                 Male     Female
2005 Household Not in Poverty    3,086    3,039     6,125
2005 Household In Poverty          443      623     1,066
                                 3,529    3,662     7,191

Is gender independent of household poverty status?
If you know a person’s gender, can you predict poverty status?
If you know a person’s poverty status, can you predict gender?
Under the null hypothesis…

                                 Male                              Female
2005 Household Not in Poverty    3,086 (expected value is 3006)    3,039     6,125
2005 Household In Poverty          443                               623     1,066
                                 3,529                             3,662     7,191

A cell value should be equal to (row total x column total) ÷ total
E.g., the Male, Not in Poverty cell should be equal to (6125 x 3529) ÷ 7191 ≈ 3006, but the observed count is 3,086
An expected cell count is a hypothetical count that would occur if there is no relationship between the two variables
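The expected-count rule, (row total x column total) ÷ total, can be applied to every cell at once. The following is a minimal sketch using the counts from the slide; the object names are assumptions, and outer() is just one convenient way to form all row-total by column-total products.

```r
# Observed cell counts from the slide (object name and labels are assumptions)
counts <- matrix(c(3086, 3039,
                    443,  623),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Poverty = c("Not in poverty", "In poverty"),
                                 Gender  = c("Male", "Female")))

# Expected count for each cell under the null hypothesis of independence:
# (row total x column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

round(expected)      # expected counts, rounded to whole respondents
counts - expected    # observed minus expected, cell by cell
```

The Male, Not in Poverty entry rounds to 3006, matching the expected value quoted on the slide.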
test of independence

A χ² value is the sum, over all cells, of the squared deviation of observed minus expected, divided by the expected value:
χ² = Σ (observed − expected)² ÷ expected
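The definition can be followed step by step. The sketch below computes the statistic by hand from the slide's counts, assuming the plain Pearson form with no continuity correction; the object names are illustrative.

```r
# Observed cell counts from the slide (object name and labels are assumptions)
counts <- matrix(c(3086, 3039,
                    443,  623),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Poverty = c("Not in poverty", "In poverty"),
                                 Gender  = c("Male", "Female")))

# Expected counts under independence
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Each cell's contribution: (observed - expected)^2 / expected
contributions <- (counts - expected)^2 / expected

# The statistic is the sum of the four contributions
chi_sq <- sum(contributions)
chi_sq
```

The two In Poverty cells contribute most of the total, because the same absolute deviation is divided by a much smaller expected count.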
Hypothesis tested about …
Null hypothesis is H0: R x C = 0
Alternate hypothesis is H1: R x C ≠ 0
α = .05
Described as a test of independence
Calculating in R… it’s simple
R script:
Console output:
Degrees of freedom (df) = (# rows – 1)(# columns – 1)
p-value < .05, so reject null
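The script and console output from the slide are not captured in this transcript, so the sketch below is a reconstruction under assumptions, not the instructor's code. It uses chisq.test() from base R on the slide's counts. Note that for a 2 x 2 table chisq.test() applies Yates' continuity correction by default; setting correct = FALSE gives the uncorrected Pearson statistic that matches the hand calculation. Which variant the slide showed is not known.

```r
# Observed cell counts from the slide (object name and labels are assumptions)
counts <- matrix(c(3086, 3039,
                    443,  623),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Poverty = c("Not in poverty", "In poverty"),
                                 Gender  = c("Male", "Female")))

# Degrees of freedom by the rule (# rows - 1)(# columns - 1)
df_by_rule <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Uncorrected Pearson chi-square test of independence
result_pearson <- chisq.test(counts, correct = FALSE)

# Default call: applies Yates' continuity correction for 2 x 2 tables
result_default <- chisq.test(counts)

result_pearson
result_default

# Decision at alpha = .05
reject_null <- result_pearson$p.value < 0.05
reject_null
```

Both variants lead to the same decision for these counts: the p-value is well below .05, so the null hypothesis of independence is rejected.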
test of independence
A test of the hypothesis that rows and columns in a table are independent
In our case, a test of the independence of gender and poverty status reveals
• Household poverty status and gender are not independent
• Knowing household poverty status helps predict gender
But how much?