ppa 415 – research methods in public administration lecture 9 – bivariate association
Post on 21-Dec-2015
PPA 415 – Research Methods in Public Administration
Lecture 9 – Bivariate Association
Statistical Significance and Theoretical Significance
Tests of significance detect nonrandom relationships.
Measures of association go one step farther and provide information on the strength and direction of relationships.
Statistical Significance and Theoretical Significance
Measures of association allow us to achieve two goals:
- Trace causal relationships among variables. But they cannot prove causality.
- Predict scores on the dependent variable based on information from the independent variable.
Bivariate Association and Bivariate Tables
Two variables are said to be associated if the distribution of one of them changes under the various categories or scores of the other.
- Liberals are more likely to vote for Democratic candidates than for Republican candidates.
- Presidents are more likely to grant disaster assistance when deaths are involved than when there are no deaths.
Bivariate Association and Bivariate Tables
Bivariate tables are devices for displaying the scores of cases on two different variables.
- The independent (X) variable goes in the columns.
- The dependent (Y) variable goes in the rows.
- Each column represents a category of the independent variable.
- Each row represents a category of the dependent variable.
- Each cell represents the cases that fall into that combination of categories.
Bivariate Association and Bivariate Tables
Table 1. Home Buying Plans by Race, Jefferson County Housing Authority, 1999
Rows: Do you plan to buy a home within the next five years? Columns: Race.

                                White     Black     Other     Total
No       Count                     81        37         0       118
         % within Race          79.4%     50.0%       .0%     66.7%
Yes      Count                     21        37         1        59
         % within Race          20.6%     50.0%    100.0%     33.3%
Total    Count                    102        74         1       177
         % within Race         100.0%    100.0%    100.0%    100.0%
Each column's frequency distribution is called a conditional distribution of Y.
Bivariate Association and Bivariate Tables
Often you will calculate a chi-square when you generate a table. If chi-square is zero, there is no association.
Usually, however, chi-square is positive to some degree.
Statistical significance is not the same as association. Establishing significance is, however, usually the first step before determining the strength of an association.
Three Characteristics of Bivariate Associations
Does an association exist?
- Statistical significance is the first step.
- Calculate column percentages and compare across conditional distributions.
- If there is an association, the largest cell will change from column to column.
- If there is no association, the conditional percentages will not change.
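The comparison of conditional distributions can be sketched in Python. This is only an illustration using the counts from Table 1; the function name `column_percentages` and the dictionary layout are assumptions made for the example, not part of the lecture.

```python
# Counts from Table 1 (rows: No, Yes; columns: White, Black, Other).
counts = {
    "No": [81, 37, 0],
    "Yes": [21, 37, 1],
}


def column_percentages(table):
    """Return each cell as a percentage of its column total."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    return {
        label: [100.0 * n / total for n, total in zip(row, col_totals)]
        for label, row in table.items()
    }


percentages = column_percentages(counts)
for label, row in percentages.items():
    # Each printed row is one category of Y across the categories of X.
    print(label, [f"{p:.1f}%" for p in row])
```

Reading across a printed row shows how the percentage answering "No" or "Yes" shifts from one racial category to the next, which is the change in conditional distributions the slide describes.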
Three Characteristics of Bivariate Associations
How strong is the association?
- Once we establish the existence of an association, we need to determine how strong it is.
- This is a matter of measuring the changes across conditional distributions.
- No association: no change in column percentages.
- Perfect association: each value of the independent variable is associated with one and only one value of the dependent variable.
- The vast majority of relationships fall in between.
Three Characteristics of Bivariate Associations
How strong is the association (contd.)?
- Virtually all statistics of association are designed to vary between 0 for no association and +1 for perfect association (±1 for ordinal and interval data).
- The meaning of the statistics varies a little from statistic to statistic, but 0 signifies no association and 1 signifies perfect association in all cases.
Three Characteristics of Bivariate Associations
What is the pattern and/or direction of the association?
- Pattern is determined by examining which categories of X are associated with which categories of Y.
- Direction only matters for ordinal and interval-ratio data.
- In a positive association, low values on one variable are associated with low values on the other, and high with high.
- In a negative association, low values on one variable are associated with high values on the other, and vice versa.
Three Characteristics of Bivariate Associations
7-PT SCALE PARTY IDENTIFICATION * LIBERAL-CONSERVATIVE 7PT SCALE Crosstabulation
Rows: 7-PT SCALE PARTY IDENTIFICATION. Columns: LIBERAL-CONSERVATIVE 7PT SCALE. Each row shows the count, with the column percentage on the line beneath.

                               Extremely                    Slightly     Moderate,      Slightly                   Extremely
                                 liberal       Liberal       liberal     middle of  conservative  Conservative  conservative         Total
                                                                          the road
Strong Democrat                      172           752           548          1095           317           301            85          3270
                                   40.0%         38.2%         21.7%         16.1%          8.1%          8.5%         14.2%         16.5%
Weak Democrat                         79           438           723          1592           653           358            58          3901
                                   18.4%         22.3%         28.6%         23.4%         16.6%         10.1%          9.7%         19.7%
Independent - Democrat               104           417           536          1003           335           188            32          2615
                                   24.2%         21.2%         21.2%         14.7%          8.5%          5.3%          5.4%         13.2%
Independent - Independent             41           144           200           911           363           231            49          1939
                                    9.5%          7.3%          7.9%         13.4%          9.2%          6.5%          8.2%          9.8%
Independent - Republican              13            86           206           793           697           555            76          2426
                                    3.0%          4.4%          8.2%         11.6%         17.7%         15.7%         12.7%         12.3%
Weak Republican                       13            72           230           990          1003           670            77          3055
                                    3.0%          3.7%          9.1%         14.5%         25.5%         19.0%         12.9%         15.4%
Strong Republican                      8            58            81           425           561          1231           220          2584
                                    1.9%          2.9%          3.2%          6.2%         14.3%         34.8%         36.9%         13.1%
Total                                430          1967          2524          6809          3929          3534           597         19790
                                  100.0%        100.0%        100.0%        100.0%        100.0%        100.0%        100.0%        100.0%
Three Characteristics of Bivariate Associations
President's recommendation * Snow and ice/ Winter storm Crosstabulation
Rows: President's recommendation. Columns: Snow and ice/ Winter storm.

                                                                        No      Yes    Total
Turndown                    Count                                      100       31      131
                            % within Snow and ice/ Winter storm      33.2%    44.3%    35.3%
Emergency Declaration       Count                                       49       13       62
                            % within Snow and ice/ Winter storm      16.3%    18.6%    16.7%
Major disaster declaration  Count                                      152       26      178
                            % within Snow and ice/ Winter storm      50.5%    37.1%    48.0%
Total                       Count                                      301       70      371
                            % within Snow and ice/ Winter storm     100.0%   100.0%   100.0%
Nominal Association – Chi-Square Based
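For 2 x 2 tables, phi:
\[ \phi = \sqrt{\frac{\chi^2}{N}} \]
For tables larger than 2 x 2, Cramer's V:
\[ V = \sqrt{\frac{\chi^2}{N \cdot \min(r - 1,\; c - 1)}} \]
where min(r - 1, c - 1) is the minimum of the number of rows minus 1 and the number of columns minus 1.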
The five-step test of significance for these measures is conducted using chi-square.
Nominal Association – Chi-Square Based
Snow and ice/ Winter storm * Presidential administration Crosstabulation
Rows: Snow and ice/ Winter storm. Columns: Presidential administration.

                                                 Gerald R. Ford    Jimmy Carter           Total
No       Count                                              121             181             302
         % within Presidential administration             94.5%           73.9%           81.0%
Yes      Count                                                7              64              71
         % within Presidential administration              5.5%           26.1%           19.0%
Total    Count                                              128             245             373
         % within Presidential administration            100.0%          100.0%          100.0%
Nominal Association – Chi-Square-Based
Chi-Square Tests

                                                 Asymp. Sig.   Exact Sig.   Exact Sig.
                                   Value    df     (2-sided)    (2-sided)    (1-sided)
Pearson Chi-Square             23.271(b)     1          .000
Continuity Correction(a)       21.950        1          .000
Likelihood Ratio               27.380        1          .000
Fisher's Exact Test                                                  .000         .000
Linear-by-Linear Association   23.209        1          .000
N of Valid Cases                  373

a. Computed only for a 2x2 table.
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 24.36.
\[ V = \sqrt{\frac{\chi^2}{N \cdot \min(r - 1,\; c - 1)}} = \sqrt{\frac{23.271}{373}} = \sqrt{.0624} = .250 \]
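The chi-square and V values for the Ford/Carter snow and ice table can be checked with a short Python sketch. It uses only the standard library, and the function names `chi_square` and `cramers_v` are hypothetical names chosen for this example. It computes the uncorrected Pearson chi-square and assumes every expected count is greater than zero.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for k, observed in enumerate(row):
            # Expected count under no association.
            expected = row_totals[i] * col_totals[k] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramer's V; for a 2 x 2 table this equals phi."""
    chi2, n = chi_square(table)
    smaller_dim = min(len(table) - 1, len(table[0]) - 1)
    return sqrt(chi2 / (n * smaller_dim))


# Rows: snow/ice No, Yes; columns: Ford, Carter.
storms = [[121, 181], [7, 64]]
chi2, n = chi_square(storms)
v = cramers_v(storms)
print(f"chi-square = {chi2:.3f}, N = {n}, V = {v:.3f}")
```

Because min(r - 1, c - 1) is 1 for a 2 x 2 table, the same function covers both phi and V.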
Nominal Association – Proportional Reduction in Error
The logic of proportional reduction in error (PRE) involves first attempting to guess or predict the category into which each case will fall on the dependent variable without using information from the independent variable.
The second step involves using the information on the conditional distribution of the dependent variable within categories of the independent variable to reduce errors in prediction.
Nominal Association – Proportional Reduction in Error
If there is a strong association, there will be a substantial reduction in error from knowing the joint distribution of X and Y.
If there is no association, there will be no reduction in error.
Nominal Association – PRE - Lambda
\[ \lambda = \frac{E_1 - E_2}{E_1} \]
Prediction rule: predict that all cases fall in the largest category.
Where E1 is the number of prediction errors made without knowing X, and E2 is the number of prediction errors made knowing the distribution on X.
Nominal Association – PRE - Lambda
E1 is calculated by subtracting the cases in the largest category of the row marginals from the total number of cases.
E2 is calculated by subtracting the largest category in each column from the column total and summing across columns of the independent variable.
Nominal Association – PRE - Lambda
Table 2. Home Ownership by Race in Birmingham, 2000
Rows: Do you rent, lease or own your current residence? Columns: Race (Dichotomous).

                                                  White  Non-white      Total
Rent or lease  Count                                 21         81        102
               % within Race (Dichotomous)        35.0%      51.3%      46.8%
Own            Count                                 39         77        116
               % within Race (Dichotomous)        65.0%      48.7%      53.2%
Total          Count                                 60        158        218
               % within Race (Dichotomous)       100.0%     100.0%     100.0%
Nominal Association – PRE - Lambda
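\[ E_1 = N - \text{Maximum category} = 218 - 116 = 102. \]
\[ E_2 = (N_1 - \text{Max}_1) + (N_2 - \text{Max}_2) = (60 - 39) + (158 - 81) = 21 + 77 = 98. \]
\[ \lambda = \frac{E_1 - E_2}{E_1} = \frac{102 - 98}{102} = \frac{4}{102} = .039. \]
Interpretation: By knowing the race of the respondent, you have reduced your error in predicting whether or not they own their homes by 3.9% over knowing no information about the respondents.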
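The lambda calculation for Table 2 can be reproduced with a few lines of Python. This is a sketch under the assumption that the dependent variable is in the rows, as in the lecture's tables; the function name `lambda_pre` is made up for the example.

```python
def lambda_pre(table):
    """Return (E1, E2, lambda) with the row variable as dependent.

    `table` is a list of rows of counts; columns are categories of X.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    columns = list(zip(*table))
    # E1: errors when always guessing the largest row category.
    e1 = n - max(row_totals)
    # E2: errors when guessing the largest cell within each column.
    e2 = sum(sum(col) - max(col) for col in columns)
    return e1, e2, (e1 - e2) / e1


# Table 2: rows Rent or lease, Own; columns White, Non-white.
table2 = [[21, 81], [39, 77]]
e1, e2, lam = lambda_pre(table2)
print(f"E1 = {e1}, E2 = {e2}, lambda = {lam:.3f}")
```

If the largest cell falls in the same row in every column, E2 equals E1 and lambda is zero, which is the lopsided-distribution problem with lambda.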
Nominal Association – PRE - Lambda
The key problem with lambda occurs when the distribution on one of the variables is lopsided (many cases in one category and few cases in the others). Under those circumstances, lambda can equal zero, even when there is a relationship.
Nominal Association – PRE – Goodman and Kruskal’s Tau b
A better coefficient of association is Goodman and Kruskal’s tau b, which is also PRE.
Instead of assuming that the person making the prediction will always select the largest category in each column, tau b assumes that the researcher will select cases based on their actual distribution in the column.
For example, on a three-category variable with 40% of the cases in category 1, 30% in category 2, and 30% in category 3, E1 would be 60%N1+70%N2+70%N3.
Nominal Association – PRE – Goodman and Kruskal’s Tau b
\[ \tau_b = \frac{E_1 - E_2}{E_1} \]
\[ E_1 = \sum_{i=1}^{j} \frac{N_{**} - N_{i*}}{N_{**}}\, N_{i*} \]
\[ E_2 = \sum_{i=1}^{j} \sum_{k=1}^{l} \frac{N_{*k} - N_{ik}}{N_{*k}}\, N_{ik} \]
where
E_1 = the number of errors without knowing the independent variable.
E_2 = the number of errors knowing the independent variable.
i = the row number being examined.
j = the total number of rows in the table.
k = the column number being examined.
l = the total number of columns in the table.
N_{**} = the total number of cases.
N_{i*} = the total number of cases in row i.
N_{*k} = the total number of cases in column k.
N_{ik} = the number of cases in the cell defined by row i and column k.
Nominal Association – PRE – Goodman and Kruskal’s Tau b
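\[ E_1 = \sum_{i=1}^{j} \frac{N_{**} - N_{i*}}{N_{**}}\, N_{i*} = \frac{218 - 102}{218}(102) + \frac{218 - 116}{218}(116) = 54.2752 + 54.2752 = 108.5504. \]
\[ E_2 = \sum_{i=1}^{j} \sum_{k=1}^{l} \frac{N_{*k} - N_{ik}}{N_{*k}}\, N_{ik} = \frac{60 - 21}{60}(21) + \frac{60 - 39}{60}(39) + \frac{158 - 81}{158}(81) + \frac{158 - 77}{158}(77) = 13.65 + 13.65 + 39.4747 + 39.4747 = 106.2494. \]
\[ \tau_b = \frac{E_1 - E_2}{E_1} = \frac{108.5504 - 106.2494}{108.5504} = \frac{2.3010}{108.5504} = .021. \]
Interpretation: By knowing the independent variable, you can reduce your errors in predicting the dependent variable by 2.1% over not knowing the independent variable.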
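The tau b computation on Table 2 can be verified with a small Python sketch. The function name `tau_pre` is hypothetical, the dependent variable is assumed to be in the rows, and the code assumes no column total is zero.

```python
def tau_pre(table):
    """Return (E1, E2, tau) for Goodman and Kruskal's tau, rows dependent."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # E1: expected errors when guessing in proportion to the row marginals.
    e1 = sum(r * (n - r) / n for r in row_totals)
    # E2: expected errors when guessing in proportion to each column's cells.
    e2 = sum(
        cell * (col_total - cell) / col_total
        for row in table
        for cell, col_total in zip(row, col_totals)
    )
    return e1, e2, (e1 - e2) / e1


# Table 2: rows Rent or lease, Own; columns White, Non-white.
table2 = [[21, 81], [39, 77]]
e1, e2, tau = tau_pre(table2)
print(f"E1 = {e1:.4f}, E2 = {e2:.4f}, tau = {tau:.3f}")
```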
Nominal Association – PRE – Goodman and Kruskal’s Tau b
Directional Measures (Nominal by Nominal)

                                                                                               Asymp.
                                                                                  Value  Std. Error(a)  Approx. T(b)  Approx. Sig.
Lambda                   Symmetric                                                 .025       .077          .318          .750
                         Do you rent, lease or own your current
                           residence? Dependent                                    .039       .121          .318          .750
                         Race (Dichotomous) Dependent                              .000       .000          .(c)          .(c)
Goodman and Kruskal tau  Do you rent, lease or own your current
                           residence? Dependent                                    .021       .019                        .032(d)
                         Race (Dichotomous) Dependent                              .021       .019                        .032(d)

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Cannot be computed because the asymptotic standard error equals zero.
d. Based on chi-square approximation.
Nominal Association
The five-step test of significance for both the chi-square-based statistics and the PRE statistics is chi-square.
Ordinal Association
Some ordinal variables have many categories and look like interval variables. These can be called continuous ordinal variables. The appropriate ordinal association statistic is Spearman’s rho.
Some ordinal variables have only a few categories and can be called collapsed ordinal variables. The appropriate ordinal association statistics are gamma and the other tau-class statistics.
The Computation of Gamma and Other Tau-class Statistics
These measures of association compare each case to every other case. The total number of pairs of cases is equal to N(N-1)/2.
The Computation of Gamma and Other Tau-class Statistics
There are five classes of pairs:
- C or Ns: pairs where one case is higher than the other on both variables.
- D or Nd: pairs where one case is higher on one variable and lower on the other.
- Ty: pairs tied on the dependent variable but not the independent variable.
- Tx: pairs tied on the independent variable but not the dependent variable.
- Txy: pairs tied on both variables.
The Computation of Gamma and Other Tau-class Statistics
To calculate C, start in the upper left cell and multiply the frequency in each cell by the total of all frequencies to the right of and below the cell in the table, then sum the products.
To calculate D, start in the upper right cell and multiply the frequency in each cell by the total of all frequencies to the left of and below the cell in the table, then sum the products.
The Computation of Gamma and Other Tau-class Statistics
To calculate Tx, start in the upper left cell and multiply the frequency in each cell by the total of all frequencies directly below the cell (in the same column), then sum the products.
To calculate Ty, start in the upper left cell and multiply the frequency in each cell by the total of all frequencies directly to the right of the cell (in the same row), then sum the products.
The Computation of Gamma and Other Tau-class Statistics
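To calculate Txy, compute n(n-1)/2 for each cell, where n is the frequency in that cell, and sum the results across all cells.
\[ \gamma = \frac{C - D}{C + D} \]
\[ \text{Somers' } d_{yx} = \frac{C - D}{C + D + T_y} \]
\[ \text{Kendall's } \tau_b = \frac{C - D}{\sqrt{(C + D + T_x)(C + D + T_y)}} \]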
Gamma Example – JCHA 2000
How safe do you feel alone at night in your home? * Race Crosstabulation
Rows: How safe do you feel alone at night in your home? Columns: Race.

                                    White    Black    Total
Very safe        Count                 13        9       22
                 % within Race      48.1%    64.3%    53.7%
Somewhat safe    Count                 11        4       15
                 % within Race      40.7%    28.6%    36.6%
Somewhat unsafe  Count                  2        0        2
                 % within Race       7.4%      .0%     4.9%
Very unsafe      Count                  1        1        2
                 % within Race       3.7%     7.1%     4.9%
Total            Count                 27       14       41
                 % within Race     100.0%   100.0%   100.0%
Gamma Example – JCHA 2000
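\[ C = 13(4 + 0 + 1) + 11(0 + 1) + 2(1) = 65 + 11 + 2 = 78 \]
\[ D = 9(11 + 2 + 1) + 4(2 + 1) + 0(1) = 126 + 12 + 0 = 138 \]
\[ T_y = 13(9) + 11(4) + 2(0) + 1(1) = 117 + 44 + 0 + 1 = 162 \]
\[ T_x = 13(11 + 2 + 1) + 11(2 + 1) + 2(1) + 9(4 + 0 + 1) + 4(0 + 1) + 0(1) = 182 + 33 + 2 + 45 + 4 + 0 = 266 \]
\[ T_{xy} = \frac{13(12)}{2} + \frac{9(8)}{2} + \frac{11(10)}{2} + \frac{4(3)}{2} + \frac{2(1)}{2} + 0 + \frac{1(0)}{2} + \frac{1(0)}{2} = 78 + 36 + 55 + 6 + 1 + 0 + 0 + 0 = 176 \]
\[ \text{Total pairs} = C + D + T_x + T_y + T_{xy} = 820 \]
\[ \gamma = \frac{C - D}{C + D} = \frac{78 - 138}{78 + 138} = \frac{-60}{216} = -.278 \]
\[ \text{Somers' } d_{yx} = \frac{C - D}{C + D + T_y} = \frac{78 - 138}{78 + 138 + 162} = \frac{-60}{378} = -.159 \]
\[ \text{Kendall's } \tau_b = \frac{C - D}{\sqrt{(C + D + T_x)(C + D + T_y)}} = \frac{78 - 138}{\sqrt{(78 + 138 + 266)(78 + 138 + 162)}} = \frac{-60}{\sqrt{(482)(378)}} = \frac{-60}{426.8442} = -.141 \]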
Gamma and its associated statistics can have a PRE interpretation.
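The pair counts and the three statistics for the JCHA safety table can be reproduced with a Python sketch. The function name `pair_counts` is an assumption made for this example. The table is entered with the dependent variable in the rows and both variables in the same category order as the crosstabulation.

```python
from math import sqrt


def pair_counts(table):
    """Return (C, D, Tx, Ty, Txy) for a table of ordered categories."""
    n_rows, n_cols = len(table), len(table[0])
    c = d = tx = ty = txy = 0
    for i in range(n_rows):
        for k in range(n_cols):
            f = table[i][k]
            # Below and to the right: same order on both variables.
            c += f * sum(table[r][s]
                         for r in range(i + 1, n_rows)
                         for s in range(k + 1, n_cols))
            # Below and to the left: opposite order on the two variables.
            d += f * sum(table[r][s]
                         for r in range(i + 1, n_rows)
                         for s in range(k))
            # Same column, lower rows: tied on X only.
            tx += f * sum(table[r][k] for r in range(i + 1, n_rows))
            # Same row, columns to the right: tied on Y only.
            ty += f * sum(table[i][s] for s in range(k + 1, n_cols))
            # Pairs within the cell: tied on both variables.
            txy += f * (f - 1) // 2
    return c, d, tx, ty, txy


# Rows: Very safe ... Very unsafe; columns: White, Black.
safety = [[13, 9], [11, 4], [2, 0], [1, 1]]
c, d, tx, ty, txy = pair_counts(safety)
gamma = (c - d) / (c + d)
somers_dyx = (c - d) / (c + d + ty)
tau_b = (c - d) / sqrt((c + d + tx) * (c + d + ty))
print(c, d, tx, ty, txy)
print(f"gamma = {gamma:.3f}, d_yx = {somers_dyx:.3f}, tau-b = {tau_b:.3f}")
```

The five pair counts sum to N(N-1)/2, which gives a quick check that no pair has been missed or counted twice.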
Gamma Example – JCHA 2000

Directional Measures (Ordinal by Ordinal)

                                                                            Asymp.
                                                               Value  Std. Error(a)  Approx. T(b)  Approx. Sig.
Somers' d  Symmetric                                           -.140       .146         -.952          .341
           How safe do you feel alone at night in your
             home? Dependent                                   -.159       .166         -.952          .341
           Race Dependent                                      -.124       .131         -.952          .341

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Symmetric Measures (Ordinal by Ordinal)

                                Asymp.
                   Value  Std. Error(a)  Approx. T(b)  Approx. Sig.
Kendall's tau-b    -.141       .147         -.952          .341
Gamma              -.278       .289         -.952          .341
N of Valid Cases      41

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
The Computation of Spearman’s Rho
Suppose we wished to test whether Democratic and Republican thermometer ratings from the NES are inversely related. Because these scores are measured on a 100-point scale, we can use Spearman's rho to test the strength of the relationship.
With so many possible scores, there are fewer ties than with a collapsed ordinal variable.
The Computation of Spearman’s Rho
Spearman's Rho for Democratic and Republican Thermometer Scales

DEMO  RankDem   REP  RankRep       D       D2
  97        2    50       10      -8       64
  97        2    30       13     -11      121
  97        2    15       15     -13      169
  85      4.5    85        2     2.5     6.25
  85      4.5    70        6    -1.5     2.25
  80      6.5    75        4     2.5     6.25
  80      6.5    20       14    -7.5    56.25
  70        8    70        6       2        4
  60      9.5    50       10    -0.5     0.25
  60      9.5    50       10    -0.5     0.25
  50       12    70        6       6       36
  50       12    50       10       2        4
  50       12    85        2      10      100
  40       14    50       10       4       16
   0       15    85        2      13      169

Sum of D2 = 754.5
The Computation of Spearman’s Rho
\[ r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(754.5)}{15(15^2 - 1)} = 1 - \frac{4527}{3360} = 1 - 1.3473 = -0.3473. \]
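The ranking and the rho calculation for the thermometer scores can be checked with a Python sketch. The helper name `average_ranks` is hypothetical. It ranks from the highest score (rank 1) to the lowest and gives tied scores the mean of the ranks they occupy, which matches the ranks shown in the table.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest, averaging tied ranks."""
    ordered = sorted(scores, reverse=True)
    rank_of = {}
    for value in set(ordered):
        first = ordered.index(value) + 1   # first rank held by this value
        count = ordered.count(value)       # number of cases tied at this value
        rank_of[value] = first + (count - 1) / 2
    return [rank_of[s] for s in scores]


dem = [97, 97, 97, 85, 85, 80, 80, 70, 60, 60, 50, 50, 50, 40, 0]
rep = [50, 30, 15, 85, 70, 75, 20, 70, 50, 50, 70, 50, 85, 50, 85]

rank_dem = average_ranks(dem)
rank_rep = average_ranks(rep)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_dem, rank_rep))
n = len(dem)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(f"sum of D^2 = {sum_d2}, rho = {rho:.4f}")
```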
The Five-Step Test for Gamma
Step 1. Making assumptions.
- Random sampling.
- Ordinal measurement.
- Normal sampling distribution.
Step 2. Stating the null hypothesis.
- H0: γ = 0.0
- H1: γ ≠ 0.0
The Five-Step Test for Gamma
Step 3. Selecting the sampling distribution and establishing the critical region.
- Sampling distribution = Z distribution.
- Alpha = .05.
- Z (critical) = ±1.96.
The Five-Step Test for Gamma
Step 4. Computing the test statistic.
\[ Z(\text{obtained}) = G\sqrt{\frac{C + D}{N(1 - G^2)}} = -.278\sqrt{\frac{78 + 138}{41(1 - .278^2)}} = -.278\sqrt{\frac{216}{37.8314}} = -.278(2.3895) = -.664 \]
Step 5. Making a decision.
- Z(obtained) = -.664 does not fall in the critical region beyond ±1.96. Fail to reject the null hypothesis that gamma is zero in the population.
- There is no statistically significant relationship between race and the feeling of safety at home based on the JCHA 2000 sample.
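Steps 4 and 5 of the gamma test can be sketched in Python using the pair counts from the JCHA example (C = 78, D = 138, N = 41). The function name `gamma_z` is an assumption for this example. The code carries gamma at full precision rather than rounding to -.278 first, so intermediate values differ slightly from the hand calculation.

```python
from math import sqrt


def gamma_z(c, d, n):
    """Return (gamma, Z obtained) using Z = G * sqrt((C + D) / (N * (1 - G^2)))."""
    g = (c - d) / (c + d)
    z = g * sqrt((c + d) / (n * (1 - g ** 2)))
    return g, z


g, z = gamma_z(c=78, d=138, n=41)
z_critical = 1.96  # two-tailed, alpha = .05
reject_null = abs(z) > z_critical
print(f"gamma = {g:.3f}, Z = {z:.3f}, reject H0: {reject_null}")
```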
The Five-Step Test for Spearman’s Rho.
Step 1. Making assumptions.
- Random sampling.
- Ordinal measurement.
- Normal sampling distribution.
Step 2. Stating the null hypothesis.
- H0: ρs = 0.0
- H1: ρs ≠ 0.0
The Five-Step Test for Spearman’s Rho.
Step 3. Selecting the sampling distribution and establishing the critical region.
- Sampling distribution = t distribution.
- Alpha = .05.
- Degrees of freedom = N – 2 = 15 – 2 = 13.
- t (critical) = ±2.160.
The Five-Step Test for Spearman’s Rho.
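Step 4. Computing the test statistic.
\[ t(\text{obtained}) = r_s\sqrt{\frac{N - 2}{1 - r_s^2}} = -.3473\sqrt{\frac{15 - 2}{1 - (.3473)^2}} = -.3473\sqrt{\frac{13}{.8794}} = -.3473\sqrt{14.7831} = -.3473(3.8449) = -1.3353. \]
Step 5. Making a decision.
- t(obtained) = -1.3353 does not fall in the critical region beyond ±2.160. Fail to reject the null hypothesis.
- There is no statistically significant relationship between the Democratic and Republican thermometers.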
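The t test for Spearman's rho can be sketched the same way. The function name `spearman_t` is hypothetical. The critical value 2.160 is the one given in Step 3 for 13 degrees of freedom; it is entered by hand here, not looked up by the code.

```python
from math import sqrt


def spearman_t(rho, n):
    """Return t obtained using t = rho * sqrt((N - 2) / (1 - rho^2))."""
    return rho * sqrt((n - 2) / (1 - rho ** 2))


t_obtained = spearman_t(rho=-0.3473, n=15)
t_critical = 2.160  # two-tailed, alpha = .05, df = 13 (from Step 3)
reject_null = abs(t_obtained) > t_critical
print(f"t = {t_obtained:.4f}, reject H0: {reject_null}")
```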
s