Psych 5500/6500

Chi-Square (Part Two)

Test for Association

Fall, 2008

Test for Association

Used to determine whether two variables are associated (related). Both variables are categorical; the categories can be nominal, ordinal, or even cardinal scores divided into intervals.

H0: the variables are independent

Ha: the variables are associated

Example

                Deciduous   Evergreen
Normal
Diseased
Parasites

We will begin with an example where the variables are ‘type of tree’ (deciduous or evergreen) and ‘condition of tree’ (normal, diseased, or has parasites). We sample 310 trees from a forest and note both the type of each tree and its condition. Each tree must fall into one and only one of the six cells of the table (we will assume that a tree can’t both be diseased and have parasites at the same time).

As our variables are categorical in nature, the only thing we can really do with the data is count how many trees fall into each category (e.g., it makes no sense to find the mean condition of the trees). The data are given below; these are our observed frequencies.

Observed Frequencies

                Deciduous   Evergreen
Normal              52          48
Diseased            60          60
Parasites           74          16

N = 310

Expected Frequencies

We have our observed frequencies. Next we need to determine what the frequencies would look like if H0 were true and the variables were independent (i.e., not associated). Then we can use Chi Square to see whether our obtained frequencies differ significantly from the frequencies we would expect to get if H0 were true.

Independence

If H0 is true and our variables are independent, then knowing which category a tree falls into on one variable is of no help in predicting which category it falls into on the other variable.

In other words, if the variables are independent then knowing what type of tree it is (deciduous or evergreen) does not help us predict what condition the tree is in (normal, diseased, parasitic). And, knowing what condition the tree is in does not help us predict which type of tree it is. Let’s see what the frequencies would look like if the variables were independent.
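The same idea can be written in probability notation. The symbols A (type of tree) and B (condition of tree) are introduced here only for illustration and do not appear in the original slides. For every type a and condition b, independence means:

$$P(B = b \mid A = a) = P(B = b)$$

which is equivalent to saying that each cell's probability is the product of its row and column probabilities:

$$P(A = a,\; B = b) = P(A = a)\,P(B = b)$$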

                Deciduous   Evergreen   Row Total
Normal              52          48         100
Diseased            60          60         120
Parasites           74          16          90
Column Total       186         124       N = 310

First, in this table I have calculated the total number of trees that were normal (100), diseased (120), and parasitic (90), which add up to 310 (the total number of trees). I have also calculated the total number of trees that were deciduous (186) and evergreen (124) which also add up to 310.
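As a rough illustration, the row and column totals can be reproduced with a few lines of Python. This is only a sketch; the variable names (`observed`, `row_totals`, `col_totals`) are chosen here and are not part of the original slides.

```python
# Observed frequencies: rows are conditions, columns are tree types.
observed = {
    "Normal":    {"Deciduous": 52, "Evergreen": 48},
    "Diseased":  {"Deciduous": 60, "Evergreen": 60},
    "Parasites": {"Deciduous": 74, "Evergreen": 16},
}

# Row totals: number of trees in each condition, ignoring tree type.
row_totals = {cond: sum(cells.values()) for cond, cells in observed.items()}

# Column totals: number of trees of each type, ignoring condition.
col_totals = {}
for cells in observed.values():
    for tree_type, count in cells.items():
        col_totals[tree_type] = col_totals.get(tree_type, 0) + count

n = sum(row_totals.values())

print(row_totals)  # {'Normal': 100, 'Diseased': 120, 'Parasites': 90}
print(col_totals)  # {'Deciduous': 186, 'Evergreen': 124}
print(n)           # 310
```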

                Deciduous   Evergreen   Row Proportion
Normal              52          48      100/310 = .32
Diseased            60          60      120/310 = .39
Parasites           74          16       90/310 = .29
Column Total       186         124      Proportions sum to 1.00

Second, if we ignore the variable ‘type of tree’ we can see that overall 100 of the 310 trees were ‘normal’, so we can say that the proportion of trees that were normal is .32, or 32%. We can also see that .39 (39%) of the trees were diseased, and .29 (29%) had parasites.
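A minimal sketch of the same arithmetic, assuming the row totals from the example (the names `row_totals` and `row_props` are illustrative):

```python
# Row totals from the example: trees in each condition, ignoring tree type.
row_totals = {"Normal": 100, "Diseased": 120, "Parasites": 90}
n = sum(row_totals.values())  # 310 trees in all

# Proportion of all trees that fall in each condition.
row_props = {cond: total / n for cond, total in row_totals.items()}

for cond, p in row_props.items():
    print(f"{cond:<10} {p:.2f}")
```

Rounded to two decimals, the three proportions are .32, .39, and .29. The unrounded proportions sum to exactly 1.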

Independence

Third, this table shows what the proportions would look like if the two variables were independent. Knowing which type of tree it is does not change the chances of it being normal, diseased, or parasitic.

                Deciduous   Evergreen   Row Proportion
Normal             .32         .32           .32
Diseased           .39         .39           .39
Parasites          .29         .29           .29

Expected Frequencies

Fourth, the expected frequencies are those we would expect to get in each cell if H0 were true and the variables were independent. So the next step is to use the expected proportions (repeated below) to compute the expected frequencies if H0 were true (next slide).

                Deciduous   Evergreen   Row Proportion
Normal             .32         .32           .32
Diseased           .39         .39           .39
Parasites          .29         .29           .29

If the variables are independent then 32% of the 186 deciduous trees would be normal, and 32% of the 124 evergreen trees would be normal, and so on. These are what the frequencies would be if the variables were independent. Note the number of deciduous trees still adds up to 186 and the number of evergreen to 124.

                Deciduous              Evergreen              Row Proportion
Normal          (186)(.32) = 59.52     (124)(.32) = 39.68          .32
Diseased        (186)(.39) = 72.54     (124)(.39) = 48.36          .39
Parasites       (186)(.29) = 53.94     (124)(.29) = 35.96          .29
Column Total    186                    124
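The calculation can be sketched in Python as follows. The slide multiplies each column total by a row proportion rounded to two decimals; the sketch computes both that version and the unrounded version (row total × column total / N), so the effect of rounding is visible. All variable names are illustrative.

```python
row_totals = {"Normal": 100, "Diseased": 120, "Parasites": 90}
col_totals = {"Deciduous": 186, "Evergreen": 124}
n = sum(row_totals.values())  # 310

# Unrounded expected frequency under independence:
# (row proportion) * (column total) = row_total * col_total / N.
expected = {
    cond: {t: r * c / n for t, c in col_totals.items()}
    for cond, r in row_totals.items()
}

# Same calculation with the row proportion rounded to two decimals,
# which is how the values on the slide were obtained.
expected_rounded = {
    cond: {t: round(round(r / n, 2) * c, 2) for t, c in col_totals.items()}
    for cond, r in row_totals.items()
}

for cond in row_totals:
    print(cond, expected[cond], expected_rounded[cond])
```

The unrounded expected frequencies are 60, 40, 72, 48, 54, and 36, and they reproduce the row totals exactly. The rounded versions (59.52, 39.68, and so on) reproduce the column totals but only approximately reproduce the row totals.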

Observed Frequencies (our actual data)

                Deciduous   Evergreen   Row Total
Normal              52          48         100
Diseased            60          60         120
Parasites           74          16          90
Col Total          186         124       N = 310

Expected Frequencies (if H0 true)

                Deciduous   Evergreen   Row Total
Normal             59.52       39.68       100
Diseased           72.54       48.36       120
Parasites          53.94       35.96        90
Col Total          186         124       N = 310

The obtained frequencies differ somewhat from the frequencies we would expect if H0 were true. Do they differ enough to reject H0?

Chi Square Test for Association

$$\chi^2 = \sum_j \frac{(O_j - E_j)^2}{E_j} = \frac{(52 - 59.52)^2}{59.52} + \frac{(48 - 39.68)^2}{39.68} + \cdots + \frac{(16 - 35.96)^2}{35.96} = 26.185$$

This is the same formula as for ‘goodness of fit’. This time we apply it to the observed and expected frequencies from each cell. (The expected frequencies shown in the terms are rounded. The value 26.185 is what the unrounded expected frequencies 60, 40, 72, 48, 54, and 36 give; the rounded ones give about 26.20.)

The formula for degrees of freedom for the test for association: df = (# of rows – 1)(# of cols – 1)

Which in this example would be: df = (3 – 1)(2 – 1) = 2
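A short sketch of the whole computation, assuming unrounded expected frequencies (row total × column total / N). The variable names are chosen for this illustration only.

```python
# Observed frequencies: rows = Normal, Diseased, Parasites;
# columns = Deciduous, Evergreen.
observed = [[52, 48], [60, 60], [74, 16]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over all six cells.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # unrounded expected frequency
        chi_square += (o - e) ** 2 / e

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"chi-square = {chi_square:.3f}, df = {df}")  # chi-square = 26.185, df = 2
```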

Chi Square Test for Association

If H0 is true, the mean value of χ² equals df, which is 2 here. If H0 is false, the value of χ² is expected to be greater than 2. How large does χ² have to be to reject H0? With two degrees of freedom (and α = .05), χ²critical = 5.991. As χ²obtained = 26.185, we easily reject H0 and conclude that there is a relationship (association) between the two variables ‘type of tree’ and ‘condition of tree’. In the standard format the results would be χ²(2) = 26.185, p < .001.
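For the special case of two degrees of freedom, the critical value and the p value can be checked by hand, because the upper-tail probability of a chi-square variable with df = 2 is exp(−x/2). The sketch below relies on that shortcut, so it is valid for df = 2 only. It also assumes a significance level of .05, which is the level that the critical value 5.991 corresponds to.

```python
import math

chi_square_obtained = 26.185
alpha = 0.05  # assumed significance level

# Valid for df = 2 only: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi_square_obtained / 2)

# Solving exp(-x / 2) = alpha for x gives the critical value.
critical_value = -2 * math.log(alpha)

print(f"critical value = {critical_value:.3f}")  # critical value = 5.991
print(f"p = {p_value:.2e}")
print(chi_square_obtained > critical_value)      # True
```

For other degrees of freedom there is no such simple closed form, and a chi-square table or a statistics package would be needed.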

‘Eyeballing’ an Association

While Chi Square works with frequencies, it is not all that easy to look at a table of frequencies and guess whether the variables are associated or not.

Table of Frequencies

                Deciduous   Evergreen   Row Total
Normal              52          48         100
Diseased            60          60         120
Parasites           74          16          90
Col Total          186         124       N = 310

‘Eyeballing’ an Association

It is much easier to view the proportions or percents. If the sample exactly fits the null hypothesis then the columns would be identical. Here are the percentages from our example. The columns are not all the same, so the variables may be associated (get a p value to make sure).

Table of Percents

                Deciduous   Evergreen   Row Total
Normal             28%         39%         32%
Diseased           32%         48%         39%
Parasites          40%         13%         29%
                  100%        100%        100%
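The percentages within each tree type can be computed by dividing each cell by its column total. The following is a sketch with illustrative names, not code from the original slides.

```python
observed = {
    "Normal":    {"Deciduous": 52, "Evergreen": 48},
    "Diseased":  {"Deciduous": 60, "Evergreen": 60},
    "Parasites": {"Deciduous": 74, "Evergreen": 16},
}
tree_types = ["Deciduous", "Evergreen"]

# Column totals: 186 deciduous trees, 124 evergreen trees.
col_totals = {t: sum(cells[t] for cells in observed.values()) for t in tree_types}

# Percent of each tree type that falls in each condition.
col_percents = {
    cond: {t: 100 * cells[t] / col_totals[t] for t in tree_types}
    for cond, cells in observed.items()
}

for cond, pcts in col_percents.items():
    print(cond.ljust(10), "  ".join(f"{pcts[t]:.0f}%" for t in tree_types))
```

If the two printed columns were identical, the sample would fit the null hypothesis exactly.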

Effect Size: Cramer’s V

The value of χ² obtained and its corresponding p value are affected by both the strength of the association between the two variables and the size of N, and thus are not direct indications of how strongly the two variables are associated. A measure that removes the effect of N, leaving just a measure of the strength of the relationship between the two variables, is Cramer’s V.

Cramer’s V

The formula for computing Cramer’s V is quite simple:

$$V = \sqrt{\frac{\chi^2_{obt}}{N(L - 1)}}$$

where ‘L’ is the lesser of the number of rows and the number of columns.

This will result in a value of V that is between 0 (no association between the variables) and 1 (the strongest possible association between the variables). V is a pure measure of strength of association (having removed the effect of N). By the way, why not use a formula that will result in a value between –1 and 1, as in correlation? Think about it.

Cramer’s V in our Example

$$V = \sqrt{\frac{\chi^2_{obt}}{N(L - 1)}} = \sqrt{\frac{26.185}{(310)(2 - 1)}} = \sqrt{.084} = .29$$
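The same arithmetic as a small sketch (variable names are illustrative):

```python
import math

chi_square_obtained = 26.185
n = 310
n_rows, n_cols = 3, 2

# L is the lesser of the number of rows and the number of columns.
L = min(n_rows, n_cols)

v = math.sqrt(chi_square_obtained / (n * (L - 1)))
print(f"V = {v:.2f}")  # V = 0.29
```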

Strength of Association Examples

V = 0

          a1     a2    total
b1        20     40      60
b2        40     80     120
total     60    120

V = 1.00

          a1     a2    total
b1        60      0      60
b2         0    120     120
total     60    120
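Both values can be verified with a small helper that computes χ² from a table and then applies the formula for V. The function name `cramers_v` is hypothetical, and the sketch assumes a table with at least two rows and two columns and no row or column total of zero.

```python
import math

def cramers_v(table):
    """Cramer's V for a table of observed frequencies (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Chi-square: sum of (O - E)^2 / E over all cells.
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi_square += (o - e) ** 2 / e

    # L is the lesser of the number of rows and the number of columns.
    smaller_dim = min(len(table), len(table[0]))
    return math.sqrt(chi_square / (n * (smaller_dim - 1)))

no_association = [[20, 40], [40, 80]]
perfect_association = [[60, 0], [0, 120]]

print(cramers_v(no_association))       # 0.0
print(cramers_v(perfect_association))  # 1.0
```

In the first table the columns have identical proportions, so every observed frequency equals its expected frequency and χ² is 0. In the second table χ² equals N (180), which is the largest value it can take for a 2 × 2 table.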