
Post on 31-Mar-2015


Contingency Tables

Prepared by Yu-Fen Li


Contingency table

• When working with nominal data that have been grouped into categories, we often arrange the counts in a tabular format known as a contingency table (or cross-tabulation)

– An r × c table has r rows and c columns

– In the simplest case, two dichotomous random variables are involved; the rows of the table represent the outcomes of one variable, and the columns represent the outcomes of the other.


Example

• To examine the effectiveness of bicycle safety helmets, we wish to know whether there is an association between the incidence of head injury and the use of helmets among individuals who have been involved in accidents.


What are the hypotheses?

H0: The proportion of persons suffering head injuries among the population of individuals wearing safety helmets at the time of the accident is equal to the proportion of persons sustaining head injuries among those not wearing helmets

versus

HA: The proportions of persons suffering head injuries are not identical in the two populations


The Chi-Square Test

• The first step in carrying out the test is to calculate the expected count for each cell of the contingency table, given that H0 is true

• The chi-square test compares the observed frequencies in each category of the contingency table (represented by O) with the expected frequencies in each category of the contingency table (represented by E) given the null hypothesis is true.


The Chi-Square Test

• It is used to determine whether the deviations between the observed and the expected counts, O − E, are too large to be attributed to chance

\[ \chi^2 = \sum_{i=1}^{rc} \frac{(O_i - E_i)^2}{E_i} \]

– where rc is the number of cells in the table.

• Under H0, this statistic approximately follows a chi-square distribution with (r − 1)(c − 1) degrees of freedom. To ensure that the sample size is large enough to make this approximation valid,

– no cell in the table should have an expected count less than 1, and

– no more than 20% of the cells should have an expected count less than 5.


How to compute the expected values?

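[Tables: Observed (O) and Expected (E) counts]

• Under H0 (Var1 and Var2 are independent), the joint probability is the product of the marginal probabilities:

\[ \Pr(\text{Var}_1 = i,\ \text{Var}_2 = j) = \Pr(\text{Var}_1 = i)\,\Pr(\text{Var}_2 = j) \]

• e.g. in a 2 × 2 table with counts a, b in the first row and c, d in the second row (n = a + b + c + d), the expected value for Var1 = yes and Var2 = yes is

\[ n \Pr(\text{Var}_1 = \text{yes},\ \text{Var}_2 = \text{yes}) = n \Pr(\text{Var}_1 = \text{yes})\,\Pr(\text{Var}_2 = \text{yes}) = n \left(\frac{a+b}{n}\right)\left(\frac{a+c}{n}\right) = \frac{(a+b)(a+c)}{n} \]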
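The expected-count rule, the chi-square statistic, and the sample-size rule of thumb can be sketched together in a few lines. This is a minimal illustration, assuming the table is stored as a list of rows; the function names and the example counts are hypothetical and do not come from the slides.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (O - E)^2 / E over all cells (no continuity correction)."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def sample_size_ok(table):
    """Rule of thumb: no expected count below 1, at most 20% below 5."""
    cells = [e for row in expected_counts(table) for e in row]
    below_five = sum(1 for e in cells if e < 5)
    return min(cells) >= 1 and below_five <= 0.2 * len(cells)


# Hypothetical 2 x 2 counts, chosen only to illustrate the arithmetic.
observed = [[10, 20], [30, 40]]
print(expected_counts(observed))
print(chi_square_statistic(observed))
print(sample_size_ok(observed))
```

The same functions work for any r × c table, since the expected count of each cell depends only on its row total, its column total, and n.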


Chi-square distributions

• A chi-square random variable cannot be negative; it assumes values from zero to infinity and is skewed to the right.

[Figure: Chi-square distributions with 2, 4, and 10 degrees of freedom]
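Because the distribution lives on zero to infinity, p-values for the test are upper-tail areas. For 1 and 2 degrees of freedom these areas have simple closed forms, sketched below with the standard library only; the function names are hypothetical, and other degrees of freedom would need a statistics library or a table.

```python
import math


def chi2_upper_tail_df1(x):
    """P(X > x) for 1 df: X is the square of a standard normal variable."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_upper_tail_df2(x):
    """P(X > x) for 2 df: X is exponential with mean 2."""
    return math.exp(-x / 2.0)


# Familiar 5% critical values: 3.841 (1 df) and 5.991 (2 df).
print(chi2_upper_tail_df1(3.841))
print(chi2_upper_tail_df2(5.991))
```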


Yates correction

• We are using discrete observations to estimate χ², a continuous distribution.

• The approximation is quite good when the degrees of freedom are big.

• We can apply a continuity correction (Yates correction) for a 2 × 2 table as

\[ \chi^2 = \sum_{i=1}^{4} \frac{(|O_i - E_i| - 0.5)^2}{E_i} \]


TS formula for a 2 × 2 table

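• For a 2 × 2 table in the general format shown below

                 Var2 = yes   Var2 = no   Total
    Var1 = yes       a            b       a + b
    Var1 = no        c            d       c + d
    Total          a + c        b + d       n

\[ \text{test statistic} = \chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{n(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)} \sim \chi^2_{(1)}, \text{ under } H_0 \]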
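A short derivation shows why the double sum collapses to the shortcut formula. It assumes the usual layout, with a and b in the first row, c and d in the second row, and n = a + b + c + d; the row totals R_i and column totals C_j are notation introduced here for convenience.

```latex
\begin{aligned}
O_{11} - E_{11} &= a - \frac{(a+b)(a+c)}{n} = \frac{ad - bc}{n},
  \qquad |O_{ij} - E_{ij}| = \frac{|ad - bc|}{n} \ \text{for every cell},\\[4pt]
\sum_{i,j} \frac{1}{E_{ij}} &= \sum_{i,j} \frac{n}{R_i C_j}
  = n\left(\frac{1}{R_1} + \frac{1}{R_2}\right)\left(\frac{1}{C_1} + \frac{1}{C_2}\right)
  = \frac{n^3}{(a+b)(c+d)(a+c)(b+d)},\\[4pt]
\chi^2 &= \frac{(ad - bc)^2}{n^2} \cdot \frac{n^3}{(a+b)(c+d)(a+c)(b+d)}
  = \frac{n(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}.
\end{aligned}
```

Replacing |O − E| by |O − E| − 0.5 in the first line gives the continuity-corrected version, with |ad − bc| − n/2 in place of ad − bc.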


Another TS formula for a 2 × 2 table

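• the test statistic (TS) χ² without continuity correction can be expressed as

\[ \chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{n(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)} \sim \chi^2_{(1)}, \text{ under } H_0 \]

• the test statistic (TS) χ² with continuity correction can also be expressed as

\[ \chi^2 = \frac{n\left(|ad - bc| - \frac{n}{2}\right)^2}{(a+b)(a+c)(b+d)(c+d)} \sim \chi^2_{(1)}, \text{ under } H_0 \]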


Example

• For the bicycle example we talked about earlier, if we apply the Yates correction, we would get

– the p-value is smaller than 0.001
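The claim about the p-value can be checked with a small sketch. It assumes the four cell counts 17, 218, 130, and 428 that appear in the odds-ratio slide for the bicycle example, placed as a, b, c, d in the general 2 × 2 layout; that placement is an assumption, although swapping b and c leaves the statistic unchanged. The function names are hypothetical.

```python
import math


def yates_chi_square(a, b, c, d):
    """Continuity-corrected chi-square statistic for a 2 x 2 table."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2.0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(x):
    """Upper-tail area of the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# ASSUMPTION: a = 17, b = 218, c = 130, d = 428 (counts from the OR slide).
statistic = yates_chi_square(17, 218, 130, 428)
p_value = p_value_df1(statistic)
print(statistic, p_value, p_value < 0.001)
```

Under these assumptions the statistic comes out at roughly 27, far above the 1-df critical values, which is consistent with a p-value below 0.001.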


Fisher’s exact test

• When the sample size is small, one can use Fisher’s exact test to obtain the exact probability of the observed frequencies in the contingency table, given that there is no association between the rows and columns and that the marginal totals are fixed.

• The details of this test are not presented here because the computations involved can be arduous.
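Although the slides skip the details, the core calculation is short enough to sketch. With the margins fixed, the probability of a given 2 × 2 table is hypergeometric. The two-sided p-value below sums the probabilities of all tables that are no more likely than the observed one, which is one common convention and an assumption here, as are the function names and the small example table.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of a 2 x 2 table with fixed margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all tables no more likely than the observed."""
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    total = 0.0
    # x is the top-left cell; the margins determine the other three cells.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p <= p_observed * (1 + 1e-9):  # tolerance for floating-point ties
            total += p
    return total


# Hypothetical small-sample table, for illustration only.
print(table_probability(3, 1, 1, 3))
print(fisher_exact_two_sided(3, 1, 1, 3))
```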


McNemar’s Test

• We cannot use the regular chi-square test for matched data, because that test disregards the paired nature of the data.

– We must take the pairing into account in our analysis.

• Consider a 2 × 2 table of observed cell counts about exposure status for a sample of n matched case-control pairs as follows


McNemar’s Test

• If the data of interest in the contingency table are paired rather than independent, we use McNemar’s test to evaluate hypotheses about the data

• To conduct McNemar’s test for matched pairs, we calculate the test statistic

\[ \chi^2 = \frac{(b - c)^2}{b + c} \sim \chi^2_{(1)}, \text{ under } H_0 \]

– where b and c are the numbers of discordant pairs


Independent vs dependent data

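Independent Data

\[ \text{test statistic} = \chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{n(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)} \sim \chi^2_{(1)}, \text{ under } H_0 \]

Dependent Data

\[ \text{test statistic} = \chi^2 = \frac{(b - c)^2}{b + c} \sim \chi^2_{(1)}, \text{ under } H_0 \]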


Example: matched pairs

• Consider the following data taken from a study investigating acute myocardial infarction (MI) among Navajos in the US.

– In this study, 144 MI cases were age- and gender-matched with 144 individuals free of heart disease

[Tables: Independent Data; Dependent Data]


Example: matched pairs

• The test statistic is

– with 0.001 < p < 0.01. Since p is less than α = 0.05, we reject the null hypothesis.

• For the given population of Navajos, we conclude that if there is a difference between individuals who experience infarction and those who do not, victims of acute MI are more likely to suffer from diabetes than the individuals free from heart disease who have been matched on age and gender
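The matched-pairs statistic can be recomputed with a short sketch. It assumes the discordant-pair counts are 37 and 16, the two numbers used in the matched-pairs odds-ratio slide; the function names are hypothetical. Because the slide's formula did not survive extraction, the sketch evaluates the statistic both without and with the common continuity correction of 1, and both versions land in the stated range 0.001 < p < 0.01.

```python
import math


def mcnemar_statistic(b, c, continuity=False):
    """McNemar chi-square from the two discordant-pair counts b and c."""
    diff = abs(b - c) - (1 if continuity else 0)
    return diff ** 2 / (b + c)


def p_value_df1(x):
    """Upper-tail area of the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# ASSUMPTION: b = 37 and c = 16 discordant pairs (from the OR slide).
for use_correction in (False, True):
    stat = mcnemar_statistic(37, 16, continuity=use_correction)
    p = p_value_df1(stat)
    print(use_correction, stat, p, 0.001 < p < 0.01)
```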


Strength of the association

• The chi-square test allows us to determine whether an association exists between two independent nominal variables

• McNemar’s test does the same thing for paired dichotomous variables

• However, neither test provides a measure of the strength of the association


The Odds Ratio

• If an event occurs with probability p, the odds in favor of the event are p/(1−p) to 1.

– We can express an estimator of the OR as

\[ \widehat{OR} = \frac{ad}{bc} \]


The Confidence Interval (CI) for Odds Ratio

• The cross-product ratio is simply a point estimate of the strength of association between two dichotomous variables.

• To gauge the uncertainty in this estimate, we must calculate a confidence interval (CI) as well; the width of the interval reflects the amount of variability in the estimate of OR


The Confidence Interval (CI) for Odds Ratio

• When computing a CI for the OR, we must make the assumption of normality.

– However, the probability distribution of the OR is skewed to the right, and the relative odds can be any positive value between 0 and infinity.

• In contrast, the probability distribution of the natural logarithm of the OR, i.e. ln(OR), is more symmetric and approximately normal.

• Therefore, when calculating a CI for the OR, we typically work in the log scale.


The Confidence Interval (CI) for Odds Ratio

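• In addition, to ensure that the sample size is large enough, the expected value of each cell in the contingency table should be at least 5.

• a 95% CI for the natural logarithm of the OR is

\[ \ln(\widehat{OR}) \pm 1.96\, \mathrm{SE}(\ln(\widehat{OR})) \]

– where

\[ \mathrm{SE}(\ln(\widehat{OR})) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \]

• a 95% CI for the OR itself is

\[ \left( e^{\ln(\widehat{OR}) - 1.96\, \mathrm{SE}(\ln(\widehat{OR}))},\ e^{\ln(\widehat{OR}) + 1.96\, \mathrm{SE}(\ln(\widehat{OR}))} \right) \]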


Bicycle Example

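\[ \widehat{OR} = \frac{17 \times 428}{218 \times 130} = 0.257 \]

\[ \ln(\widehat{OR}) = -1.360, \quad \mathrm{SE}(\ln(\widehat{OR})) = \sqrt{\frac{1}{17} + \frac{1}{218} + \frac{1}{130} + \frac{1}{428}} = 0.27 \]

95% CI of OR is \( (e^{-1.360 - 1.96 \times 0.27},\ e^{-1.360 + 1.96 \times 0.27}) \), or \( (e^{-1.889},\ e^{-0.831}) \), i.e. (0.151, 0.436)

• Because this interval does not contain 1, we reject the null hypothesis and conclude that wearing a safety helmet at the time of the accident protects against head injury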
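The bicycle calculation can be reproduced with a few lines of code. This is a minimal sketch, assuming the counts 17, 218, 130, and 428 play the roles of a, b, c, and d in the cross-product ratio ad/bc; the function name is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Point estimate and CI for the OR, computed on the log scale."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper


# ASSUMPTION: a = 17, b = 218, c = 130, d = 428.
estimate, lower, upper = odds_ratio_ci(17, 218, 130, 428)
print(estimate, lower, upper)
```

Carrying full precision instead of the rounded standard error 0.27 changes the limits only in the third decimal place, so the result agrees with the interval on the slide to about two decimals.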


The Odds Ratio and 95% CI for matched pairs

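• An OR can be calculated to estimate the strength of association between two paired dichotomous variables

\[ \widehat{OR} = \frac{b}{c}, \quad \mathrm{SE}(\ln(\widehat{OR})) = \sqrt{\frac{1}{b} + \frac{1}{c}} \]

– where b and c are the numbers of discordant pairs

• a 95% CI for the OR itself is

\[ \left( e^{\ln(\widehat{OR}) - 1.96\, \mathrm{SE}(\ln(\widehat{OR}))},\ e^{\ln(\widehat{OR}) + 1.96\, \mathrm{SE}(\ln(\widehat{OR}))} \right) \]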


MI and DM example

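\[ \widehat{OR} = \frac{37}{16} = 2.31 \]

\[ \ln(\widehat{OR}) = 0.837, \quad \mathrm{SE}(\ln(\widehat{OR})) = \sqrt{\frac{1}{37} + \frac{1}{16}} = 0.299 \]

95% CI of OR is \( (e^{0.837 - 1.96 \times 0.299},\ e^{0.837 + 1.96 \times 0.299}) \), or \( (e^{0.251},\ e^{1.423}) \), i.e. (1.29, 4.15)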
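The matched-pairs interval can be checked the same way. The sketch below assumes b = 37 and c = 16 are the discordant-pair counts and uses a hypothetical function name; only the discordant pairs enter the calculation.

```python
import math


def matched_pairs_or_ci(b, c, z=1.96):
    """OR and CI for matched pairs, from the discordant counts b and c."""
    log_or = math.log(b / c)
    se = math.sqrt(1 / b + 1 / c)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper


# ASSUMPTION: b = 37 and c = 16 discordant pairs.
estimate, lower, upper = matched_pairs_or_ci(37, 16)
print(estimate, lower, upper)
```

The limits agree with (1.29, 4.15) up to rounding of the intermediate values, and the interval lies entirely above 1.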


Berkson’s Fallacy

• Berkson’s Fallacy is a common type of bias in case-control studies, particularly hospital-based and practice-based studies.

– It occurs due to differential admission rates between cases and controls.

• This leads to positive (and spurious) associations between exposure and the case-control status with the lowest admission rate.


Example : Berkson’s Fallacy

• Hospitalized patients: individuals who have a disease of the circulatory system are more likely to suffer from respiratory illness than individuals who do not

• Hospitalized patients + nonhospitalized subjects: there is no association between the two diseases


What happened?

• Why do the conclusions drawn from these two samples differ so drastically?

• To answer this question, we must consider the rates of hospitalization that occur within each of the four disease subgroups:


What happened?

• Individuals with both circulatory and respiratory disease are more likely to be hospitalized than individuals in any of the three other subgroups

• Subjects with circulatory disease are more likely to be hospitalized than those with respiratory illness.

• Therefore, the conclusions will be biased if we only sample patients who are hospitalized
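The mechanism can be made concrete with a toy calculation. Every number below is hypothetical and chosen only to mimic the pattern described: the two diseases are independent in the full population, while the hospitalization rate is highest for people with both diseases and higher for circulatory than for respiratory disease. None of these figures come from the study in the slides.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / bc for a 2 x 2 table."""
    return (a * d) / (b * c)


# HYPOTHETICAL population of 10,000 with two independent diseases
# (10% prevalence each), so the population odds ratio is exactly 1.
population = {
    "both": 100,
    "circulatory_only": 900,
    "respiratory_only": 900,
    "neither": 8100,
}

# HYPOTHETICAL hospitalization rates for the four subgroups.
hospitalization_rate = {
    "both": 0.50,
    "circulatory_only": 0.20,
    "respiratory_only": 0.10,
    "neither": 0.05,
}

hospitalized = {
    group: population[group] * hospitalization_rate[group]
    for group in population
}


def table_odds_ratio(counts):
    return odds_ratio(
        counts["both"],
        counts["circulatory_only"],
        counts["respiratory_only"],
        counts["neither"],
    )


print(table_odds_ratio(population))
print(table_odds_ratio(hospitalized))
```

In this toy setup the hospitalized subsample shows an odds ratio above 1 even though the population odds ratio is 1, so the association is produced entirely by the unequal admission rates.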


What’s the lesson?

• We observe an association that does not actually exist.

• This kind of spurious relationship among variables, which is evident only because of the way in which the sample was chosen, is known as Berkson’s fallacy

– the sample must be representative
