mc namer test of correlation

35
EPI809/Spring 2008 EPI809/Spring 2008 1 Fisher’s Exact Test Fisher’s Exact Test Fisher’s Exact Test Fisher’s Exact Test is a test for is a test for independence in a 2 X 2 table. It is independence in a 2 X 2 table. It is most useful when the total sample most useful when the total sample size and the expected values are size and the expected values are small. The test holds the marginal small. The test holds the marginal totals fixed and computes the totals fixed and computes the hypergeometric probability that n hypergeometric probability that n 11 11 is at least as large as the observed is at least as large as the observed value value Useful when E(cell counts) < 5. Useful when E(cell counts) < 5.

Upload: saad-niazi

Post on 19-Nov-2014

150 views

Category:

Education


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 11

Fisher’s Exact TestFisher’s Exact Test

Fisher’s Exact TestFisher’s Exact Test is a test for is a test for independence in a 2 X 2 table. It is most independence in a 2 X 2 table. It is most useful when the total sample size and the useful when the total sample size and the expected values are small. The test holds expected values are small. The test holds the marginal totals fixed and computes the the marginal totals fixed and computes the hypergeometric probability that nhypergeometric probability that n1111 is at is at

least as large as the observed valueleast as large as the observed value Useful when E(cell counts) < 5.Useful when E(cell counts) < 5.

Page 2: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 22

Hypergeometric distributionHypergeometric distribution Example: 2x2 table with cell counts a, b, c, d. Assuming Example: 2x2 table with cell counts a, b, c, d. Assuming

marginal totals are fixed:marginal totals are fixed:M1 = a+b, M2 = c+d, N1 = a+c, N2 = b+d.M1 = a+b, M2 = c+d, N1 = a+c, N2 = b+d.for convenience assume N1<N2, M1<M2.for convenience assume N1<N2, M1<M2.possible value of a are: 0, 1, …min(M1,N1). possible value of a are: 0, 1, …min(M1,N1).

Probability distribution of cell count a follows a Probability distribution of cell count a follows a hypergeometric distribution:hypergeometric distribution:N = a + b + c + d = N1 + N2 = M1 + M2N = a + b + c + d = N1 + N2 = M1 + M2

Pr (x=a) = N1!N2!M1!M2! / [N!a!b!c!d!]Pr (x=a) = N1!N2!M1!M2! / [N!a!b!c!d!] Mean (x) = M1N1/ NMean (x) = M1N1/ N Var (x) = M1M2N1N2 / [NVar (x) = M1M2N1N2 / [N22(N-1)](N-1)]

Fisher exact test is based on this hypergeometric distr. Fisher exact test is based on this hypergeometric distr.

Page 3: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 33

Fisher’s Exact Test ExampleFisher’s Exact Test Example

Is HIV Infection related to Hx of STDs in Sub Is HIV Infection related to Hx of STDs in Sub Saharan African Countries? Test at 5% Saharan African Countries? Test at 5% level.level.

yesyes nono totaltotal

yesyes 33 77 1010

nono 55 1010 1515

totaltotal 88 1717

HIV Infection

Hx of STDs

Page 4: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 44

Hypergeometric prob.Hypergeometric prob.

Probability of observing this specific table Probability of observing this specific table given fixed marginal totals isgiven fixed marginal totals isPr (3,7, 5, 10) = 10!15!8!17!/[25!3!7!5!10!]Pr (3,7, 5, 10) = 10!15!8!17!/[25!3!7!5!10!]= 0.3332= 0.3332

Note the above is not the p-value. Why?Note the above is not the p-value. Why? Not the accumulative probability, or not the Not the accumulative probability, or not the

tail probability.tail probability. Tail prob = sum of all values (a = 3, 2, 1, Tail prob = sum of all values (a = 3, 2, 1,

0). 0).

Page 5: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 55

Hypergeometric probHypergeometric prob

Pr (2, 8, 6, 9) = 10!15!8!17!/[25!2!8!6!9!]Pr (2, 8, 6, 9) = 10!15!8!17!/[25!2!8!6!9!]= 0.2082= 0.2082

Pr (1, 9, 7, 8) = 10!15!8!17!/[25!1!9!7!8!]Pr (1, 9, 7, 8) = 10!15!8!17!/[25!1!9!7!8!]= 0.0595= 0.0595

Pr (0,10, 8, 7) = 10!15!8!17!/[25!0!10!8!7!]Pr (0,10, 8, 7) = 10!15!8!17!/[25!0!10!8!7!]= 0.0059= 0.0059

Tail prob = .3332+.2082+.0595+.0059 Tail prob = .3332+.2082+.0595+.0059 = .6068= .6068

Page 6: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 66

Data dis;Data dis;input STDs $ HIV $ count;input STDs $ HIV $ count;cards;cards;no no 10 no no 10 No Yes 5No Yes 5yes no 7yes no 7yes yes 3yes yes 3;;run;run;

proc freq data=dis order=data;proc freq data=dis order=data; weight Count;weight Count; tables STDs*HIV/chisq fisher;tables STDs*HIV/chisq fisher;run;run;

Fisher’s Exact Test Fisher’s Exact Test SAS CodesSAS Codes

Page 7: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 77

Pearson Chi-squares test Pearson Chi-squares test Yates correction Yates correction

Pearson Chi-squares testPearson Chi-squares test

χχ22 = ∑ = ∑ii (O (Oii-E-Eii))22/E/Ei i follows a chi-squares follows a chi-squares

distribution with df = (r-1)(c-1)distribution with df = (r-1)(c-1)

if Eif Eii ≥ 5. ≥ 5.

Yates correction for more accurate p-valueYates correction for more accurate p-value

χχ22 = ∑ = ∑ii (|O (|Oii-E-Eii| - 0.5)| - 0.5)22/E/Ei i

when Owhen Oi i and Eand Ei i are close to each other.are close to each other.

Page 8: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 88

Fisher’s Exact Test SAS OutputFisher’s Exact Test SAS Output Statistics for Table of STDs by HIV

Statistic DF Value Prob ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Chi-Square 1 0.0306 0.8611 Likelihood Ratio Chi-Square 1 0.0308 0.8608 Continuity Adj. Chi-Square 1 0.0000 1.0000 Mantel-Haenszel Chi-Square 1 0.0294 0.8638 Phi Coefficient -0.0350 Contingency Coefficient 0.0350 Cramer's V -0.0350

WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Cell (1,1) Frequency (F) 10 Left-sided Pr <= F 0.6069 Right-sided Pr >= F 0.7263

Table Probability (P) 0.3332 Two-sided Pr <= P 1.0000

Page 9: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 99

Fisher’s Exact TestFisher’s Exact Test

The output consists of three p-values: Left: Use this when the alternative to

independence is that there is negative association between the variables. That is, the observations tend to lie in lower left and upper right.

Right: Use this when the alternative to independence is that there is positive association between the variables. That is, the observations tend to lie in upper left and lower right.

2-Tail: Use this when there is no prior alternative.

Page 10: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 1010

Useful Measures of Association Useful Measures of Association - Nominal Data- Nominal Data

Cohen’s Kappa ( Cohen’s Kappa ( ) ) Also referred to as Cohen’s General Index of Also referred to as Cohen’s General Index of

Agreement. It was originally developed to Agreement. It was originally developed to assess the degree of agreement between two assess the degree of agreement between two judges or raters assessing n items on the judges or raters assessing n items on the basis of a nominal classification for 2 basis of a nominal classification for 2 categories. Subsequent work by Fleiss and categories. Subsequent work by Fleiss and Light presented extensions of this statistic to Light presented extensions of this statistic to more than 2 categories.more than 2 categories.

Page 11: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 1111

Useful Measures of Association Useful Measures of Association - Nominal Data- Nominal Data

Cohen’s Kappa ( Cohen’s Kappa ( ) )

.33 (a)

.07 (b)

.13 (c)

.47 (d)

Inspector A

Good Bad

Inspector ‘B’

Bad

Good .40 (pB)

.60 (qB)

.46 .54 1.00 (pA) (qA)

Page 12: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 1212

Useful Measures of Association Useful Measures of Association - Nominal Data- Nominal Data

Cohen’s Kappa ( Cohen’s Kappa ( ) ) Cohen’s Cohen’s requires that we calculate two requires that we calculate two

values:values:• ppoo : the proportion of cases in which agreement : the proportion of cases in which agreement

occurs. In our example, this value equals 0.80.occurs. In our example, this value equals 0.80.• Pe : the proportion of cases in which agreement Pe : the proportion of cases in which agreement

would have been expected due purely to chance, would have been expected due purely to chance, based upon the marginal frequencies; wherebased upon the marginal frequencies; where

pe = pApB + qAqB = 0.508 for our data

Page 13: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 1313

Useful Measures of Association Useful Measures of Association - Nominal Data- Nominal Data

Cohen’s Kappa ( Cohen’s Kappa ( ) ) Then, Cohen’s Then, Cohen’s measures the measures the

agreement between two variables and is agreement between two variables and is defined bydefined by

=po - pe

1 - pe

= 0.593

Page 14: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 1414

Useful Measures of Association Useful Measures of Association - Nominal Data- Nominal Data

Cohen’s Kappa ( Cohen’s Kappa ( ) ) To test the Null Hypothesis that the true To test the Null Hypothesis that the true

kappa kappa = 0, we use the Standard Error: = 0, we use the Standard Error:

then z = then z = //~N(0,1)~N(0,1)

=1

(1 - pe) npe + pe

2 - pi.p.i (pi.+ p.i)i = 1

k

where pi. & p.i refer to row and column proportions (in textbook, ai= pi. & bi=p.i)

Page 15: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 1515

Useful Measures of Association Useful Measures of Association - Nominal Data- SAS CODES- Nominal Data- SAS CODES

DataData kap; kap;input B $ A $ prob;input B $ A $ prob;n=n=100100;;count=prob*n;count=prob*n;cards;cards;Good Good .33 Good Good .33 Good Bad .07Good Bad .07Bad Good .13Bad Good .13Bad Bad .47Bad Bad .47;;runrun;;

procproc freqfreq data=kap order=data; data=kap order=data; weight Count;weight Count; tables B*A/chisq;tables B*A/chisq; test kappa;test kappa;runrun;;

Page 16: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 1616

Useful Measures of Association Useful Measures of Association

- Nominal Data-- Nominal Data- SAS OUTPUT SAS OUTPUT

The FREQ Procedure

Statistics for Table of B by A

Simple Kappa Coefficient ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Kappa 0.5935 ASE 0.0806 95% Lower Conf Limit 0.4356 95% Upper Conf Limit 0.7514

Test of H0: Kappa = 0

ASE under H0 0.0993 Z 5.9796 One-sided Pr > Z <.0001 Two-sided Pr > |Z| <.0001

Sample Size = 100

Page 17: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 1717

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

Page 18: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 1818

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

The approximate test previously presented for The approximate test previously presented for assessing a difference in proportions is based upon assessing a difference in proportions is based upon the assumption that the two samples are the assumption that the two samples are independentindependent..

Suppose, however, that we are faced with a situation Suppose, however, that we are faced with a situation where this is not true. Suppose we randomly-select where this is not true. Suppose we randomly-select 100 people, and find that 20% of them have flu. Then, 100 people, and find that 20% of them have flu. Then, imagine that we apply some type of treatment to all imagine that we apply some type of treatment to all sampled peoples; and on a post-test, we find that 20% sampled peoples; and on a post-test, we find that 20% have flu. have flu.

Basis / Rationale for the TestBasis / Rationale for the Test

Page 19: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 1919

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

We might be tempted to suppose that no hypothesis We might be tempted to suppose that no hypothesis test is required under these conditions, in that the test is required under these conditions, in that the ‘Before’ and ‘After’ ‘Before’ and ‘After’ pp values are identical, and would values are identical, and would surely result in a test statistic value of 0.00.surely result in a test statistic value of 0.00.

The problem with this thinking, however, is that the two The problem with this thinking, however, is that the two sample sample pp values are dependent, in that each person values are dependent, in that each person was assessed twice. It was assessed twice. It isis possible that the 20 people possible that the 20 people that had flu originally still had flu. It is also possible that had flu originally still had flu. It is also possible that the 20 people that had flu on the second test that the 20 people that had flu on the second test were were a completely different seta completely different set of 20 people! of 20 people!

Page 20: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 2020

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

It is for precisely this type of situation that It is for precisely this type of situation that McNemar’s Test for Correlated (Dependent) McNemar’s Test for Correlated (Dependent) Proportions is applicable.Proportions is applicable.

McNemar’s Test employs two unique features for McNemar’s Test employs two unique features for testing the two proportions:testing the two proportions:

* a special fourfold contingency table; with a* a special fourfold contingency table; with a

* special-purpose chi-square (* special-purpose chi-square ( 22) test statistic ) test statistic (the approximate test).(the approximate test).

Page 21: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 2121

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

Nomenclature for the Fourfold (2 x 2) Nomenclature for the Fourfold (2 x 2) Contingency TableContingency Table

AA BB

DDCC

(B + D)(B + D)(A + C)(A + C)

(C + D)(C + D)

(A + B)(A + B)

nn

where(A+B) + (C+D) =(A+C) + (B+D) = n = number of units evaluated

and where

df = 1

Page 22: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 2222

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

1. Construct a 2x2 table where the paired 1. Construct a 2x2 table where the paired observations are the sampling units.observations are the sampling units.

2. Each observation must represent a single 2. Each observation must represent a single joint event possibility; that is, classifiable in joint event possibility; that is, classifiable in only one cell of the contingency table.only one cell of the contingency table.

3. In it’s Exact form, this test may be conducted 3. In it’s Exact form, this test may be conducted as a One Sample Binomial for the B & C cellsas a One Sample Binomial for the B & C cells

Underlying Assumptions of the TestUnderlying Assumptions of the Test

Page 23: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 2323

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

4. The expected frequency (4. The expected frequency (ffee) for the B ) for the B

and C cells on the contingency table and C cells on the contingency table must be equal to or greater than 5; must be equal to or greater than 5; where where

ffe e = (B + C) / 2 = (B + C) / 2

from the Fourfold tablefrom the Fourfold table

Underlying Assumptions of the TestUnderlying Assumptions of the Test

Page 24: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 2424

McNemar’s Test for Correlated (Dependent) McNemar’s Test for Correlated (Dependent) ProportionsProportions

Sample ProblemSample ProblemA randomly selected group of 120 students taking a standardized test for entrance into college exhibits a failure rate of 50%. A company which specializes in coaching students on this type of test has indicated that it can significantly reduce failure rates through a four-hour seminar. The students are exposed to this coaching session, and re-take the test a few weeks later. The school board is wondering if the results justify paying this firm to coach all of the students in the high school. Should they? Test at the 5% level.

Page 25: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 2525

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

Sample ProblemSample ProblemThe summary data for this study appear as follows:

Number ofStudents

Status BeforeSession

Status AfterSession

4 Fail Fail4 Pass Fail

56 Fail Pass56 Pass Pass

Page 26: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 2626

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

The data are then entered into the Fourfold Contingency table:

5656 5656

4444

60606060

88

112112

120120

BeforePass Fail

Pass

Fail

After

Page 27: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 2727

Step I : State the Null & Research HypothesesStep I : State the Null & Research Hypotheses

HH00 : : = =

HH11 : :

where where and and relate to the proportion of observations reflecting relate to the proportion of observations reflecting changeschanges in in

status (the B & C cells in the table)status (the B & C cells in the table)

Step II : Step II : 0.050.05

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

Page 28: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 2828

Step III : State the Associated Test StatisticStep III : State the Associated Test Statistic

ABS (B - C) - 1 }2

2 = B + C

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

Page 29: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 2929

Step IV : State the distribution of the Test Statistic When HStep IV : State the distribution of the Test Statistic When Hoo is True is True

22 = = 22 with 1 with 1 dfdf when H when Hoo is True is True

d

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

Page 30: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 3030

Step V : Reject Ho if ABS (2 ) > 3.84

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

X² Distribution

X²03.84

Chi-Square Curve w. df=1

Page 31: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 3131

Step VI : Calculate the Value of the Test StatisticStep VI : Calculate the Value of the Test Statistic

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

ABS (56 - 4) - 1 }2

2 = = 43.35 56 + 4

Page 32: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 3232

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions-SAS Codes(Dependent) Proportions-SAS Codes

DataData test; test;input Before $ After $ count;input Before $ After $ count;cards;cards;pass pass 56 pass pass 56 pass fail 56pass fail 56fail pass 4fail pass 4fail fail 4;fail fail 4;runrun;;

procproc freqfreq data=test order=data; data=test order=data; weight Count;weight Count; tables Before*After/tables Before*After/agreeagree;;runrun;;

Page 33: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 3333

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions-SAS Output(Dependent) Proportions-SAS Output

Statistics for Table of Before by After

McNemar's Test ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Statistic (S) 45.0667 DF 1 Pr > S <.0001

Sample Size = 120

Without the correction

Page 34: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 3434

Conclusion: What we have learnedConclusion: What we have learned

1.1. Comparison of binomial proportion using Z and Comparison of binomial proportion using Z and 22 Test. Test.

2.2. Explain Explain 22 Test for Independence of 2 variables Test for Independence of 2 variables

3.3. Explain The Fisher’s test for independenceExplain The Fisher’s test for independence

4.4. McNemar’s tests for correlated dataMcNemar’s tests for correlated data

5.5. Kappa StatisticKappa Statistic

6.6. Use of SAS Proc FREQ Use of SAS Proc FREQ

Page 35: Mc namer test of correlation

EPI809/Spring 2008EPI809/Spring 2008 3535

Conclusion: Further readingsConclusion: Further readings

Read textbook for Read textbook for

1.1. Power and sample size calculationPower and sample size calculation

2.2. Tests for trendsTests for trends