
  • Ch2. Contingency Tables_2

    Namhyoung Kim

    Dept. of Applied Statistics

    Gachon University

    [email protected]


  • 2.3 The Odds Ratio

    • For a probability of success $\pi$, $\text{odds} = \pi/(1-\pi)$ = prob. of success / prob. of failure

    • The odds are nonnegative, and

    $$\pi = \frac{\text{odds}}{\text{odds} + 1}$$

    • In 2x2 tables, $\text{odds}_1 = \pi_1/(1-\pi_1)$ and $\text{odds}_2 = \pi_2/(1-\pi_2)$

    • The odds ratio $\theta$: another measure of association

    $$\theta = \frac{\text{odds}_1}{\text{odds}_2} = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}$$
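A minimal Python sketch of these definitions, assuming the success probabilities are plain floats with $0 \le \pi < 1$. The helper names `odds`, `prob_from_odds` and `odds_ratio` are illustrative, not part of the original notes.

```python
def odds(pi: float) -> float:
    """Odds of success for a success probability 0 <= pi < 1."""
    return pi / (1.0 - pi)


def prob_from_odds(o: float) -> float:
    """Invert the odds: pi = odds / (odds + 1)."""
    return o / (o + 1.0)


def odds_ratio(pi1: float, pi2: float) -> float:
    """theta = odds_1 / odds_2 for the two rows of a 2x2 table."""
    return odds(pi1) / odds(pi2)


if __name__ == "__main__":
    print(odds(0.75))                 # 3.0
    print(prob_from_odds(odds(0.2)))  # 0.2, recovered from its odds
    print(odds_ratio(0.5, 0.2))       # 4.0
```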


  • Properties of the Odds Ratio

    • The odds ratio can equal any nonnegative number.

    • When X and Y are independent, $\pi_1 = \pi_2$, so $\text{odds}_1 = \text{odds}_2$ and $\theta = 1$.

    • When $\theta > 1$, the odds of success are higher in row 1 than in row 2.

    • Values of $\theta$ farther from 1.0 in a given direction represent stronger association.

  • Properties of the Odds Ratio

    • Two values of $\theta$ that are reciprocals of each other represent the same strength of association, but in opposite directions: $\theta = 0.25$ is equivalent to $\theta = 1/0.25 = 4$ with the rows (or columns) interchanged.

    • The odds ratio does not change value when the table orientation reverses (rows become columns and columns become rows), so it is unnecessary to identify one classification as a response variable in order to estimate $\theta$ (cf. the relative risk, which requires this).
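Both properties can be checked numerically. This sketch assumes a 2x2 table stored as `[[n11, n12], [n21, n22]]`, with rows as groups and column 1 as "success"; the counts are hypothetical and the helper names are illustrative. Swapping rows turns $\theta = 4$ into $0.25$, transposing leaves $\theta$ unchanged, and the relative risk does change under transposition.

```python
def odds_ratio(t):
    """Cross-product ratio n11*n22 / (n12*n21) of a 2x2 table [[n11, n12], [n21, n22]]."""
    (a, b), (c, d) = t
    return (a * d) / (b * c)


def relative_risk(t):
    """Ratio of row-wise proportions in column 1 (column 1 must be the response 'success')."""
    (a, b), (c, d) = t
    return (a / (a + b)) / (c / (c + d))


def swap_rows(t):
    return [t[1], t[0]]


def transpose(t):
    return [[t[0][0], t[1][0]], [t[0][1], t[1][1]]]


table = [[20, 80], [5, 80]]  # hypothetical counts

print(odds_ratio(table))             # 4.0
print(odds_ratio(swap_rows(table)))  # 0.25: same strength, opposite direction
print(odds_ratio(transpose(table)))  # 4.0: orientation does not matter
print(relative_risk(table), relative_risk(transpose(table)))  # about 3.4 vs 1.6: these differ
```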

  • Properties of the Odds Ratio

    • When both variables are response variables,

    $$\theta = \frac{\pi_{11}/\pi_{12}}{\pi_{21}/\pi_{22}} = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}$$

    • The odds ratio is also called the cross-product ratio.

    • The sample odds ratio

    $$\hat{\theta} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11}n_{22}}{n_{12}n_{21}}$$

    • This is the maximum likelihood (ML) estimator of $\theta$.
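An exact-arithmetic check (Python's `fractions.Fraction`, hypothetical counts) that the expressions for $\hat{\theta}$ agree, including the joint-proportion form $p_{11}p_{22}/(p_{12}p_{21})$, in which the common divisor $n$ cancels. The function name is illustrative.

```python
from fractions import Fraction


def sample_odds_ratio_forms(n11, n12, n21, n22):
    """Return the sample odds ratio computed four equivalent ways (exact fractions)."""
    p1 = Fraction(n11, n11 + n12)  # row-1 sample proportion of successes
    p2 = Fraction(n21, n21 + n22)  # row-2 sample proportion of successes
    n = n11 + n12 + n21 + n22
    from_props = (p1 / (1 - p1)) / (p2 / (1 - p2))
    from_odds = Fraction(n11, n12) / Fraction(n21, n22)
    cross_product = Fraction(n11 * n22, n12 * n21)
    # Joint sample proportions p_ij = n_ij / n (both variables treated as responses)
    p11, p12, p21, p22 = (Fraction(x, n) for x in (n11, n12, n21, n22))
    from_joint = (p11 * p22) / (p12 * p21)
    return from_props, from_odds, cross_product, from_joint


forms = sample_odds_ratio_forms(3, 7, 2, 8)  # hypothetical counts
print(forms)  # all four equal Fraction(12, 7)
```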

  • Example: Odds Ratio for Aspirin Use and Heart Attacks

    • For the physicians taking placebo, the estimated odds of MI: $n_{11}/n_{12} = 189/10845 = 0.0174$

    • For those taking aspirin: $104/10933 = 0.0095$

    • The sample odds ratio $\hat{\theta} = 0.0174/0.0095 = 1.832$ (using the unrounded odds): the estimated odds of MI were 83% higher for the placebo group
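The same arithmetic in Python, using only the Table 2.3 counts quoted above; the output formatting simply matches the slide's rounding.

```python
# Table 2.3 counts (MI = myocardial infarction)
placebo_mi, placebo_no_mi = 189, 10845
aspirin_mi, aspirin_no_mi = 104, 10933

odds_placebo = placebo_mi / placebo_no_mi  # estimated odds of MI, placebo group
odds_aspirin = aspirin_mi / aspirin_no_mi  # estimated odds of MI, aspirin group
theta_hat = odds_placebo / odds_aspirin    # sample odds ratio

print(f"{odds_placebo:.4f} {odds_aspirin:.4f} {theta_hat:.3f}")  # 0.0174 0.0095 1.832
print(f"odds higher by about {100 * (theta_hat - 1):.0f}%")      # odds higher by about 83%
```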


  • Inference for Odds Ratios and Log Odds Ratios

    • Unless the sample size is extremely large, the sampling distribution of the sample odds ratio is highly skewed (positively skewed, i.e., skewed to the right).

    • Because of this skewness, statistical inference uses an alternative but equivalent measure, $\log(\theta)$.

    • Independence corresponds to $\log(\theta) = 0$.

    • The log odds ratio is symmetric about zero: reversing the rows or the columns changes only its sign.
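To see the skewness concretely, this sketch enumerates the exact sampling distribution of the sample odds ratio for two independent binomial rows. The settings (20 observations per row, success probabilities 0.3 and 0.2) are arbitrary illustrative assumptions, and 0.5 is added to every cell (the amended estimator) so that every possible outcome gives a finite, positive value.

```python
import math


def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def skewness(values, probs):
    """Third standardized moment of a discrete distribution."""
    mean = sum(v * w for v, w in zip(values, probs))
    var = sum((v - mean) ** 2 * w for v, w in zip(values, probs))
    third = sum((v - mean) ** 3 * w for v, w in zip(values, probs))
    return third / var**1.5


def odds_ratio_sampling_dist(n1, pi1, n2, pi2):
    """Exact distribution of the amended sample odds ratio (0.5 added to each cell)."""
    values, probs = [], []
    for y1 in range(n1 + 1):      # successes in row 1
        for y2 in range(n2 + 1):  # successes in row 2
            theta = ((y1 + 0.5) * (n2 - y2 + 0.5)) / ((n1 - y1 + 0.5) * (y2 + 0.5))
            values.append(theta)
            probs.append(binom_pmf(y1, n1, pi1) * binom_pmf(y2, n2, pi2))
    return values, probs


theta_vals, weights = odds_ratio_sampling_dist(20, 0.3, 20, 0.2)
log_vals = [math.log(t) for t in theta_vals]

print(f"skewness of theta-hat:     {skewness(theta_vals, weights):.2f}")
print(f"skewness of log theta-hat: {skewness(log_vals, weights):.2f}")
```

The first number comes out large and positive, while the second is much closer to zero, which is the motivation for working on the log scale.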


  • Inference for Odds Ratios and Log Odds Ratios

    • The sample log odds ratio $\log\hat{\theta}$ has an approximating normal distribution with mean $\log(\theta)$ and standard error

    $$SE = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$$

    • A C.I. for $\log(\theta)$ is $\log\hat{\theta} \pm z_{\alpha/2}(SE)$

    • Exponentiating the endpoints of this C.I. yields one for $\theta$
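A sketch of this Wald-type interval, assuming all four cell counts are positive. `log_odds_ratio_ci` is an illustrative name; `statistics.NormalDist` from the Python standard library supplies $z_{\alpha/2}$ (about 1.96 for 95%). With the Table 2.3 counts it reproduces the values computed for that table in these notes (SE ≈ 0.123, interval for $\theta$ ≈ (1.44, 2.33)).

```python
import math
from statistics import NormalDist


def log_odds_ratio_ci(n11, n12, n21, n22, level=0.95):
    """Wald interval for log(theta) and, by exponentiating, for theta.

    Assumes all four cell counts are positive.
    """
    log_theta = math.log((n11 * n22) / (n12 * n21))
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    lo, hi = log_theta - z * se, log_theta + z * se
    return (lo, hi), (math.exp(lo), math.exp(hi)), se


# Table 2.3: rows placebo / aspirin, columns MI yes / no
(log_lo, log_hi), (lo, hi), se = log_odds_ratio_ci(189, 10845, 104, 10933)
print(f"SE = {se:.3f}")                               # SE = 0.123
print(f"log-scale CI: ({log_lo:.3f}, {log_hi:.3f})")  # log-scale CI: (0.365, 0.846)
print(f"theta CI: ({lo:.2f}, {hi:.2f})")              # theta CI: (1.44, 2.33)
```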


  • Inference for Odds Ratios and Log Odds Ratios

    • For Table 2.3, $\log(1.832) = 0.605$

    • $SE = \sqrt{\frac{1}{189} + \frac{1}{10933} + \frac{1}{104} + \frac{1}{10845}} = 0.123$

    • A 95% C.I. for $\log\theta$ equals $0.605 \pm 1.96(0.123)$, or $(0.365, 0.846)$

    • The corresponding C.I. for $\theta$ is $[\exp(0.365), \exp(0.846)] = (1.44, 2.33)$

  • Inference for Odds Ratios and Log Odds Ratios

    • The sample odds ratio $\hat{\theta}$ equals 0 or $\infty$ if any $n_{ij} = 0$, and it is undefined if both entries in a row or column are zero.

    • The slightly amended estimator, which adds 1/2 to each cell count, is preferred when any cell counts are very small:

    $$\tilde{\theta} = \frac{(n_{11} + 0.5)(n_{22} + 0.5)}{(n_{12} + 0.5)(n_{21} + 0.5)}$$
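A sketch contrasting the plain and amended estimators on hypothetical tables with zero cells. Returning `math.inf` and `math.nan` from the plain estimator is a choice made here to mirror "∞" and "undefined"; it is not something the notes prescribe.

```python
import math


def sample_odds_ratio(n11, n12, n21, n22):
    """Plain estimator n11*n22 / (n12*n21); returns 0, inf, or nan for zero cells."""
    num, den = n11 * n22, n12 * n21
    if den == 0:
        return math.nan if num == 0 else math.inf
    return num / den


def amended_odds_ratio(n11, n12, n21, n22):
    """Amended estimator: add 0.5 to every cell count before forming the ratio."""
    return ((n11 + 0.5) * (n22 + 0.5)) / ((n12 + 0.5) * (n21 + 0.5))


# Hypothetical tables with zero cells
print(sample_odds_ratio(0, 10, 5, 5), amended_odds_ratio(0, 10, 5, 5))  # 0.0 vs about 0.048
print(sample_odds_ratio(5, 0, 5, 5), amended_odds_ratio(5, 0, 5, 5))    # inf vs 11.0
print(sample_odds_ratio(0, 0, 5, 5), amended_odds_ratio(0, 0, 5, 5))    # nan (zero row) vs 1.0
```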


  • Relationship Between Odds Ratio and Relative Risk

    • Odds ratio $= \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \text{Relative risk} \times \frac{1-p_2}{1-p_1}$

    • When $p_1$ and $p_2$ are both close to zero, the fraction in the last term of this expression is approximately 1.0, so the odds ratio and relative risk take similar values.

    • For Table 2.3, the sample odds ratio of 1.83 is similar to the sample relative risk of 1.82.

    • In such a case, an odds ratio of 1.83 does mean that $p_1$ is approximately 1.83 times $p_2$.
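A numerical check of the identity above with the Table 2.3 counts (placebo: 189 MI among 189 + 10845 physicians; aspirin: 104 among 104 + 10933). The variable names are illustrative.

```python
p1 = 189 / (189 + 10845)  # sample proportion with MI, placebo
p2 = 104 / (104 + 10933)  # sample proportion with MI, aspirin

relative_risk = p1 / p2
odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
correction = (1 - p2) / (1 - p1)  # close to 1 because p1 and p2 are both small

print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}, factor = {correction:.4f}")
# RR = 1.82, OR = 1.83, factor = 1.0078
```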


  • The Odds Ratio Applies in Case-Control Studies

    • The marginal distribution of MI is fixed by the sampling design (each case was matched with two control patients).

    • The outcome measured for each subject is whether she was a smoker.

    • The study, which uses a retrospective design to look into the past, is called a case-control study
      – common in health-related applications

  • The Odds Ratio Applies in Case-Control Studies

    • Because the MI totals are fixed by the design, we estimate the conditional distribution of smoking status given MI status:
      – for women suffering MI, 172/262 = 0.656
      – for women who had not suffered MI, 173/519 = 0.333

    • The sample odds ratio is $[0.656/(1-0.656)]/[0.333/(1-0.333)] = (172 \times 346)/(173 \times 90) = 3.8$

    • If we expect $P(Y=1 \mid X)$ to be small, we can treat the sample odds ratio as a rough indication of the relative risk: women who had ever smoked were about four times as likely to suffer MI as women who had never smoked.
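The case-control arithmetic as a short sketch (counts from the example; the dictionary layout is just an assumption for readability). The odds ratio from the conditional proportions equals the cross-product ratio, and because the odds ratio does not depend on which variable is treated as the response, this is also the odds ratio for MI given smoking status.

```python
# Counts from the example: ever smoked / never smoked
cases = {"ever": 172, "never": 90}      # women who suffered MI (262 in total)
controls = {"ever": 173, "never": 346}  # women who did not (519 in total)

# Conditional proportions of smoking given MI status (what the design lets us estimate)
p_case = cases["ever"] / (cases["ever"] + cases["never"])
p_control = controls["ever"] / (controls["ever"] + controls["never"])

or_from_props = (p_case / (1 - p_case)) / (p_control / (1 - p_control))
or_cross = (cases["ever"] * controls["never"]) / (cases["never"] * controls["ever"])

print(f"{p_case:.3f} {p_control:.3f}")        # 0.656 0.333
print(f"{or_from_props:.2f} {or_cross:.2f}")  # 3.82 3.82
```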


  • Types of Observational Studies

    • Retrospective design
      – case-control study

    • Prospective design
      – cohort study
      – clinical trials

    • Cross-sectional design

    • Observational study
      – case-control, cohort, and cross-sectional designs

    • Experimental study
      – a clinical trial

  • 2.4 Chi-Squared Tests of Independence

    • Consider the null hypothesis (H0) that the cell probabilities equal certain fixed values $\{\pi_{ij}\}$.

    • For a sample of size $n$ with cell counts $\{n_{ij}\}$, the values $\{\mu_{ij} = n\pi_{ij}\}$ are the expected frequencies.

    • To judge whether the data contradict H0, we compare $\{n_{ij}\}$ to $\{\mu_{ij}\}$.

    • The larger the differences $\{n_{ij} - \mu_{ij}\}$, the stronger the evidence against H0.

  • Pearson Statistics and the Chi-squared Distribution

    • The Pearson chi-squared statistic for testing H0 is

    $$X^2 = \sum \frac{(n_{ij} - \mu_{ij})^2}{\mu_{ij}}$$

    • This statistic takes its minimum value of zero when all $n_{ij} = \mu_{ij}$.

    • For a fixed sample size, greater differences $\{n_{ij} - \mu_{ij}\}$ produce larger $X^2$ values and stronger evidence against H0.
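A minimal sketch of $X^2$ for a fixed-probability $H_0$, with the cells flattened into a list; the two-cell data and $\pi = (0.5, 0.5)$ are hypothetical.

```python
def pearson_x2(observed, expected):
    """Pearson statistic: sum over cells of (n - mu)^2 / mu (cells flattened, mu > 0)."""
    return sum((n - mu) ** 2 / mu for n, mu in zip(observed, expected))


# Hypothetical: n = 20 observations in two cells, H0 fixes pi = (0.5, 0.5)
n_cells = [12, 8]
pi0 = [0.5, 0.5]
mu = [sum(n_cells) * p for p in pi0]  # expected frequencies mu = n * pi

print(pearson_x2(n_cells, mu))  # 0.8
print(pearson_x2(mu, mu))       # 0.0, the minimum, reached when every n equals mu
```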


  • Pearson Statistics and the Chi-squared Distribution

    • The $X^2$ statistic has approximately a chi-squared distribution for large $n$.

  • Pearson Statistics and the Chi-squared Distribution

    • The chi-squared approximation improves as the $\{\mu_{ij}\}$ increase, and $\mu_{ij} \ge 5$ in every cell is usually sufficient.

    • The chi-squared distribution is concentrated over nonnegative values.

    • It has mean equal to its degrees of freedom (df), and its standard deviation equals $\sqrt{2\,\text{df}}$.

    • The distribution is skewed to the right, but it becomes more bell-shaped (normal) as df increases.

    • The df value equals the difference between the number of parameters in the alternative hypothesis and in the null hypothesis.
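A worked instance of this rule for testing independence in an I x J table with a multinomial sample: the unrestricted alternative has IJ − 1 free cell probabilities (they must sum to 1), while under independence the $\pi_{ij}$ are determined by I − 1 free row and J − 1 free column marginal probabilities.

```latex
\begin{aligned}
\text{df} &= (\text{parameters under } H_a) - (\text{parameters under } H_0) \\
          &= (IJ - 1) - \big[(I - 1) + (J - 1)\big] \\
          &= IJ - I - J + 1 = (I - 1)(J - 1)
\end{aligned}
```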


  • Likelihood-Ratio Statistic

    • Likelihood function: the probability of the data, viewed as a function of the parameter once the data are observed

    • The likelihood-ratio test statistic uses the ratio of the maximized likelihoods:

    $$-2\log\left(\frac{\text{maximum likelihood when parameters satisfy } H_0}{\text{maximum likelihood when parameters are unrestricted}}\right)$$

    • For two-way contingency tables with the multinomial distribution, the likelihood-ratio statistic simplifies to

    $$G^2 = 2\sum n_{ij}\log\left(\frac{n_{ij}}{\mu_{ij}}\right)$$

    • This statistic is called the likelihood-ratio chi-squared statistic.
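A sketch of $G^2$ against fixed expected frequencies, using hypothetical two-cell data with $n = 20$ and $\mu = (10, 10)$, and the usual convention that cells with $n_{ij} = 0$ contribute zero. For these counts Pearson's $X^2 = (2^2 + 2^2)/10 = 0.8$, so the two statistics are close.

```python
import math


def g2(observed, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum n * log(n / mu); cells with n = 0 contribute 0."""
    return 2 * sum(n * math.log(n / mu) for n, mu in zip(observed, expected) if n > 0)


observed = [12, 8]        # hypothetical counts
expected = [10.0, 10.0]   # mu = n * pi under a fixed H0
print(round(g2(observed, expected), 4))  # 0.8054
```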


  • Tests of Independence

    • The null hypothesis of statistical independence is

    $$H_0: \pi_{ij} = \pi_{i+}\pi_{+j} \quad \text{for all } i \text{ and } j$$

    • The expected frequency is $\mu_{ij} = n\pi_{ij} = n\pi_{i+}\pi_{+j}$

    • Estimated expected frequencies:

    $$\hat{\mu}_{ij} = n p_{i+} p_{+j} = n\left(\frac{n_{i+}}{n}\right)\left(\frac{n_{+j}}{n}\right) = \frac{n_{i+}n_{+j}}{n}$$

  • Tests of Independence

    • For testing independence in IxJ contingency tables, the Pearson and likelihood-ratio statistics equal

    $$X^2 = \sum \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}}, \qquad G^2 = 2\sum n_{ij}\log\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right)$$

    • Their large-sample chi-squared distributions have $df = (I-1)(J-1)$.
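A from-scratch sketch of the whole test for an I x J table of counts given as a list of rows: estimated expected frequencies from the margins, then $X^2$, $G^2$ and df. The 2x2 table is hypothetical, and all $\hat{\mu}_{ij}$ are assumed positive. A p-value would come from the chi-squared distribution with this df (for example `scipy.stats.chi2.sf`), which this sketch does not compute.

```python
import math


def independence_tests(table):
    """X^2, G^2 and df for testing independence in an I x J table of counts.

    Estimated expected frequencies: mu_hat_ij = n_i+ * n_+j / n (all assumed > 0).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    mu_hat = [[r * c / n for c in col_tot] for r in row_tot]

    x2 = g2 = 0.0
    for obs_row, mu_row in zip(table, mu_hat):
        for obs, mu in zip(obs_row, mu_row):
            x2 += (obs - mu) ** 2 / mu
            if obs > 0:  # 0 * log(0) is taken as 0
                g2 += 2 * obs * math.log(obs / mu)
    df = (len(table) - 1) * (len(col_tot) - 1)
    return mu_hat, x2, g2, df


mu_hat, x2, g2, df = independence_tests([[10, 20], [30, 40]])  # hypothetical counts
print(mu_hat)                                      # [[12.0, 18.0], [28.0, 42.0]]
print(f"X2 = {x2:.4f}, G2 = {g2:.4f}, df = {df}")  # X2 = 0.7937, G2 = 0.8043, df = 1
```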


  • Example: Gender Gap in Political Affiliation


  • Example: Gender Gap in Political Affiliation


  • Residuals for Cells in a Contingency Table

    • For the test of independence, a useful cell residual is

    $$\frac{n_{ij} - \hat{\mu}_{ij}}{\sqrt{\hat{\mu}_{ij}(1 - p_{i+})(1 - p_{+j})}}$$

    • The ratio is called a standardized residual.

    • When H0 is true, each standardized residual has a large-sample standard normal distribution.
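A sketch computing the standardized residual for every cell of a hypothetical table given as a list of rows; `standardized_residuals` is an illustrative name. In a 2x2 table all four residuals share one absolute value, which the output reflects.

```python
import math


def standardized_residuals(table):
    """(n_ij - mu_hat_ij) / sqrt(mu_hat_ij * (1 - p_i+) * (1 - p_+j)) for every cell."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    res = []
    for i, row in enumerate(table):
        out = []
        for j, obs in enumerate(row):
            mu = row_tot[i] * col_tot[j] / n
            se = math.sqrt(mu * (1 - row_tot[i] / n) * (1 - col_tot[j] / n))
            out.append((obs - mu) / se)
        res.append(out)
    return res


for row in standardized_residuals([[10, 20], [30, 40]]):  # hypothetical counts
    print([round(r, 3) for r in row])
# [-0.891, 0.891]
# [0.891, -0.891]
```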


  • Residuals for Cells in a Contingency Table

    • Positive residuals for female Democrats and male Republicans indicate more female Democrats and male Republicans than the hypothesis of independence predicts.

  • Partitioning Chi-Squared

    • A chi-squared statistic with df1 plus a separate, independent chi-squared statistic with df2 has a chi-squared distribution with df = df1 + df2.
      – For example, given two 2x3 tables from independent samples, the sum of the $X^2$ or $G^2$ values from the two tables is a chi-squared statistic with df = 2 + 2 = 4.

  • Partitioning Chi-Squared

    • Chi-squared statistics having df > 1 can be broken into components with fewer degrees of freedom.
      – For testing independence in 2xJ tables, df = (J-1), and a chi-squared statistic can be partitioned into J-1 components.
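A sketch of one such partition for a hypothetical 2x3 table: $G^2$ for the full table (df = 2) equals $G^2$ for the 2x2 table of columns 1 and 2, plus $G^2$ for the 2x2 table comparing columns 1 and 2 combined with column 3 (df = 1 each). This additivity is exact for $G^2$; the corresponding Pearson $X^2$ components generally add up only approximately.

```python
import math


def g2_independence(table):
    """Likelihood-ratio statistic G^2 for independence in a two-way table (positive counts)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return 2 * sum(
        obs * math.log(obs / (row_tot[i] * col_tot[j] / n))
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )


full = [[10, 20, 30], [25, 15, 10]]  # hypothetical 2x3 table, df = 2

cols_1_2 = [row[:2] for row in full]                       # column 1 vs column 2, df = 1
merged_vs_3 = [[row[0] + row[1], row[2]] for row in full]  # columns 1+2 vs column 3, df = 1

parts = g2_independence(cols_1_2) + g2_independence(merged_vs_3)
print(f"{g2_independence(full):.6f} = {parts:.6f}")  # the two numbers agree
```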


  • Comments About Chi-Squared Tests

    • Limitations of chi-squared tests:
      – they merely indicate the degree of evidence for an association, not its strength
      – they require large samples
      – they treat both classifications as nominal
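A sketch of the first limitation using hypothetical counts: multiplying every cell by 10 keeps the association (the odds ratio) identical, yet $X^2$ grows tenfold, because the statistic measures evidence, which grows with the sample size. The helper names are illustrative.

```python
def pearson_x2_independence(table):
    """Pearson X^2 for independence with mu_hat_ij = n_i+ * n_+j / n."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return sum(
        (obs - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )


def odds_ratio(table):
    (a, b), (c, d) = table
    return (a * d) / (b * c)


base = [[10, 20], [30, 40]]                       # hypothetical counts
bigger = [[10 * k for k in row] for row in base]  # same proportions, 10x the sample size

print(round(pearson_x2_independence(base), 3), round(odds_ratio(base), 3))      # 0.794 0.667
print(round(pearson_x2_independence(bigger), 3), round(odds_ratio(bigger), 3))  # 7.937 0.667
```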

