making inferences for associations between categorical variables: chi square chapter 12 reading...

Making Inferences for Associations Between Categorical Variables: Chi Square

Chapter 12

Reading Assignment

pp. 463-482; 485

Elements of a Test of Hypotheses

Hypothesis testing: Process for finding out whether we can generalize about an association from a sample to a population

Null hypothesis (H_0): Represents the status quo for the party performing the sampling experiment; it is retained unless the data provide convincing evidence that it is false.

Research hypothesis (H_1), also known as the alternative hypothesis: Accepted only if the data provide convincing evidence of its truth.

Homework: Skills 1, p. 464

Process of Hypothesis Testing

Step 1: Specify a research hypothesis and a null hypothesis

Step 2: Compute the value of a test statistic for the relationship

Step 3: Calculate the degrees of freedom for the variables involved

Step 4: Look up the distribution of the test statistic to find its critical value at a specified level of significance (this determines how likely it is that a test statistic of a particular value could have occurred by chance alone)

Step 5: Decide whether to reject the null hypothesis
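The five steps can be sketched in Python for a simple goodness-of-fit case. This is a minimal illustration under stated assumptions, not the textbook's procedure: the function name `five_step_test` and the die counts are hypothetical, and Step 4 is modeled by passing in a critical value looked up by hand (11.07 is the standard chi-square table value for df = 5 at the 0.05 level).

```python
# Hypothetical sketch of the five-step procedure for a chi-square
# goodness-of-fit test. Step 4 (table lookup) is done by hand and the
# resulting critical value is passed in.


def five_step_test(observed, expected, critical_value):
    """Return (chi_square, df, reject_h0) for observed vs. expected counts."""
    # Step 1 is stated in words before any computation:
    #   H_0: the observed frequencies follow the expected distribution.
    #   H_1: they do not.
    # Step 2: test statistic, summing (O - E)^2 / E over the categories.
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 3: degrees of freedom for k categories is k - 1.
    df = len(observed) - 1
    # Step 4: critical_value is the table value for this df and level of significance.
    # Step 5: reject H_0 only if the statistic exceeds the critical value.
    reject_h0 = chi_square > critical_value
    return chi_square, df, reject_h0


# Hypothetical data: 60 rolls of a die, so 10 of each face are expected.
rolls = [8, 9, 19, 8, 7, 9]
stat, df, reject = five_step_test(rolls, [10] * 6, critical_value=11.07)  # df = 5, 0.05 level
print(f"chi-square = {stat:.2f}, df = {df}, reject H_0: {reject}")
```

With these made-up counts the statistic (10.00) stays below 11.07, so the sketch fails to reject H_0 even though one face looks over-represented.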

Null Hypothesis

Null hypothesis (H_0): states that there is no association between the two variables. Examples:

– H_0: Men are no different from women in their political affiliations

– H_0: There is no relationship between a respondent’s educational level and his or her parents’ educational level

– H_0: Older people are no more likely to be happy than younger people

This is the only hypothesis that can actually be tested: we either reject or fail to reject the null hypothesis.

Ex: H_0: There is no association between age and happiness among American adults (homework: read p. 466)

Statistical Independence

Statistical independence: Two variables are statistically independent when changes in one variable (e.g., age of respondents) have nothing to do with changes in a second variable (e.g., happiness); i.e., they vary independently of one another.

Conversely, when two variables are statistically dependent on one another, changes in one variable are associated with changes in a second variable; e.g., changes in age (older respondents) are associated with changes in levels of happiness (more happiness).
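One way to make independence concrete, assuming hypothetical counts for age group by happiness: if the two variables are independent, every row of the table has the same distribution across the columns (the same share happy in each age group). The helper names below are illustrative, not from the text. In real samples the rows rarely match exactly, which is why we test whether the differences could be due to chance.

```python
from fractions import Fraction


def row_proportions(table):
    """Share of each row falling in each column (e.g., share happy within each age group)."""
    return [[Fraction(cell, sum(row)) for cell in row] for row in table]


def is_statistically_independent(table):
    """True when every row has exactly the same distribution across the columns."""
    props = row_proportions(table)
    return all(p == props[0] for p in props[1:])


# Hypothetical counts. Rows: younger, older respondents; columns: happy, not happy.
independent = [[20, 40], [30, 60]]  # one third happy in both age groups
dependent = [[30, 30], [10, 50]]    # 50% vs. about 17% happy

print(is_statistically_independent(independent))  # True
print(is_statistically_independent(dependent))    # False
```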

Statistical Independence and Hypothesis Testing

Ex: Null hypothesis: Age is statistically independent of happiness, i.e., differences in age among respondents are unrelated to any differences in their levels of reported happiness

– Hypothesis testing lets us assess the likelihood that the degree of statistical dependence found in the sample is due to chance

– If the degree of statistical dependence found in the sample is not likely to be due to chance, the null hypothesis is rejected

– If it is likely to be due to chance, we fail to reject the null hypothesis

Type I and Type II Errors

“Mistakes” that arise because a given sample may or may not be representative of the population

If the null hypothesis states that there is no association between two variables and we reject it even though there is in fact no association, we have made a Type I error, i.e., we call someone a liar when he is telling the truth

If the null hypothesis states that there is no association between two variables and we fail to reject it even though there is an association, we have made a Type II error, i.e., we say someone is truthful when he is lying

Type I and Type II Errors

Conclusion \ Reality   H_0 true            H_1 true
H_0 true               Correct decision    Type II error
H_1 true               Type I error        Correct decision

Elements of a Test of Hypothesis

Null Hypothesis (H_0): a theory about one of the population parameters. The theory generally represents the status quo, which must be proven false

Research Hypothesis (H_1): a theory that contradicts the null hypothesis. The theory generally represents the claim that will be accepted only if there is convincing evidence for it

Test statistic: Sample statistic used to decide whether to reject the null hypothesis

Elements of a Test of Hypothesis (cont)

Rejection region: The numerical values of the test statistic for which the null hypothesis will be rejected. The rejection region is chosen so that the probability that it will contain the test statistic when the null hypothesis is true, thereby leading to a Type I error, is α. The value of α chosen is usually small (e.g., 0.01, 0.05, or 0.1) and is referred to as the level of significance of the test. A 0.05 (or 5%) level of significance indicates that, when the null hypothesis is true, there is a 5% chance that we will wrongly reject it and a 95% chance that we will correctly fail to reject it

Assumptions: Clear statement(s) of any assumptions made about the population(s) being sampled

Experiment and calculation of test statistic

Conclusion:

– If the numerical value of the test statistic falls in the rejection region, we reject the null hypothesis and conclude that the research hypothesis is true. We know that hypothesis testing will lead to this conclusion incorrectly (Type I error) only 100α% of the time when H_0 is true.

– If the test statistic does not fall in the rejection region, we do not reject H_0. Thus we reserve judgment about which hypothesis is true. We do not conclude that the null hypothesis is true because we do not, in general, know the probability that our test procedure will lead to an incorrect failure to reject H_0 (Type II Error)
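The link between α and Type I errors can be checked by simulation. This sketch assumes the 200-toss coin setting used later in this chapter, the df = 1 critical value 3.84 (α = 0.05), and a hypothetical biased coin with P(heads) = 0.6 to illustrate Type II errors; the function names, repetition count, and seed are arbitrary choices, not from the text.

```python
import random


def chi_square_coin(heads, n):
    """Chi-square statistic for heads/tails counts against a fair coin (E = n/2 each)."""
    expected = n / 2
    tails = n - heads
    return ((heads - expected) ** 2 + (tails - expected) ** 2) / expected


def rejection_rate(p_heads, n=200, critical_value=3.84, reps=2000, seed=1):
    """Share of simulated samples whose statistic falls in the rejection region."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        heads = sum(rng.random() < p_heads for _ in range(n))
        if chi_square_coin(heads, n) > critical_value:
            rejections += 1
    return rejections / reps


type_i = rejection_rate(0.5)       # H_0 true: every rejection is a Type I error
type_ii = 1 - rejection_rate(0.6)  # hypothetical biased coin: every non-rejection is a Type II error
print(f"Type I rate  ~ {type_i:.3f} (alpha = 0.05)")
print(f"Type II rate ~ {type_ii:.3f} (coin with P(heads) = 0.6)")
```

Because coin counts are whole numbers, the simulated Type I rate lands near, not exactly at, 0.05. The Type II rate depends on how far the true coin is from fair, which is why its probability is generally unknown.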


Chi-Square

Formula 12.1: chi-square compares observed (O) and expected (E) frequencies, χ^2 = Σ (O - E)^2 / E, summed over all categories (cells)

– Example: roll a die 6 times and get three 3’s (observed); the expected number of 3’s is one

– pp. 469-71 + Skills: filling in the table of expected values

– Skills 3, 4: Excel

Generally, the greater the value of chi-square, the more statistical dependence between two variables
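A sketch of filling in the table of expected values and computing chi-square, assuming the usual rule for a contingency table (expected count = row total × column total / N) and a hypothetical 2 × 2 table of observed counts. This is plain Python for illustration; the Excel workflow in Skills 3 and 4 is not reproduced here.

```python
def expected_counts(table):
    """Expected cell frequencies under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (O - E)^2 / E over every cell of the contingency table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


observed = [[10, 20], [30, 40]]  # hypothetical 2 x 2 table, N = 100
print(expected_counts(observed))       # [[12.0, 18.0], [28.0, 42.0]]
print(round(chi_square(observed), 3))  # 0.794
```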

Chi-Square / Degrees of Freedom

We are using observations from a sample as well as certain population parameters. If these parameters are unknown, they must be estimated from the sample.

Degrees of freedom (df): the number N of independent observations in the sample (i.e., sample size) minus the number k of population parameters that must be estimated from sample observations:

df = N - k

When working with a contingency table, df = (r - 1)(c - 1), where r and c are the numbers of rows and columns, respectively, in the contingency table
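The two degrees-of-freedom rules used in this chapter, written as small hypothetical helpers: k - 1 for a single row of k categories (as in the random-digit and coin examples) and (r - 1)(c - 1) for an r × c contingency table.

```python
def df_goodness_of_fit(num_categories):
    """One-way table of k categories: df = k - 1."""
    return num_categories - 1


def df_contingency(table):
    """Contingency table with r rows and c columns: df = (r - 1)(c - 1)."""
    rows = len(table)
    cols = len(table[0])
    return (rows - 1) * (cols - 1)


print(df_goodness_of_fit(10))                        # random digits 0-9 -> 9
print(df_goodness_of_fit(2))                         # heads/tails -> 1
print(df_contingency([[0] * 4 for _ in range(3)]))   # 3 x 4 table -> 6
```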

Chi-squared example: generate random digits 250 times

Digit             0   1   2   3   4   5   6   7   8   9
Observed freq.   17  31  29  18  14  20  35  30  20  36
Expected freq.   25  25  25  25  25  25  25  25  25  25


Question: Does the observed frequency differ from the expected distribution in a significant way?


Chi-Square: Random Digit Example

χ^2 = (17-25)^2/25 + (31-25)^2/25 + (29-25)^2/25 + … + (36-25)^2/25 = 23.3 [computed in Excel]

Degrees of freedom: 10 - 1 = 9

Table, p. 545: χ^2 at .99 is 21.7; 23.3 > 21.7, so the observed frequency differs from the expected frequency at the 0.01 level of significance, and the table of “random” numbers is somewhat doubtful
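The random-digit arithmetic can be checked in a few lines. The critical value 21.7 is the one quoted from the table on p. 545; printing each digit's (O - E)^2 / E contribution, an addition for illustration, shows where the departure from 25 comes from.

```python
observed = [17, 31, 29, 18, 14, 20, 35, 30, 20, 36]  # digits 0-9, observed frequencies
expected = [25] * 10                                   # 250 draws / 10 digits

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)
df = len(observed) - 1
critical_99 = 21.7  # table value quoted for df = 9 at .99 (p. 545)

for digit, c in enumerate(contributions):
    print(f"digit {digit}: {c:.2f}")
print(f"chi-square = {chi_square:.2f}, df = {df}, reject at 0.01: {chi_square > critical_99}")
```

The total is 23.28, which rounds to the 23.3 above; digits 4 and 9 contribute the most (4.84 each).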

Chi-Square Question

A coin is tossed 200 times, giving 115 heads and 85 tails. Test the hypothesis that the coin is fair using (a) 0.05 and (b) 0.01 levels of significance

Ans: df = 2 - 1 = 1 (two categories: H, T)

O_1 = 115, O_2 = 85; E_1 = E_2 = 100

χ^2 = (115-100)^2/100 + (85-100)^2/100 = 4.5

(a) χ^2 table value for .95 is 3.84; 4.5 > 3.84, so reject the hypothesis that the coin is fair at the 0.05 level of significance

(b) χ^2 table value for .99 is 6.63; 4.5 < 6.63, so we cannot reject the hypothesis that the coin is fair at the 0.01 level of significance
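A quick check of the coin answer, using the two df = 1 table values quoted in the answer (3.84 and 6.63). Variable names are illustrative.

```python
n, heads = 200, 115
observed = [heads, n - heads]
expected = [n / 2, n / 2]  # a fair coin: 100 heads, 100 tails

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = 1 table values quoted in the answer, keyed by level of significance.
critical = {0.05: 3.84, 0.01: 6.63}
for alpha, cutoff in critical.items():
    decision = "reject" if chi_square > cutoff else "cannot reject"
    print(f"alpha = {alpha}: chi-square = {chi_square} vs {cutoff} -> {decision} H_0")
```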

Interpreting Chi Square

When hypothesizing about an association between two variables, chi-square tells us the likelihood that the degree of statistical dependence observed is simply the luck of the draw

A p value of 0.05 means that, if there were no association between the variables in the population, there would be no more than 5 chances in 100 of finding this much statistical dependence in a sample by chance alone. At the 0.05 level, the null hypothesis (no association between the variables) is therefore rejected

The lower the value of p, the less likely we are to make a Type I error when we reject the null hypothesis
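The p value for the coin example can also be computed directly instead of bracketing it between table values. This sketch relies on a standard fact rather than anything in the text: with df = 1, chi-square is the square of a standard normal variable, so the upper-tail probability is erfc(√(x/2)), available as `math.erfc` in Python's standard library. The helper name is hypothetical.

```python
import math


def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square statistic with df = 1.

    With one degree of freedom, chi-square is the square of a standard normal
    variable Z, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2))


p = p_value_df1(4.5)  # coin example: 115 heads, 85 tails
print(f"p = {p:.4f}")
print("reject at 0.05:", p <= 0.05, "| reject at 0.01:", p <= 0.01)
```

The result, about 0.034, lies between 0.01 and 0.05, which is why the coin was rejected at the 0.05 level but not at the 0.01 level.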


Interpreting Chi Square

pp. 480-81: Table 12.4 (p. 472) has χ^2 = 15.487, df = 6

The higher the χ^2 value (for a given df), the less likely it is that the value obtained is due to chance (read Table 12.9, p. 481)

Rule of thumb: reject the null hypothesis when the p value for χ^2 reaches 0.05 or less, i.e., only 5 chances in 100 that the dependence is due to chance
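For Table 12.4 (χ^2 = 15.487, df = 6), the p value can be sketched with the closed-form upper-tail probability that holds when df is even. The function is a hypothetical helper written for this illustration; for odd df or general use, a statistics package (for example SciPy's `scipy.stats.chi2.sf`) would be the usual choice.

```python
import math


def p_value_even_df(chi_square, df):
    """Upper-tail chi-square probability for an even df (closed form).

    P(X > x) = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
    """
    if df <= 0 or df % 2:
        raise ValueError("this shortcut only covers positive even df")
    half = chi_square / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j! one factor at a time
        total += term
    return math.exp(-half) * total


p = p_value_even_df(15.487, 6)  # Table 12.4: chi-square = 15.487, df = 6
print(f"p = {p:.3f}")
print("reject H_0 at 0.05:", p <= 0.05)
```

The result is roughly 0.017, below the 0.05 rule of thumb, so the null hypothesis of no association would be rejected at that level.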

Skills 7, p. 481; Skills 8, p. 485 (following their example, p. 484)


Homework: p. 492, 1, 3

p. 494, SPSS 1, 2
