Post on 15-Jan-2016
Making Inferences for Associations Between Categorical Variables: Chi Square
Chapter 12
Reading Assignment
pp. 463-482; 485
Elements of a test of hypotheses
Hypothesis testing: Process for finding out whether we can generalize about an association from a sample to a population
Null hypothesis (H_0): Represents the status quo to the party performing the sampling experiment, i.e., it is retained unless the data provide convincing evidence that it is false.
Research hypothesis (H_1) (aka alternative hypothesis): Will be accepted only if the data provide convincing evidence of its truth
Homework: Skills 1, p. 464
Process of Hypothesis Testing
Step 1: Specify a research hypothesis and a null hypothesis
Step 2: Compute the value of a test statistic for the relationship
Step 3: Calculate the degrees of freedom for the variables involved
Step 4: Look up the distribution of the test statistic to find its critical value at a specified level of significance (this tells us how likely it is that a test statistic of a particular value could have occurred by chance alone)
Step 5: Decide whether to reject the null hypothesis
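The five steps can be sketched in Python for a two-way table of counts. This is a minimal illustration, not the textbook's procedure: the counts, the helper names (`chi_square_statistic`, `degrees_of_freedom`, `chi_square_test`), and the use of 3.84 (the df = 1 table value at the 0.05 level) as the looked-up critical value are all assumptions made for the example.

```python
# Sketch of the five-step procedure for a two-way table of counts.
# The table, the helper names, and the critical value are illustrative assumptions.


def chi_square_statistic(observed):
    """Step 2: sum of (O - E)^2 / E over every cell of an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if H_0 (no association) were true
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


def degrees_of_freedom(observed):
    """Step 3: df = (r - 1)(c - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


def chi_square_test(observed, critical_value):
    """Steps 2, 3, and 5; the critical value (step 4) comes from a chi-square table."""
    stat = chi_square_statistic(observed)
    df = degrees_of_freedom(observed)
    reject = stat > critical_value  # Step 5: reject H_0 if the statistic exceeds it
    return stat, df, reject


# Step 1: H_0 says sex and party affiliation are not associated (hypothetical counts).
# Rows: men, women; columns: party A, party B.
table = [[30, 20],
         [20, 30]]

# Step 4: table value for df = 1 at the 0.05 level of significance.
stat, df, reject = chi_square_test(table, critical_value=3.84)
print(f"chi-square = {stat:.2f}, df = {df}, reject H_0: {reject}")
# chi-square = 4.00, df = 1, reject H_0: True
```

Every expected count here is 25, so each of the four cells contributes (5)^2/25 = 1, giving 4.00, just above the critical value.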
Null Hypothesis
Null Hypothesis (H_0): states that there is no association between the two variables. Examples:
– H_0: Men are no different from women in their political affiliations
– H_0: There is no relationship between a respondent’s educational level and that of his or her parents
– H_0: Older people are no more likely to be happy than younger people
This is the only hypothesis that can actually be tested: we either reject or fail to reject the null hypothesis.
EX: H_0: There is no association between age and happiness among American adults; hw/ read p. 466
Statistical Independence
Statistical Independence: Two variables are statistically independent when changes in one variable (e.g., age of respondents) have nothing to do with changes in a second (e.g., happiness), i.e., they vary independently of one another
Conversely, when two variables are statistically dependent on one another, changes in one variable are associated with changes in the second variable, e.g., changes in age (older respondents) are associated with changes in levels of happiness (more happiness)
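One way to see statistical independence in a table of counts is to compare the distribution of the second variable within each category of the first. A minimal sketch, assuming hypothetical age-by-happiness counts and illustrative function names: the two variables are independent in the table when every row has the same proportions.

```python
# Hypothetical age-by-happiness counts; rows are age groups,
# columns are (not happy, happy).


def row_proportions(table):
    """Distribution of the column variable within each row."""
    return [[count / sum(row) for count in row] for row in table]


def statistically_independent(table, tol=1e-9):
    """True when every row has the same distribution of the column variable."""
    props = row_proportions(table)
    first = props[0]
    return all(
        all(abs(p - q) <= tol for p, q in zip(row, first)) for row in props[1:]
    )


independent = [[20, 80],   # younger: 20% not happy, 80% happy
               [10, 40]]   # older:   20% not happy, 80% happy
dependent = [[30, 70],     # younger: 30% not happy
             [5, 45]]      # older:   10% not happy

print(statistically_independent(independent))  # True
print(statistically_independent(dependent))    # False
```

Real samples almost never show exactly equal row proportions, even when the variables are independent in the population; that is why the hypothesis test asks whether the differences are larger than chance alone would produce.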
Statistical Independence and Hypothesis Testing
Ex/ Null Hypothesis: Age is statistically independent of happiness, i.e., differences among respondents in age are unrelated to any differences in their reported levels of happiness
– Hyp. testing: lets us assess the likelihood that the degree of statistical dependence found in the sample is due to chance
– If the degree of statistical dependence found in the sample is not likely to be due to chance, the null hyp. is rejected
– If it is likely to be due to chance, we fail to reject the null hyp.
Type I and Type II Errors
“Mistakes” that arise because a given sample may or may not be representative of the population
If the null hypothesis states that there is no association between two variables and we reject it even though there really is no association, we commit a Type I error, i.e., we call someone a liar when he is telling the truth
If the null hypothesis states that there is no association between two variables and we accept (fail to reject) it even though there is an association, we commit a Type II error, i.e., we say someone is truthful when he is lying
Conclusion \ Actual state   H_0 true            H_1 true
Conclude H_0 true           Correct decision    Type II error
Conclude H_1 true           Type I error        Correct decision
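The meaning of a Type I error can be checked by simulation: if the null hypothesis is true (here, a fair die) and we reject whenever the chi-square statistic exceeds the 0.05 critical value, we should wrongly reject in roughly 5% of samples. The die, the sample sizes, the random seed, and the df = 5 table value 11.07 are assumptions chosen for illustration.

```python
import random

CRITICAL_05_DF5 = 11.07  # chi-square table value for df = 6 - 1 = 5 at the 0.05 level


def die_chi_square(rolls):
    """Goodness-of-fit statistic for die rolls against a fair die."""
    expected = len(rolls) / 6
    counts = [rolls.count(face) for face in range(1, 7)]
    return sum((obs - expected) ** 2 / expected for obs in counts)


def type_i_error_rate(trials=2000, rolls_per_trial=60, seed=1):
    """Share of samples from a truly fair die in which H_0 is (wrongly) rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        rolls = [rng.randint(1, 6) for _ in range(rolls_per_trial)]
        if die_chi_square(rolls) > CRITICAL_05_DF5:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")  # should be near 0.05
```

Lowering the level of significance (for example, using the 0.01 table value) would reduce this rate, at the cost of more Type II errors when an association really exists.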
Elements of a Test of Hypothesis
Null Hypothesis (H_0): a theory about one of the population parameters. The theory generally represents the status quo, which is maintained unless it is proven false
Research Hypothesis (H_1): a theory that contradicts the null hypothesis. The theory generally represents the claim that will be accepted only if the data provide convincing evidence for it
Test statistic: Sample statistic used to decide whether to reject the null hypothesis
Elements of a Test of Hypothesis (cont)
Rejection region: The numerical values of the test statistic for which the null hypothesis will be rejected. The rejection region is chosen so that the probability is α that it will contain the test statistic when the null hypothesis is true, thereby leading to a Type I error. The value of α chosen is usually small (e.g., 0.01, 0.05, or 0.1) and is referred to as the level of significance of the test. A 0.05 (or 5%) level of significance means that, when the null hypothesis is true, there is a 5% chance that we will reject it when we should not; equivalently, a true null hypothesis is correctly retained 95% of the time
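For the chi-square test, the rejection region is every value above a critical value whose upper-tail area equals α. As a sketch, assuming a test with df = 1 (as in the coin example below), the tail area has the closed form erfc(sqrt(x/2)), so the critical value can be found by bisection with only the Python standard library; the helper names are hypothetical.

```python
import math


def chi2_sf_df1(x):
    """P(chi-square with 1 df > x); for df = 1 this equals erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


def critical_value_df1(alpha, lo=0.0, hi=50.0, iterations=100):
    """Value x whose upper-tail area equals alpha, found by bisection."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if chi2_sf_df1(mid) > alpha:
            lo = mid  # tail area still larger than alpha: move right
        else:
            hi = mid
    return (lo + hi) / 2


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: reject H_0 if chi-square > {critical_value_df1(alpha):.2f}")
```

The printed boundaries (2.71, 3.84, and 6.63) agree with a printed chi-square table for df = 1.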
Assumptions: Clear statement(s) of any assumptions made about the population(s) being sampled
Experiment and calculation of test statistic
Conclusion:
– If the numerical value of the test statistic falls in the rejection region, we reject the null hypothesis and conclude that the research hypothesis is true. We know that hypothesis testing will lead to this conclusion incorrectly (Type I error) 100α% of the time when H_0 is true.
– If the test statistic does not fall in the rejection region, we do not reject H_0. Thus we reserve judgment about which hypothesis is true. We do not conclude that the null hypothesis is true because we do not, in general, know the probability that our test procedure will lead to an incorrect failure to reject H_0 (Type II error)
Chi-Square
Formula 12.1: Observed vs. Expected
– Roll a die 6 times and get three 3’s (observed); expected: one 3
– pp. 469-71 + skills: Filling in the table of expected values
– Skills 3, 4: Excel
Generally, the greater the value of chi-square, the more statistical dependence between two variables
Chi-Square / Degrees of Freedom
We are using observations from a sample as well as certain population parameters. If these parameters are unknown, they must be estimated from the sample.
Degrees of freedom (df): the number N of independent observations in the sample (i.e., sample size) minus the number k of population parameters that must be estimated from sample observations
df = N – k
When working with a contingency table, df = (r – 1)(c – 1), where r and c are the number of rows and columns (respectively) in the contingency table. For a single variable compared with expected frequencies, as in the examples below, df = number of categories – 1
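The (r – 1)(c – 1) rule reflects how many cells can be filled in freely once the row and column totals are fixed; every other cell is then forced. A small sketch, using a hypothetical 2 × 3 table and an illustrative helper `complete_table`, assuming the row totals and column totals add up to the same grand total.

```python
def complete_table(free, row_totals, col_totals):
    """Fill an r x c table from its (r-1) x (c-1) upper-left cells and the margins.

    Assumes sum(row_totals) == sum(col_totals).
    """
    r, c = len(row_totals), len(col_totals)
    table = [[0] * c for _ in range(r)]
    for i in range(r - 1):
        for j in range(c - 1):
            table[i][j] = free[i][j]  # freely chosen cells
        # The last cell in row i is forced by the row total
        table[i][c - 1] = row_totals[i] - sum(table[i][: c - 1])
    for j in range(c):
        # The last row is forced by the column totals
        table[r - 1][j] = col_totals[j] - sum(table[i][j] for i in range(r - 1))
    return table


row_totals = [50, 50]
col_totals = [30, 40, 30]
free = [[10, 25]]  # (2 - 1) x (3 - 1) = 2 cells chosen freely
table = complete_table(free, row_totals, col_totals)
print(table)  # [[10, 25, 15], [20, 15, 15]]
print("df =", (len(row_totals) - 1) * (len(col_totals) - 1))  # df = 2
```

Only the two chosen cells carry independent information; the remaining four are fixed by the margins, which is exactly the df of a 2 × 3 table.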
Chi-squared example: generate random digits 250 times
digit       0   1   2   3   4   5   6   7   8   9
obs freq   17  31  29  18  14  20  35  30  20  36
exp freq   25  25  25  25  25  25  25  25  25  25
Question: Does the observed frequency distribution differ significantly from the expected distribution?
Chi-Square random digit example
$\chi^2 = \frac{(17-25)^2}{25} + \frac{(31-25)^2}{25} + \frac{(29-25)^2}{25} + \cdots + \frac{(36-25)^2}{25} = 23.3$ [excel]
Degrees of freedom: 10 – 1 = 9
From the table on p. 545, $\chi^2_{.99} = 21.7$; since 23.3 > 21.7, the observed frequencies differ from the expected frequencies at the 0.01 level of significance, so the table of “random” numbers is somewhat doubtful
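The random-digit calculation can be checked in a few lines of Python instead of Excel. This sketch hard-codes the slide's frequencies and its table value of 21.7 (df = 9, 0.01 level); the variable names are illustrative.

```python
observed = [17, 31, 29, 18, 14, 20, 35, 30, 20, 36]  # frequencies of digits 0-9
expected = [25] * 10                                  # 250 draws / 10 digits

# Sum of (O - E)^2 / E over the ten digits
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
critical_01 = 21.7  # table value for df = 9 at the 0.01 level, as quoted above

print(f"chi-square = {chi_square:.2f}, df = {df}")  # chi-square = 23.28, df = 9
print("reject H_0 at 0.01:", chi_square > critical_01)  # True
```

The squared deviations sum to 582, and 582 / 25 = 23.28, which rounds to the 23.3 shown above.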
Chi-Square question
A coin is tossed 200 times, giving 115 heads and 85 tails. Test the hypothesis that the coin is fair using (a) the 0.05 and (b) the 0.01 level of significance
Ans: df = 2 – 1 = 1 (two categories, H and T); O_1 = 115, O_2 = 85; E_1 = E_2 = 100
$\chi^2 = \frac{(115-100)^2}{100} + \frac{(85-100)^2}{100} = 4.5$
(a) The $\chi^2$ table value for .95 is 3.84; 4.5 > 3.84, so reject the hypothesis that the coin is fair at the 0.05 level of significance
(b) The $\chi^2$ table value for .99 is 6.63; 4.5 < 6.63, so we cannot reject the hypothesis that the coin is fair at the 0.01 level of significance
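The same arithmetic for the coin question, as a sketch that reuses the table values quoted in the answer (3.84 and 6.63 for df = 1); the variable names are illustrative.

```python
observed = [115, 85]    # heads, tails in 200 tosses
expected = [100, 100]   # a fair coin: 200 / 2 of each

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # two categories -> df = 1

# Table values for df = 1 quoted in the answer
critical = {0.05: 3.84, 0.01: 6.63}
decisions = {alpha: chi_square > value for alpha, value in critical.items()}

print(f"chi-square = {chi_square}, df = {df}")  # chi-square = 4.5, df = 1
for alpha, reject in decisions.items():
    print(f"alpha = {alpha}: {'reject' if reject else 'cannot reject'} H_0")
```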
Interpreting Chi Square
When hypothesizing about an association between two variables, chi-square tells us how likely it is that the degree of statistical dependence observed in the sample is simply the luck of the draw
A p value of 0.05 means that, if there were no association between the variables in the population, a degree of statistical dependence at least as large as the one observed would turn up by chance in no more than 5 samples in 100. Because a result this strong is unlikely to be due to chance alone, the null hypothesis (no association between the variables) is rejected
The smaller the p value required for rejection (e.g., 0.01 rather than 0.05), the less likely we are to make a Type I error
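A p value can also be computed directly rather than bracketed between table values. As a sketch assuming df = 1 (the coin example), the upper-tail probability of chi-square is erfc(sqrt(x/2)); the helper name `p_value_df1` is hypothetical.

```python
import math


def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi_square / 2))


p = p_value_df1(4.5)  # coin example: 115 heads, 85 tails
print(f"p = {p:.4f}")  # about 0.034
print("reject at 0.05:", p <= 0.05)  # True
print("reject at 0.01:", p <= 0.01)  # False
```

A p value of about 0.034 lies between 0.01 and 0.05, which is why the coin example rejects H_0 at the 0.05 level but not at the 0.01 level.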
pp. 480-81: Table 12.4 (p. 472) has $\chi^2 = 15.487$, df = 6
The higher the $\chi^2$ value (for a given df), the less likely it is that the value obtained is due to chance (read Table 12.9, p. 481)
Rule of thumb: reject the null hypothesis when the p value associated with $\chi^2$ is 0.05 or less, i.e., when there are only 5 chances in 100 that a dependence this strong would arise by chance
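For other degrees of freedom, the upper-tail probability of chi-square with an integer df can be computed from a standard recurrence for the regularized incomplete gamma function, which replaces the table lookup. A sketch applied to the Table 12.4 value; `chi2_sf` is an illustrative helper, not a library call (SciPy users would normally call `scipy.stats.chi2.sf` instead).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x), integer df >= 1.

    Uses Q(df + 2) = Q(df) + (x/2)**(df/2) * exp(-x/2) / Gamma(df/2 + 1),
    starting from Q(1) = erfc(sqrt(x/2)) or Q(2) = exp(-x/2).
    """
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


p_table_12_4 = chi2_sf(15.487, 6)  # Table 12.4: chi-square = 15.487, df = 6
print(f"p = {p_table_12_4:.4f}")  # about 0.017
print("reject H_0 at 0.05:", p_table_12_4 <= 0.05)  # True
print("reject H_0 at 0.01:", p_table_12_4 <= 0.01)  # False
```

By the rule of thumb, a p value of roughly 0.017 means the association in Table 12.4 is significant at the 0.05 level but not at the 0.01 level.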
Skills 7, p. 481; Skills 8, p. 485 (following their example, p. 484)
Homework: p. 492, 1, 3
p. 494, SPSS 1, 2