

QS101: Introduction to Quantitative Methods in Social Science

Week 14: Crosstabulations and Chi-Squared

Dr. Florian Reiche
Teaching Fellow in Quantitative Methods
Course Director BA Politics and Sociology

Deputy Director of Student Experience and Progression, PAIS

January 29, 2015

Dr. Florian Reiche

QS101: Introduction to Quantitative Methods in Social Science


Crosstabulations

Independence and Dependence

Chi-Squared Test of Independence


Crosstabulations


What is a Crosstabulation (cross tab)?

- A crosstab (also known as a contingency table) is used to analyse the relationship between two categorical variables

- It displays the number of subjects observed at every combination of possible outcomes for the two variables


What does that look like?

Is there an association between gender and ice-cream flavour preference?

Ice-Cream Flavours

Gender    Chocolate   Vanilla   Total
Male      10          5         15
Female    8           12        20
Total     18          17        35

The row totals and the column totals are called marginal distributions.
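A minimal sketch of building this crosstab and its marginal totals in Python, using only the standard library. The raw observations are hypothetical: they are reconstructed from the cell counts above, since the individual responses are not given, and the variable names are illustrative.

```python
from collections import Counter

# Hypothetical raw data, reconstructed from the cell counts in the table above
observations = (
    [("Male", "Chocolate")] * 10 + [("Male", "Vanilla")] * 5
    + [("Female", "Chocolate")] * 8 + [("Female", "Vanilla")] * 12
)

rows = ["Male", "Female"]        # explanatory variable (gender)
cols = ["Chocolate", "Vanilla"]  # response variable (flavour)

# Cell counts: number of subjects observed at each combination of outcomes
counts = Counter(observations)
table = [[counts[(r, c)] for c in cols] for r in rows]

# Marginal distributions: row totals and column totals
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

for label, row, total in zip(rows, table, row_totals):
    print(f"{label:<8}{row[0]:>10}{row[1]:>9}{total:>7}")
print(f"{'Total':<8}{col_totals[0]:>10}{col_totals[1]:>9}{n:>7}")
```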


Percentage Comparisons

To study how ice-cream flavour preference depends on gender, we convert the frequencies to percentages within each row.

Ice-Cream Flavours

Gender    Chocolate   Vanilla   Total   n
Male      66.7%       33.3%     100%    15
Female    40.0%       60.0%     100%    20
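The row-percentage step can be sketched in a few lines of plain Python. This is a minimal illustration assuming the counts are stored as a list of rows; the function name `row_percentages` is hypothetical.

```python
# Observed counts from the ice-cream example (rows: Male, Female; columns: Chocolate, Vanilla)
table = [[10, 5], [8, 12]]
rows = ["Male", "Female"]

def row_percentages(table):
    """Convert each row of counts into percentages of that row's total,
    i.e. the conditional distribution of the response within each row."""
    result = []
    for row in table:
        total = sum(row)
        result.append([100 * count / total for count in row])
    return result

for label, pcts, row in zip(rows, row_percentages(table), table):
    print(label, [f"{p:.1f}%" for p in pcts], "n =", sum(row))
```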


Percentage Comparisons (contd.)

- The two sets of percentages for males and females are called conditional distributions on ice-cream flavour.

- They refer to the sample data distribution of ice-cream flavour, conditional on gender.

- It is standard practice to form the conditional distribution for the response variable (here ice-cream flavour) within the categories of the explanatory variable (here gender).


Good Practice for Cross Tabs

- We want to show the percentages of the response (dependent) variable within the categories of the explanatory (independent) variable

- The dependent variable goes into the columns
- Clearly label the variables and their categories
- Include the total sample sizes on which the percentages are based
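As a rough sketch of these conventions, the hypothetical helper below prints a plain-text crosstab with the response variable labelled across the columns, row percentages that sum to 100%, and the n each row is based on. The function name and layout widths are illustrative choices, not a standard format.

```python
def format_crosstab(table, row_var, row_labels, col_var, col_labels):
    """Return a plain-text crosstab showing row percentages of the response
    (column) variable within each category of the explanatory (row) variable,
    plus the sample size n that each row's percentages are based on."""
    header = f"{row_var:<10}" + "".join(f"{c:>12}" for c in col_labels) + f"{'Total':>8}{'n':>6}"
    lines = [f"{'':<10}{col_var}", header]  # label the response variable above its categories
    for label, row in zip(row_labels, table):
        n = sum(row)
        cells = "".join(f"{100 * x / n:>11.1f}%" for x in row)
        lines.append(f"{label:<10}{cells}{'100%':>8}{n:>6}")
    return "\n".join(lines)

print(format_crosstab([[10, 5], [8, 12]], "Gender", ["Male", "Female"],
                      "Ice-cream flavour", ["Chocolate", "Vanilla"]))
```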


Independence and Dependence


- The question is now: is there an association between ice-cream flavour and gender?

- Put more technically: are the population conditional distributions on one categorical variable identical at each category of the other variable? If they are, the variables are statistically independent.

- What would that look like?


Statistical Independence

Ice-Cream Flavours

Gender    Chocolate    Vanilla      Total
Male      8 (51.4%)    7 (48.6%)    15 (100%)
Female    10 (51.4%)   10 (48.6%)   20 (100%)

Cell counts are rounded to whole numbers.

This table is hypothetical – in practice you will never see it.
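Under independence, each row shares the overall distribution of flavour from the observed totals (18 and 17 out of 35). A short worked sketch of where the hypothetical table's percentages and unrounded counts come from:

$$\frac{18}{35} \approx 0.514 = 51.4\%, \qquad \frac{17}{35} \approx 0.486 = 48.6\%$$

$$\text{Male: } 15 \times \tfrac{18}{35} \approx 7.71, \quad 15 \times \tfrac{17}{35} \approx 7.29 \qquad \text{Female: } 20 \times \tfrac{18}{35} \approx 10.29, \quad 20 \times \tfrac{17}{35} \approx 9.71$$

Rounding these to whole numbers gives 8, 7, 10 and 10.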


Queries

- Our initial table was a sample
- We would expect variability depending on the sample we draw
- But what does the population look like?
- How plausible is it, given the sample, that gender and ice-cream flavour are independent in the population?


We need a significance test!

- H0: The variables are statistically independent
- H1: The variables are statistically dependent


Chi-Squared Test of Independence


The Chi-Squared Test

- The chi-squared (χ²) test compares the observed frequencies in the contingency table (our initial table) with the frequencies we would expect if the null hypothesis were true

- The following table shows the observed frequencies, with the expected frequencies under H0 (rounded to whole numbers) in parentheses:

Ice-Cream Flavours

Gender    Chocolate   Vanilla    Total
Male      10 (8)      5 (7)      15
Female    8 (10)      12 (10)    20
Total     18          17         35


How did I calculate the expected values?

- Let f_o denote the observed frequency in a cell of the table
- Let f_e denote the expected frequency in that cell
- f_e is the count we would expect in a cell if the two variables were independent
- It equals the product of the row total and the column total for that cell, divided by the total sample size
- E.g. for male/chocolate: f_e = 15 × 18 / 35 ≈ 7.71, which appears rounded to 8 in the table
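The rule for expected frequencies translates directly into code. A minimal sketch, assuming the table is a list of rows of counts; `expected_frequencies` is an illustrative name.

```python
def expected_frequencies(table):
    """Expected count for each cell under independence:
    (row total x column total) / total sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

observed = [[10, 5], [8, 12]]  # ice-cream example: Male/Female x Chocolate/Vanilla
expected = expected_frequencies(observed)
for row in expected:
    print([round(fe, 2) for fe in row])
# [7.71, 7.29]
# [10.29, 9.71]
```

The expected counts keep the observed marginal totals (each row still sums to 15 and 20, each column to 18 and 17); only the split inside the table changes.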


The χ² test statistic

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

- For each cell, we square the difference between the observed and expected frequencies and divide it by the expected frequency

- We then sum these results over all cells (that is what Σ does)
- If H0 is true, χ² tends to be small, because the observed frequencies should lie close to the expected ones
- The larger the χ² value, the greater the evidence against H0 (independence)
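A minimal sketch of the statistic exactly as defined above, summing (f_o − f_e)² / f_e over all cells. The function name is illustrative and the example uses the ice-cream counts.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic: sum over cells of (f_o - f_e)^2 / f_e,
    with f_e = row total x column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n
            stat += (f_o - f_e) ** 2 / f_e  # each cell's contribution
    return stat

print(round(chi_squared([[10, 5], [8, 12]]), 2))  # 2.44
```

This applies no continuity correction. Some software differs by default: `scipy.stats.chi2_contingency`, for example, applies Yates' correction to 2 × 2 tables unless called with `correction=False`, which gives a smaller value for this table.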


How do we interpret the magnitude of χ²?

- The χ² distribution
  - It is concentrated on the positive part of the real line (it cannot be negative!)
  - What is its minimal value, and why?
  - It is skewed to the right
  - Its precise shape depends on the degrees of freedom (df)
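One way to approach the question about the minimal value, working from the test statistic: every term has a squared numerator and a positive expected frequency in the denominator, so no term can be negative.

$$\frac{(f_o - f_e)^2}{f_e} \ge 0 \text{ for every cell} \quad\Longrightarrow\quad \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \ge 0$$

The minimum, χ² = 0, occurs only when f_o = f_e in every cell, i.e. when the sample table matches the independence table exactly.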


What are degrees of freedom?

- Given the marginal totals, the cell counts in a rectangular block of size (r − 1) × (c − 1) within the contingency table determine all the other cell counts (r = number of rows, c = number of columns), so df = (r − 1)(c − 1)

- More intuitively: how many cells could I fill in freely before the marginal distributions determine the remaining cell values?
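To make the "free cells" idea concrete, the sketch below computes df and shows that in a 2 × 2 table, once the margins are fixed, choosing one cell determines the other three. Both helper names are hypothetical.

```python
def degrees_of_freedom(n_rows, n_cols):
    """df for a chi-squared test of independence on an r x c table."""
    return (n_rows - 1) * (n_cols - 1)

def fill_2x2_from_margins(top_left, row_totals, col_totals):
    """Given the marginal totals, one freely chosen cell fixes the rest of a 2 x 2 table."""
    top_right = row_totals[0] - top_left
    bottom_left = col_totals[0] - top_left
    bottom_right = row_totals[1] - bottom_left
    return [[top_left, top_right], [bottom_left, bottom_right]]

print(degrees_of_freedom(2, 2))                        # 1
print(fill_2x2_from_margins(10, [15, 20], [18, 17]))  # [[10, 5], [8, 12]]
print(degrees_of_freedom(3, 4))                        # 6
```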


Where were we?

HERE!

- The χ² distribution
  - It is concentrated on the positive part of the real line (it cannot be negative!)
  - It is skewed to the right
  - Its precise shape depends on the degrees of freedom (df)


The χ² Distribution

Figure: The χ² distribution (k = df)
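In place of the figure, the sketch below evaluates the χ² density for a few values of k and computes the upper-tail probability P(X ≥ x), which is how a given χ² value is turned into a p-value. It uses only Python's `math` module and closed-form tail expressions valid for integer df; in practice a library routine such as `scipy.stats.chi2.sf` would normally be used instead.

```python
import math

def chi2_pdf(x, k):
    """Density of the chi-squared distribution with k degrees of freedom, for x > 0."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def chi2_sf(x, k):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df k >= 1."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{k/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{i=1}^{(k-1)/2} x^(i-1) / (1*3*...*(2i-1))
    term, total = 1.0, 0.0
    for i in range(1, (k - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total

# Densities at a few points: each curve is right-skewed, and its peak moves right as df grows
for k in (1, 2, 5):
    print(k, [round(chi2_pdf(x, k), 3) for x in (0.5, 2, 5, 10)])

# Upper-tail probability (p-value) for the ice-cream example: chi-squared of about 2.44, df = 1
print(round(chi2_sf(2.44, 1), 3))
```

Under these assumptions, the ice-cream example (χ² ≈ 2.44, df = 1) gives a p-value of roughly 0.12, which is above the conventional 0.05 level.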


Sample Size Requirements

- The χ² test is a large-sample test
- That is, the χ² distribution approximates the sampling distribution of the χ² test statistic well only if the sample size is large
- Rough guideline: the expected frequency f_e in each cell should exceed 5
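The rough guideline can be checked mechanically before running the test. A minimal sketch; the threshold of 5 comes from the guideline, while the function names and the second, small-sample table are hypothetical.

```python
def expected_frequencies(table):
    """Expected counts under independence: row total x column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def meets_rough_guideline(table, minimum=5):
    """Rough check: every expected frequency f_e should exceed `minimum`."""
    return all(fe > minimum for row in expected_frequencies(table) for fe in row)

print(meets_rough_guideline([[10, 5], [8, 12]]))  # True (smallest f_e is about 7.29)
print(meets_rough_guideline([[3, 1], [2, 4]]))    # False (all f_e are 2 or 3)
```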


Queries

- How strong is the association if χ² turns out to be significant?
- From χ² alone, we cannot tell
- We have no idea whether all cells deviate greatly from independence, or only one or two cells do so
- Solution: Agresti and Finlay, Sections 8.3–8.4 – HOMEWORK!
