Simpson’s Paradox: A Data Set and Discrimination Case Study Exercise Stan Taylor & Amy Mickel CSU Sacramento


Post on 17-Dec-2015


Page 1

Simpson’s Paradox: A Data Set and Discrimination Case Study Exercise

Stan Taylor & Amy Mickel

CSU Sacramento

Page 2

Background

• CA Department of Developmental Services

  • Fund allocation to over 250,000 developmentally disabled individuals (“consumers”)

• Questions of Discrimination in Fund Allocation based on Ethnicity

• Univariate Analyses: White non-Hispanics receiving more $ than Hispanics

• Other sources of variation: AGE

• Classic case of Simpson’s Paradox

Page 3

Case Objectives

• Increase students’ knowledge of statistical concepts

  • Specific variation, outliers, univariate vs. bivariate analyses, weighted averages, Simpson’s paradox

• Enhance students’ analytical and critical thinking skills

• Demonstrate the importance of performing rigorous statistical analysis and how interpretations of data can affect decision outcomes

Page 4

Data Set: 1,000 DDS Consumers

• ID (unique identification code)

• Age Cohort/Age

  • Six age cohorts (binned): 0-5, 6-12, 13-17, 18-21, 22-50, 51+

  • Age (unbinned)

• Gender

• Expenditures (annual $ amount spent for each consumer)

• Ethnicity (Hispanic, White non-Hispanic, and six others)

Page 5

Instructions & Analytical Tools

• INSTRUCTIONS

  • TASK: Determine whether discrimination exists by examining expenditures, and submit a report with findings.

  • DEFINITION: “Discrimination” exists if the $ amount for a typical person in one group (e.g., male) is significantly different from that for a typical person in another group (e.g., female).

• ANALYTICAL TOOLS

  • SOFTWARE: Any statistical software package (We use: Microsoft Excel)

  • STATISTICS & TOOLS: Means or medians (We use: Means & Pivot Tables)
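
The means-and-pivot-table step amounts to grouping consumers by one field and averaging expenditures within each group. The sketch below shows that grouping in Python as one possible alternative to Excel. The function name `mean_by_group`, the field names, and the three sample records are made up for illustration; they are not rows from the DDS data set.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(records, group_field, value_field="expenditures"):
    """Return {group: mean of value_field} for a list of dict records."""
    buckets = defaultdict(list)
    for row in records:
        buckets[row[group_field]].append(row[value_field])
    return {group: mean(values) for group, values in buckets.items()}


# Made-up records for illustration only; NOT rows from the DDS data set.
sample = [
    {"id": 1, "gender": "Female", "age_cohort": "0-5", "expenditures": 1000},
    {"id": 2, "gender": "Male", "age_cohort": "0-5", "expenditures": 2000},
    {"id": 3, "gender": "Female", "age_cohort": "51+", "expenditures": 5000},
]

by_gender = mean_by_group(sample, "gender")      # one-way table by gender
by_cohort = mean_by_group(sample, "age_cohort")  # one-way table by age cohort
print(by_gender)
print(by_cohort)
```

Calling the function once per field gives the univariate tables; grouping on a pair of fields would give the bivariate view.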

Page 6

Typical Table 1: Ethnicity & Average Expenditures

Ethnicity of Consumers Average of Expenditures ($)

American Indian $ 36,438

Asian $ 18,392

Black $ 20,885

Hispanic $ 11,066

Multi Race $ 4,457

Native Hawaiian $ 42,782

Other $ 3,317

White non-Hispanic $ 24,698

All Consumers $ 18,066

Page 7

Typical Table 2: Gender & Average Expenditures

Gender Average of Expenditures ($)

Female $ 18,130

Male $ 18,001

All Consumers $ 18,066

Page 8

Typical Table 3: Age Cohort & Average Expenditures

Age Cohort Average of Expenditures ($)

0 - 5 $ 1,415

6 - 12 $ 2,227

13 - 17 $ 3,923

18 - 21 $ 9,889

22 - 50 $ 40,209

51 + $ 53,522

All Consumers $ 18,066

Page 9

Average Expenditures: % of Consumers by Ethnicity

Ethnicity Average of Expenditures ($) % of Consumers

White non-Hispanic $ 24,698 40%

Hispanic $ 11,066 38%

Asian $ 18,392 13%

Black $ 20,885 6%

Multi Race $ 4,457 3%

American Indian $ 36,438 0%

Native Hawaiian $ 42,782 0%

Other $ 3,317 0%

All Consumers $ 18,066 100%

Page 10

Average Expenditures: Ethnicity and Age Cohort

Age Cohort Hispanic (avg. of expenditures) White non-Hispanic (avg. of expenditures)

0 - 5 $ 1,393 $1,367

6-12 $ 2,312 $2,052

13-17 $ 3,955 $3,904

18-21 $ 9,960 $10,133

22-50 $ 40,924 $40,188

51 + $ 55,585 $52,670

All Consumers $11,066 $24,698
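
The reversal in the table above can be checked cohort by cohort. The short sketch below copies the cohort averages from the table and computes the Hispanic minus White non-Hispanic difference in each cohort; the names `cohort_means` and `cohort_gaps` are illustrative choices, not part of the case materials.

```python
# Cohort averages ($) copied from the table above:
# (Hispanic, White non-Hispanic).
cohort_means = {
    "0-5": (1393, 1367),
    "6-12": (2312, 2052),
    "13-17": (3955, 3904),
    "18-21": (9960, 10133),
    "22-50": (40924, 40188),
    "51+": (55585, 52670),
}


def cohort_gaps(means):
    """Return {cohort: Hispanic average minus White non-Hispanic average}."""
    return {cohort: h - w for cohort, (h, w) in means.items()}


gaps = cohort_gaps(cohort_means)
# Cohorts in which the Hispanic average is the higher of the two.
hispanic_higher = [cohort for cohort, gap in gaps.items() if gap > 0]

for cohort, gap in gaps.items():
    print(f"{cohort:>6}: {gap:+,d}")
print("Hispanic average higher in:", hispanic_higher)
```

With these figures the Hispanic average is higher in every cohort except 18-21, even though the "All Consumers" row points the other way.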

Page 11

[Figure: average expenditures ($, axis from 0 to 60,000) by age cohort (0 - 5, 6-12, 13-17, 18-21, 22-50, 51 +), Hispanic vs. White non-Hispanic]

Page 12

Bivariate Table: Percentages by Ethnicity and Age Cohort

Age Cohort Hispanic (%) White non-Hispanic (%)

0 - 5 12% 5%

6-12 24% 11%

13-17 27% 17%

18-21 21% 17%

22-50 11% 33%

51 + 5% 16%

All Consumers 100% 100%

Page 13

Weighted Average
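
A group's overall average is the weighted average of its cohort averages, with the share of the group's consumers in each cohort as the weights. The sketch below recomputes both overall averages from the rounded figures on the earlier slides; the variable and function names are illustrative. Because the published percentages are rounded (the White non-Hispanic column sums to 99%), the function divides by the total weight, and the results only approximate the reported $11,066 and $24,698.

```python
# Cohort order: 0-5, 6-12, 13-17, 18-21, 22-50, 51+
# Cohort averages ($) from the ethnicity-by-age-cohort table.
hispanic_avg = [1393, 2312, 3955, 9960, 40924, 55585]
white_avg = [1367, 2052, 3904, 10133, 40188, 52670]

# Share of each group's consumers per cohort, from the percentage table.
hispanic_share = [0.12, 0.24, 0.27, 0.21, 0.11, 0.05]
white_share = [0.05, 0.11, 0.17, 0.17, 0.33, 0.16]  # rounded; sums to 0.99


def weighted_average(values, weights):
    """Weighted mean of values, normalized by the total weight."""
    total = sum(v * w for v, w in zip(values, weights))
    return total / sum(weights)


hispanic_overall = weighted_average(hispanic_avg, hispanic_share)
white_overall = weighted_average(white_avg, white_share)

print(f"Hispanic overall:           ${hispanic_overall:,.0f}")
print(f"White non-Hispanic overall: ${white_overall:,.0f}")
```

The cohort averages are similar across the two groups, but most of the weight for Hispanic consumers falls on the low-expenditure younger cohorts, while nearly half of the weight for White non-Hispanic consumers falls on the two oldest, high-expenditure cohorts. That difference in weights, not a difference within cohorts, produces the gap in the overall averages.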

Page 14

Concluding Remarks

• Relevant topic: Discrimination

• Statistical Concepts

  • Specific variation

  • Univariate vs. bivariate analyses

  • Weighted averages

  • Simpson’s Paradox

• Analytical and critical thinking

• Importance of rigorous analyses & different views of data