Simpson’s Paradox: A Data Set and
Discrimination Case Study Exercise
Stan Taylor & Amy Mickel
CSU Sacramento
Background
• CA Department of Developmental Services (DDS)
  • Fund allocation to over 250,000 developmentally disabled individuals (“consumers”)
• Questions of Discrimination in Fund Allocation based on Ethnicity
• Univariate Analyses: White non-Hispanics receiving more $ than Hispanics
• Other sources of variation: AGE
• Classic case of Simpson’s Paradox
Case Objectives
• Increase students’ knowledge of statistical concepts
  • Specific variation, outliers, univariate vs. bivariate analyses, weighted averages, Simpson’s paradox
• Enhance students’ analytical and critical thinking skills
• Demonstrate importance of performing rigorous statistical analysis & how interpretations of data can impact decision outcomes
Data Set: 1,000 DDS Consumers
• ID (unique identification code)
• Age Cohort/Age
  • Six age cohorts (binned): 0-5, 6-12, 13-17, 18-21, 22-50, 51+
  • Age (unbinned)
• Gender
• Expenditures (annual $ amount spent for each consumer)
• Ethnicity (Hispanic, White non-Hispanic, and six others)
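The binned Age Cohort field can be derived from the unbinned Age field. The following is a minimal sketch of that mapping, assuming ages are recorded in whole years; the names `UPPER_BOUNDS`, `LABELS`, and `age_cohort` are illustrative and do not come from the data set.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first five cohorts; ages above 50 fall in "51+".
UPPER_BOUNDS = [5, 12, 17, 21, 50]
LABELS = ["0-5", "6-12", "13-17", "18-21", "22-50", "51+"]


def age_cohort(age):
    """Map an age in whole years to one of the six cohort labels."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_left returns the index of the first bound that is >= age.
    return LABELS[bisect_left(UPPER_BOUNDS, age)]


for sample_age in (3, 12, 21, 22, 64):
    print(sample_age, age_cohort(sample_age))
```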
Instructions & Analytical Tools
• INSTRUCTIONS
  • TASK: Determine whether discrimination exists by examining expenditures, and submit a report with findings.
  • DEFINITION: “Discrimination” exists if the $ amount for a typical person in one group (e.g., male) is significantly different from that for a typical person in another group (e.g., female).
• ANALYTICAL TOOLS
  • SOFTWARE: Any statistical software package (We use: Microsoft Excel)
  • STATISTICS & TOOLS: Means or medians (We use: Means & Pivot Tables)
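The "typical person" comparison amounts to a one-way pivot: group the consumers by one field and average their expenditures. The sketch below shows one way to do this outside Excel. The helper `mean_by`, the field names, and the four rows are hypothetical; the rows are made up for illustration and are not taken from the DDS data set.

```python
from collections import defaultdict
from statistics import mean


def mean_by(rows, group_field, value_field="expenditures"):
    """Return {group: mean of value_field}, like a one-way pivot table."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    return {group: mean(values) for group, values in groups.items()}


# Made-up rows for illustration only; NOT real DDS records.
toy_rows = [
    {"gender": "Female", "expenditures": 1000},
    {"gender": "Female", "expenditures": 3000},
    {"gender": "Male", "expenditures": 2000},
    {"gender": "Male", "expenditures": 6000},
]

print(mean_by(toy_rows, "gender"))
```

Swapping `mean` for `statistics.median` gives the median-based variant mentioned above.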
Typical Table 1: Ethnicity & Average Expenditures
Ethnicity of Consumers Average of Expenditures ($)
American Indian $ 36,438
Asian $ 18,392
Black $ 20,885
Hispanic $ 11,066
Multi Race $ 4,457
Native Hawaiian $ 42,782
Other $ 3,317
White non-Hispanic $ 24,698
All Consumers $ 18,066
Typical Table 2: Gender & Average Expenditures
Gender Average of Expenditures ($)
Female $ 18,130
Male $ 18,001
All Consumers $ 18,066
Typical Table 3: Age Cohort & Average Expenditures
Age Cohort Average of Expenditures ($)
0 – 5 $ 1,415
6 - 12 $ 2,227
13 - 17 $ 3,923
18 - 21 $ 9,889
22 - 50 $ 40,209
51 + $ 53,522
All Consumers $ 18,066
Average Expenditures: % of Consumers by Ethnicity
Ethnicity Average of Expenditures ($) % of Consumers
White non-Hispanic $ 24,698 40%
Hispanic $ 11,066 38%
Asian $ 18,392 13%
Black $ 20,885 6%
Multi Race $ 4,457 3%
American Indian $ 36,438 0%
Native Hawaiian $ 42,782 0%
Other $ 3,317 0%
All Consumers $ 18,066 100%
Average Expenditures: Ethnicity and Age Cohort
Age Cohort Hispanic (avg. of expenditures) White non-Hispanic (avg. of expenditures)
0 - 5 $ 1,393 $1,367
6-12 $ 2,312 $2,052
13-17 $ 3,955 $3,904
18-21 $ 9,960 $10,133
22-50 $ 40,924 $40,188
51 + $ 55,585 $52,670
All Consumers $11,066 $24,698
[Chart: average expenditures ($, axis from 0 to 60,000) by age cohort (0-5 through 51+), Hispanic vs. White non-Hispanic]
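The reversal in the Ethnicity and Age Cohort table can be checked mechanically: compare the two groups within each cohort, then compare the overall averages. The sketch below uses the figures from that table; the variable names are illustrative.

```python
# (Hispanic, White non-Hispanic) average expenditures per age cohort,
# copied from the Ethnicity and Age Cohort table.
COHORT_MEANS = {
    "0-5": (1393, 1367),
    "6-12": (2312, 2052),
    "13-17": (3955, 3904),
    "18-21": (9960, 10133),
    "22-50": (40924, 40188),
    "51+": (55585, 52670),
}
OVERALL_MEANS = (11066, 24698)  # "All Consumers" row

# Cohorts in which the Hispanic average exceeds the White non-Hispanic average.
hispanic_higher = [c for c, (h, w) in COHORT_MEANS.items() if h > w]
overall_hispanic_higher = OVERALL_MEANS[0] > OVERALL_MEANS[1]

print(f"Hispanic average higher in {len(hispanic_higher)} of "
      f"{len(COHORT_MEANS)} cohorts: {hispanic_higher}")
print(f"Hispanic average higher overall: {overall_hispanic_higher}")
```

The Hispanic average is higher in five of the six cohorts (all except 18-21), yet lower overall, which is the pattern Simpson's paradox describes.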
Bivariate Table: Percentages by Ethnicity and Age Cohort
Age Cohort Hispanic (%) White non-Hispanic (%)
0 - 5 12% 5%
6-12 24% 11%
13-17 27% 17%
18-21 21% 17%
22-50 11% 33%
51 + 5% 16%
All Consumers 100% 100%
Weighted Average
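Each group's overall average is a weighted average of its cohort averages, weighted by the share of the group in each cohort. The sketch below recombines the two bivariate tables above under that reading. The function `weighted_average` is illustrative, and the weights are normalized because the published percentages are rounded (the White non-Hispanic column sums to 99%).

```python
def weighted_average(values, weights):
    """Weighted mean of values; weights are normalized by their sum."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Cohort order: 0-5, 6-12, 13-17, 18-21, 22-50, 51+
hispanic_means = [1393, 2312, 3955, 9960, 40924, 55585]
hispanic_pct = [12, 24, 27, 21, 11, 5]

white_means = [1367, 2052, 3904, 10133, 40188, 52670]
white_pct = [5, 11, 17, 17, 33, 16]

hispanic_overall = weighted_average(hispanic_means, hispanic_pct)
white_overall = weighted_average(white_means, white_pct)

print(f"Hispanic:           ${hispanic_overall:,.0f}")
print(f"White non-Hispanic: ${white_overall:,.0f}")
```

The results (roughly $11,200 and $24,600) match the reported $11,066 and $24,698 only approximately, because of the rounded percentages. The gap between the groups comes from the weights: 65% of Hispanic consumers are in the three youngest, lowest-cost cohorts (0-17), against about a third of White non-Hispanic consumers, and only 16% are in the two oldest, highest-cost cohorts (22 and over), against about half.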
Concluding Remarks
• Relevant topic: Discrimination
• Statistical Concepts
  • Specific variation
  • Univariate vs. bivariate analyses
  • Weighted averages
  • Simpson’s Paradox
• Analytical and critical thinking
• Importance of rigorous analyses & different views of data