CDC Data Analytics Project


Uploaded by kimandrew on 18-Jan-2015


DESCRIPTION

Analysis of CDC data on various risk factors and their effect on insurance premiums in the Midwest vs. the Northeast.

TRANSCRIPT

Page 1: CDC Data Analytics Project

INSURANCE PREMIUMS CONTINUE TO RISE IN THE NORTHEAST COMPARED TO THE MIDWEST

Why? What did our research uncover? And what’s to blame?

Presented by: Andrew Kim

April 20, 2011

Page 2

FACT:

Insurance premiums are higher for those living in the Northeast compared to those living in the Midwest.

When asked to explain the discrepancy, insurance companies had this to say:

“Those residing in the Northeast compared to those residing in the Midwest live a more risky lifestyle in terms of alcohol consumption, driving habits, etc. As such their premiums unfortunately are going up while those living in the Midwest are not.”

Page 3

OBJECTIVE:

Through extensive research, we set out to determine whether insurance companies are telling us the truth: that Northeasterners do live a riskier lifestyle than Midwesterners.

RELEVANT MATERIALS:

Centers for Disease Control and Prevention (CDC), Behavioral Risk Factor Surveillance System (BRFSS), 2000

2000 U.S. Census

2008 U.S. Census Bureau data

Page 4

DATA ANALYSIS:

COLLECTION METHODS
Using the CDC survey results, we compiled a spreadsheet of risky-behavior percentages by state and divided the states into regions as defined by the U.S. Census Bureau. A simple average of the state percentages would not account for differences in state size, so we weighted each state's value by its share of the regional population (state pop. / region pop.). To go further, we also compared poverty levels and the share of residents under 5 or over 65, using 2008 U.S. Census Bureau data.
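The state weighting described above is a population-weighted average: each state's percentage is multiplied by its share of the regional population, and the products are summed. A minimal Python sketch, using the Northeast populations and smoking percentages from the Page 5 table (the function name `weighted_regional_rate` is ours, for illustration):

```python
# Population-weighted regional average, as described above:
# each state's weight is (state population / regional population).

def weighted_regional_rate(populations, rates):
    """Return the population-weighted average of per-state rates (in %)."""
    total = sum(populations)
    return sum(pop / total * rate for pop, rate in zip(populations, rates))


# Northeast populations and smoking rates (SMK, %) from the Page 5 table,
# in the order CT, ME, MA, NH, NY, PA, RI, VT.
ne_pop = [3405607, 1274915, 6349119, 1235791, 18976811, 12281071, 1048315, 608821]
ne_smk = [22, 27, 24, 22, 23, 24, 26, 22]

ne_smk_wtd = weighted_regional_rate(ne_pop, ne_smk)
print(f"Northeast SMK WTD: {ne_smk_wtd:.3f}%")  # Northeast SMK WTD: 23.479%
```

Each term of the sum is one per-state WTD cell (for Connecticut, 22 × 3405607 / 45180450 ≈ 1.658), and the same function applied to the other rows should reproduce the remaining Total values.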

CONFIDENCE INTERVALS
Using these data, we calculated the regional differences in each percentage with 95% confidence. With Drake Direct’s Plan-Analyzer application, we computed confidence intervals for all risky-behavior metrics and determined whether each difference was statistically significant, and could therefore be a determining factor in calculating insurance premiums.
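The slides do not state how Plan-Analyzer computes its intervals. A common choice is the Wald interval for the difference of two independent proportions, (p_t - p_c) ± z·sqrt(p_c(1 - p_c)/n_c + p_t(1 - p_t)/n_t). The minimal Python sketch below applies it to the regional weighted smoking rates and total respondents N from the Page 5 and Page 6 tables; the function name `diff_ci` is ours. It treats the weighted percentages as if they came from a simple random sample of N respondents, which is a simplification. For smokers it gives roughly (0.583, 3.053) percentage points, whose half-width matches the Page 9 interval (0.5850, 3.0550) to about three decimal places (the slide's interval is centered on the rounded difference 1.82), so this is plausibly, but not certainly, the calculation the application performed.

```python
import math
from statistics import NormalDist


def diff_ci(p_control, n_control, p_test, n_test, level=0.95):
    """Wald confidence interval for p_test - p_control (proportions in [0, 1])."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = math.sqrt(p_control * (1 - p_control) / n_control
                   + p_test * (1 - p_test) / n_test)
    diff = p_test - p_control
    return diff - z * se, diff + z * se


# Smokers: Northeast (control) vs. Midwest (test), weighted rates and total N.
lo, hi = diff_ci(0.23479, 7987, 0.25297, 10983)
print(f"Smokers: ({100 * lo:.4f}, {100 * hi:.4f}) percentage points")
```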

Page 5

DATA:

Risky Behavior in Northeast, USA (Survey Results, %)

STATE     CT        ME        MA        NH        NY        PA        RI        VT        Total
POP       3405607   1274915   6349119   1235791   18976811  12281071  1048315   608821    45180450
POP WTD   8%        3%        14%       3%        42%       27%       2%        1%        100%
SMK       22.000%   27.000%   24.000%   22.000%   23.000%   24.000%   26.000%   22.000%
SMK WTD   1.658%    0.762%    3.373%    0.602%    9.661%    6.524%    0.603%    0.296%    23.479%
WEI       23.000%   24.000%   19.000%   21.000%   20.000%   25.000%   22.000%   20.000%
WEI WTD   1.734%    0.677%    2.670%    0.574%    8.400%    6.796%    0.510%    0.270%    21.631%
SED       52.000%   60.000%   50.000%   47.000%   63.000%   55.000%   55.000%   51.000%
SED WTD   3.920%    1.693%    7.026%    1.286%    26.461%   14.950%   1.276%    0.687%    57.300%
ACT       26.000%   36.000%   23.000%   20.000%   33.000%   27.000%   26.000%   25.000%
ACT WTD   1.960%    1.016%    3.232%    0.547%    13.861%   7.339%    0.603%    0.337%    28.895%
ALC       17.000%   10.000%   18.000%   16.000%   12.000%   18.000%   18.000%   21.000%
ALC WTD   1.281%    0.282%    2.530%    0.438%    5.040%    4.893%    0.418%    0.283%    15.164%
DWI       3.000%    1.000%    3.000%    2.000%    1.000%    3.000%    2.000%    4.000%
DWI WTD   0.226%    0.028%    0.422%    0.055%    0.420%    0.815%    0.046%    0.054%    2.066%
SEA       23.000%   41.000%   46.000%   40.000%   20.000%   26.000%   49.000%   34.000%
SEA WTD   1.734%    1.157%    6.464%    1.094%    8.400%    7.067%    1.137%    0.458%    27.512%
POV 2008  9.100%    12.600%   10.100%   7.800%    13.700%   12.100%   12.100%   10.400%
POV WTD   0.686%    0.356%    1.419%    0.213%    5.754%    3.289%    0.281%    0.140%    12.138%
AGE 2009  19.900%   21.000%   19.500%   19.100%   19.700%   21.300%   20.000%   19.700%
AGE WTD   1.500%    0.593%    2.740%    0.522%    8.274%    5.790%    0.464%    0.265%    20.149%
N         998       1,001     993       1,000     999       1,000     998       998       7,987

Legend: SMK = Smokers; WEI = Overweight; SED = Sedentary Lifestyle; ACT = No Leisure Time; ALC = Binge Drinking; DWI = Drinking and Driving; SEA = No Seatbelt Use; POV 2008 = living below the poverty level (2008); AGE 2009 = share of residents under 5 or over 65 (2009); WTD = value weighted by the state's share of the regional population, so the Total column is the regional weighted percentage; N = number of survey respondents.

Page 6

DATA:

Risky Behavior in Midwest, USA (Survey Results, %)

STATE     IL        IN        IA        MI        MN        MO        NE        ND        OH        SD        WI        Total
POP       12419658  6080520   2926380   9938492   4919492   5596684   1711265   642195    11353150  754835    5363708   61706379
POP WTD   20.127%   9.854%    4.742%    16.106%   7.972%    9.070%    2.773%    1.041%    18.399%   1.223%    8.692%    100.000%
SMK       24.000%   27.000%   22.000%   29.000%   21.000%   26.000%   23.000%   20.000%   26.000%   21.000%   25.000%
SMK WTD   4.830%    2.661%    1.043%    4.671%    1.674%    2.358%    0.638%    0.208%    4.784%    0.257%    2.173%    25.297%
WEI       21.000%   26.000%   25.000%   26.000%   21.000%   23.000%   24.000%   23.000%   23.000%   23.000%   23.000%
WEI WTD   4.227%    2.562%    1.186%    4.188%    1.674%    2.086%    0.666%    0.239%    4.232%    0.281%    1.999%    23.339%
SED       60.000%   61.000%   61.000%   59.000%   55.000%   61.000%   55.000%   56.000%   69.000%   57.000%   54.000%
SED WTD   12.076%   6.011%    2.893%    9.503%    4.385%    5.533%    1.525%    0.583%    12.695%   0.697%    4.694%    60.594%
ACT       32.000%   27.000%   34.000%   32.000%   25.000%   33.000%   25.000%   27.000%   33.000%   29.000%   25.000%
ACT WTD   6.441%    2.661%    1.612%    5.154%    1.993%    2.993%    0.693%    0.281%    6.072%    0.355%    2.173%    30.427%
ALC       16.000%   13.000%   13.000%   18.000%   21.000%   16.000%   17.000%   17.000%   9.000%    16.000%   27.000%
ALC WTD   3.220%    1.281%    0.617%    2.899%    1.674%    1.451%    0.471%    0.177%    1.656%    0.196%    2.347%    15.989%
DWI       4.000%    3.000%    3.000%    3.000%    3.000%    3.000%    5.000%    4.000%    3.000%    4.000%    6.000%
DWI WTD   0.805%    0.296%    0.142%    0.483%    0.239%    0.272%    0.139%    0.042%    0.552%    0.049%    0.522%    3.540%
SEA       29.000%   28.000%   24.000%   21.000%   24.000%   27.000%   51.000%   60.000%   24.000%   57.000%   29.000%
SEA WTD   5.837%    2.759%    1.138%    3.382%    1.913%    2.449%    1.414%    0.624%    4.416%    0.697%    2.521%    27.151%
POV 2008  12.200%   12.900%   11.400%   14.400%   9.600%    13.500%   10.800%   11.500%   13.300%   12.700%   10.500%
POV WTD   2.455%    1.271%    0.541%    2.319%    0.765%    1.224%    0.300%    0.120%    2.447%    0.155%    0.913%    12.511%
AGE 2009  19.300%   19.800%   21.600%   19.600%   19.600%   20.400%   20.900%   21.400%   20.300%   21.800%   19.900%
AGE WTD   3.885%    1.951%    1.024%    3.157%    1.563%    1.850%    0.580%    0.223%    3.735%    0.267%    1.730%    19.963%
N         1,001     1,000     1,000     991       990       1,000     1,002     999       998       1,002     1,000     10,983

Row labels are the same as in the Northeast table; AGE 2009 is the share of residents under 5 or over 65, and the Total column is the population-weighted regional value. In this table, NE is Nebraska.

Page 7

Simple Linear Regression Model:
[Chart not reproduced in this transcript]

Page 8

Simple Linear Regression Model:
[Chart not reproduced in this transcript]

Page 9

CONFIDENCE INTERVALS:
Differences in percentage points between the Northeast (control) and the Midwest (test), computed as Midwest minus Northeast, with 95% confidence:

Smokers: (0.5850, 3.0550)
Overweight: (0.5095, 2.9105)
Sedentary Lifestyle: (1.8716, 4.7084)
Leisure Time: (0.2253, 2.8548)
Binge Drinking: (-0.2133, 1.8733)
DWI: (1.0042, 1.9358)
Seatbelt Use: (-1.6449, 0.9249)

According to these findings, the differences in Seatbelt Use and Binge Drinking are not statistically significant, because their intervals include zero.

Differences in percentage points between the Northeast (control) and the Midwest (test), Midwest minus Northeast (Census figures; no confidence interval computed):

Living below the poverty level: 0.37
Ages under 5 and over 65: -0.19

According to these findings, a larger share of Midwesterners lives below the poverty level, and a larger share of Northeasterners is under the age of 5 or over the age of 65.
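The decision rule on this slide is that a difference is statistically significant at the 95% level exactly when its confidence interval excludes zero; an interval lying entirely above zero means the Midwest rate is higher. A small Python sketch that applies the rule to the intervals reported on this slide (the helper name `is_significant` is ours):

```python
# 95% confidence intervals reported on this slide, in percentage points
# (Midwest minus Northeast).
intervals = {
    "Smokers": (0.5850, 3.0550),
    "Overweight": (0.5095, 2.9105),
    "Sedentary Lifestyle": (1.8716, 4.7084),
    "Leisure Time": (0.2253, 2.8548),
    "Binge Drinking": (-0.2133, 1.8733),
    "DWI": (1.0042, 1.9358),
    "Seatbelt Use": (-1.6449, 0.9249),
}


def is_significant(interval):
    """Significant at the 95% level if the interval excludes zero."""
    low, high = interval
    return low > 0 or high < 0


not_significant = [name for name, ci in intervals.items() if not is_significant(ci)]
higher_in_midwest = [name for name, (low, _high) in intervals.items() if low > 0]

print("Not significant:", not_significant)
print("Significantly higher in the Midwest:", higher_in_midwest)
```

The five intervals lying above zero correspond to the behaviors listed as higher in the Midwest on Page 10, where Leisure Time appears as No Leisure Time and DWI as Drinking and Driving.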

Page 10

FINDINGS:
We found that the following Risk Behavior Factors are higher in the Midwest than in the Northeast:

Smoking
Overweight
Sedentary Lifestyle
No Leisure Time
Drinking and Driving
Living Below the Poverty Line

Metric    Northeast  Midwest   Difference (percentage points)
SMK WTD   23.479%    25.297%   1.818
WEI WTD   21.631%    23.339%   1.708
SED WTD   57.300%    60.594%   3.294
ACT WTD   28.895%    30.427%   1.532
DWI WTD   2.066%     3.540%    1.474
POV WTD   12.138%    12.511%   0.373

Page 11

FINDINGS:
The two remaining Risk Behavior Factors did not differ significantly between the Northeast and the Midwest, because their 95% confidence intervals include zero:

Binge Drinking

No Seatbelt Use

We also found that a slightly larger share of the Northeast population is under the age of 5 or over the age of 65 (20.149% vs. 19.963%). Together with the Midwest's larger population, this means there are far more eligible insurance holders to be covered in the Midwest than in the Northeast.

ASSUMPTION:
I assume that if insurance companies want to collect the same total premiums in each region, they will charge a higher rate to those in the Northeast (population: 45M+) than to those in the Midwest (population: 61M+), because the same total is spread over fewer people.
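The arithmetic behind this assumption can be sketched as follows. If each region must raise the same (hypothetical) total premium, the per-person premium is that total divided by the number of people paying it, so the Northeast-to-Midwest premium ratio is just the inverse of the population ratio. The sketch also shows a variant in which residents under 5 or over 65 are treated as ineligible policyholders, using the AGE WTD totals from Pages 5 and 6; that exclusion is our reading of the argument above, not something the slides define, and `per_person_premium` is an illustrative helper.

```python
# Hypothetical illustration: both regions must collect the same total
# premium. The total is made up and cancels out of the ratio.
TOTAL_PREMIUM = 1_000_000_000

populations = {"Northeast": 45_180_450, "Midwest": 61_706_379}
# Share of residents under 5 or over 65 (AGE WTD totals, Pages 5 and 6).
age_share = {"Northeast": 0.20149, "Midwest": 0.19963}


def per_person_premium(total, people):
    """Premium each payer owes if `total` is split evenly among `people`."""
    return total / people


# Variant 1: everyone in the region pays.
ratio_all = (per_person_premium(TOTAL_PREMIUM, populations["Northeast"])
             / per_person_premium(TOTAL_PREMIUM, populations["Midwest"]))

# Variant 2 (our assumption): residents under 5 or over 65 are not
# eligible policyholders, so only the remaining share pays.
eligible = {region: pop * (1 - age_share[region]) for region, pop in populations.items()}
ratio_eligible = (per_person_premium(TOTAL_PREMIUM, eligible["Northeast"])
                  / per_person_premium(TOTAL_PREMIUM, eligible["Midwest"]))

print(f"Northeast / Midwest premium ratio: {ratio_all:.3f} (all residents), "
      f"{ratio_eligible:.3f} (eligible only)")
```

Under either variant a Northeasterner would pay about 1.37 times the Midwestern rate; excluding the under-5 and over-65 groups widens the gap only slightly.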

Page 12

GOODNESS OF FIT TEST:
Using the Northeast weighted percentages as expected values and the Midwest weighted percentages as observed values, we conducted a chi-squared goodness-of-fit test on the statistically significant metrics to decide between the following hypotheses:

H0: Risk Behaviors for the Midwest are equal to those for the Northeast.
H1: Risk Behaviors for the Midwest are not equal to those for the Northeast.

TS = 14,755,801 (5 degrees of freedom)
p-value = CHIDIST(14755801, 5) = 0.00000000000000000000
1 - CHIDIST(14755801, 5) = 1 - 0.00000000000000000000 = 1
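The formula on this slide appears to use Excel's CHIDIST(x, df), which returns the right-tailed probability of the chi-squared distribution, i.e. the p-value; 1 - CHIDIST is then the confidence with which H0 is rejected. The slides do not show how the test statistic itself was obtained, so the minimal Python sketch below only reproduces this last step, taking TS = 14,755,801 (the first argument to CHIDIST on the slide) and 5 degrees of freedom. It uses the closed-form tail for 5 degrees of freedom instead of a statistics library; the function name `chi2_sf_df5` is ours.

```python
import math


def chi2_sf_df5(x):
    """Right-tail probability P(X > x) for a chi-squared variable with 5 df.

    Uses the closed form for odd degrees of freedom; this is the quantity
    Excel's CHIDIST(x, 5) returns.
    """
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


ts = 14_755_801                  # test statistic from the slide
p_value = chi2_sf_df5(ts)        # underflows to 0.0 for a statistic this large
print(p_value, 1 - p_value)      # 0.0 1.0

# Sanity check: 11.0705 is the 95% critical value for 5 df, so the tail is ~0.05.
print(round(chi2_sf_df5(11.0705), 4))
```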

RESULT:
We reject H0, which states that Risk Behaviors for the Midwest are equal to those for the Northeast, in favor of H1, which states that they are not equal.

Combined with the differences on Page 10, which show most statistically significant risk factors higher in the Midwest, our data and analysis do not support the insurance companies' claim that those living in the Northeast engage in a “riskier” lifestyle than those living in the Midwest.

Page 13

CONCLUSION:
We can say with close to 100% confidence that Risk Behaviors are not equal between the Northeast and the Midwest. Together with our finding that most statistically significant risk factors are higher in the Midwest, this indicates that the insurance company we investigated gave us a false statement.

“Based on our data and analysis, we assume that insurance companies are unfairly charging policy holders higher premiums in the Northeast compared to those in the Midwest.”

A reason for this may be that insurance companies use different metrics to assess their policyholders. They may give each risk behavior a different weight, which could substantially change the results. They may also take into account that there are fewer eligible policyholders in the Northeast (as suggested by our age-metric research), so each policyholder there carries a larger share of the premiums.
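To illustrate how much a choice of weights can matter, the Python sketch below scores each region as a weighted sum of its regional WTD percentages from the Page 5 and Page 6 tables. Both weightings are hypothetical and the second is deliberately extreme; the slides do not say how any insurer weights these behaviors, and `risk_score` is our own helper.

```python
# Regional population-weighted percentages (WTD totals from Pages 5 and 6).
northeast = {"SMK": 23.479, "WEI": 21.631, "SED": 57.300, "ACT": 28.895,
             "ALC": 15.164, "DWI": 2.066, "SEA": 27.512}
midwest = {"SMK": 25.297, "WEI": 23.339, "SED": 60.594, "ACT": 30.427,
           "ALC": 15.989, "DWI": 3.540, "SEA": 27.151}


def risk_score(rates, weights):
    """Weighted sum of behavior percentages; unweighted behaviors count as 0."""
    return sum(weights.get(metric, 0) * rate for metric, rate in rates.items())


# Hypothetical weighting 1: every behavior counts equally.
equal = {metric: 1 for metric in northeast}
# Hypothetical weighting 2 (deliberately extreme): only seatbelt non-use counts.
seatbelt_only = {"SEA": 1}

for name, weights in [("equal", equal), ("seatbelt only", seatbelt_only)]:
    ne, mw = risk_score(northeast, weights), risk_score(midwest, weights)
    riskier = "Northeast" if ne > mw else "Midwest"
    print(f"{name}: Northeast {ne:.3f}, Midwest {mw:.3f} -> {riskier} scores riskier")
```

With equal weights the Midwest scores higher (about 186.3 vs. 176.0), but a weighting that counts only seatbelt non-use ranks the Northeast as riskier (27.512 vs. 27.151).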

Without a more detailed explanation, our search ends here.