Running Head: Predicting Life Expectancy
Predicting Life Expectancy – A stepwise regression model
A Project Report for STAT 4950
Submitted to: Dr Shenghua (Kelly) Fan
Submitted by: Akhil Raman HS4623, Berhane Desta GH6462, Bui Toan
Introduction and Literature Review
Life Expectancy is a statistical measure of how long a person or organism may live, based on the year of
their birth, their current age, demographic factors, and health variables. Globally, life expectancy
exhibits a broad range, from an average of 49.42 years in Swaziland, to 82.6 years in Japan, with the
disparity attributed to public health, medical care, diet, and various socio-economic factors.
In the United States, life expectancy has increased over the generations. Americans today are living
twice as long as citizens in the early 1900s. However, despite the increase, several variables
may still reduce an individual’s lifespan.
Public health reports suggest that education is connected with life expectancy. Researchers have
examined the direct (better stress management, healthier lifestyle) and indirect (better-paying jobs,
social privilege) links between education and life expectancy. Men and women who pursue higher
education have, on average, longer life expectancies than individuals who do not (those with a high
school education at most).
Health factors tied to lifestyle habits have also been linked to life expectancy. Obesity, smoking,
alcoholism, and diabetes can shorten an individual’s lifespan: obesity by 5 years, smoking by 12,
alcoholism by 20, and diabetes by 10. Along with lifestyle habits, access to medical attention is
crucial. For example, child vaccination is second only to clean water as a means of fighting
infectious disease and can increase the life expectancy of a child.
Racial disparities in life expectancy have been a point of debate. African-Americans represent the low
end of the scale, while Asian-Americans live the longest. Researchers link the difference to
socio-economic, health, and education factors. This project examines the effects of education, race,
health, and poverty on life expectancy using multiple linear regression. According to the output of the
analysis, the African American and Asian American population shares, school enrollment, smoking, binge
drinking, diabetes, and immunization have an effect on life expectancy.
Methods
Data were collected from the “American Human Development Project.” All states, including Puerto Rico,
were analyzed. The reported life expectancy for each state was examined against four domains:
poverty, education, race, and health. Gender differences were not accounted for in the model. The
independent variables were selected based on prior literature on life expectancy. As a first step, we
computed a correlation matrix to see the relationship of each independent variable with the dependent
variable. We observed that smoking, binge drinking, diabetes, obesity, African American, Asian American,
the education-level variables, and the poverty variables have a significant relationship with life
expectancy at the 0.05 significance level (Tables 1.1-1.4).
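The correlation screening described above can be illustrated with a small sketch. It is written in Python rather than the SAS used in the appendix, and it uses only the first ten rows of the appendix data (life expectancy and % of adults smoking), so it shows the form of the test and does not reproduce the correlation tables. The helper name pearson_r and the critical value 2.306 (two-sided t at the 0.05 level with 8 degrees of freedom) are choices made for this example.

```python
import math

# First ten rows of the appendix data: life expectancy and % of adults smoking.
life = [75.4, 78.3, 79.6, 76.0, 80.8, 80.0, 80.8, 78.4, 76.5, 79.4]
smoking = [24.3, 22.9, 19.3, 27.0, 13.7, 18.3, 17.1, 21.8, 20.8, 19.3]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(smoking, life)
n = len(life)

# t statistic for H0: correlation = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

T_CRIT = 2.306  # two-sided 0.05 critical value of t with 8 df (assumed for n = 10)
significant = abs(t_stat) > T_CRIT

print(f"r = {r:.3f}, t = {t_stat:.2f}, significant at 0.05: {significant}")
```

On this ten-row subset the correlation between smoking and life expectancy is strongly negative and exceeds the critical value. The full analysis uses all rows of the data.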
Next, we fit four stepwise multiple linear regression models, one for each group: health, poverty, race,
and education level. Afterwards, we combined the remaining variables from all four models and
conducted a final stepwise regression. For the health group, smoking, binge drinking, and diabetes are
included in the model (see Table 1.1). For the race group, all the independent variables are included in
the model (see Table 1.3). For the poverty group, only children under 6 living in poverty remained
(Table 1.2). For the education group, the percentage of the population with bachelor’s degrees, the
percentage with graduate degrees, and the percentage enrolled in school remained (Table 1.4). In the
final model, African American, Asian American, school enrollment, binge drinking, smoking,
diabetes, and immunization remained at the 0.15 entry level (Table 6.1). The regression
models for health, race, poverty, and education are given in the appendix as Equations 1, 2, 3, and 4,
respectively. The final regression model is given in the appendix as Equation 5.
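Because the poverty model keeps a single predictor, it reduces to a simple least-squares line, which is easy to sketch by hand. The example below is in Python rather than SAS. The data pairs are transcribed from the Poverty DATA step in the appendix (columns Y and X7), and the function name fit_line is a hypothetical helper for this illustration. It does not perform the stepwise selection itself, only the final one-variable fit.

```python
# (life expectancy, % of children under 6 living in poverty), transcribed
# from the Poverty DATA step in the appendix (columns Y and X7).
DATA = [
    (75.4, 31.9), (78.3, 17.8), (79.6, 27.5), (76.0, 32.5),
    (80.8, 23.6), (80.0, 20.9), (80.8, 15.8), (78.4, 22.4),
    (76.5, 27.4), (79.4, 26.5), (77.2, 28.8), (81.3, 16.0),
    (79.5, 23.9), (79.0, 22.1), (77.6, 25.3), (79.7, 19.4),
    (78.7, 22.0), (76.0, 31.6), (75.7, 32.1), (79.2, 22.0),
    (78.8, 15.2), (80.5, 17.0), (78.2, 27.6), (81.1, 17.4),
    (75.0, 37.8), (77.5, 26.0), (78.5, 24.2), (79.8, 22.2),
    (78.1, 26.4), (80.3, 13.8), (80.3, 16.8), (78.4, 32.9),
    (80.5, 23.7), (77.8, 28.8), (79.5, 20.5), (77.8, 28.2),
    (75.9, 28.4), (79.5, 24.7), (78.5, 21.8), (79.9, 22.7),
    (77.0, 31.9), (79.5, 21.4), (76.3, 29.4), (78.5, 28.7),
    (80.2, 16.7), (80.5, 23.5), (79.0, 17.5), (79.9, 21.1),
    (75.4, 30.2), (80.0, 22.8), (78.3, 18.5),
]


def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x on (y, x) pairs."""
    n = len(pairs)
    mean_y = sum(y for y, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    sxx = sum((x - mean_x) ** 2 for _, x in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(DATA)
print(f"LE = {intercept:.4f} + ({slope:.5f}) * under6_poverty")
```

On the transcribed data the slope comes out close to -0.237, in line with the poverty coefficient reported in Equation 3, with an intercept of roughly 84.3.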
After fitting the four group models and the final model, we checked the normality and equal-variance
assumptions for each. For the health group, most of the points are close to the line and all the
p-values of the normality tests are greater than 0.05, so normality holds (Figure 3.1 and Table 1.1).
The residual plot shows no pattern, so equal variance holds (Figure 2.1). For the race group, there are
outliers and the p-values are not all greater than 0.05, so the normality assumption fails (Figure 3.3
and Table 1.3). The residual plot shows no pattern, so equal variance holds (Figure 2.3). For the poverty
group, most of the points are close to the line and all the p-values of the normality tests are greater
than 0.05, so normality holds (Figure 3.2 and Table 1.2). The residual plot shows no pattern, so equal
variance holds (Figure 2.2). For the education group, most of the points are close to the line and all
the p-values of the normality tests are greater than 0.05, so normality holds (Figure 3.4 and
Table 1.4). The residual plot shows no pattern, so equal variance holds (Figure 2.4).
For the final model, most of the points are close to the line and all the p-values of the normality
tests are greater than 0.05, so the normality assumption holds (Figure 7.1 and Table 7.1). The residual
plot shows no pattern, so equal variance holds (Figure 7.1).
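The judgment that "most of the points are close to the line" can be given a rough numeric form. The Python sketch below is not the set of normality tests that PROC UNIVARIATE reports. It computes a probability-plot correlation instead: the correlation between the ordered residuals and normal scores, where values near 1 correspond to points lying close to the line. It uses only the first twelve rows of the poverty data from the appendix, and the Blom plotting positions and the helper name correlation are assumptions made for this illustration.

```python
from statistics import NormalDist

# (life expectancy, % of children under 6 in poverty) for the first twelve
# rows of the Poverty DATA step in the appendix.
pairs = [
    (75.4, 31.9), (78.3, 17.8), (79.6, 27.5), (76.0, 32.5),
    (80.8, 23.6), (80.0, 20.9), (80.8, 15.8), (78.4, 22.4),
    (76.5, 27.4), (79.4, 26.5), (77.2, 28.8), (81.3, 16.0),
]


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


ys = [y for y, _ in pairs]
xs = [x for _, x in pairs]
n = len(pairs)

# Least-squares line of life expectancy on poverty, then its residuals.
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Normal scores at Blom plotting positions (i - 0.375) / (n + 0.25).
normal_scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
                 for i in range(1, n + 1)]

# Correlation between ordered residuals and normal scores.
ppcc = correlation(sorted(residuals), normal_scores)
print(f"probability-plot correlation = {ppcc:.3f}")
```

A value well below 1 would point to the kind of departure from the line described for the race group.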
Conclusion
Separately, our stepwise models were consistent with the prior literature: unhealthy habits negatively
affect life expectancy, higher education is correlated with longer life, poverty negatively affects life
expectancy, and all the race variables have an effect on life expectancy. When we combined the four
groups, the variables African American, Asian American, school enrollment, smoking, binge drinking,
diabetes, and immunization entered the final model at the 0.15 entry level (Equation 5). In the final
model, African American, smoking, and diabetes have negative slopes, while the rest have positive slopes.
Variables with negative slopes are associated with lower life expectancy, and those with positive slopes
with higher life expectancy. As a next step, quadratic and interaction terms could be added to the model.
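Reading the slope signs off Equation 5 can be done mechanically. The Python sketch below copies the coefficients from Equation 5 in the appendix, groups them by sign, and shows how a slope translates into a change in predicted life expectancy. The function name change_in_le and the 5-point smoking scenario are hypothetical and chosen only for illustration.

```python
# Coefficients copied from Equation 5 (final model) in the appendix.
COEFFICIENTS = {
    "AfAm": -0.04853,
    "AsAm": 0.04204,
    "S_enrol": 0.12519,
    "Smoking": -0.21581,
    "binge": 0.09621,
    "diabetes": -0.23291,
    "immunization": 0.03929,
}

negative = [name for name, b in COEFFICIENTS.items() if b < 0]
positive = [name for name, b in COEFFICIENTS.items() if b > 0]
print("negative slopes:", negative)
print("positive slopes:", positive)


def change_in_le(variable, delta_points):
    """Predicted change in life expectancy (years) when one variable moves by
    delta_points percentage points and the other variables are held fixed."""
    return COEFFICIENTS[variable] * delta_points


# Hypothetical scenario: adult smoking falls by 5 percentage points.
print(round(change_in_le("Smoking", -5), 3))
```

Under the fitted model, the hypothetical 5-point drop in smoking corresponds to a predicted gain of a little over one year. This is an association in state-level data, not an individual-level effect.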
Appendix
Tables 1.1-1.4 (Pearson Correlation Coefficients for all variables)
Table 1.1: Health    Table 1.2: Poverty
Table 1.3: Race    Table 1.4: Education
Variable definitions:
Smoking = % of adults smoking
Binge = % of adults binge drinking
Diabetes = % of adults w/diabetes
Obesity = % of adults w/obesity
Immunization = % of children immunized
X7 = % of children under 6 living in poverty
X8 = % of adults over 65 living in poverty
X9 = % of economically disadvantaged K-12 students
X10 = % of children enrolled in preschool
Wht = % of population which is white
Lat = % of population which is Latino
Afam = % of population which is black
Asam = % of population which is Asian
Natam = % of population which is Native American
Less_hs = % of population high school dropouts
Atleast_hs = % of population w/HS diploma
Atleast_ba = % of population w/bachelor’s degree
Grad_degree = % of population w/graduate degree
S_enroll = % of population enrolled in school
Table 2.1-2.4 (Initial Parameter estimates after separate stepwise regression models)
Table 2.1: Health Table 2.2: Race
Table 2.3: Education Table 2.4: Poverty
Table 3.1-3.4 (Normality Checking for each stepwise regression model)
Table 3.1:Health Table 3.2: Poverty
Figure 1.1:Health Figure 1.2:Poverty
Table 3.3:Race Table 3.4: Education
Figure 1.3:Race Figure1.4: Education
Table 4.1-4.4 (Equal Variances Checking for each stepwise regression model)
Figure2.1: Health Figure 2.2:Poverty
Figure 2.3:Race Figure 2.4: Education
Table 5.1 (Remaining Variables after separate stepwise regression models)
Table 6.1 (Final Stepwise model parameter estimates)
Figure 7.1 (Assumption Checking for Final Model)
Table 7.1
Equation 1: Health:
LE= 80.98619-0.25622*smoking + 0.08958*binge -0.43044*diabetes
Equation 2: Race:
LE= 31.37453 + 0.48144*white + 0.47116*latino +0.38513*AfricanAmerican +0.90752*AsianAmerican
0.49848*NativeAmerican
Equation 3: Poverty
LE = 84.3130 - 0.23659*under6-poverty
Equation 4: Education
LE = 59.161 +52.807*Atleast_BA -71.605*Grad.Deg + 15.594*S-enroll
Equation 5: Final Model
LE = 70.84127 - 0.04853*AfAm + 0.04204*AsAm + 0.12519*S_enrol - 0.21581*Smoking + 0.09621*binge - 0.23291*diabetes + 0.03929*immunization
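As a worked check of Equation 5, the Python sketch below plugs the first observation of the appendix data into the final model and compares the prediction with the observed life expectancy. The coefficients are copied from Equation 5 and the input values from the first data row of the final-model DATA step. The function name predict_le is a hypothetical helper.

```python
# Intercept and coefficients copied from Equation 5 (final model).
INTERCEPT = 70.84127
COEFFICIENTS = {
    "AfAm": -0.04853,
    "AsAm": 0.04204,
    "S_enrol": 0.12519,
    "Smoking": -0.21581,
    "binge": 0.09621,
    "diabetes": -0.23291,
    "immunization": 0.03929,
}

# First observation in the appendix data (all predictors are percentages).
first_row = {
    "AfAm": 26.0,
    "AsAm": 1.1,
    "S_enrol": 76.1,
    "Smoking": 24.3,
    "binge": 13.7,
    "diabetes": 11.2,
    "immunization": 83.1,
}
OBSERVED_LE = 75.4


def predict_le(row):
    """Predicted life expectancy from Equation 5 for one observation."""
    return INTERCEPT + sum(COEFFICIENTS[name] * row[name] for name in COEFFICIENTS)


predicted = predict_le(first_row)
residual = OBSERVED_LE - predicted
print(f"predicted = {predicted:.2f}, observed = {OBSERVED_LE}, residual = {residual:.2f}")
```

For this observation the model predicts about 75.88 years against an observed 75.4, a residual of about -0.48 years.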
SAS Code
Final Model
data MLR;
  input life under_6 AfAm caucasian latino NatAm AsianAm bachelors_plus
        grad_degree school_enrollment smoking binge diabetes immunization;
  datalines;
75.4 31.9 26.0 67.0 3.9 0.5 1.1 21.9 8.0 76.1 24.3 13.7 11.2 83.1
78.3 17.8 3.1 64.1 5.5 14.4 5.3 27.9 9.4 71.2 22.9 20.8 6.7 79.2
79.6 27.5 3.7 57.8 29.6 4.0 2.7 25.9 9.2 74.1 19.3 17.6 7.8 84.6
76.0 32.5 15.3 74.5 6.4 0.7 1.2 19.5 6.3 75.7 27.0 14.1 9.5 81.4
80.8 23.6 5.8 40.1 37.6 0.4 12.8 30.1 11.0 78.4 13.7 18.6 8.5 86.8
80.0 20.9 3.8 70.0 20.7 0.6 2.7 36.4 13.0 77.5 18.3 20.1 6.0 86.5
80.8 15.8 9.4 71.2 13.4 0.2 3.8 35.5 15.3 81.9 17.1 17.9 6.8 88.2
78.4 22.4 20.8 65.3 8.2 0.3 3.2 27.8 11.3 78.3 21.8 20.3 8.3 84.3
76.5 27.4 50.0 34.8 9.1 0.2 3.5 50.1 26.9 74.6 20.8 25 8.0 84.6
79.4 26.5 15.2 57.9 22.5 0.3 2.4 25.8 9.2 77.5 19.3 17.1 9.5 88.5
77.2 28.8 30.0 55.9 8.8 0.2 3.2 27.3 9.8 77.2 21.2 16.6 9.9 79.0
81.3 16.0 1.5 22.7 8.9 0.2 37.7 29.5 9.6 74.9 16.8 21.5 8.2 81.5
79.5 23.9 0.6 84.0 11.2 1.1 1.2 24.4 7.7 76.1 17.2 16.6 7.0 77.6
79.0 22.1 14.3 63.7 15.8 0.1 4.5 30.8 11.5 79.7 20.9 23 8.3 82.2
77.6 25.3 9.0 81.5 6.0 0.2 1.6 22.7 8.1 76.9 25.6 17.8 9.6 85.3
79.7 19.4 2.9 88.7 5.0 0.3 1.7 24.9 7.9 79.0 20.4 23.1 7.0 84.2
78.7 22.0 5.7 78.2 10.5 0.8 2.3 29.8 10.5 78.4 22.0 17 8.1 85.7
76.0 31.6 7.7 86.3 3.1 0.2 1.1 20.5 8.1 74.7 29.0 16.1 9.9 86.0
75.7 32.1 31.8 60.3 4.2 0.6 1.5 21.4 7.0 75.2 25.7 16.1 10.7 87.7
79.2 22.0 1.1 94.4 1.3 0.6 1.0 26.8 9.5 77.7 22.8 17.3 8.3 90.3
78.8 15.2 29.0 54.7 8.2 0.2 5.5 36.1 16.4 78.5 19.1 18 8.7 89.1
80.5 17.0 6.0 76.1 9.6 0.2 5.3 39.0 16.7 81.2 18.2 20.6 7.2 87.2
78.2 27.6 14.0 76.6 4.4 0.6 2.4 25.2 9.6 79.2 23.3 19.7 9.1 86.4
81.1 17.4 5.1 83.1 4.7 1.0 4.0 31.8 10.3 79.2 19.1 22.1 5.9 87.3
75.0 37.8 36.9 58.0 2.7 0.5 0.9 19.5 7.1 76.1 26.0 14.2 11.3 82.4
77.5 26.0 11.5 81.0 3.5 0.4 1.6 25.6 9.5 76.6 25.0 19.2 9.1 82.0
78.5 24.2 0.4 87.8 2.9 6.1 0.6 28.8 9.0 75.7 22.1 20.8 6.5 74.4
79.8 22.2 4.4 82.1 9.2 0.8 1.7 28.6 9.0 80.2 20.0 22.7 7.8 84.9
78.1 26.4 7.7 54.1 26.5 0.9 7.1 21.7 7.4 71.8 22.9 18.6 8.6 76.0
80.3 13.8 1.0 92.3 2.8 0.2 2.1 32.8 12.4 79.4 19.4 18.7 7.2 90.0
80.3 16.8 12.8 59.3 17.7 0.1 8.2 35.4 13.3 81.3 16.8 18.2 8.4 80.6
78.4 32.9 1.7 40.5 46.3 8.5 1.3 25.0 10.8 74.7 21.5 16.4 7.9 85.2
80.5 23.7 14.4 58.3 17.6 0.3 7.3 32.5 14.0 79.1 18.1 19.6 8.4 84.4
77.8 28.8 21.2 65.3 8.4 1.1 2.2 26.5 8.7 76.2 21.8 15.2 9.3 84.1
79.5 20.5 1.1 88.9 2.0 5.3 1.0 27.6 7.9 74.1 21.9 23.8 7.6 81.0
77.8 28.2 12.0 81.1 3.1 0.2 1.7 24.6 8.9 77.9 25.1 20.1 9.9 86.1
75.9 28.4 7.3 68.7 8.9 8.2 1.7 22.9 7.5 75.7 26.1 16.5 10.1 78.7
79.5 24.7 1.7 78.5 11.7 1.1 3.6 28.8 10.5 76.0 19.7 16.5 6.9 79.7
78.5 21.8 10.4 79.5 5.7 0.1 2.7 27.1 10.4 78.6 22.4 18.3 8.8 88.1
79.9 22.7 4.9 76.4 12.4 0.4 2.8 30.2 12.2 79.1 20.0 19.7 7.4 88.4
77.0 31.9 27.7 64.1 5.1 0.4 1.3 24.5 8.8 76.4 23.1 15.4 10.1 84.7
79.5 21.4 1.2 84.7 2.7 8.5 0.9 26.3 7.7 76.2 23.0 22.1 6.6 84.3
76.3 29.4 16.5 75.6 4.6 0.3 1.4 23.1 8.5 75.3 23.0 10 10.4 87.7
78.5 28.7 11.5 45.3 37.6 0.3 3.8 25.9 8.6 76.3 19.2 18.9 9.7 83.0
80.2 16.7 0.9 80.4 13.0 1.0 2.0 29.3 9.4 76.9 11.8 12 6.1 83.1
80.5 23.5 0.9 94.3 1.5 0.3 1.3 33.6 13.3 77.6 19.1 18.5 6.4 79.8
79.0 17.5 19.0 64.8 7.9 0.3 5.5 34.2 14.2 76.9 20.9 17.9 7.9 80.3
79.9 21.1 3.4 72.5 11.2 1.3 7.1 31.1 11.1 74.9 17.5 17.8 6.9 82.7
75.4 30.2 3.4 93.2 1.2 0.2 0.7 17.5 6.6 75.5 28.6 10.1 11.9 84.8
80.0 22.8 6.2 83.3 5.9 0.9 2.3 26.3 9.0 78.0 20.9 24.3 7.2 88.2
78.3 18.5 0.8 85.9 8.9 2.1 0.8 24.1 8.4 73.4 23.0 18.9 7.4 73.7
;
run;
proc print data = mlr; run;
proc corr data = mlr; run;
/* Stepwise selection. PROC REG defaults for STEPWISE are SLENTRY=0.15 and SLSTAY=0.15 */
proc reg data = mlr;
  model life = under_6 AfAm caucasian latino NatAm AsianAm bachelors_plus grad_degree
               school_enrollment smoking binge diabetes immunization / selection = stepwise;
run;
/* Refit the selected model (Equation 5) and save its residuals for the normality check */
proc reg data = mlr;
  model life = AfAm AsianAm school_enrollment smoking binge diabetes immunization;
  output out = graph p = predicted student = st_res r = residual;
run;
PROC UNIVARIATE DATA = graph NORMAL PLOT;
  TITLE "NORMALITY CHECKING";
  var residual;
RUN;
Race
DATA PROJECT;
  INPUT LE WHT LAT AFAM ASAM NATAM;
  DATALINES;
75.4 67.0 3.9 26.0 1.1 0.5
78.3 64.1 5.5 3.1 5.3 14.4
79.6 57.8 29.6 3.7 2.7 4.0
76.0 74.5 6.4 15.3 1.2 0.7
80.8 40.1 37.6 5.8 12.8 0.4
80.0 70.0 20.7 3.8 2.7 0.6
80.8 71.2 13.4 9.4 3.8 0.2
78.4 65.3 8.2 20.8 3.2 0.3
76.5 34.8 9.1 50.0 3.5 0.2
79.4 57.9 22.5 15.2 2.4 0.3
77.2 55.9 8.8 30.0 3.2 0.2
81.3 22.7 8.9 1.5 37.7 0.2
79.5 84.0 11.2 0.6 1.2 1.1
79.0 63.7 15.8 14.3 4.5 0.1
77.6 81.5 6.0 9.0 1.6 0.2
79.7 88.7 5.0 2.9 1.7 0.3
78.7 78.2 10.5 5.7 2.3 0.8
76.0 86.3 3.1 7.7 1.1 0.2
75.7 60.3 4.2 31.8 1.5 0.6
79.2 94.4 1.3 1.1 1.0 0.6
78.8 54.7 8.2 29.0 5.5 0.2
80.5 76.1 9.6 6.0 5.3 0.2
78.2 76.6 4.4 14.0 2.4 0.6
81.1 83.1 4.7 5.1 4.0 1.0
75.0 58.0 2.7 36.9 0.9 0.5
77.5 81.0 3.5 11.5 1.6 0.4
78.5 87.8 2.9 0.4 0.6 6.1
79.8 82.1 9.2 4.4 1.7 0.8
78.1 54.1 26.5 7.7 7.1 0.9
80.3 92.3 2.8 1.0 2.1 0.2
80.3 59.3 17.7 12.8 8.2 0.1
78.4 40.5 46.3 1.7 1.3 8.5
80.5 58.3 17.6 14.4 7.3 0.3
77.8 65.3 8.4 21.2 2.2 1.1
79.5 88.9 2.0 1.1 1.0 5.3
77.8 81.1 3.1 12.0 1.7 0.2
75.9 68.7 8.9 7.3 1.7 8.2
79.5 78.5 11.7 1.7 3.6 1.1
78.5 79.5 5.7 10.4 2.7 0.1
79.9 76.4 12.4 4.9 2.8 0.4
77.0 64.1 5.1 27.7 1.3 0.4
79.5 84.7 2.7 1.2 0.9 8.5
76.3 75.6 4.6 16.5 1.4 0.3
78.5 45.3 37.6 11.5 3.8 0.3
80.2 80.4 13.0 0.9 2.0 1.0
80.5 94.3 1.5 0.9 1.3 0.3
79.0 64.8 7.9 19.0 5.5 0.3
79.9 72.5 11.2 3.4 7.1 1.3
75.4 93.2 1.2 3.4 0.7 0.2
80.0 83.3 5.9 6.2 2.3 0.9
78.3 85.9 8.9 0.8 0.8 2.1
;
PROC PRINT DATA=PROJECT; RUN;
PROC CORR DATA=PROJECT;
  TITLE "CORRELATION MATRIX";
  VAR LE WHT LAT AFAM ASAM NATAM;
RUN;
PROC REG DATA=PROJECT;
  MODEL LE = WHT LAT AFAM ASAM NATAM / SELECTION=STEPWISE;
RUN;
PROC REG DATA=PROJECT;
  TITLE "REGRESSION";
  MODEL LE = WHT LAT AFAM ASAM NATAM;
  OUTPUT OUT=GRAPH P=PREDICTED STUDENT=ST_RES R=RESIDUAL;
RUN;
**GRAPHICAL SUMMARIES OF RESIDUALS TO CHECK ASSUMPTIONS;
PROC UNIVARIATE DATA=GRAPH NORMAL PLOT;
  TITLE "NORMALITY CHECKING";
  VAR RESIDUAL;
RUN;
SYMBOL VALUE=DOT COLOR=RED I=R;
PROC GPLOT DATA=GRAPH;
  TITLE "RESIDUALS VERSUS PREDICTED";
  PLOT ST_RES * PREDICTED;
RUN;
Health
DATA PROJECT;
  INPUT life smoking binge diabetes obesity immunization;
  DATALINES;
24.3 13.7 11.2 32.0 83.1 24.3
22.9 20.8 6.7 27.4 79.2 22.9
19.3 17.6 7.8 25.1 84.6 19.3
27.0 14.1 9.5 30.9 81.4 27.0
13.7 18.6 8.5 23.8 86.8 13.7
18.3 20.1 6.0 20.7 86.5 18.3
17.1 17.9 6.8 24.5 88.2 17.1
21.8 20.3 8.3 28.8 84.3 21.8
20.8 25 8.0 23.8 84.6 20.8
19.3 17.1 9.5 26.6 88.5 19.3
21.2 16.6 9.9 28.0 79.0 21.2
16.8 21.5 8.2 21.9 81.5 16.8
17.2 16.6 7.0 27.1 77.6 17.2
20.9 23 8.3 27.1 82.2 20.9
25.6 17.8 9.6 30.8 85.3 25.6
20.4 23.1 7.0 29.0 84.2 20.4
22.0 17 8.1 29.6 85.7 22.0
29.0 16.1 9.9 30.4 86.0 29.0
25.7 16.1 10.7 33.4 87.7 25.7
22.8 17.3 8.3 27.8 90.3 22.8
19.1 18 8.7 28.3 89.1 19.1
18.2 20.6 7.2 22.7 87.2 18.2
23.3 19.7 9.1 31.3 86.4 23.3
19.1 22.1 5.9 25.7 87.3 19.1
26.0 14.2 11.3 34.9 82.4 26.0
25.0 19.2 9.1 30.3 82.0 25.0
22.1 20.8 6.5 24.6 74.4 22.1
20.0 22.7 7.8 28.4 84.9 20.0
22.9 18.6 8.6 24.5 76.0 22.9
19.4 18.7 7.2 26.2 90.0 19.4
16.8 18.2 8.4 23.7 80.6 16.8
21.5 16.4 7.9 26.3 85.2 21.5
18.1 19.6 8.4 24.5 84.4 18.1
21.8 15.2 9.3 29.1 84.1 21.8
21.9 23.8 7.6 27.8 81.0 21.9
25.1 20.1 9.9 29.7 86.1 25.1
26.1 16.5 10.1 31.1 78.7 26.1
19.7 16.5 6.9 26.7 79.7 19.7
22.4 18.3 8.8 28.6 88.1 22.4
20.0 19.7 7.4 25.4 88.4 20.0
23.1 15.4 10.1 30.8 84.7 23.1
23.0 22.1 6.6 28.1 84.3 23.0
23.0 10 10.4 29.2 87.7 23.0
19.2 18.9 9.7 30.4 83.0 19.2
11.8 12 6.1 24.4 83.1 11.8
19.1 18.5 6.4 25.4 79.8 19.1
20.9 17.9 7.9 29.2 80.3 20.9
17.5 17.8 6.9 26.5 82.7 17.5
28.6 10.1 11.9 32.4 84.8 28.6
20.9 24.3 7.2 27.7 88.2 20.9
23.0 18.9 7.4 25.0 73.7 23.0
;
PROC PRINT DATA=PROJECT; RUN;
PROC CORR DATA=PROJECT;
  TITLE "CORRELATION MATRIX";
  VAR life smoking binge diabetes obesity immunization;
RUN;
PROC REG DATA=PROJECT;
  MODEL life = smoking binge diabetes obesity immunization / SELECTION=STEPWISE;
RUN;
PROC REG DATA=PROJECT;
  TITLE "REGRESSION";
  MODEL life = smoking binge diabetes obesity immunization;
  OUTPUT OUT=GRAPH P=PREDICTED STUDENT=ST_RES R=RESIDUAL;
RUN;
**GRAPHICAL SUMMARIES OF RESIDUALS TO CHECK ASSUMPTIONS;
PROC UNIVARIATE DATA=GRAPH NORMAL PLOT;
  TITLE "NORMALITY CHECKING";
  VAR RESIDUAL;
RUN;
SYMBOL VALUE=DOT COLOR=RED I=R;
PROC GPLOT DATA=GRAPH;
  TITLE "RESIDUALS VERSUS PREDICTED";
  PLOT ST_RES * PREDICTED;
RUN;
Poverty
DATA q1;
  INPUT Y X7 X8 X9 X10;
  DATALINES;
75.4 31.9 10.7 53.0 46.9
78.3 17.8 5.7 38.0 43.0
79.6 27.5 7.7 45.0 32.4
76.0 32.5 10.2 60.0 49.8
80.8 23.6 9.7 53.0 50.8
80.0 20.9 8.1 40.0 48.9
80.8 15.8 6.6 34.0 64.4
78.4 22.4 7.7 48.0 47.3
76.5 27.4 13.1 72.0 73.6
79.4 26.5 9.9 56.0 51.6
77.2 28.8 10.7 57.0 50.9
81.3 16.0 6.8 47.0 54.6
79.5 23.9 7.9 45.0 36.6
79.0 22.1 8.4 44.0 56.0
77.6 25.3 6.8 47.0 40.2
79.7 19.4 6.7 39.0 50.6
78.7 22.0 7.7 48.0 47.5
76.0 31.6 11.2 57.0 48.5
75.7 32.1 11.5 66.0 55.0
79.2 22.0 9.5 43.0 39.1
78.8 15.2 7.7 40.0 51.7
80.5 17.0 8.7 34.0 62.3
78.2 27.6 8.0 46.0 49.3
81.1 17.4 8.3 37.0 46.3
75.0 37.8 11.9 71.0 54.5
77.5 26.0 9.1 44.0 43.0
78.5 24.2 7.0 41.0 37.2
79.8 22.2 7.5 43.0 45.0
78.1 26.4 7.6 48.0 28.4
80.3 13.8 6.1 25.0 51.5
80.3 16.8 7.2 33.0 67.3
78.4 32.9 12.0 67.0 42.9
80.5 23.7 10.9 48.0 60.1
77.8 28.8 9.9 50.0 49.5
79.5 20.5 12.1 32.0 37.6
77.8 28.2 7.7 43.0 46.3
75.9 28.4 9.3 60.0 46.0
79.5 24.7 7.9 51.0 43.7
78.5 21.8 7.9 39.0 49.7
79.9 22.7 8.2 43.0 56.5
77.0 31.9 9.8 55.0 48.9
79.5 21.4 11.1 37.0 38.1
76.3 29.4 9.7 55.0 44.5
78.5 28.7 10.7 50.0 43.0
80.2 16.7 6.0 38.0 40.5
80.5 23.5 6.8 37.0 55.4
79.0 17.5 7.4 37.0 49.5
79.9 21.1 6.9 40.0 43.2
75.4 30.2 9.9 51.0 32.2
80.0 22.8 7.1 39.0 45.3
78.3 18.5 6.8 37.0 40.7
;
RUN;
PROC PRINT DATA=q1;
  TITLE "Question 1";
RUN;
*CORRELATION ANALYSIS;
PROC CORR DATA=q1;
  TITLE "q1a CORRELATION MATRIX";
  VAR Y X7 X8 X9 X10;
RUN;
PROC REG DATA=q1;
  MODEL Y = X7 X8 X9 X10 / SELECTION=STEPWISE;
RUN;
/* Full poverty model on the variables that exist in q1, with residuals saved */
PROC REG DATA=q1;
  TITLE "REGRESSION";
  MODEL Y = X7 X8 X9 X10;
  OUTPUT OUT=GRAPH P=PREDICTED STUDENT=ST_RES R=RESIDUAL;
RUN;
**GRAPHICAL SUMMARIES OF RESIDUALS TO CHECK ASSUMPTIONS;
PROC UNIVARIATE DATA=GRAPH NORMAL PLOT;
  TITLE "NORMALITY CHECKING";
  VAR RESIDUAL;
RUN;
SYMBOL VALUE=DOT COLOR=RED I=R;
PROC GPLOT DATA=GRAPH;
  TITLE "RESIDUALS VERSUS PREDICTED";
  PLOT ST_RES * PREDICTED;
RUN;
Education
data MLR;
  input y x1 x2 x3 x4;
  datalines;
17.9 82.1 21.9 8.0 76.1
9.0 91.0 27.9 9.4 71.2
14.4 85.6 25.9 9.2 74.1
17.1 82.9 19.5 6.3 75.7
19.3 80.7 30.1 11.0 78.4
10.3 89.7 36.4 13.0 77.5
11.4 88.6 35.5 15.3 81.9
12.3 87.7 27.8 11.3 78.3
12.6 87.4 50.1 26.9 74.6
14.5 85.5 25.8 9.2 77.5
15.7 84.3 27.3 9.8 77.2
10.1 89.9 29.5 9.6 74.9
11.7 88.3 24.4 7.7 76.1
13.1 86.9 30.8 11.5 79.7
13.0 87.0 22.7 8.1 76.9
9.4 90.6 24.9 7.9 79.0
10.8 89.2 29.8 10.5 78.4
18.1 81.9 20.5 8.1 74.7
18.1 81.9 21.4 7.0 75.2
9.7 90.3 26.8 9.5 77.7
11.9 88.1 36.1 16.4 78.5
10.9 89.1 39.0 16.7 81.2
11.3 88.7 25.2 9.6 79.2
8.2 91.8 31.8 10.3 79.2
19.0 81.0 19.5 7.1 76.1
13.1 86.9 25.6 9.5 76.6
8.3 91.7 28.8 9.0 75.7
9.6 90.4 28.6 9.0 80.2
15.3 84.7 21.7 7.4 71.8
8.5 91.5 32.8 12.4 79.4
12.0 88.0 35.4 13.3 81.3
16.7 83.3 25.0 10.8 74.7
15.1 84.9 32.5 14.0 79.1
15.3 84.7 26.5 8.7 76.2
9.7 90.3 27.6 7.9 74.1
11.9 88.1 24.6 8.9 77.9
13.8 86.2 22.9 7.5 75.7
11.2 88.8 28.8 10.5 76.0
11.6 88.4 27.1 10.4 78.6
16.5 83.5 30.2 12.2 79.1
15.9 84.1 24.5 8.8 76.4
10.4 89.6 26.3 7.7 76.2
16.4 83.6 23.1 8.5 75.3
19.3 80.7 25.9 8.6 76.3
9.4 90.6 29.3 9.4 76.9
9.0 91.0 33.6 13.3 77.6
13.5 86.5 34.2 14.2 76.9
10.2 89.8 31.1 11.1 74.9
16.8 83.2 17.5 6.6 75.5
9.9 90.1 26.3 9.0 78.0
7.7 92.3 24.1 8.4 73.4
;
run;
proc print data = mlr; run;
proc corr data = mlr; run;
proc reg data = mlr;
  model y = x1 x2 x3 x4 / selection = stepwise;
run;
References
1) Olshansky, S., Antonucci, T., Berkman, L., Binstock, R., Boersch-Supan, A., Cacioppo, J., . . . Rowe, J.
(2012). Differences In Life Expectancy Due To Race And Educational Differences Are Widening, And
Many May Not Catch Up. Health Affairs, 1803-1813.
2) Olshansky, S., Passaro, D., Hershow, R., Layden, J., Carnes, B., Brody, J., . . . Ludwig, D. (2010). A
Potential Decline in Life Expectancy in the United States in the 21st Century. Obstetrical & Gynecological
Survey, 450-452.
3) Lowery, A. (2014, March 15). Income Gap, Meet the Longevity Gap. New York Times.