Deer-Vehicle Crashes


Post on 24-Jan-2016


Page 1: Deer-Vehicle Crashes

Deer-Vehicle Crashes

Hui, Anne, Ben

Page 2: Deer-Vehicle Crashes

Goal

Create a model that will be useful in predicting the number of deer-vehicle crashes on a given section of roadway.

Page 3: Deer-Vehicle Crashes

The Response Variable

Y = number of deer-vehicle crashes per half-mile section of roadway over a 1-year period

Location: Ashtabula County

Page 4: Deer-Vehicle Crashes

The Predictor Variables

X1 = no. of vertical curves

X2 = no. of horizontal curves

X3 = no. of ditches

X4 = no. of residences

X5 = no. of driveways

X6 = % of adjacent forest land

Page 5: Deer-Vehicle Crashes

Preview

40 observations total
6 candidate regressors
X's are clearly known

Lurking variable(s) seem possible, however

Y is a count per unit time

Page 6: Deer-Vehicle Crashes

Lurking Variable

Residual plot from full linear regression reveals two distinct groups
Data is divided in half

Page 7: Deer-Vehicle Crashes

Enter a New Variable

Create an unknown variable by grouping the data into two groups, based on the two groups seen in the residuals plot

Noticeable difference between Y values for the first 20 observations and the last 20

Page 8: Deer-Vehicle Crashes

Unknown Variable

New variable is X7 = unknown
Variable is unknown to us
Was not considered during the collection of data

Page 9: Deer-Vehicle Crashes

Variable Selection

Best Subsets method in conjunction with several Extra Sum of Squares tests
Four variables are chosen: X1, X5, X6, X7
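The extra sum of squares test named on this slide compares the SSE of a reduced model with the SSE of a fuller model. The sketch below shows the usual partial F statistic. The function name and the SSE values in the usage line are hypothetical, chosen only for illustration; they are not taken from the deer-crash data.

```python
def extra_ss_f(sse_reduced, sse_full, df_reduced, df_full):
    """Partial F statistic for testing whether the extra regressors
    in the full model can be dropped.

    df_reduced and df_full are the error degrees of freedom
    (n minus the number of fitted coefficients) of each model.
    """
    if df_reduced <= df_full:
        raise ValueError("the reduced model must have more error df")
    extra_ss = sse_reduced - sse_full           # extra sum of squares
    numerator = extra_ss / (df_reduced - df_full)
    denominator = sse_full / df_full            # MSE of the full model
    return numerator / denominator


# Hypothetical values, for illustration only (not from the slides).
f_stat = extra_ss_f(sse_reduced=60.0, sse_full=50.0, df_reduced=35, df_full=33)
print(round(f_stat, 2))
```

The statistic would then be compared against an F distribution with (df_reduced - df_full, df_full) degrees of freedom.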

Page 10: Deer-Vehicle Crashes

Linear Regression Analysis

We run linear regression: Y vs. X1, X5, X6, X7

Model:
R² = 88.2%
SSE = 51.17
Decent

Page 11: Deer-Vehicle Crashes

Residuals Plot for the Linear Regression

[Figure: Residuals Versus the Fitted Values (response is Crashes). x-axis: Fitted Value, 0 to 8; y-axis: Residual, -3 to 5.]

Page 12: Deer-Vehicle Crashes

Correlation Analysis

Each cell gives the correlation coefficient, followed in parentheses by Prob > |r| under H0: Rho=0.

Variable     CRASHES             VERTCURVES          DRIVEWAYS           PCTFOREST           UNKNOWN
CRASHES      1.00000             0.36526 (0.0205)    0.07332 (0.6530)    0.60198 (<.0001)    -0.93378 (<.0001)
VERTCURVES   0.36526 (0.0205)    1.00000             0.31108 (0.0507)    0.57227 (0.0001)    -0.31984 (0.0442)
DRIVEWAYS    0.07332 (0.6530)    0.31108 (0.0507)    1.00000             0.49194 (0.0013)    -0.10556 (0.5168)
PCTFOREST    0.60198 (<.0001)    0.57227 (0.0001)    0.49194 (0.0013)    1.00000             -0.59642 (<.0001)
UNKNOWN      -0.93378 (<.0001)   -0.31984 (0.0442)   -0.10556 (0.5168)   -0.59642 (<.0001)   1.00000

• Noticeable correlation between X7 and X6

• Unknown variable is associated with forested land
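The Prob > |r| values in the correlation output can be approximately reproduced from r and the sample size alone. The sketch below assumes the standard t test for a correlation with n = 40 observations (38 degrees of freedom) and integrates the t density numerically so that only the standard library is needed. The function names are illustrative, not part of the SAS output.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def corr_p_value(r, n, steps=2000):
    """Two-sided p-value for H0: rho = 0, given correlation r from n pairs."""
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1 - r * r)
    # Composite Simpson's rule for the t density over [0, t]; steps must be even.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    # Both tails together; clamp tiny negative values from rounding.
    return max(0.0, 1 - 2 * area)


# CRASHES vs. VERTCURVES and VERTCURVES vs. DRIVEWAYS, r taken from the table.
print(round(corr_p_value(0.36526, 40), 4))
print(round(corr_p_value(0.31108, 40), 4))
```

The results can be compared with the 0.0205 and 0.0507 shown in the table.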

Page 13: Deer-Vehicle Crashes

A Thought About the Unknown Variable

Unknown variable is negatively correlated with % of forested land

Possible values of X7 = Unknown: 0 and 1
Might correspond to section of county

0 -> rural part of county
1 -> urban part of county

Page 14: Deer-Vehicle Crashes

A Transformation

Many transformations were attempted
Best one:

Y* = ln(Y + e²)
R² = 87.9% (untransformed)
SSE = 51.00 (untransformed)
Conclusion: not better than original linear model
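As a small sketch of the transformation on this slide: adding e² inside the logarithm keeps Y* defined when a section has zero crashes. The "(untransformed)" labels presumably mean that R² and SSE were computed after mapping fitted values back to the original crash-count scale; that reading is an assumption, and the function names are illustrative.

```python
import math

SHIFT = math.e ** 2  # the constant e^2 added inside the logarithm


def transform(y):
    """Y* = ln(Y + e^2); defined for every count Y >= 0."""
    return math.log(y + SHIFT)


def back_transform(y_star):
    """Invert the transformation: Y = exp(Y*) - e^2."""
    return math.exp(y_star) - SHIFT


# A section with zero crashes maps to Y* = ln(e^2), and the round trip
# recovers the original count up to floating-point error.
print(transform(0))
print(back_transform(transform(3)))
```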

Page 15: Deer-Vehicle Crashes

Poisson Regression

Recall: Y is a count per unit of time
A Poisson model is now derived

Proc GENMOD
Link function: log, applied to the mean of Y

Page 16: Deer-Vehicle Crashes

Poisson Regression Analysis

Fits and residuals were collected from the work library in SAS

R² = 89.15%
SSE = 45.04
Not bad

Page 17: Deer-Vehicle Crashes

Residuals Plot for Poisson Model

[Figure: raw residuals (Resraw, -3 to 5) plotted against predicted values (Pred, 0 to 8).]

Page 18: Deer-Vehicle Crashes

Dominant Variable

Type I and Type III analysis in SAS
Suggests that the unknown variable is the only significant contributor

Decision: do not throw out the other regressors
Unknown variable is just a dominating variable

Page 19: Deer-Vehicle Crashes

Type I and Type III Analysis

Model Information
Data Set                       WORK.DEERCRASH
Distribution                   Poisson
Link Function                  Log
Dependent Variable             YStar
Number of Observations Read    40
Number of Observations Used    40

Criteria For Assessing Goodness Of Fit
Criterion             DF    Value      Value/DF
Deviance              35    20.7289    0.5923
Scaled Deviance       35    20.7289    0.5923
Pearson Chi-Square    35    19.4909    0.5569
Scaled Pearson X2     35    19.4909    0.5569
Log Likelihood              99.4896

Algorithm converged.

Analysis Of Parameter Estimates
Parameter     DF    Estimate    Standard Error    Wald 95% Confidence Limits    Chi-Square    Pr > ChiSq
Intercept     1     1.8199      0.1648            1.4968     2.1430             121.88        <.0001
VERTCURVES    1     0.0692      0.1524            -0.2295    0.3679             0.21          0.6500
DRIVEWAYS     1     -0.0827     0.1072            -0.2929    0.1274             0.60          0.4403
PCTFOREST     1     0.0027      0.0047            -0.0065    0.0119             0.34          0.5622
UNKNOWN       1     -2.8304     0.4056            -3.6253    -2.0355            48.70         <.0001
Scale         0     1.0000      0.0000            1.0000     1.0000

NOTE: The scale parameter was held fixed.

LR Statistics For Type 1 Analysis
Source        Deviance    DF    Chi-Square    Pr > ChiSq
Intercept     156.3864
VERTCURVES    142.2147    1     14.17         0.0002
DRIVEWAYS     142.0744    1     0.14          0.7080
PCTFOREST     107.7562    1     34.32         <.0001
UNKNOWN       20.7289     1     87.03         <.0001

The GENMOD Procedure

LR Statistics For Type 3 Analysis
Source        DF    Chi-Square    Pr > ChiSq
VERTCURVES    1     0.20          0.6510
DRIVEWAYS     1     0.60          0.4403
PCTFOREST     1     0.33          0.5656
UNKNOWN       1     87.03         <.0001
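Two columns of this output can be rebuilt from the other numbers on the slide, which helps in reading it. Each Type 1 chi-square is the drop in deviance when a term is added in sequence, and each Wald chi-square is (estimate / standard error) squared. The sketch below is a check on the printed values, not the GENMOD computation itself; the variable and function names are illustrative, and small differences come from the rounding of the printed estimates.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


# Deviance after adding each term in sequence (Type 1 table).
deviances = [
    ("Intercept", 156.3864),
    ("VERTCURVES", 142.2147),
    ("DRIVEWAYS", 142.0744),
    ("PCTFOREST", 107.7562),
    ("UNKNOWN", 20.7289),
]

type1 = {}
for (_, prev_dev), (name, dev) in zip(deviances, deviances[1:]):
    chi_sq = prev_dev - dev                  # drop in deviance from adding `name`
    type1[name] = (chi_sq, chi2_sf_1df(chi_sq))

# (estimate, standard error) pairs from the parameter estimates table.
estimates = {
    "Intercept": (1.8199, 0.1648),
    "VERTCURVES": (0.0692, 0.1524),
    "DRIVEWAYS": (-0.0827, 0.1072),
    "PCTFOREST": (0.0027, 0.0047),
    "UNKNOWN": (-2.8304, 0.4056),
}
wald = {name: (est / se) ** 2 for name, (est, se) in estimates.items()}

for name, (chi_sq, p) in type1.items():
    print(f"{name:<11} LR chi-square = {chi_sq:6.2f}  p = {p:.4f}")
for name, w in wald.items():
    print(f"{name:<11} Wald chi-square = {w:7.2f}")
```

The Type 1 results depend on the order in which terms enter, which is why VERTCURVES and PCTFOREST look significant there but not in the Type 3 table, where each term is tested after all the others.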

Page 20: Deer-Vehicle Crashes

The Winning Model

The Poisson model gets our vote

$\hat{Y} = e^{1.8199 + 0.0692 X_1 - 0.0827 X_5 + 0.0027 X_6 - 2.8304 X_7}$
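The fitted Poisson model can be evaluated directly from the GENMOD parameter estimates. The sketch below is a minimal illustration: the function name and the example road section are hypothetical, and it assumes X6 is entered as a percentage (0 to 100), which the slides do not state explicitly.

```python
import math

# Coefficients copied from the GENMOD parameter estimates.
COEF = {
    "intercept": 1.8199,
    "x1": 0.0692,    # no. of vertical curves
    "x5": -0.0827,   # no. of driveways
    "x6": 0.0027,    # % of adjacent forest land (assumed 0-100 scale)
    "x7": -2.8304,   # unknown grouping variable (0 or 1)
}


def predict_crashes(x1, x5, x6, x7):
    """Expected crashes per half-mile section per year under the log link."""
    eta = (COEF["intercept"] + COEF["x1"] * x1 + COEF["x5"] * x5
           + COEF["x6"] * x6 + COEF["x7"] * x7)
    return math.exp(eta)


# Hypothetical section: 1 vertical curve, 2 driveways, 50% forest.
group0 = predict_crashes(x1=1, x5=2, x6=50, x7=0)
group1 = predict_crashes(x1=1, x5=2, x6=50, x7=1)
print(group0, group1, group1 / group0)
```

Because the link is logarithmic, the ratio between the two groups equals exp(-2.8304) whatever the other regressors are, which is one way to see why the unknown variable dominates.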

Page 21: Deer-Vehicle Crashes

Thank You

Abdullah Alhomidan (civil engineering) gave permission for us to use his data.

FIN