Poisson Regression
Caution Flags (Crashes) in NASCAR Winston Cup Races 1975-1979
L. Winner (2006). “NASCAR Winston Cup Race Results for 1975-2003,” Journal of Statistics Education, Vol. 14, #3, www.amstat.org/publications/jse/v14n3/datasets.winner.html
Data Description
• Units: NASCAR Winston Cup Races (1975-1979), n = 151 races
• Dependent Variable: Y = # of Caution Flags/Crashes (CAUTIONS)
• Independent Variables: X1 = # of Drivers in Race (DRIVERS)
  X2 = Circumference of Track (TRKLENGTH)
  X3 = # of Laps in Race (LAPS)
Generalized Linear Model
• Random Component: Poisson Distribution for # of Caution Flags
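• Probability Mass Function: $P(Y = y \mid X_1, X_2, X_3) = \dfrac{e^{-\mu}\mu^{y}}{y!}, \quad y = 0, 1, 2, \ldots$, where $\mu = \mu(X_1, X_2, X_3)$
• Link Function: $g(\mu) = \log(\mu)$
• Systematic Component: $g(\mu) = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_3$, so that $\mu = e^{\beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_3}$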
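To make the three components concrete, here is a minimal Python sketch that maps covariates to μ through the inverse log link and then evaluates Poisson probabilities. The coefficient and covariate values are hypothetical, chosen only for illustration; they are not the fitted NASCAR estimates.

```python
import math

def poisson_mean(beta, x):
    """Inverse of the log link: mu = exp(b0 + b1*x1 + b2*x2 + b3*x3)."""
    eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))  # systematic component
    return math.exp(eta)

def poisson_pmf(y, mu):
    """P(Y = y) = exp(-mu) * mu**y / y!, computed on the log scale for stability."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

# Hypothetical coefficients and covariates (Drivers, TrkLength, Laps), illustration only
beta = [-0.8, 0.04, 0.1, 0.002]
x = [35, 2.5, 200]
mu = poisson_mean(beta, x)
probs = [poisson_pmf(y, mu) for y in range(60)]  # P(Y = 0), ..., P(Y = 59)
print(round(mu, 4), round(sum(probs), 6))
```

The probabilities over y = 0, ..., 59 sum to 1 up to a negligible tail, as they should for a mean this small.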
Testing For Overall Model
• H0: $\beta_1 = \beta_2 = \beta_3 = 0$ (# Cautions independent of all predictors)
• HA: Not all $\beta_j = 0$ (# Cautions associated with at least 1 predictor)
• Test Statistic: $X^2_{obs} = -2(L_0 - L_1)$
• Rejection Region: $X^2_{obs} \ge \chi^2_{\alpha,3}$
• P-Value: $P(\chi^2_3 \ge X^2_{obs})$
• Where: $L_0$ is the maximized log likelihood under model H0
  $L_1$ is the maximized log likelihood under model HA
NASCAR Caution Flag Example
Null Model: $g(\mu) = \beta_0$
Criterion            DF    Value      Value/DF
Deviance             150   215.4915   1.4366
Scaled Deviance      150   215.4915   1.4366
Pearson Chi-Square   150   201.6050   1.3440
Scaled Pearson X2    150   201.6050   1.3440
Log Likelihood             410.8784

Full Model: $g(\mu) = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_3$
Criterion            DF    Value      Value/DF
Deviance             147   171.2162   1.1647
Scaled Deviance      147   171.2162   1.1647
Pearson Chi-Square   147   158.8281   1.0805
Scaled Pearson X2    147   158.8281   1.0805
Log Likelihood             433.0160

Test Statistic: $X^2_{obs} = -2(L_0 - L_1) = -2(410.8784 - 433.0160) = 44.2752$
Rejection Region: $X^2_{obs} \ge \chi^2_{.05,3} = 7.815$
P-value: $P(\chi^2_3 \ge 44.2752) \approx 0$
Statistical output obtained from SAS PROC GENMOD
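The overall likelihood-ratio test above can be reproduced with a few lines of Python. This sketch assumes only the two log likelihoods printed by PROC GENMOD and uses the closed-form chi-square upper tail for 3 degrees of freedom, $P(\chi^2_3 \ge x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$. Any constant left out of both log likelihoods (such as $-\sum \ln y_i!$) cancels in $L_0 - L_1$, so the statistic does not depend on it.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(chi-square_3 >= x), closed form for 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Maximized log likelihoods reported by PROC GENMOD
L0 = 410.8784  # null model: g(mu) = b0
L1 = 433.0160  # full model: g(mu) = b0 + b1*Drivers + b2*TrkLength + b3*Laps

x2_obs = -2 * (L0 - L1)
p_value = chi2_sf_df3(x2_obs)
print(f"X2_obs = {x2_obs:.4f}, P-value = {p_value:.2e}")
# Sanity check: the 0.05 critical value 7.815 should give a tail area of about 0.05
print(round(chi2_sf_df3(7.815), 4))
```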
Testing for Individual (Partial) Regression Coefficients
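• $H_0: \beta_j = 0 \qquad H_A: \beta_j \ne 0$
• Test Statistic: $z_{obs} = \dfrac{\hat\beta_j}{\widehat{SE}(\hat\beta_j)}$  P-value: $2P(Z \ge |z_{obs}|)$
• Equivalently: Test Statistic: $X^2_{obs} = \left(\dfrac{\hat\beta_j}{\widehat{SE}(\hat\beta_j)}\right)^2$  P-value: $P(\chi^2_1 \ge X^2_{obs})$
• 1-Sided Tests: Confirm the sign of $\hat\beta_j$ is correct, then "cut" the P-value in half.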
NASCAR Caution Flag Example
Parameter   DF   Estimate   Std Error   Chi-Square   Pr>ChiSq
Intercept   1    -0.7963    0.4117      3.74         0.0531
Drivers     1     0.0365    0.0125      8.55         0.0035
TrkLength   1     0.1145    0.1684      0.46         0.4966
Laps        1     0.0026    0.0008      10.82        0.0010
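A quick check of these individual (Wald) tests, as a Python sketch using the rounded estimates and standard errors from the table. Because the printed values are rounded, the recomputed chi-squares only approximate GENMOD's (for example, $(0.0026/0.0008)^2 = 10.56$ versus the reported 10.82).

```python
import math

# Estimates and standard errors as printed in the PROC GENMOD table
coefs = {
    "Intercept": (-0.7963, 0.4117),
    "Drivers": (0.0365, 0.0125),
    "TrkLength": (0.1145, 0.1684),
    "Laps": (0.0026, 0.0008),
}

results = {}
for name, (est, se) in coefs.items():
    z_obs = est / se
    p_two_sided = math.erfc(abs(z_obs) / math.sqrt(2))  # equals 2 * P(Z >= |z_obs|)
    results[name] = (z_obs ** 2, p_two_sided)  # (chi-square with 1 df, P-value)
    print(f"{name:10s} z = {z_obs:6.2f}  X2 = {z_obs ** 2:6.2f}  P = {p_two_sided:.4f}")
```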
Conclude the following:
• Controlling for Track Length and Laps, as Drivers increases, Cautions increases
• Controlling for Drivers and Laps, no association between Cautions and Track Length
• Controlling for Drivers and Track Length, as Laps increases, Cautions increases
Reduced Model: $\log(\hat\mu) = -0.6876 + 0.0428\,\text{Drivers} + 0.0021\,\text{Laps}$, where $\hat\mu$ is the expected # of cautions
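The reduced model is easiest to read on the count scale: each coefficient exponentiates to a multiplicative effect on the expected number of cautions. The hypothetical helper `expected_cautions` below simply evaluates the fitted equation; the two covariate settings are taken from the first and last races in the SAS data step below.

```python
import math

def expected_cautions(drivers, laps):
    """Fitted mean from the reduced model log(mu) = -0.6876 + 0.0428*Drivers + 0.0021*Laps."""
    return math.exp(-0.6876 + 0.0428 * drivers + 0.0021 * laps)

# Covariates of the first (35 drivers, 191 laps) and last (37 drivers, 200 laps) races
print(round(expected_cautions(35, 191), 2))
print(round(expected_cautions(37, 200), 2))

# Multiplicative effects on the expected count, holding the other predictor fixed
print("per extra driver:", round(math.exp(0.0428), 4))
print("per extra 10 laps:", round(math.exp(0.0021 * 10), 4))
```

An extra driver multiplies the expected number of cautions by about $e^{0.0428} \approx 1.044$, roughly a 4.4% increase, with Laps held fixed.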
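Testing Model Goodness-of-Fit
• Two Common Measures of Goodness of Fit:
– Pearson’s Chi-Square
– Deviance
• Both measures have approximate chi-square distributions under the hypothesis that the current model is appropriate, provided the number of distinct combinations of the independent variables is fixed and the counts are large
Pearson’s Chi-Square: $X^2 = \sum_{i=1}^{n} \dfrac{(y_i - \hat\mu_i)^2}{\hat V(\hat\mu_i)}$, where $\hat V(\hat\mu_i) = \hat\mu_i$ for the Poisson distribution
Deviance: $G^2 = 2\sum_{i=1}^{n} y_i \log\left(\dfrac{y_i}{\hat\mu_i}\right)$ (terms with $y_i = 0$ contribute 0; this form uses $\sum_i (y_i - \hat\mu_i) = 0$, which holds for a log-link model with an intercept)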
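Both statistics are straightforward to compute from observed counts and fitted means. A minimal Python sketch follows; it uses the general Poisson deviance, including the $-(y_i - \hat\mu_i)$ term, so it does not rely on the model having an intercept. The input values are hypothetical.

```python
import math

def pearson_chi2(y, mu):
    """Pearson's X^2 = sum (y_i - mu_i)^2 / V(mu_i), with V(mu) = mu for the Poisson."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

def deviance(y, mu):
    """Poisson deviance G^2 = 2 * sum [y_i*log(y_i/mu_i) - (y_i - mu_i)]; 0*log(0) is taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2 * total

# Hypothetical counts and fitted means, illustration only
y = [1, 3, 0, 6]
mu = [2.0, 2.0, 1.5, 4.5]
print(pearson_chi2(y, mu), deviance(y, mu))
```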
NASCAR Caution Flags Example
Null Model
Criterion     DF    Value      Value/DF   P-Value
Pearson X2    150   201.6050   1.3440     0.0032
Deviance      150   215.4915   1.4366     0.0004

Full Model
Criterion     DF    Value      Value/DF   P-Value
Pearson X2    147   158.8281   1.0805     0.2386
Deviance      147   171.2162   1.1647     0.0838
Note that the null model clearly does not fit well, while for the full model we fail to reject the hypothesis that the model is appropriate (however, there are many distinct combinations of Laps, Track Length, and Drivers, so the chi-square approximation may be unreliable here).
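The P-values in these tables are upper-tail chi-square probabilities. The sketch below assumes no statistics library and evaluates them through the regularized incomplete gamma function, $P(\chi^2_\nu \ge x) = 1 - P(\nu/2,\, x/2)$, summed as a power series; the results should agree with the tabled values to about the printed precision.

```python
import math

def chi2_sf(x, df):
    """P(chi-square_df >= x) = 1 - P(df/2, x/2), where the regularized lower
    incomplete gamma P(a, z) = z^a e^-z / Gamma(a+1) * sum_n z^n / ((a+1)...(a+n))."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0
    n = 0
    while term > total * 1e-16:  # terms rise, then fall geometrically
        n += 1
        term *= z / (a + n)
        total += term
    p_lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - p_lower)

# (statistic, df) pairs from the goodness-of-fit tables
fits = {
    "Null Pearson X2": (201.6050, 150),
    "Null Deviance": (215.4915, 150),
    "Full Pearson X2": (158.8281, 147),
    "Full Deviance": (171.2162, 147),
}
pvals = {name: chi2_sf(stat, df) for name, (stat, df) in fits.items()}
for name, p in pvals.items():
    print(f"{name:16s} P = {p:.4f}")
```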
SAS Program
```sas
options ps=54 ls=76;

/* Data set one contains the data for analysis. Variable names and
   column specs are given in the INPUT statement. I have included only
   the first and last observations. */
data one;
  input serrace 6-8 year 13-16 searace 23-24 drivers 31-32 trklength 34-40
        laps 46-48 road 56 cautions 63-64 leadchng 71-72;
  cards;
       1    1975       1      35 2.54        191       1       5      13
...
     151    1979      31      37 2.5         200       0       6      35
;
run;

/* Generalized linear model with Poisson random component and a constant mean:
   g(mu) = alpha is the systematic component, g(mu) = log(mu) is the link
   function, so mu = exp(alpha) */
proc genmod data=one;
  model cautions = / dist=poi link=log;
run;

/* Generalized linear model with Poisson random component:
   g(mu) = alpha + beta1*drivers + beta2*trklength + beta3*laps is the
   systematic component, g(mu) = log(mu) is the link function, so
   mu = exp(alpha + beta1*drivers + beta2*trklength + beta3*laps) */
proc genmod data=one;
  model cautions = drivers trklength laps / dist=poi link=log;
run;
quit;
```
SPSS Output
Hosmer-Lemeshow Test
• Used when there are “many” distinct levels of the explanatory variables
• Based on “lumping” cases together according to their predicted values into J groups (J = 10 is often used)
• Compares observed and expected counts by group, based on Pearson and deviance residuals. For the Poisson model (where obs is observed and exp is expected):
  Pearson: $r_j = \dfrac{obs_j - exp_j}{\sqrt{exp_j}}, \qquad X^2 = \sum_{j=1}^{J} r_j^2$
  Deviance: $d_j = \operatorname{sign}(obs_j - exp_j)\sqrt{2\left[obs_j \log(obs_j/exp_j) - (obs_j - exp_j)\right]}, \qquad G^2 = \sum_{j=1}^{J} d_j^2$
  Degrees of Freedom: $J - p - 1$, where p = # predictor variables
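The grouping step can be sketched as follows. The helper `hosmer_lemeshow_poisson` is hypothetical: it sorts cases by fitted mean and cuts them into J roughly equal-sized groups, whereas the example table uses cut points on the fitted values (so its group sizes are unequal), and other software may form groups differently. It assumes at least J cases.

```python
def hosmer_lemeshow_poisson(y, mu, n_groups=10):
    """Group cases by fitted mean into n_groups roughly equal-sized groups and
    return Pearson X^2 = sum_j (obs_j - exp_j)^2 / exp_j along with the group totals."""
    if len(y) < n_groups:
        raise ValueError("need at least n_groups cases")
    order = sorted(range(len(y)), key=lambda i: mu[i])  # cases ordered by predicted value
    size = len(order) / n_groups
    groups = []
    for j in range(n_groups):
        idx = order[round(j * size):round((j + 1) * size)]
        obs_j = sum(y[i] for i in idx)   # observed count in group j
        exp_j = sum(mu[i] for i in idx)  # expected count in group j
        groups.append((obs_j, exp_j))
    x2 = sum((o - e) ** 2 / e for o, e in groups)
    return x2, groups

# Hypothetical counts and fitted means for 8 cases, split into 4 groups
y = [2, 5, 3, 8, 4, 6, 1, 7]
mu = [2.5, 4.0, 3.5, 6.5, 4.5, 5.5, 1.5, 7.0]
x2, groups = hosmer_lemeshow_poisson(y, mu, n_groups=4)
p = 2  # hypothetical number of predictors
df = 4 - p - 1  # J - p - 1
print(groups, round(x2, 4), df)
```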
NASCAR Caution Flags Example
Fitted values: $\hat\mu_i = e^{-0.6876 + 0.0428D_i + 0.0021L_i}$, where $D_i$ = Drivers and $L_i$ = Laps for race i

Group   Fitted Value   #Races   #Crashes   Expected   Pearson
1       <3.50          15       37         46.05      -1.33
2       3.50-3.80      14       60         50.37       1.36
3       3.80-4.08      18       72         71.24       0.09
4       4.08-4.25      20       68         84.03      -1.75
5       4.25-4.42      12       51         52.35      -0.19
6       4.42-5.15      17       100        81.39       2.06
7       5.15-5.50      15       88         78.19       1.11
8       5.50-6.25      15       91         87.40       0.38
9       6.25-6.70      14       94         90.81       0.33
10      >6.70          11       63         78.46      -1.75
Pearson X2 = 15.5119, df = 10 - 2 - 1 = 7, P-value = 0.0300
Note that there is evidence (P = 0.03 < 0.05) that the Poisson model does not provide a good fit.
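The Pearson statistic and its P-value can be recomputed from the grouped table. This Python sketch uses the rounded expected counts as printed, so $X^2$ comes out near 15.513 rather than exactly 15.5119, and it takes df = 10 - 2 - 1 = 7 for the two predictors in the reduced model. For odd df the chi-square upper tail has a closed form; the one coded here is for 7 df.

```python
import math

# Observed and expected #crashes by fitted-value group, from the table
observed = [37, 60, 72, 68, 51, 100, 88, 91, 94, 63]
expected = [46.05, 50.37, 71.24, 84.03, 52.35, 81.39, 78.19, 87.40, 90.81, 78.46]

pearson_r = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]
x2 = sum(r ** 2 for r in pearson_r)

J, p = 10, 2  # 10 groups; Drivers and Laps in the reduced model
df = J - p - 1
# Closed-form upper tail for 7 df:
# P(chi2_7 >= x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3 + x^2/15)
p_value = (math.erfc(math.sqrt(x2 / 2))
           + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2) * (1 + x2 / 3 + x2 ** 2 / 15))
print([round(r, 2) for r in pearson_r])
print(f"X2 = {x2:.4f}, df = {df}, P-value = {p_value:.4f}")
```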
Computational Approach
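Poisson Probability Mass Function: $P(Y_i = y_i) = \dfrac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots$
Systematic Component: $g(\mu_i) = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} = \mathbf{x}_i'\boldsymbol\beta$
Link Function: $g(\mu_i) = \log(\mu_i) \;\Rightarrow\; \mu_i = e^{\mathbf{x}_i'\boldsymbol\beta}$
For Subject i: $\mathbf{x}_i' = [1 \;\; X_{i1} \;\; X_{i2} \;\; X_{i3}]$, where $\boldsymbol\beta = [\beta_0 \;\; \beta_1 \;\; \beta_2 \;\; \beta_3]'$
Likelihood Function: $L(\boldsymbol\beta) = \prod_{i=1}^{n} \dfrac{e^{-\mu_i}\mu_i^{y_i}}{y_i!} = \dfrac{\exp\left(-\sum_{i=1}^{n}\mu_i\right)\prod_{i=1}^{n}\mu_i^{y_i}}{\prod_{i=1}^{n} y_i!}$
Log-Likelihood: $l = \ln L = -\sum_{i=1}^{n}\mu_i + \sum_{i=1}^{n} y_i\ln(\mu_i) - \sum_{i=1}^{n}\ln(y_i!) = \sum_{i=1}^{n}\left[y_i\,\mathbf{x}_i'\boldsymbol\beta - e^{\mathbf{x}_i'\boldsymbol\beta} - \ln(y_i!)\right]$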
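A direct translation of the log likelihood into Python, as a sketch. For the intercept-only model the ML estimate is $\hat\mu = \bar y$; using n = 151 races and 724 total crashes (the column totals of the Hosmer-Lemeshow table above), the log likelihood without the $-\sum \ln(y_i!)$ term comes out at about 410.878, essentially the null-model Log Likelihood of 410.8784 printed by GENMOD. That agreement suggests the GENMOD figure omits this constant, which does not depend on β.

```python
import math

def poisson_loglik(beta, X, y, include_constant=True):
    """l(beta) = sum_i [y_i * x_i'beta - exp(x_i'beta) - ln(y_i!)].
    Each row of X starts with 1 for the intercept."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * xij for b, xij in zip(beta, xi))  # linear predictor x_i'beta
        total += yi * eta - math.exp(eta)
        if include_constant:
            total -= math.lgamma(yi + 1)  # ln(y_i!) = lgamma(y_i + 1)
    return total

# Intercept-only model: beta0_hat = log(ybar), so the kernel (without ln y_i!)
# depends on the data only through n and the total count.
n, total_cautions = 151, 724
beta0_hat = math.log(total_cautions / n)
ll0_kernel = total_cautions * beta0_hat - n * math.exp(beta0_hat)
print(round(ll0_kernel, 4))
```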
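Computational Approach (continued)
Score: $\dfrac{\partial l}{\partial \boldsymbol\beta} = \sum_{i=1}^{n}\left(y_i - e^{\mathbf{x}_i'\boldsymbol\beta}\right)\mathbf{x}_i = \mathbf{X}'(\mathbf{Y} - \boldsymbol\mu) = \mathbf{g}(\boldsymbol\beta)$
Second Derivatives: $\dfrac{\partial^2 l}{\partial \boldsymbol\beta\,\partial \boldsymbol\beta'} = -\sum_{i=1}^{n} e^{\mathbf{x}_i'\boldsymbol\beta}\,\mathbf{x}_i\mathbf{x}_i' = -\mathbf{X}'\mathbf{W}\mathbf{X} = \mathbf{G}(\boldsymbol\beta)$, where $\mathbf{W} = \operatorname{diag}(\mu_1, \ldots, \mu_n)$
Setting $\mathbf{g}(\boldsymbol\beta) = \mathbf{X}'(\mathbf{Y} - \boldsymbol\mu) = \mathbf{0}$ leads to the ML estimate $\hat{\boldsymbol\beta}$ via the Newton-Raphson algorithm:
$\hat{\boldsymbol\beta}^{New} = \hat{\boldsymbol\beta}^{Old} - \left[\mathbf{G}(\hat{\boldsymbol\beta}^{Old})\right]^{-1}\mathbf{g}(\hat{\boldsymbol\beta}^{Old}) = \hat{\boldsymbol\beta}^{Old} + \left(\mathbf{X}'\hat{\mathbf{W}}\mathbf{X}\right)^{-1}\mathbf{X}'(\mathbf{Y} - \hat{\boldsymbol\mu})$, with a reasonable starting vector $\hat{\boldsymbol\beta}^{(0)}$
with large-sample approximate estimated variance-covariance matrix: $\hat V(\hat{\boldsymbol\beta}) = \left[-\mathbf{G}(\hat{\boldsymbol\beta})\right]^{-1} = \left(\mathbf{X}'\hat{\mathbf{W}}\mathbf{X}\right)^{-1}$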
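A minimal, dependency-free Python sketch of the Newton-Raphson iteration above, applied to hypothetical toy data (not the NASCAR races). It assumes the first column of X is the intercept and starts from the intercept-only fit, $\hat\beta_0 = \log\bar y$ with slopes 0, as one reasonable starting vector. Standard errors come from the diagonal of $(\mathbf{X}'\hat{\mathbf{W}}\mathbf{X})^{-1}$. In practice a routine such as PROC GENMOD fits the model; the sketch only illustrates the algebra.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented copy; A is not modified
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def score_and_info(beta, X, y):
    """Return mu, the score X'(Y - mu), and the information X'WX with W = diag(mu)."""
    p = len(beta)
    mu = [math.exp(sum(b * xj for b, xj in zip(beta, xi))) for xi in X]
    score = [sum(xi[j] * (yi - mi) for xi, yi, mi in zip(X, y, mu)) for j in range(p)]
    info = [[sum(xi[j] * xi[k] * mi for xi, mi in zip(X, mu)) for k in range(p)] for j in range(p)]
    return mu, score, info

def fit_poisson(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson: beta_new = beta_old + (X'WX)^{-1} X'(Y - mu).
    Assumes X[i][0] == 1 (intercept) and mean(y) > 0 for the starting vector."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # start: intercept-only fit
    for _ in range(max_iter):
        _, score, info = score_and_info(beta, X, y)
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    mu, _, info = score_and_info(beta, X, y)
    # Columns of (X'WX)^{-1}, found by solving against each unit vector
    cov = [solve(info, [1.0 if r == k else 0.0 for r in range(p)]) for k in range(p)]
    se = [math.sqrt(cov[j][j]) for j in range(p)]
    return beta, se, mu

# Hypothetical toy data (not the NASCAR races): intercept plus one covariate
X = [[1.0, x] for x in [0, 1, 2, 3, 4, 5]]
y = [1, 1, 2, 3, 5, 8]
beta_hat, se_hat, mu_hat = fit_poisson(X, y)
print(beta_hat, se_hat)
```

At convergence the score equations $\mathbf{X}'(\mathbf{Y} - \hat{\boldsymbol\mu}) = \mathbf{0}$ hold, so with an intercept the fitted means sum to the observed total.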