Poisson Regression
Caution Flags (Crashes) in NASCAR Winston Cup Races 1975-1979

L. Winner (2006). “NASCAR Winston Cup Race Results for 1975-2003,” Journal of Statistics Education, Vol. 14, #3, www.amstat.org/publications/jse/v14n3/datasets.winner.html



Data Description

• Units: NASCAR Winston Cup Races (1975-1979), n = 151 Races

• Dependent Variable: Y = # of Caution Flags/Crashes (CAUTIONS)

• Independent Variables:
– X1 = # of Drivers in Race (DRIVERS)
– X2 = Circumference of Track (TRKLENGTH)
– X3 = # of Laps in Race (LAPS)

Generalized Linear Model

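• Random Component: Poisson Distribution for # of Caution Flags

• Density Function: $P(Y = y \mid X_1, X_2, X_3) = \dfrac{e^{-\mu}\mu^{y}}{y!}, \quad y = 0, 1, 2, \ldots$

• Link Function: $g(\mu) = \log(\mu)$

• Systematic Component: $g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 \;\Rightarrow\; \mu(X_1, X_2, X_3) = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3}$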
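As a minimal sketch of how the three components fit together, the Python below inverts the log link to get the mean and then evaluates the Poisson probabilities. The function names are hypothetical. The coefficients are the full-model estimates reported in the SAS output for this example, and the predictor values are those of the first race listed in the SAS program (35 drivers, track length 2.54, 191 laps); they are used here only for illustration.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson random variable with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def mean_from_linear_predictor(beta, x):
    """Invert the log link: mu = exp(beta0 + beta1*x1 + beta2*x2 + ...)."""
    eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    return math.exp(eta)

# Illustrative inputs: full-model estimates and the first race in the data set
beta = [-0.7963, 0.0365, 0.1145, 0.0026]   # intercept, drivers, trklength, laps
x = [35, 2.54, 191]

mu = mean_from_linear_predictor(beta, x)
probs = [poisson_pmf(y, mu) for y in range(11)]  # P(Y = 0), ..., P(Y = 10)

print(f"estimated mean number of cautions: {mu:.3f}")
for y, p in enumerate(probs):
    print(f"P(Y = {y:2d}) = {p:.4f}")
```

The log link guarantees a positive mean for any value of the linear predictor, which is why it is the usual choice for count data.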

Testing For Overall Model

• H0: $\beta_1 = \beta_2 = \beta_3 = 0$ (# Cautions independent of all predictors)

• HA: Not all $\beta_j = 0$ (# Cautions associated with at least 1 predictor)

• Test Statistic: $X^2_{obs} = -2(L_0 - L_1)$

• Rejection Region: $X^2_{obs} \ge \chi^2_{\alpha,3}$

• P-Value: $P(\chi^2_3 \ge X^2_{obs})$

• Where: $L_0$ is the maximized log likelihood under model H0, and $L_1$ is the maximized log likelihood under model HA

NASCAR Caution Flag Example

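Model: $g(\mu) = \beta_0$

| Criterion | DF | Value | Value/DF |
|---|---|---|---|
| Deviance | 150 | 215.4915 | 1.4366 |
| Scaled Deviance | 150 | 215.4915 | 1.4366 |
| Pearson Chi-Square | 150 | 201.6050 | 1.3440 |
| Scaled Pearson X2 | 150 | 201.6050 | 1.3440 |
| Log Likelihood | | 410.8784 | |

Model: $g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$

| Criterion | DF | Value | Value/DF |
|---|---|---|---|
| Deviance | 147 | 171.2162 | 1.1647 |
| Scaled Deviance | 147 | 171.2162 | 1.1647 |
| Pearson Chi-Square | 147 | 158.8281 | 1.0805 |
| Scaled Pearson X2 | 147 | 158.8281 | 1.0805 |
| Log Likelihood | | 433.0160 | |

Test Statistic: $X^2_{obs} = -2(L_0 - L_1) = -2(410.8784 - 433.0160) = 44.2752$

Rejection Region ($\alpha = 0.05$): $X^2_{obs} \ge \chi^2_{.05,3} = 7.815$

P-value: $P(\chi^2_3 \ge 44.2752) \approx 0$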

Statistical output obtained from SAS PROC GENMOD
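The likelihood-ratio arithmetic above can be checked with a short Python sketch. The two log-likelihood values and the 7.815 critical value come from this slide; the helper `chi2_sf_df3` is a hypothetical name for the closed-form upper-tail probability of a chi-square variable with 3 degrees of freedom, written out so that no statistics library is needed.

```python
import math

def chi2_sf_df3(x):
    """P(chi-square with 3 df >= x), using the closed form for 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

loglik_null = 410.8784   # L0: intercept-only model
loglik_full = 433.0160   # L1: Drivers, TrkLength, Laps

x2_obs = -2 * (loglik_null - loglik_full)   # likelihood-ratio statistic
p_value = chi2_sf_df3(x2_obs)
reject_h0 = x2_obs >= 7.815                  # critical value for alpha = 0.05, 3 df

print(f"X2_obs = {x2_obs:.4f}, P-value = {p_value:.2e}, reject H0: {reject_h0}")
```

The degrees of freedom equal the number of coefficients set to zero under H0, here 3.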

Testing for Individual (Partial) Regression Coefficients

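• Hypotheses: $H_0: \beta_j = 0 \qquad H_A: \beta_j \ne 0$

• Test Statistic: $z_{obs} = \dfrac{\hat{\beta}_j}{SE(\hat{\beta}_j)}$, with P-value: $2P(Z \ge |z_{obs}|)$

• Equivalently, Test Statistic: $X^2_{obs} = \left(\dfrac{\hat{\beta}_j}{SE(\hat{\beta}_j)}\right)^2$, with P-value: $P(\chi^2_1 \ge X^2_{obs})$

• 1-Sided Tests: Confirm the sign of $\hat{\beta}_j$ is correct, then "cut" the P-value in half.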
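A minimal Python sketch of the test for a single coefficient follows. The function name `wald_test` is hypothetical; the estimate and standard error passed in are the TrkLength values from the SAS output for this example. Because the inputs are rounded to four decimals, the results can differ slightly from the printed SAS values.

```python
import math

def wald_test(estimate, std_error):
    """Return (z, chi-square, two-sided P-value) for H0: beta_j = 0."""
    z = estimate / std_error
    chi_square = z ** 2
    # 2*P(Z >= |z|) for a standard normal equals P(chi-square_1 >= z^2)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, chi_square, p_two_sided

z, chi_square, p_value = wald_test(0.1145, 0.1684)   # TrkLength row
print(f"z = {z:.3f}, X2 = {chi_square:.2f}, P-value = {p_value:.4f}")

# 1-sided test of H_A: beta_j > 0, valid only when the estimate is positive
p_one_sided = p_value / 2 if z > 0 else 1 - p_value / 2
```

The z form and the chi-square form always give the same two-sided P-value, so software that reports one (SAS reports the chi-square) carries the same information as the other.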

NASCAR Caution Flag Example

| Parameter | DF | Estimate | Std Error | Chi-Square | Pr>ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | -0.7963 | 0.4117 | 3.74 | 0.0531 |
| Drivers | 1 | 0.0365 | 0.0125 | 8.55 | 0.0035 |
| TrkLength | 1 | 0.1145 | 0.1684 | 0.46 | 0.4966 |
| Laps | 1 | 0.0026 | 0.0008 | 10.82 | 0.0010 |

Conclude the following:

• Controlling for Track Length and Laps, as Drivers increases, Cautions increase

• Controlling for Drivers and Laps, there is no significant association between Cautions and Track Length

• Controlling for Drivers and Track Length, as Laps increases, Cautions increase

Reduced Model: log(expected Crashes) = -0.6876 + 0.0428*Drivers + 0.0021*Laps
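To make the reduced model concrete, the sketch below turns its coefficients into an expected number of cautions and into multiplicative effects. The function name is hypothetical, and the race used (35 drivers, 191 laps) is the first observation shown in the SAS program, chosen only as an example.

```python
import math

B0, B_DRIVERS, B_LAPS = -0.6876, 0.0428, 0.0021   # reduced-model estimates

def expected_cautions(drivers, laps):
    """Fitted mean under the reduced model: exp of the linear predictor."""
    return math.exp(B0 + B_DRIVERS * drivers + B_LAPS * laps)

mu = expected_cautions(35, 191)
per_driver_ratio = math.exp(B_DRIVERS)       # effect of one more driver
per_100_laps_ratio = math.exp(100 * B_LAPS)  # effect of 100 more laps

print(f"expected cautions: {mu:.2f}")
print(f"rate ratio per extra driver: {per_driver_ratio:.4f}")
print(f"rate ratio per 100 extra laps: {per_100_laps_ratio:.4f}")
```

Because the model is linear on the log scale, each coefficient acts multiplicatively on the mean: one additional driver multiplies the expected number of cautions by exp(0.0428), about 1.044, whatever the number of laps.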

Testing Model Goodness-of-Fit

• Two Common Measures of Goodness of Fit:
– Pearson’s Chi-Square
– Deviance

• Both measures have approximate Chi-Square Distributions under the hypothesis that the current model is appropriate for fixed number of combinations of independent variables and large counts

Pearson's Chi-Square: $X^2 = \sum_{i=1}^{n} \dfrac{(y_i - \hat{\mu}_i)^2}{\hat{V}(\hat{\mu}_i)}$ where $\hat{V}(\hat{\mu}_i) = \hat{\mu}_i$ for Poisson Distribution

Deviance: $G^2 = 2\sum_{i=1}^{n} y_i \log\left(\dfrac{y_i}{\hat{\mu}_i}\right)$
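The two statistics can be sketched in a few lines of Python. The function names and the three-observation data set are hypothetical and exist only to show the arithmetic. The deviance is coded in the short form given on this slide, which assumes the fitted means sum to the observed counts (as they do when the model has an intercept); a term with $y_i = 0$ contributes zero.

```python
import math

def pearson_chi_square(y, mu):
    """Sum of (y - mu)^2 / V(mu), with V(mu) = mu for the Poisson."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

def deviance(y, mu):
    """2 * sum of y*log(y/mu); terms with y = 0 contribute 0."""
    return 2 * sum(yi * math.log(yi / mi) for yi, mi in zip(y, mu) if yi > 0)

# Hypothetical observed counts and fitted means (both sum to 12)
y = [2, 4, 6]
mu = [3.0, 4.0, 5.0]

x2 = pearson_chi_square(y, mu)
g2 = deviance(y, mu)
print(f"Pearson X2 = {x2:.4f}, Deviance G2 = {g2:.4f}")
```

Both statistics are zero when every fitted mean equals its observed count and grow as the fit worsens, which is why large values relative to the degrees of freedom signal lack of fit.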

NASCAR Caution Flags Example

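Null Model

| Criterion | DF | Value | Value/DF | P-Value |
|---|---|---|---|---|
| Pearson X2 | 150 | 201.6050 | 1.3440 | 0.0032 |
| Deviance | 150 | 215.4915 | 1.4366 | 0.0004 |

Full Model

| Criterion | DF | Value | Value/DF | P-Value |
|---|---|---|---|---|
| Pearson X2 | 147 | 158.8281 | 1.0805 | 0.2386 |
| Deviance | 147 | 171.2162 | 1.1647 | 0.0838 |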

Note that the null model clearly does not fit well. For the full model, we fail to reject the null hypothesis that the model is appropriate. However, there are many distinct combinations of Laps, Track Length, and Drivers, so the chi-square approximation for these statistics is questionable here.
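The P-values for these goodness-of-fit statistics are upper-tail chi-square probabilities. As a rough check, the sketch below evaluates them with a hypothetical helper `chi2_sf`, which uses the finite-series expressions that hold for integer degrees of freedom so that only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with integer df >= x), via the finite series for integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1*3*...*(2r-1))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

fits = {
    "Null Pearson X2": (201.6050, 150),
    "Null Deviance": (215.4915, 150),
    "Full Pearson X2": (158.8281, 147),
    "Full Deviance": (171.2162, 147),
}
p_values = {name: chi2_sf(value, df) for name, (value, df) in fits.items()}
for name, p in p_values.items():
    print(f"{name}: P-value = {p:.4f}")
```

The degrees of freedom are the number of races minus the number of estimated coefficients: 151 - 1 = 150 for the null model and 151 - 4 = 147 for the full model.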

SAS Program

options ps=54 ls=76;

data one;
input serrace 6-8 year 13-16 searace 23-24 drivers 31-32 trklength 34-40 laps 46-48 road 56 cautions 63-64 leadchng 71-72;
cards;
       1    1975       1      35    2.54     191       1       5      13
...
     151    1979      31      37     2.5     200       0       6      35
;
run;

/* Data set one contains the data for analysis. Variable names and
   column specs are given in the INPUT statement. I have included only
   the first and last observations. */

/* The following model fits a Generalized Linear Model
   with Poisson random component and a constant mean:
   g(mu) = alpha is the systematic component,
   g(mu) = log(mu) is the link function: mu = e**alpha */
proc genmod;
  model Cautions = / dist=poi link=log;
run;

/* The following model fits a Generalized Linear Model
   with Poisson random component:
   g(mu) = alpha + beta1*drivers + beta2*trklength + beta3*laps
   is the systematic component,
   g(mu) = log(mu) is the link function:
   mu = e**(alpha + beta1*drivers + beta2*trklength + beta3*laps) */
proc genmod;
  model Cautions = drivers trklength laps / dist=poi link=log;
run;
quit;

SPSS Output

Hosmer-Lemeshow Test

• Used when there are “many” distinct levels of explanatory variables

• Based on “lumping” together cases based on their predicted values into J (often 10 is used) groups

• Compares observed and expected counts by group based on Deviance and Pearson residuals. For Poisson model (where obs is observed, exp is expected):
– Pearson: $r_i = \dfrac{obs_i - exp_i}{\sqrt{exp_i}}$, $\quad X^2 = \sum_i r_i^2$
– Deviance: $d_i = \sqrt{obs_i \log(obs_i / exp_i)}$, $\quad G^2 = 2\sum_i d_i^2$
– Degrees of Freedom: $J - p - 1$ where p = # Predictor Variables
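The "lumping" step can be sketched as follows. This is only one possible grouping rule, assumed here for illustration: cases are sorted by fitted value and split into J groups of nearly equal size, whereas an analyst may instead choose cut points on the fitted values. The function name and the six-case data set are hypothetical.

```python
import math

def group_by_fitted(y, mu, n_groups):
    """Sort cases by fitted value, split into n_groups, and return
    (observed total, expected total) for each group."""
    order = sorted(range(len(y)), key=lambda i: mu[i])
    groups = []
    for g in range(n_groups):
        lo = g * len(order) // n_groups
        hi = (g + 1) * len(order) // n_groups
        idx = order[lo:hi]
        groups.append((sum(y[i] for i in idx), sum(mu[i] for i in idx)))
    return groups

# Hypothetical observed counts and fitted means for six cases
y = [3, 1, 4, 2, 6, 5]
mu = [2.5, 1.5, 3.5, 2.0, 5.5, 5.0]

groups = group_by_fitted(y, mu, n_groups=3)
pearson_residuals = [(obs - exp) / math.sqrt(exp) for obs, exp in groups]
x2 = sum(r ** 2 for r in pearson_residuals)
print(groups, [round(r, 3) for r in pearson_residuals], round(x2, 4))
```

Grouping pools many sparse covariate patterns into a few cells with larger expected counts, which is the situation in which the chi-square approximation is more trustworthy.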

NASCAR Caution Flags Example

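$\hat{\mu}_i = e^{-0.6876 + 0.0428 D_i + 0.0021 L_i}$

| Group | Fitted | #Races | #Crashes | Expected | Pearson |
|---|---|---|---|---|---|
| 1 | <3.50 | 15 | 37 | 46.05 | -1.33 |
| 2 | 3.50-3.80 | 14 | 60 | 50.37 | 1.36 |
| 3 | 3.80-4.08 | 18 | 72 | 71.24 | 0.09 |
| 4 | 4.08-4.25 | 20 | 68 | 84.03 | -1.75 |
| 5 | 4.25-4.42 | 12 | 51 | 52.35 | -0.19 |
| 6 | 4.42-5.15 | 17 | 100 | 81.39 | 2.06 |
| 7 | 5.15-5.50 | 15 | 88 | 78.19 | 1.11 |
| 8 | 5.50-6.25 | 15 | 91 | 87.40 | 0.38 |
| 9 | 6.25-6.70 | 14 | 94 | 90.81 | 0.33 |
| 10 | >6.70 | 11 | 63 | 78.46 | -1.75 |

Pearson X2: 15.5119
P-value: 0.0300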

Note that there is evidence that the Poisson model does not provide a good fit
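The grouped test can be reproduced from the observed and expected counts in the table. The sketch below is a check under the assumption that the degrees of freedom are J - p - 1 = 10 - 2 - 1 = 7 (ten groups, two predictors in the reduced model); `chi2_sf_df7` is a hypothetical helper giving the closed-form upper-tail probability for 7 degrees of freedom.

```python
import math

observed = [37, 60, 72, 68, 51, 100, 88, 91, 94, 63]
expected = [46.05, 50.37, 71.24, 84.03, 52.35, 81.39, 78.19, 87.40, 90.81, 78.46]

def chi2_sf_df7(x):
    """P(chi-square with 7 df >= x), closed form for 7 df."""
    series = math.sqrt(x) * (1 + x / 3 + x ** 2 / 15)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * series

residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]
x2 = sum(r ** 2 for r in residuals)
df = 10 - 2 - 1            # J - p - 1
p_value = chi2_sf_df7(x2)

print([round(r, 2) for r in residuals])
print(f"Pearson X2 = {x2:.4f} on {df} df, P-value = {p_value:.4f}")
```

The result agrees with the tabled statistic up to rounding of the expected counts. The largest contributions come from groups 4, 6, and 10, where the residuals are near or beyond 2 in absolute value.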

Computational Approach

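Poisson Probability Mass Function: $P(Y = y) = \dfrac{e^{-\mu}\mu^{y}}{y!}, \quad y = 0, 1, 2, \ldots$

Systematic Component: $g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$

Link Function: $g(\mu) = \log(\mu) \;\Rightarrow\; \mu = e^{g(\mu)} = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3}$

For Subject i: $g(\mu_i) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} = \mathbf{x}_i'\boldsymbol{\beta} \;\Rightarrow\; \mu_i = e^{\mathbf{x}_i'\boldsymbol{\beta}}$

where: $\mathbf{x}_i = \begin{bmatrix} 1 \\ X_{i1} \\ X_{i2} \\ X_{i3} \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} \mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_n' \end{bmatrix}, \quad \mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}$

Likelihood Function: $L(\boldsymbol{\beta}; y_1, \ldots, y_n) = \prod_{i=1}^{n} \dfrac{e^{-\mu_i}\mu_i^{y_i}}{y_i!} = \prod_{i=1}^{n} \dfrac{\exp\left(-e^{\mathbf{x}_i'\boldsymbol{\beta}}\right)\left(e^{\mathbf{x}_i'\boldsymbol{\beta}}\right)^{y_i}}{y_i!}$

$l = \ln(L) = -\sum_{i=1}^{n} e^{\mathbf{x}_i'\boldsymbol{\beta}} + \sum_{i=1}^{n} y_i \mathbf{x}_i'\boldsymbol{\beta} - \sum_{i=1}^{n} \ln(y_i!)$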
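A minimal Python sketch of the Poisson log-likelihood with a log link follows. The function name, the `include_constant` flag, and the three-row toy data are hypothetical. The last lines are a consistency check under the assumption that the SAS "Log Likelihood" omits the constant term, the sum of ln(y_i!): for the intercept-only model the fitted mean is the sample mean, and the total of 724 cautions is the sum of the #Crashes column in the grouped table.

```python
import math

def poisson_loglik(beta, X, y, include_constant=True):
    """Log-likelihood: sum of -exp(x_i'beta) + y_i * x_i'beta - ln(y_i!)."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * xij for b, xij in zip(beta, xi))   # x_i' beta
        total += -math.exp(eta) + yi * eta
        if include_constant:
            total -= math.lgamma(yi + 1)                 # ln(y_i!)
    return total

# Hypothetical toy data: intercept column plus one predictor
X = [[1, 0], [1, 1], [1, 2]]
y = [1, 2, 4]
l_at_zero = poisson_loglik([0.0, 0.0], X, y)   # equals -3 - ln(1! * 2! * 4!)

# Intercept-only model without the constant: sum(y)*ln(ybar) - n*ybar
n_races, total_cautions = 151, 724
null_kernel = total_cautions * math.log(total_cautions / n_races) - total_cautions
print(f"l(0, 0) = {l_at_zero:.4f}, null-model kernel = {null_kernel:.4f}")
```

The null-model value comes out close to the 410.8784 reported for the intercept-only model, which suggests the constant is dropped there. The constant does not depend on the coefficients, so it cancels in $L_0 - L_1$ and does not affect the likelihood-ratio test.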

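Computational Approach

$\dfrac{\partial l}{\partial \boldsymbol{\beta}} = -\sum_{i=1}^{n} \mathbf{x}_i e^{\mathbf{x}_i'\boldsymbol{\beta}} + \sum_{i=1}^{n} y_i \mathbf{x}_i = \sum_{i=1}^{n} \left(y_i - e^{\mathbf{x}_i'\boldsymbol{\beta}}\right)\begin{bmatrix} 1 \\ X_{i1} \\ X_{i2} \\ X_{i3} \end{bmatrix} = \mathbf{X}'(\mathbf{Y} - \boldsymbol{\mu})$

Setting $\dfrac{\partial l}{\partial \boldsymbol{\beta}} = \mathbf{0} \;\Rightarrow\; \mathbf{X}'(\mathbf{Y} - \boldsymbol{\mu}) = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$

$\dfrac{\partial^2 l}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'} = -\sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i' e^{\mathbf{x}_i'\boldsymbol{\beta}} = -\mathbf{X}'\mathbf{W}\mathbf{X}$ where $\mathbf{W} = \mathrm{diag}(\mu_i) = \mathrm{diag}\left(e^{\mathbf{x}_i'\boldsymbol{\beta}}\right)$

Setting: $\mathbf{G}_{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{W}\mathbf{X}$ and $\mathbf{g}_{\boldsymbol{\beta}} = \mathbf{X}'(\mathbf{Y} - \boldsymbol{\mu})$ leads to the estimate of $\boldsymbol{\beta}$ via the Newton-Raphson algorithm:

$\hat{\boldsymbol{\beta}}^{New} = \hat{\boldsymbol{\beta}}^{Old} + \left[\mathbf{G}_{\hat{\boldsymbol{\beta}}^{Old}}\right]^{-1} \mathbf{g}_{\hat{\boldsymbol{\beta}}^{Old}}$

with a reasonable starting vector of $\hat{\boldsymbol{\beta}} = \begin{bmatrix} \ln(\bar{y}) \\ 0 \\ 0 \\ 0 \end{bmatrix}$

with approximate large-sample estimated variance-covariance matrix: $\hat{V}(\hat{\boldsymbol{\beta}}) = \mathbf{G}_{\hat{\boldsymbol{\beta}}}^{-1} = \left(\mathbf{X}'\hat{\mathbf{W}}\mathbf{X}\right)^{-1}$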
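The Newton-Raphson iteration for Poisson regression can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the routine SAS uses: the function names, the convergence tolerance, the iteration cap, and the five-point toy data set are all hypothetical, and the first column of the design matrix is assumed to be the intercept. The score is X'(y - mu) and the information matrix is X'WX with W = diag(mu).

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def score_and_information(beta, X, y):
    """Return g = X'(y - mu) and G = X'WX, where mu_i = exp(x_i'beta)."""
    p = len(beta)
    mu = [math.exp(sum(b * xij for b, xij in zip(beta, xi))) for xi in X]
    g = [sum(xi[j] * (yi - mi) for xi, yi, mi in zip(X, y, mu)) for j in range(p)]
    G = [[sum(xi[j] * xi[k] * mi for xi, mi in zip(X, mu)) for k in range(p)]
         for j in range(p)]
    return g, G

def fit_poisson(X, y, max_iter=25, tol=1e-10):
    """Newton-Raphson: beta_new = beta_old + G^{-1} g."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)   # start at [ln(ybar), 0, ...]
    for _ in range(max_iter):
        g, G = score_and_information(beta, X, y)
        step = solve(G, g)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Standard errors: square roots of the diagonal of G^{-1} at the estimate
    _, G = score_and_information(beta, X, y)
    identity = [[1.0 if j == k else 0.0 for k in range(p)] for j in range(p)]
    se = [math.sqrt(solve(G, identity[j])[j]) for j in range(p)]
    return beta, se

# Hypothetical toy data: intercept column plus one predictor
X = [[1, 0], [1, 1], [1, 2], [1, 3], [1, 4]]
y = [1, 2, 4, 7, 12]
beta_hat, se_hat = fit_poisson(X, y)
print([round(b, 4) for b in beta_hat], [round(s, 4) for s in se_hat])
```

At convergence the score vector is essentially zero, so the fitted means sum to the observed total whenever an intercept is included. Dividing each estimate by its standard error gives the z statistics used for the tests of individual coefficients.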