
Paper No.: 16-2432

Investigating the Influence of Dependence between Variables on Crash Modification Factors Developed using Regression Models

By

Lingtao Wu*

Ph.D. Candidate, Zachry Department of Civil Engineering

Texas A&M University, 3136 TAMU

College Station, Texas 77843-3136

Phone: 979-587-3518, fax: 979-845-6481

Email: [email protected]

and

Dominique Lord, Ph.D.

Professor

Zachry Department of Civil Engineering

Texas A&M University, 3136 TAMU

College Station, Texas 77843-3136

Phone: 979-458-3949, fax: 979-845-6481

Email: [email protected]

Word count: 6,620 Words (4,870 Text + 5 Tables + 2 Figures)

November 15, 2015

*Corresponding author

Wu and Lord 2

ABSTRACT

Cross-sectional studies (particularly regression models) are commonly used for developing crash modification factors (CMFs), especially when data for before-after studies are not available. A major assumption of the frequently used regression models is that variables are independent of each other. However, some studies have shown that this assumption may not always hold, and the quality of CMFs may be affected under such circumstances. This study examined the accuracy of CMFs estimated using regression models when some variables influence crashes dependently, and quantified the bias as a function of the dependence. An adjustment factor (AF) was introduced to capture the degree of dependence. Using an approach similar to that proposed in the authors’ previous work, various fixed CMFs and AFs were assumed for two variables. Crash counts were randomly generated, CMFs were estimated using regression models, and their quality was evaluated. The main findings are summarized as follows: (1) the commonly used regression models can produce biased CMFs if the considered variables are not independent; (2) the bias is strongly related to the AF (i.e., the degree of dependence between variables), and stronger dependence leads to larger bias; and (3) while the coefficients for the variables of interest are over- or underestimated, other coefficients may be under- or overestimated to compensate for the biased estimates. Safety analysts are advised to examine the independence of variables when estimating multiple CMFs using regression models.


INTRODUCTION

To address specific safety concerns, it is common for agencies to implement multiple countermeasures simultaneously at a given problematic entity (e.g., a hotspot). The safety effectiveness of these treatments is usually evaluated after the improvements have been in place for a certain period. Before-after studies are frequently used to estimate the effectiveness of treatments, mostly quantified in the form of crash modification factors (CMFs). In general, a before-after study can only estimate the combined CMF of multiple treatments, even though a few attempts have been made to separate them using adjusted before-after methods (1). Alternatively, safety analysts have utilized cross-sectional studies, particularly regression models (or safety performance functions, SPFs), to develop CMFs for individual treatments (2).

Although SPFs have been widely used for developing CMFs, some researchers have pointed out that those CMFs may not be reliable because the cause-and-effect relationship between crashes and variables cannot be properly captured in SPFs under some conditions (3-9). For example, a critical assumption within the commonly used generalized linear models (GLMs) is that the variables considered influence crashes independently; that is, the safety effect of one treatment is not influenced by other measures implemented simultaneously. Under this assumption, the combined CMF of multiple treatments is calculated as the product of the CMFs for individual treatments, as documented in the Highway Safety Manual (HSM) (10). However, this assumption may be invalid in practice. Simultaneously implemented treatments might overlap each other in reducing crashes, especially when the target crashes are the same. The HSM also cautions users that multiplying individual CMFs may over- or underestimate the actual CMF for a combined improvement. Researchers have proposed alternative methods for calculating combined CMFs from individual ones to address this problem (11; 12), and estimating the combined effects of safety countermeasures has become an emerging topic in recent years (13-15). Most of these studies confirmed that the combined CMF for multiple treatments does not equal the simple product of the single CMFs, indicating that these countermeasures are not independent. Under this condition, the individual CMFs derived from regression models may be biased. Considering the large number of CMFs in the HSM and the CMF Clearinghouse that were derived from regression models, it is necessary to examine the accuracy of CMFs developed using regression models when simultaneously implemented treatments do not influence crashes independently.

This study expands on recent work (i.e., (16)) performed on the validation of CMFs derived from cross-sectional studies using regression models, focusing on conditions in which variables are not independent. The primary objectives are to examine the accuracy of CMFs derived from SPFs when variables overlap in reducing crashes and to quantify the bias.

BACKGROUND

A number of CMFs for various single treatments on roadway segments and at intersections are provided in the HSM. No CMFs for combined treatments are available in the current version. However, it is common in practice for multiple countermeasures to be implemented simultaneously at a site to reduce the number and severity of collisions. The recommended approach (HSM method) for calculating the combined CMF for multiple treatments is to multiply the CMFs for the individual elements or treatments together, as shown in Equation 1. Very few combined safety effects have been reported in the CMF Clearinghouse (17).

$$CMF_{comb} = CMF_{X_1} \times CMF_{X_2} \times \cdots \times CMF_{X_n} \qquad (1)$$

Where,

$CMF_{comb}$ = the combined CMF for n elements or treatments ($X_1, X_2, \ldots, X_n$); and,

$CMF_{X_i}$ = the specific CMF for element or treatment $X_i$.

The main concept of this approach is that the simultaneously implemented treatments are considered independent: the safety effects of the various countermeasures do not overlap when they are implemented at the same time. But this is not always true, especially when the target crashes of these countermeasures are the same. In such cases, the expected reduction in the number of crashes will usually be lower than the sum of the reductions from the individual treatments, and the product of the individual CMFs will underestimate the true combined CMF (i.e., safety benefits are overestimated) (18-20). To address this problem, researchers have proposed several alternatives for estimating the combined effects of multiple treatments, e.g., reducing the safety effects of less effective treatments, applying only the most effective CMF, multiplying by a weighted factor (Turner method), and taking a weighted average of multiple CMFs (also known as the meta-analysis method). Readers are referred to (11), (12) and (21) for more details on these methods. A common concept within these approaches is that simultaneously implemented treatments usually have overlapping safety effects.

Park et al. (13) estimated CMFs for two single treatments (installing shoulder rumble strips, and widening shoulder width) and the combined CMF for implementing the two simultaneously on rural multilane highways. The results confirmed that the combined CMFs, in general, did not equal the product of the two single CMFs. The researchers further calculated CMFs for multiple treatments using various combining methods and compared them with those estimated using real data. It was found that each method was suited to different crash types and injury levels.

Park and Abdel-Aty (22) later developed adjustment functions for combined CMFs. An adjustment factor (AF) or adjustment function (A-Function) was introduced to assess the combined safety effects of two treatments (installing shoulder rumble strips, and widening shoulder width) on rural two-lane highways. An AF higher than 1.0 indicated that the combined crash reduction would be lower than the sum of the reductions from the individual treatments, and vice versa if it was less than 1.0. In particular, when the AF equaled 1.0, the treatments were independent of each other. The AF (or A-Function) used in the study is shown in Equation 2.

$$CMF_{comb} = CMF_{X_1} \times CMF_{X_2} \times \cdots \times CMF_{X_n} \times AF \qquad (2)$$

Where,

$AF$ = the adjustment factor for treatments $X_1, X_2, \ldots, X_n$, $AF > 0$.
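As a minimal sketch of Equations 1 and 2, the HSM product and its AF-adjusted counterpart can be computed as follows. The function name `combined_cmf` and the example values are illustrative assumptions, not taken from the cited studies.

```python
from math import prod


def combined_cmf(cmfs, af=1.0):
    """Combined CMF per Equation 2; af=1.0 reduces to the HSM product (Equation 1)."""
    if af <= 0:
        raise ValueError("AF must be positive")
    return prod(cmfs) * af


# Illustrative values only: two treatments with CMFs of 0.80 and 0.85.
hsm = combined_cmf([0.80, 0.85])            # independence assumed (Equation 1)
overlap = combined_cmf([0.80, 0.85], 1.10)  # AF > 1: the treatments overlap
print(round(hsm, 3), round(overlap, 3))
```

With an AF above 1.0 the adjusted combined CMF is larger than the plain product, i.e., the combined treatment removes fewer crashes than the independence assumption suggests.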

Three nonlinear A-Functions for the combined CMFs were developed considering different crash types and severities. All of them were higher than 1.0, which indicated that the combined CMFs calculated using the HSM method were underestimated. The amount of underestimation varied with crash type and severity. In addition, the AFs were not constant but varied as the original shoulder width changed. That means the level of dependence between the two treatments was not identical under all conditions.

More recently, Park and Abdel-Aty (15) assessed the safety effects of multiple roadside treatments (i.e., poles, trees, etc.) using GLMs, generalized nonlinear models (GNMs), and the multivariate adaptive regression splines (MARS) model. The MARS model can capture both nonlinear relationships and interaction effects between variables. The results generally showed that GNMs provided a slightly better fit than GLMs, and that the MARS model outperformed the other two. The results from the MARS model showed that the HSM combining method overestimated the combined safety effects of roadside treatments by about eight to ten percent. The authors recommended that the MARS model be considered when assessing the safety effects of multiple treatments.

Although only a few studies have estimated the combined effects of multiple safety treatments (13; 15; 22-25), they have shown that some treatments or highway characteristics do influence crashes dependently. Under such conditions, the independence assumption of commonly used regression models cannot be met, which might reduce the quality of CMFs. No matter which CMF combination method is used, reliable individual CMFs are critical for estimating the safety effects of both combined and single treatments. So, it is necessary to examine the accuracy of individual CMFs derived from regression models considering the dependence of variables.

METHODOLOGY

To examine the accuracy of CMFs derived from SPFs when variables influence crash risk dependently, this study used simulated data. An approach similar to the one used in the authors’ previous work (i.e., (16)) was adopted here, but was modified to fit the specific characteristics of this study. The approach was first proposed by Hauer (3). The main steps are: (1) assume CMFs and the dependence for variables; (2) generate random crash counts; and (3) estimate CMFs using SPFs and compare them with the assumed true values. The detailed characteristics of the simulation procedure are not described in this paper due to space limitations. Interested readers are referred to (16).

The major difference in this study was the use of adjustment factors. An adjustment factor was assumed to capture the combination effect of multiple treatments. This was similar to the method used in the recent study by Park and Abdel-Aty (22). The combined CMF for multiple treatments is calculated by Equation 3.

$$CMF_{comb} = CMF_{X_1} \times \cdots \times CMF_{X_n} \times AF^{\,I_{X_1 \neq X_{1base}}(X_1) \times \cdots \times I_{X_n \neq X_{nbase}}(X_n)} \qquad (3)$$

Where,

$CMF_{comb}$ = the combined CMF for a segment;

$CMF_{X_j}$ = the assumed specific CMF for variable $X_j$ of the segment;

$AF$ = assumed adjustment factor for variables $X_1, X_2, \ldots, X_n$;

$X_{jbase}$ = the baseline condition for variable $X_j$; and,

$I_{X_j \neq X_{jbase}}(X_j)$ = indicator function for variable $X_j$. It equals zero if variable $X_j$ of the segment is equal to the baseline condition, otherwise 1.0.

The indicator functions switch the adjustment factor on or off depending on the specific conditions of the segment and the presumed dependence relationships between variables.

To simplify the analysis, only two variables, lane width and shoulder width, were considered in this study. Each variable in the dataset was assigned one of two values: the baseline or the improved condition. Lane width was either 12 ft (baseline) or 13 ft (wider lane), and shoulder width was either 6 ft (baseline) or 7 ft (wider shoulder). This way, the segments could be classified into four categories: (1) baseline; (2) wider lane; (3) wider shoulder; and (4) wider lane and wider shoulder. They are described in TABLE 1.

The CMF for lane width was assumed to be CMF_LW with the baseline equal to 12 ft. So, the specific CMFs for lane widths of 12 ft and 13 ft were 1.0 and CMF_LW, respectively. Similarly, the CMF for shoulder width was assumed to be CMF_SW with the baseline equal to 6 ft. The specific CMFs for shoulder widths of 6 ft and 7 ft were 1.0 and CMF_SW, respectively. We assumed that neither CMF_LW nor CMF_SW equaled 1.0. Furthermore, the adjustment factor was used to capture the dependence of the safety effects of the two variables. That is to say, if a segment had both a wider lane and a wider shoulder, the combined CMF was multiplied by the adjustment factor. The CMFs for lane width, shoulder width and the combined CMFs for the four groups of segments are shown in the last three columns of TABLE 1.

TABLE 1 Summary of Four Groups of Segments

| Group | LW (ft) | SW (ft) | CMF for LW | CMF for SW | Combined CMF |
|---|---|---|---|---|---|
| Baseline | 12 | 6 | 1.0 | 1.0 | 1.0 |
| Wider Lane | 13 | 6 | CMF_LW | 1.0 | CMF_LW |
| Wider Shoulder | 12 | 7 | 1.0 | CMF_SW | CMF_SW |
| Wider Lane and Wider Shoulder | 13 | 7 | CMF_LW | CMF_SW | CMF_LW × CMF_SW × AF |

Note: LW - lane width; SW - shoulder width.
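The combined CMFs in the last column of TABLE 1 follow directly from Equation 3. The sketch below reproduces them for one set of assumed values (CMF_LW = 0.8, CMF_SW = 0.85, AF = 0.80); the function name `combined_cmf_segment` is hypothetical, and raising each CMF to its indicator is simply an equivalent way of writing "1.0 at the baseline, the assumed CMF otherwise".

```python
def combined_cmf_segment(values, baselines, cmfs, af):
    """Equation 3 for one segment.

    Each CMF applies only when its variable differs from the baseline, and
    AF applies only when every variable differs (product of indicators = 1).
    """
    indicators = [0 if v == b else 1 for v, b in zip(values, baselines)]
    result = 1.0
    all_changed = 1
    for ind, cmf in zip(indicators, cmfs):
        result *= cmf ** ind
        all_changed *= ind
    return result * af ** all_changed


BASELINES = (12, 6)   # lane width and shoulder width baselines (ft)
CMFS = (0.8, 0.85)    # assumed CMF_LW and CMF_SW
AF = 0.80             # assumed adjustment factor

groups = {
    "Baseline": (12, 6),
    "Wider Lane": (13, 6),
    "Wider Shoulder": (12, 7),
    "Wider Lane and Wider Shoulder": (13, 7),
}
table1 = {name: combined_cmf_segment(v, BASELINES, CMFS, AF) for name, v in groups.items()}
for name, cmf in table1.items():
    print(f"{name}: {cmf:.3f}")
```

Only the last group is affected by the AF, which is why the dependence cannot be represented by the two main effects alone.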

Specifically, the assumed CMF for lane width (i.e., CMF_LW) was either 0.8 or 0.9, and that for shoulder width (i.e., CMF_SW) was either 0.85 or 0.9. The adjustment factor took the values 0.80, 0.90, 0.95, 1.05, 1.10 and 1.20. An adjustment factor of less than 1.0 means that widening both the lane and the shoulder simultaneously brings more safety benefit than the “sum” of the two single treatments; the smaller the adjustment factor, the greater the benefit. In contrast, if it is more than 1.0, applying the two treatments simultaneously has a lower effect than their “sum”; the higher the adjustment factor, the lower the combined safety effect. In total, there were 24 scenarios in this study, shown in TABLE 2. The inverse dispersion parameter (Phi) was set to 0.5, 1.0 and 2.0 in each scenario to reflect different traffic characteristics.

TABLE 2 Summary of Scenarios

| Scenario | CMF for LW | CMF for SW | AF |
|---|---|---|---|
| 1 | 0.8 | 0.85 | 0.80 |
| 2 | 0.8 | 0.85 | 0.90 |
| 3 | 0.8 | 0.85 | 0.95 |
| 4 | 0.8 | 0.85 | 1.05 |
| 5 | 0.8 | 0.85 | 1.10 |
| 6 | 0.8 | 0.85 | 1.20 |
| 7 | 0.8 | 0.9 | 0.80 |
| 8 | 0.8 | 0.9 | 0.90 |
| 9 | 0.8 | 0.9 | 0.95 |
| 10 | 0.8 | 0.9 | 1.05 |
| 11 | 0.8 | 0.9 | 1.10 |
| 12 | 0.8 | 0.9 | 1.20 |
| 13 | 0.9 | 0.85 | 0.80 |
| 14 | 0.9 | 0.85 | 0.90 |
| 15 | 0.9 | 0.85 | 0.95 |
| 16 | 0.9 | 0.85 | 1.05 |
| 17 | 0.9 | 0.85 | 1.10 |
| 18 | 0.9 | 0.85 | 1.20 |
| 19 | 0.9 | 0.9 | 0.80 |
| 20 | 0.9 | 0.9 | 0.90 |
| 21 | 0.9 | 0.9 | 0.95 |
| 22 | 0.9 | 0.9 | 1.05 |
| 23 | 0.9 | 0.9 | 1.10 |
| 24 | 0.9 | 0.9 | 1.20 |

Note: LW - lane width; SW - shoulder width.
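The scenario grid in TABLE 2 is the full factorial of the assumed CMF and AF values. A minimal sketch of how such a grid could be enumerated is given below; the variable names are hypothetical, and the ordering is chosen to match the scenario numbering in TABLE 2.

```python
from itertools import product

CMF_LW_VALUES = (0.8, 0.9)
CMF_SW_VALUES = (0.85, 0.9)
AF_VALUES = (0.80, 0.90, 0.95, 1.05, 1.10, 1.20)
PHI_VALUES = (0.5, 1.0, 2.0)  # inverse dispersion parameters

# product() varies the last argument fastest, so AF cycles within each
# (CMF_LW, CMF_SW) pair, as in TABLE 2.
scenarios = [
    (number, cmf_lw, cmf_sw, af)
    for number, (cmf_lw, cmf_sw, af) in enumerate(
        product(CMF_LW_VALUES, CMF_SW_VALUES, AF_VALUES), start=1
    )
]

# Each scenario is repeated for every inverse dispersion parameter.
runs = [(scenario, phi) for scenario in scenarios for phi in PHI_VALUES]

print(len(scenarios), len(runs))
```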

The theoretical function of the generated crash counts in this study is shown in Equation 4.

$$N_{true,i} = N_{spf,i} \times CMF_{comb,i} = 2.67 \times 10^{-4} \times L_i \times AADT_i \times CMF_{comb,i} \qquad (4)$$

Where,

$N_{true,i}$ = true crash mean for roadway segment $i$ during a certain time period (i.e., one year). The true crash mean was the theoretical number of crashes that may occur on a segment during the period. This number was used to generate random crash counts;

$AADT_i$ = average annual daily traffic volume (AADT) of segment $i$ (vehicles per day);

$L_i$ = length of segment $i$ (mile); and,

$CMF_{comb,i}$ = the combined CMF for lane width and shoulder width of segment $i$. It was calculated by the methods shown in TABLE 1 (the last column).
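The step from the true crash mean of Equation 4 to a random crash count can be sketched as below. The paper does not describe the sampling routine, so the gamma-Poisson mixture used here is an assumption (it is one common way to draw negative binomial counts with inverse dispersion parameter Phi), and all function names are hypothetical. Only the Python standard library is used.

```python
import math
import random


def true_mean(length_mi, aadt, cmf_comb):
    """Equation 4: theoretical crash mean per year for one segment."""
    return 2.67e-4 * length_mi * aadt * cmf_comb


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def nb_count(rng, mean, phi):
    """Gamma-Poisson mixture with variance mean + mean**2 / phi (assumed form)."""
    lam = rng.gammavariate(phi, mean / phi)  # shape = phi, scale = mean / phi
    return poisson(rng, lam)


rng = random.Random(1)
# A segment near the sample averages (0.55 mile, 6,644 vehicles per day) with
# a wider lane and wider shoulder under CMF_LW=0.8, CMF_SW=0.85, AF=0.80.
mean = true_mean(0.55, 6644, 0.8 * 0.85 * 0.80)
counts = [nb_count(rng, mean, 0.5) for _ in range(20000)]
sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (len(counts) - 1)
print(round(mean, 3), round(sample_mean, 3), round(sample_var, 3))
```

The sample variance exceeds the sample mean, which is the overdispersion that the inverse dispersion parameter controls: a smaller Phi gives more dispersed counts.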

The CMFs for the two variables were derived from SPFs with procedures similar to those utilized in the previous study (16). The considered functional form is shown in Equation 5, in which the two variables (i.e., lane width and shoulder width) were assumed to influence crashes independently.

$$E(\Lambda_i) = \beta_0 \times L_i \times AADT_i^{\beta_1} \times \exp(\beta_2 \times LW_i + \beta_3 \times SW_i) \qquad (5)$$

Where,

$E(\Lambda_i)$ = the estimated crash mean during a period (i.e., one year) for segment $i$;

$LW_i$ = lane width of segment $i$ (ft);

$SW_i$ = shoulder width of segment $i$ (ft); and

$\beta_0, \beta_1, \beta_2, \beta_3$ = coefficients to be estimated.

The two coefficients for lane width and shoulder width (i.e., $\beta_2$ and $\beta_3$ in Equation 5) were used to estimate the CMFs for the two variables, respectively. This study used two indices, estimation bias and error percentage, to evaluate the CMFs derived from SPFs. They were the same as those in the previous study (16), and are reproduced here as Equations 6 and 7. The higher the error percentage, the less accurate the CMFs derived from SPFs.

$$\Delta_j = CMF_{j\_SPF} - CMF_{j\_Assumed} \qquad (6)$$

$$e_j = \frac{|\Delta_j|}{CMF_{j\_Assumed}} \times 100 \qquad (7)$$

Where,

$\Delta_j$ = estimation bias of CMF for variable $j$;

$e_j$ = error percentage of CMF for variable $j$, (%);

$CMF_{j\_Assumed}$ = assumed CMF value for variable $j$; and

$CMF_{j\_SPF}$ = CMF derived from the SPF for variable $j$.
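The two indices of Equations 6 and 7 can be sketched in a few lines. The function names are hypothetical, and the sign convention (SPF estimate minus assumed value) is inferred from the negative bias values that TABLE 4 reports for underestimated CMFs.

```python
def estimation_bias(cmf_spf, cmf_assumed):
    """Equation 6: negative when the SPF underestimates the assumed CMF."""
    return cmf_spf - cmf_assumed


def error_percentage(cmf_spf, cmf_assumed):
    """Equation 7: absolute bias relative to the assumed CMF, in percent."""
    return abs(estimation_bias(cmf_spf, cmf_assumed)) / cmf_assumed * 100.0


# Rounded lane width values from Scenario 1 of TABLE 4 (assumed 0.8, SPF 0.73).
bias = estimation_bias(0.73, 0.8)
error = error_percentage(0.73, 0.8)
print(round(bias, 2), round(error, 2))
```

Because the inputs above are the rounded table entries, the resulting error percentage (8.75%) differs slightly from the 9.26% reported in TABLE 4, which was presumably computed from unrounded estimates.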

The same goodness-of-fit (GOF) and prediction measures for the models were used in this study: (1) Akaike information criterion (AIC), (2) Mean absolute deviance (MAD), and (3) Mean-squared predictive error (MSPE). For detailed information about MAD and MSPE, readers are referred to (26).

DATA DESCRIPTION

This study utilized the same roadway segments as those in the previous study (16). The segment length and AADT were real observed data, while the two variables, lane width and shoulder width, were each generated from a discrete uniform distribution. In total, there were 1,492 segments. TABLE 3 provides the summary statistics of the highway segments used in this study. Since both lane width and shoulder width followed a discrete uniform distribution over two values, and they were generated independently, the four segment groups were about equally represented among all the segments. Each accounted for approximately 25%.

TABLE 3 Summary Statistics of Highway Segments

| Variable | Sample Size | Min. | Max. | Mean (SD) |
|---|---|---|---|---|
| Length (mile) | 1492 | 0.1 | 6.3 | 0.55 (0.67) |
| AADT | 1492 | 502 | 24800 | 6643.9 (3996.4) |
| Lane Width (ft) | 1492 | 12 | 13 | 12.5 (0.50) |
| Shoulder Width (ft) | 1492 | 6 | 7 | 6.5 (0.50) |

Note: SD - standard deviation.

RESULTS

The CMFs for the two variables and other modeling results of each scenario with an inverse dispersion parameter equal to 0.5 are documented in TABLE 4. The results with other inverse dispersion parameters are not presented in this paper due to space limitations. Compared with the previous study (16), the bias and error percentage were relatively high in this study. The average error percentage was around 5.3% for the CMFs of both lane width and shoulder width, and the maximum was about 10%. When the adjustment factor was less than 1.0, the CMFs for both lane width and shoulder width were consistently underestimated. For example, the true CMFs for lane width and shoulder width were 0.8 and 0.85, respectively, in Scenario 1 (adjustment factor equal to 0.80), whereas those derived from regression models were 0.73 and 0.77, respectively. Safety analysts may thus be misled into overestimating the safety benefits of widening the lane and of widening the shoulder. The opposite held when the adjustment factor was more than 1.0: CMFs were overestimated, and the benefits of widening the lane or the shoulder individually were both underestimated. So, neither the CMFs for lane width nor those for shoulder width reflected their true individual safety effectiveness in these scenarios.

Further, the relationship between the accuracy of CMFs and the presumed adjustment factors was investigated. The relationship between error percentage and adjustment factor is illustrated in FIGURES 1 and 2. FIGURE 1 shows the error percentage of CMFs for lane width and FIGURE 2 shows that for shoulder width. The two figures clearly indicate that the error percentage was highly related to the adjustment factor. The error percentage was consistently the highest when the adjustment factor was 0.80 or 1.20, and the lowest when it was 0.95 or 1.05. The error percentage became smaller as the adjustment factor moved closer to 1.0. A special case arises when the adjustment factor equals 1.0: the configuration then reduces to Scenario II (with two variables) of the previous study (16), and the error percentage should be much lower (close to zero) based on the findings of that study. So the adjustment factor considerably influenced the CMFs for both lane width and shoulder width. When it was close to 1.0, this influence was minor. But when it was far from 1.0 (i.e., well below or well above 1.0), the accuracy of the CMFs was substantially affected. The further the adjustment factor is from 1.0, the lower the quality of the CMFs. In other words, the CMFs were biased when the multiple treatments did not affect crash risk independently, and the bias grew quickly as the adjustment factor moved away from 1.0.
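One way to see why the error tracks the adjustment factor is a back-of-the-envelope approximation that is not part of the study. Assume that the four segment groups carry equal exposure and that the fitted main-effects model reproduces the marginal crash totals of the two-by-two layout, as a Poisson log-linear model does. Under those assumptions the estimated CMF for one variable is approximately its assumed CMF multiplied by (1 + CMF_other × AF) / (1 + CMF_other). The sketch below evaluates this approximation; the function names are hypothetical, and the result is a rough check on the direction and size of the bias rather than a substitute for the simulation.

```python
def approx_estimated_cmf(cmf_own, cmf_other, af):
    """Approximate CMF recovered by a main-effects model.

    Assumes equal exposure in the four segment groups and fitted marginal
    totals equal to the true ones; the doubly treated group has a true
    combined CMF of cmf_own * cmf_other * af.
    """
    return cmf_own * (1.0 + cmf_other * af) / (1.0 + cmf_other)


def approx_error_pct(cmf_own, cmf_other, af):
    """Error percentage (Equation 7) implied by the approximation."""
    est = approx_estimated_cmf(cmf_own, cmf_other, af)
    return abs(est - cmf_own) / cmf_own * 100.0


for af in (0.80, 0.90, 0.95, 1.00, 1.05, 1.10, 1.20):
    lw = approx_estimated_cmf(0.8, 0.85, af)
    err = approx_error_pct(0.8, 0.85, af)
    print(f"AF={af:.2f}  approx. CMF_LW={lw:.3f}  error={err:.1f}%")
```

For CMF_LW = 0.8, CMF_SW = 0.85 and AF = 0.80 the approximation gives about 0.726, close to the simulated mean of 0.73 in TABLE 4. The approximate error vanishes at AF = 1.0 and grows in proportion to the distance of the AF from 1.0.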

The “Phi” column in TABLE 4 lists the estimated inverse dispersion parameters from the regression models for each scenario. All of them were very close to the corresponding true values regardless of the assumed CMFs for the variables. No significant influence of the adjustment factor on the estimate of the inverse dispersion parameter was found in this study. Similar results were observed for the other inverse dispersion parameters.

TABLE 4 Results of CMFs for Lane Width and Shoulder Width (Phi = 0.5)

| # | AF (a) | LW (b) Theo | LW SPF (SD) | LW Bias | LW E | SW (b) Theo | SW SPF (SD) | SW Bias | SW E | Phi (c) | AIC (d) | MAD (e) | MSPE (f) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.80 | 0.8 | 0.73 (0.048) | -0.07 | 9.26 | 0.85 | 0.77 (0.046) | -0.08 | 8.92 | 0.49 | 8750.40 | 0.052 | 0.014 |
| 2 | 0.90 | 0.8 | 0.77 (0.047) | -0.03 | 4.06 | 0.85 | 0.81 (0.048) | -0.04 | 4.19 | 0.50 | 8908.33 | 0.043 | 0.012 |
| 3 | 0.95 | 0.8 | 0.79 (0.049) | -0.01 | 1.46 | 0.85 | 0.83 (0.046) | -0.02 | 1.77 | 0.49 | 8917.64 | 0.040 | 0.010 |
| 4 | 1.05 | 0.8 | 0.82 (0.05) | 0.02 | 2.57 | 0.85 | 0.88 (0.05) | 0.03 | 3.17 | 0.50 | 9056.37 | 0.039 | 0.010 |
| 5 | 1.10 | 0.8 | 0.83 (0.054) | 0.03 | 3.77 | 0.85 | 0.89 (0.054) | 0.04 | 4.44 | 0.49 | 9106.98 | 0.044 | 0.012 |
| 6 | 1.20 | 0.8 | 0.87 (0.047) | 0.07 | 8.72 | 0.85 | 0.93 (0.052) | 0.08 | 9.20 | 0.50 | 9200.09 | 0.051 | 0.014 |
| 7 | 0.80 | 0.9 | 0.82 (0.043) | -0.08 | 9.08 | 0.85 | 0.77 (0.049) | -0.08 | 9.68 | 0.50 | 9046.82 | 0.055 | 0.016 |
| 8 | 0.90 | 0.9 | 0.86 (0.046) | -0.04 | 4.50 | 0.85 | 0.8 (0.044) | -0.05 | 5.55 | 0.49 | 9162.17 | 0.043 | 0.011 |
| 9 | 0.95 | 0.9 | 0.88 (0.049) | -0.02 | 2.08 | 0.85 | 0.83 (0.048) | -0.02 | 2.64 | 0.50 | 9223.60 | 0.042 | 0.011 |
| 10 | 1.05 | 0.9 | 0.92 (0.045) | 0.02 | 2.57 | 0.85 | 0.86 (0.045) | 0.01 | 1.76 | 0.49 | 9307.60 | 0.040 | 0.009 |
| 11 | 1.10 | 0.9 | 0.94 (0.055) | 0.04 | 4.11 | 0.85 | 0.89 (0.049) | 0.04 | 4.80 | 0.50 | 9366.01 | 0.043 | 0.010 |
| 12 | 1.20 | 0.9 | 0.98 (0.06) | 0.08 | 9.04 | 0.85 | 0.93 (0.05) | 0.08 | 9.95 | 0.50 | 9451.37 | 0.054 | 0.015 |
| 13 | 0.80 | 0.8 | 0.73 (0.035) | -0.07 | 9.00 | 0.9 | 0.82 (0.047) | -0.08 | 8.48 | 0.49 | 8927.76 | 0.054 | 0.016 |
| 14 | 0.90 | 0.8 | 0.76 (0.046) | -0.04 | 5.51 | 0.9 | 0.87 (0.047) | -0.03 | 3.66 | 0.50 | 9003.28 | 0.042 | 0.011 |
| 15 | 0.95 | 0.8 | 0.78 (0.04) | -0.02 | 2.92 | 0.9 | 0.88 (0.047) | -0.02 | 2.19 | 0.49 | 9072.50 | 0.038 | 0.009 |
| 16 | 1.05 | 0.8 | 0.81 (0.045) | 0.01 | 1.65 | 0.9 | 0.92 (0.055) | 0.02 | 2.31 | 0.49 | 9184.27 | 0.041 | 0.009 |
| 17 | 1.10 | 0.8 | 0.84 (0.046) | 0.04 | 4.45 | 0.9 | 0.94 (0.051) | 0.04 | 4.11 | 0.49 | 9232.53 | 0.041 | 0.009 |
| 18 | 1.20 | 0.8 | 0.87 (0.057) | 0.07 | 9.33 | 0.9 | 0.98 (0.054) | 0.08 | 8.58 | 0.49 | 9330.62 | 0.053 | 0.014 |
| 19 | 0.80 | 0.9 | 0.81 (0.05) | -0.09 | 9.75 | 0.9 | 0.81 (0.048) | -0.09 | 9.64 | 0.51 | 9155.48 | 0.058 | 0.018 |
| 20 | 0.90 | 0.9 | 0.86 (0.044) | -0.04 | 4.74 | 0.9 | 0.87 (0.047) | -0.03 | 3.64 | 0.50 | 9287.17 | 0.044 | 0.011 |
| 21 | 0.95 | 0.9 | 0.88 (0.047) | -0.02 | 2.13 | 0.9 | 0.88 (0.053) | -0.02 | 2.18 | 0.50 | 9358.16 | 0.042 | 0.010 |
| 22 | 1.05 | 0.9 | 0.92 (0.058) | 0.02 | 2.70 | 0.9 | 0.93 (0.055) | 0.03 | 3.43 | 0.51 | 9447.52 | 0.044 | 0.012 |
| 23 | 1.10 | 0.9 | 0.95 (0.049) | 0.05 | 5.29 | 0.9 | 0.94 (0.053) | 0.04 | 4.60 | 0.49 | 9490.10 | 0.044 | 0.011 |
| 24 | 1.20 | 0.9 | 0.99 (0.061) | 0.09 | 9.80 | 0.9 | 0.99 (0.054) | 0.09 | 10.12 | 0.50 | 9627.21 | 0.056 | 0.016 |

Note: # - scenario number; a - AF is the assumed adjustment factor; b - LW is for lane width, SW is for shoulder width, Theo means the true CMF value, SPF is the mean of CMFs from 100 experiments, SD is the standard deviation of the 100 CMFs, E is error percentage (%); c - the mean of the inverse dispersion parameter estimated from 100 experiments; d, e, f - each is the mean of the corresponding GOF measure of the 100 results.

FIGURE 1 Error percentage of CMFs for lane width (Phi = 0.5). [Figure not reproduced: error percentage (%), 0.0 to 15.0, plotted against adjustment factor, 0.7 to 1.3, with one series each for Scenarios 1 to 6, 7 to 12, 13 to 18, and 19 to 24.]

FIGURE 2 Error percentage of CMFs for shoulder width (Phi = 0.5). [Figure not reproduced: error percentage (%), 0.0 to 15.0, plotted against adjustment factor, 0.7 to 1.3, with one series each for Scenarios 1 to 6, 7 to 12, 13 to 18, and 19 to 24.]

Another interesting finding from this study concerns the GOF measures. Both MAD and MSPE were relatively small in each scenario, and they were very close to those in the previous study (16). This indicated that the predicted crash number was quite close to the true crash mean. However, this did not guarantee the quality of the CMFs derived from regression models, as described above. That is to say, although the fit appears good in terms of GOF measures, there can still be substantive issues with the models. A possible reason is that some parameters may have been overestimated (or underestimated) while others may have been underestimated (or overestimated) in the regression models. Take Scenario 1 as an example. The specific theoretical function for generating crash counts is shown in Equation 8.

$$N_{true,i} = 2.67 \times 10^{-4} \times L_i \times AADT_i \times 0.8^{LW_i - 12} \times 0.85^{SW_i - 6} \times AF^{\,I_{LW_i \neq 12}(LW_i) \times I_{SW_i \neq 6}(SW_i)} \qquad (8a)$$

Or equivalently,

$$N_{true,i} = 0.103 \times AF^{\,I_{LW_i \neq 12}(LW_i) \times I_{SW_i \neq 6}(SW_i)} \times L_i \times AADT_i \times \exp(-0.22 \times LW_i - 0.16 \times SW_i) \qquad (8b)$$

Where,

$I_{LW_i \neq 12}(LW_i)$ = indicator function for lane width of segment $i$. It equals 0 if the lane width is 12 ft, otherwise 1.0; and,

$I_{SW_i \neq 6}(SW_i)$ = indicator function for shoulder width of segment $i$. It equals 0 if the shoulder width is 6 ft, otherwise 1.0.

The modeling output of one experiment in this scenario is shown in TABLE 5. It can be seen that the coefficients for lane width and shoulder width were both clearly underestimated, and that for AADT was slightly underestimated, but the intercept coefficient was overestimated. Note that a single theoretical value for the intercept is not given in TABLE 5 because it depends on the two indicator functions. In other words, the theoretical intercept varied with the segment group. For segment Groups 1, 2 and 3, it was -2.27 (natural logarithm of 0.103), but it was -2.49 (natural logarithm of the product of 0.103 and the AF, 0.8) for segment Group 4. The coefficient estimated from the regression model was much higher than either of them. It seems that the underestimation of the lane width and shoulder width coefficients was compensated for by overestimating the intercept coefficient. Perhaps this explains the overall small MAD and MSPE values.

TABLE 5 Modeling Output of an Experiment in Scenario 1 (Phi = 0.5, NB Model)

| Model Variable | Theo. Value (a) | Coef. Value (b) | SE (c) | p-Value |
|---|---|---|---|---|
| Intercept [ln(β0)] | -2.27 - 0.223 × I_LW × I_SW (d) | -1.810 | 0.713 | 0.0111 |
| Ln(AADT) (β1) | 1.00 | 0.981 | 0.040 | < 2e-16 |
| Lane Width (β2), ft | -0.223 | -0.352 | 0.045 | 3.26E-15 |
| Shoulder Width (β3), ft | -0.162 | -0.321 | 0.045 | 6.18E-13 |
| AIC | 8921.8 | | | |
| MAD | 0.050 | | | |
| MSPE | 0.014 | | | |

a - theoretical value; b - estimated coefficient value; c - SE is standard error;
d - The theoretical value for the intercept was calculated by taking the natural logarithm of the first two terms of Equation 8b, 0.103 × AF^(I_LW × I_SW). I_LW and I_SW are the two indicator functions of lane width and shoulder width, respectively.
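To relate the coefficients in TABLE 5 to CMFs, the sketch below applies the usual conversion for a log-linear form such as Equation 5, in which the CMF for a one-unit change in a variable is the exponential of its coefficient. This conversion is assumed here to be the one used in the study, and the function name is hypothetical.

```python
import math


def cmf_from_coefficient(beta, delta=1.0):
    """CMF implied by a log-linear coefficient for a change of `delta` units."""
    return math.exp(beta * delta)


# Coefficients from TABLE 5 (one experiment in Scenario 1), per 1 ft of widening.
cmf_lw_est = cmf_from_coefficient(-0.352)   # estimated, lane width
cmf_sw_est = cmf_from_coefficient(-0.321)   # estimated, shoulder width
cmf_lw_theo = cmf_from_coefficient(-0.223)  # theoretical, lane width
cmf_sw_theo = cmf_from_coefficient(-0.162)  # theoretical, shoulder width

print(f"LW: estimated {cmf_lw_est:.3f} vs theoretical {cmf_lw_theo:.3f}")
print(f"SW: estimated {cmf_sw_est:.3f} vs theoretical {cmf_sw_theo:.3f}")
```

In this single experiment the implied CMFs are roughly 0.70 and 0.73, below the assumed values of 0.8 and 0.85, which is the same direction of bias as the 100-experiment means reported for Scenario 1 in TABLE 4.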

CONCLUSIONS AND DISCUSSIONS 7

This paper has documented an extensive study on the validity of CMFs developed from cross-sectional studies, focusing on the independence assumption (i.e., that variables influence crashes independently) made by the most commonly used regression models. The main objective was to examine the accuracy of CMFs estimated from regression models when the independence assumption is not satisfied. Two variables, lane width and shoulder width, were considered, and an adjustment factor was used to capture the dependence between them. An adjustment factor higher than 1.0 means that widening both at the same time reduces fewer crashes than the sum of the individual effects, and a factor lower than 1.0 means the opposite. Various safety effects and adjustment factors for the two variables were assumed, and the accuracy of the CMFs was evaluated using an approach similar to that of the authors’ previous study (16). The main conclusions are summarized as follows: (1) the commonly used regression models can produce biased CMFs if the considered variables influence crashes dependently; (2) the bias is highly correlated with the adjustment factor (i.e., the degree of dependence between the variables). When the factor is close to 1.0, indicating that the dependence is weak, the bias of CMFs derived using regression models is relatively small, and the CMFs may still be acceptable. However, an adjustment factor well above or below 1.0 reduces the quality of CMFs significantly; and (3) the coefficients for both the treatments of interest and other variables may be over- and/or underestimated when variables are dependent. The dependence relationship may be absorbed into the estimates of coefficients for other variables. In one example in this study, the intercept was overestimated, while the coefficients for the two variables were underestimated.
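How a dependence term can be absorbed into other coefficients may be illustrated with a deliberately simplified, noise-free sketch. It is not the simulation used in this study: it assumes two binary treatments instead of continuous lane and shoulder widths, an adjustment factor that multiplies the product of the individual CMFs only when both treatments are present, and a log-linear main-effects model fitted by least squares on a balanced four-cell design rather than an NB model. All function names and numeric values are hypothetical.

```python
import math

def true_mean(base, cmf_a, cmf_b, af, a, b):
    """Expected crashes for treatment indicators a and b (each 0 or 1).

    Assumption: the adjustment factor `af` multiplies the product of the
    individual CMFs only when both treatments are present.
    """
    return base * (cmf_a ** a) * (cmf_b ** b) * (af ** (a * b))

def fit_main_effects(base, cmf_a, cmf_b, af):
    """Least-squares fit of log(mean) = c0 + c1*a + c2*b on the four cells.

    For a balanced 2x2 design the solution is closed-form: each slope is a
    difference of marginal means, and the intercept follows from the grand
    mean. Returns the implied (base, CMF_a, CMF_b) estimates.
    """
    y = {(a, b): math.log(true_mean(base, cmf_a, cmf_b, af, a, b))
         for a in (0, 1) for b in (0, 1)}
    c1 = (y[1, 0] + y[1, 1]) / 2 - (y[0, 0] + y[0, 1]) / 2
    c2 = (y[0, 1] + y[1, 1]) / 2 - (y[0, 0] + y[1, 0]) / 2
    c0 = sum(y.values()) / 4 - c1 / 2 - c2 / 2
    return math.exp(c0), math.exp(c1), math.exp(c2)

# Hypothetical true values: base frequency 10, CMFs of 0.90 and 0.85.
for af in (1.0, 1.2, 0.8):
    est_base, est_a, est_b = fit_main_effects(10.0, 0.90, 0.85, af)
    print(f"AF={af:.1f}: base={est_base:.3f}, "
          f"CMF_a={est_a:.3f}, CMF_b={est_b:.3f}")
```

In this toy setting the true values are recovered only when the adjustment factor equals 1.0. Otherwise each estimated CMF is scaled by the square root of the adjustment factor and the estimated base by its inverse fourth root, so the omitted dependence is spread over the intercept and both treatment coefficients. The direction and size of the bias in a real NB model with continuous variables may differ.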

The findings of this study raise cautions for safety analysts who develop CMFs or model crashes with multiple variables. It is recommended to examine the dependence of variables. If they potentially have significant overlap in reducing crashes (i.e., the independence assumption is not met), the CMFs for individual variables derived from the SPFs are very likely to be biased. One may ask how to examine whether two countermeasures are independent when dealing with real crash data. Admittedly, it is not easy to conclude whether, or how much, two or more safety treatments depend on each other, given the limited number of studies on the safety effects of multiple treatments. Therefore, more analyses of combined safety effects are needed in the future. Until solid theoretical support is available, engineering judgment and experience may need to be considered.
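As a conceptual illustration only, if expected crash frequencies were available for otherwise comparable sites with neither treatment, each treatment alone, and both treatments, the adjustment factor implied by a multiplicative form could be backed out as sketched below. The function name and the input values are hypothetical, and the sketch ignores sampling variability and confounding, which are the reasons this check is difficult with real data.

```python
def implied_adjustment_factor(n_none, n_a, n_b, n_both):
    """Back out the adjustment factor from four expected crash frequencies.

    Assumption: CMF_both = CMF_a * CMF_b * AF, where each CMF is the ratio
    of the expected frequency with the treatment(s) to that with neither.
    """
    for value in (n_none, n_a, n_b, n_both):
        if value <= 0:
            raise ValueError("expected crash frequencies must be positive")
    cmf_a = n_a / n_none
    cmf_b = n_b / n_none
    cmf_both = n_both / n_none
    return cmf_both / (cmf_a * cmf_b)

# Hypothetical expected frequencies (crashes per year) for the four groups.
af = implied_adjustment_factor(n_none=10.0, n_a=9.0, n_b=8.5, n_both=8.0)
print(f"implied adjustment factor: {af:.3f}")
```

A value near 1.0 would be consistent with the independence assumption, while a value well above or below 1.0 would suggest that the individual CMFs from a main-effects model should be treated with caution.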

There are a few limitations to this study. First, several aspects can influence the modeling result and hence the CMFs, such as the functional form and the error distribution of the statistical model, sample size, etc. (4; 16; 27-30). This study only considered the most frequently used model, the NB model with a linear relationship between variables, and the sample size was assumed to be large enough. Second, only two variables were used, and such models may be influenced by omitted-variable bias in practice. In addition, the correlation between the two variables was not considered, which could potentially influence the results (31). To estimate reliable CMFs, these questions need further consideration when dealing with real observed data. Finally, this study only raised a problem that might affect the quality of CMFs derived using regression models. Safety practitioners may be more interested in solutions to this problem. Therefore, more sophisticated approaches that can assess the overlapping effects of multiple treatments need to be developed in the future.

REFERENCES

[1] Richard, K. R., and R. Srinivasan. Separation of Safety Effects of Multiple Improvements by Alternate Empirical Bayes Methods. Transportation Research Record: Journal of the Transportation Research Board, No. 2236, 2011, pp. 27-40.

[2] Carter, D., R. Srinivasan, F. Gross, and F. Council. Recommended Protocols for Developing Crash Modification Factors. http://www.cmfclearinghouse.org/collateral/CMF_Protocols.pdf. Accessed June 26, 2014.

[3] Hauer, E. Trustworthiness of Safety Performance Functions. In the 93rd Annual Meeting of the Transportation Research Board (TRB), Transportation Research Board, Washington, D.C., 2014.

[4] Lord, D., and F. Mannering. The Statistical Analysis of Crash-Frequency Data: A Review and Assessment of Methodological Alternatives. Transportation Research Part A, Vol. 44, No. 5, 2010, pp. 291-305.

[5] Hauer, E. Cause and Effect in Observational Cross-Section Studies on Road Safety. In the 84th Annual Meeting of the Transportation Research Board (TRB), Transportation Research Record, Washington D.C., 2005.

[6] Hauer, E. Cause, Effect and Regression in Road Safety: A Case Study. Accident Analysis & Prevention, Vol. 42, No. 4, 2010, pp. 1128-1135.

[7] Hauer, E. The Art of Regression Modeling in Road Safety. Springer, 2015.

[8] Hauer, E. Fishing for Safety Information in Murky Waters. Journal of Transportation Engineering, Vol. 131, No. 5, 2005, pp. 340-344.

[9] Davis, G. A. Accident Reduction Factors and Causal Inference in Traffic Safety Studies: A Review. Accident Analysis & Prevention, Vol. 32, No. 1, 2000, pp. 95-109.

[10] AASHTO. Highway Safety Manual. American Association of State Highway and Transportation Officials, Washington, D.C., 2010.

[11] Gross, F., A. Hamidi, and K. Yunk. Issues Related to the Combination of Multiple CMFs. Presented at the 91st Annual Meeting of the Transportation Research Board, Washington D.C., 2012.

[12] Elvik, R. An Exploratory Analysis of Models for Estimating the Combined Effects of Road Safety Measures. Accident Analysis & Prevention, Vol. 41, No. 4, 2009, pp. 876-880.

[13] Park, J., M. Abdel-Aty, and C. Lee. Exploration and Comparison of Crash Modification Factors for Multiple Treatments on Rural Multilane Roadways. Accident Analysis & Prevention, Vol. 70, 2014, pp. 167-177.

[14] Park, J., M. Abdel-Aty, J. Lee, and C. Lee. Developing Crash Modification Functions to Assess Safety Effects of Adding Bike Lanes for Urban Arterials with Different Roadway and Socio-Economic Characteristics. Accident Analysis & Prevention, Vol. 74, 2015, pp. 179-191.

[15] Park, J., and M. Abdel-Aty. Assessing the Safety Effects of Multiple Roadside Treatments Using Parametric and Nonparametric Approaches. Accident Analysis & Prevention, Vol. 83, 2015, pp. 203-213.

[16] Wu, L., D. Lord, and Y. Zou. Validation of Crash Modification Factors Derived from Cross-Sectional Studies Using Regression Models. Presented at the 94th Annual Meeting of the Transportation Research Board (TRB), Washington D.C., 2015.

[17] CMFClearinghouse. Installation of Fixed Combined Speed and Red Light Cameras. http://www.cmfclearinghouse.org/study_detail.cfm?stid=401. Accessed July 23, 2015.

[18] Harkey, D. L., R. Srinivasan, J. Baek, F. M. Council, K. Eccles, N. Lefler, F. Gross, B. Persaud, C. Lyon, E. Hauer, and J. A. Bonneson. Accident Modification Factors for Traffic Engineering and ITS Improvements. Transportation Research Board, Washington, D.C., 2008.

[19] Bonneson, J., and D. Lord. Role and Application of Accident Modification Factors in the Highway Design Process. Report FHWA/TX-05/0-4703-2, Texas Transportation Institute, College Station, TX, 2005.

[20] Roberts, P., and B. Turner. Estimating the Crash Reduction Factor from Multiple Road Engineering Countermeasures. Presented at 3rd International Road Safety Conference, Perth, Australia, 2007.

[21] Gross, F., and A. Hamidi. Investigation of Existing and Alternative Methods for Combining Multiple CMFs. http://www.cmfclearinghouse.org/collateral/Combining_Multiple_CMFs_Final.pdf. Accessed July 14, 2015.

[22] Park, J., and M. Abdel-Aty. Development of Adjustment Functions to Assess Combined Safety Effects of Multiple Treatments on Rural Two-Lane Roadways. Accident Analysis & Prevention, Vol. 75, 2015, pp. 310-319.

[23] De Pauw, E., S. Daniels, T. Brijs, E. Hermans, and G. Wets. To Brake or to Accelerate? Safety Effects of Combined Speed and Red Light Cameras. Journal of Safety Research, Vol. 50, 2014, pp. 59-65.

[24] Wang, X., T. Wang, A. Tarko, and P. J. Tremont. The Influence of Combined Alignments on Lateral Acceleration on Mountainous Freeways: A Driving Simulator Study. Accident Analysis & Prevention, Vol. 76, 2015, pp. 110-117.

[25] Bauer, K., and D. Harwood. Safety Effects of Horizontal Curve and Grade Combinations on Rural Two-Lane Highways. Transportation Research Record: Journal of the Transportation Research Board, No. 2398, 2013, pp. 37-49.

[26] Lord, D., S. D. Guikema, and S. R. Geedipally. Application of the Conway-Maxwell-Poisson Generalized Linear Model for Analyzing Motor Vehicle Crashes. Accident Analysis & Prevention, Vol. 40, No. 3, 2008, pp. 1123-1134.

[27] Lord, D. Modeling Motor Vehicle Crashes Using Poisson-Gamma Models: Examining the Effects of Low Sample Mean Values and Small Sample Size on the Estimation of the Fixed Dispersion Parameter. Accident Analysis & Prevention, Vol. 38, No. 4, 2006, pp. 751-766.

[28] Lord, D., and L. F. Miranda-Moreno. Effects of Low Sample Mean Values and Small Sample Size on the Estimation of the Fixed Dispersion Parameter of Poisson-Gamma Models for Modeling Motor Vehicle Crashes: A Bayesian Perspective. Safety Science, Vol. 46, No. 5, 2008, pp. 751-770.

[29] Wu, L., Y. Zou, and D. Lord. Comparison of Sichel and Negative Binomial Models in Hot Spot Identification. Transportation Research Record: Journal of the Transportation Research Board, No. 2460, 2014, pp. 107-116.

[30] Zou, Y., L. Wu, and D. Lord. Modeling Over-Dispersed Crash Data with a Long Tail: Examining the Accuracy of the Dispersion Parameter in Negative Binomial Models. Analytic Methods in Accident Research, Vol. 5–6, 2015, pp. 1-16.

[31] Wu, L. Examining the Use of Regression Models for Developing Crash Modification Factors. Doctoral Dissertation, Zachry Department of Civil Engineering, Texas A&M University, College Station, TX, 2016.
