
Journal of Safety Research 51 (2014) 93–98

Contents lists available at ScienceDirect

Journal of Safety Research

journal homepage: www.elsevier.com/locate/jsr

A data mining approach to investigate the factors influencing the crash severity of motorcycle pillion passengers

Ali Tavakoli Kashani, Rahim Rabieyan, Mohammad Mehdi Besharati ⁎
School of Civil Engineering, Iran University of Science & Technology, Tehran, Iran

⁎ Corresponding author at: School of Civil Engineering, Iran University of Science and Technology, Narmak, Tehran, Iran, P.O. Box 16846.

E-mail addresses: [email protected] (A. Tavakoli Kashani), rahimrabie@civileng.iust.ac.ir (R. Rabieyan), [email protected] (M.M. Besharati).

http://dx.doi.org/10.1016/j.jsr.2014.09.004
0022-4375/© 2014 National Safety Council and Elsevier Ltd. All rights reserved.

Article info

Article history:
Received 15 April 2014
Received in revised form 21 July 2014
Accepted 17 September 2014
Available online 7 October 2014

Keywords:
Motorcycle pillion passengers
Crash severity
Classification and regression trees

Abstract

Introduction: Motorcycle passengers comprise a considerable proportion of traffic crash victims. During a 5-year period (2006–2010) in Iran, an average of 3.4 pillion passengers were killed daily in motorcycle crashes. This study investigated the main factors influencing the crash severity of this group of road users. Method: The Classification and Regression Trees (CART) method was employed to analyze the injury severity of pillion passengers in Iran over a 4-year period (2009–2012). Results: The predictive accuracy of the model built with a total of 16 variables was 74%, which showed a considerable improvement compared to previous studies. The results indicate that area type, land use, and injured part of the body (head, neck, etc.) are the most influential factors affecting the fatality of motorcycle passengers. Results also show that helmet usage could reduce the fatality risk among motorcycle passengers by 28%. Practical Applications: The findings of this study might help develop more targeted countermeasures to reduce the death rate of motorcycle pillion passengers.

© 2014 National Safety Council and Elsevier Ltd. All rights reserved.

1. Introduction

Traffic crashes are considered a major public health problem worldwide, claiming 1.27 million deaths annually and causing between 20 and 50 million injuries (de Oña, López, & Abellán, 2013). Similarly, traffic crashes are one of the main causes of death among the Iranian people. In addition, the motorcycle is a popular transportation mode in Iran and is, unfortunately, involved in a significant proportion of fatal crashes. According to the Iran Forensic Medicine Organization report, motorcycle passengers are involved in a substantial proportion of fatal crashes. Based on the statistics, 30,901 motorcyclists were killed in traffic crashes from 2006 to 2010, which is about 25.7% of the total traffic crash fatalities occurring during this period. Furthermore, over 20% of the motorcyclists killed in these crashes were passengers (not riders), with an average age of only 26 years. In other words, during this 5-year period, an average of 3.4 pillion passengers (excluding riders) were killed in motorcycle crashes every day in Iran. The high proportion of mortality and severe injury among this group of road users necessitates more investigation of their crash characteristics. Unfortunately, no studies have so far been conducted to address this issue in Iran. Therefore, the objective of this study was to identify the factors influencing the crash severity of pillion passengers by means of the CART method.


The Classification and Regression Trees (CART) method applied in this study is a commonly used data mining technique. CART is a non-parametric model with no pre-defined relationships between the target variable and the predictors. Moreover, decision tree models can identify and easily explain the complex patterns associated with crash risk and do not need a functional form to be specified (Kashani & Mohaymany, 2011). Hence, CART models are good alternatives for analyzing the injury severity of traffic crashes (Chang & Chien, 2013).

In the next section of this paper, previous literature on motorcycle crash severity is reviewed. Section 3 presents an introduction to CART models and evaluation methods as well as a description of the study data. Next, the model results and discussions are presented in Section 4. Finally, conclusions are drawn from the study results.

2. Literature review

Although several previous studies have investigated the crash severity of vehicle passengers (Kashani & Mohaymany, 2011; Newgard, Lewis, & Jolly, 2002), the crash characteristics of motorcyclists are substantially different from those of occupants of four-wheeled vehicles, as the former use a helmet rather than a safety belt and are completely unprotected. So far, the effect of several variables on the injury severity of motorcyclists has been investigated in previous studies. Such variables can be categorized into four classes: human factors, vehicle factors, crash characteristics, and roadway environmental attributes. The human factors include rider characteristics such as age, gender, riding license, safety helmet usage, and alcohol consumption. Several previous studies have reported that rider age might influence motorcycle crash severity (Cafiso, La Cava, & Pappalardo, 2012; Jou, Yeh, & Chen, 2012; Savolainen & Mannering, 2007). Other studies (Evans & Frick, 1988; Mayrose, 2008; Ranney, Mello, Baird, Chai, & Clark, 2010; Savolainen & Mannering, 2007) have argued that helmet usage might play a significant role in reducing the injury severity of motorcycle passengers. Moreover, unlicensed male riders and consumption of alcoholic drinks can also be regarded as factors influencing the severity of motorcycle crashes (Huang & Lai, 2011; Jou et al., 2012; Kasantikul, Ouellet, Smith, Sirathranont, & Panichabhongse, 2005; Savolainen & Mannering, 2007). However, Quddus, Noland, and Chin (2002) and Rifaat, Tay, and de Barros (2012) have reported that the severity of crashes is higher for female riders.

On the other hand, some physical characteristics of motorcycles, such as size and production year, were also found to significantly affect the severity of motorcycle crashes (Jou et al., 2012; Savolainen & Mannering, 2007). Vlahogianni, Yannis, and Golias (2012) suggested that roadway and environmental factors such as geometric design, road type, pavement condition, weather condition, area type (i.e., urban or rural), and illumination might also affect the severity of motorcycle crashes. For instance, Li, Doong, Huang, Lai, and Jeng (2009) and Savolainen and Mannering (2007) found that the severity of crashes increased in rural areas, suburban regions, and dark conditions. Furthermore, the severity of crashes is reported to decrease substantially in bad weather, particularly in winter and on wet pavements (Pai & Saleh, 2007; Rifaat et al., 2012; Savolainen & Mannering, 2007). Finally, crash characteristics such as crash type (e.g., single- or multi-vehicle crash) and collision type (e.g., head-on or rear-end collisions) might also influence the severity of crashes. For instance, Savolainen and Mannering (2007) indicated that if a multi-vehicle crash involves a head-on collision, the probability of passenger fatality will be approximately six times greater.

Vlahogianni et al. (2012) presented a review of the literature pertaining to motorcycle crashes. They showed that pillion passengers are not adequately addressed in previous studies. Thus, the factors affecting the fatality of pillion passengers were identified in the current study. Another innovation of this study is its spatial extent. Urban and rural crashes were investigated simultaneously in order to identify the interaction effects of area type and other factors on crash severity.

Fig. 1. General structure of a decision tree.

3. Materials and methods

3.1. Model

To investigate the influence of factors on the severity of crashes, the dependent variable (injury type) was categorized into two levels: fatal and injury. Given this binary dependent variable, the Classification and Regression Trees (CART) method was employed to model the effect of each variable on the probability of a crash being fatal.

Fig. 1 shows the principle of the CART method in developing the classification tree. First, all of the data are concentrated at a node located at the top of the tree. Then this so-called “root node” is divided into two child nodes on the basis of a predictor variable (splitter) that maximizes the homogeneity (i.e., purity) of the two child nodes. In fact, the data in each child node are more homogeneous than those in the upper parent node. This process is repeated for each child node until the data in each node have the greatest possible homogeneity. Such a node is called a terminal node or “leaf” and has no branches. In other words, the principle behind tree growing is to recursively partition the target variable to minimize “impurity” in the terminal nodes. The most common measure of node impurity is the Gini criterion.

The Gini criterion is used to quantify homogeneity by computing the proportion of data that belong to each class. The Gini index is defined as:

$$g(t) = \sum_{i \neq j} p(j \mid t)\, p(i \mid t) \tag{1}$$

where i and j are categories of the target field, and:

$$p(j \mid t) = \frac{p(j, t)}{p(t)}, \qquad p(j, t) = \frac{\pi(j)\, N_j(t)}{N_j}, \qquad p(t) = \sum_{j} p(j, t) \tag{2}$$

where π(j) is the prior probability value for category j, N_j(t) is the number of records in category j of node t, and N_j is the number of records of category j in the root node. Note that when the Gini index is used to find the improvement for a split during tree growth, only those records in node t and the root node with valid values for the split-predictor are used to compute N_j(t) and N_j, respectively.
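As a minimal sketch of Eqs. (1) and (2), the Gini index of a node can be computed from the class counts in the node, the class counts in the root node, and the prior probabilities. The function name `gini_index`, the dictionary layout, and the toy counts below are assumptions made for illustration only; they do not come from the study data.

```python
def gini_index(node_counts, root_counts, priors):
    """Gini index g(t) of a node, following Eqs. (1) and (2).

    node_counts[j] plays the role of N_j(t), root_counts[j] of N_j,
    and priors[j] of pi(j). All names are illustrative.
    """
    # Eq. (2): p(j, t) = pi(j) * N_j(t) / N_j
    joint = {j: priors[j] * node_counts.get(j, 0) / root_counts[j]
             for j in root_counts}
    p_t = sum(joint.values())                      # p(t) = sum_j p(j, t)
    cond = {j: joint[j] / p_t for j in joint}      # p(j | t)
    # Eq. (1): sum over all ordered pairs i != j of p(j | t) * p(i | t)
    return sum(cond[i] * cond[j] for i in cond for j in cond if i != j)


# Toy two-class example (hypothetical counts, not taken from the paper).
toy_root = {"injury": 900, "fatality": 100}
toy_equal_priors = {"injury": 0.5, "fatality": 0.5}
toy_node = {"injury": 90, "fatality": 40}

g_root = gini_index(toy_root, toy_root, toy_equal_priors)
g_node = gini_index(toy_node, toy_root, toy_equal_priors)
g_pure = gini_index({"injury": 50, "fatality": 0}, toy_root, toy_equal_priors)
```

With equal priors, the root node is maximally impure for two classes, and a node containing a single class has a Gini index of zero, which is the behavior the tree-growing step relies on when choosing splitters.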

In the CART method, tree growth continues until only similar observations remain in each terminal node. To decrease its complexity, the tree is then pruned using a cost-complexity measure that weighs precision against complexity in terms of the number of nodes and processing speed, searching for the tree that obtains the lowest value for this parameter. A more detailed description of the CART method can be found in Breiman (1998).

Table 2
Variable description.

Variable                   Description
Injury severity            Target variable: 1. Injury, 2. Fatal
Gender                     1. Male, 2. Female
Age                        Continuous
Safety helmet              1. Used, 2. Not-used
Injured part of the body   1. Head, 2. Neck, 3. Chest and abdomen, 4. Back and spine, 5. Others
Area type                  1. Urban, 2. Rural
Lighting condition         1. Daylight, 2. Dark, 3. Dusk/dawn
Geometry                   1. Curve, 2. Straight
Crash location             1. On roadway, 2. On shoulder, 3. In median, 4. On roadside, 5. Outside traffic way, 6. Other
Road surface condition     1. Dry, 2. Wet, 3. Gravel/sand, 4. Other
Weather condition          1. Clear, 2. Fog, 3. Rain, 4. Snow, 5. Stormy, 6. Cloudy, 7. Dusty
Road type                  1. Highway, 2. Main roads, 3. Local roads
Collision type             1. Head on, 2. Rear end, 3. Angle (front-to-side), 4. Angle (back-to-side), 5. Sideswipe, 6. Others
Primary cause of crash     1. Speeding, 2. Running from the crash scene
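For reference, the cost-complexity pruning criterion mentioned in Section 3.1 is usually written in the following standard form from the CART literature. The paper itself does not state this formula, so the symbols R, α, and the terminal-node count are introduced here only as an illustration.

```latex
R_{\alpha}(T) = R(T) + \alpha \, \lvert \tilde{T} \rvert
```

Here R(T) is the misclassification cost of tree T, |T̃| is its number of terminal nodes, and α ≥ 0 is the complexity parameter. For a given α, pruning keeps the subtree with the lowest value of R_α(T), so larger values of α favor smaller trees.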

3.2. Variable importance

The importance of each variable X (with h levels) in the model is defined by the following equation:

$$\mathrm{VIM}(X) = \sum_{i=1}^{h} \frac{n_{x_i}}{n} \bigl( I(C \mid X = x_i) - (c) \bigr) \tag{3}$$

where C is the class variable (severity), n_{x_i} is the number of cases for which X = x_i, n is the total number of cases, and I is the Gini index.

3.3. Model assessment

In the present study, Receiver Operating Characteristic (ROC) curves and the classification accuracy were employed to evaluate the model performance. Both measures are derived from the classification table. When the aim is to classify the dependent variable into two levels (e.g., positive and negative), the model assigns each sample to one of the two categories based on its predicted probability and a chosen cut-off point. Accordingly, four different cases arise, which are summarized in Table 1.

In Table 1, TP and TN denote the numbers of positive and negative samples, respectively, that have been correctly predicted by the model. In turn, FP and FN denote the numbers of samples that have been wrongly classified as positive and as negative, respectively. To assess the performance of the CART model using the classification table, the following measures need to be calculated first:

$$\text{Sensitivity} = \mathrm{TPR} = \frac{TP}{P} \tag{4}$$

$$1 - \text{Specificity} = \mathrm{FPR} = \frac{FP}{N} \tag{5}$$

$$\text{accuracy} = \frac{TP + TN}{P + N} \tag{6}$$

ROC is a two-dimensional diagram in which the FPR and TPR values are represented on the horizontal and vertical axes, respectively. To draw the diagram, the TPR and FPR values for cut-off points between 0 and 1 are calculated using Eqs. (4) and (5). A curve is then fitted to the points. The area under the curve, which varies between 0.5 and 1, is considered an appropriate measure of the performance of the model in classifying the samples. If the area under the ROC curve is ≥0.9, the model is considered to have outstanding discrimination. If the area under the curve is >0.8 and <0.9, the model is considered to have excellent discrimination, and finally, if the area under the curve is ≥0.7 and <0.8, the model is considered to have acceptable discrimination.

The accuracy of the model is obtained through Eq. (6). As the accuracy approaches 1, the model becomes more powerful at classifying the samples.
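The ROC construction described above can be sketched as follows. This is an illustrative sketch, not the procedure used by the authors: the function names `roc_points` and `area_under_curve` are hypothetical, the labels and scores are toy values, and the area is approximated with the trapezoidal rule as an assumption in place of fitting a curve to the points.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the cut-off point.

    y_true holds 1 for positive and 0 for negative samples; a sample is
    predicted positive when its score is at or above the cut-off.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))    # Eqs. (5) and (4)
    return points


def area_under_curve(points):
    """Trapezoidal approximation of the area under the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical): two negatives and two positives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = area_under_curve(roc_points(toy_labels, toy_scores))
perfect_auc = area_under_curve(roc_points([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))
```

A model that ranks every positive sample above every negative one reaches an area of 1, while the toy scores above, in which one negative outranks one positive, fall in the "acceptable discrimination" band defined in the text.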

Table 1
Classifying the model according to the model prediction.

Predicted    Observed
             Positive    Negative
Positive     TP          FP
Negative     FN          TN
Total        P           N

3.4. Crash data

For this study, Iran crash data maintained by the Iran Traffic Police from 2009 to 2012 were used. As the scope of the present study was to identify the factors influencing fatalities of pillion passengers, all the records pertaining to individual passengers involved in motorcycle crashes were extracted from the original database. In total, 43,538 pillion passengers involved in motorcycle crashes were identified for the analysis.

These data are obtained from the Traffic Accident Record form, KAM 114, which contains important information about the crashes. The information covers different aspects of a traffic crash, including the cause of crash, collision type, vehicle type, area type, lighting condition, weather condition, road surface condition (e.g., dry, wet, etc.), shoulder type, as well as driver characteristics such as age, gender, seat belt/helmet usage, type of driving license, and its issue date.

Moreover, there is another dataset containing passenger characteristics (i.e., age, gender, helmet usage, vehicle type) and their crash severity level. To extract the desired data, the passenger characteristics dataset was joined to the crash characteristics dataset. It should be noted that data on pillion passengers in the “no-injury” class are not recorded on the previously mentioned traffic crash form. After linking the two datasets (i.e., the crash characteristics and the passenger characteristics datasets), one can estimate the fatality and/or injury risk of motorcycle pillion passengers. Note that since each record in the passenger dataset corresponds to an individual passenger, if a motorcycle has two passengers in a crash, there will be two separate records reporting the characteristics and the injury severity outcome of each passenger.
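The linking step can be pictured with a small sketch. The paper does not describe the key used to join the two datasets, so the `crash_id` field, the other field names, and the sample records below are all hypothetical and serve only to show that each passenger keeps a separate record after the join.

```python
def join_passengers_to_crashes(passengers, crashes, key="crash_id"):
    """Attach crash-level attributes to every passenger-level record."""
    crash_by_id = {crash[key]: crash for crash in crashes}
    joined = []
    for passenger in passengers:
        crash = crash_by_id.get(passenger[key])
        if crash is None:
            continue  # passenger record without a matching crash record
        # One output record per passenger, even when a crash has several.
        joined.append({**crash, **passenger})
    return joined


# Hypothetical sample records (not taken from the study data).
sample_crashes = [
    {"crash_id": 1, "area_type": "Urban", "collision_type": "Head on"},
    {"crash_id": 2, "area_type": "Rural", "collision_type": "Rear end"},
]
sample_passengers = [
    {"crash_id": 1, "age": 24, "helmet": "Not-used", "severity": "Injury"},
    {"crash_id": 1, "age": 9, "helmet": "Not-used", "severity": "Injury"},
    {"crash_id": 2, "age": 31, "helmet": "Used", "severity": "Injury"},
]
sample_joined = join_passengers_to_crashes(sample_passengers, sample_crashes)
```

In this toy case the first crash carries two passengers, so the joined data contain two records for it, mirroring the note above about motorcycles with two passengers.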

Study variables and their levels are shown in Table 2. The dependent variable of the model is crash severity with two levels (i.e., injury and fatality).

Table 2 (continued)

Variable                             Description
Primary cause of crash (continued)   3. Failure to obey traffic rules, 4. Failure to detect other vehicles on time
Land use                             1. Residential, 2. Commercial/office, 3. Industrial/manufacturing, 4. Recreational, 5. Agricultural, 6. Educational, 7. Non-residential, 8. Others
Human reason                         1. Fatigue, 2. Disability, 3. Weakness due to aging, 4. Alcohol or drug consumption, 5. Violation of traffic regulations, 6. Speeding violations, 7. Right of way violations, 8. Driver unfamiliar with roadway, 9. Intentional violation of traffic laws, 10. Improper packing, 11. Others, 12. None
Terrain type                         1. Level, 2. Rolling, 3. Mountainous

Table 4
Importance of the variables.

Variables                  Normalized importance
Area type                  100.0%
Land use                   83.7%
Injured part of the body   56.1%
Road type                  31.4%
Geometry                   21.5%
Collision type             19.0%
Terrain type               16.0%
Primary cause of crash     12.2%
Safety helmet              7.3%
Human reason               6.5%
Crash location             5.2%
Age                        4.7%
Weather condition          2.4%
Road surface condition     1.6%
Lighting condition         1.0%
Gender                     0.0%

4. Results

4.1. Prediction accuracy of models

The CART method was employed to build a binary classification model, and the Gini index was used for tree growth. The prior probabilities π(j) were set to be equal for the two classes. A prior probability normally reflects the proportion of each class in the population. However, when one class is much more frequent than the other (as in this study), priors taken from the class proportions in the training data lead the model to assign nearly all records to the dominant class. The overall accuracy is then high, but because of the imbalanced frequency of the crash severity classes (fatality records are generally far fewer than property-damage-only or injury records), the prediction accuracy for the less frequent level (i.e., fatality) decreases. To circumvent this problem, when the levels of the target variable are unbalanced but equally important to predict, it has been suggested to set equal prior probabilities so that the less frequent level is also taken into consideration in predictions (Kashani & Mohaymany, 2011). Although the overall accuracy of the model decreases, the prediction accuracy for the least frequent class increases, which is more important for decision makers in most cases. Many previous studies on crash severity used unbalanced data to build the tree with the CART technique (Montella, Aria, D'Ambrosio, & Mauriello, 2012; Newgard et al., 2002; Pakgohar, Tabrizi, Khalili, & Esmaeili, 2011; Scheetz, Zhang, & Kolassa, 2009).
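The effect of the prior probabilities on how a node is labeled can be sketched with Eq. (2). The root class counts below are the training-sample totals implied by Table 3 (22,204 + 7883 injuries and 88 + 336 fatalities); the node counts and the function name `node_class_probabilities` are hypothetical and chosen only to illustrate the argument.

```python
def node_class_probabilities(node_counts, root_counts, priors):
    """Class probabilities p(j | t) in a node, following Eq. (2)."""
    joint = {j: priors[j] * node_counts[j] / root_counts[j]
             for j in root_counts}
    total = sum(joint.values())
    return {j: joint[j] / total for j in joint}


# Training-sample class totals implied by Table 3.
train_root = {"injury": 22204 + 7883, "fatality": 88 + 336}
n_train = sum(train_root.values())

# Priors proportional to the class frequencies versus equal priors.
empirical_priors = {j: count / n_train for j, count in train_root.items()}
equal_priors = {j: 0.5 for j in train_root}

# Hypothetical node in which fatalities are strongly over-represented
# relative to the root node, yet still a minority of the records.
example_node = {"injury": 1000, "fatality": 50}

p_empirical = node_class_probabilities(example_node, train_root, empirical_priors)
p_equal = node_class_probabilities(example_node, train_root, equal_priors)

label_empirical = max(p_empirical, key=p_empirical.get)
label_equal = max(p_equal, key=p_equal.get)
```

With empirical priors the node probability of fatality is simply the raw share of fatalities in the node, so the node is labeled "injury"; with equal priors the same node is labeled "fatality", which is how equal priors raise the prediction accuracy of the rare class at the expense of overall accuracy.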

Classification accuracy and the area under the ROC curve for the training and testing data are shown in Table 3. According to this table, 73.8% of injured passengers (injuries) and 79.2% of dead passengers (fatalities) in the training data were correctly predicted. In addition, the overall classification accuracy of the model is about 73.9% for the training data and 73.7% for the testing data. The area under the ROC curve is about 0.813 for the training data (0.758 for the testing data), which shows the high power of the model to classify the samples. It should be noted that the prediction accuracy of the model has improved compared with previous studies. This suggests that the 16 variables chosen for this study (see Table 4) might properly predict the crash severity of pillion passengers.
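The percentages quoted above follow from Eqs. (4) to (6) applied to the counts in Table 3. The sketch below treats fatality as the positive class, which is an assumption made here for illustration, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, FPR, and accuracy from Eqs. (4) to (6)."""
    positives = tp + fn            # P: observed positive samples
    negatives = tn + fp            # N: observed negative samples
    return {
        "sensitivity": tp / positives,              # Eq. (4), TPR
        "fpr": fp / negatives,                      # Eq. (5), 1 - specificity
        "specificity": tn / negatives,
        "accuracy": (tp + tn) / (positives + negatives),  # Eq. (6)
    }


# Counts from Table 3, with fatality taken as the positive class.
training = classification_metrics(tp=336, fn=88, tn=22204, fp=7883)
testing = classification_metrics(tp=121, fn=47, tn=9474, fp=3385)

for name, metrics in (("training", training), ("testing", testing)):
    print(name, {k: round(100 * v, 1) for k, v in metrics.items()})
```

The sensitivity corresponds to the share of fatalities correctly predicted, the specificity to the share of injuries correctly predicted, and the accuracy to the overall percentage reported in the table.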

4.2. Variable importance

The importance of the variables in the model is calculated using Eq. (3). Table 4 shows the normalized importance of the model variables. According to the table, area type is the most important variable in the model. Moreover, crash patterns are different for urban and rural areas. This might be due to the different traffic patterns in these two area types (higher mean speeds, a different traffic composition, and lower police control on rural roads compared to urban areas), and therefore the considerable influence of this variable on crash severity was to be expected. Land use, with 83.7% normalized importance, is the second most important variable in the table. The third most important variable is the injured part of the body, with 56.1% normalized importance. Other variables were less important, with their normalized importance varying between 31.4% and 0.0%.

Table 3
Model assessment results (classification table, accuracy and area under the ROC curve (AUC)).

                                 Predicted injury
Sample     Observed injury       Injury    Fatality   Accuracy   AUC
Training   Injury                22,204    7883       73.8%      0.813
           Fatality              88        336        79.2%
           Overall percentage    73.1%     26.9%      73.9%
Testing    Injury                9474      3385       73.7%      0.758
           Fatality              47        121        72.0%
           Overall percentage    73.1%     26.9%      73.7%

An interesting point in Table 4 is the low normalized importance of the helmet variable, even though this variable is one of the primary splitters in the classification tree (see Fig. 2). This might be attributed to the fact that only 0.3% of the pillion passengers involved in the crashes had used a safety helmet. Although helmet usage produces a considerable change in the Gini index (as it is one of the first splitters), the n_{x_i}/n ratio (see Eq. (3)) is very low for the “used” level and therefore, according to Eq. (3), the importance of this variable is low. In other words, the total percentage of passengers who used a helmet is low enough that non-use of a helmet by this group of passengers would not change the fatality to injury ratio.

The table also shows a low importance for lighting condition and sex in the model. This result confirms the study of Kashani and Mohaymany (2011), who reported that the importance of these two variables for the crash severity of passengers on two-lane rural roads is approximately zero. Conversely, de Oña et al. (2013) argued that sex and lighting condition might influence the crash severity of drivers and passengers. The observed discrepancies in the reported results of previous studies might be attributed to differences in the predictor variables used in each of these studies. For instance, safety belt was found to be an important variable in the study conducted by Kashani and Mohaymany (2011), while de Oña et al. (2013) did not consider this variable in their study. As the importance estimated in these studies is relative, the presence of a new variable in the model could result in a considerable change in the importance of other variables (Table 4).

4.3. Decision tree

Fig. 2 shows the decision tree built using the CART method with 70% of the data for training and the remaining data (30%) for testing. The tree has 23 nodes, 12 of which are terminal nodes.

As shown in Fig. 2, the root node is split into two child nodes based on the variable of area type. This indicates that the death probability of pillion passengers involved in crashes on rural roads is significantly higher than in urban areas (4.8% vs. 0.6%). This is similar to the results of the study conducted by Montella et al. (2012), who used the CART method and found the root node split based on the variable of road type. They showed that the probability of a crash being fatal is lower on urban municipal roads than on other road types (1% vs. 5.2%), while 80% of total crashes had occurred on urban municipal roads.

The higher risk of fatal crashes in rural areas might be attributed to the lower police control and the higher presence of heavy vehicles on rural roads, as well as to the denser traffic on urban roads (Li et al., 2009). The results of several other previous studies support the claim that motorcycle crash severity is higher in rural areas than on urban roads (Cafiso et al., 2012; Jou et al., 2012; Li et al., 2009).

Fig. 2. Decision tree built with CART. ⁎In the above Decision Tree the variable of injured part of the body is labeled as the “injuries”.


Node 2, on the right side of the decision tree (DT), is divided into two child nodes (nodes 5 and 6) based on the variable of the injured part of the body. This indicates that on urban roads, head and neck injuries are more likely to be fatal than injuries to other parts of the body (node 6). Node 6 is then split into two terminal nodes (13 and 14) based on the variable of helmet, indicating that the use of a helmet reduces the risk of fatality to zero. Node 5 is split into a child node and a terminal node, indicating that the fatality risk of pillion passengers is higher in recreational, agricultural, or non-residential land uses of urban areas. For other land uses, CART continues to split node 12 into the terminal node 17 and child node 18 based on the collision type. This indicates that in these land uses, head on, angle, and sideswipe collisions are less severe than other collision types. Finally, node 18 is split into two terminal nodes based on the variable of road type. This indicates that if a crash did not occur in recreational, agricultural, or non-residential land uses and the collision type was not head on, angle, or sideswipe, then the crash severity on urban highways would be higher than on other urban road types.

On the left side of the tree, node 1 is split into two child nodes (nodes 3 and 4) based on the variable of helmet usage. This indicates that the crash severity among those pillion passengers who used a helmet is lower, and none of them were killed due to a head and neck injury (node 10). On the other hand, having a head and neck injury increases the risk of being killed among those pillion passengers who did not use a helmet (nodes 7 and 8). CART continues to split node 8 into a terminal node and a child node based on the collision type. Node 15 indicates that head on collisions are more likely to be fatal. Finally, node 16 is split into two terminal nodes based on the weather condition, indicating that in clear and cloudy weather, crashes are more likely to be fatal. It is noteworthy that the variable of land use is less important in crashes occurring on rural roads (left branch of the tree) than on urban roads, where it is a primary splitter.

Node 13 is one of the most important nodes in the built DT, since 28% of passenger fatalities in urban areas are classified in this node (45 out of 160). According to the tree, node 13 comprises the head or neck injuries of those pillion passengers who did not use a helmet. On the other hand, according to node 14, none of the pillion passengers who used a helmet were killed due to a head or neck injury. Thus, the death probability of pillion passengers on urban roads would be expected to fall by 28% if all pillion passengers wore helmets. The situation is the same for rural roads. In node 7, 28% of pillion passenger deaths in rural areas (74 out of 264) were related to those pillion passengers who did not use helmets and had a head injury. In contrast, according to node 10, none of the pillion passengers using helmets were killed due to a head injury. Thus, the fatality risk of pillion passengers would be expected to fall by 28% provided that all pillion passengers wear helmets, regardless of the area type. This confirms the results of previous studies. The helmet has been found to reduce the fatality risk of motorcyclists by 29% (Ulmer & Preusser, 2003). The National Highway Traffic Safety Administration (NHTSA) reported that helmet use has led to a 37% reduction in fatality risk as well as a 65% reduction in the probability of head injury for riders involved in a traffic crash (Mayrose, 2008; Traffic Safety Administration, 2001).
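The 28% figure can be checked directly from the node counts quoted above. The short calculation below only restates the arithmetic and assumes, as the text does, that the fatalities in nodes 13 and 7 would have been avoided with helmet use.

```latex
\text{Urban: } \frac{45}{160} \approx 0.281 \;(28\%), \qquad
\text{Rural: } \frac{74}{264} \approx 0.280 \;(28\%)
```

The two denominators add up to 160 + 264 = 424, which matches the number of fatalities in the training sample of Table 3 (88 + 336).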

5. Conclusion

Many previous studies have investigated the factors influencing motorcycle crash severity, but few have specifically focused on the crash severity of pillion passengers. Extracting the cases involving pillion passengers from all motorcycle crashes, however, might help improve the homogeneity of the analysis data. Furthermore, motorcycle passengers comprise a considerable proportion of traffic crash victims. Hence, in this study, the main factors influencing the mortality of this group of road users have been investigated using the CART method. The CART method has several advantages over other data mining techniques. The first advantage of the CART model is that it does not require variables to be selected in advance. It identifies the most significant variables and eliminates insignificant ones. For example, in this study passenger sex was identified as less important (see Table 3) and, therefore, the CART algorithm excluded this variable from the model. In addition, the CART method does not require any assumptions about the data and is capable of identifying complicated relationships among variables. Moreover, there is no need to determine the dependencies among independent variables in the CART method.
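To illustrate how a tree-growing procedure can rank candidate variables without prior selection, the following is a minimal sketch that scores each variable by its reduction in Gini impurity. It assumes the Gini criterion, which is a common choice in CART implementations but is not confirmed by this section. The four records are hypothetical toy data invented for the example, not crash records from the study, and the function names are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(rows, feature, target="severity"):
    """Impurity reduction obtained by splitting rows on one feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return gini([row[target] for row in rows]) - weighted


# HYPOTHETICAL toy records, for illustration only (not study data).
rows = [
    {"helmet": "no", "sex": "f", "severity": "fatal"},
    {"helmet": "no", "sex": "m", "severity": "fatal"},
    {"helmet": "yes", "sex": "f", "severity": "injury"},
    {"helmet": "yes", "sex": "m", "severity": "injury"},
]

gains = {feature: gini_gain(rows, feature) for feature in ("helmet", "sex")}
print(gains)
```

In this toy set, splitting on helmet use separates the classes completely while splitting on sex yields no impurity reduction, so a greedy tree would choose the former and never use the latter. This mirrors, in a simplified way, how a variable of low importance can drop out of the final model.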

Results show that area type, land use, and injured part of the body might be the most important factors influencing pillion passenger fatalities. Crash severity on rural roads was found to be higher than in urban areas. This might be due to the longer distance to rural medical centers and/or the lower quality of the medical services they provide. Thus, further investigation of the adequacy of intercity medical treatment centers and their effect on the fatality risk of road users might be an interesting topic for future research. To this end, medical information and the traffic police crash database should be linked in order to perform a comprehensive analysis of the main factors that increase the fatality risk of motorcyclists on rural roads compared with urban areas. Moreover, according to the land use variable, crash severity on urban roads was higher in recreational, non-residential, and agricultural land uses. Therefore, identifying the characteristics of motorcycle crashes occurring in these land uses would be an interesting topic for future studies. Finally, results indicate that having a head injury might increase the probability of pillion passenger fatality.

Helmet usage and injured part of the body were the two variables that appeared one after the other in the Decision Tree. This indicates that helmet use could substantially reduce the risk of head injury during a crash. Results also show that the death probability of pillion passengers might be reduced by 28% if all passengers wore helmets. In other words, according to the crash statistics provided in the first section, if all passengers wore a helmet, about one pillion passenger per day might survive. Since pillion passengers might be unfamiliar with traffic safety, launching campaigns to promote awareness of passenger safety issues could help persuade people to use a proper helmet. Stricter helmet usage laws could be another effective countermeasure.
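The "one passenger per day" estimate can be reproduced with a short calculation. This sketch assumes, as the text does, that the 28% share derived from the tree can be applied to the average of 3.4 pillion passenger deaths per day reported in the introduction for 2006 to 2010. The variable and function names are illustrative, and the annual figure is a simple extrapolation that is not reported by the authors.

```python
# Average daily pillion passenger deaths reported in the introduction.
DAILY_PILLION_DEATHS = 3.4
# Share of fatalities attributed to unhelmeted head/neck injuries in the tree.
HELMET_ATTRIBUTABLE_SHARE = 0.28


def expected_lives_saved_per_day(daily_deaths, attributable_share):
    """Expected daily deaths avoided if the attributable share is removed."""
    return daily_deaths * attributable_share


saved_per_day = expected_lives_saved_per_day(
    DAILY_PILLION_DEATHS, HELMET_ATTRIBUTABLE_SHARE
)
# Illustrative extrapolation to a 365-day year (an assumption, not a result).
saved_per_year = saved_per_day * 365

print(f"about {saved_per_day:.2f} passengers per day")
print(f"about {saved_per_year:.0f} passengers per year")
```

The daily value is just under one, which matches the rounded statement in the text.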

It should be noted that mothers and children comprise a large proportion of pillion passengers in Iran. In addition, over 11% (67 out of 592) of passenger fatalities were under 10 years old. Nevertheless, according to the police report, the child helmet usage rate in Iran is near zero. Therefore, another precautionary measure could be to encourage helmet use by children. In terms of future work, an investigation of the interactions between motorcycle rider and passenger characteristics that might influence the crash severity of motorcyclists would be useful.

Acknowledgments

We acknowledge the invaluable help of Mr. Mishani and Mr. Sabbaq from the Information and Technology Department of the Iranian Traffic Police in providing the crash data.

References

Breiman, L. (1998). Classification and regression trees. Chapman & Hall/CRC.

Cafiso, S., La Cava, G., & Pappalardo, G. (2012). A logistic model for Powered Two-Wheelers crash in Italy. Procedia - Social and Behavioral Sciences, 53, 881–890.

Chang, L.-Y., & Chien, J.-T. (2013). Analysis of driver injury severity in truck-involved accidents using a non-parametric classification tree model. Safety Science, 51(1), 17–22.

de Oña, J., López, G., & Abellán, J. (2013). Extracting decision rules from police accident reports through decision trees. Accident Analysis & Prevention, 50, 1151–1160.

Evans, L., & Frick, M. C. (1988). Helmet effectiveness in preventing motorcycle driver and passenger fatalities. Accident Analysis & Prevention, 20(6), 447–458.

Huang, W.-S., & Lai, C.-H. (2011). Survival risk factors for fatal injured car and motorcycle drivers in single alcohol-related and alcohol-unrelated vehicle crashes. Journal of Safety Research, 42(2), 93–99.

Jou, R. C., Yeh, T. H., & Chen, R. S. (2012). Risk factors in motorcyclist fatalities in Taiwan. Traffic Injury Prevention, 13(2), 155–162.

Kasantikul, V., Ouellet, J. V., Smith, T., Sirathranont, J., & Panichabhongse, V. (2005). The role of alcohol in Thailand motorcycle crashes. Accident Analysis & Prevention, 37(2), 357–366.

Kashani, A. T., & Mohaymany, A. S. (2011). Analysis of the traffic injury severity on two-lane, two-way rural roads based on classification tree models. Safety Science, 49(10), 1314–1320.

Li, M.-D., Doong, J.-L., Huang, W.-S., Lai, C.-H., & Jeng, M.-C. (2009). Survival hazards of road environment factors between motor-vehicles and motorcycles. Accident Analysis & Prevention, 41(5), 938–947.

Mayrose, J. (2008). The effects of a mandatory motorcycle helmet law on helmet use and injury patterns among motorcyclist fatalities. Journal of Safety Research, 39(4), 429–432.

Montella, A., Aria, M., D'Ambrosio, A., & Mauriello, F. (2012). Analysis of powered two-wheeler crashes in Italy by classification trees and rules discovery. Accident Analysis & Prevention, 49, 58–72.

Newgard, C. D., Lewis, R. J., & Jolly, B. T. (2002). Use of out-of-hospital variables to predict severity of injury in pediatric patients involved in motor vehicle crashes. Annals of Emergency Medicine, 39(5), 481–491.

Pai, C. W., & Saleh, W. (2007). An analysis of motorcyclist injury severity under various traffic control measures at three-legged junctions in the UK. Safety Science, 45(8), 832–847.

Pakgohar, A., Tabrizi, R. S., Khalili, M., & Esmaeili, A. (2011). The role of human factor in incidence and severity of road crashes based on the CART and LR regression: A data mining approach. Procedia Computer Science, 3, 764–769.

Quddus, M. A., Noland, R. B., & Chin, H. C. (2002). An analysis of motorcycle injury and vehicle damage severity using ordered probit models. Journal of Safety Research, 33(4), 445–462.

Ranney, M. L., Mello, M. J., Baird, J. B., Chai, P. R., & Clark, M. A. (2010). Correlates of motorcycle helmet use among recent graduates of a motorcycle training course. Accident Analysis & Prevention, 42(6), 2057–2062.

Rifaat, S. M., Tay, R., & de Barros, A. (2012). Severity of motorcycle crashes in Calgary. Accident Analysis & Prevention, 49, 44–49.

Savolainen, P., & Mannering, F. (2007). Probabilistic models of motorcyclists' injury severities in single- and multi-vehicle crashes. Accident Analysis & Prevention, 39(5), 955–963.

Scheetz, L. J., Zhang, J., & Kolassa, J. (2009). Classification tree modeling to identify severe and moderate vehicular injuries in young and middle-aged adults. Artificial Intelligence in Medicine, 45(1), 1–10.

Traffic Safety Administration, N.H. (2001). Evaluation of the repeal of motorcycle helmet laws. Annals of Emergency Medicine, 37(2), 229–230.

Ulmer, R. G., & Preusser, D. F. (2003). Evaluation of the repeal of motorcycle helmet laws in Kentucky and Louisiana. (In).

Vlahogianni, E. I., Yannis, G., & Golias, J. C. (2012). Overview of critical risk factors in Power-Two-Wheeler safety. Accident Analysis & Prevention, 49, 12–22.

Ali Tavakoli Kashani earned his Bachelor of Engineering degree in Civil Engineering from Isfahan University of Technology in 2001. He received his Master of Science degree in 2003 and PhD degree in 2011 in Transportation Engineering from Iran University of Science & Technology. He has been Assistant Professor in the transportation department of Iran University of Science & Technology since 2011. He has also been the library director of the School of Civil Engineering and training manager of the Road Safety Applied Research Center at Iran University of Science & Technology since 2006. His teaching courses include: Traffic Safety (for both MSc and PhD students), Advanced Engineering Statistics and Econometrics (for MSc students). He has publications in several journals, including Traffic Injury Prevention, Safety Science, and PROMET - Traffic & Transportation.

Rahim Rabieyan earned his Bachelor of Engineering degree in Civil Engineering from University of Isfahan in 2010. He received his Master of Science degree in 2014 in Transportation Engineering from Iran University of Science & Technology.

Mohammad Mehdi Besharati is currently a PhD student at Iran University of Science and Technology (IUST). He earned his Bachelor of Engineering degree in Civil Engineering from Shahid Chamran University in 2011. He received his Master of Science degree in Transportation Engineering from Iran University of Science and Technology in 2014.