International Journal of Computer Engineering and Applications,
Volume XII, Issue I, Jan. 18, www.ijcea.com ISSN 2321-3469
P. Nithya, S. Sharmila Jeyarani, A. Pradeep Kumar 377
ROUGH SETS FOR ANALYZING ROAD TRAFFIC ACCIDENTS
P. Nithya1, S. Sharmila Jeyarani2, A. Pradeep Kumar3
1,2,3Assistant Professor, NPR Arts and Science College, Natham, Dindigul, India
1nithya26cs@gmail.com, 2sharmimca85@gmail.com, 3pradeepmahesh094@gmail.com
ABSTRACT
Rapid population growth coupled with increased economic activity has led to tremendous growth in the number of vehicles, one of the primary factors responsible for road accidents. In a transportation system, every item of data matters for decision making, yet traffic accident data analysis is a large and complex task because the data are imprecise, uncertain or incomplete. In comparison to traditional techniques, rough set theory yields an optimal result without loss of information through proper analysis. For this reason, this study aims to classify the traffic accidents that took place in 10 cities of Tamilnadu during 2012-2014 according to their causes, to analyze the classified data, and to discover useful knowledge from the database. Rough set theory, a mathematical tool for knowledge discovery, enables us to analyze the accident data in more than one dimension and identify the reasons for accidents without ignoring any of them. The ROSETTA software was used to analyze the accident data in order to reduce redundant data and extract decision rules. The obtained results relate causes to accidents, which could help authorities and decision makers take the precautions necessary to reduce and prevent road traffic accidents.
Keywords: Rough set theory, Traffic Accident data analysis, Rosetta data analysis toolkit
1. INTRODUCTION
A traffic collision occurs when a road vehicle collides with another vehicle, a pedestrian, an animal, or a geographical or architectural obstacle. It can result in injury, property damage and death. Road accidents have been a major cause of injuries and fatalities worldwide for the last few decades and remain one of the world's serious problems. Accidents happen beyond our expectation and no one can fully control them, but knowing their causes is essential: it speeds up the decision making needed to avoid traffic accidents.
WHO reported that over 1.2 million people die each year on the world's roads and 50 million suffer nonfatal injuries [1]. Road traffic injuries are one of the top three causes of death for people aged between 5 and 44 years, and reports suggest that by 2030 traffic accidents and related injuries will be the 5th leading cause of death. Pedestrians, cyclists, drivers of motorized two-wheelers and their passengers account for almost half of global road traffic deaths [1]. A total of 4,00,517 accidental deaths were reported in the country during 2013, an increase of 1.4% over 2012, while 3,94,982 accidental deaths were reported during 2012, an increase of 1.0% over 2011 [2]. In Tamilnadu, more road accidents happened on national highways in 2011 and on state highways in 2012, a pattern that primarily depends on the type of road. Two-wheelers accounted for the majority of traffic accidents in 2011 and 2012, and fault of the driver was the major cause of traffic accidents in 2010 and 2012 [3]. Predicting these causes is important but also a complex task: traffic data is very informative but imprecise.
Fig 1. Population in India 2012-2014
The population of India, described in Fig 1, has increased every year: 1.22 billion in 2012, 1.26 billion in 2013 and 1.27 billion in 2014. This growth is one of the causes of accidents.
Fig 2. Road Traffic Accidents in India 2008-2012
Fig 2 shows the accident rate per year in India between 2008 and 2012; among those years, the rate increased most in 2009. Incidents of road accidents have been steadily mounting in Tamil Nadu from 2008 to 2012. The number of road accidents according to type of vehicle from 2008 to 2012 is given below.
Table .1 Number of Road Accidents According
to Type of vehicles from 2008 to 2012
S.No  Type of Vehicles           2008    2009    2010    2011    2012
1     Bus                         9,506   9,331   8,890   8,295   7,479
2     Truck / Lorry              11,201  10,555  10,712  10,556  10,160
3     Car / Jeep / Taxi / Tempo  15,380  15,943  18,038  18,248  19,533
4     Two Wheelers               15,820  17,274  19,086  19,492  21,947
5     Three Wheelers              4,357   3,747   3,777   3,759   3,260
6     Others                      4,145   3,944   4,493   5,523   5,378
      Total                      60,409  60,409  60,794  65,873  67,757
Table 1 reveals that in 2012 two-wheelers were involved in 21,947 accidents, more than any other vehicle type. Traditional techniques are not capable of producing correct results from the available incomplete or redundant data through the analysis process, whereas rough set theory produces an optimal result without loss of information from the original set [4]. Rough set theory is a mathematical approach primarily meant to acquire perfect knowledge from imperfect data. The main advantages of the rough set approach are that, unlike probability in statistics and the membership function in fuzzy set theory, it does not require any preliminary or additional information about the data; it provides relatively efficient methods, algorithms and tools for finding hidden patterns in data; it allows reducing the original data, i.e. finding minimal sets of data with the same knowledge as the original data; it allows evaluating the significance of data; it allows generating sets of decision rules from data automatically; it is easy to understand; it offers straightforward interpretation of the obtained results; and it is suited to concurrent (parallel/distributed) processing.
Rough sets have been proposed for a variety of applications. They are important for artificial intelligence and the cognitive sciences, especially in machine learning, knowledge discovery, data mining, expert systems, approximate reasoning and pattern recognition [15]. In medicine, a rough set with neural network algorithm has been implemented on a medical data set to test the algorithm's efficiency [13]. A hybrid Rough Neural Network (RNN) methodology was proposed for medical data processing and for predicting animal fertility rate [14], and rough set theory (RST) has proved a useful tool for reducing the input to an artificial neural network and improving the classification and prediction of semen quality [16]. In this work, traffic accident datasets were collected from various sources, including daily newspapers, for the top 10 cities of Tamilnadu. The main objective of this work is to examine the applicability of rough sets for analyzing the causes of road traffic accidents in order to prevent and control vehicle accidents.
The proposed rough set based reduct set gives information about the accident causes. The ROSETTA data analysis tool contains the necessary structures and a range of processing algorithms; in this work we use the Johnson reduction algorithm, which produces optimal results without loss of information from the original set.

Section 2 contains a review of the literature. Section 3 describes the methodology applied for the analysis process. Section 4 deals with the experimentation and Section 5 presents the results and discussion. Section 6 ends with the conclusion and future directions.
2. RELATED APPLICATIONS
Various methods and techniques are used to analyze the causes of traffic accidents, and many research articles analyze and predict them. Some of the literature on traffic analysis and prediction is described in this section.
Olutayo V.A., et al. [5] employed Artificial Neural Network and Decision Tree techniques to discover hidden information from historical accident data on one of Nigeria's busiest roads, with the aim of reducing carnage on highways. Accident records for the first 40 kilometers from Ibadan to Lagos were collected from the Nigeria Road Safety Corps and organized into continuous and categorical data; the continuous data were analyzed with Artificial Neural Networks and the categorical data with Decision Trees. Sensitivity analysis was performed and irrelevant inputs were eliminated. The performance measures used included Mean Absolute Error (MAE), the confusion matrix, accuracy rate, true positives, false positives and the percentage of correctly classified instances. Results reveal that the Decision Tree approach outperformed the Artificial Neural Network with a lower error rate and a higher accuracy rate. This work concludes that the three most important causes of accidents are tyre burst, loss of control and over-speeding.
Khair S. Jadaan, et al. [6] developed a traffic accident prediction model using Artificial Neural Network (ANN) simulation, an approach that has proved successful in solving engineering problems, and aimed to identify its suitability for predicting traffic accidents under Jordanian conditions. Jordan is a developing country with a high and growing level of traffic accidents, resulting in more than 13,000 fatalities between 1989 and 2012 at an average annual cost of over $500 million. They used MATLAB to develop the traffic accident prediction model, and the results demonstrated that the estimated traffic accidents, based on sufficient data, are close enough to the actual figures.
S. Vigneswaran, et al. [7] proposed a model to predict the severity of injury occurring in traffic accidents using two machine-learning approaches. Machine learning is concerned with the design and development of algorithms and is primarily used to recognize complex patterns and make intelligent decisions based on data. Their work focused on recognizing traffic accident patterns in transport data from the Government of Hong Kong. The dataset used contains the traffic accident records of 2008, a total of 34,575 cases, covering drivers' records only and not passengers' information. Using the Weka knowledge explorer to detect the causes of traffic accidents, this work compared the Naive Bayes classifier and the J48 decision tree classifier for classifying the type of injury severity, and the results show that J48 outperforms Naive Bayes.
S. Krishnaveni, et al. [8] developed a model to predict the severity of injury occurring in traffic accidents using classification models, again on the Government of Hong Kong transport dataset of 2008. The experimental work was carried out in the Weka knowledge explorer. The authors compared the Naive Bayes classifier, the AdaBoostM1 meta classifier, the PART
rule classifier, the J48 decision tree classifier and the Random Forest tree classifier for classifying the type of injury severity of various traffic accidents. The final result shows that Random Forest outperforms the other four algorithms.
F. Rezaie Moghaddam, et al. [9] used an artificial neural network (ANN) to predict accident severity. The ANN approach was utilized for crash severity prediction on urban highways and for identifying significant crash-related factors; the models illustrate the simultaneous influence of human factors, road, vehicle, weather conditions and traffic features, including traffic volume and flow speed, on crash severity. The results show that variables such as highway width, head-on collision, type of vehicle at fault, ignoring lateral clearance, following distance, inability to control the vehicle, violating the permissible velocity and deviation to the left by drivers are the most significant factors increasing crash severity on urban highways.
M. Durairaj, et al. [10] proposed an intelligent technique that uses rough set theory for analyzing imprecise medical data. The ROSETTA data analysis tool was used for analyzing tabular data, applying the Johnson and Genetic algorithms for reduction; the Johnson algorithm gave more accurate results than the Genetic algorithm. The influential parameters for predicting the success rate of IVF treatment were identified using rough set applications. This work concluded that the standard voting classifier gave higher accuracy than the other reduct algorithms, and also observed that the rough set based ROSETTA tool is efficient for processing redundant or inaccurate data.
Badreldin O., et al. [11] designed an accident forecasting model for decision making and planning before accidental losses occur. They used rough set theory together with the weighting coefficients of all the forecast models and observed that the forecasting results were more exact in terms of mean relative absolute error: the rough set model scored the lowest reading of 0.51%, compared to 9.16% for ARMA, 14.41% for Expert and 8.38% for the Neural Network. The experimental results of traffic accident prediction showed that the proposed RST combination is more precise and accurate than models such as ARMA, Expert and Neural Network.

In this work, we use rough sets to analyze traffic data and predict accident causes, which can then be used for decision making and prevention.
3. METHODOLOGY
A. Rough Set Theory
Rough Set (RS) theory was proposed by Zdzislaw Pawlak in 1982 [4]. The methodology is concerned with the classification and analysis of imprecise, uncertain or incomplete information and is one of the first non-statistical approaches to data analysis. In comparison with traditional techniques, rough set theory gives an optimal result from the analysis process without loss of information from the original set. It comprises classification, reduction, rule generation and feature selection methods, all of which handle uncertain or imperfect data.
B. Basic Concepts of Rough Set Theory
The RS mathematical tool was introduced to treat vague and imprecise data. The rough set concept can be defined quite generally by means of the interior and closure topological operations known as approximations. A set is a collection of objects, relations and functions with similar characteristics and is a fundamental part of mathematics; a set containing no elements is known as the empty set. Table 2 lists the main concepts of rough set theory.
Table 2. Concepts of Rough set theory

1. Information System
2. Indiscernibility Relation
3. Approximations
4. Decision Table and Decision Algorithms
5. Dependency of Attributes
6. Reduction of Attributes
7. Accuracy
C. Information System
In RST, data sets are shown in table format: each row represents one observation of an object and each column an attribute of that object. Tables created in this way are called information systems. An information system consists of a universal set U and an attribute set A:

IS = (U, A), U = {x1, x2, ..., xn}

where IS is the information system and xi is the i-th object. Every attribute a ∈ A defines an information function fa: U → Va, where Va, the set of values the attribute can take, is also called its domain.
D. Indiscernibility Relation
The indiscernibility relation is a central concept in rough set theory: it is a relation between two or more objects whose values are identical with respect to a subset of the considered attributes. For an information system A = (U, A) and a subset B ⊆ A of the attributes, the relation is written INDA(B). If B(xi) = B(xj), the objects xi and xj are indiscernible with respect to B. Because IND(B) partitions the objects into minimal groups, these groups are called the basic sets of B.
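As a sketch of the idea, the indiscernibility classes of a toy information system can be computed directly from the attribute-value table. The objects and values below are illustrative stand-ins, not records from the paper's accident dataset.

```python
from collections import defaultdict

# Toy information system: four accident records, two attributes.
# Objects and values are illustrative only.
U = {
    "x1": {"Severity": 1, "Causes": "Hit"},
    "x2": {"Severity": 2, "Causes": "Hit"},
    "x3": {"Severity": 2, "Causes": "Collision"},
    "x4": {"Severity": 1, "Causes": "Overturn"},
}

def ind_classes(U, B):
    """Equivalence classes of IND(B): two objects fall in the same class
    iff they agree on every attribute in B."""
    classes = defaultdict(set)
    for x, row in U.items():
        classes[tuple(row[a] for a in B)].add(x)
    return list(classes.values())

print(ind_classes(U, ["Causes"]))              # x1 and x2 are indiscernible
print(ind_classes(U, ["Severity", "Causes"]))  # adding Severity separates them
```

Growing B refines the partition: with more attributes, fewer objects remain indiscernible.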
E. Approximations
Approximation is another key concept in rough set theory, associated with the interior and closure topological operations. The lower and upper approximations of a set are the interior and closure operations in the topology generated by the indiscernibility relation.

An approximation space is an ordered pair A = (U, R), where U is a finite, non-empty set of elements and R is an equivalence relation on U. Any subset B ⊆ A has an associated equivalence relation called the B-indiscernibility relation:

INDA(B) = {(x, y) ∈ U² | ∀a ∈ B, a(x) = a(y)} .......... (1)
If (x, y) ∈ INDA(B), then x and y are indiscernible from each other by the attributes in B. Indiscernibility is an equivalence relation.
F. Lower Approximation B_*(X)
The lower approximation B_*(X) is a description of the domain objects that certainly belong to the subset of interest:

B_*(X) = ∪ {Y ∈ U/IND(B) : Y ⊆ X} .......... (2)

G. Upper Approximation B^*(X)
The upper approximation is a description of the objects that possibly belong to the subset of interest: the upper approximation of a set X with respect to R is the set of all objects that can possibly be classified with X with respect to R.

B^*(X) = ∪ {Y ∈ U/IND(B) : Y ∩ X ≠ ∅} .......... (3)

H. Boundary Region
The difference between the upper and lower approximations is referred to as the boundary region. The B-boundary of X in the information system I is defined as:

BND(X) = B^*(X) − B_*(X) .......... (4)
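Equations (2)-(4) translate into a few lines of Python. The universe and the target set X below are hypothetical stand-ins, not the paper's data.

```python
from collections import defaultdict

# Toy universe partitioned by a single attribute (illustrative values).
U = {"x1": {"Causes": "Hit"}, "x2": {"Causes": "Hit"},
     "x3": {"Causes": "Collision"}, "x4": {"Causes": "Overturn"}}

def ind_classes(U, B):
    """Equivalence classes of IND(B)."""
    classes = defaultdict(set)
    for x, row in U.items():
        classes[tuple(row[a] for a in B)].add(x)
    return list(classes.values())

def approximations(U, B, X):
    """B-lower and B-upper approximations of X (Eqs. 2 and 3)."""
    lower, upper = set(), set()
    for Y in ind_classes(U, B):
        if Y <= X:      # class certainly contained in X
            lower |= Y
        if Y & X:       # class possibly overlapping X
            upper |= Y
    return lower, upper

X = {"x1", "x3"}                       # hypothetical target concept
lower, upper = approximations(U, ["Causes"], X)
print(lower)           # only x3's class lies wholly inside X
print(upper - lower)   # boundary region BND(X) of Eq. 4: contains x1 and x2
```

Here x1 is in the upper but not the lower approximation because it shares its Causes-class with x2, which is outside X, so X is rough with respect to {Causes}.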
I. Decision Table and Decision Algorithms
A decision table contains two types of attributes, designated condition attributes and decision attributes. Each row of the table determines a decision rule, which specifies the decision to be taken when the conditions indicated by the condition attributes hold; the condition attributes determine the decision attributes. In some cases two or more rows have the same or similar condition attribute values but differing decision attribute values; such sets of decision rules are known as inconsistent, non-deterministic or conflicting.

The proportion of consistent rules contained in the decision table is known as the consistency factor, denoted γ(C, D), where C is the condition attribute set and D the decision attribute set. If γ(C, D) = 1 the decision table is consistent; if γ(C, D) ≠ 1 it is inconsistent. A set of decision rules is designated a decision algorithm, since a decision algorithm can be associated with each decision table. A distinction should be made between the two: a decision table is a data set, whereas a decision algorithm is a collection of implications, that is, logical expressions.
J. Dependency of Attributes
Analysis of attribute dependency aims to discover the dependence between attributes. A set of attributes D depends totally on a set of attributes C, denoted C => D, if all values of the attributes in D are uniquely determined by the values of the attributes in C; partial dependency means that only some values of D are determined by the values of C. For D, C ⊆ A we say that D depends on C in degree k (0 ≤ k ≤ 1), denoted C =>k D, where k = γ(C, D). If k = 1 then D depends totally on C, which can also be written I(C) ⊆ I(D); if k < 1, D depends partially on C; and if k = 0 the decision attributes D do not depend on the condition attributes C.
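The degree k = γ(C, D) can be computed as the fraction of objects lying in the C-positive region of D. The small decision table below is hypothetical and deliberately contains one inconsistency so that k < 1.

```python
from collections import defaultdict

# Hypothetical decision table: condition attribute "Causes",
# decision attribute "MainCause". x3 and x4 are deliberately inconsistent.
table = {
    "x1": {"Causes": "Hit",       "MainCause": "Driver"},
    "x2": {"Causes": "Hit",       "MainCause": "Driver"},
    "x3": {"Causes": "Collision", "MainCause": "Machine"},
    "x4": {"Causes": "Collision", "MainCause": "Driver"},
}

def classes(table, attrs):
    """Equivalence classes of IND(attrs)."""
    part = defaultdict(set)
    for x, row in table.items():
        part[tuple(row[a] for a in attrs)].add(x)
    return list(part.values())

def gamma(table, C, D):
    """Dependency degree k = gamma(C, D): the fraction of objects whose
    C-class lies wholly inside a single D-class (the positive region)."""
    d_classes = classes(table, D)
    pos = set()
    for c_cls in classes(table, C):
        if any(c_cls <= d_cls for d_cls in d_classes):
            pos |= c_cls
    return len(pos) / len(table)

print(gamma(table, ["Causes"], ["MainCause"]))  # 0.5: D depends partially on C
```

Only x1 and x2 fall in the positive region; the inconsistent pair x3/x4 is excluded, giving k = 2/4 = 0.5.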
K. Reduction of Attributes
The process of reducing an information system such that the attribute set of the reduced system is independent, and no attribute can be eliminated further without losing information, is known as finding a reduct. A reduct is a minimal attribute subset that retains the dependence degree of the decision attributes on the conditional attributes: R ⊆ B ⊆ A such that γB(Y) = γR(Y) is called a Y-reduct of B, denoted RedY(B).

The core is contained in every legitimate reduct and cannot be removed from the information system without deteriorating its basic knowledge. The set of all indispensable attributes of B is called the Y-core; formally,

coreY(B) = ∩ RedY(B) .......... (8)

i.e. the Y-core is the intersection of all Y-reducts of B and is included in every Y-reduct of B.
L. Accuracy
Accuracy measures how rough a set is. If B_*(X) = B^*(X) = X, the set is precise (called crisp) for every element x ∈ X ⊆ U. This is expressed by the formula

αB(X) = |B_*(X)| / |B^*(X)| .......... (9)

where 0 ≤ αB(X) ≤ 1; if αB(X) = 1, X is crisp with respect to B.
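Equation (9) is a one-line computation once the approximations are known; the approximation sets below are hypothetical.

```python
# Hypothetical lower and upper approximations of a target concept X.
b_lower = {"x3"}                 # objects certainly in X
b_upper = {"x1", "x2", "x3"}     # objects possibly in X

# Eq. (9): accuracy of approximation alpha_B(X).
alpha = len(b_lower) / len(b_upper)
print(alpha < 1)   # True: X is rough, with a nonempty boundary region
```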
4. APPLICATION OF ROUGH SETS FOR ANALYZING TRAFFIC DATA
A. Traffic Accident Dataset
The popularity of vehicles and the resulting road traffic are increasing steadily in today's world, and with the growth in vehicles, transportation-related problems such as traffic congestion, traffic accidents and environmental pollution rise alarmingly [17]. Identifying the causes of accidents is thus very important for avoiding them. The dataset for this study contains traffic accident records from 2012 to 2014, collected from different sources including online material. According to the variable definitions, the dataset covers accident records only and does not include passengers' information. It contains 7 fields and 100 objects (records); Table 2 shows the attributes of the traffic accident data.
Table 2. Road Traffic Accident Attributes

S.No  Attribute                 Values
1     City Name                 1-10: 1-Chennai, 2-Coimbatore, 3-Madurai, 4-Trichy, 5-Salem, 6-Thirunelveli, 7-Tiruppur, 8-Erode, 9-Vellore, 10-Tuticorin
2     No. of Persons Killed     0...N
3     No. of Persons Injured    0...N
4     No. of Vehicles Involved  0...N
5     Severity                  1-Slight, 2-Serious
6     Causes                    1-Overturn, 2-Hit, 3-Loss of control, 4-Over speed, 5-Mishap, 6-Brake failure, 7-Drunk and drive, 8-Tyre burst, 9-Collision
7     Major Cause               1-Driver Fault, 2-Machine Fault
B. Data Analysis
ROSETTA (A Rough Set Toolkit for Analyzing Data)
ROSETTA is a toolkit for analyzing tabular data within the framework of rough set theory. It is designed to support the overall data mining and knowledge discovery (KDD) process. Data mining is the extraction of hidden information from large databases, and the KDD process comprises data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation and knowledge presentation. Within this process, the rough set component performs the basic operations of preprocessing, data cleaning, data splitting, discretization, reduction, rule generation and classification.
Steps involved in processing data are:
a) Import/export
b) Preprocessing
c) Computation
d) Post processing
e) Validation and analysis
Figure 1 illustrates the proposed framework for analyzing traffic data using the ROSETTA tool. Rough set algorithms available in the toolkit carry out the data analysis and produce accurate results compared with traditional techniques. The steps are: importing data from a valid data source (Excel) format; applying a binary splitting algorithm to split the original dataset into training and test data; filling missing values with the mean; and finally applying the reduction and classification algorithms. The reduction algorithm computes the reduct set, and the classification algorithm derives the reduct rules and computes the classification result.
The ROSETTA tool contains many reduction algorithms, such as the Genetic algorithm, Johnson algorithm, Holte's 1R algorithm and the manual reducer. In this work we used the Johnson algorithm for reducing the accident data, because it produces the optimal reduct set from large datasets compared with the other algorithms, and it eliminates unnecessary attributes from the original dataset without loss.
JOHNSON ALGORITHM

Johnson(CA, Df)
  CA - set of conditional attributes
  Df - discernibility function (a set of clauses)

  (1) R = Ø
  (2) while Df is not empty:
  (3)     bestc = 0
  (4)     for each a ∈ CA that appears in Df:
  (5)         c = heuristic(a)
  (6)         if c > bestc: bestc = c; bestAttr = a
  (7)     R = R ∪ {bestAttr}
  (8)     Df = removeClauses(Df, bestAttr)
  (9) return R
Fig. 2. Johnson Algorithm
This is a simple greedy heuristic algorithm applied to discernibility functions to find a single reduct. It begins by setting the current reduct candidate R to the empty set. Each conditional attribute appearing in the discernibility function is then evaluated according to the heuristic measure, which here counts the number of appearances an attribute makes within clauses. The attribute with the highest heuristic value is added to the reduct, and all clauses containing that attribute are removed from the discernibility function. When all clauses have been removed, the algorithm terminates and returns the reduct R.
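The steps above can be turned into a short runnable sketch, representing the discernibility function as a list of clauses (each clause a set of attribute names). The clauses below are hypothetical, chosen only to exercise the greedy loop.

```python
def johnson_reduct(clauses):
    """Greedy Johnson heuristic over a discernibility function: repeatedly
    pick the attribute appearing in the most remaining clauses, add it to
    the reduct, and drop every clause that attribute satisfies."""
    clauses = [set(c) for c in clauses if c]
    reduct = set()
    while clauses:
        counts = {}                      # heuristic(a) = occurrence count
        for clause in clauses:
            for a in clause:
                counts[a] = counts.get(a, 0) + 1
        best = max(counts, key=counts.get)
        reduct.add(best)
        clauses = [c for c in clauses if best not in c]
    return reduct

# Hypothetical discernibility function over three attributes.
df = [{"Causes"}, {"Causes", "Severity"}, {"Severity", "City"}]
print(johnson_reduct(df))   # a small attribute set covering every clause
```

The result intersects every clause, so no pair of objects that was discernible in the original table becomes indiscernible after reduction; being greedy, the algorithm finds a single small reduct rather than all minimal ones.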
S No Reduct Support Length
1 {causes} 100 1
2 {causes} 100 1
3 {causes} 100 1
4 {causes} 100 1
5 {causes} 100 1
6 {causes} 100 1
7 {causes} 100 1
8 {causes} 100 1
9 {causes} 100 1
10 {causes} 100 1
11 {causes} 100 1
12 {causes} 100 1
13 {causes} 100 1
14 {causes} 100 1
15 {Severity} 100 1
16 {Severity} 100 1
17 {causes} 100 1
18 {causes} 100 1
19 {causes} 100 1
20 {Severity} 100 1
21 {causes} 100 1
22 {causes} 100 1
Table 3. Johnson Reduction Output

The Johnson reduction algorithm was applied to predict the traffic accident causes, which can be used to estimate the faults. It produced 22 reduct sets and effectively gave a minimal set of combinations. The output of the Johnson reduction algorithm is shown in Table 3.
5. EXPERIMENTAL RESULTS

The majority of traffic accidents are caused by vehicles crossing the road unlawfully [17]. The traffic data set used in this work contains the attributes City name, No. of persons killed, No. of persons injured, No. of vehicles involved, Severity, Causes and Main cause. Each reduction rule carries the statistics LHS support, RHS support, RHS accuracy, LHS coverage, RHS coverage, RHS stability, LHS length and RHS length. Each row of the reduction rule is called
a descriptor. The left-hand side and right-hand side of a rule are called the antecedent and consequent respectively. The resulting reduction rules are used for the classification process; the rules generated are illustrated in Table 4.

For each rule, the following statistics are given [12]:
LHS Support: number of objects in the training set matching the IF-part.
RHS Support: number of objects in the training set matching the IF-part and the THEN-part (LHS and RHS support are the same unless the THEN-part contains several decisions).
RHS Accuracy: RHS support divided by LHS support (accuracy is 1.0 unless the THEN-part contains several decisions).
LHS Coverage: LHS support divided by the number of objects in the training set.
RHS Coverage: RHS support divided by the number of objects in the decision class listed in the THEN-part of the rule.
RHS Stability: not applicable for the Johnson algorithm (always 1.0).
LHS Length: number of attributes in the IF-part of the rule.
RHS Length: number of decisions in the THEN-part.

S.No  Rule                                                  LHS Supp  RHS Supp  RHS Acc  LHS Cov  RHS Cov   RHS Stab  LHS Len  RHS Len
1     causes(Overturn) => Main Cause(Driver Fault)          2         2         1.0      0.04     0.066667  1.0       1        1
2     causes(Hit) => Main Cause(Driver Fault)               7         7         1.0      0.14     0.233333  1.0       1        1
3     causes(hit) => Main Cause(Driver Fault)               11        11        1.0      0.22     0.366667  1.0       1        1
4     causes(Loss of Control) => Main Cause(Machine Fault)  1         1         1.0      0.02     0.05      1.0       1        1
5     causes(mishap) => Main Cause(Driver Fault)            3         3         1.0      0.06     0.1       1.0       1        1
6     causes(Driver Fault) => Main Cause(Driver Fault)      1         1         1.0      0.02     0.033333  1.0       1        1
7     causes(Over Speed) => Main Cause(Driver Fault)        1         1         1.0      0.02     0.033333  1.0       1        1
8     causes(Break Failure) => Main Cause(Machine Fault)    5         5         1.0      0.1      0.25      1.0       1        1
9     causes(Collision) => Main Cause(Machine Fault)        12        12        1.0      0.24     0.6       1.0       1        1
10    causes(Mishap) => Main Cause(Driver Fault)            1         1         1.0      0.02     0.033333  1.0       1        1
ROSETTA has several classification algorithms that classify all objects in the decision table, including the naive Bayes, batch, standard voting and object voting classifiers. In this work we used the naive Bayes classifier to classify the road accident labels, with the reduction rules helping to classify the result. A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. The reduction rules are used to build the confusion matrix.
Table 5. Confusion matrix
The confusion matrix layout is depicted in Table 5, and the generated confusion matrix is depicted in Table 6. A confusion matrix (also called an error matrix or contingency table) is a table layout that summarizes the performance of a classification algorithm: each column represents the instances of a predicted class, and each row represents the instances of the actual class. For two classes of positive and negative observations:
i. false positives (FP) are negative observations classified into the positive class;
ii. false negatives (FN) are positive observations classified into the negative class;
iii. true positives (TP) are correctly classified positive observations; and
iv. true negatives (TN) are correctly classified negative observations.
Sensitivity and specificity are the fractions of correctly classified positive and negative observations, respectively (i.e. TP/(TP+FN) and TN/(TN+FP)).
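These four counts can be computed directly from paired label lists. The following is a minimal sketch; the accident labels shown are made up for illustration and are not the paper's data.

```python
def confusion(actual, predicted, positive="Driver Fault"):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    return tp, fp, fn, tn

# Illustrative labels only.
actual    = ["Driver Fault", "Driver Fault", "Machine Fault", "Machine Fault"]
predicted = ["Driver Fault", "Machine Fault", "Machine Fault", "Machine Fault"]
tp, fp, fn, tn = confusion(actual, predicted)
print(tp, fp, fn, tn)          # 1 0 1 2
sensitivity = tp / (tp + fn)   # 0.5
specificity = tn / (tn + fp)   # 1.0
```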
Table 6. Generated confusion matrix

                        PREDICTED
ACTUAL             Driver Fault   Machine Fault
Driver Fault             31              1         0.96875
Machine Fault             0             17         1.0
                         1.0        0.944444       0.979592

THR. (0, 1): 0.628    THR. ACC.: 0.628
The classification results in terms of TPR and FPR are obtained as follows:

Sensitivity (True Positive Rate): TPR = TP / (TP + FN)
1 − Specificity (False Positive Rate): FPR = FP / (FP + TN)
Positive Predictive Value: PPV = TP / (TP + FP)
Negative Predictive Value: NPV = TN / (TN + FN)
Accuracy = (TP + TN) / (TP + FP + TN + FN)
For the generated matrix:

Sensitivity = TP / (TP + FN) = 31 / (31 + 1) = 0.96875
Specificity = TN / (TN + FP) = 17 / (17 + 0) = 1.0
PPV = TP / (TP + FP) = 31 / (31 + 0) = 1.0
NPV = TN / (TN + FN) = 17 / (17 + 1) = 0.944444
Accuracy = (TP + TN) / (TP + FP + TN + FN) = 48 / 49 = 0.979592
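The arithmetic above can be checked directly from the Table 6 counts (TP = 31, FN = 1, FP = 0, TN = 17):

```python
# Recompute the metrics from the generated confusion matrix (Table 6).
tp, fn, fp, tn = 31, 1, 0, 17
sensitivity = tp / (tp + fn)                # true positive rate
specificity = tn / (tn + fp)                # true negative rate
ppv = tp / (tp + fp)                        # positive predictive value
npv = tn / (tn + fn)                        # negative predictive value
accuracy = (tp + tn) / (tp + fp + tn + fn)
print(round(sensitivity, 5), specificity, ppv,
      round(npv, 6), round(accuracy, 6))
# 0.96875 1.0 1.0 0.944444 0.979592
```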
The results show that the Naive Bayes algorithm predicted the causes of accidents with an accuracy of about 98%. The receiver operating characteristic (ROC) curve is constructed by plotting the true positive rate against the false positive rate over the full range of possible threshold values. The curve drawn on the actual and predicted results is depicted in Figure 3, and it indicates the high accuracy of the rough set analysis in predicting accident causes on the traffic data sets.
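The threshold sweep that produces the ROC points can be sketched as follows; the scores and labels here are illustrative, not the study's data.

```python
def roc_points(scores, labels, thresholds):
    """Return (FPR, TPR) pairs, one per threshold; a score >= t
    predicts the positive class. Labels are 1 (positive) or 0."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = []
    for t in thresholds:
        tp = sum(s >= t and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= t and y == 0 for s, y in zip(scores, labels))
        pts.append((fp / neg, tp / pos))
    return pts

# Illustrative classifier scores and true labels.
pts = roc_points([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0], [0.2, 0.5, 0.95])
print(pts)  # [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
```

Lowering the threshold moves the point toward (1, 1); a curve hugging the top-left corner, as in Figure 3, indicates an accurate classifier.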
Fig. 3. ROC curve drawn on the actual and predicted results

CONCLUSION

Economic growth in a developing country like India has resulted in the growth of the vehicle manufacturing sector. This growth, together with the increased purchasing power of the people, has ultimately resulted in increased road traffic, and traffic accidents occur every minute, particularly in developing countries. Detecting the causes of accidents is therefore very important, as it helps to reduce or avoid traffic accidents to some extent. Rough sets are one of the tools that can be efficiently applied to processing large data sets. In this paper, we described rough set approaches for detecting and analyzing the causes of accidents, using techniques such as data cleaning, data reduction, rule generation, and classification. We used the Johnson algorithm for reduction and Naive Bayes for classification. The Johnson algorithm gave a minimal reduct set without loss of information from the original data set, and Naive Bayes predicted accident causes with about 98% accuracy. From this work, we conclude that driver fault is the major cause of traffic accidents.