

International Journal of Computer Engineering and Applications, Volume XII, Issue I, Jan. 18, www.ijcea.com, ISSN 2321-3469


ROUGH SETS FOR ANALYZING ROAD TRAFFIC ACCIDENTS

P. Nithya1, S. Sharmila Jeyarani2, A. Pradeep Kumar3

1,2,3 Assistant Professor, NPR Arts and Science College, Natham, Dindigul, India

[email protected], [email protected], [email protected]

ABSTRACT

Rapid growth of population coupled with increased economic activity has led to a tremendous growth in the number of vehicles, which is one of the primary factors responsible for road accidents. In a transportation system, every piece of data is important for decision making, and traffic accident data analysis is a large and complex task because the data are often imprecise, uncertain, or incomplete. In comparison to traditional techniques, rough set theory gives an optimal result without loss of information through proper analysis. For this reason, this study aims to classify and detect the causes of traffic accidents that took place in 10 cities of Tamilnadu during 2012-2014 according to their occurrence reasons, to analyze the classified data, and to discover useful knowledge from the database. Rough set theory, a mathematical tool used for knowledge discovery, enables us to analyze the accident data in more than one dimension and identify reasons for the accidents, so that no accident cause is ignored. The ROSETTA software was used to analyze the accident data, reduce redundant data, and extract decision rules. The obtained results relate causes to accidents and could help authorities and decision makers take the necessary precautions to reduce and prevent road traffic accidents.

Keywords: Rough set theory, Traffic Accident data analysis, Rosetta data analysis toolkit


1. INTRODUCTION

A traffic collision occurs when a road vehicle collides with another vehicle, a pedestrian, an animal, or a geographical or architectural obstacle, and it can result in injury, property damage, and death. Road accidents have been a major cause of injuries and fatalities worldwide for the last few decades and remain one of the most serious problems in the world. Accidents happen beyond our expectation and cannot be fully controlled, but knowing their causes is essential: it speeds up the decision making needed to avoid traffic accidents.

WHO reports that over 1.2 million people die each year on the world's roads and 50 million suffer nonfatal injuries [1]. Road traffic injuries are one of the top three causes of death for people aged between 5 and 44 years, and by 2030 traffic accidents and related injuries are projected to be the fifth leading cause of death. Pedestrians, cyclists, drivers of motorized two-wheelers and their passengers account for almost half of global road traffic deaths [1]. A total of 4,00,517 accidental deaths were reported in India during 2013, an increase of 1.4% over 2012, and 3,94,982 accidental deaths were reported during 2012, an increase of 1.0% over 2011 [2]. In Tamilnadu, most road accidents occurred on national highways in 2011 and on state highways in 2012, which depends primarily on the type of road. Among these accidents, two-wheelers accounted for the majority in 2011 and 2012, and fault of the driver was the major cause of traffic accidents in 2010 and 2012 [3]. Predicting these causes is an important but complex task, because traffic data is very informative but imprecise.

Fig 1. Population in India 2012-2014

The population of India has increased every year, as Fig 1 describes. In 2012 and 2013 the population was 1.22 billion and 1.26 billion respectively, and it increased to 1.27 billion in 2014. This is one of the contributing causes of accidents.

Fig 2. Road Traffic Accidents in India 2008-2012

Fig 2 shows the accident rate per year between 2008 and 2012 in India. Among those years, the accident rate increased most sharply in 2009, and incidents of road accidents have been steadily mounting in Tamil Nadu from 2008 to 2012. The number of road accidents by type of vehicle from 2008 to 2012 is given below.

Table 1. Number of Road Accidents According to Type of Vehicle from 2008 to 2012


S.No  Type of Vehicle            2008     2009     2010     2011     2012
1     Bus                        9,506    9,331    8,890    8,295    7,479
2     Truck / Lorry              11,201   10,555   10,712   10,556   10,160
3     Car / Jeep / Taxi / Tempo  15,380   15,943   18,038   18,248   19,533
4     Two Wheelers               15,820   17,274   19,086   19,492   21,947
5     Three Wheelers             4,357    3,747    3,777    3,759    3,260
6     Others                     4,145    3,944    4,493    5,523    5,378
      Total                      60,409   60,794   64,996   65,873   67,757

Table 1 reveals that in 2012 two-wheelers were involved in 21,947 accidents, more than any other vehicle type. Traditional techniques are not capable of producing correct results from incomplete or redundant data during the analysis process, whereas rough set theory produces an optimal result without loss of information from the original set [4]. Rough set theory is a mathematical approach meant primarily for acquiring useful knowledge from imperfect data. The main advantage of the rough set approach is that, unlike probability in statistics and the membership function in fuzzy set theory, it does not require any preliminary or additional information about the data. It provides relatively efficient methods, algorithms and tools for finding hidden patterns in data; it allows reducing the original data, i.e. finding minimal sets of data carrying the same knowledge as the original data; it allows evaluating the significance of data; it allows generating sets of decision rules from data automatically; it is easy to understand; it offers a straightforward interpretation of the obtained results; and it is suited for concurrent (parallel/distributed) processing.

Rough sets have been proposed for a variety of applications. They are important for artificial intelligence and the cognitive sciences, especially in machine learning, knowledge discovery, data mining, expert systems, approximate reasoning and pattern recognition [15]. In medicine, a rough set with neural network algorithm has been implemented on a medical data set to test the efficiency of the algorithm [13]. A methodology based on a hybrid Rough Neural Network (RNN) algorithm was proposed for medical data processing and for predicting animal fertility rate [14], and rough set theory (RST) has been used to reduce the input to an artificial neural network and improve the classification and prediction of semen quality [16]. In this work, traffic accident datasets are collected from various sources, including daily newspapers, for the top 10 cities of Tamilnadu. The main objective of this work is to examine the applicability of rough sets for analyzing the causes of road traffic accidents in order to prevent and control vehicle accidents.

The proposed rough set based reduct set gives information about the accident causes. The ROSETTA data analysis tool contains the required data structures and different processing algorithms. In this work, we use the Johnson reduction algorithm, which produces optimal results without loss of information in the original set.

Section 2 contains a review of the literature. Section 3 describes the methodology applied for the analysis process. Section 4 deals with the experimentation, and Section 5 presents the results and discussion. Section 6 ends with the conclusion and future directions.

2. RELATED APPLICATIONS


Various methods and techniques have been used to analyze the causes of traffic accidents, and there are many research articles that analyze and predict them. Some of the literature on traffic accident analysis and prediction is described in this section.

Olutayo V.A, et al. [5] employed Artificial Neural Networks and Decision Tree techniques to discover hidden information from historical accident data on one of Nigeria's busiest roads, with the aim of reducing carnage on highways. Records of accidents on the first 40 kilometres from Ibadan to Lagos were collected from the Nigeria Road Safety Corps and organized into continuous and categorical data; the continuous data were analyzed using Artificial Neural Networks and the categorical data using Decision Trees. Sensitivity analysis was performed and irrelevant inputs were eliminated. The performance measures used include Mean Absolute Error (MAE), the confusion matrix, accuracy rate, true positives, false positives and the percentage of correctly classified instances. The results reveal that the Decision Tree approach outperformed the Artificial Neural Network with a lower error rate and a higher accuracy rate. This work concludes that the three most important causes of accidents are tyre burst, loss of control and over-speeding.

Khair S. Jadaan, et al. [6] developed a traffic accident prediction model using Artificial Neural Network (ANN) simulation. ANNs are a novel approach that has proved successful in solving engineering problems, and the authors aimed to identify their suitability for predicting traffic accidents under Jordanian conditions. Jordan is a developing country with a high and growing level of traffic accidents, resulting in more than 13,000 fatalities between 1989 and 2012 with an average annual cost of over $500 million. MATLAB was used to develop the traffic accident prediction model, and the results demonstrated that the estimated traffic accidents, based on sufficient data, are close enough to the actual figures.

S. Vigneswaran, et al. [7] proposed a model to predict the severity of injury in traffic accidents using two machine-learning approaches. Machine learning is concerned with the design and development of algorithms and is primarily used to recognize complex patterns and make intelligent decisions based on data. Their work focused on recognizing traffic accident patterns in transport data of the Government of Hong Kong. The dataset used for the study contains traffic accident records of 2008, a total of 34,575 cases; it contains drivers' records only and does not include passengers' information. They used the Weka knowledge explorer for detecting the causes of traffic accidents. The work compared the Naive Bayesian classifier and the J48 decision tree classifier for classifying the type of injury severity of various traffic accidents, and the results show that J48 outperforms Naive Bayes.

S. Krishnaveni, et al. [8] developed a model to predict the severity of injury in traffic accidents based on classification models, also using the 2008 transport dataset of the Government of Hong Kong. The experimental work was carried out with the Weka knowledge explorer. The authors compared the Naive Bayes classifier, the AdaBoostM1 meta classifier, the PART rule classifier, the J48 decision tree classifier and the Random Forest tree classifier for classifying the type of injury severity of various traffic accidents. The final result shows that Random Forest outperforms the other four algorithms.

F. Rezaie Moghaddam, et al. [9] used artificial neural networks (ANN) to predict accident severity. The ANN approach was utilized for crash severity prediction on urban highways and for identifying significant crash-related factors; the models illustrate the simultaneous influence of human factors, road, vehicle, weather conditions and traffic features, including traffic volume and flow speed, on crash severity. The results show that variables such as highway width, head-on collision, type of vehicle at fault, ignoring lateral clearance, following distance, inability to control the vehicle, violating the permissible velocity and deviation to the left by drivers are the most significant factors that increase crash severity on urban highways.

M. Durairaj, et al. [10] proposed an intelligent technique that uses rough set theory for analyzing imprecise medical data, using the ROSETTA data analysis tool for tabular data. The rough set framework used the Johnson algorithm and a genetic algorithm for reduction, and the Johnson algorithm gave more accurate results than the genetic algorithm. In this work, the parameters influencing the prediction of the success rate of IVF treatment were identified using rough set applications. The work concluded that the standard voting classifier gave higher accuracy than the other reduct algorithms, and observed that the rough set based ROSETTA tool is an efficient tool for processing redundant or inaccurate data.

Badreldin O, et al. [11] designed an accident forecasting model for decision making and planning before accidental losses occur. They used rough set theory together with the weighting coefficients of all the forecast models and observed that the forecasting results were more exact based on the mean relative absolute error. The rough set model scored the lowest reading, 0.51%, compared to 9.16% for ARMA, 14.41% for Expert and 8.38% for Neural Network. The experimental results of traffic accident prediction showed that the performance of the proposed RST combination is more precise and accurate compared to other models such as ARMA, Expert and Neural Network.

In this work, we use rough sets for analyzing traffic data and predicting accident causes, which can be used for decision making and prevention.

3. METHODOLOGY

A. Rough Set Theory

Rough Set (RS) theory was proposed by Zdzislaw Pawlak in 1982 [4]. This methodology is concerned with the classification and analysis of imprecise, uncertain or incomplete information and is one of the first non-statistical approaches in data analysis. Rough set methodology is an efficient and elegant mathematical technique; in comparison with traditional techniques, it gives an optimal result from the analysis process without loss of information in the original set. Rough set theory consists of classification, reduction, rule generation and feature selection methods, which handle uncertain or imperfect data.


B. Basic Concepts of Rough Set Theory

The RS mathematical tool was introduced to treat vague and imprecise data. The rough set concept can be defined quite generally by means of the interior and closure topological operations known as approximations. A set is a collection of objects with similar characteristics, together with relations and functions; a set of objects possessing similar characteristics is a fundamental notion of mathematics, and a set containing no elements is known as the empty set. Table 2 lists the concepts of rough set theory.

Table 2. Concepts of Rough Set Theory

S.No  Concept
1     Information System
2     Indiscernibility Relation
3     Approximations
4     Decision Table and Decision Algorithms
5     Dependency of Attributes
6     Reduction of Attributes
7     Accuracy

C. Information System

In RST, data sets are represented in a table format. Each row of the table represents one object (situation) and each column represents an attribute of that object. Tables created in this way are called information systems. An information system consists of a universal set U and an attribute set A:

IS = (U, A), with U = {x1, x2, x3, ..., xn}

where IS is the information system and xi is the i-th object. For each attribute a ∈ A, an information function fa: U → Va is defined, where Va is the set of values (the domain) of attribute a.

D. Indiscernibility Relation

The indiscernibility relation is a central concept in rough set theory. It is a relation between two or more objects whose values are identical with respect to a subset of the considered attributes. For an information system A = (U, A), the indiscernibility relation for an attribute subset B ⊆ A is written INDA(B). Two objects xi and xj are indiscernible with respect to B when B(xi) = B(xj). Because IND(B) groups the objects into minimal indistinguishable groups, these equivalence classes of B are called the basic sets of the information system.
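As an illustration of how IND(B) partitions the universe, the following is a minimal Python sketch: grouping objects by their value tuple over B yields the equivalence classes. The objects, attribute names and values are invented for illustration and are not taken from the paper's data set.

    from collections import defaultdict

    def indiscernibility_classes(universe, attributes):
        # Two objects fall into the same class of IND(B) when they agree
        # on every attribute in B; group objects by their value tuple.
        classes = defaultdict(set)
        for obj_id, record in universe.items():
            key = tuple(record[a] for a in attributes)
            classes[key].add(obj_id)
        return list(classes.values())

    # Toy information system; attribute names and values are illustrative only.
    U = {
        "x1": {"Severity": "Slight",  "Causes": "Hit"},
        "x2": {"Severity": "Slight",  "Causes": "Hit"},
        "x3": {"Severity": "Serious", "Causes": "Collision"},
        "x4": {"Severity": "Serious", "Causes": "Brake failure"},
    }

    print(indiscernibility_classes(U, ["Severity"]))            # [{'x1', 'x2'}, {'x3', 'x4'}]
    print(indiscernibility_classes(U, ["Severity", "Causes"]))  # finer partition: x3 and x4 split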

E. Approximations

Approximation is another key concept in rough set theory, associated with the topological operations of interior and closure: the lower and upper approximations of a set correspond to the interior and closure operations in the topology generated by the indiscernibility relation.

An approximation space is an ordered pair A = (U, R), where U is a finite, non-empty set of objects and R is an equivalence relation on U. Any attribute subset B ⊆ A has an associated equivalence relation called the B-indiscernibility relation:

INDA(B) = {(x, y) ∈ U² | ∀a ∈ B, a(x) = a(y)} .......... (1)


If (x, y) ∈ INDA(B), then x and y are indiscernible from each other by the attributes in B. The indiscernibility relation is an equivalence relation.

F. Lower Approximation B_*(X)

The lower approximation B_*(X) is a description of the domain objects that are known with certainty to belong to the subset of interest:

B_*(X) = ∪ {Y ∈ U/IND(B) : Y ⊆ X} .......... (2)

G. Upper Approximation B^*(X)

The upper approximation is a description of the objects that possibly belong to the subset of interest; the upper approximation of a set X with respect to R is the set of all objects which can possibly be classified with X regarding R:

B^*(X) = ∪ {Y ∈ U/IND(B) : Y ∩ X ≠ ∅} .......... (3)

H. Boundary Region

The difference between the upper and lower approximation is referred to as the boundary region. The B-boundary of X in the information system I is defined as:

BND_B(X) = B^*(X) − B_*(X) .......... (4)
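Equations (2)-(4) translate directly into set operations over the equivalence classes of IND(B). A minimal Python sketch, using an invented partition and target set X, is shown below.

    def lower_approximation(classes, X):
        # B_*(X): union of equivalence classes entirely contained in X (eq. 2).
        return set().union(*([c for c in classes if c <= X] or [set()]))

    def upper_approximation(classes, X):
        # B^*(X): union of equivalence classes that intersect X (eq. 3).
        return set().union(*([c for c in classes if c & X] or [set()]))

    def boundary_region(classes, X):
        # BND_B(X) = B^*(X) - B_*(X) (eq. 4).
        return upper_approximation(classes, X) - lower_approximation(classes, X)

    # Invented partition of U induced by some attribute subset B, and a target concept X.
    classes = [{"x1", "x2"}, {"x3"}, {"x4", "x5"}]
    X = {"x1", "x2", "x4"}

    print(lower_approximation(classes, X))   # {'x1', 'x2'}
    print(upper_approximation(classes, X))   # {'x1', 'x2', 'x4', 'x5'}
    print(boundary_region(classes, X))       # {'x4', 'x5'}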

I. Decision Table and Decision Algorithms

A decision table contains two types of attributes, designated as condition attributes and decision attributes. Each row of the table determines a decision rule, which specifies the decision that must be taken when the conditions indicated by the condition attributes hold; the condition attributes determine the decision attributes. In some cases two or more rows have the same or similar condition attribute values but differing decision attribute values; such sets of decision rules are known as inconsistent, non-deterministic or conflicting.

The proportion of consistent rules contained in the decision table is known as the consistency factor, denoted γ(C, D), where C is the set of condition attributes and D the decision attribute. If γ(C, D) = 1 the decision table is consistent, and if γ(C, D) ≠ 1 the decision table is inconsistent. A set of decision rules is designated a decision algorithm, since each decision table can be associated with a decision algorithm. A distinction can be made between a decision algorithm and a decision table: a decision table is a data set, whereas a decision algorithm is a collection of implications, that is, logical expressions.

J. Dependency of Attributes

Analysis of the dependency of attributes aims to discover dependencies between attributes. A set of attributes D depends totally on a set of attributes C, denoted C ⇒ D, if all values of the attributes in D are uniquely determined by the values of the attributes in C. Partial dependency means that only some values of D are determined by the values of C. If D and C are subsets of A, then D depends on C in degree k (0 ≤ k ≤ 1), denoted C ⇒k D, where k = γ(C, D). If k = 1, D depends totally on C, and this dependency is denoted I(C) ⊆ I(D). If k < 1, D depends partially on C, and if k = 0 the decision attributes D do not depend on the condition attributes C.
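The dependency degree can be computed as k = γ(C, D) = |POS_C(D)| / |U|, where POS_C(D) collects every C-class that fits inside a single D-class. A small Python sketch with invented records illustrates this.

    from collections import defaultdict

    def partition(universe, attrs):
        # Equivalence classes of IND(attrs) over a dict of object records.
        classes = defaultdict(set)
        for obj, rec in universe.items():
            classes[tuple(rec[a] for a in attrs)].add(obj)
        return list(classes.values())

    def dependency_degree(universe, C, D):
        # gamma(C, D) = |POS_C(D)| / |U|: fraction of objects whose D-class
        # is determined unambiguously by their values on C.
        c_classes = partition(universe, C)
        d_classes = partition(universe, D)
        positive = set()
        for c in c_classes:
            if any(c <= d for d in d_classes):   # C-class fits inside one D-class
                positive |= c
        return len(positive) / len(universe)

    # Invented records for illustration; not the paper's data set.
    U = {
        "x1": {"Causes": "Hit",           "MainCause": "Driver Fault"},
        "x2": {"Causes": "Hit",           "MainCause": "Driver Fault"},
        "x3": {"Causes": "Brake failure", "MainCause": "Machine Fault"},
        "x4": {"Causes": "Collision",     "MainCause": "Machine Fault"},
    }
    print(dependency_degree(U, ["Causes"], ["MainCause"]))   # 1.0 -> D depends totally on C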


K. Reduction of Attributes

The process of reducing an information system such that the set of attributes of the reduced system is independent, and no attribute can be eliminated further without losing some information from the system, is known as finding a reduct. A reduct is a minimal attribute subset that preserves the degree of dependency of the decision attributes on the conditional attributes. A subset R ⊆ B ⊆ A such that γB(Y) = γR(Y) is called a Y-reduct of B, denoted RedY(B).

The core is contained in every legitimate reduct and cannot be removed from the information system without deteriorating the basic knowledge of the system. The set of all indispensable attributes of B is called the Y-core; formally,

coreY(B) = ∩ RedY(B) .......... (8)

The Y-core is the intersection of all Y-reducts of B and is included in every Y-reduct of B.

L. Accuracy

Accuracy measures how rough a set is. If B_*(X) = B^*(X) = X, the set is precise (crisp) with respect to B, and every element x ∈ X is classified with certainty. Roughness is expressed by the formula

α_B(X) = |B_*(X)| / |B^*(X)| .......... (9)

where 0 ≤ α_B(X) ≤ 1; if α_B(X) = 1, X is crisp with respect to B.
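A one-line computation of equation (9), reusing the illustrative approximations from the earlier sketch, is shown below.

    def roughness_accuracy(lower, upper):
        # alpha_B(X) = |B_*(X)| / |B^*(X)| (eq. 9); 1.0 means X is crisp w.r.t. B.
        return len(lower) / len(upper) if upper else 1.0

    # Reusing the illustrative approximations computed in the earlier sketch.
    print(roughness_accuracy({"x1", "x2"}, {"x1", "x2", "x4", "x5"}))   # 0.5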

4. Application of Rough Sets for Analyzing Traffic Data

A. Traffic Accident Dataset

The popularity of vehicles and the resulting road traffic are increasing steadily in today's modern world. As a result of the growing number of vehicles, transportation-related problems such as traffic congestion, traffic accidents and environmental pollution are rising alarmingly [17]. Identifying the causes of accidents is therefore very important for avoiding them. The dataset for this study contains traffic accident records from 2012 to 2014, collected from different sources including online materials. According to the variable definitions, the dataset contains accident records only and does not include passengers' information. The traffic accident dataset contains 7 fields and 100 objects (records). Table 2 shows the attributes of the traffic accident data.

Table 2. Road Traffic Accident Attributes

S.No  Attribute                 Values
1     City Name                 1-10: 1-Chennai, 2-Coimbatore, 3-Madurai, 4-Trichy, 5-Salem, 6-Thirunelveli, 7-Tiruppur, 8-Erode, 9-Vellore, 10-Tuticorin
2     No. of Persons Killed     0 - N
3     No. of Persons Injured    0 - N
4     No. of Vehicles Involved  0 - N
5     Severity                  1-Slight, 2-Serious
6     Causes                    1-Overturn, 2-Hit, 3-Loss of control, 4-Over speed, 5-Mishap, 6-Brake failure, 7-Drunk and drive, 8-Tyre burst, 9-Collision
7     Major Cause               1-Driver Fault, 2-Machine Fault

B. Data Analysis

ROSETTA (A Rough Set Toolkit for Analysis of Data)

ROSETTA is a toolkit for analyzing tabular data within the framework of rough set theory. It is designed in particular to support the overall data mining and knowledge discovery (KDD) process. Data mining is the process of extracting hidden information from large databases, and the KDD process comprises data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation and knowledge presentation. The RS framework performs basic operations such as preprocessing, data cleaning, data splitting, data discretization, data reduction, rule generation and classification.

Steps involved in processing the data are:
a) Import/export
b) Preprocessing
c) Computation
d) Post-processing
e) Validation and analysis

Figure 1 illustrates the proposed framework for analyzing traffic data using the ROSETTA tool. The rough set algorithms available in the ROSETTA toolkit carry out the data analysis and produce more accurate results than traditional techniques. The steps are: importing data from a valid data source (Excel) format, applying a binary splitting algorithm to split the original dataset into training and test data, filling missing values with the mean, and finally applying the reduction and classification algorithms. The reduction algorithm is used to compute the reduct set, and the classification algorithm is used to derive decision rules and compute the classification result.
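Outside of ROSETTA, the import, mean-filling and binary-splitting steps described above could be sketched with pandas and scikit-learn as follows; the file name and the 70/30 split ratio are assumptions, not values given in the paper.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Hypothetical file name; the paper imports its accident table from Excel into ROSETTA.
    data = pd.read_excel("traffic_accidents.xlsx")

    # Mean-filling step: replace missing numeric values with the column mean.
    numeric_cols = data.select_dtypes("number").columns
    data[numeric_cols] = data[numeric_cols].fillna(data[numeric_cols].mean())

    # Binary split of the original data set into training and test portions (ratio assumed).
    train, test = train_test_split(data, test_size=0.3, random_state=42)
    print(len(train), len(test))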

The ROSETTA tool contains many algorithms for reduction, such as the genetic algorithm, the Johnson algorithm, Holte's 1R algorithm, and a manual reducer. In this work we used the Johnson algorithm for reducing the accident data, because it produces an optimal reduct set from large datasets compared with the other algorithms, and it eliminates unnecessary attributes from the original dataset without loss of information.

JOHNSON ALGORITHM

Johnson(CA, Df)
  CA - set of conditional attributes
  Df - discernibility function

(1) R = Ø
(2) while (Df is not empty)
(3)   bestc = 0
(4)   for each a ∈ CA that appears in Df
(5)     c = heuristic(a)
(6)     if (c > bestc) then bestc = c; bestAttr = a
(7)   R = R ∪ {bestAttr}
(8)   Df = removeClauses(Df, bestAttr)
(9) return R


Fig. 2. Johnson Algorithm

This is a simple greedy heuristic algorithm applied to the discernibility function to find a single reduct. The algorithm begins by setting the current reduct candidate R to the empty set. Then each conditional attribute appearing in the discernibility function is evaluated according to the heuristic measure, which counts the number of clauses in which the attribute appears. The attribute with the highest heuristic value is added to the reduct, and every clause containing this attribute is removed from the discernibility function. When all clauses have been removed, the algorithm terminates and returns the reduct R.
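The pseudocode in Fig. 2 can be expressed as a short, self-contained Python sketch; the discernibility clauses below are invented for illustration, and the heuristic simply counts clause appearances as described above.

    def johnson_reduct(clauses):
        # Greedy Johnson heuristic over a discernibility function.
        # Each clause is the set of attributes that discerns one pair of objects.
        clauses = [set(c) for c in clauses if c]
        reduct = set()
        while clauses:
            counts = {}
            for clause in clauses:              # heuristic: count clause appearances
                for a in clause:
                    counts[a] = counts.get(a, 0) + 1
            best = max(counts, key=counts.get)  # attribute with the highest heuristic value
            reduct.add(best)
            clauses = [c for c in clauses if best not in c]   # drop satisfied clauses
        return reduct

    # Invented discernibility clauses; attribute names follow the data set above.
    clauses = [
        {"Causes", "Severity"},
        {"Causes"},
        {"Causes", "City"},
        {"Severity", "City"},
    ]
    print(johnson_reduct(clauses))   # e.g. {'Causes', 'Severity'}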

Table 3. Johnson Reduction Output

S.No  Reduct      Support  Length
1     {causes}    100      1
2     {causes}    100      1
3     {causes}    100      1
4     {causes}    100      1
5     {causes}    100      1
6     {causes}    100      1
7     {causes}    100      1
8     {causes}    100      1
9     {causes}    100      1
10    {causes}    100      1
11    {causes}    100      1
12    {causes}    100      1
13    {causes}    100      1
14    {causes}    100      1
15    {Severity}  100      1
16    {Severity}  100      1
17    {causes}    100      1
18    {causes}    100      1
19    {causes}    100      1
20    {Severity}  100      1
21    {causes}    100      1
22    {causes}    100      1

The Johnson reduction algorithm was applied to predict the traffic accident causes, which can be used to estimate the faults. It produced 22 reduct set combinations and also gave a minimal set of combinations effectively. The output of the Johnson reduction algorithm is shown in Table 3.

5. Experimental Results

The majority of traffic accidents are caused by vehicles crossing the road unlawfully [17]. The traffic dataset used in this work contains the attributes City name, No. of persons killed, No. of persons injured, No. of vehicles involved, Severity, Causes and Main cause.

Each generated rule is described by its LHS support, RHS support, RHS accuracy, LHS coverage, RHS coverage, RHS stability, LHS length and RHS length. Each row of the rule set consists of descriptors; the left-hand side and the right-hand side of a rule are called the antecedent and the consequent, respectively. The resulting rules are used for the classification process, and the generated rules are illustrated in Table 4.

For each rule, the following statistics are given:
LHS support: number of objects in the training set matching the IF-part.
RHS support: number of objects in the training set matching the IF-part and the THEN-part (LHS and RHS support are the same unless the THEN-part contains several decisions).
RHS accuracy: RHS support divided by LHS support (accuracy is 1.0 unless the THEN-part contains several decisions).
LHS coverage: LHS support divided by the number of objects in the training set.
RHS coverage: RHS support divided by the number of objects in the decision class listed in the THEN-part of the rule.
RHS stability: not applicable for the Johnson algorithm (always 1.0).
LHS length: number of attributes in the IF-part of the rule.
RHS length: number of decisions in the THEN-part [12].

Table 4. Generated Decision Rules

S.No  Rule                                                   LHS S  RHS S  RHS Acc  LHS Cov  RHS Cov   RHS St  LHS L  RHS L
1     causes(Overturn) => Main Cause(Driver Fault)           2      2      1.0      0.04     0.066667  1.0     1      1
2     causes(Hit) => Main Cause(Driver Fault)                7      7      1.0      0.14     0.233333  1.0     1      1
3     causes(hit) => Main Cause(Driver Fault)                11     11     1.0      0.22     0.366667  1.0     1      1
4     causes(Loss of Control) => Main Cause(Machine Fault)   1      1      1.0      0.02     0.05      1.0     1      1
5     causes(mishap) => Main Cause(Driver Fault)             3      3      1.0      0.06     0.1       1.0     1      1
6     causes(Driver Fault) => Main Cause(Driver Fault)       1      1      1.0      0.02     0.033333  1.0     1      1
7     causes(Over Speed) => Main Cause(Driver Fault)         1      1      1.0      0.02     0.033333  1.0     1      1
8     causes(Break Failure) => Main Cause(Machine Fault)     5      5      1.0      0.1      0.25      1.0     1      1
9     causes(Collision) => Main Cause(Machine Fault)         12     12     1.0      0.24     0.6       1.0     1      1
10    causes(Mishap) => Main Cause(Driver Fault)             1      1      1.0      0.02     0.033333  1.0     1      1


ROSETTA has several algorithms for classification, which classify all objects in the decision table; the available classifiers include the Naive Bayes classifier, the batch classifier, the standard voting classifier and the object voting classifier. In this work we used the Naive Bayes classifier for classifying the road accident labels, with the generated rules helping to classify the result. A Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. The generated rules are used to build the confusion matrix.
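ROSETTA runs its Naive Bayes classifier internally; purely as an outside illustration of the same step, the sketch below trains scikit-learn's CategoricalNB on integer-coded accident attributes. The sample rows are invented, and only the coding scheme follows Table 2.

    import numpy as np
    from sklearn.naive_bayes import CategoricalNB
    from sklearn.metrics import confusion_matrix

    # Integer-coded toy sample: columns = [Severity, Causes], target = Major Cause.
    # The coding scheme follows Table 2 (1-Slight/2-Serious, cause codes 1-9,
    # 1-Driver Fault/2-Machine Fault), but the rows themselves are invented.
    X = np.array([[1, 2], [2, 2], [1, 6], [2, 9], [1, 4], [2, 6], [1, 2], [2, 9]])
    y = np.array([1, 1, 2, 2, 1, 2, 1, 2])

    model = CategoricalNB()
    model.fit(X, y)

    pred = model.predict(X)
    print(confusion_matrix(y, pred))   # rows = actual class, columns = predicted class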

Table 5. Confusion Matrix (layout)

                     Predicted Positive   Predicted Negative
Actual Positive      TP                   FN
Actual Negative      FP                   TN

The confusion matrix layout is depicted in Table 5 and the generated confusion matrix in Table 6. A confusion matrix is a table layout that summarizes the performance of an algorithm: each column represents the instances of a predicted class and each row represents the instances of the actual class. It is also called an error matrix or contingency table. For the two classes of positive and negative observations:
i. false positives (FP) are negative observations classified into the positive class,
ii. false negatives (FN) are positive observations classified into the negative class,
iii. true positives (TP) are correctly classified positive observations, and
iv. true negatives (TN) are correctly classified negative observations.
Sensitivity and specificity are the fractions of correctly classified positive and negative observations, respectively, i.e. TP/(TP+FN) and TN/(TN+FP).

Table 6. Generated Confusion Matrix

                              PREDICTED
ACTUAL            Driver Fault   Machine Fault
Driver Fault      31             1               0.96875
Machine Fault     0              17              1.0
                  1.0            0.944444        0.979592

THR. (0, 1): 0.628    THR. ACC.: 0.628


The classification results in terms of TPR and FPR are obtained as follows:

Sensitivity (True Positive Rate): TPR = TP / (TP + FN)
False Positive Rate (1 − Specificity): FPR = FP / (FP + TN)
PPV = TP / (TP + FP)
NPV = TN / (TN + FN)
Accuracy = (TP + TN) / (TP + FP + TN + FN)

Sensitivity for the actual class = TP / (TP + FN) = 31 / (31 + 1) = 0.96875
Specificity for the actual class = TN / (TN + FP) = 17 / (17 + 0) = 1.0
PPV for the predicted class = TP / (TP + FP) = 31 / (31 + 0) = 1.0
NPV for the predicted class = TN / (TN + FN) = 17 / (17 + 1) = 0.944444
Accuracy = (TP + TN) / (TP + FP + TN + FN) = 48 / 49 = 0.979592
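The same figures can be reproduced directly from the counts in Table 6, taking Driver Fault as the positive class; the small sketch below computes them.

    def classification_metrics(tp, fn, fp, tn):
        # Sensitivity, specificity, PPV, NPV and accuracy from a 2x2 confusion matrix.
        return {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv":         tp / (tp + fp),
            "npv":         tn / (tn + fn),
            "accuracy":    (tp + tn) / (tp + fn + fp + tn),
        }

    # Counts read off Table 6, with Driver Fault as the positive class.
    print(classification_metrics(tp=31, fn=1, fp=0, tn=17))
    # -> sensitivity 0.96875, specificity 1.0, ppv 1.0, npv 0.944444..., accuracy 0.979591...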

The results show that the Naive Bayes algorithm gave an accuracy of about 97% in predicting the causes of accidents. The receiver operating characteristic (ROC) curve is constructed by plotting sensitivity against 1 − specificity over the full range of possible threshold values, i.e. by plotting the TPR and FPR at various threshold settings. The graph drawn on the actual and predicted results is depicted in Figure 3, which indicates the high accuracy of the rough set analysis in predicting causes of accidents in traffic data sets.

Fig. 3. ROC curve drawn on actual and predicted results

6. CONCLUSION

Economic growth in a developing country like India has resulted in the growth of vehicle manufacturing sectors. The growth of vehicle manufacturing and the increase in people's purchasing power have ultimately resulted in increased road traffic, and traffic accidents occur every minute, particularly in developing countries. Detecting the causes of accidents is very important, since it helps to reduce or avoid traffic accidents to some extent. Rough set theory is one of the tools that can be efficiently applied for processing large data sets. In this paper we describe rough set approaches for detecting and analyzing the causes of accidents; the techniques used include data cleaning, data reduction, rule generation and classification. We used the Johnson algorithm for reduction and Naive Bayes for classification. The Johnson algorithm gave a minimal reduct set without loss of information from the original data set, and Naive Bayes gave about 97% accuracy in prediction. From this work we conclude that driver fault is the major cause of traffic accidents.

REFERENCES

[1] www.who.int
[2] www.ncrb.gov.in
[3] www.tn.gov.in
[4] Z. Pawlak, "Rough Sets", International Journal of Computer and Information Sciences, Vol. 11, pp. 341-356, 1982.
[5] Olutayo V.A, Eludire A.A, "Traffic Accident Analysis Using Decision Trees and Neural Networks", I.J. Information Technology and Computer Science, Vol. 2, pp. 22-28, 2014.
[6] Khair S. Jadaan, Muaath Al-Fayyad, and Hala F. Gammoh, "Prediction of Road Traffic Accidents in Jordan using Artificial Neural Network (ANN)", Journal of Traffic and Logistics Engineering, Vol. 2(2), 2014.
[7] S. Vigneswaran, A. Arun Joseph, E. Rajamanickam, "Efficient Analysis of Traffic Accident Using Mining Techniques", International Journal of Hardware and Software in Engineering, Vol. 2(3), pp. 110-118, 2014.
[8] S. Krishnaveni, M. Hemalatha, "A Perspective Analysis of Traffic Accident using Data Mining Techniques", International Journal of Computer Applications, Vol. 23(7), pp. 41-48, 2011.
[9] F. Rezaie Moghaddam, Sh. Afandizadeh, M. Ziyadi, "Prediction of accident severity using artificial neural networks", International Journal of Civil Engineering, Vol. 9(1), pp. 41-49, 2011.
[10] M. Durairaj, T. Sathyavathi, "Applying Rough Set Theory for Medical Informatics Data Analysis", International Journal of Scientific Research in Computer Science and Engineering, Vol. 1(5), pp. 1-8, 2013.
[11] Badreldin O. S. Elgabbanni, Mohamed Osama Khozium, Mahmoud Ali Ahmed, "Combination prediction model of traffic accident using Rough Set Technology approach", International Journal of Enhanced Research in Science Technology & Engineering, Vol. 3(1), pp. 47-56, 2014.
[12] Torgeir R. Hvidsten, "A tutorial-based guide to the ROSETTA system: A Rough Set Toolkit for Analysis of Data", Edition 1: May 2006, Edition 2: April 2010.
[13] M. Durairaj and K. Meena, "Intelligent classification using Rough set and neural network", The Icfai University Journal of Science & Technology, pp. 75-85, 2007.
[14] M. Durairaj and K. Meena, "A Hybrid Approach of Neural Network and Rough Set Theory for Prediction of Fertility Rate from IVF Outcomes", The Icfai University Journal of Science & Technology, Vol. 5(2), pp. 72-82, 2009.
[15] Zbigniew Suraj, "An Introduction to Rough Set Theory and Its Applications", ICENCO, Cairo, Egypt, 2004.
[16] M. Durairaj and K. Meena, "A Hybrid Prediction System Using Rough Sets and Artificial Neural Networks", International Journal of Innovative Technology & Creative Engineering, Vol. 1(7), pp. 16-23, 2011.
[17] M. Durairaj, P. Nithya, "Analysis of Transportation Systems for Reducing the Road Traffic Using Soft Computing Techniques", International Journal of Scientific Research in Computer Science Applications and Management Studies, Vol. 3(2), ISSN 2319-1953, 2014.