ieee transactions on industrial informatics 1...

10
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 1 SafeDrive: Online Driving Anomaly Detection from Large-Scale Vehicle Data Mingming Zhang, Chao Chen, Member, IEEE, Tianyu Wo, Member, IEEE, Tao Xie, Senior Member, IEEE, Md Zakirul Alam Bhuiyan, Member, IEEE, Xuelian Lin, Member, IEEE Abstract—Identifying driving anomalies is of great significance for improving driving safety. The development of the Internet- of-Vehicles (IoV) technology has made it feasible to acquire big data from multiple vehicle sensors and such big data play a fundamental role in identifying driving anomalies. Existing approaches are mainly based on either rules or supervised learning. However, such approaches often require labeled data, which are typically not available in big data scenarios. In addition, because driving behaviors differ under vehicle statuses (e.g., speed, gear position), to precisely model driving behaviors needs to fuse multiple sources of sensor data. To address these issues, in this article, we propose SafeDrive, an online and status- aware approach, which does not require labeled data. From a historical data set, SafeDrive statistically offline derives a state graph (SG) as a behavior model. Then SafeDrive splits the online data stream into segments, and compares each segment with the SG. SafeDrive identifies a segment that significantly deviates from the SG as an anomaly. We evaluate SafeDrive on a cloud- based IoV platform with over 29,000 real connected vehicles. The evaluation results demonstrate that SafeDrive is capable of identifying a variety of driving anomalies effectively from a large- scale vehicle data stream with an overall accuracy of 93%; such identified driving anomalies can be used to timely alert drivers to correct their driving behaviors. Index Terms—Big Data, Driving Behavior, State Graph, Internet-of-Vehicles, OBD, Anomaly, Data Stream I. I NTRODUCTION I DENTIFYING abnormal driving behaviors is known to be an important research focus due to its significant influence on people’s daily life. Apart from the impact on fuel consump- tion [1], driving behaviors also play a key role in transportation safety [2]. With in-vehicle sensing and Internet-of-Vehicles (IoV) technologies, we are capable of collecting abundant driving data, such as speed and engine parameters, from a large number of vehicles. Such data are characterized as large volume, multi-frequency, and multi-source, which largely re- flect the vehicle status and thereby are widely used to evaluate driving behaviors. For example, insurance companies provide a new ‘pay-as-you-drive’ service to customers by collecting their driving data and judging their driving behaviors [3]. With the collected data, fleet-operating companies regulate their drivers to behave more properly, lowering the accidental risk and fuel consumption. These applications have inspired previous research on iden- tifying driving anomalies. Rule-based techniques are often Tianyu Wo is the corresponding author, Email: [email protected] Copyright (c) 2009 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. 0 0.5 1 1.5 2 2.5 0-20 20-40 40-60 60-80 80-100 100-120 120-140 Average Acceleration (m/s 2 ) Vehicle Speed (km/h) Fig. 1. Accelerations when driving at different speed ranges. It can be ob- served that acceleration of 1m/s 2 when the vehicle speed exceeds 100km/h would be abnormal, while the same acceleration would be normal when the speed is lower than 60km/h. adopted to extract abnormal driving behaviors due to its simplicity and high efficiency [4], [5]. Supervised-learning- based techniques are another kind of popular solution. With predefined abnormal patterns and manually labeled training data, a classifier can be trained and further used to identify similar patterns [6], which are marked as anomalies. Such techniques basically require manually labeled training data, where fixed behavioral definitions such as patterns and rules (e.g., fast acceleration) need to be predefined. However, the prior research cannot effectively identify ab- normal driving behaviors for three main reasons. First, in IoV, the volume of data is huge. They are collected from multiple sensors and are with complicated relations, making it infeasible to label normal and abnormal behaviors. Second, the process of manually labeling the huge volume of the data stream can be difficult and biased because abnormal driving behaviors can be uncertain and human perceptions can be error-prone. Third, whether a driving behavior is abnormal or not is heavily dependent on the current vehicle status (e.g., speed, gear position). For instance, Fig. 1 shows the relationship between acceleration behaviors and vehicle speed statuses. It can be observed that drivers would normally accelerate more slowly when driving at high speed. Such behavior is a kind of contextual-status-related behaviors. As a comparison, another behavior can be observed in the relations between different types of data, and such behavior is a kind of correlational-status-related behaviors. To effectively identify driving anomalies requires consider- ing such detailed status-related characteristics. Moreover, the evaluation criterion for anomaly detection should be based on objective data instead of subjective judgment. To address

Upload: others

Post on 29-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 1 …taoxie.cs.illinois.edu/publications/tii17-safedrive.pdf · A light-weight online anomaly detector for detecting a variety of abnormal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 1

SafeDrive: Online Driving Anomaly Detection fromLarge-Scale Vehicle Data

Mingming Zhang, Chao Chen, Member, IEEE, Tianyu Wo, Member, IEEE, Tao Xie, Senior Member, IEEE,Md Zakirul Alam Bhuiyan, Member, IEEE, Xuelian Lin, Member, IEEE

Abstract—Identifying driving anomalies is of great significancefor improving driving safety. The development of the Internet-of-Vehicles (IoV) technology has made it feasible to acquirebig data from multiple vehicle sensors and such big data playa fundamental role in identifying driving anomalies. Existingapproaches are mainly based on either rules or supervisedlearning. However, such approaches often require labeled data,which are typically not available in big data scenarios. Inaddition, because driving behaviors differ under vehicle statuses(e.g., speed, gear position), to precisely model driving behaviorsneeds to fuse multiple sources of sensor data. To address theseissues, in this article, we propose SafeDrive, an online and status-aware approach, which does not require labeled data. From ahistorical data set, SafeDrive statistically offline derives a stategraph (SG) as a behavior model. Then SafeDrive splits the onlinedata stream into segments, and compares each segment withthe SG. SafeDrive identifies a segment that significantly deviatesfrom the SG as an anomaly. We evaluate SafeDrive on a cloud-based IoV platform with over 29,000 real connected vehicles.The evaluation results demonstrate that SafeDrive is capable ofidentifying a variety of driving anomalies effectively from a large-scale vehicle data stream with an overall accuracy of 93%; suchidentified driving anomalies can be used to timely alert driversto correct their driving behaviors.

Index Terms—Big Data, Driving Behavior, State Graph,Internet-of-Vehicles, OBD, Anomaly, Data Stream

I. INTRODUCTION

IDENTIFYING abnormal driving behaviors is known to bean important research focus due to its significant influence

on people’s daily life. Apart from the impact on fuel consump-tion [1], driving behaviors also play a key role in transportationsafety [2]. With in-vehicle sensing and Internet-of-Vehicles(IoV) technologies, we are capable of collecting abundantdriving data, such as speed and engine parameters, from alarge number of vehicles. Such data are characterized as largevolume, multi-frequency, and multi-source, which largely re-flect the vehicle status and thereby are widely used to evaluatedriving behaviors. For example, insurance companies provide anew ‘pay-as-you-drive’ service to customers by collecting theirdriving data and judging their driving behaviors [3]. With thecollected data, fleet-operating companies regulate their driversto behave more properly, lowering the accidental risk and fuelconsumption.

These applications have inspired previous research on iden-tifying driving anomalies. Rule-based techniques are often

Tianyu Wo is the corresponding author, Email: [email protected] (c) 2009 IEEE. Personal use of this material is permitted.

However, permission to use this material for any other purposes must beobtained from the IEEE by sending a request to [email protected].

0

0.5

1

1.5

2

2.5

0-20 20-40 40-60 60-80 80-100 100-120 120-140

Av

erag

e A

ccel

erat

ion

(m

/s2)

Vehicle Speed (km/h)

Fig. 1. Accelerations when driving at different speed ranges. It can be ob-served that acceleration of 1m/s2 when the vehicle speed exceeds 100km/hwould be abnormal, while the same acceleration would be normal when thespeed is lower than 60km/h.

adopted to extract abnormal driving behaviors due to itssimplicity and high efficiency [4], [5]. Supervised-learning-based techniques are another kind of popular solution. Withpredefined abnormal patterns and manually labeled trainingdata, a classifier can be trained and further used to identifysimilar patterns [6], which are marked as anomalies. Suchtechniques basically require manually labeled training data,where fixed behavioral definitions such as patterns and rules(e.g., fast acceleration) need to be predefined.

However, the prior research cannot effectively identify ab-normal driving behaviors for three main reasons. First, inIoV, the volume of data is huge. They are collected frommultiple sensors and are with complicated relations, making itinfeasible to label normal and abnormal behaviors. Second,the process of manually labeling the huge volume of thedata stream can be difficult and biased because abnormaldriving behaviors can be uncertain and human perceptionscan be error-prone. Third, whether a driving behavior isabnormal or not is heavily dependent on the current vehiclestatus (e.g., speed, gear position). For instance, Fig. 1 showsthe relationship between acceleration behaviors and vehiclespeed statuses. It can be observed that drivers would normallyaccelerate more slowly when driving at high speed. Suchbehavior is a kind of contextual-status-related behaviors. As acomparison, another behavior can be observed in the relationsbetween different types of data, and such behavior is a kindof correlational-status-related behaviors.

To effectively identify driving anomalies requires consider-ing such detailed status-related characteristics. Moreover, theevaluation criterion for anomaly detection should be basedon objective data instead of subjective judgment. To address

Page 2: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 1 …taoxie.cs.illinois.edu/publications/tii17-safedrive.pdf · A light-weight online anomaly detector for detecting a variety of abnormal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 2

the preceding issues, in this article, we propose SafeDrive,an online and status-aware approach for detecting drivinganomalies. SafeDrive does not require costly labeled data, byemploying a State Graph (SG). SafeDrive fuses data on boththe vehicle-sensor level and fleet level; such data preciselyreflect the normal driving styles. For the online detection,SafeDrive compares the real-time driving data stream with theSG to detect anomalies. SafeDrive includes novel techniques toaddress two main challenges: (1) uniformly modeling a varietyof vehicle statuses represented by complex data relations; (2)capturing how people normally drive based on the modeledrelations between statuses.

In particular, SafeDrive includes a SG to model (1) contex-tual relations between statuses of the same type of data, suchas speed, at different timings and (2) correlational relationsbetween statuses of different types of data, such as the vehicleRevolutions Per Minute (RPM) and gear position, at the sametiming. SafeDrive represents all the statuses as states andconnected with edges in the graph. To construct an objectivedriving model, SafeDrive fuses different vehicles’ historicaldata together and statistically calculates the structure of SG.In the online setting, SafeDrive identifies driving anomaliesby splitting the data stream into segments and comparing eachsegment with the SG. SafeDrive considers as abnormal thosesegments that largely deviate from the SG.

We implement SafeDrive on a real-world cloud-based IoVplatform, which connects over 29,000 vehicles from 60 cities.Each vehicle is equipped with an On-Board Diagnostics(OBD) connector to collect the vehicle’s parameter valuesand send the data to the server through the mobile wirelessnetwork. The evaluation results demonstrate that SafeDriveis able to effectively identify driving anomalies includingaggressive acceleration, sudden breaking, fast turn, and evenmismatching of RPM with speed.

In summary, the article makes the following main contribu-tions:• A status-aware behavior model, which is able to combine

multi-sensor data of different vehicles, to characterizenormal driving behaviors quantitatively.

• A light-weight online anomaly detector for detecting avariety of abnormal driving behaviors from large scalevehicle data.

• An implementation of SafeDrive upon a large-scalecloud-based IoV platform, and an evaluation of SafeDrivewith a huge volume of real-world driving data.

The reminder of this article is organized as follows. Sec-tion II summarizes related work. Section III presents anoverview of the proposed SafeDrive approach. Section IVillustrates SafeDrive’s online detection of driving anomalies.Section V presents our evaluation of SafeDrive. Section VIconcludes this article.

II. RELATED WORK

Safe driving is one of the major public concerns, andidentifying abnormal driving behaviors is an indispensable partof improving driving safety [2], [7]. In recent years, varioustechniques have been proposed to detect driving anomalies.

Rule-based techniques employ thresholds to filter out dataof specific ranges and mark these data as driving events.These techniques are often adopted in previous work assystem solutions and basic behavior-evaluation mechanismsdue to their simplicity and efficiency [4], [5]. For example,using sensor data collected from smartphones, Zhao et al.[8] detect aggressive driving events based on thresholds. Todetect drunk driving, Dai et al. [9] propose a pattern-matchingalgorithm that compares acceleration with pre-defined drunkdriving thresholds. Fazeen et al. [10] combine rule-basedbehavior analysis with road-condition evaluation to constructa smartphone-based safe driving system. Moreover, Taha etal. [11] propose a threshold-based framework to evaluate thedriving behaviors from Controller Area Network (CAN-bus)data collected by OBD connectors.

Supervised-learning-based techniques are another kind ofwidely adopted techniques. By using labeled data, a classifiercan be trained and further used to predict unlabeled data.Specifically, Chen et al. [4] propose a classifier based onSupport Vector Machine (SVM) to recognize abnormal drivingstyles, such as swerving and fast U-turn, from smartphonesensors in real time. Quintero et al. [12] propose a techniquebased on Artificial Neural Network (ANN) to detect erraticdriving from OBD and GPS data, and the model is evaluatedon a driving simulator. Hone et al. [6] propose a techniqueof data fusion by combing accelerometer data with OBD dataand using a naive Bayes classifier to identify aggressive drivingbehaviors. Jaramillo et al. [13] propose an online monitoringsystem based on a fuzzy clustering algorithm.

In addition, Johnson et al. [14] use Dynamic Time Warping(DTW) to detect aggressive driving using smartphone sensordata. Li et al. [15] construct a driving analysis system viaoperation-mode classification. Miyajima et al. [19] propose aGaussian Mixture Model (GMM) to model driving behaviorsand further to identify drivers. As foundational research, Con-stantinescu et al. [17] investigate driving-style categories witha clustering algorithm. Bolovinou et al. [16] survey techniquesof driving-style recognition for co-operative driving. Banerjeeet al. [22] propose an algorithm named Skill-Aggression-Quantifier (SAQ) to evaluate driving behaviors. They alsoimplement a tool named MyDrive based on the SAQ algorithm.Lei [23] design a framework to detect anomalies in trajectorybehaviors, which can be considered as another kind of drivingbehaviors. There are also various other techniques [20], [21]in this research area.

In summary, although there exist various previous tech-niques [24]–[26] of unsupervised or semi-supervised anomalydetection, in driving-analysis scenarios, previous work ismainly based on the definition of driving behaviors, suchas rules or patterns. To identify a specific type of drivinganomalies, it is usually necessary to define anomaly patternsand prepare labeled data first. However, in a real-world IoVscenario, such labeled data are not available because the dataare collected automatically, and thus manually labeling drivingstyles is not applicable. In addition, driving behaviors arestatus-aware, and the behavioral model should be able toreflect detailed characteristics of driving. Therefore, based onan IoV system and the huge volume of collected data, in

Page 3: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 1 …taoxie.cs.illinois.edu/publications/tii17-safedrive.pdf · A light-weight online anomaly detector for detecting a variety of abnormal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 3

Historical Dataset Weighted State GraphGenerated States

Offline Building of Driving Behavioral Model

Online Detection of Driving Anomalies

Sub-GraphReal-time

Stream

Mapping and Comparison

Stream

Segments

Driving

Anomalies

S1S4

S3

S5

S6

S3

S4

S1

S2S1

S2

S4

S3

S5

S6

0.6

0.050.49

0.5 0.01

0.4

0.35

0.2

0.4

Fig. 2. Overview of SafeDrive.

this article, we propose SafeDrive, an online, data driven, andstatus-aware approach for driving-anomaly detection.

III. OVERVIEW OF SafeDrive

Modeling of driving behaviors plays a fundamental rolefor detecting driving anomalies but is quite challenging. Asstated earlier, to detect driving anomalies, a behavioral modelshould be able to (1) cover variable relationships of drivingdata and (2) reflect driving styles quantifiably. From thedata perspective, the model should reflect the relationshipsof different types of data and their patterns. To that end, wepropose a state-graph-based behavioral model, as discussed indetail in this section.

The overview of SafeDrive is shown in Figure 2. Themodel contains two main parts: the offline building of adriving behavioral model and the online detection of drivinganomalies. Overall, we use the model built offline based onthe historical data, to online identify the newly-arrived datastream.

Offline building of a driving behavioral model. Our basicidea is to uniformly model the status relations of streamedvehicle data in a weighted state graph, in which the stateis a term used to represent the value (or its range) of dataattributes. Specifically, we adopt discrete states to quantifystatus (data values) and employ weighted edges (connectionsbetween states) to measure the relationship between states. Thestates are generated from different sensor data and connectedwith each other via weighted edges, in which manner themodel can combine multiple data, even those with differentfrequencies. The structure of the graph is constructed based onstatistics of the historical data; as a result, the graph becomes adetailed behavioral model for fusing different vehicles’ data.In this way, the graph structure can objectively reflect howpeople usually drive under different conditions or statusessince the weights are generated from real-world data. Theformal definition of SG is given in Definition 1.

Definition 1. SG A SG =< S,E > is a weighted directedgraph, where S is a set of states and E is a set of weightededges. A weighted edge e ∈ E corresponds to a kind ofrelation between states, where weight w ∈ (0, 1].

Online anomaly detection. In many cloud-based applica-tions, the collected driving data are often organized as a stream

𝑆1𝑎

𝑆2𝑎

𝑆1𝑏

𝑆2𝑏

Fig. 3. An example of SG in which bidirectional arrows represent correlationaledges and unidirectional arrows represent contextual edges.

or data sequence where each data instance contains differentattributes. The stream, as stated earlier, reflects driving behav-iors (related to contextual and correlational statuses) whereanomalies may occur. To detect anomalies online, we first splitthe newly arrived data instances in data stream into segmentsand then map each segment as a temporal sub-graph (TS-SG),which is further evaluated by being compared with the offline-generated state graph model. The sub-graph (or segment) thatsignificantly deviates from the state graph model is consideredas an anomaly. The formal definition of a temporal sub-graphis given in Definition 2.

Definition 2. TS-SG A TS-SG=< S∗, E∗ > is a temporalsub-graph of a state graph, generated by a data sub-sequence,where S∗ ∈ S and E∗ ∈ E. A specific state may repeatedly,with different time stamps, appear in a TS-SG.

Formally, in SafeDrive, a state s ∈ S represents a categoryor a set of data instances. For numerical data, states canbe acquired by discretization, while for categorical data, thecategory itself can be used as a state. In our evaluation(described in Section V), for example, the vehicle speedis generated into 100 states each of which covers a speedrange of ±1km/h. Acceleration and deceleration behaviorscan be reflected by the transitions between those speed states.Therefore, the abnormal level of acceleration behaviors can beevaluated by the connection weight from speed states with asmaller value to states with higher values. We regard the edgebetween the same type of states as the contextual edge, whichmodels the contextual driving behaviors reflected by the samekind of OBD data parameters. Note that for numerical dataattributes, a potential risk of discretization is state explosion:a huge number of states might be generated and thus causean extremely large graph. However, in most real-world cases,only a limited scope of ranges or data sources are used inpractice, largely limiting the scale of the graph.

In SafeDrive, different parameters are separately generatedinto different types of states. The co-occurrence relationsbetween different types of data reflect the correlational-state-related behaviors. For instance, the speed has a correlation withRPM, e.g., a high RPM value usually implies high vehiclespeed. To strengthen the expression ability of SafeDrive,we use correlational weighted edges to represent the co-occurrence relationships between different types of states. Asa result, two kinds of edges are enclosed in the SG, i.e.,contextual edges and correlational edges. Figure 3 illustratesan abstract example of a state graph with two types of states,Sa and Sb. Note that in SafeDrive, the state graph may containcycles since the data in a stream can be repeatable.

Page 4: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 1 …taoxie.cs.illinois.edu/publications/tii17-safedrive.pdf · A light-weight online anomaly detector for detecting a variety of abnormal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 4

As discussed earlier, the state-graph behavioral model isbuilt by two steps: state generation and graph construction.The first step uses discretization to transfer data ranges ofOBD parameters into states, while the second step scans thehistorical data set statistically, from which the edges betweenstates and their weights are derived and calculated. The valueof the weight of a contextual edge is computed by Equation1 where t is a time stamp referring to the relative temporalrelationship between states. The value of connection weightdenotes the conditional probability of s2 appearing at time(t+1) when s1 appears at t.

w(s1, s2) = p(s2(t+ 1)|s1(t)) (1)

For correlational relationship between two different typesof states sa and sb, with the objective of presenting a de-tailed reflection of the correlation, we implement it with twoconditional edges separately, i.e., from sa to sb and from sb

to sa. Their weights can be calculated according to Equation2. As can been seen in Equation 2, the values of connectionweights denote the probability of sb appearing at time t giventhe condition of sa appears at t, and the probability of sa

appearing at time t given the condition of sb appears at t. Notethat the correlational edges between sa and sb are asymmetricsince w(sa, sb) is usually not equal to w(sb, sa).

w(sa, sb) = p(sb(t)|sa(t))w(sb, sa) = p(sa(t)|sb(t)) (2)

The finally realized model includes 308 states: 100 states fordriving speed from 0km/h to 200km/h, 100 states for engineRPM from 0 to 5000, 100 states for swing angle from 0-360degrees, and the remaining 8 states for gear positions.

Building a state graph model that combines different typesof data makes it feasible to model a variety of driving behav-iors. For example, the speed states and their connections reflectacceleration and deceleration behaviors, while the connectionsbetween states of speed and gear position reflect the combinedcontrol behaviors of vehicle speed and gear position.

IV. ONLINE ABNORMAL DETECTION WITH SafeDrive

Typically, we evaluate driving behaviors for each shortperiod of time. To that end, after the SG is built, we use itto measure the online stream data. As shown in Figure 2, thestream is split into segments, each of which is a behavior unitand being mapped to a TS-SG. The segmentation length isbased on the time duration for completing a behavior. Ourempirical investigation suggests that an interval of 10 to 15seconds is suitable to represent a driving behavior.

Unlike other sub-graphs, a TS-SG contains contextual in-formation of data stream. In such a graph, a state is allowedto appear repeatedly given that specific data are likely to begenerated repeatedly. For example, when driving in a stablestatus, many of the sampled speed data in the uploadingstream might be the same; hence, the TS-SG may contain somerecurring states with different time stamps.

The states of TS-SG are generated in the same way thatthe states of SG are generated. The edges in TS-SG also haveweight values assigned according to their counterpart edges in

as1

bs1bs1

as2

t 1t

cs1

cs2

as1

bs1bs1

as2

t 1t

cs1

cs2

(a) (b)

Fig. 4. Temporal sub-graph that contains an anomaly, where a red dashedline represents an abnormal connection with low probability. (a) ContextualAnomaly, (b) Correlational Anomaly.

the original state graph. For example, if there is an edge fromsa to sb in a TS-SG, then its weight value equals to the valueof w(sb, sa) in SG. Specifically, if no such edge exists in theSG, the graph would be updated by SafeDrive automatically.

f(TS − SG) = 1

m

∑si,sj∈S∗

w(si, sj)−1 (3)

After the TS-SG of each segment is generated, according toEquation 3, we compute an anomaly score for the sub-graphTS-SG, marked as f(TS-SG). Given that the aim of the scoreis to filter out those state connections with low probabilities,we hereby employ an inverse proportional function to con-struct f(TS-SG). In this manner, we are able to amplify lowprobabilities and filter them out. Note that m is the numberof edges in the sub-graph. Basically, we consider TS-SG withlow-probability edges, which usually cause a high value off(TS-SG), as an anomaly. The score is compared with δ, athreshold defined manually. If f(TS − SG) > δ , then thesub-graph is marked as an anomaly. In practice, it is suggestedto choose the threshold δ according to the distribution of scoref(TS-SG).

Figure 4 shows two abstract examples of sub-graph withdifferent types of anomalies caused by contextual and cor-relational relationships, respectively. In real-world drivingscenarios, contextual and correlational driving anomalies mayoccur simultaneously because the data are generated by vehiclecomponents closely working together, and a specific abnormaldriving behavior or operation may cause various anomalies.

By analyzing the structure of the abnormal TS-SG andevaluating which kind of edge causes a high score value,data analysts are able to understand the detailed reason forthis anomaly. For example, if the cause of high f(TS-SG)is vehicle-speed state transition, it signifies that the driverbehaves not so well in accelerating or decelerating. While ifRPM states cause the high score, it signifies that the driverdoes not take a smooth control of accelerator pedal or gearposition, suggesting that the driver drives either aggressivelyor unskillfully.

Note that due to the limitation of historical data, changeof environment, or people’s driving styles, the structure ofthe graph may need to be able to evolve over time. Suchcharacteristic is known as concept drift in anomaly detectionfor streaming data [18]. In this scenario, for example, giventwo states, si and sj , when measured in different times,w(si, sj) could be different. As a result, the abnormal level of

Page 5: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 1 …taoxie.cs.illinois.edu/publications/tii17-safedrive.pdf · A light-weight online anomaly detector for detecting a variety of abnormal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 5

Algorithm 1 Online Anomaly Detection Algorithm

Input: (1) DS;(2) SG.Output: (1) Anomaly.

1: SegmentSet ← Split{DS}2: //split the data sequence into segments3: for each Segment Seg in SegmentSet do4: TS-SG ← Match{Seg,SG}5: matching the Seg with SG, generate a TS-SG6: calculate f(TS − SG)7: if(f(TS − SG) > δ)8: Output Seg and TS-SG as an anomaly9: end for

sequence si, sj changes over time. Failing to sense or accountfor such change could lower the performance of the detectorby causing many false alarms. We address this problem bydesigning a module in the cloud named SG-Maintainer tomaintain and update the state graph. In practice, the SG-Maintainer maintains an array that records the connectionnumber between states. It updates the array when each dataarrives and then periodically calculates the connection weightsof the SG according to the array.

Given a data sequence DS, the online detection is describedin Algorithm 1.

V. EVALUATION

We comprehensively evaluate the effectiveness and effi-ciency of SafeDrive. In this section, we first present the datadescription used in the evaluation, followed by a detailedcategory analysis of the detected abnormal driving behaviors.We quantitatively evaluate the detection accuracy and com-putational performance of SafeDrive, and compare SafeDrivewith other related approaches.

A. System and Data Description

We evaluate the performance of SafeDrive on a real-worldIoV system. The system is designed as a cloud-based IoVarchitecture in which driving data are collected with OBD de-vices plugged in the vehicles. Each OBD device has integrateda wireless communication module to maintain connectionswith the back-end server and send the collected data to theserver with an adjustable time interval. Over 29,000 realvehicles from 60 cities have been connected to the system.This system collects around 0.2 billion data instances daily.

Table I lists the details of the data attributes used in theevaluation, including speed, RPM, swerve angle, and gearposition. The OBD connector is capable of sampling variousattributes from vehicles, such as the door status or break pedalstatus, and we are aware that employing more parameters mayimprove the analysis performance. However, due to the factthat different vehicle manufacturers use different CAN-BUSprotocols, it is not easy to collect all the parameters from allthe vehicles in the fleet. Hence, to assure the applicability ofthe learned model, we construct it based on attributes thatcould be collected from almost any kind of vehicles. Also,such attributes are considered as directly influenced by drivingbehaviors, and they can reflect driving behaviors to a largeextent. We use these attributes to evaluate the lateral and

TABLE IDATA SPECIFICATION

Name Type Range DescriptionSpeed Numerical 0-200km/h Vehicle speedRPM Numerical 0-8000 Engine round per minuteSwerving Numerical 0-270

degreeThe change of vehicle di-rection

Gear posi-tion

Enum 8 positions Gear position

longitudinal dynamic of a vehicle. Note that the swerve angleis not directly collected but is calculated based on positiondata collected from the GPS module embedded in the OBDconnector.

The vehicles in this system belong to a chauffeur companyand the drivers are hired after a strict selection, most of whomare experienced and well trained. Thus, we assume that thebehaviors of most drivers, under most situations, are normal.Therefore, it makes sense to use the state graph generated inthis system as a benchmark for evaluating abnormal driving.In the training phase, the data collected in the first month areused to construct the graph, and then the generated state graphis deployed to analyze the data of subsequent months.

However, note that in other driving scenarios, it is possiblethat SafeDrive may ignore some unsafe driving styles if manydrivers perform unsafe behaviors habitually. For example,according to a report [27], nearly 35% American drivers areaggressive. Therefore, some aggressive driving styles might beidentified as normal by our learned model. In such situation,the training data should be collected in a manner such thatthe training data represent only non-aggressive driving styles.Hence, it is suggested to collect the training data with selecteddrivers driving under safe instructions.

SafeDrive is initially implemented as a cloud service be-cause the infrastructure of the system is designed and imple-mented with the cloud so that data can be collected. Therefore,the analysis of driving behaviors is conducted in the cloud.However, it is known that driving alert is safety-critical andthe application may suffer from network delay; thus, to acquirea faster reaction, the model is further implemented into asmartphone application. The smartphone is supposed to bein the vehicle and maintains communication with the OBDconnector to sample data in a higher frequency, and withthe cloud to update its model. In this manner, the backendserver takes charge of collecting driving data from the fleetand updating the state graph model, which is updated to thesmartphone application periodically.

The system is running on a cluster with 25 computationservers, each of which is equipped with 16GB memory anda quad-core processor. The data stream is uploaded fromeach vehicle and collected by 2 TCP servers. Then the datais loaded into a distributed data bus system. The streamingcomputing system takes 5 of the servers. The information fromthe cluster and application monitoring shows that the real-timedata uploaded by the vehicles are easily handled by the system,and the CPU and memory use ratio maintains lower than20%. To further assess the computational cost of SafeDrive,we replay the historical data with a much higher ratio on a

Page 6: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 1 …taoxie.cs.illinois.edu/publications/tii17-safedrive.pdf · A light-weight online anomaly detector for detecting a variety of abnormal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 6

TABLE IIABNORMAL DRIVING BEHAVIORS DETECTED BY SafeDrive

Anomaly Behavior Corresponding Anomaly in TS-SGRapid Acceleration Speed states with small value connect to states with

large value and the connection weights are low.Sudden Breaking Speed states with large value connect to states with

small value and the connection weights are low.RPM-SpeedMismatching

RPM states with large value connect to speed stateswith small value and the connection weights arelow.

Over speed Speed states with extremely high value occurs.These ‘rare states’ often have very low connectionweight with other states.

RPM Anomaly RPM states with extremely high value occurs(which are rare) and have very low connectionweight with other states.

Rapid Swerving Swerving states with large value connect to speedstates with large value and the connection weightsare low.

Neutral Taxiing Neutral gear position states connect to speed stateswith value larger than 0km/h.

personal computer with quad-core Intel processor and 8GBmemory, and the data is processed by SafeDrive implementedwith Java. The simulation result suggests that the model isable to process millions of data instances per second on thatsingle computer, indicating that SafeDrive has a potential tobe employed to deal with large-scale IoV scenarios.

B. Anomaly-Category Analysis

The detection results are classified and analyzed based ondriving behavioral semantics. As previously stated, SafeDrivecalculates an anomaly score for each sequence segmentationbased on its inner data relations, and then identifies anomaliesby comparing the score with a threshold. The evaluationitself does not provide a semantic description for the detectedabnormal behaviors. Therefore, to better understand what kindof behaviors SafeDrive is capable of identifying, we manuallyclassify the results according to the structure of the abnormalTS-SG. Table II lists the categories of abnormal drivingbehaviors identified by SafeDrive. Seven kinds of abnormaldriving behaviors can be detected.

SafeDrive is a status-aware anomaly detector in that it evalu-ates driving behaviors under a specific status. The connectionweight of SG is actually the condition probability. The de-tected anomalies are classified into two categories: contextualanomalies and correlational anomalies. This classification isbased on the type of edge that causes a high value of f(TS-SG). If a contextual edge causes a high score, then the anomalyis a contextual anomaly. If a correlational edge causes a highscore, then the anomaly is a correlational anomaly.

SafeDrive learns an unsupervised model and automaticallydetects anomalies, but the detection results provide no se-mantic description beyond true or false, limiting the practicaluse of the system. Therefore, to provide readable warninginformation to drivers, based on Table II and manual analysis,we use a rule set to translate the detected anomalies into theircorresponding semantic explanations. For example, a rapidacceleration warning is given if an anomaly is caused by thetransition from a speed state with a small value to a speed

0

30

60

90

120

150

1 5 9 13 17 21 25

Sp

eed

(k

m/h

)

Rapid Acceleration

0

1000

2000

3000

4000

5000

6000

1 4 7 10 13 16 19

RP

M

RPM Anomaly

Fig. 5. Examples of contextual driving anomalies.

state with a large value. This section provides a number ofanomaly examples and discusses how they are detected. Notethat the horizontal axis in Figures 5 and 7 (for showing theanomaly examples) represents the relative index of the datainstances, reflecting their temporal relationships in the stream.

1) Contextual Abnormal Behaviors: In the uploaded datastream, the value of RPM or speed is a behavioral attributeand the time is a contextual attribute. Driving anomalies areidentified by evaluating the behavioral attributes under a spe-cific context. For driving evaluation, the behavioral attributesunder different contexts are quantified and expressed in anSG, and hence it is reasonable to apply this model to identifycontextual anomalies.

Figure 5 illustrates two sequences with detected RPM andacceleration anomalies, as marked by the red boxes. Rapidacceleration or deceleration (sudden breaking) is the moststraightforward contextual anomaly. They are detected bySafeDrive for the reason that, given a current speed state,its connection weight with a much higher speed state issmall, causing a higher anomaly score. These driving stylesare considered as anomalies and are not advocated becausethey could cause more fuel consumption and increase vehicle-component wear. Most drivers usually do not adopt suchdriving styles.

The RPM anomaly shown on the left-hand side of Figure 5suggests that the engine raves are extremely high, beingunusual and considered as an anomaly. These high valuescorrespond to ‘rare states’ in the state graph because theprobability of incurring such states (values) is extremely low,meaning that only a few states are connected with thosestates and the edges to them have low connection weights.As discussed earlier, the state graph has 100 initialized RPMstates and does not contain a value exceeding 5000. Forthe implementation, when these rare states occur, they areinserted into the graph by the SG-Maintainer module in caseof concept drift. Although they are inserted into the graph, stillthese rare states do not have a close relationship with otherstates. The rare RPM states with high values are sensitive tocontextual attributes, further causing them to be identified asan anomaly. SafeDrive detects several over-speed anomaliesexceeding 150km/h for the same reason. Figure 6 shows thenumber of connections of each speed state. It can be seen thatstates of higher speed tend to have sparse connections withother states in the graph. This sparsity characteristic is themain reason why the anomalies are detected.

Page 7: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 1 …taoxie.cs.illinois.edu/publications/tii17-safedrive.pdf · A light-weight online anomaly detector for detecting a variety of abnormal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 7

0

10

20

30

40

50

60

70

0 20 40 60 80 100 120 140

Nu

mb

er o

f C

on

nec

ted

Sta

tes

States of Speed (km/h)

Fig. 6. The connection number of each speed state.

SafeDrive is able to detect this kind of driving anomalies,whereas other previous approaches have to rely on a com-prehensive rule set to accomplish such detection. Still, futurework, such as introducing information of location and roadnetwork into SafeDrive, is required to improve detection ofanomalies such as over speed, because the dangerous levelof such anomalies varies depending on the specific roadcondition.

2) Correlational Abnormal Behaviors: Correlational be-haviors exist between co-evolving or correlational sequences,such as (speed, RPM) and (speed, gear position). A cor-relational anomaly can be detected when the data deviatesthe relationships between the data sequences. As listed inTable II, there are three kinds of correlational anomaliesdetected by SafeDrive, including rapid swerving, RPM-speedmismatching, and neutral taxiing. SafeDrive is also able toidentify correlational abnormal behaviors by jointly evaluatingtwo or more types of data attributes.

A representative correlational driving anomaly in this ap-plication is rapid swerving, which could be identified byspeed and swerving angle. For swerving data, due to a highpossibility of making a turn for a vehicle running on an urbanroad network, the extent of swerving angle change by itselfwould not provide much value in anomaly detection. But bycombining such information with vehicle speed, SafeDrive candetect a rapid serving anomaly.

Figure 7 shows examples of detected RPM-speed mismatch-ing and rapid swerving anomalies. It can be seen from themismatching anomaly that the vehicle RPM is too high for thecorresponding vehicle speed. RPM-speed mismatching occurswhen the vehicle speed is low and the driver pushes the gaspedal aggressively. Fast acceleration usually happens with thisbehavior; however, depending on specific road conditions orgear positions, the vehicle speed does not always dramaticallyincrease, causing this RPM-speed mismatching. For the rapidswerving anomaly, the vehicle continuously takes two turns;before that, the vehicle was running at a fast speed of nearly100km/h. Although the driver slows down the vehicle inadvance, the speed is still too fast for turning over 100 degrees.In the state graph model, most high-speed states are connectedwith lower swerve angle states, causing SafeDrive to identifythose fast turn anomalies.

For a neutral taxiing anomaly, the gear is put in a neutralposition while the vehicle is still running at a relatively high

0

10

20

30

40

50

60

70

80

90

0

1500

3000

4500

6000

1 3 5 7 9 11 13 15 17

Sp

eed

(k

m/h

)

RP

M

RPM-Speed Mismatch

RPM Speed

0

20

40

60

80

100

120

140

0

20

40

60

80

100

120

1 3 5 7 9 11 13 15 17

Sp

eed

(k

m/h

)

Sw

erv

ing

An

gle

(d

egre

e)

Rapid Swerving Swerving Angle

Speed

Fig. 7. Examples of correlational driving anomaly.

speed. Such anomaly is another representative correlationaldriving anomaly that can be detected by SafeDrive. Thisbehavior usually occurs on the downhill path or straight roadwhen the vehicle speed is high. Some drivers’ driving exhibitssuch behavior because they think it is a fuel-efficient way ofdriving. But in fact, neutral taxiing is not fuel efficient formost automatic transmission vehicles, and it could also bedangerous because it increases the breaking distance whendanger happens. This behavior can hardly be detected onlyby speed data; however, by combing speed data with gearposition, the detection would be much easier. In the stategraph model, the gear position states with higher values tendto connect with vehicle speed states with higher value, whilenormally neutral position states only connect with the speedstates with values lower than 10km/h.

In the evaluation, note that some abnormal behaviors, suchas fast acceleration and RPM anomalies, could occur simul-taneously because the data are generated by highly correlatedvehicle components. Due to the complexity of driving envi-ronments, some behaviors could also occur alone.

C. Quantitative Evaluation

To understand the applicability of SafeDrive, we manuallyanalyze the detection results of an arbitrarily chosen week.Figure 8 shows the False Positive Rate (FPR) evaluation ofthe model on detecting contextual and correlational anomaliesunder different score threshold δ. FPR is the probability thata normal behavior being detected as abnormal. A long taileffect can be observed from the curves as the FPR decreasesfast with the increase of threshold δ. When setting a smallerthreshold δ, many false alarms would be raised by SafeDrivedue to the fact that most of the behaviors are normal with a lowscore. As shown in the curve, when the threshold is relativelysmall, the FPR of correlational anomaly detection is lowerthan contextual anomaly detection. Such result shows thatthe discrimination between normal and abnormal correlationalbehavior is higher than that of contextual behavior.

The trend of FPR curve with threshold also shows thedistribution characteristic of the anomaly score. Therefore,in practice, it is suggested to set the threshold according tostatistical principles. For example, let δ > µ + 3σ, where µis the mean value of anomaly score and σ is the standarddeviation of the score.

Many industrial solutions use rule-based techniques to ad-dress the problem of driving-behavior detection. Therefore, in

Page 8: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 1 …taoxie.cs.illinois.edu/publications/tii17-safedrive.pdf · A light-weight online anomaly detector for detecting a variety of abnormal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 8

𝛿

Fig. 8. FPR under different threshold values.

0.8

0.85

0.9

0.95

1

Speed

Anomaly

RPM

Anomaly

Swerving

Anomaly

Neutral

Taxiing

RPM-Speed

Mismatch

Rec

all

SafeDrive RuleBased

Fig. 9. Recall evaluation for different types of anomalies.

our evaluation, we first compare the performance of SafeDrivewith a rule-based approach, which is frequently being used asan industrial solution. In the rule-based approach, the used ruleset contains a number of rules that define the outlier thresholdsuch as acceleration > 1.5m/s2.

Figure 9 shows the recall evaluation and the comparisonwith the rule-based approach. The speed anomaly in the figurerefers to fast acceleration, deceleration, and over speed. Inthe evaluation, it is found that the rule-based approach mayhave ignored anomalies in several specific situations, such asthe fast acceleration when driving at a high speed requiring asmaller threshold for the rules, and also the swerving anomaly,which might need a more comprehensive rule set. It might bepossible to build a comprehensive rule set to perform muchbetter. However, in some cases, such as driving monitoring,it might be hard to build such a reasonable rule set manuallybecause the data might have many attributes and have complexrelationships with each other. SafeDrive fills this gap byautomatically extracting complex relationships from the dataset and representing such relationships with a state graphmodel.

Recent research uses Hidden Markov Model (HMM) andSVM to detect driving anomalies. Thus, we further compareSafeDrive with an HMM-based detector and an SVM-baseddetector. As shown in Table III, with our data set, SafeDriveoutperforms both SVM-based and HMM-based detectors inthe detection accuracy. Given that the data set of SafeDrive

TABLE IIIPRECISION OF HMM-BASED DETECTOR AND SafeDrive

Model Contextual Correlational Averave PrecisionSafeDrive 94.0% 92.0% 93.0%HMM 90.0% 88.0% 89.0%HMM 85.0% 84.0% 84.5%

is too huge to label them all, to train the model, parts ofthe training data are labeled as a training data set for HMMand SVM. The testing result is even lower than 86%. Theperformance is improved as we increase the labeled trainingset. However, in IoV scenarios, it is hard to prepare sufficientlabeled data to train the model because the collected data vol-ume is too huge to label. Our SafeDrive approach outperformsthose popular supervised algorithms because SafeDrive doesnot require labeled data and thus it can largely utilize the hugetraining set to improve the performance. It is believed thatsupervised learning algorithms can also produce a better result,but the high cost in large scale industrial scenarios might notbe desirable.

By fusing different data in a state graph, SafeDrive isable to detect detailed anomalies from streamed driving data.However, the fusion of multiple data in one model may alsohave negative effects. As the results suggest, the combinationmay cover several anomalies because the data may interferewith each other and thus lower the sensitivity of the model.In practice, for different data attributes, it is suggested to do acorrelation analysis for different data attributes and to regulatethe state graph by removing the connections between attributeswith no obvious correlations.

Compared with other approaches, SafeDrive has severalmain advantages. First, it uses an SG to represent normaldriving behavior. The graph is derived from a large data set,and thus this metric is more objective. Second, SafeDrive doesnot require labeled data to train the detector. Such factor isimportant for IoV scenarios because labeled abnormal dataare often hard to acquire. Third, the SG can be updated withnew data arriving, making SafeDrive sensitive and adaptiveto the change of the environment. Finally, SafeDrive has lowcomputation cost. Its major cost is matching the newly arriveddata as a temporal sub-graph, and such matching can be veryefficiently implemented by indexing. These advantages makeSafeDrive applicable for large-scale IoV scenarios.

However, SafeDrive also has its limitations as driving is notonly status-aware but also environment-aware. Since drivingbehaviors are affected by many environment factors such astraffic and road conditions, and they can be also affected byother drivers’ behaviors. Therefore, several detected unsafebehaviors may actually correspond to safe behaviors depend-ing on specific environments. For example, a hard break toavoid a collision may be considered as safe behaviors whenthe vehicle driving in the front suddenly breaks or there aresome obstacles on the way. Unfortunately, this problem canhardly be solved solely with vehicle data. Other data typessuch as video or radar data should be introduced to senseenvironment conditions. In fact, part of our ongoing work isto develop a mobile-phone-based application to collect and

Page 9: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 1 …taoxie.cs.illinois.edu/publications/tii17-safedrive.pdf · A light-weight online anomaly detector for detecting a variety of abnormal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 9

analyze driving video and g-sensor data, aiming at extractinginformation about road conditions, and thus to enhance thedetection of traffic conditions from driving video data. Butfor the time being this application has not been widelydeployed. Nevertheless, to construct a comprehensive drivingsensing platform, it is necessary to fuse different types ofdriving data. Such direction is becoming an important researchtrend. Another potential limitation of this work could be nottaking personal characteristics into consideration. This modeladdresses the problem of the driver’s behavior evolution ingeneral but does not consider the change of personal behaviorsas one might change his or her driving style over time. Thepersonalized behavioral analysis should be carefully addressedbecause a driver may have a habitual bad driving behavior,which might be inappropriately considered as normal and thusbe ignored in our solution.

VI. CONCLUSION

We have proposed an online, unsupervised, and status-aware approach, named SafeDrive, to detect abnormal drivingbehaviors from large scale vehicle data. Compared with otherapproaches, SafeDrive uses normal behaviors, which are repre-sented by an SG extracted from a large data set, as benchmarksfor identifying abnormal behaviors. The realtime-uploadeddriving data stream is split into segments by SafeDrive andeach segment is mapped as a TS-SG, which is further comparedwith the SG. A real-world IoV system with over 29,000 realvehicles connected is used to evaluate the performance ofSafeDrive. The results suggest that our model performs wellin detecting various driving anomalies without using labeledtraining data. The computational cost of SafeDrive is very low,and a single PC is capable of dealing with millions of datainstances per second, enabling SafeDrive as an ideal option todetect driving anomalies from large-scale vehicle data. Still,future work on analyzing driving behavior patterns by fusingvehicle data and video, and even road network, needs to beconducted to provide comprehensive understanding of drivingbehaviors.

VII. ACKNOWLEDGMENT

This work is supported by grants from China Key Researchand Development Program (2016YFB0100902), National Nat-ural Science Foundation of China (61602067, 61421003),Fundamental Research Funds for the Central Universities(106112015CDJXY180001). Tao Xie’s work is supported inpart by NSF under grants no. CCF-1409423, CNS-1434582,CNS-1513939, CNS-1564274. We would like to thank UCARInc on the collaboration of the data that used in this paper.

REFERENCES

[1] Jakobsen, K., Mouritsen, S. C., Torp, K. (2013). Evaluating eco-drivingadvice using GPS/CANBus data. In Proceedings of the 21st ACMSIGSPATIAL International Conference on Advances in Geographic In-formation Systems (pp. 44-53). ACM.

[2] Xiaoqiu, F., Jinzhang, J., Guoqiang, Z. (2011). Impact of Driving Behav-ior on the traffic safety of Highway Intersection. In Proceedings of the3rd International Conference on Measuring Technology and MechatronicsAutomation (Vol. 2, pp. 370-373). IEEE.

[3] Gao, X., Firner, B., Sugrim, S., et al., (2014, September). Elastic pathing:Your speed is enough to track you. In Proceedings of the 2014 ACMInternational Joint Conference on Pervasive and Ubiquitous Computing(pp. 975-986). ACM.

[4] Chen, Z., Yu, J., Zhu, Y., Chen, Y., Li, M. (2015). D3: Abnormal drivingbehaviors detection and identification using smartphone sensors. In Pro-ceedings of the 12th Annual IEEE International Conference on Sensing,Communication, and Networking (SECON) (pp. 524-532). IEEE.

[5] Chakravarty, T., Ghose, A., Bhaumik, C., Chowdhury, A. (2013). Mo-biDriveScoreA system for mobile sensor based driving analysis: A riskassessment model for improving one’s driving. In Proceedings of the 7thInternational Conference on Sensing Technology (ICST) (pp. 338-344).IEEE.

[6] Hong, J. H., Margines, B., Dey, A. K. (2014). A smartphone-basedsensing platform to model aggressive driving behaviors. In Proceedingsof the 32nd Annual ACM conference on Human Factors in ComputingSystems (pp. 4047-4056). ACM.

[7] Amarasinghe, M., Kottegoda, S., Arachchi, A. L., et al., (2015). Cloud-based driver monitoring and vehicle diagnostic with OBD2 telematics.In Proceedings of the 15th International Conference on Advances in ICTfor Emerging Regions (ICTER) (pp. 243-249). IEEE.

[8] Zhao, H., Zhou, H., Chen, C., Chen, J. (2013, December). Join driving:A smart phone-based driving behavior evaluation system. In Proceedingsof the Global Communications Conference (GLOBECOM 2013) (pp. 48-53). IEEE.

[9] Dai, J., Teng, J., Bai, X., Shen, Z., Xuan, D. (2010, March). Mobile phonebased drunk driving detection. In Proceedings of the 4th InternationalConference on Pervasive Computing Technologies for Healthcare (pp.1-8). IEEE.

[10] Fazeen, M., Gozick, B., Dantu, R., Bhukhiya, M., Gonzlez, M. C.(2012). Safe driving using mobile phones. IEEE Transactions on Intelli-gent Transportation Systems, 13(3), 1462-1468.

[11] Taha, A. E. M., Nasser, N. (2015, July). Utilizing CAN-Bus andsmartphones to enforce safe and responsible driving. In Proceedings ofthe IEEE Symposium on Computers and Communication (ISCC) (pp.111-115). IEEE.

[12] Quintero, G. C. M., Lpez, J. A. O., Ra, J. M. P. (2010, September).Intelligent erratic driving diagnosis based on artificial neural networks.In Proceedings of the 2010 Andean Council International Conference(ANDESCON) (pp. 1-6). IEEE.

[13] Jaramillo, D., Narvez, C. (2014, June). Vehicle Online MonitoringSystem Based on Fuzzy Classifier. In Proceedings of the 3rd Interna-tional Conference on Advances in Vehicular Systems, Technologies andApplications.

[14] Johnson, D. A., Trivedi, M. M. (2011, October). Driving style recogni-tion using a smartphone as a sensor platform. In Proceedings of the 14thInternational Conference on Intelligent Transportation Systems (ITSC)(pp. 1609-1615). IEEE.

[15] Li, K., Lu, M., Lu, F., Lv, Q., Shang, L., Maksimovic, D. (2012, June).Personalized driving behavior monitoring and analysis for emerginghybrid vehicles. In Proceedings of the 10th International Conference onPervasive Computing (pp. 1-19). Springer Berlin Heidelberg.

[16] Bolovinou, A., Amditis, A., Bellotti, F., Tarkiainen, M. (2014). Drivingstyle recognition for co-operative driving: A survey. In Proceedings ofthe 6th International Conference on Adaptive and Self-Adaptive Systemsand. Applications (pp. 73-78). IARIA

[17] Constantinescu, Z., Marinoiu, C., Vladoiu, M. (2010). Driving styleanalysis using data mining techniques. International Journal of ComputersCommunications and Control, 5(5), 654-663.

[18] Chandola, V., Banerjee, A., Kumar, V. (2009). Anomaly detection: Asurvey. ACM computing surveys (CSUR), 41(3), 15.

[19] Miyajima, C., Nishiwaki, Y., Ozawa, K., Wakita, T., Itou, K., Takeda,K., Itakura, F. (2007). Driver modeling based on driving behavior andits evaluation in driver identification. In Proceedings of the IEEE, 95(2),427-437.

[20] Karatas, C., Liu, L., Li, H., Liu, J., Wang, Y., Tan, S., et al. (2016).Leveraging Wearables for Steering and Driver Tracking. In Proceedingsof IEEE International Conference on Computer Communications (INFO-COM 2016) (pp. 1-9). IEEE

[21] Angela Burton, Tapan Parikh, Shannon Mascarenhas, Jue Zhang,Jonathan Voris, N. Sertac Artan, Wenjia Li. (2016) Driver Identificationand Authentication with Active Behavior Modeling, In Proceedings of2016 International Workshop on Green ICT and Smart Networking (GISN2016), Montreal, Canada.

[22] Banerjee, T., Chowdhury, A., Chakravarty, T. (2016, June). MyDrive:Drive behavior analytics method and platform. In Proceedings of the 3rdInternational on Workshop on Physical Analytics (pp. 7-12). ACM.

Page 10: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 1 …taoxie.cs.illinois.edu/publications/tii17-safedrive.pdf · A light-weight online anomaly detector for detecting a variety of abnormal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 10

[23] Lei, P. R. (2016). A framework for anomaly detection in maritimetrajectory behavior. Knowledge and Information Systems, 47(1), 189-214.

[24] Susto, G. A., Schirru, A., Pampuri, S., McLoone, S. (2016). Supervisedaggregative feature extraction for big data time series regression. IEEETransactions on Industrial Informatics, 12(3), 1243-1252.

[25] Eskin, E., Arnold, A., Prerau, M., Portnoy, L., Stolfo, S. (2002). Ageometric framework for unsupervised anomaly detection. Applicationsof data mining in computer security (pp. 77-101). Springer US.

[26] Zimek, A., Schubert, E., Kriegel, H. P. (2012). A survey on unsupervisedoutlier detection in highdimensional numerical data. Statistical Analysisand Data Mining, 5(5), 363-387.

[27] American Automobile Association. (2009). Aggressive driving: Re-search update. American Automobile Association Foundation for TrafficSafety.

Mingming Zhang is a Ph.D. student in ComputerScience at Beihang University, China. He is also avisiting Ph.D. student of the University of Illinoisat Urbana-Champaign, USA. His research interestsinclude pervaseive computing, Internet-of-Vehicles,and distributed systems.

Chao Chen is an associate professor at Col-lege of Computer Science, Chongqing University,Chongqing, China. He obtained the Ph.D. degreefrom Pierre and Marie Curie University and InstitutMines-TELECOM/TELECOM SudParis, France in2014. He received the B.Sc. and M.Sc. degreesin control science and control engineering fromNorthwestern Polytechnical University, Xian, China,in 2007 and 2010, respectively. Dr. Chen was therecipient of the Best Paper Runner-Up Award atMobiQuitous 2011. His research interests include

pervasive computing, social network analysis, urban logistics, data miningfrom large-scale taxi data, and big data analytics for smart cities.

Tianyu Wo is an associate professor at Beihang Uni-versity, China. He received the B.S. and Ph.D. de-grees in Computer Science from Beihang University,China in 2001 and 2008, respectively. His researchinterests include system software and applications indistributed systems. He is a member of IEEE.

Tao Xie is an associate professor and Willett FacultyScholar in the Department of Computer Scienceat the University of Illinois at Urbana-Champaign,USA. He received the M.S. and Ph.D. degrees inComputer Science from Peking University and theUniversity of Washington at Seattle in 2000 and2005, respectively. His research interests are soft-ware testing, program analysis, software analytics,software security, and educational software engineer-ing. He is a senior member of the IEEE.

Md Zakirul Alam Bhuiyan is an assistant professorof the Department of Computer and Information Sci-ences at the Fordham University. Earlier, he workedas an assistant professor at the Temple Universityand a post-doctoral fellow at the Central SouthUniversity, China, a research assistant at the HongKong PolyU, and a software engineer in industries.He received the Ph.D. degree and the M. Eng degreefrom Central South University, China, in 2013 and2009 respectively, and the B.Sc. degree from Inter-national Islamic University Chittagong, Bangladesh,

in 2005, all in Computer Science and Technology. His research focuseson dependable cyber physical systems, WSN applications, big data, cloudcomputing, and cyber security. He served as a lead guest editor of IEEE TBD,ACM TCPS, Information Sciences, and so on. He is a member of IEEE anda member of ACM.

Xuelian Lin is a research scientist of Beijing Ad-vanced Innovation Center for Big Data and BrainComputing, Beihang University, China. He receivedthe B.E. and Ph.D. degrees in Computer Scienceand Engineering from Beihang University, Beijing,China, in 1999 and 2013, respectively. His researchinterests are focused on time series data manage-ment, data processing, and Internet of vehicles.