Urban Area Traffic Flow Forecasting in Intelligent Transportation Systems
by
Ziyue Wang
A thesis submitted to
The Faculty of Graduate Studies of
The University of Manitoba
in partial fulfillment of the requirements
of the degree of
Master of Science
Department of Computer Science
The University of Manitoba
Winnipeg, Manitoba, Canada
August 2019
© Copyright 2019 by Ziyue Wang
Thesis advisor: Parimala Thulasiraman
Author: Ziyue Wang
Urban Area Traffic Flow Forecasting in Intelligent
Transportation Systems
Abstract
Intelligent Transportation Systems (ITS) are currently revolutionizing the transportation industry. ITS incorporates advanced Internet of Things (IoT) technologies to implement Smart Cities. These technologies produce tremendous amounts of real-time data from diverse sources that can be used to solve transportation problems. In this thesis, I focus on one such problem: traffic congestion in urban areas. A road segment affected by traffic obviously affects the surrounding road segments. However, over a period of time, other roads not necessarily close in proximity to the congested road segment may also be affected. The congestion is not stationary; it is dynamic and it spreads. I address this issue by first formulating a similarity function using ideas from network theory. Using this similarity function, I then cluster the road points affected by traffic using affinity propagation clustering, a distributed message-passing algorithm. Finally, I predict the effect of traffic on each cluster using a long short-term memory neural network model. I evaluate and show the feasibility of my proposed clustering and prediction algorithms during peak and non-peak hours on an open-source traffic data set.
Contents
Abstract
Table of Contents
List of Figures
List of Tables
Acknowledgments
Dedication

1 Introduction
  1.1 Contribution

2 Literature Review

3 Background
  3.1 Clustering
    3.1.1 Connectivity-based clustering
    3.1.2 Density-based clustering
    3.1.3 Centroid-based clustering
    3.1.4 k-medoids
    3.1.5 Affinity propagation
    3.1.6 Clustering metrics
      Silhouette coefficient
      Similarity mean
  3.2 Artificial neural network
    3.2.1 Recurrent neural network

4 Dynamic Traffic Clustering System
  4.1 System overview
    4.1.1 Static data collection
    4.1.2 Dynamic data collection
    4.1.3 Communication between Sensors
    4.1.4 Traffic clustering: Compute all pair-wise similarity
    4.1.5 Affinity propagation
    4.1.6 Clustering results
  4.2 Experiment
    4.2.1 Experiment Setup
    4.2.2 Comparison Setup - Correctness of results
    4.2.3 Data set and environment
    4.2.4 Cluster Quality
    4.2.5 Number of Clusters

5 Cluster-based Traffic Prediction
  5.1 Predicting traffic per cluster
  5.2 Time stamp clustering
  5.3 Experiment

6 Conclusion and Future Work
  6.1 Future Work

Bibliography
List of Figures
3.1 Example of ANN
3.2 Normal Neuron
3.3 LSTM Unit
3.4 Unrolled LSTM Neuron
3.5 LSTM Overview
4.1 Dynamic Traffic Clustering System
4.2 Peak Hour Silhouette
4.3 Peak Hour Similarity Mean
4.4 Non-Peak Hour Silhouette
4.5 Non-Peak Hour Similarity Mean
4.6 Number of Clusters
5.1 One to One Prediction
5.2 Many to One Prediction
5.3 Many to Many Prediction
5.4 Prediction Result
List of Tables
5.1 Adam optimizer parameters
5.2 Evaluation of LSTM Models
Acknowledgments
I would like to begin by thanking my advisor, my committee, my parents, my
significant other, and all the people who have supported me along the way.
This thesis is dedicated to somebody special. You know who you are.
Chapter 1
Introduction
This is the era of Artificial Intelligence (AI) and the Internet of Things (IoT). There
is a clear interaction between these two areas. IoT connects many "smart" physical
devices to generate and collect data for real-time analysis. This data is voluminous.
To make sense of the data, AI incorporates "smart" techniques and algorithms into
machines, allowing them to make real-time decisions and predictions on the data and
provide useful insight.
Smart technologies such as IoT and AI already have applications in mobile devices.
Examples include virtual assistants like Siri and Alexa that aid in answering questions
posed by people in their day-to-day life in their mobile devices. IoT has the potential
and power to help solve some of the challenging issues facing society. One of the
problems in the day-to-day life of individuals is traffic. No matter where we live,
we encounter some sort of traffic. In recent years, more and more people have been
moving to urban, metropolitan areas for a better quality of life. It is estimated that
61% of the population will have relocated to metropolitan areas by 2032 [44]. With
the increase in population and transport industries, the number of vehicles on the
road in urban areas is increasing. In the near future, we will also encounter driverless
vehicles [3] on the road. Traffic conditions will worsen even further, frustrating
individuals on the road and counteracting the better quality of life that made them
move to urban areas in the first place.
Cities and the transportation industry are finding solutions through Intelligent
Transportation Systems (ITS). ITS incorporates advanced IoT technologies into
transportation systems, such as electronic sensors near roadside units, high-speed
data transmission technologies, and sophisticated, intelligent control technologies in
traffic control systems. The goal of ITS is to implement a Smart City [[27], Smart
Cities Challenge] wherein personal drivers, traffic managers and emergency responders
are well connected through advanced technologies to make well-informed real-time
decisions on the go. These technologies produce tremendous amounts of real-time
data from diverse sources (such as GPS, social media, sensors, etc.) that can be used
to solve transportation problems, including traffic congestion.
IoT and AI are leading the way in developing autonomous vehicles. Smart
technologies are now adopted in vehicles to automate the interaction and exchange of
information for safe driving and to increase the efficiency of our existing traffic systems.
Currently, ITS is revolutionizing the transportation industry.
A road network is usually represented as a graph where landmarks, junctions or
intersections are used to represent the node entities in the graph and the roads/lanes
are used to represent the interrelationship between the node entities. In general, traffic
on the road can have many different causes: accidents, construction, peak
hours and so on. In studying the traffic congestion problem, there are two types of
locality we need to consider: temporal locality and spatial locality. Spatial locality
implies that if a road r is affected by traffic, then the surrounding roads of r are also
affected. Temporal locality implies that if r is affected at time t, chances are that
the surrounding roads of r will also be affected at time t + 1. Unfortunately, this is
not enough. The surrounding roads are not the only ones affected. Over a period
of time, t + δ, there will be a domino effect. That is, the affected roads will in turn
affect other roads which in turn affect other roads and so on. The congestion is not
stationary. It is dynamic and it spreads. The event created by the road point r
at time t and its effect on road traffic at time t + δ is difficult for current smart
technologies to capture. Predicting the future of any event is, in general, difficult.
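As a concrete illustration of the graph representation described above, a road network can be held in a plain adjacency mapping; the intersection names and flow values below are hypothetical, not data from this thesis.

```python
# Road network as a graph: landmarks/intersections are nodes and road
# segments are edges. Edge weights stand in for observed traffic flow.
# All names and numbers here are illustrative assumptions.
road_graph = {
    "A": {"B": 120, "C": 45},   # directed road segments out of A
    "B": {"A": 110, "D": 80},
    "C": {"A": 50},
    "D": {"B": 75},
}

def neighbours(graph, node):
    """Road points directly connected to `node`."""
    return sorted(graph[node])
```

Spatial locality then corresponds to following these edges outward from a congested node, one hop per time step.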
In my thesis, I focus on the traffic flow prediction problem on a road network in
urban areas. I take advantage of the capabilities of IoT and propose a dynamic traffic
awareness system to predict traffic. In this system, sensors are placed at various road
points, represented as an IoT network. The sensors dynamically collect and analyze
the traffic flow data to compute the similarity function between road points. Using
the similarity function, I find all the road points affected by the traffic at road point
r at time t, group them together to predict the effect of traffic on this group of nodes.
Grouping the nodes is nothing but clustering, since they have similar features, in this
case traffic flow. I use concepts from network theory, in particular maximum-flow
theory and shortest-path algorithms, and a distributed message-passing clustering
algorithm called affinity propagation [15] to cluster the nodes. The algorithm is
executed continuously to capture up-to-date information about traffic. The disjoint
clusters are trained using a long short-term memory neural network model to predict
the traffic flow on the affected roads.
The thesis is organized as follows. The next chapter provides the literature review
followed by background on clustering and prediction techniques. Chapter 4 discusses
the dynamic traffic awareness clustering system. Chapter 5 elaborates on the training
of the clusters using the long short-term memory model. Finally, conclusions and
future work are provided in Chapter 6.
1.1 Contribution
The contribution of the thesis is to design, develop and implement a traffic pre-
diction algorithm on an urban road network. To achieve this, I do the following:
1. Partition the traffic road points into time-variant clusters, such that the traffic
within each cluster is strongly spatially correlated.

2. Define a similarity metric based on flow for the clustering algorithm.

3. Design the affinity propagation clustering algorithm using my proposed similarity
metric.

4. Predict the traffic flow within these disjoint clusters using a long short-term
memory neural network.

5. Implement, simulate and analyze the proposed algorithm on a city data set.
Chapter 2
Literature Review
Research in Intelligent Transportation Systems (ITS) has been ongoing for a
very long time, although the term "ITS" was not specifically used in the early
days. Drane and Rizos [12], for example, introduced positioning-based systems
into transportation research. Using tools such as GPS, they developed position-based
algorithms to guide vehicles and reduce traffic costs. Similarly, Ozbay and
Kachroo [42] developed a software simulation tool that incorporated various man-
agement strategies and responses to handle incidents on the road. Technology has
improved tremendously since the 1990’s.
In the early twenty-first century, the term ITS came to denote a transportation
system that incorporates mobile devices, the Internet of Things (IoT) including sensors,
the Cloud and media sources to connect the surroundings. ITS studies issues not only
with respect to cars but also rail, airplanes, freight and so on. Taniguchi
and Shimamoto [55] studied the optimal vehicle routing and scheduling problem for
freight carriers. The problem was to find the shortest travel time of freight carriers
from a certain depot, pick up and drop off goods from customers within a specified
time window, and then finally return to the depot. To solve this optimization problem,
the authors used a genetic algorithm to find the optimal solution. They used real-time
traffic information but assumed a deterministic road network such as a mesh.
In recent years, with the growing population and number of vehicles on the road,
the Internet of Things has been combined with traffic systems. Yu et al. [70] proposed
a system to gather traffic data from moving vehicles using RFID electronic tags. They
showed that their system can be adapted to study a wide variety of traffic-related IoT
applications.
Mitton et al. [39] designed a system that combined the Cloud and IoT to build a smart
city. The authors presented an architecture that allowed users accessing the Cloud to
acquire data from heterogeneous IoT devices. Wu et al. [68] proposed UbiFlow,
a software-defined IoT system to manage traffic. The authors partitioned the large
urban traffic networks into small geographic pieces and studied issues such as fault
tolerance and flow scheduling. They claimed that it was the first system to study
ubiquitous flow control and mobility management in multi-networks. Al-Sakran [2]
developed a system based on IoT devices and agent-based technology to collect and
monitor real-time traffic information. They used wireless sensor networks and RFID-based
networks to create links between sensors to share information. Theodoridis et
al. [56] discussed a 3-tier IoT design, with an IoT sensor tier, an IoT gateway tier and
a server tier, for building a Smart City. They discussed the advantages and disadvantages
of this design and the technological challenges in designing a Smart City. Similarly, Gaur
et al. [20] proposed a multi-level Smart City architecture using semantic modeling and
the Dempster-Shafer approach. They divided the architecture into four levels, with each
level having its own responsibility including data collection, data communication,
and data processing. Strohbach et al. [53] proposed a Big Data analytics framework
to study traffic flow and traffic problems. They provided some discussion of their
initial findings and the challenges posed in developing their framework. Meidan et
al. [38] used machine learning techniques to identify IoT devices from their network
traffic. They used nine different IoT devices, smartphones and personal computers
in their experiments and showed that the IoT classification accuracy of their model
was 99.281%.
With the growing population, the number of vehicles on the road is also increasing. Predicting
traffic is vitally important to reduce travel time. This problem has gained popularity
over the last few years in ITS. Short-term traffic flow prediction algorithms predict
the traffic flow in the near future. Many different prediction strategies have been
developed and compared [52; 54; 35], as discussed below.
Traffic flow prediction can be divided into two categories: parametric and
non-parametric techniques [52]. Traditionally, parametric time series models, such as
Box-Jenkins, have been used to predict the traffic flow. In these models, historical traffic
flow data is used in a sequence of time steps to predict the future traffic flow.
Autoregressive Integrated Moving Average (ARIMA) is one of the popular traffic flow
prediction techniques used in the parametric Box-Jenkins time series model [64; 1]. There
have been many extensions of this model to improve prediction accuracy [59; 32; 65; 28; 66].
However, the results show that these classical methods have significant drawbacks in
performance and sometimes accuracy [52; 16; 36; 67].
k-nearest neighbor (k-NN) is one of the popular non-parametric models in traffic
flow prediction [69; 72]. k-NN finds the k (a user-defined constant) training samples
closest to the current sample x and assigns x to the class most frequent among them.
k-NN is sensitive to the data, and the choice of k depends on the data.
Artificial Neural Network (ANN) is another popular non-parametric model that has
been considered for traffic flow prediction [6; 60]. Recent research on ANNs has
shown that variations such as convolutional neural networks (CNN) and Long
Short-Term Memory (LSTM) neural networks outperform other classic parametric
and non-parametric algorithms in both solution accuracy and performance. Huang
and Ran [26] use an ANN to predict traffic speed with regard to weather conditions.
The combination of an ANN and Bayes' theorem is used to predict traffic flow by Zheng
et al. [73]. Fu et al. [16] have shown that GRU and LSTM neural network models achieve
a 10% error-rate reduction compared with ARIMA. Ma et al. [36] show that by using a CNN, the
error rate can be reduced by about 50% compared to k-NN. Tian et al. [57] have shown
that LSTM outperforms most non-parametric algorithms.
In my thesis, I propose to use an ANN to predict the traffic flow, in particular,
the Long Short-Term Memory (LSTM) neural network model. The LSTM neural network
was introduced by Hochreiter and Schmidhuber [24] as a variant of the recurrent neural
network (RNN). The main change is that the hidden layer of the RNN is replaced by a
cell called the LSTM cell. The cell consists of three gates: an input gate, a forget gate
and an output gate. These control the flow of information through the cell and the
neural network. The cell state and the hidden-layer output are transmitted to the
next step of the RNN. A series of RNN steps can be used to feed in the input data
and predict future output data. The LSTM structure is well suited for time-series
data. It has been
applied to traffic speed and traffic flow prediction. Ma et al. [37] show that LSTM
reduces the error rate by up to 60% compared with the ARIMA model. In this thesis,
I propose to train the clusters using LSTM.
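As a rough sketch of the gate mechanics just described, the following steps a single LSTM cell on scalar inputs; a real layer uses weight matrices and vectors, and the weight dictionary `w` is a hypothetical stand-in, not the model trained in this thesis.

```python
import math

def lstm_step(x, h_prev, c_prev, w):
    """One step of an LSTM cell on scalars. `w` maps each gate name to
    hypothetical (weight_x, weight_h, bias) values."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    pre = lambda name: w[name][0] * x + w[name][1] * h_prev + w[name][2]
    i = sig(pre("input"))       # input gate: how much new information enters
    f = sig(pre("forget"))      # forget gate: how much old cell state survives
    o = sig(pre("output"))      # output gate: how much cell state is exposed
    g = math.tanh(pre("cand"))  # candidate cell content
    c = f * c_prev + i * g      # updated cell state
    h = o * math.tanh(c)        # hidden output passed to the next step
    return h, c
```

With all weights and biases at zero, every gate sits at 0.5, so the cell simply halves its previous state at each step.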
In other work, there has been some interest in clustering similar traffic flow
patterns. In [17] the traffic flow data is clustered using pair-wise similarity. Clustering
[4] is an important problem in data analytics [11] and graph analytics [47]. A graph
data structure is used to represent the relationship between the data points. The
vertices in a graph represent the data points and the edges represent the pair-wise
similarity between the vertices. In the transportation problem, the road network can
be represented as a graph. There is no universally accepted definition of graph clus-
tering [50]. One definition of graph clustering refers to the task of dividing the vertex
set V of a given graph G into k non-empty groups such that the number of edges (or
edge weights for weighted graphs) between vertices in different groups is minimized
[40]. For an arbitrary k, the graph clustering problem is NP-complete [19].
Numerous approximate methods and heuristics, such as swarm intelligence
(or nature-inspired) algorithms, have been considered to solve the clustering problem. Swarm
intelligence algorithms are inspired by the collective behavior of social animals to
solve complex problems [5]. One such algorithm is Ant Brood Clustering (ABC)
[10], inspired by how ants cluster their brood or corpses into different piles. It was
first applied to the graph partitioning problem by Kuntz et al. [31]. In our recent
work [33; 46; 34; 45], variations of ABC have been shown to achieve high-quality solutions
for the clustering problem. The dynamic and distributive nature of ABC makes it
interesting for modern graph analytic problems. The drawback of the ant brood
clustering algorithm is that it is compute intensive and performance degrades for
large data sets. In [61], I studied various parallelization strategies to speed up the
ant brood clustering computations. Another drawback of this algorithm is that its
mathematical model requires considerable changes to suit each application. I therefore
opt not to use this algorithm for clustering the traffic network.
Other well known partition based techniques are k-means and k-medoids. The
data points are partitioned into k clusters, k known apriori. In k-means, the initial k
cluster centers are chosen randomly. Then each data point calculates the Euclidean
distance between itself and the cluster center. The data point is assigned to the
closest center data point. Finally, the new center clusters are updated by calculating
the mean of the data points in the clusters. The process is repeated until there is no
change in the cluster centers. k-medoids is similar to k-means, with the constraint
that each cluster center is one of the data points. Initially, k medoids (or exemplars)
are chosen. Then the Euclidean distance is calculated between the data points and
the medoids, and each data point is assigned to the closest medoid. The new medoids are
found by choosing, for each cluster, the data point that minimizes the cluster variance.
By doing this, the algorithm minimizes the sum of dissimilarities between points labeled
to be in a cluster and the point designated as the center of that cluster. This is in
contrast to the k-means algorithm, which minimizes the total squared error. k-medoids
is called an exemplar-based algorithm. Both k-means and k-medoids are sensitive
to the initial random selection of centroids or exemplars.
algorithm is affinity propagation (AP), proposed by Frey and Dueck [63]. Unlike
clustering algorithms such as k-means or k-medoids, the affinity propagation algorithm does
not require knowledge of k apriori. Unlike k-medoids, in the AP algorithm all
data points are considered as exemplars. It is a message-passing distributed
clustering algorithm [63]. Two types of messages are sent between the data points in the
network: responsibility and availability. Data points send responsibility messages to
candidate exemplars, reflecting how well suited the message-receiving point is to serve
as an exemplar for the sending data point. Candidate exemplars send availability
messages to data points, reflecting how appropriate it would be for the message-sending
point to be the exemplar for the message-receiving data point. Therefore, every data
point is considered either an exemplar or a cluster member, depending on the
type of message being sent or received. The responsibility and availability metrics
are updated based on the similarity measure until convergence.
Shea et al. [51] used the affinity propagation algorithm for clustering dynamic
vehicular ad hoc networks. In my work, clustering is performed on a static road
network. Zhang et al. [72] use a modified affinity propagation algorithm to cluster the
road network. Rather than using the Euclidean distance proposed in the original algorithm,
they used a similarity measure based on the speed of the vehicles on the road and the
flow of vehicles entering and exiting a road. The responsibility and
availability metrics are updated using this modified similarity measure. The update
rules are quite complex, and the proof of correctness is difficult to follow.
In my thesis, I will use the affinity propagation algorithm due to its simplicity
and the distributive nature of the algorithm. Also, it has been shown that this
clustering algorithm produces good solution quality with low error rates [48]. The
complexity of the algorithm, however, is O(mN) where m is the number of iterations
and N is the number of data points. For large data sets, the number of message
updates will be communication intensive. To increase performance or speed up the
algorithm, extensions of affinity propagation have been considered theoretically [18].
The main challenge in the affinity propagation algorithm is determining the similarity
measurement. In this thesis, I propose a similarity measure based on traffic flow for
clustering.
Chapter 3
Background
My thesis includes both clustering and prediction, so this chapter provides
background on the various clustering and prediction techniques. It also provides
an overview of the clustering metrics used in this thesis.
3.1 Clustering
3.1.1 Connectivity-based clustering
Connectivity-based clustering algorithms cluster nodes based on the distance
between them. Nodes that are closer together have higher similarity than those that are
farther apart. Linkage-based clustering algorithms belong to this class. Linkage-based
algorithms are bottom-up: they start by assigning each node to its own cluster.
The clusters then begin to merge, until the minimum similarity between clusters is
reached or the minimum number of clusters is found. Merging or not merging
clusters is determined by the similarity between the clusters. Depending
on the method used to measure the similarity between clusters, there are
three kinds of linkage criteria. The single-linkage criterion finds the two nodes in two
different clusters with maximum similarity; if those nodes exhibit enough similarity,
the clusters are merged. In contrast, the complete-linkage criterion considers the
pair of nodes with minimum similarity. Both single-linkage and complete-linkage
have a drawback: the algorithms consider only one node in each cluster. In
my thesis experiments, I use the average-linkage criterion. This uses the average similarity
of the nodes in two different clusters, allowing all nodes to be compared during the
merge phase.
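A minimal sketch of this merge phase on one-dimensional points, using absolute difference as the distance (so average linkage merges the pair of clusters with the smallest average pairwise distance); the function name and data are my own illustrative choices, not the thesis's implementation.

```python
def average_linkage(points, k):
    """Bottom-up clustering with the average-linkage criterion: repeatedly
    merge the two clusters whose members have the smallest average pairwise
    distance, until only k clusters remain."""
    clusters = [[p] for p in points]

    def avg_dist(c1, c2):
        # average distance over all cross-cluster pairs
        return sum(abs(a - b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```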
3.1.2 Density-based clustering
Some clustering algorithms aim to find clusters with high density; these
are classified as density-based algorithms. One well-known algorithm in this
category is Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
DBSCAN clusters groups of nodes based on a distance measurement and a minimum
number of points. It therefore takes two parameters: a user-defined distance threshold
and a user-defined density (the minimum number of points per cluster). In this thesis,
DBSCAN is one of the techniques used for comparison. One of DBSCAN's features is
that it is suitable for noisy applications; that is, the technique can detect noise, the
nodes that do not fit in any cluster. This might be helpful in traffic clustering, since
there might be some traffic points that are not related to any other traffic point. I
use this technique to compare against my own.
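A minimal one-dimensional DBSCAN sketch under the two parameters above (distance threshold `eps` and density `min_pts`); real traffic data would need a road-network distance, so treat this purely as an illustration.

```python
def dbscan(points, eps, min_pts):
    """DBSCAN sketch: points within `eps` of a core point (one with at
    least `min_pts` neighbours, itself included) join its cluster; points
    reachable from no core point are labelled noise (-1)."""
    labels = [None] * len(points)

    def neighbours(i):
        # indices within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbours(i)) < min_pts:
            labels[i] = -1            # noise; may become a border point later
            continue
        labels[i] = cluster
        seeds = neighbours(i)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:       # previously noise: now a border point
                labels[j] = cluster
            if labels[j] is None:     # unvisited: claim it and maybe expand
                labels[j] = cluster
                if len(neighbours(j)) >= min_pts:
                    seeds.extend(neighbours(j))
        cluster += 1
    return labels
```

Isolated traffic points come back with label -1, which is exactly the noise-detection behaviour discussed above.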
3.1.3 Centroid-based clustering
Centroid-based clustering techniques assume a pre-defined number of centers, which
may not necessarily be members of the data set. A well-known centroid-based
clustering technique is k-means. In k-means, the number of clusters is defined
apriori. The nodes are assigned to the nearest cluster center, such that the squared
distances from the cluster centers are minimized [14]. The new cluster centers are updated
by calculating the mean of the data points in each cluster. The process terminates
when there is no change in the cluster centers. In the traffic clustering problem, it is
difficult to predict the number of clusters apriori. Therefore, in this thesis, I do not
use k-means clustering.
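The k-means loop just described can be sketched in a few lines for one-dimensional data; the initial centers are supplied by the caller, since the random initialization is exactly what this chapter flags as a weakness.

```python
def kmeans(points, centers, max_iter=100):
    """k-means sketch (1-D): assign each point to the nearest center,
    recompute each center as the mean of its assigned points, and stop
    when the centers no longer change."""
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: (p - centers[c]) ** 2)
            clusters[nearest].append(p)
        # a center with no points keeps its old position in this sketch
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```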
3.1.4 k-medoids
k-medoids is similar to k-means, with the constraint that each cluster center
is one of the data points. Initially, k medoids (or exemplars) are chosen. Then the
Euclidean distance is calculated between the data points and the medoids, and each data
point is assigned to the closest medoid. The new medoids are found by choosing,
for each cluster, the data point that minimizes the cluster variance. By doing this,
the algorithm minimizes the sum of dissimilarities between points labeled to be in a
cluster and the point designated as the center of that cluster. This is in contrast to the
k-means algorithm, which minimizes the total squared error. k-medoids is called an
exemplar-based algorithm. Both k-means and k-medoids are sensitive to the initial
random selection of centroids or exemplars.
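The same loop with the medoid constraint: the new center of each cluster is the member that minimizes the total dissimilarity to its cluster-mates. Absolute difference stands in for Euclidean distance in this one-dimensional sketch.

```python
def kmedoids(points, medoids, max_iter=100):
    """k-medoids sketch (1-D): assign points to the closest medoid, then
    pick, within each cluster, the member minimizing the summed distance
    to the other members; stop when the medoids no longer change."""
    for _ in range(max_iter):
        clusters = [[] for _ in medoids]
        for p in points:
            nearest = min(range(len(medoids)),
                          key=lambda m: abs(p - medoids[m]))
            clusters[nearest].append(p)
        new_medoids = [min(c, key=lambda cand: sum(abs(cand - q) for q in c))
                       if c else medoids[i] for i, c in enumerate(clusters)]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, clusters
```

Unlike the k-means sketch, the returned centers are always actual data points.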
3.1.5 Affinity propagation
Affinity propagation is a centroid-based clustering algorithm. It is an exemplar-based
algorithm proposed by Frey and Dueck [15]. Unlike clustering algorithms such
as k-means or k-medoids, the affinity propagation algorithm does not require knowledge
of k apriori. Unlike k-medoids, all data points are considered as exemplars.
That is, each node considers itself a cluster center while at the same time looking for
another suitable cluster center. It is a message-passing distributed clustering
algorithm. This algorithm is suitable for the traffic clustering problem: each node in
the road network can compute independently and exchange information without any
centralized control.
The affinity propagation algorithm works as follows. Initially, every node is its
own center. Each node computes and shares two types of variables (messages):
responsibility and availability. Responsibility and availability are pair-wise
variables. Responsibility r(i, k) represents how well k serves as the center of i, and
availability a(i, k) shows how suitable i is as a member of k. Initially, both
responsibility and availability are set to 0. Responsibility is updated before availability,
using the equation below:
r(i, k) = s(i, k) − max_{k′ s.t. k′ ≠ k} {a(i, k′) + s(i, k′)}        (3.1)
To compute responsibility r(i, k), the algorithm finds another data point k′ that
has the highest (maximum) sum of availability and similarity, and computes the
difference from the similarity s(i, k). In addition, responsibility r(i, k) represents how
well k serves as the center of i, so it considers not only how similar i and k are, but also which one
of i and k is more suitable to be the center. Self-responsibility r(k, k) can be negative
or positive. If it is negative, it implies that the node is more likely to be a member
of some cluster rather than the center of a cluster.
On the other hand, availability is computed from the responsibilities of the other
points. If another data point i′ has a higher responsibility toward the
current point k, then node i is much less likely to be available. In each iteration,
each vertex updates and exchanges its responsibility and availability. Availability is
updated using the following equation:
a(i, k) = Σ_{i′ ≠ k} max{0, r(i′, k)},  if i = k;
a(i, k) = min{0, r(k, k) + Σ_{i′ ∉ {i, k}} max{0, r(i′, k)}},  if i ≠ k    (3.2)
To determine how suitable a node k is as its own center, the algorithm sums the
positive responsibilities sent to k by the other nodes. Only positive values are
included because only good candidate centers should enter the competition. When
a node k is assigned to a cluster, r(k, k) can be negative, which tends to push
the availability a(i, k) below 0. If some nodes i′ have positive responsibility
r(i′, k), the availability a(i, k) increases accordingly. On the other hand, if
there is a node k that is extremely similar to i, r(i, k) can become extremely
high; to avoid such extreme values, the algorithm caps the availability at 0 for
i ≠ k. The algorithm terminates either when it converges or after a pre-defined
number of iterations. Each node i is assigned to the center k that maximizes
a(i, k) + r(i, k), since this maximizes both the fitness of k as the center of i
and of i as a member of k. The pseudo-code is given in Algorithm 1.
Algorithm 1: Affinity Propagation Clustering
Data: similarity matrix of size N × N: s(i, j), i, j ∈ N
Result: cluster labels C(i), i ∈ N
1   ∀i, k: a(i, k) ← 0
2   while not converged do
3       ∀i, k: r(i, k) ← s(i, k) − max_{k′ ≠ k} {a(i, k′) + s(i, k′)}
4       ∀i, k: if i = k then
5           a(i, k) ← Σ_{i′ ≠ k} max{0, r(i′, k)}
6       else
7           a(i, k) ← min{0, r(k, k) + Σ_{i′ ∉ {i, k}} max{0, r(i′, k)}}
8       end
9   end
10  for i ∈ N do
11      C(i) ← argmax_k {a(i, k) + r(i, k)}
12  end
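The update rules of Algorithm 1 can be sketched in Python with NumPy. The damping factor below is a practical addition (not part of the pseudo-code) that stabilizes the message updates, and the toy similarity matrix in the usage example is hypothetical:

```python
import numpy as np

def affinity_propagation(S, max_iter=200, damping=0.5):
    """Sketch of Algorithm 1. S is an N x N similarity matrix whose
    diagonal holds the self-similarities. Damping is a practical
    addition that keeps the messages from oscillating."""
    n = S.shape[0]
    A = np.zeros((n, n))  # availabilities a(i, k)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # r(i, k) = s(i, k) - max_{k' != k} {a(i, k') + s(i, k')}
        AS = A + S
        first = AS.argmax(axis=1)
        first_val = AS[rows, first]
        AS[rows, first] = -np.inf
        second_val = AS.max(axis=1)
        max_excl = np.repeat(first_val[:, None], n, axis=1)
        max_excl[rows, first] = second_val      # max over k' != k
        R = damping * R + (1 - damping) * (S - max_excl)
        # a(i, k) per Equation 3.2 (sums of positive responsibilities)
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]          # keep r(k, k) itself
        col = Rp.sum(axis=0)                    # r(k,k) + sum of positives
        A_new = col[None, :] - Rp               # excludes i' = i
        diag = A_new[rows, rows].copy()         # a(k, k): no cap at 0
        A_new = np.minimum(0, A_new)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new
    # C(i) = argmax_k {a(i, k) + r(i, k)}
    return (A + R).argmax(axis=1)

# Hypothetical toy example: four road points, two obvious groups.
pts = np.array([0.0, 0.1, 5.0, 5.1])
S = -(pts[:, None] - pts[None, :]) ** 2          # negative squared distance
off_diag = S[~np.eye(len(pts), dtype=bool)]
np.fill_diagonal(S, np.median(off_diag))         # common preference choice
labels = affinity_propagation(S)
```

Setting every self-similarity to the median of the off-diagonal similarities is a common way to let the algorithm itself decide the number of exemplars.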
3.1.6 Clustering metrics
Silhouette coefficient
One of the most popular clustering quality measurements is the Silhouette
coefficient [49]. It evaluates both inter- and intra-cluster similarity: it
examines how similar an item is to its own cluster compared with the other
clusters. A high Silhouette coefficient indicates high cluster quality. It is
defined by Equations 3.3, 3.4, and 3.5, where C_i is the cluster containing
node i and d(i, j) is the dissimilarity of i and j.
a(i) = (1 / (|C_i| − 1)) Σ_{j ∈ C_i, j ≠ i} d(i, j)    (3.3)

b(i) = min_{k ≠ i} (1 / |C_k|) Σ_{j ∈ C_k} d(i, j)    (3.4)

s(i) = 1 − a(i)/b(i),  if a(i) < b(i);
s(i) = 0,  if a(i) = b(i);
s(i) = b(i)/a(i) − 1,  if a(i) > b(i)    (3.5)
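Equations 3.3–3.5 can be sketched directly in Python. The compact form (b − a)/max(a, b) used below is equivalent to the three cases of Equation 3.5, and the example dissimilarity matrix is hypothetical:

```python
import numpy as np

def silhouette(D, labels):
    """Mean Silhouette coefficient per Equations 3.3-3.5.
    D is a pairwise dissimilarity matrix; labels holds the cluster of
    each node. Singleton clusters are scored 0 by convention."""
    n = len(labels)
    scores = np.zeros(n)
    idx = np.arange(n)
    for i in range(n):
        same = labels == labels[i]
        if same.sum() == 1:
            continue                                # singleton: stays 0
        a = D[i, same & (idx != i)].mean()          # Equation 3.3
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels)
                if c != labels[i])                  # Equation 3.4
        # (b - a) / max(a, b) covers all three cases of Equation 3.5
        scores[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return scores.mean()

# Hypothetical example: two tight, well-separated clusters.
pos = np.array([0.0, 0.1, 5.0, 5.1])
D = np.abs(pos[:, None] - pos[None, :])
score = silhouette(D, np.array([0, 0, 1, 1]))
```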
Similarity mean
The similarity mean is the mean similarity of all nodes in a cluster. Unlike
the Silhouette coefficient, it only considers the intra-cluster similarity. A
high similarity mean indicates high cluster quality. The similarity mean of a
single cluster can be computed as in Equation 3.6; the final similarity mean is
the mean over all clusters.
csm(C) = (Σ_{i, j ∈ C, i < j} s(i, j)) / |C|    (3.6)
3.2 Artificial neural network
Artificial neural networks (ANN) are inspired by the neurons in the human
brain. An ANN tries to learn the patterns in the data set without any
task-specific rules. An ANN is organized into layers, each built from a number
of neurons. Consecutive layers are directly connected to each other, and
weights are assigned to the connections between layers. Usually, there are
three types of layers: the input layer, the hidden layers, and the output
layer. The input layer is the first layer; it receives the input and transmits
it in a form the hidden layers can use. The hidden layers do the core of the
learning. The number of hidden layers is not restricted and depends on the
user; depending on the type of neurons they contain, hidden layers may have
different functionality. A large number of hidden layers makes the computation
more intensive. The last layer is the output layer. Users may add as many
neurons as they need in any layer, and as many connections as they need between
consecutive layers, to increase accuracy or obtain the desired output. However,
this increases the computation overhead. Figure 3.1 is an example of an ANN
with one hidden layer: there are 4 neurons in the input layer, 5 neurons in the
hidden layer, and 4 neurons in the output layer.
Each layer applies a predefined activation function using its weight matrix and
bias vector. Different activation functions serve different purposes. For
example, the sigmoid function, one of the most widely used activation
functions, can take any input and map it onto a non-linear distribution. The
weight matrix assigns a weight to each input, and the bias vector adds an
offset to the weighted sum before the final output. The final result computed
by the ANN and the true result are compared by a loss function. Depending on
the application, the loss function produces a value that reflects the accuracy
of the ANN. Based on this loss, the weight matrix and bias vector are
fine-tuned in every iteration by the back propagation algorithm.
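The per-layer computation described above (activation of the weighted input plus bias) can be sketched for the 4-5-4 network of Figure 3.1; the weights below are random placeholders, not trained values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, b1, W2, b2):
    """Forward pass of the 4-5-4 network in Figure 3.1: each layer
    computes sigmoid(W @ input + b)."""
    hidden = sigmoid(W1 @ x + b1)     # hidden layer, 5 neurons
    return sigmoid(W2 @ hidden + b2)  # output layer, 4 neurons

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((5, 4)), rng.standard_normal(5)
W2, b2 = rng.standard_normal((4, 5)), rng.standard_normal(4)
y = forward(rng.standard_normal(4), W1, b1, W2, b2)
```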
Figure 3.1: Example of ANN
3.2.1 Recurrent neural network
There are many types of neural networks available in the literature. Among
them, I choose the recurrent neural network (RNN). An RNN is a regular ANN in
which some neurons (especially in the hidden layer) have feedback connections
to themselves; that is, their outputs are fed back as inputs. The feedback
connection of a neuron to itself acts as a kind of memory element: it takes
into account the present decision, the history of decisions previously taken,
and hence the previous data. RNN is therefore suitable for time-series or
temporal data. In our problem, the traffic data is related to both space and
time. In clustering the data, we find the influence between the traffic points
within the same time stamp; using an RNN, we can predict the traffic flow in
future time stamps using the data available at present.
An RNN fails to remember long-term dependencies. To overcome this drawback, the
Long short-term memory (LSTM) neural network was introduced [25]. LSTM stores
both long-term and short-term information. It is powerful in sequence-related
problems (e.g., text prediction). LSTM has the ability to loop back. Figure 3.2
shows a plain neuron, with input xt entering the neuron and producing the
neuron output ht.
Figure 3.2: Normal Neuron
Figure 3.3 shows a typical LSTM unit with a feedback loop. It may consist of
many layers. The result ht can loop back multiple times; the advantage is that
more than one time stamp of data can be taken into consideration. For example,
Figure 3.4 shows an unrolled LSTM neuron. x1 is the input data at time 1; this
data may affect the output at time 3. Consider the example "a tree has many
leaves, ...": "tree" is a key word in the sentence for predicting "leaves". The
feedback loop of the LSTM carries "tree" into the prediction of "leaves" at
time stamp 3. In this example, the gap between the two time stamps is very
small, so the words could be memorized in short-term memory. Realistically,
however, the gap between the keywords and the prediction could be very large:
the keywords could be stated in the first paragraph while the prediction takes
place in the third. Fortunately, the long-term memory in LSTM is able to
memorize the keywords and solve this problem.
There are many variants of LSTM units. For example, the peephole LSTM unit [21]
is one of the variants that fits well with the traffic flow clustering problem.
Figure 3.5 shows the structure of the peephole LSTM, which mainly consists of
four components: the memory
Figure 3.3: LSTM Unit
Figure 3.4: Unrolled LSTM Neuron
cell, the input gate layer, the output gate layer, and the forget gate layer.
The input gate layer considers the incoming data. The forget gate layer, as the
name implies, forgets unnecessary data. The output gate computes the final
output. The memory cell stores the output from the previous iteration.
Figure 3.5: LSTM Overview
The input gate, forget gate, and output gate are sigmoid layers, which apply
the sigmoid function. A sigmoid function maps its input onto a sigmoid curve in
the range [0, 1]. It is defined in Equation 3.7.
σ(x) = 1 / (1 + e^{−x})    (3.7)
Figure 3.5 shows the overview of LSTM structure. I use the following notations:
• Input data: X = (x1, x2, . . . , xn)
• Output data: Y = (y1, y2, . . . , yn)
• Hidden state of memory cell: H = (h1, h2, . . . , hn)
The forget gate decides what to forget. The current input xt, the previous
hidden state ht−1, and the previous memory ct−1 are the inputs to the forget
gate. The forget gate computes the sigmoid function of these inputs multiplied
by a weight matrix Wf, plus a bias vector bf. The output of the forget gate, ft
(Equation 3.8), is a value in the range [0, 1], where 0 means discard this data
entirely and 1 means keep it completely.

ft = σ(Wf · [ht−1 + xt + ct−1] + bf)    (3.8)
As shown in Equation 3.9, the input gate, like the forget gate, also takes xt,
ht−1, and ct−1 as input. However, its purpose is the opposite of the forget
gate: this gate decides what to store in memory for the next iteration.

it = σ(Wi · [ht−1 + xt + ct−1] + bi)    (3.9)
Another copy of the input goes to the input modulation gate, whose activation,
denoted g, is a modified sigmoid with output range [−2, 2]. The old memory
scaled by the forget gate and the modulated input scaled by the input gate are
added to compute ct, the data to be stored in the memory cell (Equation 3.10).

ct = ft ∗ ct−1 + it ∗ g(Wc · [ht−1 + xt + ct−1] + bc)    (3.10)
The last step is to decide what to output. Similar to the input gate and forget
gate, the output gate ot controls the result. h is another modified sigmoid,
with output range [−1, 1], that takes the current memory ct as input. The
product of ot and h(ct) is the final output of the LSTM, as shown in Equation
3.11.

ht = ot ∗ h(ct)    (3.11)
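Equations 3.8–3.11 can be sketched as a single forward step in Python. Following the thesis's formulation, h_{t−1}, xt, and ct−1 are combined by addition (so all must share one dimension); the modified sigmoids g and h are approximated here by scaled tanh functions with the stated ranges [−2, 2] and [−1, 1], and all weights are random placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One peephole LSTM step following Equations 3.8-3.11.
    p maps names like "W_f"/"b_f" to weight matrices and bias vectors."""
    z = h_prev + x_t + c_prev                      # combined gate input
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])         # forget gate (3.8)
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])         # input gate (3.9)
    g_t = 2.0 * np.tanh(p["W_c"] @ z + p["b_c"])   # modulation, range [-2, 2]
    c_t = f_t * c_prev + i_t * g_t                 # memory update (3.10)
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])         # output gate
    h_t = o_t * np.tanh(c_t)                       # output in [-1, 1] (3.11)
    return h_t, c_t

rng = np.random.default_rng(1)
d = 3  # single shared dimension, per the additive formulation
p = {k: rng.standard_normal((d, d)) for k in ("W_f", "W_i", "W_c", "W_o")}
p.update({k: rng.standard_normal(d) for k in ("b_f", "b_i", "b_c", "b_o")})
h_t, c_t = lstm_step(rng.standard_normal(d), np.zeros(d), np.zeros(d), p)
```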
After obtaining the final output, I apply a loss function to compare it with
the ground truth and determine the accuracy loss of the result. In this thesis,
I pick the squared loss function, Equation 3.12, which sums the squared error
between the predicted results pt and the true results yt.

e = Σ_{t=1}^{n} (yt − pt)²    (3.12)
Once I obtain the loss, I apply the back propagation algorithm to fine-tune the
hidden states, weights, and biases. Back propagation through time (BPTT) is a
technique designed for RNN back propagation. In my thesis I choose the Adam
optimizer [30]. The Adam optimizer is a combination of the Adaptive Gradient
algorithm and Root Mean Square propagation; it takes advantage of both
techniques and works well with sparse gradients and non-stationary settings.
The Adam optimizer requires the hyper-parameters α, β1, β2, and ε. I use the
values recommended by Keras [29]: α = 0.001, β1 = 0.9, β2 = 0.999, ε = 10−8.
Additionally, the loss function is denoted as L(x) and the gradient function as
∇f(x). The dot product of the gradient of f(x) with a vector v is the
directional derivative of f(x) along v, given in Equation 3.13.

(∇f(x)) · v = D_v f(x)    (3.13)
With the hyper-parameters, the loss function L(x), and its gradient, the Adam
optimizer algorithm is given below:
Algorithm 2: Adam Optimizer
Data: α, β1, β2, ε, W
Result: WT
1   M0 = 0
2   R0 = 0
3   for t ∈ T do
4       Mt = β1 Mt−1 + (1 − β1) ∇L(Wt−1)
5       Rt = β2 Rt−1 + (1 − β2) (∇L(Wt−1))²
6       Mt = Mt / (1 − (β1)^t)
7       Rt = Rt / (1 − (β2)^t)
8       Wt = Wt−1 − α Mt / (√Rt + ε)
9   end
10  Return WT
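Algorithm 2 can be sketched in NumPy as follows. The quadratic objective in the usage example is a hypothetical test function, and its learning rate is raised from the recommended 0.001 so the toy problem converges quickly:

```python
import numpy as np

def adam(grad, w0, alpha=0.001, beta1=0.9, beta2=0.999,
         eps=1e-8, steps=1000):
    """Sketch of Algorithm 2 (Adam). grad(w) returns the gradient of
    the loss L at w; w0 is the initial parameter vector."""
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)   # first moment estimate M
    r = np.zeros_like(w)   # second moment estimate R
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g                 # line 4
        r = beta2 * r + (1 - beta2) * g ** 2            # line 5
        m_hat = m / (1 - beta1 ** t)                    # line 6, bias correction
        r_hat = r / (1 - beta2 ** t)                    # line 7
        w = w - alpha * m_hat / (np.sqrt(r_hat) + eps)  # line 8
    return w

# Hypothetical objective L(w) = (w - 3)^2 with gradient 2(w - 3).
w_opt = adam(lambda w: 2 * (w - 3.0), np.array([0.0]),
             alpha=0.1, steps=2000)
```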
Chapter 4
Dynamic Traffic Clustering System
This chapter attempts to answer the following question: how does congestion
caused by an unfortunate event on a road segment affect other roads not
necessarily close in proximity to the congested road segment? I take advantage
of the capabilities
of the IoT and propose a dynamic traffic awareness system for urban driving. The
system finds all the road points affected by the traffic at road point r at time t, groups
them together to predict the effect of traffic on this group of nodes. Grouping the
nodes is nothing but clustering since they have similar features, in this case traffic
flow. I develop a traffic aware system using IoT technologies and sensors around road
points, that dynamically collects and analyzes the traffic flow data to compute the
similarity function between road points. I use concepts from network theory, in
particular the maximum flow and shortest path algorithms, and a distributed,
message passing algorithm to cluster the nodes; the clustering is executed
continuously to capture up-to-date information about traffic. I evaluate the
system during peak and non-peak hours and against static clustering
algorithms, and show the performance of my dynamic clustering algorithm.
4.1 System overview
In this section, I propose [62] a real-time dynamic traffic network clustering system.
There are several phases in my system design. I explain these using Figure 4.1. The
sensors on road side units collect traffic data continuously. The collection interval can
be fine-grained (in nanoseconds) or coarse-grained (hours). As the figure indicates, I
collect two types of data: static data and dynamic data.
4.1.1 Static data collection
This phase is computed only once. From this data I obtain the length of the
roads and all the nodes (sensors) adjacent to each node. I require this
information later in developing the dynamic system. These data are collected
only once, before the dynamic clustering phase starts.
4.1.2 Dynamic data collection
The dynamic data are collected by wireless sensors. The wireless sensors,
placed on the road side units, collect real-time traffic information and
transfer it via a wireless network. The data is collected periodically and the
results reported every 5 minutes. From this data, I count the number of
vehicles that pass through the sensors to capture the traffic flow in real
time.
Figure 4.1: Dynamic Traffic Clustering System
4.1.3 Communication between Sensors
In this phase, the sensors exchange sensor data, in particular road side
information. This is needed to compute the similarity matrix used for
clustering the nodes in the affinity propagation algorithm.
4.1.4 Traffic clustering: Compute all pair-wise similarity
This phase is Step 4 in Figure 4.1. In any clustering algorithm, determining
the similarity metric is crucial to obtaining good results. Different
clustering methods use different information obtained from traffic data to
calculate the similarity measurement. For example, some research papers [71]
use only speed, while others use only the number of vehicles on the road or
the average distance between nodes. The distance between road points is static
and does not provide real-time traffic information. Average speed can be
captured in real time; however, it requires complex calculations to find the
relationship between road points, with potential accuracy loss. Compared to
these parameters, traffic flow directly represents the relationship between
road points.
In this thesis, I use traffic flow as the parameter to find the traffic clusters. The
idea is that the amount of traffic flow entering a road point must eventually leave the
road point. This is based on the maximum flow theorem [22; 23; 7] which states that
the amount of flow into a node is equal to the amount of flow out of the node. The
number of vehicles from one road point s to another road point t influences the traffic
on the road and the traffic flow. I use this information in the similarity function.
Similarity expresses how similar or dissimilar one node is to another.
Usually, when clustering a graph, similarity is represented by the negative
Euclidean distance. The Euclidean distance is the distance between two nodes
in Euclidean space and can be considered the dissimilarity of the two nodes;
on a two-dimensional graph, negating it converts the dissimilarity into a
similarity. The following formula shows the negative Euclidean distance
between nodes p and q on the x and y axes.
d(p, q) = −√((p_x − q_x)² + (p_y − q_y)²)    (4.1)
However, traffic is complex: the influence between traffic points cannot be
easily represented just by the distance between them. Real-time factors such
as weather and accidents can influence the traffic condition on the road.
Therefore, it is important to find the true real-time influence between
traffic points. There are two types of data I can make use of: speed and
volume. In this research, traffic speed means the average speed of the
vehicles that pass through a sensor every 5 minutes, and traffic volume (or
flow) means the number of vehicles that pass through the sensor every 5
minutes.
Traffic speed is real-time information about the roads. The change of traffic
speed between road points can reveal the relationship and influence of traffic
between them. However, it is not the best way to measure these influences,
because of the unpredictable parameters that affect traffic speed. For
example, heavy snow can cause a reduction in traffic speed, and such details
are not provided in the data.
On the other hand, traffic flow information is easier to gather and better
represents the influences between traffic points. Calculating the traffic flow
requires the number of vehicles, not their speed, and this information is
easier to collect using sensors. The amount of traffic flow on the road can
represent the traffic influences between road points directly: if a road point
has a strong relationship with another road point, the numbers of vehicles
that pass through these road points influence each other. According to the
maximum flow algorithm in network theory [9], flow in is equal to flow out:
the number of vehicles that enter a road point must eventually leave it. In
this thesis, I use traffic flow as one of the parameters to find the
similarity metric.
Consider a driver who wants to travel from p to q. Usually, the driver will
pick the shortest path between these two points. The similarity, as mentioned
before, expresses how similar the two road points are to each other; in this
case, it is the flow between them. The flow on the shortest path is denoted by
f(p, q). I then normalize the flow for use in the clustering algorithm by
dividing it by the total amount of incoming flow into q, represented as
inf(q). This way, the similarity values are distributed in [0, 1], making them
easier to use within the clustering algorithm. Furthermore, most clustering
algorithms require the similarity to be symmetric. This implies that the
similarity s(p, q) must equal s(q, p), so I take the mean of these two values;
note that the similarity values remain in [0, 1]. I assume that a node always
has similarity 1 to itself, which is the maximum possible similarity. The
equation below shows the similarity between p and q:
s(p, q) = 1,  if p = q;
s(p, q) = (f(p, q)/inf(q) + f(q, p)/inf(p)) / 2,  if p ≠ q    (4.2)
By computing the similarities between all node pairs, I can build a similarity
matrix that could be used in any clustering algorithm.
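Equation 4.2 can be sketched as follows. The flow matrix F is a hypothetical input in which F[p, q] holds the (pre-computed) shortest-path flow from p to q, and every node is assumed to have non-zero incoming flow:

```python
import numpy as np

def similarity_matrix(F):
    """Build the similarity matrix of Equation 4.2 from a flow matrix.
    F[p, q] is the flow on the shortest path from p to q; inf(q) is
    the total incoming flow into q (assumed non-zero here)."""
    n = F.shape[0]
    inflow = F.sum(axis=0)                  # inf(q) for every q
    S = np.ones((n, n))                     # s(p, p) = 1 on the diagonal
    for p in range(n):
        for q in range(n):
            if p != q:
                S[p, q] = 0.5 * (F[p, q] / inflow[q]
                                 + F[q, p] / inflow[p])
    return S

# Hypothetical 3-node flow matrix.
F = np.array([[0.0, 10.0, 5.0],
              [4.0, 0.0, 6.0],
              [8.0, 2.0, 0.0]])
S = similarity_matrix(F)
```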
4.1.5 Affinity propagation
Once I set the similarity measurement, the next step is to apply the affinity
propagation clustering algorithm. Affinity propagation considers every data
point as a potential exemplar, i.e., a cluster center. In each iteration, two
kinds of messages are sent between vertices: responsibility r(i, k) and
availability a(i, k). r(i, k) is sent from i to k and represents how well k
serves as the exemplar of i; a(i, k) is sent from k to i and shows how suitable
i is as a member of k. s(i, k) is the similarity computed from Equation (4.2).
To compute responsibility r(i, k), the algorithm finds another data point k′
that has the highest (maximum) sum of availability and similarity, and computes
the difference from s(i, k). If the current similarity is much higher, i
produces a very high responsibility. On the other hand, availability is
computed from the responsibilities of all the points: if another data point i′
has a higher responsibility to the current point k, then i is much less likely
to be available. In each iteration, each vertex updates and exchanges its
responsibility and availability. The termination condition can be a fixed
number of iterations or the convergence of the exemplars; in this system, the
algorithm runs continuously to capture the dynamism in the road traffic. The
details of the algorithm are given in Chapter 3.
4.1.6 Clustering results
Although the algorithm does not terminate, I can always take a snapshot
whenever I want; this is Step 8. At any time, the user may send a query to the
system. The system then takes a snapshot of all nodes with their current
availabilities and responsibilities. A node i belongs to the exemplar k with
the highest a(i, k) + r(i, k).
4.2 Experiment
4.2.1 Experiment Setup
In the first set of experiments, I evaluate the quality of the system. Since
there are no existing dynamic clustering algorithms, I use static versions of
existing algorithms to evaluate the quality of the results. I compare affinity
propagation (AP) with different clustering algorithms to assess the clustering
solution quality. The algorithms I choose are k-medoids (KM), DBSCAN, and
average-linkage clustering (AGG). I use k-medoids because it is an
exemplar-based algorithm similar to affinity propagation; it also accepts a
similarity matrix. In addition, I evaluate the average-linkage clustering
algorithm, which also accepts a similarity matrix. Some of these algorithms
use a dissimilarity matrix instead: for each similarity s(i, j) in the
similarity matrix, the dissimilarity is d(i, j) = 1 − s(i, j). With respect to
parameter settings, KM and AGG clustering require the number of clusters a
priori, unlike AP. In AP, a different number of clusters can be obtained by
changing the default self-similarity in the input.
4.2.2 Comparison Setup - Correctness of results
Please note that I am comparing the AP clustering algorithm, with the modified
similarity measurement, against static algorithms that require knowledge of
the number of clusters a priori. For a fair comparison, I ensure that all
algorithms use the same number of clusters, although some of the algorithms
are static: I set the number of clusters in KM and AGG clustering equal to the
number of clusters found by AP. DBSCAN does not take the number of clusters as
a parameter, but it requires the maximum intra-cluster dissimilarity and the
minimum cluster size. In the experiments, I allow two points to be in the same
cluster if their dissimilarity is less than 0.5, and a point by itself can
form a cluster.
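With a minimum cluster size of one, every point is a core point, so DBSCAN on a precomputed dissimilarity matrix reduces to finding the connected components of the graph that joins points whose dissimilarity is below the threshold. A minimal sketch of that setting (the matrix below is hypothetical):

```python
import numpy as np

def dbscan_min_size_one(D, eps=0.5):
    """DBSCAN with minimum cluster size 1 on a dissimilarity matrix D:
    equivalent to connected components of the graph linking points
    with dissimilarity below eps."""
    n = D.shape[0]
    labels = np.full(n, -1)
    cluster = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        stack = [start]
        labels[start] = cluster
        while stack:                       # flood-fill one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and D[i, j] < eps:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels

# Hypothetical dissimilarities d(i, j) = 1 - s(i, j): two groups.
D = np.array([[0.0, 0.2, 0.9, 0.9],
              [0.2, 0.0, 0.9, 0.9],
              [0.9, 0.9, 0.0, 0.3],
              [0.9, 0.9, 0.3, 0.0]])
comp = dbscan_min_size_one(D)
```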
4.2.3 Data set and environment
I use AP, DBSCAN, and AGG from the Scikit-learn library [43], and KM from the
pyclustering library [41]. The programs are implemented in Python. I evaluate
the algorithms on macOS with a 4-core 2.2 GHz Intel Core i7 and 16 GB of 1600
MHz DDR3 memory. The data set I use is collected from CityPulse [58]. It
contains the traffic network of the city of Aarhus, Denmark, with 449 directed
roads and 136 road points. The meta data provides the locations of all sensors
and the entire network. The real-time traffic data is recorded every 5
minutes; each record contains the time stamp, the average speed, and the
number of vehicles.
4.2.4 Cluster Quality
To measure the quality of the solutions, I use the Silhouette coefficient
[49]. Other techniques, such as the Davies-Bouldin index or V-measure, require
a centroid or ground truth, and ground truth is difficult to obtain for real
traffic data. Additionally, I include the mean similarity within all clusters:
a mean similarity is the average over all pairs of intra-cluster similarities,
and a higher mean shows a higher intra-cluster similarity.
Figure 4.2: Peak Hour Silhouette
Figure 4.3: Peak Hour Similarity Mean
To simulate the real-time clusters, I evaluate the data from 2014-08-02; each
experiment uses the data recorded over a 5-minute interval. I show both peak
hours (7:30 AM - 7:45 AM) and non-peak hours (0:00 AM - 0:15 AM).
Figure 4.2 shows the Silhouette coefficient at peak hours. DBSCAN shows the
worst results: its Silhouette coefficient is negative, which implies that most
items are not properly placed in the right clusters.
Figure 4.4: Non-Peak Hour Silhouette
KM is in third place, with its
Silhouette coefficient being positive; however, it is not as good as AP or
AGG. In general, AP has a better Silhouette coefficient than AGG, although AGG
is a little better than AP at 07:35. Thus, at peak hours, AGG can sometimes be
better than AP, but overall AP has the better Silhouette coefficient.
Figure 4.3 shows the mean similarity at peak hours. As in Figure 4.2, DBSCAN
is still the worst. KM outperforms AGG in three instances, but at 07:35 AGG
shows a higher mean than KM. AP has a higher mean than all the other
algorithms. Overall, at peak hours, AP achieves the best results in most
cases; AGG outperforms AP in some instances, but it always has a lower mean
than AP.
Figures 4.4 and 4.5 show the Silhouette coefficient and the mean at non-peak
hours. Unlike in Figure 4.2, DBSCAN shows positive results in Figure 4.4 and
sometimes performs better than KM. DBSCAN and KM are still worse than AP and
AGG. I also notice that AGG is unable to outperform AP in any instance: AP
always has the best Silhouette coefficient.
Figure 4.5: Non-Peak Hour Similarity Mean
AP and AGG are still close compared to KM and
DBSCAN. With respect to mean, AP outperforms all other algorithms. DBSCAN is
unable to reach AP.
None of the experiments shows a high Silhouette coefficient or a high mean
(≥ 0.5). This is due to the characteristics of the similarity measurement and
the data set: to produce a similarity higher than 0.5, two road points would
have to exchange most of their flow with each other, which is not possible in
the real world.
4.2.5 Number of Clusters
One advantage of AP is that it does not require the number of clusters to be
pre-defined. This makes the algorithm easier to use, since users do not need
to find a good number of clusters in advance. Other algorithms, such as KM,
require the number of clusters in advance and may not give reasonable results
if an accurate number of clusters is not given initially. Furthermore, the
number of clusters found by the AP algorithm provides useful information to
drivers. Therefore, I examine the number of clusters over a one-day period.
Figure 4.6: Number of Clusters
I execute the algorithm dynamically and continuously to provide real-time
traffic flow information. Figure 4.6 shows the change in the number of
clusters found by AP over one day, every 30 minutes. At non-peak hours,
especially around midnight, there is little traffic flow on the road; most
roads therefore have low similarity with each other, and each road may belong
to its own cluster. This results in a large number of clusters. I observe a
dramatic reduction in the number of clusters at around 7:30 AM: as rush hour
approaches, the traffic flow increases, and AP is able to cluster the traffic
network based on the traffic flow. The number of clusters further decreases
until 9:00 AM, since peak hours have very large flow and thus very high
similarity between the roads, so the algorithm is able to partition the
clusters easily. However, if the traffic volume is not that large, the
similarity between points is not sufficient, and it is more difficult for the
algorithm to partition the clusters.
From 00:00 to 8:00, the number of clusters decreases: starting from 00:00, the
traffic volume is low, so all nodes have low similarity to each other; as the
traffic volume increases, the similarities become higher and AP is able to
find the clusters. From 8:00 to 9:00, which is no longer a peak hour, the
number of clusters still decreases. This is due to the smooth and steady
traffic flow: after peak hours, no road point has a very large, dominating
flow to other points, and the algorithm obtains a further reduced number of
clusters.
These observations convince me that the dynamic system works well during both
peak and non-peak hours. As expected, during non-peak (midnight) hours the
number of clusters increases due to low traffic flow; the number of clusters
decreases during peak hours (early morning, when there is more traffic
similarity between roads); and right after peak hours (8:00-9:00 AM) the flow
is steady.
Chapter 5
Cluster-based Traffic Prediction
In studying the traffic congestion problem, there are two types of locality we need
to consider: temporal locality and spatial locality. Spatial locality implies that if a
road r is affected by traffic, then the surrounding roads of r are also affected. Temporal
locality implies that if r is affected at time t, chances are that the surrounding roads
of r will also be affected at time t + 1. Unfortunately, this is not enough. The
surrounding roads are not the only ones affected. Over a period of time, t + δ, there
will be a domino effect. That is, the affected roads will in turn affect other roads
which in turn affect other roads and so on. The congestion is not stationary. It is
dynamic and it spreads. The event created at a road point r at time t and its
effect on the road traffic at time t + δ are difficult to capture with current
smart technologies. Predicting the future of any event is in general
difficult. In this chapter, I attempt to solve this problem: I use machine
learning techniques, namely long-short term memory, to predict traffic on the
road points formed by the clusters discussed in the previous chapter.
5.1 Predicting traffic per cluster
LSTM is a powerful tool for prediction and is used in deep learning tasks.
Deep learning models are complex networks that allow the extraction of
higher-level information; deep learning methods using auto-encoders have been
applied to problems such as travel route prediction [35]. In this thesis, as
discussed in Section 3.2.1, the peephole unit [21], a variant of the
long-short term memory model, is used for traffic prediction. In the
literature, LSTM has been shown to produce good solutions for time-series data
[13], and traffic flow data is time-series data. LSTM with a single
time-series of flow data has been used before [16; 37; 57]. However, LSTM has
the ability to handle more features in the data to improve the prediction
accuracy, and finding useful features is not an easy task. Traffic is both
temporally and spatially correlated.
In general, using the time-series data available for the road points, I can
find the temporal correlation between two road points: how traffic influences
each road point at time t can be obtained. Since these are two different road
points, I can also find the spatial correlation between them. The clustered
data is fed into the LSTM neural network to predict the traffic flow at time
t + δ. There are three prediction designs: one to one, many to one, and many
to many.
Figure 5.1 is an example of one-to-one prediction: one time series, one
prediction. In this figure, I use the data of the road from times t = 1, 2,
..., 8 to predict the outcome at time 9. This is the most traditional way of
predicting traffic flow [16; 37; 57]. However, this method does not consider
the spatial correlation between road points.
Figure 5.2 is the many to one design: many time-series inputs, one prediction. Assume
Figure 5.1: One to One Prediction
road 1, road 2, and road 3 are in the same cluster because they have a strong influence
on each other. The data available from roads 2 and 3 is then potentially helpful for
predicting the outcome of road 1. In the figure, the data points for time t = 1, 2, ..., 8
for the three roads are used to predict what happens at time 9 for road 1. In other
words, the traffic flow on roads 2 and 3 may influence road 1, and using it should
improve the prediction accuracy. However, it might also increase the computation
overhead, since more data is required to compute a prediction.
Figure 5.2: Many to One Prediction

Figure 5.3 is the many to many design. Here I use the capability of LSTM to predict
not only one road at a time, but multiple roads at once. By feeding in the data of
one cluster (which includes multiple road points), LSTM can predict the outcome of
the entire cluster. Compared to many to one prediction, the computation overhead
is higher. However, if the prediction of the entire cluster is required, this design is
more efficient, since it makes all its predictions in one shot rather than one road at a
time. On the other hand, predicting the entire cluster increases the training time,
since more data is fed in. I believe that with a high performance parallel computer
the training time could be reduced; note that I do not consider parallelization in
this thesis.
Figure 5.3: Many to Many Prediction
5.2 Time stamp clustering
To make the best use of the many to one and many to many prediction models, the
input time series must have a strong influence on each other. In Chapter 4, I
introduced a dynamic traffic clustering system and showed that the affinity
propagation clustering algorithm gave the best overall solution quality. Prediction,
however, is static: each cluster can be trained statically using LSTM.
As discussed in the previous chapter, I collect data at each time interval. In the
experiments, I did this every five minutes (t = 5). At every time interval, I cluster
the road points to capture the traffic influence on each other. At time t + 5 minutes
(say 10 minutes), I find another clustering. I can do this for, say, 1 hour. In this
example, I have 12 different clusterings, but a data point in one cluster at time t = 5
may be in another cluster at time t = 15, for example. Since prediction is static, I
design a static clustering technique as follows; it is fairly general and can be used at
any time.
I propose a cluster merge technique to merge the time stamp clusters into all-time
clusters. To do this, I first build a similarity matrix s[N, N] with all values
initialized to 0. Then I cluster the traffic from the first time stamp to the last time
stamp in a month. If, in a time stamp, a node A is in the same cluster as node B,
I increase the similarity of the pair by 1, that is, s[A, B] += 1. After all time stamps
are evaluated, I normalize the similarity matrix by
s[N, N] = (s[N, N] − min(s[N, N])) / max(s[N, N]),
so that all values in the similarity matrix lie in the range [0, 1]. Now I can apply this
similarity matrix to any clustering algorithm that I would like to use to generate the
final static traffic clusters. The pseudocode of the algorithm is shown below.
Algorithm 3: Cluster Merge
Data: Clustering algorithm A(x), time stamp similarities TS = {ts_0, ts_1, ..., ts_T}
Result: Cluster labels C(i), i ∈ N
1  s[N, N] = 0
2  for t ∈ T do
3      C_t = A(ts_t)
4      for all i, j ∈ N do
5          if C_t(i) == C_t(j) then
6              s[i, j] += 1
7  s[N, N] = (s[N, N] − min(s[N, N])) / max(s[N, N])
8  C = A(s)
9  return C
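A sketch of Algorithm 3 in Python with NumPy follows; `cluster_fn` stands in for the pluggable clustering algorithm A(x) (affinity propagation in this thesis) and is an assumption of the illustration, not code from the thesis.

```python
import numpy as np

def cluster_merge(cluster_fn, timestamp_inputs):
    """Merge per-time-stamp clusterings into one static clustering."""
    # Cluster every time stamp and record one label per road point.
    labels_per_ts = [np.asarray(cluster_fn(ts)) for ts in timestamp_inputs]
    n = len(labels_per_ts[0])
    s = np.zeros((n, n))
    for labels in labels_per_ts:
        # same[i, j] is True when road points i and j share a cluster
        # at this time stamp; adding booleans increments s[i, j] by 1.
        s += labels[:, None] == labels[None, :]
    # Normalize as in the thesis: subtract the minimum, divide by the
    # maximum, so all similarities fall in [0, 1].
    s = (s - s.min()) / s.max()
    # One final clustering on the merged similarity matrix.
    return cluster_fn(s)
```

Because each time stamp contributes at most 1 to any pair's similarity, peak-hour and non-peak-hour clusterings carry the same weight in the final result.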
The reason I merge the clusters rather than the flow data is to ensure that each time
stamp and each cluster have the same weight in the final cluster. In a road network,
most traffic flow happens in peak hours. If I merged the flow data, the non-peak-hour
clusters would have less weight. Since the prediction of traffic flow could take place
at any time, I choose to merge the clusters based on time stamp and not traffic flow.
5.3 Experiment
The program is implemented in Python using the Keras library [8]. The experimental
machine has a 4.2 GHz Intel i7-7700K processor with 16 GB of memory. The
operating system used is Ubuntu 18.04 LTS. My LSTM design has four hidden
layers. The first two are LSTM layers, each with 64 LSTM units. The third layer is
a dropout layer. The purpose of this layer is to avoid overfitting, where the LSTM
fits the training data set so well that it does not work well on new data sets. There
is a one to one mapping to the previous layer that decides whether to drop each unit
from the computation; the dropout rate is set to 0.2. The last layer is a Dense layer:
each neuron in this layer is connected to all neurons in the dropout layer, and a
sigmoid activation function is applied. The size of the Dense layer is set to the
output size: in the one to one and many to one models it has only 1 unit, while in
the many to many model the number of units is the same as the cluster size. For the
back propagation algorithm, the Adam optimizer [30] is used with the parameters
recommended by Keras [29], shown in Table 5.1.
α        β1      β2       ε
0.001    0.9     0.999    10^−8

Table 5.1: Adam optimizer parameters
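A sketch of this four-layer design using the Keras API is below. Note that the standard Keras `LSTM` layer does not implement the peephole variant discussed in Section 3.2.1, so this approximates the architecture; `n_features` and `output_size` are 1 for the one to one model and the cluster size (52 here) for many to many.

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

def build_model(n_features, output_size, window=12):
    model = Sequential([
        Input(shape=(window, n_features)),
        # Two stacked LSTM layers with 64 units each; the first must
        # return the full sequence so the second can consume it.
        LSTM(64, return_sequences=True),
        LSTM(64),
        Dropout(0.2),  # dropout layer to avoid overfitting
        # Dense output layer with sigmoid activation, one unit per
        # predicted road.
        Dense(output_size, activation="sigmoid"),
    ])
    # Adam optimizer with the Keras-recommended parameters (Table 5.1).
    model.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.9,
                                 beta_2=0.999, epsilon=1e-8),
                  loss="mse")
    return model

model = build_model(n_features=52, output_size=52)  # many to many case
```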
The data set is collected from CityPulse [58]. I use the data from 2014-03-01 to
2014-05-30: data from 2014-03-01 to 2014-04-30 is used as training data, and data
from 2014-05-01 to 2014-05-30 is used as test data. I use the past 1 hour of data
(12 time stamps) to predict the next 5 minutes (1 time stamp).
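The sliding-window preparation described here can be sketched as follows; `make_windows` is an illustrative helper, not code from the thesis:

```python
import numpy as np

def make_windows(series, window=12, horizon=1):
    """Split a flow series of shape (T, n_roads) into LSTM samples:
    the past `window` time stamps predict the value `horizon` steps
    ahead (window=12 and horizon=1 match the experiment setup)."""
    X, y = [], []
    for t in range(len(series) - window - horizon + 1):
        X.append(series[t:t + window])              # one hour of history
        y.append(series[t + window + horizon - 1])  # next 5 minutes
    return np.array(X), np.array(y)

# Toy check: 20 time stamps of a 3-road cluster yield 8 samples.
X, y = make_windows(np.zeros((20, 3)))
print(X.shape, y.shape)  # (8, 12, 3) (8, 3)
```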
To evaluate the prediction quality, I use mean square error (MSE, Equation 5.1),
mean absolute error (MAE, Equation 5.2), explained variance (EV, Equation 5.3),
and R² regression (R², Equation 5.4). MSE and MAE measure the error in the
result, so lower values are better. EV and R² show how well the prediction fits, so
for a good result these values should be high. The highest possible value of EV and
R² is 1, which means the prediction matches the real data perfectly.
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i)^2 \quad (5.1)

MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - p_i| \quad (5.2)

EV = 1 - \frac{\mathrm{Var}(y - p)}{\mathrm{Var}(y)} \quad (5.3)

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - p_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \quad (5.4)
All three models (one to one, many to one, many to many) are evaluated. 600 epochs
are executed in each training run. I choose road 180709, one of the busiest roads in
the data set, as my experiment road. After applying affinity propagation and the
cluster merge algorithm, there are 51 more roads in the same cluster as road 180709.
In all experiments, the time cost is about 66 µsec per step during training, and the
same during prediction. From this observation I conclude that adding more features
to the input data has an impact on performance too insignificant to observe: in this
experiment, 52 features have no measurable impact on performance compared with
only 1 feature.
Figure 5.4: Prediction Result
Figure 5.4 shows the result of the many to one prediction for the first day; note that
the traffic flow is normalized to [0, 1]. For peak hours, the result is not very accurate,
since the flow in peak hours can be extremely different depending on the day. Overall,
though, the prediction is very close to the real traffic flow values and follows their
fluctuation pattern. Thus, in general, LSTM shows good solution quality.
In addition, I studied how much benefit I can obtain from the clustered traffic, that
is, how much accuracy I can gain by adding more features. Therefore, I evaluated all
three models. The one to one model is just a normal LSTM prediction: it reads
road 180709 as input and predicts road 180709 only. The many to one model reads
the entire cluster of road 180709 as input and predicts road 180709 only. The many
to many model reads the entire cluster of road 180709 and predicts the future traffic
for the entire cluster. The results are shown in Table 5.2. In all four metrics, many
to many achieves the best result, many to one is in second place, and the one to one
model is the worst. The many to one and many to many models are better than the
one to one model, as expected: the clustered traffic data indeed improves the solution
quality. MSE is reduced by 30% and MAE by 21%, so the errors are clearly decreased;
both EV and R² have increased by about 7%. Another benefit of the many to many
model is that it can predict the entire cluster while training only once, compared
with the one to one and many to one models, which saves a huge amount of training
time. Since I found that the increase in features does not increase the time cost, in
this experiment I train 52 roads together and reduce the total training time to 1/52.
Models         MSE       MAE       EV        R²
One to One     0.006501  0.051743  0.779400  0.779377
Many to One    0.004556  0.042653  0.810834  0.810533
Many to Many   0.004004  0.039840  0.836513  0.833502

Table 5.2: Evaluation of LSTM Models
Chapter 6
Conclusion and Future Work
In this thesis, I proposed a dynamic traffic clustering system to forecast traffic
flow on urban roads. I used an IoT network wherein each IoT device placed on
the road constantly collects traffic flow data and communicates via a wireless
network to compute the real-time traffic clusters. These clusters consisted of road
points that had similar traffic flow in a given time period. The clusters were based
on the shortest path between road points and the influence of traffic on this path.
I showed that the algorithm dynamically discovered the clusters during peak and
non-peak hours. I also showed the correctness of the algorithm by comparing it to
static algorithms that use a predefined number of clusters. The real-time traffic
clustering system using affinity propagation is decentralized and offers good solution
quality. The evaluation showed that the affinity propagation clustering technique
had the best overall Silhouette coefficient and similarity mean among the k-medoid,
DBSCAN, and average-linkage algorithms, and that it found a reasonable number of
clusters in both peak and non-peak hours.
The clusters were then trained using a variant of a recurrent neural network, long
short-term memory (LSTM). I evaluated various clustering and prediction metrics to
show the feasibility of the proposed approach. To produce high prediction quality,
both temporal locality and spatial locality were considered: clustering optimized the
spatial locality of the traffic network, while LSTM predicted the traffic flow based
on previous data, optimizing temporal locality. I proposed the many to one model
and the many to many model to predict the traffic flow. The many to one model
used cluster data to predict a single road; the many to many model used cluster data
to predict the entire cluster. In the experiments, I found there is no performance
reduction in the many to one and many to many models. On the other hand, the
many to one and many to many models achieved better solution quality than
non-clustered traffic data. The many to many model can also reduce the training
cost: an entire cluster can be trained together, reducing the training cost to 1/|C|.
6.1 Future Work
Here are some ideas for future work.
• Improve the similarity measurement.
• Find multiple paths (say k) using bio-inspired techniques such as ant colony
optimization. The traffic flow in the top-k shortest paths can then be considered
simultaneously in the similarity measurement.
• Implement affinity propagation on real IoT devices in a wireless network to test
its real-world performance.
• Prediction on a huge traffic network is compute intensive even with the many
to many model. Domain-specific architectures such as the TPU can reduce the
time cost. Since the clusters can be predicted independently, partitioning the
clusters across different accelerators such as GPUs and TPUs could help reduce
the time cost.
Bibliography
[1] Ahmed, M. S., and Cook, A. R. Analysis of freeway traffic time-series data
by using Box-Jenkins techniques. No. 722 in Transportation Research Record.
Transportation Research Board, 1979.
[2] Al-Sakran, H. O. Intelligent traffic information system based on integration
of internet of things and agent technology. International Journal of Advanced
Computer Science and Applications (IJACSA) 6, 2 (2015), 37–43.
[3] autopilot-project. AUTOPILOT.
[4] Berkhin, P. A survey of clustering data mining techniques. In Grouping
multidimensional data. Springer, 2006, pp. 25–71.
[5] Bonabeau, E., Marco, D. d. R. D. F., Dorigo, M., Theraulaz, G.,
Theraulaz, G., et al. Swarm intelligence: from natural to artificial sys-
tems. No. 1 in Santa Fe Institute Studies on the Sciences of Complexity. Oxford
university press, 1999.
[6] Chan, K. Y., Dillon, T. S., Singh, J., and Chang, E. Neural-network-
based models for short-term traffic flow forecasting using a hybrid exponential
smoothing and Levenberg-Marquardt algorithm. IEEE Trans. Intell. Transp. Syst.
13, 2 (June 2012), 644–654.
[7] Cherkassky, B. V., and Goldberg, A. V. On implementing the push
relabel method for the maximum flow problem. Algorithmica 19, 4 (1997), 390–
410.
[8] Chollet, F., et al. Keras. https://keras.io, 2015.
[9] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Intro-
duction to Algorithms, second ed. MIT Press and McGraw-Hill, 2001.
[10] Deneubourg, J.-L., Goss, S., Franks, N., Sendova-Franks, A., De-
train, C., and Chretien, L. The dynamics of collective sorting robot-like
ants and ant-like robots. In Proceedings of the first international conference on
simulation of adaptive behavior on From animals to animats (1991), pp. 356–363.
[11] dos Santos, W., Carvalho, L. F. M., de P. Avelar, G., Jr., A. S.,
Ponce, . M., Guedes, D., and Jr., W. M. Lemonade: A scalable and effi-
cient spark-based platform for data analytics. In 17th IEEE/ACM International
Symposium on Cluster, Cloud and Grid Computing (2017).
[12] Drane, C. R., and Rizos, C. Positioning systems in intelligent transportation
systems. Artech House, Inc., 1998.
[13] Duan, Y., Lv, Y., and Wang, F.-Y. Travel time prediction with lstm neural
network. In IEEE 19th International Conference on Intelligent Transportation
Systems (ITSC) (Rio de Janeiro, Brazil, November 2016).
[14] Everitt, B. Cluster analysis. Wiley, Chichester, West Sussex, U.K, 2011.
[15] Frey, B. J., and Dueck, D. Clustering by passing messages between data
points. science 315, 5814 (2007), 972–976.
[16] Fu, R., Zhang, Z., and Li, L. Using lstm and gru neural network methods
for traffic flow prediction. In Chinese Association of Automation (YAC), Youth
Academic Annual Conference of (2016), IEEE, pp. 324–328.
[17] Fu, Z., Hu, W., and Tan, T. Similarity based vehicle trajectory clustering and
anomaly detection. In Image Processing, 2005. ICIP 2005. IEEE International
Conference on (2005), vol. 2, IEEE, pp. II–602.
[18] Galdi, P., Napolitano, F., and Tagliaferri, R. A comparison between
affinity propagation and assessment based methods in finding the best number
of clusters. In Eleventh International Meeting on Computational Intelligence
Methods for Bioinformatics and Biostatistics (June 2014).
[19] Garey, M. R., Johnson, D. S., and Stockmeyer, L. Some simplified
np-complete graph problems. Theoretical computer science 1, 3 (1976), 237–267.
[20] Gaur, A., Scotney, B., Parr, G., and McClean, S. Smart city archi-
tecture and its applications based on iot. Procedia computer science 52 (2015),
1089–1094.
[21] Gers, F. A., and Schmidhuber, J. Recurrent nets that time and count. In
Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural
Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives
for the New Millennium (2000), vol. 3, IEEE, pp. 189–194.
[22] Goldberg, A. V. Efficient graph algorithms for sequential and parallel com-
puters. PhD thesis, Dept. of Electrical Engineering and Computer Science, Mas-
sachusetts Institute of Technology, 1987.
[23] Goldberg, A. V., and Tarjan, R. E. A new approach to the maximum
flow problem. Journal of the ACM (JACM) 35, 4 (1988), 921–940.
[24] Hochreiter, S., and Schmidhuber, J. Long short-term memory. Neural
computation 9, 8 (1997), 1735–1780.
[25] Hochreiter, S., and Schmidhuber, J. Long short-term memory. Neural
Computation 9, 8 (1997), 1735–1780.
[26] Huang, S.-H., and Ran, B. An application of neural network on traffic speed
prediction under adverse weather condition. PhD thesis, University of Wisconsin–
Madison, 2003.
[27] Infrastructure Canada. Smart Cities Challenge,.
[28] Kamarianakis, Y., and Prastacos, P. Forecasting traffic flow conditions in
an urban network: comparison of multivariate and univariate approaches. Transp.
Res. Rec. 1857 (2003), 74–84.
[29] keras. Keras: The Python Deep Learning library.
[30] Kingma, D. P., and Ba, J. Adam: A method for stochastic optimization.
arXiv preprint arXiv:1412.6980 (2014).
[31] Kuntz, P., Snyers, D., and Layzell, P. A stochastic heuristic for visual-
ising graph clusters in a bi-dimensional space prior to partitioning. Journal of
Heuristics 5, 3 (1999), 327–351.
[32] Lee, S., and Fambro, D. Application of subset autoregressive integrated
moving average model for short-term freeway traffic volume forecasting. Transp.
Res. Rec. 1678 (1999), 179–188.
[33] Liu, Y. Y. A polymorphic ant-based algorithm for graph clustering. Master’s
thesis, Department of Computer Science, University of Manitoba, Winnipeg,
MB, Canada, 2016.
[34] Liu, Y. Y., Thulasiraman, P., and Thulasiram, R. K. A self fixing intelli-
gent ant clustering algorithm for graphs. In IEEE International Joint Conference
on Neural Networks in IEEE World Congress on Computational Intelligence (Rio
de Janeiro,Brazil, July 2018).
[35] Lv, Y., Duan, Y., Kang, W., Li, Z., Wang, F.-Y., et al. Traffic flow
prediction with big data: A deep learning approach. IEEE Trans. Intelligent
Transportation Systems 16, 2 (2015), 865–873.
[36] Ma, X., Dai, Z., He, Z., Ma, J., Wang, Y., and Wang, Y. Learning traffic
as images: a deep convolutional neural network for large-scale transportation
network speed prediction. Sensors 17, 4 (2017), 818.
[37] Ma, X., Tao, Z., Wang, Y., Yu, H., and Wang, Y. Long short-term
memory neural network for traffic speed prediction using remote microwave
sensor data. Transportation Research Part C: Emerging Technologies 54 (2015),
187–197.
[38] Meidan, Y., Bohadana, M., Shabtai, A., Guarnizo, J. D., Ochoa, M.,
Tippenhauer, N. O., and Elovici, Y. Profiliot: a machine learning approach
for iot device identification based on network traffic analysis. In Proceedings of
the Symposium on Applied Computing (2017), ACM, pp. 506–509.
[39] Mitton, N., Papavassiliou, S., Puliafito, A., and Trivedi, K. S. Com-
bining cloud and sensors in a smart city environment. EURASIP Journal of
Wireless Communications and Networking (2012).
[40] Newman, M. E. J. Detecting community structure in networks. The European
Physical Journal B-Condensed Matter and Complex Systems 38, 2 (2004), 321–
330.
[41] Novikov, A. annoviko/pyclustering: pyclustering 0.8.2 release, Nov. 2018.
[42] Ozbay, K., and Kachroo, P. Incident management in intelligent transporta-
tion systems. Artech House Publishers (1999).
[43] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion,
B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg,
V., et al. Scikit-learn: Machine learning in python. Journal of machine learn-
ing research 12, Oct (2011), 2825–2830.
[44] Petrolo, R., Loscrì, V., and Mitton, N. Towards a smart city based on
cloud of things. In Proceedings of the 2014 ACM International Workshop on
Wireless and Mobile Technologies for Smart Cities (WiMobCity) (New York,
New York, USA, 2014), ACM Press, pp. 61–66.
[45] Qasem, M. Bio-inspired constrained clustering: A case study on aspect-based
sentiment analysis. PhD thesis, Department of Computer Science, University of
Manitoba, Winnipeg, MB, Canada, 2018.
[46] Qasem, M., Thulasiraman, P., and Thulasiram, R. K. Constrained ant
brood clustering algorithm with adaptive radius: A case study on aspect based
sentiment analysis. In IEEE Swarm Intelligence Symposium (SIS), IEEE Sym-
posium Series on Computational Intelligence (SSCI) (Honolulu, Hawaii, USA,
Nov 27-Dec 1 2017).
[47] Rao, W., Yoneki, E., and Chen, L. L-graph: A general graph analytic
system on continuous computation. In ACM HotPlanet (2015).
[48] Refianti, R., Mutiara, B., and Gunawan, S. Time complexity
comparison between affinity propagation algorithms. Journal of Theoretical and
Applied Information Technology 95, 7 (2017), 1497–1505.
[49] Rousseeuw, P. J. Silhouettes: a graphical aid to the interpretation and vali-
dation of cluster analysis. Journal of computational and applied mathematics 20
(1987), 53–65.
[50] Schaeffer, S. E. Graph clustering. Computer science review 1, 1 (2007),
27–64.
[51] Shea, C., Hassanabadi, B., and Valaee, S. Mobility-based clustering in
vanets using affinity propagation. In Global telecommunications conference, 2009.
GLOBECOM 2009. IEEE (2009), IEEE, pp. 1–6.
[52] Smith, B. L., Williams, B. M., and Oswald, R. K. Comparison of para-
metric and nonparametric models for traffic flow forecasting. Transportation
Research Part C: Emerging Technologies 10, 4 (2002), 303–321.
[53] Strohbach, M., Ziekow, H., Gazis, V., and Akiva, N. Towards a big
data analytics framework for iot and smart city applications. In Modeling and
processing for next-generation big-data technologies. Springer, 2015, pp. 257–282.
[54] Sun, S., Zhang, C., and Yu, G. A bayesian network approach to traffic flow
forecasting. IEEE Transactions on intelligent transportation systems 7, 1 (2006),
124–132.
[55] Taniguchi, E., and Shimamoto, H. Intelligent transportation system based
dynamic vehicle routing and scheduling with variable travel times. Transporta-
tion Research Part C: Emerging Technologies 12, 3-4 (2004), 235–250.
[56] Theodoridis, E., Mylonas, G., and Chatzigiannakis, I. Developing an
IoT smart city framework. In IISA 2013 (2013), IEEE, pp. 1–6.
[57] Tian, Y., and Pan, L. Predicting short-term traffic flow by long short-
term memory recurrent neural network. In Smart City/SocialCom/SustainCom
(SmartCity), 2015 IEEE International Conference on (2015), IEEE, pp. 153–
158.
[58] Tonjes, R., Barnaghi, P., Ali, M., Mileo, A., Hauswirth, M., Ganz,
F., Ganea, S., Kjærgaard, B., Kuemper, D., Nechifor, S., et al. Real
time iot stream processing and large-scale data analytics for smart city applica-
tions. In poster session, European Conference on Networks and Communications
(2014), sn.
[59] van der Voort, M., Dougherty, M., and Watson, S. Combining Kohonen
maps with ARIMA time series models to forecast traffic flow. Transportation
Research Part C: Emerging Technologies 4, 5 (October 1996), 307–318.
[60] Vlahogianni, E. I., Karlaftis, M. G., and Golias, J. C. Optimized and
meta optimized neural networks for short-term traffic flow prediction: A genetic
approach. Transp. Res. C, Emerging Technol. 13, 3 (June 2005), 211–234.
[61] Wang, Z., Liu, Y. Y., Thulasiraman, P., and Thulasiram, R. K. Ant
brood clustering on intel xeon multi-core: Challenges and strategies. In Sympo-
sium Series on Computational Intelligence (2018), IEEE.
[62] Wang, Z., Thulasiraman, P., and Thulasiram, R. A dynamic traffic
awareness system for urban driving. The 12th IEEE International Conference
on Internet of Things (2019).
[63] Weiming, L., Du Chenyang, W. B., Chunhui, S., and Zhenchao, Y.
Distributed affinity propagation clustering based on map reduce. Journal of
Computer Research and Development 8 (2012), 024.
[64] Williams, B., Durvasula, P., and Brown, D. Urban freeway traffic flow
prediction: application of seasonal autoregressive integrated moving average and
exponential smoothing models. Transportation Research Record: Journal of the
Transportation Research Board 1644 (1998), 132–141.
[65] Williams, B. M. Multivariate vehicular traffic flow prediction: evaluation of
ARIMAX modeling. Transp. Res. Rec. 1776 (2001), 194–200.
[66] Williams, B. M., and Hoel, L. A. Modeling and forecasting vehicular
traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results.
J. Transp. Eng. 129, 6 (Nov 2003), 664–672.
[67] Williams, B. M., and Hoel, L. A. Modeling and forecasting vehicular traffic
flow as a seasonal arima process: Theoretical basis and empirical results. Journal
of transportation engineering 129, 6 (2003), 664–672.
[68] Wu, D., Arkhipov, D. I., Asmare, E., Qin, Z., and McCann, J. A.
Ubiflow: Mobility management in urban-scale software defined iot. In 2015
IEEE Conference on Computer Communications (INFOCOM) (2015), IEEE,
pp. 208–216.
[69] Yu, B., Song, X., Guan, F., Yang, Z., and Yao, B. k-nearest neighbor
model for multiple-time-step prediction of short-term traffic condition. Journal
of Transportation Engineering 142, 6 (2016), 04016018.
[70] Yu, M., Zhang, D., Cheng, Y., and Wang, M. An rfid electronic tag
based automatic vehicle identification system for traffic iot applications. In 2011
Chinese Control and Decision Conference (CCDC) (2011), IEEE, pp. 4192–4197.
[71] Zhang, B., Xing, K., Cheng, X., Huang, L., and Bie, R. Traffic clustering
and online traffic prediction in vehicle networks: A social influence perspective.
In Infocom, 2012 Proceedings IEEE (2012), IEEE, pp. 495–503.
[72] Zhang, L., Liu, Q., Yang, W., Wei, N., and Dong, D. An improved
k-nearest neighbor model for short-term traffic flow prediction. Procedia-Social
and Behavioral Sciences 96 (2013), 653–662.
[73] Zheng, W., Lee, D.-H., and Shi, Q. Short-term freeway traffic flow pre-
diction: Bayesian combined neural network approach. Journal of transportation
engineering 132, 2 (2006), 114–121.