Urban Area Traffic Flow Forecasting in Intelligent Transportation Systems
by
Ziyue Wang
A thesis submitted to
The Faculty of Graduate Studies of
The University of Manitoba
in partial fulfillment of the requirements
of the degree of
Master of Science
Department of Computer Science
The University of Manitoba
Winnipeg, Manitoba, Canada
August 2019
© Copyright 2019 by Ziyue Wang
Thesis advisor: Parimala Thulasiraman
Author: Ziyue Wang
Urban Area Traffic Flow Forecasting in Intelligent
Transportation Systems
Abstract
Intelligent Transportation Systems (ITS) are currently revolutionizing the transportation industry. ITS incorporates advanced Internet of Things (IoT) technologies to implement Smart Cities. These technologies produce tremendous amounts of real-time data from diverse sources that can be used to solve transportation problems. In this thesis, I focus on one such problem: traffic congestion in urban areas. A road segment affected by traffic obviously affects the surrounding road segments. However, over a period of time, other roads not necessarily close in proximity to the congested road segment may also be affected. The congestion is not stationary; it is dynamic and it spreads. I address this issue by first formulating a similarity function using ideas from network theory. Using this similarity function, I then cluster the road points affected by traffic using affinity propagation clustering, a distributed message-passing algorithm. Finally, I predict the effect of traffic on each cluster using a long short-term memory neural network model. I evaluate and show the feasibility of my proposed clustering and prediction algorithms during peak and non-peak hours on an open-source traffic data set.
Contents
Abstract
Table of Contents
List of Figures
List of Tables
Acknowledgments
Dedication

1 Introduction
  1.1 Contribution

2 Literature Review

3 Background
  3.1 Clustering
    3.1.1 Connectivity-based clustering
    3.1.2 Density-based clustering
    3.1.3 Centroid-based clustering
    3.1.4 k-medoids
    3.1.5 Affinity propagation
    3.1.6 Clustering metrics
      Silhouette coefficient
      Similarity mean
  3.2 Artificial neural network
    3.2.1 Recurrent neural network

4 Dynamic Traffic Clustering System
  4.1 System overview
    4.1.1 Static data collection
    4.1.2 Dynamic data collection
    4.1.3 Communication between Sensors
    4.1.4 Traffic clustering: Compute all pair-wise similarity
    4.1.5 Affinity propagation
    4.1.6 Clustering results
  4.2 Experiment
    4.2.1 Experiment Setup
    4.2.2 Comparison Setup - Correctness of results
    4.2.3 Data set and environment
    4.2.4 Cluster Quality
    4.2.5 Number of Clusters

5 Cluster-based Traffic Prediction
  5.1 Predicting traffic per cluster
  5.2 Time stamp clustering
  5.3 Experiment

6 Conclusion and Future Work
  6.1 Future Work

Bibliography
List of Figures
3.1 Example of ANN
3.2 Normal Neuron
3.3 LSTM Unit
3.4 Unrolled LSTM Neuron
3.5 LSTM Overview
4.1 Dynamic Traffic Clustering System
4.2 Peak Hour Silhouette
4.3 Peak Hour Similarity Mean
4.4 Non-Peak Hour Silhouette
4.5 Non-Peak Hour Similarity Mean
4.6 Number of Clusters
5.1 One to One Prediction
5.2 Many to One Prediction
5.3 Many to Many Prediction
5.4 Prediction Result
List of Tables
5.1 Adam optimizer parameters
5.2 Evaluation of LSTM Models
Acknowledgments
I would like to begin by thanking my advisor, my committee, my parents, my
significant other, and all the people who have supported me along the way.
This thesis is dedicated to somebody special. You know who you are.
Chapter 1
Introduction
This is the era of Artificial Intelligence (AI) and the Internet of Things (IoT). There
is a clear interaction between these two areas. IoT connects many "smart" physical
devices to generate and collect data for real-time analysis. This data is voluminous.
To make sense of the data, AI incorporates "smart" techniques and algorithms into
machines, allowing them to make real-time decisions and predictions on the data and
provide useful insight.
Smart technologies such as IoT and AI already have applications in mobile devices.
Examples include virtual assistants like Siri and Alexa that aid in answering questions
posed by people in their day-to-day life in their mobile devices. IoT has the potential
and power to help solve some of the challenging issues facing society. One of the
problems in the day-to-day life of individuals is traffic. No matter where we live,
we encounter some sort of traffic. In recent years, more and more people have been
moving to urban, metropolitan areas for a better quality of life. It is estimated that
61% of the population will have relocated to metropolitan areas by 2032 [44]. With
the increase in population and transport industries, the number of vehicles on the
road in urban areas is increasing. In the near future, we will also encounter driverless
vehicles [3] on the road. Traffic conditions will worsen even further, frustrating
individuals on the road and counteracting the better quality of life that made them
move to urban areas in the first place.
Cities and the transportation industry are finding solutions through Intelligent
Transportation Systems (ITS). ITS incorporates advanced IoT technologies into
transportation systems, such as electronic sensors near roadside units, high-speed
data transmission technologies, and sophisticated, intelligent control technologies in
traffic control systems. The goal of ITS is to implement a Smart City [[27], Smart
Cities Challenge] wherein personal drivers, traffic managers and emergency responders
are well connected through advanced technologies to make well-informed real-time
decisions on the go. These technologies produce tremendous amounts of real-time
data from diverse sources (such as GPS, social media, sensors, etc.) that can be used
to solve transportation problems, including traffic congestion.
IoT and AI are leading the way in developing autonomous vehicles. Smart
technologies are now adopted in vehicles to automate the interaction and exchange of
information for safe driving and to increase the efficiency of our existing traffic systems.
Currently, ITS is revolutionizing the transportation industry.
A road network is usually represented as a graph where landmarks, junctions or
intersections are used to represent the node entities in the graph and the roads/lanes
are used to represent the interrelationship between the node entities. In general, traffic
on the road can have many different causes: accidents, construction, peak
hours and so on. In studying the traffic congestion problem, there are two types of
locality we need to consider: temporal locality and spatial locality. Spatial locality
implies that if a road r is affected by traffic, then the surrounding roads of r are also
affected. Temporal locality implies that if r is affected at time t, chances are that
the surrounding roads of r will also be affected at time t + 1. Unfortunately, this is
not enough. The surrounding roads are not the only ones affected. Over a period
of time, t + δ, there will be a domino effect. That is, the affected roads will in turn
affect other roads which in turn affect other roads and so on. The congestion is not
stationary. It is dynamic and it spreads. The event created by the road point r
at time t and its effect on road traffic at time t + δ is difficult for current smart
technologies to capture. Predicting the future of any event is, in general, difficult.
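As a concrete illustration of the graph representation described above, a road network can be held in a plain adjacency mapping; the intersection names and flow values below are hypothetical, not data from this thesis.

```python
# Road network as a graph: landmarks/intersections are nodes and road
# segments are edges. Edge weights stand in for observed traffic flow.
# All names and numbers here are illustrative assumptions.
road_graph = {
    "A": {"B": 120, "C": 45},   # directed road segments out of A
    "B": {"A": 110, "D": 80},
    "C": {"A": 50},
    "D": {"B": 75},
}

def neighbours(graph, node):
    """Road points directly connected to `node`."""
    return sorted(graph[node])
```

Spatial locality then corresponds to following these edges outward from a congested node, one hop per time step.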
In my thesis, I focus on the traffic flow prediction problem on a road network in
urban areas. I take advantage of the capabilities of IoT and propose a dynamic traffic
awareness system to predict traffic. In this system, sensors are placed at various road
points, represented as an IoT network. The sensors dynamically collect and analyze
the traffic flow data to compute the similarity function between road points. Using
the similarity function, I find all the road points affected by the traffic at road point
r at time t, group them together to predict the effect of traffic on this group of nodes.
Grouping the nodes is nothing but clustering, since they have similar features, in this
case traffic flow. I use concepts from network theory, in particular maximum-flow
theory and shortest-path algorithms, and a distributed message-passing clustering
algorithm called affinity propagation [15] to cluster the nodes. The algorithm is
executed continuously to capture up-to-date information about traffic. The disjoint
clusters are trained using a long short-term memory neural network model to predict
the traffic flow on the affected roads.
The thesis is organized as follows. The next chapter provides the literature review
followed by background on clustering and prediction techniques. Chapter 4 discusses
the dynamic traffic awareness clustering system. Chapter 5 elaborates on the training
of the clusters using the long short-term memory model. Finally, conclusions and
future work are provided in Chapter 6.
1.1 Contribution
The contribution of the thesis is to design, develop and implement a traffic pre-
diction algorithm on an urban road network. To achieve this, I do the following:
1. Partition the traffic road points into time-variant clusters, such that the traffic
within each cluster is strongly spatially correlated.

2. Define a similarity metric based on flow for the clustering algorithm.

3. Design the affinity propagation clustering algorithm using my proposed similarity
metric.

4. Predict the traffic flow within these disjoint clusters using a long short-term
memory neural network.

5. Implement, simulate and analyze the proposed algorithm on a city data set.
Chapter 2
Literature Review
Research in Intelligent Transportation Systems (ITS) has been ongoing for a
very long time, although the term "ITS" was not specifically used in the early
days. Drane and Rizos [12], for example, introduced positioning-based systems
into transportation research. Using tools such as GPS, they developed position-based
algorithms to guide vehicles and reduce traffic costs. Similarly, Ozbay and
Kachroo [42] developed a software simulation tool that incorporated various man-
agement strategies and responses to handle incidents on the road. Technology has
improved tremendously since the 1990’s.
In the early twenty-first century, the term ITS came to denote a transportation
system that incorporates mobile devices, the Internet of Things (IoT) including sensors,
the Cloud and media sources to connect the surroundings. ITS studies issues not only
with respect to cars but also rail, airplanes, freight and so on. Taniguchi
and Shimamoto [55] studied the optimal vehicle routing and scheduling problem for
freight carriers. The problem was to find the shortest travel time of freight carriers
from a certain depot, pick up and drop off goods from customers within a specified
time window, and then finally return to the depot. To solve this optimization problem,
the authors used a genetic algorithm to find the optimal solution. They used real-time
traffic information but assumed a deterministic road network such as a mesh.
In recent years, with the growing population and number of vehicles on the road,
the Internet of Things has been combined with traffic systems. Yu et al. [70] proposed
a system to gather traffic data from moving vehicles using RFID electronic tags. They
showed that their system can be adapted to study a wide variety of traffic-related IoT
applications.
Mitton et al. [39] designed a system that combined the Cloud and IoT to build a smart
city. The authors presented an architecture that allowed users accessing the Cloud to
acquire data from heterogeneous IoT devices. Wu et al. [68] proposed UbiFlow,
a software-defined IoT system to manage traffic. The authors partitioned the large
urban traffic networks into small geographic pieces and studied issues such as fault
tolerance and flow scheduling. They claimed that it was the first system to study
ubiquitous flow control and mobility management in multi-networks. Al-Sakran [2]
developed a system based on IoT devices and agent-based technology to collect and
monitor real-time traffic information. They used wireless sensor networks and RFID-based
networks to create links between sensors to share information. Theodoridis et
al. [56] discussed a 3-tier IoT design, with an IoT sensor tier, an IoT gateway tier and
a server tier, for building a Smart City. They discussed the advantages and disadvantages
of this design and the technological challenges in designing a Smart City. Similarly, Gaur
et al. [20] proposed a multi-level Smart City architecture using semantic modeling and
the Dempster-Shafer approach. They divided the architecture into four levels, with each
level having its own responsibility including data collection, data communication,
and data processing. Strohbach et al. [53] proposed a Big Data analytics framework
to study traffic flow and traffic problems. They provided some discussion of their
initial findings and the challenges posed in developing their framework. Meidan et
al. [38] used machine learning techniques to identify IoT devices from their network
traffic. They used nine different IoT devices, smartphones and personal computers
in their experiments and showed that the IoT classification accuracy of their model
was 99.281%.
With the growing population, the number of vehicles on the road is also increasing. Predicting
traffic is vitally important to reduce travel time. This problem has gained popularity
over the last few years in ITS. Short-term traffic flow prediction algorithms predict
the traffic flow in the near future. Many different prediction strategies have been
developed and compared [52; 54; 35], as discussed below.
Traffic flow prediction can be divided into two categories: parametric and
non-parametric techniques [52]. Traditionally, parametric time series models, such as
Box-Jenkins, have been used to predict the traffic flow. In these models, historical traffic
flow data is used in a sequence of time steps to predict the future traffic flow.
Autoregressive Integrated Moving Average (ARIMA) is one of the popular traffic flow
prediction techniques used in the parametric Box-Jenkins time series model [64; 1]. There
have been many extensions of this model to improve prediction accuracy [59; 32; 65; 28; 66].
However, the results show that these classical methods have significant drawbacks in
performance and sometimes accuracy [52; 16; 36; 67].
k-nearest neighbor (k-NN) is one of the popular non-parametric models in traffic
flow prediction [69; 72]. k-NN finds the k (a user-defined constant) training samples
closest to the current sample x and assigns x to the class most frequent among them.
k-NN is sensitive to the data, and the choice of k depends on the data.
Artificial Neural Network (ANN) is another popular non-parametric model that has
been considered for traffic flow prediction [6; 60]. Recent research on ANNs has
shown that variations such as convolutional neural networks (CNN) and Long
Short-Term Memory (LSTM) neural networks outperform other classic parametric
and non-parametric algorithms in both solution accuracy and performance. Huang
and Ran [26] use an ANN to predict traffic speed with regard to weather conditions.
The combination of an ANN and Bayes' theorem is used to predict traffic flow by Zheng
et al. [73]. Fu et al. [16] have shown that GRU and LSTM neural network models achieve
a 10% error-rate reduction compared with ARIMA. Ma et al. [36] show that by using a CNN, the
error rate can be reduced by about 50% compared to k-NN. Tian et al. [57] have shown
that LSTM outperforms most non-parametric algorithms.
In my thesis, I propose to use an ANN to predict the traffic flow, in particular,
the Long Short-Term Memory (LSTM) neural network model. The LSTM neural network
was introduced by Hochreiter and Schmidhuber [24] as a variant of the recurrent neural
network (RNN). The main change is that the hidden layer of the RNN is replaced by a
cell called the LSTM cell. The cell consists of three gates: an input gate, a forget gate
and an output gate. These control the flow of information through the cell and the
neural network. The cell state and the hidden-layer output are transmitted to the
next step of the RNN. A series of RNN steps can be used to feed in the input data
and predict future output data. The LSTM structure is well suited for time-series
data. It has been
applied to traffic speed and traffic flow prediction. Ma et al. [37] show that LSTM
reduces the error rate by up to 60% compared with the ARIMA model. In this thesis,
I propose to train the clusters using LSTM.
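As a rough sketch of the gate mechanics just described, the following steps a single LSTM cell on scalar inputs; a real layer uses weight matrices and vectors, and the weight dictionary `w` is a hypothetical stand-in, not the model trained in this thesis.

```python
import math

def lstm_step(x, h_prev, c_prev, w):
    """One step of an LSTM cell on scalars. `w` maps each gate name to
    hypothetical (weight_x, weight_h, bias) values."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    pre = lambda name: w[name][0] * x + w[name][1] * h_prev + w[name][2]
    i = sig(pre("input"))       # input gate: how much new information enters
    f = sig(pre("forget"))      # forget gate: how much old cell state survives
    o = sig(pre("output"))      # output gate: how much cell state is exposed
    g = math.tanh(pre("cand"))  # candidate cell content
    c = f * c_prev + i * g      # updated cell state
    h = o * math.tanh(c)        # hidden output passed to the next step
    return h, c
```

With all weights and biases at zero, every gate sits at 0.5, so the cell simply halves its previous state at each step.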
In other work, there has been some interest in clustering similar traffic flow
patterns. In [17] the traffic flow data is clustered using pair-wise similarity. Clustering
[4] is an important problem in data analytics [11] and graph analytics [47]. A graph
data structure is used to represent the relationship between the data points. The
vertices in a graph represent the data points and the edges represent the pair-wise
similarity between the vertices. In the transportation problem, the road network can
be represented as a graph. There is no universally accepted definition of graph clus-
tering [50]. One definition of graph clustering refers to the task of dividing the vertex
set V of a given graph G into k non-empty groups such that the number of edges (or
edge weights for weighted graphs) between vertices in different groups is minimized
[40]. For an arbitrary k, the graph clustering problem is NP-complete [19].
Numerous approximate methods and heuristics, such as swarm intelligence
(or nature-inspired) algorithms, have been considered to solve the clustering problem. Swarm
intelligence algorithms are inspired by the collective behavior of social animals to
solve complex problems [5]. One such algorithm is Ant Brood Clustering (ABC)
[10], inspired by how ants cluster their brood or corpses into different piles. It was
first applied to the graph partitioning problem by Kuntz et al. [31]. In our recent
work [33; 46; 34; 45], variations of ABC have been shown to achieve high-quality solutions
for the clustering problem. The dynamic and distributive nature of ABC makes it
interesting for modern graph analytic problems. The drawback of the ant brood
clustering algorithm is that it is compute intensive and performance degrades for
large data sets. In [61], I studied various parallelization strategies to speed up the
ant brood clustering computations. Another drawback of this algorithm is that its
mathematical model requires considerable changes to suit each application. I therefore
opt not to use this algorithm for clustering the traffic network.
Other well known partition based techniques are k-means and k-medoids. The
data points are partitioned into k clusters, k known apriori. In k-means, the initial k
cluster centers are chosen randomly. Then each data point calculates the Euclidean
distance between itself and the cluster center. The data point is assigned to the
closest center data point. Finally, the new center clusters are updated by calculating
the mean of the data points in the clusters. The process is repeated until there is no
change in the cluster centers. k-medoids is similar to k-means, with the constraint
that each cluster center is one of the data points. Initially, k medoids (or exemplars)
are chosen. Then the Euclidean distance is calculated between the data points and
the medoids, and each data point is assigned to the closest medoid. The new medoids are
found by choosing, for each cluster, the data point that minimizes the cluster variance.
By doing this, the algorithm minimizes the sum of dissimilarities between points labeled
to be in a cluster and the point designated as the center of that cluster. This is in
contrast to the k-means algorithm, which minimizes the total squared error. k-medoids
is called an exemplar-based algorithm. Both k-means and k-medoids are sensitive
to the initial random selection of centroids or exemplars.
algorithm is affinity propagation (AP), proposed by Frey and Dueck [63]. Unlike
clustering algorithms such as k-means or k-medoids, the affinity propagation algorithm does
not require knowledge of k apriori. Unlike k-medoids, in the AP algorithm all
data points are considered as exemplars. It is a message-passing distributed
clustering algorithm [63]. Two types of messages are sent between the data points in the
network: responsibility and availability. Data points send responsibility messages to
candidate exemplars, reflecting how well suited the message-receiving point is to serve
as an exemplar for the sending data point. Candidate exemplars send availability
messages to data points, reflecting how appropriate it would be for the message-sending
point to be the exemplar for the message-receiving data point. Therefore, every data
point is considered either an exemplar or a cluster member, depending on the
type of message being sent or received. The responsibility and availability metrics
are updated based on the similarity measure until convergence.
Shea et al. [51] used the affinity propagation algorithm for clustering dynamic
vehicular ad hoc networks. In my work, clustering is performed on a static road
network. Zhang et al. [72] use a modified affinity propagation algorithm to cluster the
road network. Rather than using the Euclidean distance proposed in the original algorithm,
they used a similarity measure based on the speed of the vehicles on the road and the
flow of vehicles entering and exiting a road. The responsibility and
availability metrics are updated using this modified similarity measure. The update
rules are quite complex, and the proof of correctness is difficult to follow.
In my thesis, I will use the affinity propagation algorithm due to its simplicity
and the distributive nature of the algorithm. Also, it has been shown that this
clustering algorithm produces good solution quality with low error rates [48]. The
complexity of the algorithm, however, is O(mN) where m is the number of iterations
and N is the number of data points. For large data sets, the number of message
updates will be communication intensive. To increase performance or speed up the
algorithm, extensions of affinity propagation have been considered theoretically [18].
The main challenge in the affinity propagation algorithm is determining the similarity
measurement. In this thesis, I propose a similarity measure based on traffic flow for
clustering.
Chapter 3
Background
My thesis includes both clustering and prediction, so this chapter provides
background on the various clustering and prediction techniques. It also provides
an overview of the clustering metrics used in this thesis.
3.1 Clustering
3.1.1 Connectivity-based clustering
Connectivity-based clustering algorithms cluster nodes based on the distance
between them. Nodes that are closer together have higher similarity than those that are
farther apart. Linkage-based clustering algorithms belong to this class. Linkage-based
algorithms are bottom-up: they start by assigning each node to its own cluster.
The clusters then begin to merge, until the minimum similarity between clusters is
reached or the minimum number of clusters is found. Merging or not merging
clusters is determined by the similarity between the clusters. Depending
on the method used to measure the similarity between clusters, there are
three kinds of linkage criteria. The single-linkage criterion finds the two nodes in two
different clusters with maximum similarity; if those nodes exhibit enough similarity,
the clusters are merged. In contrast, the complete-linkage criterion considers the
pair of nodes with minimum similarity. Both single-linkage and complete-linkage
have a drawback: the algorithms consider only one node in each cluster. In
my thesis experiments, I use the average-linkage criterion. This uses the average similarity
of the nodes in two different clusters, allowing all nodes to be compared during the
merge phase.
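A minimal sketch of this merge phase on one-dimensional points, using absolute difference as the distance (so average linkage merges the pair of clusters with the smallest average pairwise distance); the function name and data are my own illustrative choices, not the thesis's implementation.

```python
def average_linkage(points, k):
    """Bottom-up clustering with the average-linkage criterion: repeatedly
    merge the two clusters whose members have the smallest average pairwise
    distance, until only k clusters remain."""
    clusters = [[p] for p in points]

    def avg_dist(c1, c2):
        # average distance over all cross-cluster pairs
        return sum(abs(a - b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```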
3.1.2 Density-based clustering
Some clustering algorithms aim to find clusters with high density; these
are classified as density-based algorithms. One well-known algorithm in this
category is Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
DBSCAN clusters groups of nodes based on a distance measurement and a minimum
number of points. It therefore takes two parameters: a user-defined distance threshold
and a user-defined density (the minimum number of points per cluster). In this thesis,
DBSCAN is one of the techniques used for comparison. One of DBSCAN's features is
that it is suitable for noisy applications; that is, the technique can detect noise, the
nodes that do not fit in any cluster. This might be helpful in traffic clustering, since
there might be some traffic points that are not related to any other traffic point. I
use this technique to compare against my own.
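A minimal one-dimensional DBSCAN sketch under the two parameters above (distance threshold `eps` and density `min_pts`); real traffic data would need a road-network distance, so treat this purely as an illustration.

```python
def dbscan(points, eps, min_pts):
    """DBSCAN sketch: points within `eps` of a core point (one with at
    least `min_pts` neighbours, itself included) join its cluster; points
    reachable from no core point are labelled noise (-1)."""
    labels = [None] * len(points)

    def neighbours(i):
        # indices within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbours(i)) < min_pts:
            labels[i] = -1            # noise; may become a border point later
            continue
        labels[i] = cluster
        seeds = neighbours(i)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:       # previously noise: now a border point
                labels[j] = cluster
            if labels[j] is None:     # unvisited: claim it and maybe expand
                labels[j] = cluster
                if len(neighbours(j)) >= min_pts:
                    seeds.extend(neighbours(j))
        cluster += 1
    return labels
```

Isolated traffic points come back with label -1, which is exactly the noise-detection behaviour discussed above.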
3.1.3 Centroid-based clustering
Centroid-based clustering techniques assume a pre-defined number of centers, which
may not necessarily be members of the data set. A well-known centroid-based
clustering technique is k-means. In k-means, the number of clusters is defined
apriori. The nodes are assigned to the nearest cluster center, such that the squared
distances from the cluster centers are minimized [14]. The new cluster centers are updated
by calculating the mean of the data points in each cluster. The process terminates
when there is no change in the cluster centers. In the traffic clustering problem, it is
difficult to predict the number of clusters apriori. Therefore, in this thesis, I do not
use k-means clustering.
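The k-means loop just described can be sketched in a few lines for one-dimensional data; the initial centers are supplied by the caller, since the random initialization is exactly what this chapter flags as a weakness.

```python
def kmeans(points, centers, max_iter=100):
    """k-means sketch (1-D): assign each point to the nearest center,
    recompute each center as the mean of its assigned points, and stop
    when the centers no longer change."""
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: (p - centers[c]) ** 2)
            clusters[nearest].append(p)
        # a center with no points keeps its old position in this sketch
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```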
3.1.4 k-medoids
k-medoids is similar to k-means, with the constraint that each cluster center
is one of the data points. Initially, k medoids (or exemplars) are chosen. Then the
Euclidean distance is calculated between the data points and the medoids, and each data
point is assigned to the closest medoid. The new medoids are found by choosing,
for each cluster, the data point that minimizes the cluster variance. By doing this,
the algorithm minimizes the sum of dissimilarities between points labeled to be in a
cluster and the point designated as the center of that cluster. This is in contrast to the
k-means algorithm, which minimizes the total squared error. k-medoids is called an
exemplar-based algorithm. Both k-means and k-medoids are sensitive to the initial
random selection of centroids or exemplars.
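The same loop with the medoid constraint: the new center of each cluster is the member that minimizes the total dissimilarity to its cluster-mates. Absolute difference stands in for Euclidean distance in this one-dimensional sketch.

```python
def kmedoids(points, medoids, max_iter=100):
    """k-medoids sketch (1-D): assign points to the closest medoid, then
    pick, within each cluster, the member minimizing the summed distance
    to the other members; stop when the medoids no longer change."""
    for _ in range(max_iter):
        clusters = [[] for _ in medoids]
        for p in points:
            nearest = min(range(len(medoids)),
                          key=lambda m: abs(p - medoids[m]))
            clusters[nearest].append(p)
        new_medoids = [min(c, key=lambda cand: sum(abs(cand - q) for q in c))
                       if c else medoids[i] for i, c in enumerate(clusters)]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, clusters
```

Unlike the k-means sketch, the returned centers are always actual data points.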
3.1.5 Affinity propagation
Affinity propagation is a centroid-based clustering algorithm. It is an exemplar-based
algorithm proposed by Frey and Dueck [15]. Unlike clustering algorithms such
as k-means or k-medoids, the affinity propagation algorithm does not require knowledge
of k apriori. Unlike k-medoids, all data points are considered as exemplars.
That is, each node considers itself a cluster center while at the same time looking for
another suitable cluster center. It is a message-passing distributed clustering
algorithm. This algorithm is suitable for the traffic clustering problem: each node in
the road network can compute independently and exchange information without any
centralized control.
The affinity propagation algorithm works as follows. Initially, every node is its
own center. Each node computes and shares two types of variables (messages):
responsibility and availability. Responsibility and availability are pair-wise
variables. Responsibility r(i, k) represents how well k serves as the center of i, and
availability a(i, k) shows how suitable i is as a member of k. Initially, both
responsibility and availability are set to 0. Responsibility is updated before availability,
using the equation below:
r(i, k) = s(i, k) − max_{k′ s.t. k′ ≠ k} {a(i, k′) + s(i, k′)}        (3.1)
To compute responsibility r(i, k), the algorithm finds another data point k′ that
has the highest (maximum) sum of availability and similarity, and computes the
difference from the similarity s(i, k). In addition, responsibility r(i, k) represents how
well k serves as the center of i, so it considers not only how similar i and k are, but also which one
of i and k is more suitable to be the center. Self-responsibility r(k, k) can be negative
or positive. If it is negative, it implies that the node is more likely to be a member
of some cluster rather than the center of a cluster.
On the other hand, availability is computed from the responsibilities of the other
points. If another data point i′ has a higher responsibility toward the
current point k, then node i is much less likely to be available. In each iteration,
each vertex updates and exchanges its responsibility and availability. Availability is
updated using the following equation:
a(i, k) = Σ_{i′ ≠ k} max{0, r(i′, k)},  if i = k;
a(i, k) = min{0, r(k, k) + Σ_{i′ ∉ {i, k}} max{0, r(i′, k)}},  if i ≠ k    (3.2)
To determine how suitable a node k is as its own center, the algorithm sums the
positive responsibilities sent to k by the other nodes. Only positive values are
included because only good candidate centers should enter the competition. When
a node k is assigned to a cluster, r(k, k) can be negative, which tends to push
the availability a(i, k) below 0. If some nodes i′ have positive responsibility
r(i′, k), the availability a(i, k) increases accordingly. On the other hand, if
there is a node k that is extremely similar to i, r(i, k) can become extremely
high; to avoid such extreme values, the algorithm caps the availability at 0 for
i ≠ k. The algorithm terminates either when it converges or after a pre-defined
number of iterations. Each node i is assigned to the center k that maximizes
a(i, k) + r(i, k), since this maximizes both the fitness of k as the center of i
and of i as a member of k. The pseudo-code is given in Algorithm 1.
Algorithm 1: Affinity Propagation Clustering
Data: similarity matrix of size N × N: s(i, j), i, j ∈ N
Result: cluster labels C(i), i ∈ N
1   ∀i, k: a(i, k) ← 0
2   while not converged do
3       ∀i, k: r(i, k) ← s(i, k) − max_{k′ ≠ k} {a(i, k′) + s(i, k′)}
4       ∀i, k: if i = k then
5           a(i, k) ← Σ_{i′ ≠ k} max{0, r(i′, k)}
6       else
7           a(i, k) ← min{0, r(k, k) + Σ_{i′ ∉ {i, k}} max{0, r(i′, k)}}
8       end
9   end
10  for i ∈ N do
11      C(i) ← argmax_k {a(i, k) + r(i, k)}
12  end
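The update rules of Algorithm 1 can be sketched in Python with NumPy. The damping factor below is a practical addition (not part of the pseudo-code) that stabilizes the message updates, and the toy similarity matrix in the usage example is hypothetical:

```python
import numpy as np

def affinity_propagation(S, max_iter=200, damping=0.5):
    """Sketch of Algorithm 1. S is an N x N similarity matrix whose
    diagonal holds the self-similarities. Damping is a practical
    addition that keeps the messages from oscillating."""
    n = S.shape[0]
    A = np.zeros((n, n))  # availabilities a(i, k)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # r(i, k) = s(i, k) - max_{k' != k} {a(i, k') + s(i, k')}
        AS = A + S
        first = AS.argmax(axis=1)
        first_val = AS[rows, first]
        AS[rows, first] = -np.inf
        second_val = AS.max(axis=1)
        max_excl = np.repeat(first_val[:, None], n, axis=1)
        max_excl[rows, first] = second_val      # max over k' != k
        R = damping * R + (1 - damping) * (S - max_excl)
        # a(i, k) per Equation 3.2 (sums of positive responsibilities)
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]          # keep r(k, k) itself
        col = Rp.sum(axis=0)                    # r(k,k) + sum of positives
        A_new = col[None, :] - Rp               # excludes i' = i
        diag = A_new[rows, rows].copy()         # a(k, k): no cap at 0
        A_new = np.minimum(0, A_new)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new
    # C(i) = argmax_k {a(i, k) + r(i, k)}
    return (A + R).argmax(axis=1)

# Hypothetical toy example: four road points, two obvious groups.
pts = np.array([0.0, 0.1, 5.0, 5.1])
S = -(pts[:, None] - pts[None, :]) ** 2          # negative squared distance
off_diag = S[~np.eye(len(pts), dtype=bool)]
np.fill_diagonal(S, np.median(off_diag))         # common preference choice
labels = affinity_propagation(S)
```

Setting every self-similarity to the median of the off-diagonal similarities is a common way to let the algorithm itself decide the number of exemplars.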
3.1.6 Clustering metrics
Silhouette coefficient
One of the most popular clustering quality measurements is the Silhouette
coefficient [49]. It evaluates both inter- and intra-cluster similarity: it
examines how similar an item is to its own cluster compared with the other
clusters. A high Silhouette coefficient indicates high cluster quality. It is
defined by Equations 3.3, 3.4, and 3.5, where C_i is the cluster containing
node i and d(i, j) is the dissimilarity of i and j.
a(i) = (1 / (|C_i| − 1)) Σ_{j ∈ C_i, j ≠ i} d(i, j)    (3.3)

b(i) = min_{k ≠ i} (1 / |C_k|) Σ_{j ∈ C_k} d(i, j)    (3.4)

s(i) = 1 − a(i)/b(i),  if a(i) < b(i);
s(i) = 0,  if a(i) = b(i);
s(i) = b(i)/a(i) − 1,  if a(i) > b(i)    (3.5)
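Equations 3.3–3.5 can be sketched directly in Python. The compact form (b − a)/max(a, b) used below is equivalent to the three cases of Equation 3.5, and the example dissimilarity matrix is hypothetical:

```python
import numpy as np

def silhouette(D, labels):
    """Mean Silhouette coefficient per Equations 3.3-3.5.
    D is a pairwise dissimilarity matrix; labels holds the cluster of
    each node. Singleton clusters are scored 0 by convention."""
    n = len(labels)
    scores = np.zeros(n)
    idx = np.arange(n)
    for i in range(n):
        same = labels == labels[i]
        if same.sum() == 1:
            continue                                # singleton: stays 0
        a = D[i, same & (idx != i)].mean()          # Equation 3.3
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels)
                if c != labels[i])                  # Equation 3.4
        # (b - a) / max(a, b) covers all three cases of Equation 3.5
        scores[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return scores.mean()

# Hypothetical example: two tight, well-separated clusters.
pos = np.array([0.0, 0.1, 5.0, 5.1])
D = np.abs(pos[:, None] - pos[None, :])
score = silhouette(D, np.array([0, 0, 1, 1]))
```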
Similarity mean
The similarity mean is the mean similarity of all nodes in a cluster. Unlike
the Silhouette coefficient, it only considers the intra-cluster similarity. A
high similarity mean indicates high cluster quality. The similarity mean of a
single cluster can be computed as in Equation 3.6; the final similarity mean is
the mean over all clusters.
csm(C) = (Σ_{i, j ∈ C, i < j} s(i, j)) / |C|    (3.6)
3.2 Artificial neural network
Artificial neural networks (ANN) are inspired by the neurons in the human
brain. An ANN tries to learn the patterns in the data set without any
task-specific rules. An ANN is organized into layers, each built from a number
of neurons. Consecutive layers are directly connected to each other, and
weights are assigned to the connections between layers. Usually, there are
three types of layers: the input layer, the hidden layers, and the output
layer. The input layer is the first layer; it receives the input and transmits
it in a form the hidden layers can use. The hidden layers do the core of the
learning. The number of hidden layers is not restricted and depends on the
user; depending on the type of neurons they contain, hidden layers may have
different functionality. A large number of hidden layers makes the computation
more intensive. The last layer is the output layer. Users may add as many
neurons as they need in any layer, and as many connections as they need between
consecutive layers, to increase accuracy or obtain the desired output. However,
this increases the computation overhead. Figure 3.1 is an example of an ANN
with one hidden layer: there are 4 neurons in the input layer, 5 neurons in the
hidden layer, and 4 neurons in the output layer.
Each layer applies a predefined activation function using its weight matrix and
bias vector. Different activation functions serve different purposes. For
example, the sigmoid function, one of the most widely used activation
functions, can take any input and map it onto a non-linear distribution. The
weight matrix assigns a weight to each input, and the bias vector adds an
offset to the weighted sum before the final output. The final result computed
by the ANN and the true result are compared by a loss function. Depending on
the application, the loss function produces a value that reflects the accuracy
of the ANN. Based on this loss, the weight matrix and bias vector are
fine-tuned in every iteration by the back propagation algorithm.
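The per-layer computation described above (activation of the weighted input plus bias) can be sketched for the 4-5-4 network of Figure 3.1; the weights below are random placeholders, not trained values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, b1, W2, b2):
    """Forward pass of the 4-5-4 network in Figure 3.1: each layer
    computes sigmoid(W @ input + b)."""
    hidden = sigmoid(W1 @ x + b1)     # hidden layer, 5 neurons
    return sigmoid(W2 @ hidden + b2)  # output layer, 4 neurons

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((5, 4)), rng.standard_normal(5)
W2, b2 = rng.standard_normal((4, 5)), rng.standard_normal(4)
y = forward(rng.standard_normal(4), W1, b1, W2, b2)
```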
Figure 3.1: Example of ANN
3.2.1 Recurrent neural network
There are many types of neural networks available in the literature. Among
them, I choose the recurrent neural network (RNN). An RNN is a regular ANN in
which some neurons (especially in the hidden layer) have feedback connections
to themselves; that is, their outputs are fed back as inputs. The feedback
connection of a neuron to itself acts as a kind of memory element: it takes
into account the present decision, the history of decisions previously taken,
and hence the previous data. RNN is therefore suitable for time-series or
temporal data. In our problem, the traffic data is related to both space and
time. In clustering the data, we find the influence between the traffic points
within the same time stamp; using an RNN, we can predict the traffic flow in
future time stamps using the data available at present.
An RNN fails to remember long-term dependencies. To overcome this drawback, the
Long short-term memory (LSTM) neural network was introduced [25]. LSTM stores
both long-term and short-term information. It is powerful in sequence-related
problems (e.g., text prediction). LSTM has the ability to loop back. Figure 3.2
shows a plain neuron, with input xt entering the neuron and producing the
neuron output ht.
Figure 3.2: Normal Neuron
Figure 3.3 shows a typical LSTM unit with a feedback loop. It may consist of
many layers. The result ht can loop back multiple times; the advantage is that
more than one time stamp of data can be taken into consideration. For example,
Figure 3.4 shows an unrolled LSTM neuron. x1 is the input data at time 1; this
data may affect the output at time 3. Consider the example "a tree has many
leaves, ...": "tree" is a key word in the sentence for predicting "leaves". The
feedback loop of the LSTM carries "tree" into the prediction of "leaves" at
time stamp 3. In this example, the gap between the two time stamps is very
small, so the words could be memorized in short-term memory. Realistically,
however, the gap between the keywords and the prediction could be very large:
the keywords could be stated in the first paragraph while the prediction takes
place in the third. Fortunately, the long-term memory in LSTM is able to
memorize the keywords and solve this problem.
There are many variants of LSTM units. For example, the peephole LSTM unit [21]
is one of the variants that fits well with the traffic flow clustering problem.
Figure 3.5 shows the structure of the peephole LSTM, which mainly consists of
four components: the memory
Figure 3.3: LSTM Unit
Figure 3.4: Unrolled LSTM Neuron
cell, the input gate layer, the output gate layer, and the forget gate layer.
The input gate layer considers the incoming data. The forget gate layer, as the
name implies, forgets unnecessary data. The output gate computes the final
output. The memory cell stores the output from the previous iteration.
Figure 3.5: LSTM Overview
The input gate, forget gate, and output gate are sigmoid layers, which apply
the sigmoid function. A sigmoid function maps its input onto a sigmoid curve in
the range [0, 1]. It is defined in Equation 3.7.
σ(x) = 1 / (1 + e^{−x})    (3.7)
Figure 3.5 shows the overview of LSTM structure. I use the following notations:
• Input data: X = (x1, x2, . . . , xn)
• Output data: Y = (y1, y2, . . . , yn)
• Hidden state of memory cell: H = (h1, h2, . . . , hn)
The forget gate decides what to forget. The current input xt, the previous
hidden state ht−1, and the previous memory ct−1 are the inputs to the forget
gate. The forget gate computes the sigmoid function of these inputs multiplied
by a weight matrix Wf, plus a bias vector bf. The output of the forget gate, ft
(Equation 3.8), is a value in the range [0, 1], where 0 means discard this data
entirely and 1 means keep it completely.

ft = σ(Wf · [ht−1 + xt + ct−1] + bf)    (3.8)
As shown in Equation 3.9, the input gate, like the forget gate, also takes xt,
ht−1, and ct−1 as input. However, its purpose is the opposite of the forget
gate: this gate decides what to store in memory for the next iteration.

it = σ(Wi · [ht−1 + xt + ct−1] + bi)    (3.9)
Another copy of the input goes to the input modulation gate, whose activation,
denoted g, is a modified sigmoid with output range [−2, 2]. The old memory
scaled by the forget gate and the modulated input scaled by the input gate are
added to compute ct, the data to be stored in the memory cell (Equation 3.10).

ct = ft ∗ ct−1 + it ∗ g(Wc · [ht−1 + xt + ct−1] + bc)    (3.10)
The last step is to decide what to output. Similar to the input gate and forget
gate, the output gate ot controls the result. h is another modified sigmoid,
with output range [−1, 1], that takes the current memory ct as input. The
product of ot and h(ct) is the final output of the LSTM, as shown in Equation
3.11.

ht = ot ∗ h(ct)    (3.11)
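Equations 3.8–3.11 can be sketched as a single forward step in Python. Following the thesis's formulation, h_{t−1}, xt, and ct−1 are combined by addition (so all must share one dimension); the modified sigmoids g and h are approximated here by scaled tanh functions with the stated ranges [−2, 2] and [−1, 1], and all weights are random placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One peephole LSTM step following Equations 3.8-3.11.
    p maps names like "W_f"/"b_f" to weight matrices and bias vectors."""
    z = h_prev + x_t + c_prev                      # combined gate input
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])         # forget gate (3.8)
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])         # input gate (3.9)
    g_t = 2.0 * np.tanh(p["W_c"] @ z + p["b_c"])   # modulation, range [-2, 2]
    c_t = f_t * c_prev + i_t * g_t                 # memory update (3.10)
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])         # output gate
    h_t = o_t * np.tanh(c_t)                       # output in [-1, 1] (3.11)
    return h_t, c_t

rng = np.random.default_rng(1)
d = 3  # single shared dimension, per the additive formulation
p = {k: rng.standard_normal((d, d)) for k in ("W_f", "W_i", "W_c", "W_o")}
p.update({k: rng.standard_normal(d) for k in ("b_f", "b_i", "b_c", "b_o")})
h_t, c_t = lstm_step(rng.standard_normal(d), np.zeros(d), np.zeros(d), p)
```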
After obtaining the final output, I apply a loss function to compare it with
the ground truth and determine the accuracy loss of the result. In this thesis,
I pick the squared loss function, Equation 3.12, which sums the squared error
between the predicted results pt and the true results yt.

e = Σ_{t=1}^{n} (yt − pt)²    (3.12)
Once I obtain the loss, I apply the back propagation algorithm to fine-tune the
hidden states, weights, and biases. Back propagation through time (BPTT) is a
technique designed for RNN back propagation. In my thesis I choose the Adam
optimizer [30]. The Adam optimizer is a combination of the Adaptive Gradient
algorithm and Root Mean Square propagation; it takes advantage of both
techniques and works well with sparse gradients and non-stationary settings.
The Adam optimizer requires the hyper-parameters α, β1, β2, and ε. I use the
values recommended by Keras [29]: α = 0.001, β1 = 0.9, β2 = 0.999, ε = 10−8.
Additionally, the loss function is denoted as L(x) and the gradient function as
∇f(x). The dot product of the gradient of f(x) with a vector v is the
directional derivative of f(x) along v, given in Equation 3.13.

(∇f(x)) · v = D_v f(x)    (3.13)
With the hyper-parameters, the loss function L(x), and its gradient, the Adam
optimizer algorithm is given below:
Algorithm 2: Adam Optimizer
Data: α, β1, β2, ε, W
Result: WT
1   M0 = 0
2   R0 = 0
3   for t ∈ T do
4       Mt = β1 Mt−1 + (1 − β1) ∇L(Wt−1)
5       Rt = β2 Rt−1 + (1 − β2) (∇L(Wt−1))²
6       Mt = Mt / (1 − (β1)^t)
7       Rt = Rt / (1 − (β2)^t)
8       Wt = Wt−1 − α Mt / (√Rt + ε)
9   end
10  Return WT
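Algorithm 2 can be sketched in NumPy as follows. The quadratic objective in the usage example is a hypothetical test function, and its learning rate is raised from the recommended 0.001 so the toy problem converges quickly:

```python
import numpy as np

def adam(grad, w0, alpha=0.001, beta1=0.9, beta2=0.999,
         eps=1e-8, steps=1000):
    """Sketch of Algorithm 2 (Adam). grad(w) returns the gradient of
    the loss L at w; w0 is the initial parameter vector."""
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)   # first moment estimate M
    r = np.zeros_like(w)   # second moment estimate R
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g                 # line 4
        r = beta2 * r + (1 - beta2) * g ** 2            # line 5
        m_hat = m / (1 - beta1 ** t)                    # line 6, bias correction
        r_hat = r / (1 - beta2 ** t)                    # line 7
        w = w - alpha * m_hat / (np.sqrt(r_hat) + eps)  # line 8
    return w

# Hypothetical objective L(w) = (w - 3)^2 with gradient 2(w - 3).
w_opt = adam(lambda w: 2 * (w - 3.0), np.array([0.0]),
             alpha=0.1, steps=2000)
```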
Chapter 4
Dynamic Traffic Clustering System
This chapter attempts to answer the following question: how does congestion
caused by an unfortunate event on a road segment affect other roads not
necessarily close in proximity to the congested road segment? I take advantage
of the capabilities
of the IoT and propose a dynamic traffic awareness system for urban driving. The
system finds all the road points affected by the traffic at road point r at time t, groups
them together to predict the effect of traffic on this group of nodes. Grouping the
nodes is nothing but clustering since they have similar features, in this case traffic
flow. I develop a traffic aware system using IoT technologies and sensors around road
points, that dynamically collects and analyzes the traffic flow data to compute the
similarity function between road points. I use concepts from network theory, in
particular the maximum flow and shortest path algorithms, and a distributed,
message passing algorithm to cluster the nodes; the clustering is executed
continuously to capture up-to-date information about traffic. I evaluate the
system during peak and non-peak hours and against static clustering
algorithms, and show the performance of my dynamic clustering algorithm.
4.1 System overview
In this section, I propose [62] a real-time dynamic traffic network clustering system.
There are several phases in my system design. I explain these using Figure 4.1. The
sensors on road side units collect traffic data continuously. The collection interval can
be fine-grained (in nanoseconds) or coarse-grained (hours). As the figure indicates, I
collect two types of data: static data and dynamic data.
4.1.1 Static data collection
This phase is computed only once. From this data I obtain the length of the
roads and all the nodes (sensors) adjacent to each node. I require this
information later in developing the dynamic system. These data are collected
only once, before the dynamic clustering phase starts.
4.1.2 Dynamic data collection
The dynamic data are collected by wireless sensors. The wireless sensors,
placed on the road side units, collect real-time traffic information and
transfer it via a wireless network. The data is collected periodically and the
results reported every 5 minutes. From this data, I count the number of
vehicles that pass through the sensors to capture the traffic flow in real
time.
Figure 4.1: Dynamic Traffic Clustering System
4.1.3 Communication between Sensors
In this phase, the sensors exchange sensor data, in particular road side
information. This is needed to compute the similarity matrix used for
clustering the nodes in the affinity propagation algorithm.
4.1.4 Traffic clustering: Compute all pair-wise similarity
This phase is Step 4 in Figure 4.1. In any clustering algorithm, determining
the similarity metric is crucial to obtaining good results. Different
clustering methods use different information obtained from traffic data to
calculate the similarity measurement. For example, some research papers [71]
use only speed, while others use only the number of vehicles on the road or
the average distance between nodes. The distance between road points is static
and does not provide real-time traffic information. Average speed can be
captured in real time; however, it requires complex calculations to find the
relationship between road points, with potential accuracy loss. Compared to
these parameters, traffic flow directly represents the relationship between
road points.
In this thesis, I use traffic flow as the parameter to find the traffic clusters. The
idea is that the amount of traffic flow entering a road point must eventually leave the
road point. This is based on the maximum flow theorem [22; 23; 7] which states that
the amount of flow into a node is equal to the amount of flow out of the node. The
number of vehicles from one road point s to another road point t influences the traffic
on the road and the traffic flow. I use this information in the similarity function.
Similarity expresses how similar or dissimilar one node is to another.
Usually, when clustering a graph, similarity is represented by the negative
Euclidean distance. The Euclidean distance is the distance between two nodes
in Euclidean space and can be considered the dissimilarity of the two nodes;
on a two-dimensional graph, negating it converts the dissimilarity into a
similarity. The following formula shows the negative Euclidean distance
between nodes p and q on the x and y axes.
d(p, q) = −√((p_x − q_x)² + (p_y − q_y)²)    (4.1)
However, traffic is complex: the influence between traffic points cannot be
easily represented just by the distance between them. Real-time factors such
as weather and accidents can influence the traffic condition on the road.
Therefore, it is important to find the true real-time influence between
traffic points. There are two types of data I can make use of: speed and
volume. In this research, traffic speed means the average speed of the
vehicles that pass through a sensor every 5 minutes, and traffic volume (or
flow) means the number of vehicles that pass through the sensor every 5
minutes.
Traffic speed is real-time information about the roads. The change of traffic
speed between road points can reveal the relationship and influence of traffic
between them. However, it is not the best way to measure these influences,
because of the unpredictable parameters that affect traffic speed. For
example, heavy snow can cause a reduction in traffic speed, and such details
are not provided in the data.
On the other hand, traffic flow information is easier to gather and better
represents the influences between traffic points. Calculating the traffic flow
requires the number of vehicles, not their speed, and this information is
easier to collect using sensors. The amount of traffic flow on the road can
represent the traffic influences between road points directly: if a road point
has a strong relationship with another road point, the numbers of vehicles
that pass through these road points influence each other. According to the
maximum flow algorithm in network theory [9], flow in is equal to flow out:
the number of vehicles that enter a road point must eventually leave it. In
this thesis, I use traffic flow as one of the parameters to find the
similarity metric.
Consider a driver who wants to travel from p to q. Usually, the driver will
pick the shortest path between these two points. The similarity, as mentioned
before, expresses how similar the two road points are to each other; in this
case, it is the flow between them. The flow on the shortest path is denoted by
f(p, q). I then normalize the flow for use in the clustering algorithm by
dividing it by the total amount of incoming flow into q, represented as
inf(q). This way, the similarity values are distributed in [0, 1], making them
easier to use within the clustering algorithm. Furthermore, most clustering
algorithms require the similarity to be symmetric. This implies that the
similarity s(p, q) must equal s(q, p), so I take the mean of these two values;
note that the similarity values remain in [0, 1]. I assume that a node always
has similarity 1 to itself, which is the maximum possible similarity. The
equation below shows the similarity between p and q:
s(p, q) = 1,  if p = q;
s(p, q) = (f(p, q)/inf(q) + f(q, p)/inf(p)) / 2,  if p ≠ q    (4.2)
By computing the similarities between all node pairs, I can build a similarity
matrix that could be used in any clustering algorithm.
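Equation 4.2 can be sketched as follows. The flow matrix F is a hypothetical input in which F[p, q] holds the (pre-computed) shortest-path flow from p to q, and every node is assumed to have non-zero incoming flow:

```python
import numpy as np

def similarity_matrix(F):
    """Build the similarity matrix of Equation 4.2 from a flow matrix.
    F[p, q] is the flow on the shortest path from p to q; inf(q) is
    the total incoming flow into q (assumed non-zero here)."""
    n = F.shape[0]
    inflow = F.sum(axis=0)                  # inf(q) for every q
    S = np.ones((n, n))                     # s(p, p) = 1 on the diagonal
    for p in range(n):
        for q in range(n):
            if p != q:
                S[p, q] = 0.5 * (F[p, q] / inflow[q]
                                 + F[q, p] / inflow[p])
    return S

# Hypothetical 3-node flow matrix.
F = np.array([[0.0, 10.0, 5.0],
              [4.0, 0.0, 6.0],
              [8.0, 2.0, 0.0]])
S = similarity_matrix(F)
```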
4.1.5 Affinity propagation
Once I set the similarity measurement, the next step is to apply the affinity
propagation clustering algorithm. Affinity propagation considers every data
point as a potential exemplar, i.e., a cluster center. In each iteration, two
kinds of messages are sent between vertices: responsibility r(i, k) and
availability a(i, k). r(i, k) is sent from i to k and represents how well k
serves as the exemplar of i; a(i, k) is sent from k to i and shows how suitable
i is as a member of k. s(i, k) is the similarity computed from Equation (4.2).
To compute responsibility r(i, k), the algorithm finds another data point k′
that has the highest (maximum) sum of availability and similarity, and computes
the difference from s(i, k). If the current similarity is much higher, i
produces a very high responsibility. On the other hand, availability is
computed from the responsibilities of all the points: if another data point i′
has a higher responsibility to the current point k, then i is much less likely
to be available. In each iteration, each vertex updates and exchanges its
responsibility and availability. The termination condition can be a fixed
number of iterations or the convergence of the exemplars; in this system, the
algorithm runs continuously to capture the dynamism in the road traffic. The
details of the algorithm are given in Chapter 3.
4.1.6 Clustering results
Although the algorithm does not terminate, I can always take a snapshot
whenever I want; this is Step 8. At any time, the user may send a query to the
system. The system then takes a snapshot of all nodes with their current
availabilities and responsibilities. A node i belongs to the exemplar k with
the highest a(i, k) + r(i, k).
4.2 Experiment
4.2.1 Experiment Setup
In the first set of experiments, I evaluate the quality of the system. Since
there are no existing dynamic clustering algorithms, I use static versions of
existing algorithms to evaluate the quality of the results. I compare affinity
propagation (AP) with different clustering algorithms to assess the clustering
solution quality. The algorithms I choose are k-medoids (KM), DBSCAN, and
average-linkage clustering (AGG). I use k-medoids because it is an
exemplar-based algorithm similar to affinity propagation; it also accepts a
similarity matrix. In addition, I evaluate the average-linkage clustering
algorithm, which also accepts a similarity matrix. Some of these algorithms
use a dissimilarity matrix instead: for each similarity s(i, j) in the
similarity matrix, the dissimilarity is d(i, j) = 1 − s(i, j). With respect to
parameter settings, KM and AGG clustering require the number of clusters a
priori, unlike AP. In AP, a different number of clusters can be obtained by
changing the default self-similarity in the input.
4.2.2 Comparison Setup - Correctness of results
Please note that I am comparing the AP clustering algorithm, with the modified
similarity measurement, against static algorithms that require knowledge of
the number of clusters a priori. For a fair comparison, I ensure that all
algorithms use the same number of clusters, although some of the algorithms
are static: I set the number of clusters in KM and AGG clustering equal to the
number of clusters found by AP. DBSCAN does not take the number of clusters as
a parameter, but it requires the maximum intra-cluster dissimilarity and the
minimum cluster size. In the experiments, I allow two points to be in the same
cluster if their dissimilarity is less than 0.5, and a point by itself can
form a cluster.
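With a minimum cluster size of one, every point is a core point, so DBSCAN on a precomputed dissimilarity matrix reduces to finding the connected components of the graph that joins points whose dissimilarity is below the threshold. A minimal sketch of that setting (the matrix below is hypothetical):

```python
import numpy as np

def dbscan_min_size_one(D, eps=0.5):
    """DBSCAN with minimum cluster size 1 on a dissimilarity matrix D:
    equivalent to connected components of the graph linking points
    with dissimilarity below eps."""
    n = D.shape[0]
    labels = np.full(n, -1)
    cluster = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        stack = [start]
        labels[start] = cluster
        while stack:                       # flood-fill one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and D[i, j] < eps:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels

# Hypothetical dissimilarities d(i, j) = 1 - s(i, j): two groups.
D = np.array([[0.0, 0.2, 0.9, 0.9],
              [0.2, 0.0, 0.9, 0.9],
              [0.9, 0.9, 0.0, 0.3],
              [0.9, 0.9, 0.3, 0.0]])
comp = dbscan_min_size_one(D)
```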
4.2.3 Data set and environment
I use AP, DBSCAN, and AGG from the Scikit-learn library [43], and KM from the
pyclustering library [41]. The programs are implemented in Python. I evaluate
the algorithms on macOS with a 4-core 2.2 GHz Intel Core i7 and 16 GB of 1600
MHz DDR3 memory. The data set I use is collected from CityPulse [58]. It
contains the traffic network of the city of Aarhus, Denmark, with 449 directed
roads and 136 road points. The meta data provides the locations of all sensors
and the entire network. The real-time traffic data is recorded every 5
minutes; each record contains the time stamp, the average speed, and the
number of vehicles.
4.2.4 Cluster Quality
To measure the quality of the solutions, I use the Silhouette coefficient
[49]. Other techniques, such as the Davies-Bouldin index or V-measure, require
a centroid or ground truth, and ground truth is difficult to obtain for real
traffic data. Additionally, I include the mean similarity within all clusters:
a mean similarity is the average over all pairs of intra-cluster similarities,
and a higher mean shows a higher intra-cluster similarity.
Figure 4.2: Peak Hour Silhouette
Figure 4.3: Peak Hour Similarity Mean
To simulate the real-time clusters, I evaluate the data from 2014-08-02; each
experiment uses the data recorded over a 5-minute interval. I show both peak
hours (7:30 AM - 7:45 AM) and non-peak hours (0:00 AM - 0:15 AM).
Figure 4.2 shows the Silhouette coefficient at peak hours. DBSCAN shows the
worst results: its Silhouette coefficient is negative, which implies that most
items are not properly placed in the right clusters.
Figure 4.4: Non-Peak Hour Silhouette
KM is in third place, with its
Silhouette coefficient being positive; however, it is not as good as AP or
AGG. In general, AP has a better Silhouette coefficient than AGG, although AGG
is a little better than AP at 07:35. Thus, at peak hours, AGG can sometimes be
better than AP, but overall AP has the better Silhouette coefficient.
Figure 4.3 shows the mean similarity at peak hours. As in Figure 4.2, DBSCAN
is still the worst. KM outperforms AGG in three instances, but at 07:35 AGG
shows a higher mean than KM. AP has a higher mean than all the other
algorithms. Overall, at peak hours, AP achieves the best results in most
cases; AGG outperforms AP in some instances, but it always has a lower mean
than AP.
Figures 4.4 and 4.5 show the Silhouette coefficient and the mean at non-peak
hours. Unlike in Figure 4.2, DBSCAN shows positive results in Figure 4.4 and
sometimes performs better than KM. DBSCAN and KM are still worse than AP and
AGG. I also notice that AGG is unable to outperform AP in any instance: AP
always has the best Silhouette coefficient.
Figure 4.5: Non-Peak Hour Similarity Mean
AP and AGG are still close compared to KM and
DBSCAN. With respect to mean, AP outperforms all other algorithms. DBSCAN is
unable to reach AP.
None of the experiments shows a high Silhouette coefficient or a high mean
(≥ 0.5). This is due to the characteristics of the similarity measurement and
the data set: to produce a similarity higher than 0.5, two road points would
have to exchange most of their flow with each other, which is not possible in
the real world.
4.2.5 Number of Clusters
One advantage of AP is that it does not require the number of clusters to be
pre-defined. This makes the algorithm easier to use, since users do not need
to find a good number of clusters in advance. Other algorithms, such as KM,
require the number of clusters in advance and may not give reasonable results
if an accurate number of clusters is not given initially. Furthermore, the
number of clusters found by the AP algorithm provides useful information to
drivers. Therefore, I examine the number of clusters over a one-day period.
Figure 4.6: Number of Clusters
I execute the algorithm dynamically and continuously to provide real-time
traffic flow information. Figure 4.6 shows the change in the number of
clusters found by AP over one day, every 30 minutes. At non-peak hours,
especially around midnight, there is little traffic flow on the road; most
roads therefore have low similarity with each other, and each road may belong
to its own cluster. This results in a large number of clusters. I observe a
dramatic reduction in the number of clusters at around 7:30 AM: as rush hour
approaches, the traffic flow increases, and AP is able to cluster the traffic
network based on the traffic flow. The number of clusters further decreases
until 9:00 AM, since peak hours have very large flow and thus very high
similarity between the roads, so the algorithm is able to partition the
clusters easily. However, if the traffic volume is not that large, the
similarity between points is not sufficient, and it is more difficult for the
algorithm to partition the clusters.
From 00:00 to 8:00, the number of clusters decreases: starting from 00:00, the
traffic volume is low, so all nodes have low similarity to each other; as the
traffic volume increases, the similarities become higher and AP is able to
find the clusters. From 8:00 to 9:00, which is no longer a peak hour, the
number of clusters still decreases. This is due to the smooth and steady
traffic flow: after peak hours, no road point has a very large, dominating
flow to other points, and the algorithm obtains a further reduced number of
clusters.
These observations convince me that the dynamic system works well during both
peak and non-peak hours. As expected, during non-peak (midnight) hours the
number of clusters increases due to low traffic flow; the number of clusters
decreases during peak hours (early morning, when there is more traffic
similarity between roads); and right after peak hours (8:00-9:00 AM) the flow
is steady.
Chapter 5
Cluster-based Traffic Prediction
In studying the traffic congestion problem, there are two types of locality we need
to consider: temporal locality and spatial locality. Spatial locality implies that if a
road r is affected by traffic, then the surrounding roads of r are also affected. Temporal
locality implies that if r is affected at time t, chances are that the surrounding roads
of r will also be affected at time t + 1. Unfortunately, this is not enough. The
surrounding roads are not the only ones affected. Over a period of time, t + δ, there
will be a domino effect. That is, the affected roads will in turn affect other roads
which in turn affect other roads and so on. The congestion is not stationary. It is
dynamic and it spreads. The event created at a road point r at time t and its
effect on the road traffic at time t + δ are difficult to capture with current
smart technologies. Predicting the future of any event is in general
difficult. In this chapter, I attempt to solve this problem: I use machine
learning techniques, namely long-short term memory, to predict traffic on the
road points formed by the clusters discussed in the previous chapter.
5.1 Predicting traffic per cluster
LSTM is a powerful tool for prediction and is used in deep learning tasks.
Deep learning models are complex networks that allow the extraction of
higher-level information; deep learning methods using auto-encoders have been
applied to problems such as travel route prediction [35]. In this thesis, as
discussed in Section 3.2.1, the peephole unit [21], a variant of the
long-short term memory model, is used for traffic prediction. In the
literature, LSTM has been shown to produce good solutions for time-series data
[13], and traffic flow data is time-series data. LSTM with a single
time-series of flow data has been used before [16; 37; 57]. However, LSTM has
the ability to handle more features in the data to improve the prediction
accuracy, and finding useful features is not an easy task. Traffic is both
temporally and spatially correlated.
In general, using the time-series data available for the road points, I can
find the temporal correlation between two road points: how traffic influences
each road point at time t can be obtained. Since these are two different road
points, I can also find the spatial correlation between them. The clustered
data is fed into the LSTM neural network to predict the traffic flow at time
t + δ. There are three prediction designs: one to one, many to one, and many
to many.
Figure 5.1 is an example of one-to-one prediction: one time series, one
prediction. In this figure, I use the data of the road from times t = 1, 2,
..., 8 to predict the outcome at time 9. This is the most traditional way of
predicting traffic flow [16; 37; 57]. However, this method does not consider
the spatial correlation between road points.
Figure 5.2 is the many to one design: many time-series inputs, one prediction. Assume
Figure 5.1: One to One Prediction
road 1, road 2, and road 3 are in the same cluster because they have a strong influence
on each other. The data available from roads 2 and 3 is then potentially helpful for
predicting the outcome of road 1. In the figure, the data points for time t = 1, 2, ..., 8
for the three roads are used to predict what happens at time 9 for road 1. In other
words, the traffic flow on roads 2 and 3 may influence road 1, and using it should
improve the prediction accuracy. However, it might also increase the computation
overhead, since more data is required to compute a prediction.
Figure 5.2: Many to One Prediction

Figure 5.3 is the many to many design. Here I use the capability of LSTM to predict
not only one road at a time, but multiple roads at once. By feeding in the data of
one cluster (which includes multiple road points), LSTM can predict the outcome of
the entire cluster. Compared to many to one prediction, the computation overhead
is higher. However, if the prediction of the entire cluster is required, this design is
more efficient, since it makes all its predictions in one shot rather than one road at a
time. On the other hand, predicting the entire cluster increases the training time,
since more data is fed in. I believe that with a high performance parallel computer
the training time could be reduced; note that I do not consider parallelization in
this thesis.
Figure 5.3: Many to Many Prediction
5.2 Time stamp clustering
To make the best use of the many to one and many to many prediction models, the
input time series must have a strong influence on each other. In Chapter 4, I
introduced a dynamic traffic clustering system and showed that the affinity
propagation clustering algorithm gave the best overall solution quality. Prediction,
however, is static: each cluster can be trained statically using LSTM.
As discussed in the previous chapter, I collect data at each time interval. In the
experiments, I did this every five minutes (t = 5). At every time interval, I cluster
the road points to capture the traffic influence on each other. At time t + 5 minutes
(say 10 minutes), I find another clustering. I can do this for, say, 1 hour. In this
example, I have 12 different clusterings, but a data point in one cluster at time t = 5
may be in another cluster at time t = 15, for example. Since prediction is static, I
design a static clustering technique as follows; it is fairly general and can be used at
any time.
I propose a cluster merge technique to merge the time stamp clusters into all-time
clusters. To do this, I first build a similarity matrix s[N, N] with all values
initialized to 0. Then I cluster the traffic from the first time stamp to the last time
stamp in a month. If, in a time stamp, a node A is in the same cluster as node B,
I increase the similarity of the pair by 1, that is, s[A, B] += 1. After all time stamps
are evaluated, I normalize the similarity matrix by
s[N, N] = (s[N, N] − min(s[N, N])) / max(s[N, N]),
so that all values in the similarity matrix lie in the range [0, 1]. Now I can apply this
similarity matrix to any clustering algorithm that I would like to use to generate the
final static traffic clusters. The pseudocode of the algorithm is shown below.
Algorithm 3: Cluster Merge
Data: Clustering algorithm A(x), time stamp similarities TS = {ts_0, ts_1, ..., ts_T}
Result: Cluster labels C(i), i ∈ N
1  s[N, N] = 0
2  for t ∈ T do
3      C_t = A(ts_t)
4      for all i, j ∈ N do
5          if C_t(i) == C_t(j) then
6              s[i, j] += 1
7  s[N, N] = (s[N, N] − min(s[N, N])) / max(s[N, N])
8  C = A(s)
9  return C
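A sketch of Algorithm 3 in Python with NumPy follows; `cluster_fn` stands in for the pluggable clustering algorithm A(x) (affinity propagation in this thesis) and is an assumption of the illustration, not code from the thesis.

```python
import numpy as np

def cluster_merge(cluster_fn, timestamp_inputs):
    """Merge per-time-stamp clusterings into one static clustering."""
    # Cluster every time stamp and record one label per road point.
    labels_per_ts = [np.asarray(cluster_fn(ts)) for ts in timestamp_inputs]
    n = len(labels_per_ts[0])
    s = np.zeros((n, n))
    for labels in labels_per_ts:
        # same[i, j] is True when road points i and j share a cluster
        # at this time stamp; adding booleans increments s[i, j] by 1.
        s += labels[:, None] == labels[None, :]
    # Normalize as in the thesis: subtract the minimum, divide by the
    # maximum, so all similarities fall in [0, 1].
    s = (s - s.min()) / s.max()
    # One final clustering on the merged similarity matrix.
    return cluster_fn(s)
```

Because each time stamp contributes at most 1 to any pair's similarity, peak-hour and non-peak-hour clusterings carry the same weight in the final result.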
The reason I merge the clusters rather than the flow data is to ensure that each time
stamp and each cluster have the same weight in the final cluster. In a road network,
most traffic flow happens in peak hours. If I merged the flow data, the non-peak-hour
clusters would have less weight. Since the prediction of traffic flow could take place
at any time, I choose to merge the clusters based on time stamp and not traffic flow.
5.3 Experiment
The program is implemented in Python using the Keras library [8]. The experimental
machine has a 4.2 GHz Intel i7-7700K processor with 16 GB of memory. The
operating system used is Ubuntu 18.04 LTS. My LSTM design has four hidden
layers. The first two are LSTM layers, each with 64 LSTM units. The third layer is
a dropout layer. The purpose of this layer is to avoid overfitting, where the LSTM
fits the training data set so well that it does not work well on new data sets. There
is a one to one mapping to the previous layer that decides whether to drop each unit
from the computation; the dropout rate is set to 0.2. The last layer is a Dense layer:
each neuron in this layer is connected to all neurons in the dropout layer, and a
sigmoid activation function is applied. The size of the Dense layer is set to the
output size: in the one to one and many to one models it has only 1 unit, while in
the many to many model the number of units is the same as the cluster size. For the
back propagation algorithm, the Adam optimizer [30] is used with the parameters
recommended by Keras [29], shown in Table 5.1.
α        β1      β2       ε
0.001    0.9     0.999    10^−8

Table 5.1: Adam optimizer parameters
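A sketch of this four-layer design using the Keras API is below. Note that the standard Keras `LSTM` layer does not implement the peephole variant discussed in Section 3.2.1, so this approximates the architecture; `n_features` and `output_size` are 1 for the one to one model and the cluster size (52 here) for many to many.

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

def build_model(n_features, output_size, window=12):
    model = Sequential([
        Input(shape=(window, n_features)),
        # Two stacked LSTM layers with 64 units each; the first must
        # return the full sequence so the second can consume it.
        LSTM(64, return_sequences=True),
        LSTM(64),
        Dropout(0.2),  # dropout layer to avoid overfitting
        # Dense output layer with sigmoid activation, one unit per
        # predicted road.
        Dense(output_size, activation="sigmoid"),
    ])
    # Adam optimizer with the Keras-recommended parameters (Table 5.1).
    model.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.9,
                                 beta_2=0.999, epsilon=1e-8),
                  loss="mse")
    return model

model = build_model(n_features=52, output_size=52)  # many to many case
```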
The data set is collected from CityPulse [58]. I use the data from 2014-03-01 to
2014-05-30: data from 2014-03-01 to 2014-04-30 is used as training data, and data
from 2014-05-01 to 2014-05-30 is used as test data. I use the past 1 hour of data
(12 time stamps) to predict the next 5 minutes (1 time stamp).
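The sliding-window preparation described here can be sketched as follows; `make_windows` is an illustrative helper, not code from the thesis:

```python
import numpy as np

def make_windows(series, window=12, horizon=1):
    """Split a flow series of shape (T, n_roads) into LSTM samples:
    the past `window` time stamps predict the value `horizon` steps
    ahead (window=12 and horizon=1 match the experiment setup)."""
    X, y = [], []
    for t in range(len(series) - window - horizon + 1):
        X.append(series[t:t + window])              # one hour of history
        y.append(series[t + window + horizon - 1])  # next 5 minutes
    return np.array(X), np.array(y)

# Toy check: 20 time stamps of a 3-road cluster yield 8 samples.
X, y = make_windows(np.zeros((20, 3)))
print(X.shape, y.shape)  # (8, 12, 3) (8, 3)
```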
To evaluate the prediction quality, I use mean square error (MSE, Equation 5.1),
mean absolute error (MAE, Equation 5.2), explained variance (EV, Equation 5.3),
and R² regression (R², Equation 5.4). MSE and MAE measure the error in the
result, so lower values are better. EV and R² show how well the prediction fits, so
for a good result these values should be high. The highest possible value of EV and
R² is 1, which means the prediction matches the real data perfectly.
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i)^2 \quad (5.1)

MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - p_i| \quad (5.2)

EV = 1 - \frac{\mathrm{Var}(y - p)}{\mathrm{Var}(y)} \quad (5.3)

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - p_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \quad (5.4)
All three models (one to one, many to one, many to many) are evaluated. 600 epochs
are executed in each training run. I choose road 180709, one of the busiest roads in
the data set, as my experiment road. After applying affinity propagation and the
cluster merge algorithm, there are 51 more roads in the same cluster as road 180709.
In all experiments, the time cost is about 66 µsec per step during training, and the
same during prediction. From this observation I conclude that adding more features
to the input data has an impact on performance too insignificant to observe: in this
experiment, 52 features have no measurable impact on performance compared with
only 1 feature.
Figure 5.4: Prediction Result
Figure 5.4 shows the result of the many to one prediction for the first day; note that
the traffic flow is normalized to [0, 1]. For peak hours, the result is not very accurate,
since the flow in peak hours can be extremely different depending on the day. Overall,
though, the prediction is very close to the real traffic flow values and follows their
fluctuation pattern. Thus, in general, LSTM shows good solution quality.
In addition, I studied how much benefit I can obtain from the clustered traffic, that
is, how much accuracy I can gain by adding more features. Therefore, I evaluated all
three models. The one to one model is just a normal LSTM prediction: it reads
road 180709 as input and predicts road 180709 only. The many to one model reads
the entire cluster of road 180709 as input and predicts road 180709 only. The many
to many model reads the entire cluster of road 180709 and predicts the future traffic
for the entire cluster. The results are shown in Table 5.2. In all four metrics, many
to many achieves the best result, many to one is in second place, and the one to one
model is the worst. The many to one and many to many models are better than the
one to one model, as expected: the clustered traffic data indeed improves the solution
quality. MSE is reduced by 30% and MAE by 21%, so the errors are clearly decreased;
both EV and R² have increased by about 7%. Another benefit of the many to many
model is that it can predict the entire cluster while training only once, compared
with the one to one and many to one models, which saves a huge amount of training
time. Since I found that the increase in features does not increase the time cost, in
this experiment I train 52 roads together and reduce the total training time to 1/52.
Models         MSE       MAE       EV        R²
One to One     0.006501  0.051743  0.779400  0.779377
Many to One    0.004556  0.042653  0.810834  0.810533
Many to Many   0.004004  0.039840  0.836513  0.833502

Table 5.2: Evaluation of LSTM Models
Chapter 6
Conclusion and Future Work
In this thesis, I proposed a dynamic traffic clustering system to forecast traffic
flow on urban roads. I used an IoT network wherein each IoT device placed on
the road constantly collects traffic flow data and communicates via a wireless
network to compute the real-time traffic clusters. These clusters consisted of road
points that had similar traffic flow in a given time period. The clusters were based
on the shortest path between road points and the influence of traffic on this path.
I showed that the algorithm dynamically discovered the clusters during peak and
non-peak hours. I also showed the correctness of the algorithm by comparing it to
static algorithms that use a predefined number of clusters. The real-time traffic
clustering system using affinity propagation is decentralized and offers good solution
quality. The evaluation showed that the affinity propagation clustering technique
had the best overall Silhouette coefficient and similarity mean among the k-medoid,
DBSCAN, and average-linkage algorithms, and that it found a reasonable number of
clusters in both peak and non-peak hours.
The clusters were then trained using a variant of a recurrent neural network, long
short-term memory (LSTM). I evaluated various clustering and prediction metrics to
show the feasibility of the proposed approach. To produce high prediction quality,
both temporal locality and spatial locality were considered: clustering optimized the
spatial locality of the traffic network, while LSTM predicted the traffic flow based
on previous data, optimizing temporal locality. I proposed the many to one model
and the many to many model to predict the traffic flow. The many to one model
used cluster data to predict a single road; the many to many model used cluster data
to predict the entire cluster. In the experiments, I found there is no performance
reduction in the many to one and many to many models. On the other hand, the
many to one and many to many models achieved better solution quality than
non-clustered traffic data. The many to many model can also reduce the training
cost: an entire cluster can be trained together, reducing the training cost to 1/|C|.
6.1 Future Work
Here are some ideas for future work.
• Improve the similarity measurement.
• Find multiple paths (say k) using bio-inspired techniques such as ant colony
optimization. The traffic flow in the top-k shortest paths can then be considered
simultaneously in the similarity measurement.
• Implement affinity propagation on real IoT devices in a wireless network to test
its real-world performance.
• Prediction on a huge traffic network is compute intensive even with the many
to many model. Domain-specific architectures such as the TPU can reduce the
time cost. Since the clusters can be predicted independently, partitioning the
clusters across different accelerators such as GPUs and TPUs could help reduce
the time cost.
Bibliography
[1] Ahmed, M. S., and Cook, A. R. Analysis of freeway traffic time-series data
by using Box-Jenkins techniques. No. 722 in Transportation Research Record.
Transportation Research Board, 1979.
[2] Al-Sakran, H. O. Intelligent traffic information system based on integration
of internet of things and agent technology. International Journal of Advanced
Computer Science and Applications (IJACSA) 6, 2 (2015), 37–43.
[3] autopilot-project. AUTOPILOT.
[4] Berkhin, P. A survey of clustering data mining techniques. In Grouping
multidimensional data. Springer, 2006, pp. 25–71.
[5] Bonabeau, E., Marco, D. d. R. D. F., Dorigo, M., Theraulaz, G.,
Theraulaz, G., et al. Swarm intelligence: from natural to artificial sys-
tems. No. 1 in Santa Fe Institute Studies on the Sciences of Complexity. Oxford
university press, 1999.
[6] Chan, K. Y., Dillon, T. S., Singh, J., and Chang, E. Neural-network-
based models for short-term traffic flow forecasting using a hybrid exponential
smoothing and Levenberg-Marquardt algorithm. IEEE Trans. Intell. Transp. Syst.
13, 2 (June 2012), 644–654.
[7] Cherkassky, B. V., and Goldberg, A. V. On implementing the push
relabel method for the maximum flow problem. Algorithmica 19, 4 (1997), 390–
410.
[8] Chollet, F., et al. Keras. https://keras.io, 2015.
[9] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Intro-
duction to Algorithms, second ed. MIT Press and McGraw-Hill, 2001.
[10] Deneubourg, J.-L., Goss, S., Franks, N., Sendova-Franks, A., De-
train, C., and Chretien, L. The dynamics of collective sorting robot-like
ants and ant-like robots. In Proceedings of the first international conference on
simulation of adaptive behavior on From animals to animats (1991), pp. 356–363.
[11] dos Santos, W., Carvalho, L. F. M., de P. Avelar, G., Jr., A. S.,
Ponce, . M., Guedes, D., and Jr., W. M. Lemonade: A scalable and effi-
cient spark-based platform for data analytics. In 17th IEEE/ACM International
Symposium on Cluster, Cloud and Grid Computing (2017).
[12] Drane, C. R., and Rizos, C. Positioning systems in intelligent transportation
systems. Artech House, Inc., 1998.
[13] Duan, Y., Lv, Y., and Wang, F.-Y. Travel time prediction with lstm neural
network. In IEEE 19th International Conference on Intelligent Transportation
Systems (ITSC) (Rio de Janeiro, Brazil, November 2016).
[14] Everitt, B. Cluster analysis. Wiley, Chichester, West Sussex, U.K, 2011.
[15] Frey, B. J., and Dueck, D. Clustering by passing messages between data
points. science 315, 5814 (2007), 972–976.
[16] Fu, R., Zhang, Z., and Li, L. Using lstm and gru neural network methods
for traffic flow prediction. In Chinese Association of Automation (YAC), Youth
Academic Annual Conference of (2016), IEEE, pp. 324–328.
[17] Fu, Z., Hu, W., and Tan, T. Similarity based vehicle trajectory clustering and
anomaly detection. In Image Processing, 2005. ICIP 2005. IEEE International
Conference on (2005), vol. 2, IEEE, pp. II–602.
[18] Galdi, P., Napolitano, F., and Tagliaferri, R. A comparison between
affinity propagation and assessment based methods in finding the best number
of clusters. In Eleventh International Meeting on Computational Intelligence
Methods for Bioinformatics and Biostatistics (June 2014).
[19] Garey, M. R., Johnson, D. S., and Stockmeyer, L. Some simplified
np-complete graph problems. Theoretical computer science 1, 3 (1976), 237–267.
[20] Gaur, A., Scotney, B., Parr, G., and McClean, S. Smart city archi-
tecture and its applications based on iot. Procedia computer science 52 (2015),
1089–1094.
[21] Gers, F. A., and Schmidhuber, J. Recurrent nets that time and count. In
Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural
Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives
for the New Millennium (2000), vol. 3, IEEE, pp. 189–194.
[22] Goldberg, A. V. Efficient graph algorithms for sequential and parallel com-
puters. PhD thesis, Dept. of Electrical Engineering and Computer Science, Mas-
sachusetts Institute of Technology, 1987.
[23] Goldberg, A. V., and Tarjan, R. E. A new approach to the maximum
flow problem. Journal of the ACM (JACM) 35, 4 (1988), 921–940.
[24] Hochreiter, S., and Schmidhuber, J. Long short-term memory. Neural
computation 9, 8 (1997), 1735–1780.
[25] Hochreiter, S., and Schmidhuber, J. Long short-term memory. Neural
Computation 9, 8 (1997), 1735–1780.
[26] Huang, S.-H., and Ran, B. An application of neural network on traffic speed
prediction under adverse weather condition. PhD thesis, University of Wisconsin–
Madison, 2003.
[27] Infrastructure Canada. Smart Cities Challenge,.
[28] Kamarianakis, Y., and Prastacos, P. Forecasting traffic flow conditions in
an urban network: comparison of multivariate and univariate approaches. Transp.
Res. Rec. 1857 (2003), 74–84.
[29] keras. Keras: The Python Deep Learning library.
[30] Kingma, D. P., and Ba, J. Adam: A method for stochastic optimization.
arXiv preprint arXiv:1412.6980 (2014).
[31] Kuntz, P., Snyers, D., and Layzell, P. A stochastic heuristic for visual-
ising graph clusters in a bi-dimensional space prior to partitioning. Journal of
Heuristics 5, 3 (1999), 327–351.
[32] Lee, S., and Fambro, D. Application of subset autoregressive integrated
moving average model for short-term freeway traffic volume forecasting. Transp.
Res. Rec. 1678 (1999), 179–188.
[33] Liu, Y. Y. A polymorphic ant-based algorithm for graph clustering. Master’s
thesis, Department of Computer Science, University of Manitoba, Winnipeg,
MB, Canada, 2016.
[34] Liu, Y. Y., Thulasiraman, P., and Thulasiram, R. K. A self fixing intelli-
gent ant clustering algorithm for graphs. In IEEE International Joint Conference
on Neural Networks in IEEE World Congress on Computational Intelligence (Rio
de Janeiro,Brazil, July 2018).
[35] Lv, Y., Duan, Y., Kang, W., Li, Z., Wang, F.-Y., et al. Traffic flow
prediction with big data: A deep learning approach. IEEE Trans. Intelligent
Transportation Systems 16, 2 (2015), 865–873.
[36] Ma, X., Dai, Z., He, Z., Ma, J., Wang, Y., and Wang, Y. Learning traffic
as images: a deep convolutional neural network for large-scale transportation
network speed prediction. Sensors 17, 4 (2017), 818.
[37] Ma, X., Tao, Z., Wang, Y., Yu, H., and Wang, Y. Long short-term
memory neural network for traffic speed prediction using remote microwave
sensor data. Transportation Research Part C: Emerging Technologies 54 (2015),
187–197.
[38] Meidan, Y., Bohadana, M., Shabtai, A., Guarnizo, J. D., Ochoa, M.,
Tippenhauer, N. O., and Elovici, Y. Profiliot: a machine learning approach
for iot device identification based on network traffic analysis. In Proceedings of
the Symposium on Applied Computing (2017), ACM, pp. 506–509.
[39] Mitton, N., Papavassiliou, S., Puliafito, A., and Trivedi, K. S. Com-
bining cloud and sensors in a smart city environment. EURASIP Journal of
Wireless Communications and Networking (2012).
[40] Newman, M. E. J. Detecting community structure in networks. The European
Physical Journal B-Condensed Matter and Complex Systems 38, 2 (2004), 321–
330.
[41] Novikov, A. annoviko/pyclustering: pyclustering 0.8.2 release, Nov. 2018.
[42] Ozbay, K., and Kachroo, P. Incident management in intelligent transporta-
tion systems. Artech House Publishers (1999).
[43] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion,
B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg,
V., et al. Scikit-learn: Machine learning in python. Journal of machine learn-
ing research 12, Oct (2011), 2825–2830.
[44] Petrolo, R., Loscrì, V., and Mitton, N. Towards a smart city based on
cloud of things. In Proceedings of the 2014 ACM International Workshop on
Wireless and Mobile Technologies for Smart Cities (WiMobCity) (New York,
New York, USA, 2014), ACM Press, pp. 61–66.
[45] Qasem, M. Bio-inspired constrained clustering: A case study on aspect-based
sentiment analysis. PhD thesis, Department of Computer Science, University of
Manitoba, Winnipeg, MB, Canada, 2018.
[46] Qasem, M., Thulasiraman, P., and Thulasiram, R. K. Constrained ant
brood clustering algorithm with adaptive radius: A case study on aspect based
sentiment analysis. In IEEE Swarm Intelligence Symposium (SIS), IEEE Sym-
posium Series on Computational Intelligence (SSCI) (Honolulu, Hawaii, USA,
Nov 27-Dec 1 2017).
[47] Rao, W., Yoneki, E., and Chen, L. L-graph: A general graph analytic
system on continuous computation. In ACM HotPlanet (2015).
[48] Refianti, R., Mutiara, B., and Gunawan, S. Time complexity
comparison between affinity propagation algorithms. Journal of Theoretical and
Applied Information Technology 95, 7 (2017), 1497–1505.
[49] Rousseeuw, P. J. Silhouettes: a graphical aid to the interpretation and vali-
dation of cluster analysis. Journal of computational and applied mathematics 20
(1987), 53–65.
[50] Schaeffer, S. E. Graph clustering. Computer science review 1, 1 (2007),
27–64.
[51] Shea, C., Hassanabadi, B., and Valaee, S. Mobility-based clustering in
vanets using affinity propagation. In Global telecommunications conference, 2009.
GLOBECOM 2009. IEEE (2009), IEEE, pp. 1–6.
[52] Smith, B. L., Williams, B. M., and Oswald, R. K. Comparison of para-
metric and nonparametric models for traffic flow forecasting. Transportation
Research Part C: Emerging Technologies 10, 4 (2002), 303–321.
[53] Strohbach, M., Ziekow, H., Gazis, V., and Akiva, N. Towards a big
data analytics framework for iot and smart city applications. In Modeling and
processing for next-generation big-data technologies. Springer, 2015, pp. 257–282.
[54] Sun, S., Zhang, C., and Yu, G. A bayesian network approach to traffic flow
forecasting. IEEE Transactions on intelligent transportation systems 7, 1 (2006),
124–132.
[55] Taniguchi, E., and Shimamoto, H. Intelligent transportation system based
dynamic vehicle routing and scheduling with variable travel times. Transporta-
tion Research Part C: Emerging Technologies 12, 3-4 (2004), 235–250.
[56] Theodoridis, E., Mylonas, G., and Chatzigiannakis, I. Developing an
IoT smart city framework. In IISA 2013 (2013), IEEE, pp. 1–6.
[57] Tian, Y., and Pan, L. Predicting short-term traffic flow by long short-
term memory recurrent neural network. In Smart City/SocialCom/SustainCom
(SmartCity), 2015 IEEE International Conference on (2015), IEEE, pp. 153–
158.
[58] Tonjes, R., Barnaghi, P., Ali, M., Mileo, A., Hauswirth, M., Ganz,
F., Ganea, S., Kjærgaard, B., Kuemper, D., Nechifor, S., et al. Real
time iot stream processing and large-scale data analytics for smart city applica-
tions. In poster session, European Conference on Networks and Communications
(2014), sn.
[59] van der Voort, M., Dougherty, M., and Watson, S. Combining Kohonen
maps with ARIMA time series models to forecast traffic flow. Transportation
Research Part C: Emerging Technologies 4, 5 (October 1996), 307–318.
[60] Vlahogianni, E. I., Karlaftis, M. G., and Golias, J. C. Optimized and
meta optimized neural networks for short-term traffic flow prediction: A genetic
approach. Transp. Res. C, Emerging Technol. 13, 3 (June 2005), 211–234.
[61] Wang, Z., Liu, Y. Y., Thulasiraman, P., and Thulasiram, R. K. Ant
brood clustering on intel xeon multi-core: Challenges and strategies. In Sympo-
sium Series on Computational Intelligence (2018), IEEE.
[62] Wang, Z., Thulasiraman, P., and Thulasiram, R. A dynamic traffic
awareness system for urban driving. The 12th IEEE International Conference
on Internet of Things (2019).
[63] Weiming, L., Du Chenyang, W. B., Chunhui, S., and Zhenchao, Y.
Distributed affinity propagation clustering based on map reduce. Journal of
Computer Research and Development 8 (2012), 024.
[64] Williams, B., Durvasula, P., and Brown, D. Urban freeway traffic flow
prediction: application of seasonal autoregressive integrated moving average and
exponential smoothing models. Transportation Research Record: Journal of the
Transportation Research Board 1644 (1998), 132–141.
[65] Williams, B. M. Multivariate vehicular traffic flow prediction: evaluation of
ARIMAX modeling. Transp. Res. Rec. 1776 (2001), 194–200.
[66] Williams, B. M., and Hoel, L. A. Modeling and forecasting vehicular
traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results.
J. Transp. Eng. 129, 6 (Nov 2003), 664–672.
[67] Williams, B. M., and Hoel, L. A. Modeling and forecasting vehicular traffic
flow as a seasonal arima process: Theoretical basis and empirical results. Journal
of transportation engineering 129, 6 (2003), 664–672.
[68] Wu, D., Arkhipov, D. I., Asmare, E., Qin, Z., and McCann, J. A.
Ubiflow: Mobility management in urban-scale software defined iot. In 2015
IEEE Conference on Computer Communications (INFOCOM) (2015), IEEE,
pp. 208–216.
[69] Yu, B., Song, X., Guan, F., Yang, Z., and Yao, B. k-nearest neighbor
model for multiple-time-step prediction of short-term traffic condition. Journal
of Transportation Engineering 142, 6 (2016), 04016018.
[70] Yu, M., Zhang, D., Cheng, Y., and Wang, M. An rfid electronic tag
based automatic vehicle identification system for traffic iot applications. In 2011
Chinese Control and Decision Conference (CCDC) (2011), IEEE, pp. 4192–4197.
[71] Zhang, B., Xing, K., Cheng, X., Huang, L., and Bie, R. Traffic clustering
and online traffic prediction in vehicle networks: A social influence perspective.
In Infocom, 2012 Proceedings IEEE (2012), IEEE, pp. 495–503.
[72] Zhang, L., Liu, Q., Yang, W., Wei, N., and Dong, D. An improved
k-nearest neighbor model for short-term traffic flow prediction. Procedia-Social
and Behavioral Sciences 96 (2013), 653–662.
[73] Zheng, W., Lee, D.-H., and Shi, Q. Short-term freeway traffic flow pre-
diction: Bayesian combined neural network approach. Journal of transportation
engineering 132, 2 (2006), 114–121.