ANDREI CRAMARIUC
INDOOR POSITIONING USING WIRELESS LOCAL AREA
NETWORK FINGERPRINTING
Bachelor of Science Thesis
Examiner: University Lecturer
Heikki Huttunen
Supervisor: Associate Professor
Simona Lohan
ABSTRACT
ANDREI CRAMARIUC: Indoor Positioning Using Wireless Local Area Network Fingerprinting
Tampere University of Technology
Bachelor of Science Thesis, 22 pages
May 2015
Bachelor's Degree Programme in Electrical Engineering
Major: Signal Processing
Examiner: University Lecturer Heikki Huttunen
Supervisor: Associate Professor Simona Lohan
Keywords: indoor positioning, WLAN, RSS fingerprinting, clustering
Following the success of outdoor positioning systems, the focus has now shifted to
developing equally precise positioning methods for indoor environments. Indoor
positioning can be achieved by fingerprinting the received signal strength (RSS)
of the already widely available wireless local area network (WLAN). Fingerprinting
means that the target area is mapped beforehand by measuring the RSS in numerous
positions. In this thesis we use an RSS positioning method based on a Naive Bayes
classifier to study the effect of clustering the fingerprints. The results of the
simulations were obtained by using K-means to cluster real data for which the ground
truth was known. On average, the positioning error increased by 7 percent, while
the accuracy with which the correct floor was detected increased by 10 percent.
At the same time the localization process was almost two times faster, because half
of the fingerprints were not considered in the process. Clustering the fingerprints
offers a way to balance precision, time efficiency and the amount of required data.
The studied method is generic and could also be used in combination with other
positioning methods.
TIIVISTELMÄ (FINNISH ABSTRACT, IN TRANSLATION)
ANDREI CRAMARIUC: Sisätilapaikannus käyttäen langattoman lähiverkon sormenjäljentämistä
Tampere University of Technology
Bachelor of Science Thesis, 22 pages
May 2015
Degree Programme in Electrical Engineering
Major: Signal Processing
Examiner: University Lecturer Heikki Huttunen
Supervisor: Associate Professor Simona Lohan
Keywords: indoor positioning, WLAN, RSS fingerprinting, clustering
As a consequence of the success of outdoor positioning systems, the search has begun
for positioning methods of comparable quality for indoor spaces. Using the already
existing wireless network (WLAN), indoor positioning can be done by fingerprinting
the received WLAN signal strength (RSS). Fingerprinting means mapping the desired
area by measuring the WLAN signal strength at several different points. In this work
a Naive Bayes classifier and fingerprinting of the received signal strength are used
for positioning. The aim is to study how clustering the fingerprints affects the
result. The results of the simulations carried out in this work were obtained by
using the K-means algorithm to cluster real data for which the true location had
also been measured. On average the positioning error grew by 7 percent, while the
accuracy with which the correct floor was found grew by 10 percent. The time spent
on positioning was halved, because clustering halved the number of fingerprints
needed. Clustering the fingerprints offers a way to balance accuracy, efficiency and
the amount of data needed. The studied method is general-purpose and could also be
applied to other indoor positioning methods.
PREFACE
This thesis is based on the work and measurements done at the Laboratory of
Electronics and Communications Engineering, Department of Signal Processing, at
Tampere University of Technology. I am grateful for the guidance, comments and
feedback provided during the writing of this thesis by my examiner Heikki Huttunen.
I want to thank my supervisor, Simona Lohan, for her assistance, patience and help
in choosing and understanding such an interesting topic. I also want to thank my
opponent Eetu Kuusisto and the other participants in the Bachelor's thesis seminar
for the interesting discussions and constructive remarks.
May 17, 2015
Andrei Cramariuc
TABLE OF CONTENTS
1. Introduction
2. Theory
    2.1 Naive Bayes classifier
    2.2 K-means clustering
3. Implementation
    3.1 Indoor measurement data
    3.2 Fine positioning method
    3.3 Clustering
    3.4 Coarse positioning
    3.5 Time complexity
4. Results and Analysis
    4.1 Clusters used in positioning
    4.2 Number of clusters
5. Conclusions
Bibliography
LIST OF ABBREVIATIONS AND SYMBOLS
WLAN    Wireless Local Area Network
ToF     Time of Flight
AoA     Angle of Arrival
RSS     Received Signal Strength
AP      Access Point
MAC     Media Access Control; an identifier for electronic devices
NAP     Number of access points
NGP     Number of grid points, which is also the number of fingerprints
Pavg    Number of points to average in the Naive Bayes classifier to obtain
        the position estimate
K       Number of clusters
Kfine   Number of clusters in which to use the fine positioning method
1. INTRODUCTION
People spend most of their time indoors, in offices, shopping malls, restaurants,
hospitals, metros, museums, etc. It is therefore not surprising that, following the
success of outdoor positioning technologies in a wide range of applications, the
focus has now shifted to developing suitable technologies and methods for indoor
positioning and navigation. Advances in indoor positioning performance can benefit
a large number of application fields, such as medical care, where monitoring a
patient's position can be very important [3]. Other possible applications include
indoor navigation for firefighters, museums or large public buildings [2], as well
as network optimisation and load balancing [1]. Most modern indoor positioning
systems use Wireless Local Area Networks (WLAN), Bluetooth, ultra-wideband,
radio-frequency identification, infrared, vision-based or ultrasound technologies [1][2].
Whereas the popular satellite and cellular network-based positioning methods provide
sufficient accuracy outdoors, they perform poorly in the shielded and complex
indoor environment [4]. The challenges of indoor positioning include multipath signal
propagation, non-line-of-sight conditions and high attenuation and scattering of
the signal [4]. The lack of a widespread indoor positioning infrastructure is due to
the fact that the currently available technologies either lack precision or are
too expensive and require specific locally installed infrastructure [1][2]. The
increasing density and availability of WLANs has sparked interest in developing
precise WLAN-based solutions for indoor positioning.
The three parameters of an incoming signal that can be used to determine the
receiver's position relative to the source are its time of flight (ToF), angle of
arrival (AoA) and received signal strength (RSS) [5]. In this work we will focus on
RSS-based methods, which, compared to the ones that rely on ToF or AoA, can be easily
implemented using existing protocols and hardware available in all modern mobile
phones. RSS positioning consists of two phases, an offline and an online phase. In
the offline phase RSS fingerprints are collected from several points throughout the
target area, where a fingerprint consists of a three-dimensional position and a vector
of RSS values from the visible access points measured at that position. In the online
phase positioning is done using the gathered fingerprint map and the RSS vector
at the current location. Approaches to positioning using RSS fingerprinting can be
divided into three main categories: deterministic, probabilistic and machine learning
based [5].
This thesis extends a probabilistic approach based on a Naive Bayes classifier, which
calculates for each fingerprint a probability that describes how likely it is that
the current position is at that location [6]. Afterwards, the group of points with
the highest probabilities is chosen and their positions are averaged to obtain an
estimate for the current location.
The aim of this thesis is to study the inclusion of an additional offline
preprocessing phase, in which the RSS fingerprints are clustered. A spatially relevant
segmentation of the fingerprints would allow the Naive Bayes classifier to use fewer
points for positioning, thus increasing efficiency while maintaining the same accuracy.
The clusters could further be used to increase accuracy by discarding points that have
a high probability but are not spatially close to the other high-probability points.
This thesis focuses on the use of the standard K-means clustering algorithm, with
varying parameters. Previous works by various authors have reached positive results
employing similar methods, based on different combinations of clustering and
positioning algorithms [5][7][8].
Chapter 1 represents the introductory part of the thesis. In Chapter 2 the basic
theoretical concepts used in subsequent chapters will be briefly explained. Chapter
3 focuses on describing the implementation of the discussed positioning method.
In Chapter 4 the obtained results are presented and analysed. The final chapter
contains a brief summary of the work as well as conclusions that can be drawn from
the results.
2. THEORY
This chapter is a review of the basic theoretical concepts of Naive Bayes classification
and K-means clustering. These are common signal processing methods that were
adapted in this work for the purpose of indoor positioning.
2.1 Naive Bayes classifier
The Naive Bayes algorithm is a classification algorithm that classifies data based on
a set of attributes X = (x1, ..., xn), using training data for which the target
classes (C1, C2, ..., Cm) are known. The simplifying assumption is that all the
attributes are conditionally independent of one another given the class. Under this
assumption the probability that X belongs to a class C can be written as
$$P(C \mid X) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C) \qquad (2.1)$$
The left side of the equation is the posterior probability, which is the probability of
class C given the attributes X. Using Bayes' theorem, the conditional probability
P(C|X) can be decomposed into the factors on the right. In Equation (2.1), P(C) is
the prior probability of C, which is the probability of observing C regardless of any
other information. The last term P(xi|C) is the likelihood of the feature xi occurring
given the class C. These likelihoods can simply be multiplied together because of the
assumption of independence. The assumption therefore allows the likelihood to be
evaluated independently for each feature and the results to be combined to obtain the
total probability for each class [11, pp. 20–23].
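For example, let's assume five kids have the features described in Table 2.1.

Table 2.1
Clothing (x1)   Hair color (x2)   Gender (Class)
Pants           Black             Boy
Pants           Blonde            Boy
Skirt           Black             Girl
Pants           Blonde            Girl
Skirt           Red               Girl

Using Naive Bayes to classify a kid that is blonde and wears pants, the probabilities are
$$
\begin{aligned}
P(\text{Boy} \mid \text{Pants}, \text{Blonde}) &\propto P(\text{Boy})\, P(\text{Pants} \mid \text{Boy})\, P(\text{Blonde} \mid \text{Boy}) = \frac{2}{5} \times \frac{2}{2} \times \frac{1}{2} = \frac{1}{5} \\
P(\text{Girl} \mid \text{Pants}, \text{Blonde}) &\propto P(\text{Girl})\, P(\text{Pants} \mid \text{Girl})\, P(\text{Blonde} \mid \text{Girl}) = \frac{3}{5} \times \frac{1}{3} \times \frac{1}{3} = \frac{1}{15}
\end{aligned}
\qquad (2.2)
$$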
Therefore, it is more probable that the kid is a boy rather than a girl. A more complex
example is shown in Figure 2.1, where Naive Bayes is used to classify flowers based on
continuous features.
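The toy example above can be reproduced with a few lines of Python. This is only a minimal sketch for categorical features; the function name `naive_bayes_scores` and the data layout are illustrative choices, not part of the thesis. Exact fractions are used so that the result can be compared directly with Equation (2.2).

```python
from fractions import Fraction

# Training data from Table 2.1: (clothing, hair color, class)
KIDS = [
    ("Pants", "Black", "Boy"),
    ("Pants", "Blonde", "Boy"),
    ("Skirt", "Black", "Girl"),
    ("Pants", "Blonde", "Girl"),
    ("Skirt", "Red", "Girl"),
]


def naive_bayes_scores(rows, features):
    """Return the unnormalized posterior P(C) * prod_i P(x_i | C) per class."""
    scores = {}
    for label in sorted({row[-1] for row in rows}):
        members = [row for row in rows if row[-1] == label]
        score = Fraction(len(members), len(rows))  # prior P(C)
        for i, value in enumerate(features):
            matches = sum(1 for row in members if row[i] == value)
            score *= Fraction(matches, len(members))  # likelihood P(x_i | C)
        scores[label] = score
    return scores


scores = naive_bayes_scores(KIDS, ("Pants", "Blonde"))
print(scores["Boy"], scores["Girl"])  # 1/5 1/15
```

The scores are proportional to the posterior probabilities, so only their ratio matters: here the boy class is three times as likely as the girl class.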
Even though the assumption of independence might not always be true, it greatly
simplifies the system and in many cases leads to accurate results. The main
advantages of the Naive Bayes classifier are its speed and its insensitivity to
irrelevant features. [11, pp. 20–23]
(a) Training data (b) Calculated decision boundaries
Figure 2.1 The figure on the left shows the sepal width and length measurements for
three categories of flowers: Setosa, Versicolor and Virginica. The resulting
classification boundaries are visible in the figure on the right.
2.2 K-means clustering
K-means is a popular clustering algorithm that assigns each point of a dataset
x = (x1, x2, ..., xN) to one of the clusters c = (c1, c2, ..., cK). The total number
of clusters K is fixed and has to be determined beforehand. The clustering is done by
minimizing the within-cluster sum of squares; in other words, the objective is to find
$$\underset{c}{\arg\min} \sum_{i=1}^{K} \sum_{x \in c_i} \lVert x - \mu_i \rVert^2 \qquad (2.3)$$
where µi is the mean of the points in cluster ci. The obstacle is that finding a
solution that satisfies Eq. (2.3) is an NP-hard problem. However, there exist different
heuristic methods that converge quickly to a local optimum. [11, pp. 526–528]
Algorithm 1 Lloyd's algorithm
1. Initialize the cluster centers
2. Assign each data point to the nearest cluster
3. Recalculate the center of each cluster as the mean of all the data points belonging to that cluster
4. Repeat steps 2–3 until convergence
(a) Initialization of the clusters at random
(b) After the first iteration
(c) Final result after 5 iterations
Figure 2.2 An example of how the iterative K-means algorithm works, where the centroids
are marked by black crosses. The existence of three clusters is assumed before running
the algorithm.
The most commonly used method is referred to as Lloyd's algorithm, described in
Algorithm 1 and briefly illustrated in Figure 2.2. Lloyd's algorithm uses iterative
refinement to find the solution, but before the algorithm can be applied the number
of clusters must be determined and the centroids must be initialized [11, pp. 526–528].
The initialization of the clusters can be done at random, but this can cause the
algorithm to converge slowly or to converge to a poor local minimum. Better
methods exist that cause the algorithm to converge more quickly and to a better
result. A few of the commonly used methods for cluster initialization are Forgy's,
Jancey's, MacQueen's and k-means++ [9]. Another parameter that affects the result
of the clustering is the choice of the distance measure between the clustered points
[9].
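The steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch rather than the implementation used in this thesis: it assumes random (Forgy-style) initialization, squared Euclidean distance and a fixed iteration cap, and the names `lloyd_kmeans` and `sq_dist` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, iterations=100, seed=0):
    """Cluster a list of points into k clusters; return (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1: pick k data points as centers
    labels = None
    for _ in range(iterations):
        # Step 2: assign each point to the nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # step 4: assignments stable, so converged
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels, centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = lloyd_kmeans(points, k=2)
print(labels)
```

For two well-separated groups like these the result does not depend on the initialization, but in general different seeds can lead to different local optima, which is why the initialization methods mentioned above matter.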
3. IMPLEMENTATION
The positioning method implemented in this thesis can be divided into two main
stages, namely an offline and an online stage, represented in Figure 3.1. In the
offline stage the required data is gathered and preprocessed, and the actual
positioning is done in the online stage.
Figure 3.1 A diagram of the positioning method described in this thesis.
The first step 1(a) in the offline stage is to map the target area by collecting
fingerprints in multiple locations. A fingerprint consists of a three-dimensional
position and a vector containing the media access control (MAC) address and received
signal strength (RSS) of each visible access point at that position. Afterwards, in
1(b), the gathered data points are clustered together to increase performance. The
reasoning for this is detailed in Section 3.3. In the online stage the positioning is
done by first measuring the RSS vector at the current location using any WLAN
receiver. In the next step 2(b) an approximate location is calculated by comparing
the MAC addresses of the currently visible access points with the access points seen
in each cluster. Based on the coarse positioning only a few clusters are selected
as potential locations. To determine the exact position in step 2(c), a more precise
positioning method, based on a Naive Bayes classifier, is used inside each of the
clusters selected in the previous step. An illustration of how the method works is
given in Figure 3.2.
(a) The starting situation, in which the building has been fingerprinted and we want to estimate the current position.
(b) Due to the preprocessing the fingerprints are divided into 4 clusters.
(c) Using the coarse positioning method two clusters are selected based on the surrounding access points.
(d) Using the Naive Bayes classifier 3 fingerprints are selected that are closest to the current position.
(e) The positions of the fingerprints are averaged to obtain an estimate for the current position.
(f) One of the estimates is chosen as the final position, based on the quality given by the Naive Bayes classifier.
Figure 3.2 An illustrated example of how the clustering-based positioning method
studied in this thesis works.
3.1 Indoor measurement data
The fingerprint data referred to in step 1(a) in Figure 3.1 was collected from several
different types of buildings with varying indoor geometries. The measurements were
done using a tablet, and therefore the positioning results correspond to what can be
achieved using any modern portable device that has a WLAN receiver.
To store the data more easily, the access points' media access control (MAC)
addresses are used as access point identifiers. For simplicity the MAC addresses were
converted into indexes from 1 to NAP, where NAP is the number of access points in a
building. For each building a set of fingerprints was provided as training data, where
each fingerprint consists of a three-dimensional position and a list of the indexes and
associated RSS values of the access points heard at that position. The RSS
measurements have a precision of 1 dBm and the fingerprints were mapped onto a grid
with a 1 meter step. The data does not contain information about the location of
the access points or the structure of the indoor environment.
For each building a separate set of user tracks was recorded for testing purposes,
where the RSS vectors and the positions were stored. A short summary of the data
used and the type of building where it was recorded can be seen in Table 3.1.

(a) Building 1 (b) Building 4
Figure 3.3 The fingerprints collected in two different buildings; the one on the left
is an office building while the one on the right is a mall.

Table 3.1 Summary of indoor measurement data
Building   Building type   Floors   Access points   Training points   Test points
1          office          4        309             1479              490
2          office          4        238             505               366
3          office          3        176             584               354
4          mall            6        468             1633              3503
5          mall            9        573             624               2611
3.2 Fine positioning method
The fine positioning method used in step 2(c) in Fig. 3.1 uses Naive Bayes
classification based on Gaussian distributions [6]. As input the classifier only needs
a vector of the received signal strengths at the current position and a set of collected
fingerprints. This is a standalone method for estimating the current position, which
can be used without clustering. The reason why clustering could improve the result
and the execution time follows from how the classifier works, so understanding the
classifier first is important for understanding the motivation behind clustering.

Algorithm 2 Positioning using RSS fingerprinting and Naive Bayes classification
Input: A matrix RSSF of size NGP × NAP containing the measured fingerprint map,
where NGP is the number of grid points and NAP is the number of access points.
An array RSSC of size NAP containing the received signal strengths at the current
position. Pavg, the number of points to average.
Output: A three-dimensional coordinate Position containing the estimated
position, and Quality, the quality of the estimated position.
    Likelihoodtotal ← 0        ▷ Cumulative likelihood of being in a point
    for i ← 1 to NGP do
        for j ← 1 to NAP do
            Likelihood ← GaussianSimilarity(RSSF(i, j), RSSC(j))
            Likelihood ← log(Likelihood)
            Likelihoodtotal(i) ← Likelihoodtotal(i) + Likelihood
        end for
    end for
    Points ← the Pavg points with the highest total likelihood
    Position ← average position of the Points
    Quality ← average likelihood of the Points
The algorithm calculates a likelihood for each point in the grid formed by the
coordinates of the fingerprint measurements. The likelihood of being in a specific grid
point based on only one access point is the Gaussian similarity between the RSS at
that point and at the current location. If the access point is not visible at the
current location or at the grid point, then the likelihood is 0. The Gaussian similarity
between two arbitrary values x and y is defined as
$$
\text{GaussianSimilarity}(x, y) =
\begin{cases}
\dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\dfrac{(x-y)^2}{2\sigma^2}\right), & \text{if } x \neq 0 \text{ and } y \neq 0 \\
0, & \text{otherwise}
\end{cases}
\qquad (3.1)
$$
Equation (3.1) is relevant in determining the similarity between two RSS vectors
since it statistically models the signal attenuation and how the RSS values of an
access point are distributed around it [12].
For each grid point the logarithms of the individual likelihoods are summed to obtain
the total likelihood of being in that point. Because the logarithm is not defined for
zero, in practice all zero values must be replaced by a very small value ε.
Logarithmic probabilities are used for increased speed and stability, since they
transform multiplications into additions and avoid the use of very small
floating-point numbers.
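The stability argument can be illustrated with a short numerical sketch. The value of ε and the likelihood values below are hypothetical and chosen only for illustration; with a few hundred access points, as in Table 3.1, the plain product of likelihoods underflows to zero in double precision, whereas the sum of logarithms stays well within range.

```python
import math

EPSILON = 1e-12  # hypothetical small value replacing zero likelihoods


def product_of_likelihoods(likelihoods):
    """Multiply the likelihoods directly (prone to floating-point underflow)."""
    result = 1.0
    for value in likelihoods:
        result *= value
    return result


def sum_of_log_likelihoods(likelihoods, epsilon=EPSILON):
    """Sum the logarithms, replacing values below epsilon (including 0) by it."""
    return sum(math.log(max(value, epsilon)) for value in likelihoods)


likelihoods = [1e-3] * 200  # one small likelihood per access point
print(product_of_likelihoods(likelihoods))  # 0.0, the product underflows
print(sum_of_log_likelihoods(likelihoods))  # roughly -1381.55
```

Ranking grid points by the summed logarithms gives the same order as ranking by the product would in exact arithmetic, because the logarithm is monotonically increasing.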
To determine the current position using the previously calculated probabilities a
number Pavg ≥ 1 of grid points with the highest likelihood are selected and their
positions averaged. The quality of the position estimate is the average of the likeli-
hoods of the Pavg points. The quality of the estimated position is important, since
it can be compared to that of other position estimates to determine which is better.
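Putting Algorithm 2 and Equation (3.1) together, the fine positioning method could be sketched as below. This is a hedged illustration, not the code used for the results in this thesis: the function names, the list-based data layout, the default ε and the convention that an RSS value of 0 marks an access point that was not heard are assumptions made for the example.

```python
import math


def gaussian_similarity(x, y, sigma=5.0):
    """Equation (3.1); a value of 0 encodes 'access point not heard'."""
    if x == 0 or y == 0:
        return 0.0
    norm = math.sqrt(2 * math.pi * sigma ** 2)
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2)) / norm


def naive_bayes_position(rss_f, positions, rss_c, p_avg=3, sigma=5.0,
                         epsilon=1e-12):
    """Return (position, quality) for the current RSS vector rss_c.

    rss_f: list of fingerprint RSS vectors, one per grid point
    positions: list of (x, y, z) tuples, one per grid point
    """
    totals = []
    for fingerprint in rss_f:
        total = 0.0
        for f_value, c_value in zip(fingerprint, rss_c):
            likelihood = gaussian_similarity(f_value, c_value, sigma)
            total += math.log(max(likelihood, epsilon))  # avoid log(0)
        totals.append(total)
    # Indices of the p_avg grid points with the highest total log-likelihood.
    best = sorted(range(len(totals)), key=totals.__getitem__, reverse=True)[:p_avg]
    position = tuple(sum(positions[i][d] for i in best) / len(best)
                     for d in range(3))
    quality = sum(totals[i] for i in best) / len(best)
    return position, quality


rss_f = [[-50, -70, 0], [-52, -68, 0], [0, -60, -55], [0, -62, -50]]
positions = [(0, 0, 0), (1, 0, 0), (20, 5, 3), (21, 5, 3)]
position, quality = naive_bayes_position(rss_f, positions, [-51, -69, 0], p_avg=2)
print(position)  # (0.5, 0.0, 0.0)
```

In this made-up example the first two fingerprints match the measured RSS vector best, so the estimate is the midpoint between them. The returned quality is an average log-likelihood, so values closer to zero indicate a better match.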
3.3 Clustering
Large indoor areas such as public buildings require a large number of fingerprints.
Segmenting the fingerprint data using clustering can significantly increase speed and
reduce the amount of data that needs to be transmitted. The segmentation is made
possible by how the Naive Bayes classifier described in Section 3.2 works. To produce
the best possible result the classifier would only require the Pavg fingerprints closest
to the current location. Fingerprints that are far away do not contribute to the
result and are mostly ignored due to having a low likelihood. Ideally clustering
does not reduce the accuracy of the positioning, since no necessary information is
removed. Clustering could also improve the result, since it removes points that have
a high probability but are spatially unrelated to other points with a high likelihood.
3.4 Coarse positioning
One of the challenges of using clustering is determining in which cluster the current
position is located. A broad approach is to use rank-based similarity, which is done
by comparing the indexes of the currently visible access points with those of the
access points visible in each cluster. The rank of a cluster is therefore the number
of common access points. Since neighbouring clusters can have similar ranks and
the rank-based method is not very precise, the Naive Bayes method can be applied
individually to a subset of clusters that have the highest rank. The number of
clusters chosen to be used will be denoted by Kfine. The final position is then
chosen by comparing the quality calculated by the classifier for each positioning
done in this subset.
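A minimal sketch of the rank-based selection is given below. The set-based representation of the access points, the function names and the tie-breaking rule (lower cluster index first) are assumptions made for illustration; the thesis does not specify how ties are resolved.

```python
def rank_clusters(visible_aps, cluster_aps, k_fine):
    """Return the indices of the k_fine clusters with the highest rank.

    visible_aps: set of access point indexes heard at the current location
    cluster_aps: list with one set of access point indexes per cluster
    """
    # The rank of a cluster is the number of access points it has in common
    # with the current measurement.
    ranks = [len(visible_aps & aps) for aps in cluster_aps]
    order = sorted(range(len(cluster_aps)), key=lambda c: ranks[c], reverse=True)
    return order[:k_fine]


def best_estimate(estimates):
    """Pick the (position, quality) pair with the highest quality."""
    return max(estimates, key=lambda estimate: estimate[1])


cluster_aps = [{1, 2, 3}, {3, 4, 5, 6}, {7, 8}, {2, 3, 4}]
selected = rank_clusters({2, 3, 4, 9}, cluster_aps, k_fine=2)
print(selected)  # [3, 0]
```

The fine positioning method would then be run once per selected cluster, and `best_estimate` would choose among the resulting position estimates.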
3.5 Time complexity
An analysis of the time complexity of the positioning method using clustering can
be useful in predicting the outcome as well as in interpreting the results. From the
description of the Bayes classifier in Algorithm 2 it can be seen that the complexity
is O(NGP × NAP + NGP × Pavg), where the first term is calculating the likelihood and
the second term is finding the Pavg points with the highest likelihood. Since Pavg is
very small compared to both NGP and NAP, a good approximation is O(NGP × NAP).
When estimating the complexity of the clustering-based method the time required
to cluster the points is not considered, since it is part of the preprocessing. To
calculate the complexity the assumption must be made that the clustering will divide
the points into K clusters that all have approximately the same size. Because the
positioning is done in a subset of Kfine out of the K clusters, the resulting complexity
is O(NGP × NAP × Kfine/K). This is clearly more efficient than positioning using all
the points, since Kfine should be chosen as less than K, in which case Kfine/K < 1.
4. RESULTS AND ANALYSIS
The quality of the positioning was judged based on the Euclidean distance between
the current and the calculated position as well as the probability that the calculated
position is on the correct floor. These measures will be referred to as error and floor
detection accuracy respectively. The floor detection accuracy is a useful measure
since correct detection of the floor alone can be useful in certain applications [13] and
also because knowing the correct floor can be very important in indoor navigation.
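The two measures could be computed along the following lines. This sketch rests on assumptions that the text does not state: the error is taken as the three-dimensional Euclidean distance, and the floor is derived from the height coordinate using a known, constant floor height (the parameter `floor_height` is hypothetical).

```python
import math


def evaluate(estimates, truths, floor_height=1.0):
    """Return (mean error, floor detection accuracy) for paired 3-D positions."""
    errors = [math.dist(e, t) for e, t in zip(estimates, truths)]
    same_floor = [
        round(e[2] / floor_height) == round(t[2] / floor_height)
        for e, t in zip(estimates, truths)
    ]
    return sum(errors) / len(errors), sum(same_floor) / len(same_floor)


estimates = [(3.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
truths = [(0.0, 0.0, 0.0), (0.0, 0.0, 8.0)]
mean_error, floor_accuracy = evaluate(estimates, truths, floor_height=4.0)
print(mean_error, floor_accuracy)  # 4.5 0.5
```

In the made-up data the first estimate is 5 m off but on the right floor, while the second is 4 m off and one floor too low, giving a mean error of 4.5 m and a floor detection accuracy of 50%.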
A brief summary of the results obtained by only using the Naive Bayes classifier
can be seen in Table 4.1. The effect of clustering the fingerprints can be observed
by using these results as a baseline and comparing the positioning with and without
clustering. In order for the results to be comparable the parameters of the
classifier were fixed throughout all the tests. The parameters were chosen as
σ = 5 and Pavg = 3, where σ is the shading factor and Pavg is the number of
points that are averaged to obtain the final position. The variance caused by the
non-deterministic nature of the K-means algorithm can be reduced by averaging
over multiple test runs of the program with the same parameters. It is worth noting
that the accuracy is not very high by modern standards, because no filtering or
secondary positioning method was used.
The parameters of the clustering algorithm that should be analysed are the number
of clusters K as well as the size Kfine of the subset of clusters in which the fine
positioning is done. Because iterating through every possible combination of K
and Kfine takes too long, the analysis is done by fixing one of the parameters to a
few values and iterating through the other, which makes it possible to search for
a pattern.
4.1 Clusters used in positioning
Ideally the value of Kfine should be chosen small enough to allow for a decrease in
the execution time, while maintaining the best possible positioning accuracy. Since
the optimal value of K is undetermined, multiple tests were made, where all the values
of Kfine were iterated for a certain value of K. The effect of Kfine on the error, floor
detection accuracy and execution time is plotted for two of the buildings in Figure
4.1. From the plots it can be observed that after a certain point increasing Kfine
has no effect, except an increase in execution time. This point also seems to be the
same when considering either the error or the floor detection accuracy, since they
seem to be inversely related. Therefore, analysing either one should yield the
same result.

(a) Increase in error (Building 1). (b) Increase in error (Building 5).
(c) Increase in floor detection accuracy (Building 1). (d) Increase in floor detection accuracy (Building 5).
(e) Difference in execution time (Building 1). (f) Difference in execution time (Building 5).
Figure 4.1 The quality of the positioning depending on the value of Kfine for Buildings
1 and 5. It can be seen that, independently of the number of clusters, the plots converge
to the same values. A few exceptions are visible in (b), where high numbers of clusters
cause a significant increase in error. Also in (d), for K = 6, the increase in floor
detection accuracy is smaller because the number of clusters is smaller than the number
of floors.
(a) 3% margin of error. For small margins of error the optimal value of Kfine seems to be highly dependent on the structure of the building and how it was fingerprinted.
(b) 10% margin of error. Higher margins of error cause the optimal values of Kfine for each building to converge towards more similar values.
Figure 4.2 Optimal value of Kfine allowing for a certain margin of error for all the
buildings. The plots are created based on the results in Figure 4.1 as well as similar
ones for the other buildings that were not displayed.
The points in Figure 4.2 represent the lowest values of Kfine for which the error
is smaller than the minimum error achieved for that value of K plus an allowed
margin of error. For example, in Figure 4.1(b) the smallest increase in error
achievable for K = 32 is 8%. If the allowed margin of error is, for example, 2%, the
critical point will be the first value of Kfine for which the increase in error is
smaller than 10%.
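The selection of the critical point can be sketched as follows. The error curve in the example is hypothetical and not taken from the measurements; it merely mimics the situation described above, where the minimum increase in error is 8% and the allowed margin is 2%. The function name `critical_k_fine` is likewise an illustrative choice.

```python
def critical_k_fine(errors, margin):
    """Return the smallest K_fine whose error is within `margin` of the minimum.

    errors: dict mapping K_fine to the increase in error (in percent)
    margin: allowed margin of error (in percentage points)
    """
    threshold = min(errors.values()) + margin
    return min(k_fine for k_fine, error in errors.items() if error < threshold)


# Hypothetical error curve for one fixed value of K (not measured data).
errors = {1: 40.0, 2: 22.0, 4: 12.0, 8: 9.5, 16: 8.2, 32: 8.0}
print(critical_k_fine(errors, margin=2.0))  # 8
```

Here the threshold is 10%, and Kfine = 8 is the first value whose error falls below it.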
In many cases the minimum error will be very close to the error when not using
clusters, since Kfine = 100% means that all the points will be used in the
fine positioning method. This assumption does not always hold: in Figure 4.1(b), for
example, having too many clusters can significantly impact the precision. These
differences in accuracy are caused by the fact that even though all points are
considered, a group of points can only be averaged if they all belong to the same
cluster.
The margins of error used to plot Figures 4.2(a) and 4.2(b) were chosen arbitrarily
only as examples. Depending on the application and the type of position filtering
used other values might be more appropriate. From Figure 4.2(a) it is clear that
the geometry of the building and the way the fingerprints are distributed can cause
significant differences in the number of clusters needed to avoid a large increase in
error.
The optimal value of Kfine appears to follow a decaying curve in relation to the
total number of clusters, dropping quickly at first and then flattening out. For
higher margins of error the datasets quickly converge to approximately the same
values, as can be seen when comparing the distance between the plots in Figures
4.2(a) and 4.2(b). Lower margins of error have more noise due to the quantization of
the data causing larger jumps in the values.
Based on the average of the plots a value of Kfine can be chosen so that on average
it respects the chosen margin of error. For the purpose of further calculations the
margin of error chosen is 3%, as in Figure 4.2(a). The average of the plots yields the
following relation between K and Kfine, with Kfine expressed as a percentage of K:
$$K_{fine} = 130 \times K^{-0.37} - 8.5 \qquad (4.1)$$
This leads to the values Kfine = {92%, 78%, 69%, 63%, 58%, ...} for K = 2, 3, 4, 5,
6, .... These values are approximations with only a statistical guarantee that on
average the increase in error will be around 3%. The percentage values will also have
to be rounded up, since due to the discrete nature of the number of clusters only
certain percentages are possible.
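Equation (4.1) and the rounding step can be checked with a short sketch. The conversion from a percentage to a whole number of clusters (rounding up and capping at K) is our reading of the rounding remark above, and the function names are illustrative.

```python
import math


def k_fine_percent(k):
    """Equation (4.1): share of clusters to search, in percent of K."""
    return 130 * k ** -0.37 - 8.5


def k_fine_clusters(k):
    """Round the percentage up to a whole number of clusters (at most K)."""
    return min(k, math.ceil(k_fine_percent(k) / 100 * k))


percents = [round(k_fine_percent(k)) for k in range(2, 7)]
clusters = [k_fine_clusters(k) for k in range(2, 7)]
print(percents)  # [92, 78, 69, 63, 58]
print(clusters)  # [2, 3, 3, 4, 4]
```

The first list reproduces the percentages quoted above; the second shows that for very small K almost all clusters still have to be searched, so little time is saved.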
4.2 Number of clusters
By using the method discussed in the previous section for choosing Kfine, choosing K
becomes much easier. The relation between the number of clusters and the error, floor
detection accuracy and execution time is presented in Figure 4.3, with Kfine chosen
with a 3% margin of error according to Equation (4.1). Because the method performs
differently for each building, it is easier to judge the results based on the
averages rather than for each building individually.
(a) Increase of the mean absolute error. The increase in error reaches a minimum at around K = 10, after which it slowly increases. Except for Building 3, the buildings have very similar increases in error.
(b) Increase in the probability of determining the correct floor. The floor detection accuracy seems to always increase, and after it reaches its maximum at around K = 10 it remains constant.
(c) Decrease in the execution time. For small numbers of clusters the overhead is significant, causing the improvement to be marginal. The most significant improvement is obtained on average at around 20 < K < 30.
Figure 4.3 The quality of the positioning depending on the number of clusters K, with
Kfine chosen with a 3% margin of error according to Equation (4.1).
From Figure 4.3 it can be seen that the method is imprecise for very small numbers
of clusters. After the average error reaches its minimum at around K = 12 it begins
slowly increasing. The opposite is true for the floor detection accuracy, which reaches
its maximum at K = 12, after which it starts slowly decreasing. The time complexity
analysis in Section 3.5 explains why the time decreases exponentially. Since Kfine
is exponentially decreasing as a function of K, the ratio Kfine/K is also exponentially
decreasing. It is notable that on average there is always an improvement in execution
time when using clusters.
The unreliability of small numbers of clusters is probably because determining
the correct cluster to use in the fine positioning method is unreliable. The
coarse positioning method is especially unreliable on the border between the clusters
where they all contain fingerprints from the same access points. For large numbers
of clusters the cause of the increase in error could be due to several factors, including
the restriction that the positions of fingerprints can only be averaged if they are in
the same cluster and the possibility of the coarse positioning not selecting the best
cluster.
Since the execution time is exponentially decreasing, there is no reason to choose a
very high value of K, as further increases bring no additional benefit. A reasonable choice is
K = 12 since it has the smallest error and the highest increase in floor detection
accuracy, while maintaining a significant improvement in execution time. The precise
results for each building when K = 12 and Kfine = 7 are presented in Table 4.1.
Table 4.1 A comparison of the numeric results obtained when using only the Naive
Bayes classifier versus when using the Naive Bayes classifier in conjunction with K-means
clustering.
          Error (m)                Floor detection (%)      Execution time (ms)
Building  Naive Bayes  Clustering  Naive Bayes  Clustering  Naive Bayes  Clustering
1         4.9          5.0         89           95          13           6
2         4.5          4.7         93           98          4            2
3         9.0          10.6        85           89          6            3
4         11.3         11.6        63           81          17           11
5         6.6          6.9         63           73          8            5
Average   7.3          7.8         79           87          10           5
From Table 4.1 it can be seen that on average the error increased by approximately
7%. Half of the increase in error is caused by the clustering algorithm's poor
performance for Building 3, which has a large surface and a relatively small number of
fingerprints. If Building 3 were removed from consideration, the increase in error
would be around 3%, which is equal to the chosen margin of error. This further
emphasizes how much the results are influenced by how the fingerprinting was done
and by the geometry of the building. An increase in floor detection accuracy was
noticeable for all buildings, with the average increase being around 10%, or 8
percentage points. The increase was much higher for Buildings 4 and 5, probably due to the
fact that they have more floors than the other buildings. All the execution times
were approximately halved, with the average execution time dropping to 50% of the
original. Even though the errors seem large, it is important to remember that the
results are unfiltered, meaning that all the positions were calculated without prior
knowledge about the previous positions.
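The percentages quoted above can be checked against Table 4.1 with a few lines of arithmetic. The sketch below is only an illustration: the variable and function names are chosen for this example, the headline figures are computed from the rounded Average row of the table, and the figure without Building 3 is computed from the per-building errors.

```python
# Values copied from Table 4.1 as (Naive Bayes only, with clustering).
average_row = {
    "error_m": (7.3, 7.8),
    "floor_pct": (79, 87),
    "time_ms": (10, 5),
}
error_by_building = {
    1: (4.9, 5.0),
    2: (4.5, 4.7),
    3: (9.0, 10.6),
    4: (11.3, 11.6),
    5: (6.6, 6.9),
}


def relative_change(before, after):
    """Relative change in percent, positive when the value increases."""
    return 100.0 * (after - before) / before


def mean_error_change(excluded=()):
    """Relative change of the mean error over the buildings not excluded."""
    rows = [pair for b, pair in error_by_building.items() if b not in excluded]
    before = sum(p[0] for p in rows) / len(rows)
    after = sum(p[1] for p in rows) / len(rows)
    return relative_change(before, after)


error_change = relative_change(*average_row["error_m"])      # about 7 %
floor_change = relative_change(*average_row["floor_pct"])    # about 10 %
floor_points = average_row["floor_pct"][1] - average_row["floor_pct"][0]
time_change = relative_change(*average_row["time_ms"])       # -50 %
error_change_without_b3 = mean_error_change(excluded={3})    # about 3 %
```

Because the table entries are themselves rounded, these numbers agree with the quoted figures only approximately.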
5. CONCLUSIONS
Segmenting the fingerprints using K-means clustering proved to have some benefits.
As a result the Naive Bayes-based positioning algorithm needed to estimate the
position only in a part of the clusters to maintain accuracy. The combination of the
two algorithms caused an average increase in error of 7% while on average increasing
the accuracy with which the correct floor was detected by 10%. It is worth noting
that half of the increase in error was caused by one of the five buildings in which
the tests were performed. The clustering also resulted in the execution time being
reduced to half, due to only using half of the provided fingerprints.
Overall the obtained results are promising, since they show the data can be segmented
without significantly increasing the error and also potentially adding some
benefits. The increase in floor detection accuracy seems to always be present since
the clusters naturally tend to form on separate floors, thus forcing the decision
toward one floor or the other. The floor detection accuracy is important since knowledge of the correct
floor alone can be used in a number of useful applications.
The ability to segment the data can be useful in multiple situations, especially in
storing and transmitting large fingerprint maps. In a situation where every building
would be fingerprinted the amount of data for a certain area would be huge. Clustering
allows for an intuitive way of dividing the fingerprints into groups that can be
transmitted individually. Indoor positioning devices would not require large storage
capacity or data transfer, since clustering provides a logical way of determining which
segments of data should be transmitted and which segments have become obsolete
and can be deleted, without having to analyse individual fingerprints. Outdoor
positioning methods, such as GPS, do not face similar problems since no previous
knowledge about the current area is required for determining the position.
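The bookkeeping described above can be sketched with simple set operations. This is a hypothetical illustration, not part of the studied method: it assumes each cluster has an integer identifier, that the device keeps a cache of cluster segments, and that some earlier step has already decided which clusters are currently needed. The function name is invented for this example.

```python
def plan_segment_update(cached_clusters, needed_clusters):
    """Return (to_download, to_delete) as sorted lists of cluster ids.

    cached_clusters: ids of the cluster segments currently stored on the device.
    needed_clusters: ids of the cluster segments required for positioning now.
    """
    cached = set(cached_clusters)
    needed = set(needed_clusters)
    # Segments that are needed but missing must be transmitted;
    # segments that are stored but no longer needed can be deleted.
    return sorted(needed - cached), sorted(cached - needed)


to_download, to_delete = plan_segment_update({1, 2, 3}, {2, 3, 4})
```

In this example only segment 4 has to be transmitted and segment 1 can be dropped, without inspecting any individual fingerprint.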
Clustering also provides a means of balancing time efficiency and accuracy. It is
conceivable that even though for other RSS positioning methods clustering might
not provide an increase in accuracy, the increase in time efficiency would still exist.
Even though at this point no RSS positioning method is accurate enough to warrant
reducing its accuracy to gain speed, in the future this might be a possibility.
All the tests done in this thesis were to calculate the initial position, meaning that
no filtering or prediction was used based on previous movements. Adding filtering
to continuous positioning significantly increases accuracy [14] and would be the next
step in improving the discussed positioning method. Continuous positioning would
also allow for improving the method for selecting the Kfine clusters. Selection of the
clusters surrounding the current position could also be based on the cluster
centroids' positions in relation to the previous position and not only on the
similarity of the seen access points.
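A minimal sketch of the distance-based part of this idea is given below. It was not implemented or evaluated in this thesis. The function name, the representation of centroids as coordinate tuples and the use of plain Euclidean distance are all assumptions made for illustration, and a practical selection would presumably combine this distance with the access point similarity.

```python
import math


def select_clusters_by_distance(centroids, previous_position, k_fine):
    """Return the indices of the k_fine centroids closest to previous_position.

    centroids: list of coordinate tuples, one per cluster (hypothetical format).
    previous_position: coordinate tuple of the last position estimate.
    """
    order = sorted(
        range(len(centroids)),
        key=lambda i: math.dist(centroids[i], previous_position),
    )
    return order[:k_fine]


# Hypothetical centroids (x, y, z) and a previous position estimate.
centroids = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 12.0, 0.0), (50.0, 50.0, 3.0)]
selected = select_clusters_by_distance(centroids, (1.0, 1.0, 0.0), k_fine=2)
```

Here the two clusters whose centroids lie nearest to the previous position (indices 0 and 1) are selected for fine positioning.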
BIBLIOGRAPHY
[1] Y. Gu, A. Lo and I. Niemegeers. "A Survey of Indoor Positioning Systems
for Wireless Personal Networks". IEEE Communications Surveys & Tutorials,
March 2009, 11(1), pp. 13–32.
[2] H. Koyuncu and S. H. Yang. "A Survey of Indoor Positioning and Object Locating
Systems". IJCSNS International Journal of Computer Science and Network
Security, May 2010, 10(5), pp. 121–128.
[3] C.N. Huang, C.Y. Chiang, J.S. Chang, Y.C. Chou, Y.X. Hong, S.J. Hsu, W.C.
Chu and C.T. Chan. "Location-aware fall detection system for medical care
quality improvement". IEEE Third International Conference on Multimedia and
Ubiquitous Engineering, June 2009, Qingdao, pp. 477–480.
[4] G. Dedes and A.G. Dempster. "Indoor GPS positioning". IEEE Semiannual
Vehicular Technology Conference, 2005.
[5] L. Mengual, O. Marban and S. Eibe. "Clustering-based location in wireless
networks". Expert Systems with Applications, 2010, 37(9).
[6] H. Nurminen et al. "Statistical path loss parameter estimation and positioning
using RSS measurements". Ubiquitous Positioning, Indoor Navigation, and
Location Based Service (UPINLBS), October 2012, Helsinki, pp. 1–8.
[7] Z. Tian, X. Tang, M. Zhou and Z. Tan. "Fingerprint indoor positioning algorithm
based on affinity propagation clustering". EURASIP Journal on Wireless
Communications and Networking, 2013.
[8] N. Swangmuang. "A Location Fingerprint Framework Towards Efficient Wireless
Indoor Positioning Systems". Ph.D. thesis, University of Pittsburgh, 2008,
pp. 76–93.
[9] M.E. Celebi, H.A. Kingravi, P.A. Vela. "A comparative study of efficient
initialization methods for the k-means clustering algorithm". Expert Systems with
Applications, January 2013, 40(1), pp. 200–210.
[10] Tampere University of Technology. WLAN indoor measurements. [Online]
Available from: http://www.cs.tut.fi/tlt/pos/Measurements.htm [Accessed
Feb 2015].
[11] R.O. Duda, P.E. Hart and D.G. Stork. "Pattern Classification", 2nd edition,
Wiley, 2012.
[12] A. Hatami and K. Pahlavan. "A comparative performance evaluation of RSS-
based positioning algorithms used in WLAN networks". IEEE Wireless
Communications and Networking Conference, March 2005, pp. 2331–2337.
[13] F. Alsehly, T. Arslan and Z. Sevak. "Indoor positioning with floor determination
in multi story buildings". IEEE International Conference on Indoor Positioning
and Indoor Navigation (IPIN), September 2011, pp. 1–7.
[14] W. Chai, C. Chen, E. Edwan, J. Zhang, and O. Loffeld. "INS/Wi-Fi based
indoor navigation using adaptive Kalman filtering and vehicle constraints".
Proceedings of 9th Workshop on Positioning, Navigation and Communication
(WPNC), 2012.