
ANDREI CRAMARIUC

INDOOR POSITIONING USING WIRELESS LOCAL AREA

NETWORK FINGERPRINTING

Bachelor of Science Thesis

Examiner: University Lecturer

Heikki Huttunen

Supervisor: Associate Professor

Simona Lohan


ABSTRACT

ANDREI CRAMARIUC: Indoor Positioning Using Wireless Local Area Network Fingerprinting

Tampere University of Technology

Bachelor of Science Thesis, 22 pages

May 2015

Bachelor's Degree Programme in Electrical Engineering

Major: Signal Processing

Examiner: University Lecturer Heikki Huttunen

Supervisor: Associate Professor Simona Lohan

Keywords: indoor positioning, WLAN, RSS fingerprinting, clustering

Following the success of outdoor positioning systems, the focus has now shifted to developing equally precise positioning methods for indoor environments. Indoor positioning can be achieved by fingerprinting the received signal strength (RSS) of the already widely available wireless local area network (WLAN). Fingerprinting means that the target area is mapped beforehand by measuring the RSS in numerous positions. In this thesis we use an RSS positioning method based on a Naive Bayes classifier to study the effect of clustering the fingerprints. The results of the simulations were obtained by using K-means to cluster real data for which the ground truth was known. On average, the positioning error increased by 7 percent, while the accuracy with which the correct floor was detected increased by 10 percent. At the same time the localization process was almost two times faster, because half of the fingerprints were not considered in the process. Clustering the fingerprints offers a way to balance precision, time efficiency and the amount of required data. The studied method is generic and could also be used in combination with other positioning methods.


TIIVISTELMÄ (ABSTRACT IN FINNISH)

ANDREI CRAMARIUC: Indoor Positioning Using Wireless Local Area Network Fingerprinting (Sisätilapaikannus käyttäen langattoman lähiverkon sormenjäljentämistä)

Tampere University of Technology (Tampereen teknillinen yliopisto)

Thesis (Diplomityö), 22 pages

May 2015

Degree Programme in Electrical Engineering

Major: Signal Processing

Examiner: University Lecturer Heikki Huttunen

Supervisor: Associate Professor Simona Lohan

Keywords: indoor positioning, WLAN, RSS fingerprinting, clustering

Following the success of outdoor positioning systems, the search has begun for positioning methods of comparable quality for indoor spaces. Using the already existing wireless network (WLAN), indoor positioning can be done by fingerprinting the received WLAN signal strength (RSS). Fingerprinting means mapping the target area by measuring the WLAN signal strength at several different points. This work uses a Naive Bayes classifier and fingerprinting of the received signal strength for positioning. The aim is to study how clustering the fingerprints affects the result. The results of the simulations were obtained by using the K-means algorithm to cluster real data for which the true location had also been measured. On average the positioning error increased by 7 percent, and at the same time the accuracy with which the correct floor was found increased by 10 percent. The time required for positioning was halved, since clustering halved the number of fingerprints needed. Clustering the fingerprints offers a way to balance accuracy, efficiency and the amount of required data. The studied method is general-purpose and could also be applied to other indoor positioning methods.


PREFACE

This thesis is based on the work and measurements done at the Laboratory of Electronics and Communications Engineering, Department of Signal Processing, at Tampere University of Technology. I am grateful for the guidance, comments and feedback provided during the writing of this thesis by my examiner Heikki Huttunen. I want to thank my supervisor, Simona Lohan, for her assistance, patience and help in choosing and understanding such an interesting topic. I also want to thank my opponent Eetu Kuusisto and the other participants in the Bachelor's thesis seminar for the interesting discussions and constructive remarks.

May 17, 2015

Andrei Cramariuc


TABLE OF CONTENTS

1. Introduction
2. Theory
   2.1 Naive Bayes classifier
   2.2 K-means clustering
3. Implementation
   3.1 Indoor measurement data
   3.2 Fine positioning method
   3.3 Clustering
   3.4 Coarse positioning
   3.5 Time complexity
4. Results and Analysis
   4.1 Clusters used in positioning
   4.2 Number of clusters
5. Conclusions
Bibliography


LIST OF ABBREVIATIONS AND SYMBOLS

WLAN Wireless Local Area Network

ToF Time of Flight

AoA Angle of Arrival

RSS Received Signal Strength

AP Access Point

MAC Media Access Control; an identifier for electronic devices

NAP Number of access points

NGP Number of grid points, which is also the number of fingerprints

Pavg Number of points to average in the Naive Bayes classifier to obtain the position estimate

K Number of clusters

Kfine Number of clusters in which to use the fine positioning method


1. INTRODUCTION

People spend most of their time indoors in offices, shopping malls, restaurants, hospitals, metros, museums, etc. It is therefore not surprising that, following the success of outdoor positioning technologies in a wide range of applications, the focus has now shifted to developing suitable technologies and methods for indoor positioning and navigation. Advances in indoor positioning performance can facilitate a large number of application fields such as medical care, where monitoring a patient's position can be very important [3]. Other possible applications include indoor navigation for firefighters, museums or large public buildings [2], as well as network optimisation and load balancing [1]. Most modern indoor positioning systems use Wireless Local Area Networks (WLAN), Bluetooth, ultra-wideband, radio-frequency identification, infrared, vision-based or ultrasound technologies [1][2].

Whereas the popular satellite and cellular network-based positioning methods provide sufficient accuracy outdoors, they are inefficient in the shielded and complex indoor environment [4]. The challenges of indoor positioning include multipath signal propagation, non-line-of-sight conditions, and high attenuation and scattering of the signal [4]. A widespread indoor positioning infrastructure is still lacking because the currently available technologies either lack precision or are too expensive and require specific locally installed infrastructure [1][2]. The increasing density and availability of WLANs has sparked interest in developing precise WLAN-based solutions for indoor positioning.

The three parameters of an incoming signal that can be used to determine the receiver's position relative to the source are its time of flight (ToF), angle of arrival (AoA) and received signal strength (RSS) [5]. In this work we focus on RSS-based methods which, compared to those that rely on ToF or AoA, can be easily implemented using existing protocols and hardware available in all modern mobile phones. RSS positioning consists of two phases, an offline and an online phase. In the offline phase RSS fingerprints are collected from several points throughout the target area, where a fingerprint consists of a three-dimensional position and a vector of RSS values from the visible access points measured at that position. In the online phase positioning is done using the gathered fingerprint map and the RSS vector at the current location. Approaches to positioning using RSS fingerprinting can be divided into three main categories: deterministic, probabilistic and machine learning based [5].

This thesis extends a probabilistic approach that uses a Naive Bayes classifier to calculate, for each fingerprint, a probability that corresponds to how likely it is that the current position is at that location [6]. Afterwards, the group of points with the highest probabilities is chosen and their positions are averaged to obtain an estimate for the current location.

The aim of this thesis is to study the inclusion of an additional offline preprocessing phase in which the RSS fingerprints are clustered. A spatially relevant segmentation of the fingerprints would allow the Naive Bayes classifier to use fewer points for positioning, thus increasing efficiency while maintaining the same accuracy. The clusters could further be used to increase accuracy by discarding points that have a high probability but are not spatially close to the other high-probability points. This thesis focuses on the use of the standard K-means clustering algorithm with varying parameters. Previous works by various authors have reached positive results employing similar methods, based on different combinations of clustering and positioning algorithms [5][7][8].

Chapter 1 is the introductory part of the thesis. In Chapter 2 the basic theoretical concepts used in subsequent chapters are briefly explained. Chapter 3 describes the implementation of the discussed positioning method. In Chapter 4 the obtained results are presented and analysed. The final chapter contains a brief summary of the work as well as conclusions that can be drawn from the results.


2. THEORY

This chapter is a review of the basic theoretical concepts of Naive Bayes classification and K-means clustering. These are common signal processing methods that were adapted in this work for the purpose of indoor positioning.

2.1 Naive Bayes classifier

The Naive Bayes algorithm is a classification algorithm that classifies data based on a set of attributes X = (x1, ..., xn) for which the target classes (C1, C2, ..., Cn) are known. The simplifying assumption is that all the attributes are conditionally independent of one another given the class. Under this assumption the probability that X belongs to a class C can be written as

\[
P(C \mid X) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C) \tag{2.1}
\]

The left-hand side of the equation is the posterior probability, which is the probability of class C given the attributes X. It is decomposed using Bayes' theorem. In Equation (2.1) P(C) is the prior probability of C, which is the probability of observing C regardless of any other information. The last term P(xi|C) is the likelihood of the feature xi occurring given the class C. Due to the assumption of independence, these likelihoods can be estimated separately for each feature and then multiplied together to obtain the total probability for each class. [11, pp. 20–23]

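For example, let's assume five kids have the features described in Table 2.1.

Table 2.1

Clothing (x1)   Hair color (x2)   Gender (Class)
Pants           Black             Boy
Pants           Blonde            Boy
Skirt           Black             Girl
Pants           Blonde            Girl
Skirt           Red               Girl

Using Naive Bayes to classify a kid that is blonde and wears pants, the probabilities are

\[
\begin{aligned}
P(\text{Boy} \mid \text{Pants}, \text{Blonde}) &\propto P(\text{Boy})\, P(\text{Pants} \mid \text{Boy})\, P(\text{Blonde} \mid \text{Boy}) = \frac{2}{5} \times \frac{2}{2} \times \frac{1}{2} = \frac{1}{5} \\
P(\text{Girl} \mid \text{Pants}, \text{Blonde}) &\propto P(\text{Girl})\, P(\text{Pants} \mid \text{Girl})\, P(\text{Blonde} \mid \text{Girl}) = \frac{3}{5} \times \frac{1}{3} \times \frac{1}{3} = \frac{1}{15}
\end{aligned}
\tag{2.2}
\]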

Therefore, it is more probable that it is a boy rather than a girl. A more complex example is in Figure 2.1, where Naive Bayes is used to classify flowers based on continuous features.
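The arithmetic of the toy example can be checked with a few lines of code. The following is a minimal sketch, not part of the original work: the function name `naive_bayes_scores` and the data layout are illustrative assumptions, and the likelihoods are plain relative frequencies taken from Table 2.1.

```python
from collections import Counter
from fractions import Fraction

# Rows of Table 2.1: (clothing, hair color, class)
KIDS = [
    ("Pants", "Black", "Boy"),
    ("Pants", "Blonde", "Boy"),
    ("Skirt", "Black", "Girl"),
    ("Pants", "Blonde", "Girl"),
    ("Skirt", "Red", "Girl"),
]


def naive_bayes_scores(rows, query):
    """Return the unnormalised posterior P(C) * prod_i P(x_i | C) per class."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for cls, n_cls in class_counts.items():
        score = Fraction(n_cls, len(rows))  # prior P(C)
        for i, value in enumerate(query):
            # Likelihood P(x_i | C): relative frequency of the value within the class.
            matches = sum(1 for row in rows if row[-1] == cls and row[i] == value)
            score *= Fraction(matches, n_cls)
        scores[cls] = score
    return scores


scores = naive_bayes_scores(KIDS, ("Pants", "Blonde"))
print(scores)  # Boy: 1/5, Girl: 1/15
```

Exact fractions are used so that the output can be compared directly with the values 1/5 and 1/15 computed by hand.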

Even though the assumption of independence might not always be true, it greatly simplifies the system and in many cases leads to accurate results. The main advantages of the Naive Bayes classifier are its speed and its insensitivity to irrelevant features. [11, pp. 20–23]

(a) Training data (b) Calculated decision boundaries

Figure 2.1 The figure on the left shows the sepal width and length measurements for three categories of flowers: Setosa, Versicolor and Virginica. The resulting classification boundaries are visible in the figure on the right.


2.2 K-means clustering

K-means is a popular clustering algorithm that assigns each point of a dataset x = (x1, x2, ..., xN) to one of the clusters c = (c1, c2, ..., cK). The total number of clusters K is fixed and has to be determined beforehand. The clustering is done by minimizing the within-cluster sum of squares; in other words, the objective is to find

\[
\operatorname*{arg\,min}_{c} \sum_{i=1}^{K} \sum_{x \in c_i} \lVert x - \mu_i \rVert^2 \tag{2.3}
\]

where µi is the mean of the points in cluster ci. The obstacle is that finding a solution that satisfies Eq. (2.3) is an NP-hard problem. However, there exist heuristic methods that converge quickly to a local optimum. [11, pp. 526–528]

Algorithm 1 Lloyd's algorithm

1. Initialize the cluster centres

2. Assign each data point to the nearest cluster

3. Recalculate the centre of each cluster as the mean of all the data points belonging to that cluster

4. Repeat steps 2–3 until convergence

(a) Initialization of the clusters at random (b) After the first iteration (c) Final result after 5 iterations

Figure 2.2 An example of how the iterative K-means algorithm works, where the centroids are marked by black crosses. The existence of three clusters is assumed before running the algorithm.

The most commonly used method is referred to as Lloyd's algorithm, described in Algorithm 1 and briefly illustrated in Figure 2.2. Lloyd's algorithm uses iterative refinement to find the solution, but before the algorithm can be applied the number of clusters must be determined and the centroids must be initialized [11, pp. 526–528]. The initialization of the clusters can be done at random, but this can cause the algorithm to converge slowly or to a poor local minimum. Better methods exist that cause the algorithm to converge more quickly and to a better result. A few of the commonly used methods for cluster initialization are Forgy's, Jancey's, MacQueen's and k-means++ [9]. Another parameter that affects the result of the clustering is the choice of the distance measure between the clustered points [9].
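The four steps of Lloyd's algorithm can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the initialization simply picks K random data points as centres (a Forgy-style choice), the distance measure is squared Euclidean, and a cluster that becomes empty keeps its previous centre. The implementation used in the thesis may differ in all of these details.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` into k groups; returns (centres, labels)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # step 1: random data points as centres
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:  # step 4: assignments unchanged, so stop
            break
        labels = new_labels
        # Step 3: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centre
                centres[c] = mean_point(members)
    return centres, labels


# Two well-separated groups of made-up 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centres, labels = lloyd_kmeans(points, k=2)
```

Because the result depends on the random initialization, fixing `seed` makes a run repeatable. Averaging over several seeds corresponds to the repeated test runs described later in Chapter 4.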


3. IMPLEMENTATION

The positioning method implemented in this thesis can be divided into two main stages, namely an offline and an online stage, represented in Figure 3.1. In the offline stage the required data is gathered and preprocessed, and the actual positioning is done in the online stage.

Figure 3.1 A diagram of the positioning method described in this thesis.

The first step 1(a) in the offline stage is to map the target area by collecting fingerprints in multiple locations. A fingerprint consists of a three-dimensional position and a vector containing the media access control (MAC) address and received signal strength (RSS) of each access point visible at that position. Afterwards, in step 1(b), the gathered data points are clustered to increase performance. The reasoning for this is detailed in Section 3.3. In the online stage the positioning begins by measuring the RSS vector at the current location using any WLAN receiver. In the next step 2(b) an approximate location is calculated by comparing the MAC addresses of the currently visible access points with the access points seen in each cluster. Based on this coarse positioning only a few clusters are selected as potential locations. To determine the exact position, in step 2(c) a more precise positioning method, based on a Naive Bayes classifier, is used inside each of the clusters selected in the previous step. An illustration of how the method works is shown in Figure 3.2.


(a) The starting situation, in which the building has been fingerprinted and we want to estimate the current position.

(b) Due to the preprocessing the fingerprints are divided into 4 clusters.

(c) Using the coarse positioning method two clusters are selected based on the surrounding access points.

(d) Using the Naive Bayes classifier, the 3 fingerprints that are closest to the current position are selected.

(e) The positions of the fingerprints are averaged to obtain an estimate for the current position.

(f) One of the estimates is chosen as the final position, based on the quality given by the Naive Bayes classifier.

Figure 3.2 An illustrated example of how the clustering-based positioning method studied in this thesis works.

3.1 Indoor measurement data

The fingerprint data referred to in step 1(a) in Figure 3.1 was collected from several different types of buildings with varying indoor geometries. The measurements were done using a tablet, and therefore the positioning results correspond to what can be done using any modern portable device that has a WLAN receiver.

To store the data more easily, the access points' media access control (MAC) addresses are used as access point identifiers. For simplicity the MAC addresses were converted into indexes from 1 to NAP, where NAP is the number of access points in a building. For each building a set of fingerprints was provided as training data, where each fingerprint consists of a three-dimensional position and a list of the indexes and associated RSS values of the access points heard at that position. The RSS measurements have a precision of 1 dBm and the fingerprints were mapped into a grid with a 1 meter step. The data does not contain information about the location of the access points or the structure of the indoor environment.

(a) Building 1 (b) Building 4

Figure 3.3 The fingerprints collected in two different buildings; the one on the left is an office building while the one on the right is a mall.

Table 3.1 Summary of indoor measurement data

Building   Building type   Floors   Access points   Training points   Test points
1          office          4        309             1479              490
2          office          4        238             505               366
3          office          3        176             584               354
4          mall            6        468             1633              3503
5          mall            9        573             624               2611

For each building a separate set of user tracks was recorded for testing purposes, in which the RSS vectors and the positions were stored. A short summary of the data used and the type of building where it was recorded can be seen in Table 3.1.

3.2 Fine positioning method

The fine positioning method used in step 2(c) in Fig. 3.1 uses Naive Bayes classification based on Gaussian distributions [6]. As input the classifier only needs a vector of the received signal strengths at the current position and a set of collected fingerprints. This is a standalone method for estimating the current position, which can be used without clustering. Why clustering could improve the result and the execution time follows from how the classifier works, so the classifier is described first.

Algorithm 2 Positioning using RSS fingerprinting and Naive Bayes classification

Input: A matrix RSSF of size NGP × NAP containing the measured fingerprint map, where NGP is the number of grid points and NAP is the number of access points. An array RSSC of size NAP containing the received signal strengths at the current position. Pavg, the number of points to average.
Output: A three-dimensional coordinate Position containing the estimated position and Quality, the quality of the estimated position.

Likelihoodtotal ← 0    ▷ Cumulative likelihood of being in a point

for i ← 1 to NGP do
    for j ← 1 to NAP do
        Likelihood ← GaussianSimilarity(RSSF(i, j), RSSC(j))
        Likelihood ← log(Likelihood)
        Likelihoodtotal(i) ← Likelihoodtotal(i) + Likelihood
    end for
end for

Points ← the Pavg points with the highest total likelihood

Position ← average position of the Points
Quality ← average likelihood of the Points

The algorithm calculates a likelihood for each point in the grid formed by the coordinates of the fingerprint measurements. The likelihood of being in a specific grid point based on only one access point is the Gaussian similarity between the RSS at that point and the RSS at the current location. If the access point is not visible at the current location or at the grid point, the likelihood is 0. The Gaussian similarity between two arbitrary values x and y is defined as

\[
\mathrm{GaussianSimilarity}(x, y) =
\begin{cases}
\dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\dfrac{(x-y)^2}{2\sigma^2}\right), & \text{if } x \neq 0 \text{ and } y \neq 0 \\
0, & \text{otherwise}
\end{cases}
\tag{3.1}
\]

Equation (3.1) is relevant in determining the similarity between two RSS vectors since it statistically models the signal attenuation and how the RSS values of an access point are distributed around it [12].

For each grid point the logarithms of the individual likelihoods are summed to obtain the total likelihood of being in that point. Because the logarithm is not defined for zero, in practice all zero values must be replaced by a very small value ε. Logarithmic probabilities are used for increased speed and stability, since they transform multiplications into additions and avoid the use of very small floating-point numbers.

To determine the current position using the previously calculated probabilities a

number Pavg ≥ 1 of grid points with the highest likelihood are selected and their

positions averaged. The quality of the position estimate is the average of the likeli-

hoods of the Pavg points. The quality of the estimated position is important, since

it can be compared to that of other position estimates to determine which is better.
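The steps of Algorithm 2, together with Equation (3.1) and the ε replacement, can be sketched in Python as shown below. This is a hedged illustration, not the code used in the thesis: the function names, the value `EPSILON = 1e-12` and the example data are assumptions made here. Only σ = 5 is taken from the text (Chapter 4), and an RSS value of 0 encodes an access point that was not heard, as in Equation (3.1).

```python
import math

SIGMA = 5.0      # value of sigma reported in Chapter 4
EPSILON = 1e-12  # assumed stand-in for the "very small value" replacing zeros


def gaussian_similarity(x, y, sigma=SIGMA):
    """Equation (3.1); an RSS of 0 means the access point was not heard."""
    if x == 0 or y == 0:
        return 0.0
    norm = math.sqrt(2 * math.pi * sigma ** 2)
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2)) / norm


def estimate_position(rss_map, positions, rss_current, p_avg=3):
    """Return (position, quality) following the steps of Algorithm 2."""
    totals = []
    for fingerprint in rss_map:
        total = 0.0
        for rss_f, rss_c in zip(fingerprint, rss_current):
            # Clamp to EPSILON so that log() never receives zero.
            likelihood = max(gaussian_similarity(rss_f, rss_c), EPSILON)
            total += math.log(likelihood)
        totals.append(total)
    # Indices of the p_avg grid points with the highest total log-likelihood.
    best = sorted(range(len(totals)), key=totals.__getitem__, reverse=True)[:p_avg]
    position = tuple(sum(positions[i][d] for i in best) / len(best) for d in range(3))
    quality = sum(totals[i] for i in best) / len(best)
    return position, quality


# Made-up example: four grid points, three access points.
positions = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (50, 50, 3)]
rss_map = [[-40, -60, 0], [-45, -55, 0], [-50, -50, 0], [0, 0, -30]]
position, quality = estimate_position(rss_map, positions, [-44, -56, 0], p_avg=2)
print(position)  # (0.5, 0.0, 0.0)
```

In the example the two fingerprints whose RSS values are closest to the measurement are the ones at x = 1 and x = 0, so the estimate is their midpoint. The returned `quality` is an average log-likelihood, which is the quantity that can be compared between estimates.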

3.3 Clustering

Large indoor areas such as public buildings require a large number of fingerprints. Segmenting the fingerprint data using clustering can significantly increase speed and reduce the amount of data that needs to be transmitted. The segmentation is made possible by how the Naive Bayes classifier described in Section 3.2 works. To produce the best possible result the classifier would only require the Pavg fingerprints closest to the current location. Fingerprints that are far away do not contribute to the result and are mostly ignored due to having a low likelihood. Ideally clustering does not reduce the accuracy of the positioning, since no necessary information is removed. Clustering could also improve the result, since it removes points that have a high likelihood but are spatially unrelated to the other high-likelihood points.

3.4 Coarse positioning

One of the challenges of using clustering is determining in which cluster the current position is located. A coarse approach is to use rank-based similarity, which is done by comparing the indexes of the currently visible access points with those of the access points visible in each cluster. The rank of a cluster is therefore the number of access points it has in common with the current measurement. Since neighbouring clusters can have similar ranks and the rank-based method is not very precise, the Naive Bayes method is applied individually to the subset of clusters that have the highest rank. The number of clusters in this subset is denoted by Kfine. The final position is then chosen by comparing the quality calculated by the classifier for each position estimate in this subset.
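The rank-based selection can be illustrated with a short sketch. The function name `rank_clusters`, the representation of each cluster as a set of access point indexes, and the example values are assumptions made for illustration only. Ties between clusters with equal rank are resolved here by cluster order, which the text does not specify.

```python
def rank_clusters(visible_aps, cluster_aps, k_fine):
    """Return the indexes of the k_fine clusters sharing the most APs with the scan."""
    visible = set(visible_aps)
    # Rank of a cluster: number of access points in common with the current scan.
    ranks = [len(visible & set(aps)) for aps in cluster_aps]
    # Stable sort, so clusters with equal rank keep their original order.
    order = sorted(range(len(cluster_aps)), key=lambda c: ranks[c], reverse=True)
    return order[:k_fine]


# Made-up example: access point indexes seen in each of four clusters.
cluster_aps = [{1, 2, 3}, {3, 4, 5}, {6, 7}, {2, 3, 4, 8}]
selected = rank_clusters([2, 3, 4], cluster_aps, k_fine=2)
print(selected)  # [3, 0]
```

The fine positioning method would then be run once inside each selected cluster, and the estimate with the highest quality kept as the final position.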


3.5 Time complexity

An analysis of the time complexity of the positioning method using clustering can be useful in predicting the outcome as well as in interpreting the results. From the description of the Bayes classifier in Algorithm 2 it can be seen that the complexity is O(NGP × NAP + NGP × Pavg), where the first term corresponds to calculating the likelihoods and the second term to finding the Pavg points with the highest likelihood. Since Pavg is very small compared to both NGP and NAP, a good approximation is O(NGP × NAP).

When estimating the complexity of the clustering-based method, the time required to cluster the points is not considered since it is part of the preprocessing. To calculate the complexity the assumption must be made that the clustering divides the points into K clusters that all have approximately the same size. Because the positioning is done in a subset of Kfine out of the K clusters, the resulting complexity is O(NGP × NAP × Kfine/K). This is clearly more efficient than positioning using all the points, since Kfine should be chosen smaller than K, in which case Kfine/K < 1.


4. RESULTS AND ANALYSIS

The quality of the positioning was judged based on the Euclidean distance between the current and the calculated position as well as the probability that the calculated position is on the correct floor. These measures will be referred to as error and floor detection accuracy, respectively. The floor detection accuracy is a useful measure since correct detection of the floor alone can be useful in certain applications [13], and also because knowing the correct floor can be very important in indoor navigation.

A brief summary of the results obtained by only using the Naive Bayes classifier can be seen in Table 4.1. The effect of clustering the fingerprints can be observed by using these results as a baseline and comparing them with the results obtained with clustering. In order for the results to be comparable, the parameters of the classifier were fixed throughout all the tests. The parameters were chosen as σ = 5 and Pavg = 3, where σ is the shading factor and Pavg is the number of points that are averaged to obtain the final position. The variance caused by the non-deterministic nature of the K-means algorithm can be reduced by averaging over multiple test runs of the program with the same parameters. It is worth noting that the accuracy is not very high by modern standards, because no filtering or secondary positioning method was used.

The parameters of the clustering algorithm that should be analysed are the number of clusters K as well as the size Kfine of the subset of clusters in which the fine positioning is done. Because iterating through every possible combination of K and Kfine takes too long, the analysis is done by fixing one of the parameters to a few values and iterating through the other, which makes it possible to search for a pattern.

4.1 Clusters used in positioning

Ideally the value of Kfine should be chosen small enough to allow for a decrease in the execution time, while maintaining the best possible positioning accuracy. Since the optimal value of K is undetermined, multiple tests were made in which all the values of Kfine were iterated for a certain value of K. The effect of Kfine on the error, floor detection accuracy and execution time is plotted for two of the buildings in Figure 4.1. From the plots it can be observed that after a certain point increasing Kfine has no effect except an increase in execution time. This point also seems to be the same when considering either the error or the floor detection accuracy, since they seem to be inversely related. Therefore, analysing either one should yield the same result.

(a) Increase in error (Building 1). (b) Increase in error (Building 5).

(c) Increase in floor detection accuracy (Building 1). (d) Increase in floor detection accuracy (Building 5).

(e) Difference in execution time (Building 1). (f) Difference in execution time (Building 5).

Figure 4.1 The quality of the positioning depending on the value of Kfine for Buildings 1 and 5. It can be seen that independent of the number of clusters the plots converge to the same values. A few exceptions are visible in (b), where high numbers of clusters cause a significant increase in error. Also in (d) for K = 6, because the number of clusters is smaller than the number of floors, the increase in floor detection accuracy is smaller.


(a) 3% margin of error. For small margins of error the optimal value of Kfine seems to be highly dependent on the structure of the building and how it was fingerprinted.

(b) 10% margin of error. Higher margins of error cause the optimal values of Kfine for each building to converge towards more similar values.

Figure 4.2 Optimal value of Kfine allowing for a certain margin of error for all the buildings. The plots are created based on the results in Figure 4.1 as well as similar ones for the other buildings that were not displayed.

The points in Figure 4.2 represent the lowest values of Kfine for which the error is smaller than the minimum error achieved for that value of K plus an allowed margin of error. For example, in Figure 4.1(b) the smallest increase in error achievable for K = 32 is 8%. If the allowed margin of error is, for example, 2%, the critical point will be the first value of Kfine for which the increase in error is smaller than 10%.

In many cases the minimum error will be very close to the error when not using clusters, since Kfine = 100% means that all the points will be used in the fine positioning method. However, this assumption does not hold in Figure 4.1(b), where having too many clusters can significantly impact the precision. These differences in accuracy are caused by the fact that even though all points are considered, a group of points can only be averaged if they all belong to the same cluster.

The margins of error used to plot Figures 4.2(a) and 4.2(b) were chosen arbitrarily, only as examples. Depending on the application and the type of position filtering used, other values might be more appropriate. From Figure 4.2(a) it is clear that the geometry of the building and the way the fingerprints are distributed can cause significant differences in the number of clusters needed to avoid a large increase in error.

The optimal value of Kfine appears to follow an inverse exponential curve in relation

to the total number of clusters. For higher margins of error the datasets quickly

converge to approximately the same values, as can be seen when comparing the

distance between the plots in the two Figures 4.2(a) and 4.2(b). Lower margins of

error have more noise due to the quantization of the data causing larger jumps in

the values.

Based on the average of the plots, a value of Kfine can be chosen so that on average it respects the chosen margin of error. For the purpose of further calculations the margin of error chosen is 3%, as in Figure 4.2(a). The average of the plots yields the following relation between K and Kfine

\[
K_{\text{fine}} = 130 \times K^{-0.37} - 8.5. \tag{4.1}
\]

This leads to the values Kfine = {92%, 78%, 69%, 63%, 58%, ...} for K > 1. These values are approximations with only a statistical guarantee that on average the increase in error will be around 3%. The percentage values will also have to be rounded up, since due to the discrete nature of the number of clusters only certain percentages are possible.
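Equation (4.1) and the rounding step can be evaluated with a few lines of code. The sketch below is illustrative: the function names are hypothetical, and rounding up to a whole number of clusters with `math.ceil`, capped at K, is one plausible reading of "rounded up" rather than a detail confirmed by the text.

```python
import math


def k_fine_percent(k):
    """Equation (4.1): share of clusters (in percent) used for fine positioning."""
    return 130.0 * k ** -0.37 - 8.5


def k_fine_count(k):
    """Assumed rounding: smallest whole number of clusters covering that share."""
    return min(k, math.ceil(k_fine_percent(k) / 100.0 * k))


percentages = [round(k_fine_percent(k)) for k in range(2, 7)]
print(percentages)       # [92, 78, 69, 63, 58]
print(k_fine_count(10))  # 5
```

The first line of output reproduces the values listed for K = 2, ..., 6. For K = 10 the relation gives roughly 47%, which rounds up to 5 of the 10 clusters.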

4.2 Number of clusters

By using the method discussed in the previous section for choosing Kfine, choosing K becomes much easier. The relation between the number of clusters and the error, floor detection accuracy and execution time is presented in Figure 4.3, with Kfine chosen with a 3% margin of error as given by Equation (4.1). Because the method performs differently for each building, it is easier to judge the results based on the averages rather than for each building individually.


(a) Increase of the mean absolute error. The increase in error reaches a minimum at around K = 10, after which it slowly increases. Except for Building 3, the buildings have very similar increases in error.

(b) Increase in the probability of determining the correct floor. The floor detection accuracy seems to always increase, and after it reaches its maximum at around K = 10 it remains constant.

(c) Decrease in the execution time. For small numbers of clusters the overhead is significant, causing the improvement to be marginal. The most significant improvement is obtained on average at around 20 < K < 30.

Figure 4.3 The quality of the positioning depending on the number of clusters K, with Kfine chosen with a 3% margin of error according to Equation (4.1).


From Figure 4.3 it can be seen that the method is imprecise for very small numbers
of clusters. After the average error reaches its minimum at around K = 12 it begins
slowly increasing. The opposite is true for the floor detection accuracy, which reaches
its maximum at K = 12, after which it starts slowly decreasing. The time complexity
analysis in Section 3.5 explains why the time decreases exponentially. Since Kfine
is exponentially decreasing as a function of K, the ratio Kfine/K is also exponentially
decreasing. It is notable that on average there is always an improvement in execution
time when using clusters.

The unreliability of small numbers of clusters is probably due to the fact that
determining the correct cluster to use in the fine positioning method is unreliable. The
coarse positioning method is especially unreliable on the border between the clusters,
where they all contain fingerprints from the same access points. For large numbers
of clusters the increase in error could be due to several factors, including
the restriction that the positions of fingerprints can only be averaged if they are in
the same cluster and the possibility of the coarse positioning not selecting the best
cluster.

Since the execution time is exponentially decreasing, there is no reason to choose a
very high value of K, as it brings no further benefits. A reasonable choice is
K = 12, since it has the smallest error and the highest increase in floor detection
accuracy, while maintaining a significant improvement in execution time. The precise
results for each building when K = 12 and Kfine = 7 are presented in Table 4.1.

Table 4.1 A comparison of the numeric results obtained when using only the Naive
Bayes classifier versus when using the Naive Bayes classifier in conjunction with K-means
clustering.

            Error (m)                 Floor detection (%)       Execution time (ms)
Building    Naive Bayes  Clustering   Naive Bayes  Clustering   Naive Bayes  Clustering
1           4.9          5.0          89           95           13           6
2           4.5          4.7          93           98           4            2
3           9.0          10.6         85           89           6            3
4           11.3         11.6         63           81           17           11
5           6.6          6.9          63           73           8            5
Average     7.3          7.8          79           87           10           5

From Table 4.1 it can be seen that on average the error increased by approximately
7%. Half of the increase in error is caused by the clustering algorithm's poor
performance for Building 3, which has a large surface and a relatively small number of
fingerprints. If Building 3 were removed from consideration, the increase in error
would be around 3%, which is equal to the chosen margin of error. This further
emphasizes how much the results are influenced by how the fingerprinting was done
and by the geometry of the building. An increase in floor detection accuracy was
noticeable for all buildings, with the average increase being around 10%, or 8
percentage points. The increase was much higher for Buildings 4 and 5, probably due to the
fact that they have more floors than the other buildings. All the execution times
were approximately halved, with the average execution time dropping to 50% of the
original. Even though the errors seem large, it is important to remember that the
results are unfiltered, meaning that all the positions were calculated without prior
knowledge about the previous positions.
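
The relative increases in error quoted above can be rechecked from the per-building rows of Table 4.1. The short Python sketch below does this for the error columns only; the dictionary layout and the function name are illustrative choices, not part of the thesis software.

```python
# Positioning error in metres from Table 4.1: (Naive Bayes, with clustering).
errors = {
    1: (4.9, 5.0),
    2: (4.5, 4.7),
    3: (9.0, 10.6),
    4: (11.3, 11.6),
    5: (6.6, 6.9),
}

def relative_increase(rows):
    """Percentage increase of the mean clustered error over the mean baseline."""
    baseline = sum(naive for naive, _ in rows) / len(rows)
    clustered = sum(clustering for _, clustering in rows) / len(rows)
    return 100.0 * (clustered - baseline) / baseline

all_buildings = relative_increase(list(errors.values()))
without_building_3 = relative_increase(
    [row for building, row in errors.items() if building != 3]
)
print(f"all buildings: {all_buildings:.1f}%")
print(f"without Building 3: {without_building_3:.1f}%")
```

This gives about 6.9% for all five buildings and about 3.3% when Building 3 is left out, in line with the rounded figures of 7% and 3%.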

5. CONCLUSIONS

Segmenting the fingerprints using K-means clustering proved to have some benefits.
As a result, the Naive Bayes-based positioning algorithm needed to estimate the
position only in a part of the clusters to maintain accuracy. The combination of the
two algorithms caused an average increase in error of 7% while on average increasing
the accuracy with which the correct floor was detected by 10%. It is worth noting
that half of the increase in error was caused by one of the five buildings in which
the tests were performed. The clustering also resulted in the execution time being
reduced to half, due to only using half of the provided fingerprints.

Overall the obtained results are promising, since they show that the data can be
segmented without significantly increasing the error, while potentially adding some
benefits. The increase in floor detection accuracy seems to always be present, since
the clusters naturally tend to form on separate floors, thus forcing the decision to
one floor or the other. The floor detection accuracy is important, since knowledge of
the correct floor alone can be used in a number of useful applications.

The ability to segment the data can be useful in multiple situations, especially in
storing and transmitting large fingerprint maps. In a situation where every building
would be fingerprinted, the amount of data for a certain area would be huge.
Clustering allows for an intuitive way of dividing the fingerprints into groups that can be
transmitted individually. Indoor positioning devices would not require large storage
capacity or data transfer, since clustering provides a logical way of determining which
segments of data should be transmitted and which segments have become obsolete
and can be deleted, without having to analyse individual fingerprints. Outdoor
positioning methods, such as GPS, do not face similar problems, since no previous
knowledge about the current area is required for determining the position.

Clustering also provides a means of balancing time efficiency and accuracy. It is
conceivable that even though for other RSS positioning methods clustering might
not provide an increase in accuracy, the increase in time efficiency would still exist.
Even though at this point no RSS positioning method is accurate enough to warrant
reducing its accuracy to gain speed, in the future this might be a possibility.

All the tests done in this thesis calculated the initial position, meaning that
no filtering or prediction based on previous movements was used. Adding filtering
to continuous positioning significantly increases accuracy [14] and would be the next
step in improving the discussed positioning method. Continuous positioning would
also allow for improving the method for selecting the Kfine clusters. Selection of the
clusters surrounding the current position could also be based on the position of the
cluster centroids in relation to the previous position, and not only on the
similarity of the seen access points.
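
One way the centroid-based selection suggested above might look is sketched below in Python. This is only an illustration of the idea, not a method evaluated in this thesis: the function name, the coordinate format and the use of plain Euclidean distance are all assumptions.

```python
import math

def nearest_clusters(centroids, previous_position, k_fine):
    """Return the indices of the k_fine centroids closest to previous_position.

    Assumption: centroids and the previous position are coordinate tuples
    in the same frame, compared by Euclidean distance.
    """
    order = sorted(
        range(len(centroids)),
        key=lambda i: math.dist(centroids[i], previous_position),
    )
    return order[:k_fine]

# Hypothetical centroids (x, y, z) in metres on two floors.
centroids = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 3.5), (20.0, 20.0, 3.5)]
selected = nearest_clusters(centroids, previous_position=(9.0, 1.0, 0.0), k_fine=2)
print(selected)
```

In practice such a distance-based ranking would probably be combined with the access point similarity rather than replace it, since the previous position estimate can itself be wrong.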

BIBLIOGRAPHY

[1] Y. Gu, A. Lo and I. Niemegeers. "A Survey of Indoor Positioning Systems for Wireless Personal Networks". IEEE Communications Surveys & Tutorials, March 2009, 11(1), pp. 13–32.

[2] H. Koyuncu and S. H. Yang. "A Survey of Indoor Positioning and Object Locating Systems". IJCSNS International Journal of Computer Science and Network Security, May 2010, 10(5), pp. 121–128.

[3] C.N. Huang, C.Y. Chiang, J.S. Chang, Y.C. Chou, Y.X. Hong, S.J. Hsu, W.C. Chu and C.T. Chan. "Location-aware fall detection system for medical care quality improvement". IEEE Third International Conference on Multimedia and Ubiquitous Engineering, June 2009, Qingdao, pp. 477–480.

[4] G. Dedes and A.G. Dempster. "Indoor GPS positioning". IEEE Semiannual Vehicular Technology Conference, 2005.

[5] L. Mengual, O. Marban and S. Eibe. "Clustering-based location in wireless networks". Expert Systems with Applications, 2010, 37(9).

[6] H. Nurminen et al. "Statistical path loss parameter estimation and positioning using RSS measurements". Ubiquitous Positioning, Indoor Navigation, and Location Based Service (UPINLBS), October 2012, Helsinki, pp. 1–8.

[7] Z. Tian, X. Tang, M. Zhou and Z. Tan. "Fingerprint indoor positioning algorithm based on affinity propagation clustering". EURASIP Journal on Wireless Communications and Networking, 2013.

[8] N. Swangmuang. "A Location Fingerprint Framework Towards Efficient Wireless Indoor Positioning Systems". Ph.D. thesis, University of Pittsburgh, 2008, pp. 76–93.

[9] M.E. Celebi, H.A. Kingravi and P.A. Vela. "A comparative study of efficient initialization methods for the k-means clustering algorithm". Expert Systems with Applications, January 2013, 40(1), pp. 200–210.

[10] Tampere University of Technology. WLAN indoor measurements. [Online] Available from: http://www.cs.tut.fi/tlt/pos/Measurements.htm [Accessed Feb 2015].

[11] R.O. Duda, P.E. Hart and D.G. Stork. "Pattern Classification", 2nd edition, Wiley, 2012.

[12] A. Hatami and K. Pahlavan. "A comparative performance evaluation of RSS-based positioning algorithms used in WLAN networks". IEEE Wireless Communications and Networking Conference, March 2005, pp. 2331–2337.

[13] F. Alsehly, T. Arslan and Z. Sevak. "Indoor positioning with floor determination in multi story buildings". IEEE International Conference on Indoor Positioning and Indoor Navigation (IPIN), September 2011, pp. 1–7.

[14] W. Chai, C. Chen, E. Edwan, J. Zhang and O. Loffeld. "INS/Wi-Fi based indoor navigation using adaptive Kalman filtering and vehicle constraints". Proceedings of 9th Workshop on Positioning, Navigation and Communication (WPNC), 2012.
