
Page 1: Week 12 Part A Machine Learning

ENGG1811 © UNSW, CRICOS Provider No: 00098G

Week 12 Part A Machine Learning

ENGG1811 Computing for Engineers

Page 2: Week 12 Part A Machine Learning

We are collecting a lot of data

•  Example: Tyre pressure is important for vehicle fuel efficiency
   –  A company puts sensors inside the tyres of its fleet and transmits the pressure data via the Internet

Page 3: Week 12 Part A Machine Learning

Data versus information

•  Data: Raw records of facts / measurements
•  Information: Patterns in the underlying data
   –  We also call the pattern a model
•  Example:
   –  In Lab 11, you worked on data on the extent of sea ice
      •  Data: raw measurements
      •  Information: The extent of sea ice in the last 10 years was the lowest on record, …
   –  How do universities select students?
      •  Data: The past performance indicators of students
      •  Information to discover: Are there patterns or indicators that can accurately predict who will do well at uni?

Page 4: Week 12 Part A Machine Learning

Machine learning

•  Given the huge amount of data generated, can data analysis be automated?
•  Machine learning
   –  Algorithms to automatically extract patterns from the data
•  Let us see what machine learning can do

Page 5: Week 12 Part A Machine Learning

Damage detection

•  Apply pressure at one end of the pipe and measure the pressure at the other end
•  Use the measured pressure at the other end to detect whether the pipe is damaged

Page 6: Week 12 Part A Machine Learning

Detection problem

•  Detection problem: A yes or no answer
   –  Is it damaged? Is it not damaged?
   –  Is it present? Is it absent?

Page 7: Week 12 Part A Machine Learning

Damage classification

Page 8: Week 12 Part A Machine Learning

Classification problem

•  There are a number of classes, and the problem is to determine which class the data comes from
   –  Example: The wing is damaged and each class is a location at which the wing is damaged.
   –  The classification problem is to determine which class it is. This is the same as determining where the damage is located.
   –  A detection problem is a special case of a classification problem with two classes. For example, the damage detection problem is equivalent to a classification problem with two classes: damaged or not damaged.

Page 9: Week 12 Part A Machine Learning

Some other classification problems

•  Google driverless cars
   –  Classify road signs. Classes: Give way, Stop, speed limit, …
   –  Classify junction type. Classes: Traffic lights, roundabout, …
•  http://googleblog.blogspot.com.au/2014/05/just-press-go-designing-self-driving.html

Page 10: Week 12 Part A Machine Learning

Machine learning for prediction

http://www.gereports.com/post/123572457345/deep-machine-learning-ge-and-bp-will-connect/

Page 11: Week 12 Part A Machine Learning

Our agenda on machine learning

•  We will not examine the technical details of machine learning algorithms
•  We take a black-box approach
•  We will give you some intuition of how machines can learn
•  We will start off with a small example so that we can visualise what’s happening

   Data  →  Machine Learning Algorithm  →  Pattern or Model

Page 12: Week 12 Part A Machine Learning

Fault detection – Problem setting

•  You want to detect whether a device is faulty or not
•  You have conducted a lot of measurements on both working and faulty devices
•  For each device, you have:
   –  Measurements of two physical properties of the device, which we denote by p and q
   –  A label which says whether the measurements come from a working or a faulty device
•  It is not important for this discussion to know what type of physical measurements p and q are

Page 13: Week 12 Part A Machine Learning

Fault detection – the data

   p     q     Normal/Faulty
   5.1   3.5   Normal
   4.9   3.0   Normal
   …     …     …
   5.0   3.3   Normal        ← 50 sets of measurements (Normal)
   7.0   3.2   Faulty
   …     …     …
   6.2   3.4   Faulty
   5.9   3.0   Faulty        ← 100 sets of measurements (Faulty)
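
To make the data handling concrete, here is a minimal sketch (not part of the lecture materials) that loads such a table into MATLAB; the file name faultData.csv and its column names are assumptions for illustration only.

    % Sketch: load the fault-detection data from a hypothetical CSV file
    % with columns p, q and label.
    data = readtable('faultData.csv');
    data.label = categorical(data.label);    % 'Normal' or 'Faulty'

    % Quick look at the measurements, one colour per class.
    gscatter(data.p, data.q, data.label);
    xlabel('p'); ylabel('q');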

Page 14: Week 12 Part A Machine Learning

Fault detection = Finding a Boolean condition

(Same table of p, q measurements and Normal/Faulty labels as on the previous slide.)

   cond?  –  True: Normal,  False: Faulty

This condition, cond, depends on p and q

Page 15: Week 12 Part A Machine Learning

Plotting the data

Question: If you do the detection manually, how will you separate the normal points from the faulty ones?

What pattern do you observe?

Page 16: Week 12 Part A Machine Learning

Manual detection

Separating line:   q - 1.4p + 4.3 = 0
One side:          q - 1.4p + 4.3 > 0   (the normal points in the sample data fall here)
Other side:        q - 1.4p + 4.3 < 0   (the faulty points in the sample data fall here)

Can this line be found automatically from the data?
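
The manual rule on this slide can be written directly as a Boolean condition. The sketch below is illustrative only (the function name is made up); the side of the line assigned to each class follows the sample rows shown earlier.

    % Sketch: the manual rule as a Boolean condition (file manualDetect.m).
    % For the sample rows shown earlier, normal points satisfy
    % q - 1.4p + 4.3 > 0 and faulty points do not.
    function label = manualDetect(p, q)
        if q - 1.4*p + 4.3 > 0
            label = 'Normal';
        else
            label = 'Faulty';
        end
    end

    % Example calls (from the command window):
    %   manualDetect(5.1, 3.5)   % returns 'Normal'
    %   manualDetect(7.0, 3.2)   % returns 'Faulty'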

Page 17: Week 12 Part A Machine Learning

Machine learning method: Support vector machines

•  The key idea of a support vector machine is to find a hyperplane that separates the data
•  The hyperplane is chosen to minimise the chance of getting the detection wrong

Page 18: Week 12 Part A Machine Learning

Model from support vector machines

•  For each pair (p, q), the model from the support vector machine tells us whether the device is normal or faulty
   –  If (p, q) is in the red region, the device is classified as faulty
   –  If (p, q) is in the green region, the device is classified as normal
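
As a rough illustration of how such a model could be trained in MATLAB, here is a sketch assuming the Statistics and Machine Learning Toolbox and the hypothetical data table from the earlier sketch; it is not the course’s classifyFaults2Classes.m.

    % Sketch: train a linear SVM on the two measurements p and q.
    X = [data.p, data.q];
    y = data.label;                 % 'Normal' / 'Faulty'

    svmModel = fitcsvm(X, y);       % linear SVM by default

    % The model can now answer the detection question for a new (p, q) pair.
    predict(svmModel, [5.9, 3.0])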

Page 19: Week 12 Part A Machine Learning

Machine learning method: Classification tree

•  This method comes up with a set of rules automatically, for example:

   if p > 5.45
       if q > 3.45
           % normal
       else
           % fault
       end
   else
       if q > 2.8
           % normal
       else
           % fault
       end
   end

(The same rules can be drawn as a decision tree: the first split is on p at 5.45, then on q at 3.45 or 2.8, and each leaf is labelled normal or fault.)
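
A tree of this kind could be learned in MATLAB roughly as follows; this is a sketch under the same assumptions as before (hypothetical X and y from the data table), and the thresholds an actual run picks depend on the training data.

    % Sketch: learn a classification tree from the same p and q measurements.
    treeModel = fitctree(X, y, 'PredictorNames', {'p', 'q'});

    % Display the learned rules, similar to the if/else block on the slide.
    view(treeModel, 'Mode', 'text');
    view(treeModel, 'Mode', 'graph');   % opens a window with the tree diagram

    % Classify a new measurement with the tree.
    predict(treeModel, [5.9, 3.0])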

Page 20: Week 12 Part A Machine Learning

Compare the two methods

•  Matlab file classifyFaults2Classes.m
•  Trains an SVM and a classification tree
•  Displays the classification tree
•  Displays the classification regions for the two methods
   –  Key: green means normal, red means faulty
   –  Are the classification regions the same for the two methods? (A sketch of how such regions can be plotted is given below.)
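
One common way to display classification regions, which may or may not be how the course script does it, is to classify every point on a dense grid and colour it by the predicted class. A sketch, reusing the hypothetical svmModel and treeModel from the earlier sketches:

    % Sketch: colour a grid of (p, q) points by the class each model predicts.
    [pGrid, qGrid] = meshgrid(4:0.02:8, 2:0.02:4.5);
    gridPoints = [pGrid(:), qGrid(:)];

    svmLabels  = predict(svmModel,  gridPoints);
    treeLabels = predict(treeModel, gridPoints);

    subplot(1, 2, 1);
    gscatter(gridPoints(:,1), gridPoints(:,2), svmLabels);
    title('SVM regions'); xlabel('p'); ylabel('q');

    subplot(1, 2, 2);
    gscatter(gridPoints(:,1), gridPoints(:,2), treeLabels);
    title('Classification tree regions'); xlabel('p'); ylabel('q');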

Page 21: Week 12 Part A Machine Learning

Oops! The two algorithms say different things

•  Can we tell which one is better?
•  How do we tell?
•  We will answer these questions for the classification problem

Page 22: Week 12 Part A Machine Learning

Fault classification

•  The same set of data as before
•  The device can have two types of faults, which we call fault1 and fault2
•  Every pair (p, q) measured is given a label, which can be
   –  Normal
   –  Fault1
   –  Fault2

Page 23: Week 12 Part A Machine Learning

Fault classification – the data

   p     q     Normal/Fault1/Fault2
   5.1   3.5   Normal
   4.9   3.0   Normal
   …     …     …
   5.0   3.3   Normal
   7.0   3.2   Fault1
   5.7   2.8   Fault1
   6.3   3.3   Fault2
   6.2   3.4   Fault2
   5.9   3.0   Fault2

50 sets of measurements for each class (Normal, Fault1, Fault2)

Page 24: Week 12 Part A Machine Learning

What is learning?

•  We learn by memorisation and generalisation
•  You have just used memorisation and generalisation
   –  We showed you four images first
   –  This image is different from the first four – you are generalising!
•  http://www.cs.ubc.ca/~murphyk/Teaching/CS340-Fall07/L1_intro.pdf

Page 25: Week 12 Part A Machine Learning

Examples of generalisation

•  We are extremely good at generalisation
•  When you were a child
   –  You were shown a few pictures of giraffes
   –  You can now recognise any giraffe
   –  You did not see, and have not seen, all the giraffes in this world
•  In ENGG1811
   –  We showed you examples of how to write code
   –  You can now write code for problems that you have never seen before

Page 26: Week 12 Part A Machine Learning

Two steps of supervised learning

•  When you were a child
   –  You were shown a few pictures of giraffes/tigers/lions/…
   –  You can now recognise any giraffe/tiger/lion/…
•  Two steps of supervised learning
   –  Training by examples
   –  Generalisation
•  Generalisation is the ability to make reasonable guesses when you are given situations that you have not encountered during training

Page 27: Week 12 Part A Machine Learning

What are the attributes of good learning algorithms?

•  Need few training samples
•  Can generalise accurately

Page 28: Week 12 Part A Machine Learning

Measuring the ability to generalise

•  We divide the data into two disjoint sections, which we call the training data and the test data
•  Example: For our data set
   –  We have 50 sets of measurements each for normal, fault1 and fault2
   –  We use 40 sets in each class for training and the rest for testing

Page 29: Week 12 Part A Machine Learning

Example: Training and testing data

(Same table of p, q measurements as before, split within each class:)

•  Normal: 40 sets of measurements for training, 10 sets for testing
•  Fault1: 40 sets for training, 10 sets for testing
•  Fault2: 40 sets for training, 10 sets for testing
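
A per-class 40/10 split could be set up roughly as follows; this sketch again assumes the hypothetical data table (now with the labels Normal, Fault1 and Fault2, 50 rows each), and the course materials may choose the split differently.

    % Sketch: hold out the last 10 measurements of each class for testing.
    classes = categories(data.label);          % {'Normal','Fault1','Fault2'}
    isTrain = false(height(data), 1);

    for k = 1:numel(classes)
        idx = find(data.label == classes{k});  % 50 rows for this class
        isTrain(idx(1:40)) = true;             % first 40 rows -> training
    end

    trainData = data(isTrain, :);              % 120 rows
    testData  = data(~isTrain, :);             % 30 rows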

Page 30: Week 12 Part A Machine Learning

Using training data

•  We do not feed all the measurements to the machine learning algorithm
   –  We feed only the training data
   –  The machine learning algorithm does not know what the test data are
•  The machine learning algorithm calculates a model based on the training data

Page 31: Week 12 Part A Machine Learning

Using test data

•  The test data are an exam for the model calculated by machine learning
   –  We give the (p, q) values of the test data to the model and see what the model gives

Example: one test sample from fault2

   p     q     Normal/Fault
   5.9   3.0   Fault2

We know what the correct answer is. We give p = 5.9, q = 3.0 to the model:
if the model answers “fault2”, it is correct; otherwise it is wrong.
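
In code, this “exam” amounts to training on the training data only and then comparing the model’s answers with the known test labels. A sketch using the hypothetical trainData and testData from the split above:

    % Sketch: train on the training data only, then examine the model on test data.
    mdl = fitctree([trainData.p, trainData.q], trainData.label);

    % One test sample from fault2, as on the slide:
    modelAnswer = predict(mdl, [5.9, 3.0]);
    isCorrect   = (modelAnswer == 'Fault2');

    % The same comparison over the whole test set gives the test accuracy.
    predictions = predict(mdl, [testData.p, testData.q]);
    accuracy    = mean(predictions == testData.label);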

Page 32: Week 12 Part A Machine Learning

Confusion matrix

•  We present the results in the form of a confusion matrix:

                              Correct answers from test data →
                              Normal   Fault1   Fault2
   Answers from     Normal
   model ↓          Fault1
                    Fault2

•  A diagonal entry, e.g. row Fault2 / column Fault2, counts right answers: the correct answer is Fault2 and the model also says Fault2
•  An off-diagonal entry, e.g. row Normal / column Fault2, counts wrong answers: the correct answer is Fault2 but the model says Normal
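
MATLAB can compute such a matrix directly with confusionmat; note that confusionmat puts the known (correct) classes on the rows and the model’s answers on the columns, which may be transposed relative to the slide’s layout. A sketch reusing mdl and testData from the previous sketch:

    % Sketch: confusion matrix over the whole test set.
    knownLabels = testData.label;                          % correct answers
    modelLabels = predict(mdl, [testData.p, testData.q]);  % model's answers

    [C, classOrder] = confusionmat(knownLabels, modelLabels);
    disp(classOrder');   % e.g. Normal, Fault1, Fault2
    disp(C);             % rows: correct class, columns: model's answer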

Page 33: Week 12 Part A Machine Learning

Quiz

•  A better model gives more correct (or fewer wrong) answers for the test data
•  Which model is better?

   Model A:
             Normal   Fault1   Fault2
   Normal       9        1        0
   Fault1       0       10        0
   Fault2       0        3        7

   Model B:
             Normal   Fault1   Fault2
   Normal       9        1        0
   Fault1       0        6        4
   Fault2       0        6        4
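
Deciding between the two models comes down to summing the diagonal of each matrix, i.e. the number of test samples for which the model’s answer matches the correct one. A small worked sketch:

    % Sketch: count correct answers (diagonal entries) for each model.
    A = [9 1 0; 0 10 0; 0 3 7];
    B = [9 1 0; 0 6 4; 0 6 4];

    correctA = trace(A);            % 9 + 10 + 7 = 26 correct out of 30
    correctB = trace(B);            % 9 + 6 + 4  = 19 correct out of 30
    fprintf('Model A: %d/30 correct, Model B: %d/30 correct\n', correctA, correctB);

By this count Model A (26 correct) does better than Model B (19 correct), following the rule stated on the slide.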

Page 34: Week 12 Part A Machine Learning

Why is it difficult to tell fault1 from fault2?

•  The regions of fault1 and fault2 overlap; the boundary between them is not clear

Page 35: Week 12 Part A Machine Learning

What can we do to improve the classification results?

•  Possibility 1: Measure some other quantities
   –  Say, if you measure a and b instead of p and q, you get the plot shown on the slide. Use characteristics that allow us to clearly differentiate the classes
•  Possibility 2: Measure many other quantities and hope that some of them will help

Page 36: Week 12 Part A Machine Learning

The need to rotate training and test data

(Same 40 training / 10 testing split per class as on the earlier slide.)

•  Why should we choose these particular ten measurements in each class as test data?
•  In fact, there is no particular reason to do that
•  We should rotate the training and test data

Page 37: Week 12 Part A Machine Learning

N-fold validation

•  For example, 5-fold validation
•  Divide the data into 5 sections
   –  Section 1 (Rows 1–10), Section 2 (Rows 11–20), …, Section 5 (Rows 41–50)
•  Perform 5 rounds of training and testing:

   Round   Training sections   Test section
     1       2, 3, 4, 5          1
     2       1, 3, 4, 5          2
     3       1, 2, 4, 5          3
     4       1, 2, 3, 5          4
     5       1, 2, 3, 4          5
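
A 5-fold rotation like the one in the table can be automated. The sketch below uses cvpartition, which assigns the sections at random (stratified by class) rather than strictly by row number as in the slide’s example; it again assumes the hypothetical data table.

    % Sketch: 5-fold cross-validation of a classification tree.
    X = [data.p, data.q];
    y = data.label;

    cv = cvpartition(y, 'KFold', 5);        % 5 sections, stratified by class
    accuracy = zeros(cv.NumTestSets, 1);

    for fold = 1:cv.NumTestSets
        mdl  = fitctree(X(training(cv, fold), :), y(training(cv, fold)));
        yHat = predict(mdl, X(test(cv, fold), :));
        accuracy(fold) = mean(yHat == y(test(cv, fold)));
    end

    fprintf('Mean accuracy over 5 rounds: %.2f\n', mean(accuracy));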

Page 38: Week 12 Part A Machine Learning

Features

•  For good classification results, we need to find characteristics, or features, that help us to distinguish between the different classes
•  A key lesson from machine learning: you need informative features to get good classification results
•  How can you get informative features?
   –  Domain knowledge of the problem
   –  Trial and error

Page 39: Week 12 Part A Machine Learning

Feature selection

•  Feature selection
   –  Give the machine learning algorithm a lot of features and let the algorithm rank the features for you
   –  I used this method in one of my research projects (the paper excerpted below); a small feature-ranking sketch follows the excerpt

(Embedded excerpt: the first page of “A Collaborative Approach to Heading Estimation for Smartphone-based PDR Indoor Localisation” by M. J. Abadi, L. Luceri, M. Hassan, C. T. Chou and M. Nicoli. The paper uses machine learning on time-domain statistical features of magnetometer measurements – mean, slope of the line of best fit, standard deviation, rate of change, minimum, maximum, range, mean absolute deviation, RMS, skewness and kurtosis – and reports the features selected by the CFS method for each dataset.)
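
One simple way to let an algorithm rank features, loosely in the spirit of this slide but not the CFS method used in the paper, is to train a classification tree on many candidate features and ask it how much each feature contributed. A sketch with made-up feature and table names:

    % Sketch: rank candidate features by their importance in a classification tree.
    % featureTable is a hypothetical table whose columns are candidate features
    % (e.g. mean, std, range, RMS of some raw signal) plus a label column.
    names = {'meanVal', 'stdVal', 'rangeVal', 'rmsVal'};
    X = featureTable{:, names};
    y = featureTable.label;

    mdl = fitctree(X, y, 'PredictorNames', names);
    importance = predictorImportance(mdl);   % one score per feature

    [~, order] = sort(importance, 'descend');
    disp(names(order));                      % most informative features first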

Page 40: Week 12 Part A Machine Learning

There is a lot more to machine learning …

•  Machine learning is everywhere. You have probably used machine learning without knowing it, e.g. automatic face detection and recognition
   –  http://support.apple.com/kb/ht4416
•  Unfortunately we won’t have time to talk about it
•  I believe machine learning is a useful tool for all engineering disciplines – learn more if possible

Page 41: Week 12 Part A Machine Learning

Page 42: Week 12 Part A Machine Learning

Computer defeated Grand Master

On May 11th, 1997, the best chess player in the world, Garry Kasparov, lost a six-game chess match to a computer named “Deep Blue”.

The press called this “humanity’s endgame” and a “bloody nose for human[s]”.

What was so significant about this event?

Being able to program a computer to defeat a Grand Master level chess player had been a long-standing goal of the science of artificial intelligence - and it was finally achieved in 1997.

Page 43: Week 12 Part A Machine Learning

Computers understand natural languages

http://online.wsj.com/articles/SB10001424052748704171004576148974172060658

Page 44: Week 12 Part A Machine Learning

Computers can drive cars

•  Google driverless cars
•  http://googleblog.blogspot.com.au/2014/05/just-press-go-designing-self-driving.html

Page 45: Week 12 Part A Machine Learning

Are computers that smart?

http://www.scientificamerican.com/article/computers-vs-brains/

Page 46: Week 12 Part A Machine Learning

Summary

•  Computers have changed the way engineers work
•  There are plenty of unsolved engineering challenges, and computers can help with many of them
•  Two key skills in this course
   –  Programming
   –  Algorithms
•  Plus your domain knowledge + ingenuity
   –  You will have plenty to offer the world