Rugby players, Ballet dancers, and the Nearest Neighbour Classifier COMP24111 lecture 2

Upload: tessa

Post on 05-Jan-2016



TRANSCRIPT

Page 1: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

Rugby players, Ballet dancers,

and the

Nearest Neighbour Classifier

COMP24111 lecture 2

Page 2: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

A Problem to Solve with Machine Learning

Distinguish rugby players from ballet dancers.

You are provided with a few examples:

Fallowfield rugby club (16).

Rusholme ballet troupe (10).

Task: Generate a program which will correctly classify ANY player/dancer in the world.

Hint: We shouldn’t “fine-tune” our system too much so it only works on the local clubs.

Page 3: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

Taking measurements….

We have to process the people with a computer, so the measurements need to be in a computer-readable form.

What are the distinguishing characteristics?

1. Height

2. Weight

3. Shoe size

4. Sex

Page 4: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

Terminology

“Features”

“Examples”

Class, or “label”

Page 5: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

The Supervised Learning Pipeline

[Diagram: training data and labels → learning algorithm → model; testing data (no labels) → model → predicted labels.]

Page 6: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

Taking measurements….

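[Figure: table of measurements and scatter plot with axes height and weight.]

Person: 1, 2, 3, 4, 5, …, 16, 17, 18, 19, 20, …

Weight: 63kg, 55kg, 75kg, 50kg, 57kg, …, 85kg, 93kg, 75kg, 99kg, 100kg, …

Height: 190cm, 185cm, 202cm, 180cm, 174cm; 150cm, 145cm, 130cm, 163cm, 171cm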

Page 7: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

The Nearest Neighbour Rule

[Figure: “TRAINING” DATA table (Person, Weight, Height) and scatter plot of height against weight.]

“TESTING” DATA

Who’s this guy? Player or dancer?

height = 180cm, weight = 78kg

Page 8: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

The Nearest Neighbour Rule

[Figure: “TRAINING” DATA table (Person, Weight, Height) and scatter plot of height against weight.]

height = 180cm, weight = 78kg

1. Find nearest neighbour

2. Assign the same class
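The two steps of the rule can be sketched in Python. This is a minimal sketch rather than the lecture's own code: the training pairs are hypothetical (height in cm, weight in kg) and the name `nearest_neighbour` is an assumption made for illustration.

```python
import math

# Hypothetical training data: ((height_cm, weight_kg), label).
# These pairs are illustrative only; they are not the lecture's dataset.
training = [
    ((190, 100), "player"),
    ((185, 93), "player"),
    ((180, 85), "player"),
    ((174, 57), "dancer"),
    ((163, 50), "dancer"),
    ((171, 55), "dancer"),
]

def nearest_neighbour(x, training):
    """Return the label of the training point closest to x."""
    # Step 1: find the nearest neighbour (smallest Euclidean distance).
    _, label = min(training, key=lambda pair: math.dist(x, pair[0]))
    # Step 2: assign the same class.
    return label

# The test point from the slide: height = 180cm, weight = 78kg.
print(nearest_neighbour((180, 78), training))
```

`math.dist` needs Python 3.8 or later; on older versions the distance would have to be written out by hand.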

Page 9: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

Supervised Learning Pipeline for Nearest Neighbour

[Diagram: training data → learning algorithm (do nothing) → model (memorize the training data); testing data (no labels) → model → predicted labels.]
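One way to see the pipeline stages in code is to give the classifier a training step and a prediction step. The sketch below is an assumption about how that might look: the class name `NearestNeighbour`, the method names `fit` and `predict`, and the example points are all hypothetical.

```python
import math

class NearestNeighbour:
    """1-NN written in the shape of the supervised learning pipeline."""

    def fit(self, points, labels):
        # "Learning algorithm (do nothing)": the model is just the stored data.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, test_points):
        # For each unlabelled test point, copy the label of the closest stored point.
        predictions = []
        for x in test_points:
            nearest = min(range(len(self.points)),
                          key=lambda i: math.dist(x, self.points[i]))
            predictions.append(self.labels[nearest])
        return predictions

# Hypothetical (height_cm, weight_kg) examples, for illustration only.
model = NearestNeighbour().fit([(185, 93), (171, 55)], ["player", "dancer"])
print(model.predict([(180, 78), (165, 52)]))
```

All of the work happens in `predict`; `fit` only memorizes the training data.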

Page 10: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

The K-Nearest Neighbour Classifier

[Figure: “TRAINING” DATA table (Person, Weight, Height) and scatter plot of height against weight.]

Testing point x
For each training datapoint x’
    measure distance(x,x’)
End
Sort distances
Select K nearest
Assign most common class
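The pseudocode above can be turned into a short Python function. This is a sketch under stated assumptions: the function name `knn_classify` and the training pairs are hypothetical, and the slide does not say how ties between classes should be broken.

```python
import math
from collections import Counter

def knn_classify(x, training, k):
    """Classify test point x from (point, label) training pairs."""
    # For each training datapoint x', measure distance(x, x').
    distances = [(math.dist(x, point), label) for point, label in training]
    # Sort distances and select the K nearest.
    distances.sort(key=lambda pair: pair[0])
    k_nearest = distances[:k]
    # Assign the most common class among those K.
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: ((height_cm, weight_kg), label).
training = [
    ((190, 100), "player"),
    ((185, 93), "player"),
    ((180, 85), "player"),
    ((174, 57), "dancer"),
    ((163, 50), "dancer"),
    ((171, 55), "dancer"),
]

print(knn_classify((180, 78), training, k=3))
```

With two classes, an odd K avoids tied votes; for an even K this sketch simply returns whichever tied class appears first among the nearest points.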

Page 11: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

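Quick reminder: Pythagoras’ theorem

[Figure: right-angled triangle with sides a, b, c on axes height and weight.]

. . . measure distance(x,x’) . . .

$$a^2 + b^2 = c^2, \quad \text{so} \quad c = \sqrt{a^2 + b^2}$$

$$\text{distance}(x, x') = \sqrt{\sum_i (x_i - x'_i)^2}$$

a.k.a. “Euclidean” distance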
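As a worked instance, take the test point from the earlier slides (height 180cm, weight 78kg) and a hypothetical training point with height 185cm and weight 93kg. The training point is an assumption made for illustration. The two differences play the roles of a and b, and the distance is c:

```latex
\text{distance}(x, x') = \sqrt{(180 - 185)^2 + (78 - 93)^2}
                       = \sqrt{25 + 225}
                       = \sqrt{250}
                       \approx 15.8
```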

Page 12: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

The K-Nearest Neighbour Classifier

[Figure: “TRAINING” DATA table (Person, Weight, Height) and scatter plot of height against weight.]

Testing point x
For each training datapoint x’
    measure distance(x,x’)
End
Sort distances
Select K nearest
Assign most common class

Seems sensible.

But what are the disadvantages?

Page 13: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

The K-Nearest Neighbour Classifier

[Figure: “TRAINING” DATA table (Person, Weight, Height) and scatter plot of height against weight.]

Here I chose k=3.

What would happen if I chose k=5?

What would happen if I chose k=26?
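One way to explore these questions is to run the classifier with several values of K. The sketch below uses a hypothetical training set with the same class sizes as the two clubs (16 players, 10 dancers); the points themselves are made up. With K = 26 every training point votes, so the larger class wins whatever the test point is.

```python
import math
from collections import Counter

def knn_classify(x, training, k):
    """Majority vote among the k training points nearest to x."""
    ranked = sorted(training, key=lambda pair: math.dist(x, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical data: 16 players and 10 dancers as (height_cm, weight_kg).
players = [((175 + i, 80 + i), "player") for i in range(16)]
dancers = [((160 + i, 48 + i), "dancer") for i in range(10)]
training = players + dancers

# A test point placed in the middle of the dancers.
x = (165, 52)
for k in (3, 5, 26):
    print(k, knn_classify(x, training, k))
```

For this data K = 3 and K = 5 both vote "dancer", while K = 26 votes "player" by 16 to 10, even though the test point sits among the dancers.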

Page 14: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

The K-Nearest Neighbour Classifier

[Figure: “TRAINING” DATA table (Person, Weight, Height) and scatter plot of height against weight, with the boundary between the two classes drawn in.]

Any point on the left of this “boundary” is closer to the red circles.

Any point on the right of this “boundary” is closer to the blue crosses.

This is called the “decision boundary”.
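The boundary is never stored anywhere; it is implied by the training points. A rough way to locate it is to classify many points along a line and note where the predicted class changes. The sketch below does this for a nearest-neighbour classifier on four hypothetical training points; the function names and the data are assumptions made for illustration.

```python
import math

# Hypothetical training points (height_cm, weight_kg); not the lecture's data.
dancers = [(170, 55), (165, 50)]
players = [(185, 95), (190, 100)]

def predict_1nn(x):
    """Label x by whichever class owns its single nearest training point."""
    nearest_dancer = min(math.dist(x, p) for p in dancers)
    nearest_player = min(math.dist(x, p) for p in players)
    return "dancer" if nearest_dancer <= nearest_player else "player"

def boundary_crossings(height, weights):
    """Scan weights at a fixed height; return the weights where the label changes."""
    crossings = []
    previous = predict_1nn((height, weights[0]))
    for w in weights[1:]:
        current = predict_1nn((height, w))
        if current != previous:
            crossings.append(w)
        previous = current
    return crossings

# Scan 40kg to 110kg in 1kg steps at a height of 178cm.
print(boundary_crossings(178, list(range(40, 111))))
```

For this data the label changes once, between 74kg and 75kg, so the boundary crosses the 178cm line there.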

Page 15: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

Where’s the decision boundary?

[Figure: scatter plot of height against weight.]

Not always a simple straight line!

Page 16: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

Where’s the decision boundary?

[Figure: scatter plot of height against weight.]

Not always contiguous!

Page 17: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

The most important concept in Machine Learning

Page 18: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

The most important concept in Machine Learning

Looks good so far…

Page 19: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

The most important concept in Machine Learning

Looks good so far…

Oh no! Mistakes! What happened?

Page 20: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

The most important concept in Machine Learning

Looks good so far…

Oh no! Mistakes! What happened?

We didn’t have all the data.

We can never assume that we do.

This is called “OVER-FITTING” to the small dataset.
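The effect can be seen by scoring a classifier on the data it memorized and then on people it has never seen. This is a minimal sketch with hypothetical data: the training set, the unseen examples, and the helper names `predict_1nn` and `accuracy` are all assumptions.

```python
import math

def predict_1nn(x, training):
    """Return the label of the training point nearest to x."""
    return min(training, key=lambda pair: math.dist(x, pair[0]))[1]

def accuracy(data, training):
    """Fraction of (point, label) pairs in data that 1-NN labels correctly."""
    correct = sum(predict_1nn(x, training) == label for x, label in data)
    return correct / len(data)

# Hypothetical small training set: ((height_cm, weight_kg), label).
training = [
    ((170, 55), "dancer"), ((165, 50), "dancer"), ((172, 60), "dancer"),
    ((185, 95), "player"), ((190, 100), "player"), ((180, 90), "player"),
]
# Hypothetical people the classifier has never seen, including a light player.
unseen = [((168, 52), "dancer"), ((175, 70), "player")]

print(accuracy(training, training))  # every point is its own nearest neighbour
print(accuracy(unseen, training))
```

On the training set the score is perfect, because each point is at distance zero from itself. The light player in the unseen data is closer to the dancers in this small sample, so the classifier gets that one wrong.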

Page 21: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

So, we have our first “machine learning” algorithm… ?

The K-Nearest Neighbour Classifier

Testing point x
For each training datapoint x’
    measure distance(x,x’)
End
Sort distances
Select K nearest
Assign most common class

Make your own notes on its advantages / disadvantages.

I will ask for volunteers next time we meet…..

Page 22: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

[Diagram: training data → learning algorithm (do nothing) → model (memorize the training data); testing data (no labels) → model → predicted labels.]

Pretty dumb! Where’s the learning!

Page 23: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

Now, how is this problem like handwriting recognition?

[Figure: scatter plot of height against weight.]

Page 24: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

Let’s say the measurements are pixel values.

A two-pixel image

[Figure: the image plotted as the single point (190, 85), with axes pixel 1 value and pixel 2 value, each running from 0 to 255.]

Page 25: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

Three dimensions…

A three-pixel image

This 3-pixel image is represented by a SINGLE point in a 3-D space.

[Figure: the point (190, 85, 202) plotted on axes pixel 1, pixel 2, pixel 3.]

Page 26: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

Distance between images

A three-pixel image: (190, 85, 202)

Another 3-pixel image: (25, 150, 75)

[Figure: both points plotted on axes pixel 1, pixel 2, pixel 3.]

Straight line distance between them?
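The straight-line distance between the two three-pixel images on this slide can be worked out with the same Euclidean formula used for height and weight. A short sketch follows; the variable names are assumptions.

```python
import math

image_a = (190, 85, 202)
image_b = (25, 150, 75)

# Straight-line (Euclidean) distance in 3-D: square the per-pixel
# differences, add them up, take the square root.
squared_differences = [(a - b) ** 2 for a, b in zip(image_a, image_b)]
distance = math.sqrt(sum(squared_differences))

print(squared_differences)  # [27225, 4225, 16129]
print(round(distance, 2))   # 218.13
```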

Page 27: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

A three-pixel image: (190, 85, 202)

[Figure: the point plotted on axes pixel 1, pixel 2, pixel 3.]

A four-pixel image.

A five-pixel image.

4-dimensional space? 5-d? 6-d?

Page 28: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

A four-pixel image: (190, 85, 202, 10)

A different four-pixel image: (190, 85, 202, 10)

Same 4-dimensional vector!

Assuming we read pixels in a systematic manner, we can now represent any image as a single point in a high dimensional space.
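Reading pixels "in a systematic manner" could mean row by row, left to right. The sketch below assumes that reading order, and it also assumes the two four-pixel images differ in layout (2 x 2 against 1 x 4), which the transcript does not confirm. The function name `image_to_vector` is hypothetical.

```python
def image_to_vector(image):
    """Read pixels row by row, left to right, into one flat list."""
    return [pixel for row in image for pixel in row]

# Two hypothetical layouts of the same four pixel values.
square_image = [[190, 85],
                [202, 10]]
strip_image = [[190, 85, 202, 10]]

print(image_to_vector(square_image))
print(image_to_vector(strip_image))
```

Both calls give the same four numbers in the same order, so the two layouts land on the same point in 4-dimensional space.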

Page 29: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

16 x 16 pixel image. How many dimensions?

Page 30: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

We can measure distance in 256 dimensional space.

$$\text{distance}(x, x') = \sqrt{\sum_{i=1}^{256} (x_i - x'_i)^2}$$
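The same distance calculation works unchanged on 16 x 16 images once each one is flattened to 256 numbers. This is a sketch under stated assumptions: the two images are hypothetical (one all black, one all white), pixel values are taken to run from 0 to 255, and the helper names are made up.

```python
import math

def image_to_vector(image):
    """Read pixels row by row into one flat list."""
    return [pixel for row in image for pixel in row]

def image_distance(image_a, image_b):
    """Euclidean distance between two images of the same size."""
    return math.dist(image_to_vector(image_a), image_to_vector(image_b))

# Hypothetical 16 x 16 greyscale images: all black and all white.
black = [[0] * 16 for _ in range(16)]
white = [[255] * 16 for _ in range(16)]

print(len(image_to_vector(black)))   # 256 dimensions
print(image_distance(black, white))  # sqrt(256 * 255^2) = 16 * 255 = 4080
```

No two images with pixel values in this range can be further apart than these two, so every distance between 16 x 16 images falls between 0 and 4080.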

Page 31: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

Which is the nearest neighbour to our ‘3’?

[Figure: three candidate images, labelled “maybe”, “maybe”, and “probably not”.]

Page 32: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

AT&T Research Labs

The USPS postcode reader – learnt from examples.

FAST recognition

NOISE resistant, can generalise to future unseen patterns

Page 33: Rugby players, Ballet dancers, and the Nearest Neighbour Classifier

Your lab exercise

Use K-NN to recognise handwritten USPS digits.

Biggest mistake by students in last year’s class?

Final lab session before reading week (see website for details).

i.e. you have 4 weeks.

…starting 2 weeks from now!