Intro to modelling-supervised learning

INTRO TO MACHINE LEARNING Justin Sebok

Posted: 08-Feb-2017

TRANSCRIPT

Page 1: Intro to modelling-supervised learning

INTRO TO MACHINE LEARNING Justin Sebok

Page 2: Intro to modelling-supervised learning

CONTENTSWhat is machine learning?Types of machine learningSupervised learning and examplesUnsupervised learning and examples

Page 5: Intro to modelling-supervised learning

WHAT IS MACHINE LEARNING? Wikipedia: Machine Learning is a subfield of computer science which gives computers the ability to learn without being explicitly programmed.

WTF does that mean?!

Basically, Machine Learning means using “algorithms” that learn from data, improving their predictions by finding patterns in it.

Data → Algorithm → Predictions

Page 6: Intro to modelling-supervised learning

WHAT IS MACHINE LEARNING? “… without being explicitly programmed”

This is what makes machine learning so powerful: rather than requiring specific instructions, as in traditional computing, machine learning allows computers to improve their predictions using only the data inputs.

Page 7: Intro to modelling-supervised learning

TWO MAIN TYPES OF MACHINE LEARNING ALGORITHM

Supervised Learning: we know what we are trying to predict. We use examples that we (and the model) know the answer to in order to “train” the model, which can then generate predictions for examples we don’t know the answer to. Examples: predict the price a house will sell at; identify someone’s gender from a photograph.

Unsupervised Learning: we don’t know what we are trying to predict. We are trying to identify naturally occurring patterns in the data which may be informative. Example: identify “clusters” of customers based on the data we have on them.

Page 11: Intro to modelling-supervised learning

TYPES OF SUPERVISED LEARNING

Supervised learning can be broken down further by the type of problem being solved.

Classification problems: there is a finite, countable number of possible answers; these are categories or classes. There may be as few as 2 or more than 1000 possible classes, but as long as we can identify and count them all, this doesn’t matter. Example: identify a plant’s species.

Regression problems: the feature we are trying to predict is a number on a continuous scale. Example: predict someone’s height.

Page 13: Intro to modelling-supervised learning

INTRO TO A FEW SUPERVISED LEARNING MODELS

Nearest Neighbours (classification and regression)

Decision Trees (classification and regression)

Linear Regression (regression)

Page 14: Intro to modelling-supervised learning

QUICK TERMINOLOGY

Observation: one of the “things” we are looking at. Could be a person, a time, or a place.

Feature: some aspect of the observation that we know. Could be a person’s hair colour, the latitude and longitude of a city, or the number of rooms a house has. May be denoted as x.

Label: the feature of an observation which we are trying to predict. For labelled observations, we already know the answer. May be denoted as y.
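As a tiny illustration of these terms (the house and all its values are invented), one labelled observation might be represented as:

```python
# One labelled observation: a house.
x = {"rooms": 3, "latitude": 51.5, "longitude": -0.13}  # features: things we know
y = 425_000                                             # label: the sale price we want to predict
```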

Page 16: Intro to modelling-supervised learning

NEAREST NEIGHBOURS

Conceptually one of the simplest machine learning algorithms: it uses the proximity (or similarity) of observations to make predictions about them.

Method: for the 1-Nearest Neighbour algorithm, find the closest labelled observation to the unlabelled observation and apply the same label.

While it may seem very simple, it is often very effective. It can be used for classification or regression.
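The method above can be sketched in a few lines of Python (a toy illustration; the points and labels are invented):

```python
import math

def one_nearest_neighbour(labelled, query):
    """Label a query point with the label of its closest labelled observation."""
    closest = min(labelled, key=lambda pair: math.dist(pair[0], query))
    return closest[1]

# (features, label) pairs: two small clusters of points
training = [((1, 1), "red"), ((2, 1), "red"), ((8, 8), "blue"), ((9, 7), "blue")]
print(one_nearest_neighbour(training, (7, 7)))  # closest point is (8, 8) -> "blue"
```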

Page 17: Intro to modelling-supervised learning

1 NEAREST NEIGHBOUR PREDICTIONS

[Figure slides: a scatter plot in which each unlabelled point, marked “?”, is assigned the class of its nearest labelled neighbour.]
Page 23: Intro to modelling-supervised learning

1 NEAREST NEIGHBOUR PREDICTIONS

Here there is some ambiguity: the unlabelled point is an equal distance from both classes.

In this case, for 1-NN we would just flip a coin and choose a class at random.

Page 24: Intro to modelling-supervised learning

1 NEAREST NEIGHBOUR PREDICTIONS

[Figure slides: a regression example with labelled points valued 6, 3, 0, 8, 6, 1.5 and 5. The unlabelled point “?” takes the value of its nearest labelled neighbour, here 8.]

Page 27: Intro to modelling-supervised learning

K-NEAREST NEIGHBOURS

The problem with 1-Nearest Neighbours is that outliers may result in incorrect predictions.

What is an outlier? An outlier is a point which is distant from, or very different to, the other observations. It may be a legitimate data point, or it may be an example of “noise” in the data.

Page 28: Intro to modelling-supervised learning

ANY OUTLIERS HERE?

Page 33: Intro to modelling-supervised learning

K-NEAREST NEIGHBOURS

The problem with 1-Nearest Neighbours is that outliers may result in incorrect predictions. How could we counteract this problem?

Why not try 2-Nearest Neighbours? Simply look at the 2 nearest labelled examples and apply the label that they share. But what happens when we have a tie?

Flip a coin… or use 3-Nearest Neighbours: with only 2 classes, there can be no ties.
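The k-nearest-neighbours idea with a majority vote can be sketched as (a toy illustration; the points are invented):

```python
import math
from collections import Counter

def knn_classify(labelled, query, k=3):
    """Majority vote among the k labelled points closest to the query."""
    nearest = sorted(labelled, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

training = [((1, 1), "red"), ((2, 1), "red"), ((4, 3), "blue"), ((8, 8), "blue")]
print(knn_classify(training, (3, 2)))  # two of its three nearest neighbours are "red"
```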

Page 34: Intro to modelling-supervised learning

3-NEAREST NEIGHBOUR PREDICTIONS

[Figure slides: the unlabelled point “?” is assigned the majority class among its 3 nearest labelled neighbours.]

Page 36: Intro to modelling-supervised learning

3-NEAREST NEIGHBOUR PREDICTIONS

[Figure slide: the regression example again, with labelled points valued 6, 3, 0, 8, 6, 1.5 and 5.]

How can we use the 3-nearest neighbour approach in regression?

Page 37: Intro to modelling-supervised learning

3 NEAREST NEIGHBOUR PREDICTIONS

[Figure slide: the unlabelled point is predicted as 4.67, the mean of the values of its 3 nearest labelled neighbours: 6, 3 and 5.]
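The regression version replaces the vote with an average. A sketch with invented coordinates, arranged so that the three nearest labels are 6, 3 and 5:

```python
import math
from statistics import mean

def knn_regress(labelled, query, k=3):
    """Predict the mean value of the k labelled points closest to the query."""
    nearest = sorted(labelled, key=lambda pair: math.dist(pair[0], query))[:k]
    return mean(value for _, value in nearest)

# (coordinates, value) pairs; the three points near the origin are closest to the query
training = [((0, 0), 6), ((1, 0), 3), ((0, 1), 5), ((9, 9), 0), ((8, 9), 8), ((9, 8), 1.5)]
print(round(knn_regress(training, (0.5, 0.5)), 2))  # mean of 6, 3 and 5 -> 4.67
```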

Page 40: Intro to modelling-supervised learning

SO WHAT K-VALUE DO I USE?

The choice of how many neighbours to use illustrates one of the main trade-offs in machine learning: variance vs bias.

Variance is the error in prediction we get from following our training data too closely: we end up basing our predictions on “random noise” in the data. If we choose too small a k-value, we may have a high level of variance.

Bias is the error in prediction we get from using a simplified model to predict very complex real-world things. If we choose too large a k-value, we may have a high level of bias.
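In practice, k is often chosen by comparing candidate values on held-out data. A minimal sketch (invented 1-D toy data with one noisy outlier):

```python
from statistics import mean

def knn_regress_1d(train, x, k):
    """k-nearest-neighbour regression on 1-D inputs."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return mean(y for _, y in nearest)

# Roughly y = 2x, with one noisy outlier at x = 3
train = [(0, 0), (1, 2), (2, 4), (3, 20), (4, 8), (5, 10)]
held_out = [(1.5, 3), (3.5, 7)]

def validation_error(k):
    return mean(abs(knn_regress_1d(train, x, k) - y) for x, y in held_out)

best_k = min([1, 3, 5], key=validation_error)
print(best_k)  # on this toy data k=1 chases the outlier (variance), k=5 over-smooths (bias)
```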

Page 41: Intro to modelling-supervised learning

VARIANCE VS BIAS

One big part of machine learning is striking the right balance between these two types of error.

Page 44: Intro to modelling-supervised learning

PROBLEM OF DIMENSIONALITY

1 Dimension: 5 observations to fill the space

2 Dimensions: 25 observations to fill the space

3 Dimensions: 125 observations to fill the space

As dimensionality increases, the number of observations required to “fill the space” grows exponentially.
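The counting above is simply 5 raised to the number of dimensions:

```python
points_per_axis = 5  # as on the slides: 5 observations fill 1 dimension

for d in (1, 2, 3, 10):
    print(f"{d} dimension(s): {points_per_axis ** d} observations")
# 1 -> 5, 2 -> 25, 3 -> 125 ... and already 9,765,625 at 10 dimensions
```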

Page 45: Intro to modelling-supervised learning

DECISION TREES

Another quite simple machine learning technique: we attempt to “cut” the space where our observations exist and predict labels based on the sections they end up in.

Page 46: Intro to modelling-supervised learning

DECISION TREES

We can display these cuts in the form of a tree, hence the name.

Here is an example of such a tree used for predicting height

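A tree like the height example above is just a series of nested questions. Sketched as code (the thresholds and heights are invented for illustration):

```python
def predict_height_cm(age_years, is_male):
    """A hand-built decision tree for height: each 'if' is one cut of the space."""
    if age_years < 13:
        return 140.0   # children's region
    if is_male:
        return 178.0   # adult male region
    return 165.0       # adult female region

print(predict_height_cm(30, is_male=True))   # falls into the adult male region
```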

Page 48: Intro to modelling-supervised learning

DECISION TREE - “CUTTING THE SPACE”

This is an example of “cutting the space”

Page 49: Intro to modelling-supervised learning

DECISION TREES

Once we have cut our space into chunks, how do we generate predictions within each chunk?

[Figure slides: for classification, a region predicts its most common class; for regression, a region containing the values 6, 8 and 5 predicts their mean, 6.33.]
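The prediction within one region can be sketched as follows (a minimal illustration: numeric labels give the region's mean, class labels its most common class):

```python
from collections import Counter
from statistics import mean

def leaf_prediction(labels):
    """Predict for a region: mean for numeric labels, majority class otherwise."""
    if all(isinstance(v, (int, float)) for v in labels):
        return mean(labels)
    return Counter(labels).most_common(1)[0][0]

print(round(leaf_prediction([6, 8, 5]), 2))     # 6.33, as on the slide
print(leaf_prediction(["red", "red", "blue"]))  # the majority class, "red"
```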

Page 53: Intro to modelling-supervised learning

DECISION TREE – WHERE DO WE CUT?

Each cut should improve the prediction accuracy by as much as possible

Page 57: Intro to modelling-supervised learning

HOW COULD WE CUT A “REGRESSION” DECISION TREE? Very similarly to the way classification trees are cut: each cut should reduce the difference between the predicted output in an area and the actual training outputs.
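One way to sketch this is brute force: try each midpoint between consecutive x values and keep the cut that minimises the squared error on the two sides (invented 1-D toy data):

```python
def squared_error(values):
    """Total squared distance from the region's mean, the error a cut tries to reduce."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_cut(points):
    """points: (x, y) pairs. Return the threshold with the lowest combined error."""
    xs = sorted({x for x, _ in points})
    candidates = [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])]

    def error(t):
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        return squared_error(left) + squared_error(right)

    return min(candidates, key=error)

print(best_cut([(1, 1), (2, 1), (3, 10), (4, 10)]))  # 2.5: splits the low ys from the high ys
```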

Page 58: Intro to modelling-supervised learning

BIAS AND VARIANCE IN DECISION TREES

What would a decision tree with a high degree of bias look like?

What would a decision tree with a high degree of variance look like?

Page 59: Intro to modelling-supervised learning

LINEAR REGRESSION

I will assume everyone knows the basics of linear regression. While I won’t go into any of the maths, it is very useful to look at it alongside the other models.

Page 60: Intro to modelling-supervised learning

LINEAR REGRESSION

What is a very basic definition of linear regression?
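One basic answer: fit the straight line y = a + b·x that minimises the squared error. A closed-form sketch (the data points are invented):

```python
def fit_line(points):
    """Ordinary least squares for y = a + b * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # slope: covariance of x and y divided by variance of x
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / \
        sum((x - mean_x) ** 2 for x, _ in points)
    a = mean_y - b * mean_x
    return a, b

print(fit_line([(0, 1), (1, 3), (2, 5)]))  # (1.0, 2.0): the points lie exactly on y = 1 + 2x
```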

Page 61: Intro to modelling-supervised learning

LINEAR REGRESSION

What would a linear regression line with a high degree of bias look like?

What would a linear regression line with a high degree of variance look like?

Page 63: Intro to modelling-supervised learning

SPECTRUM OF SUPERVISED LEARNING TECHNIQUES

At one end: no assumptions about the data, but not computationally efficient. At the other: lots of assumptions about the data, but very computationally efficient.

The more assumptions we can make about our data, the more computationally efficient the technique can be.

Page 64: Intro to modelling-supervised learning

SPECTRUM OF SUPERVISED LEARNING TECHNIQUES

No assumptions about data, not computationally efficient: K-Nearest Neighbours

In between: Decision Trees

Lots of assumptions about data, very computationally efficient: Linear Regression