Intro to modelling-supervised learning

INTRO TO MACHINE LEARNING Justin Sebok

Posted: 08-Feb-2017

TRANSCRIPT

Page 1: Intro to modelling-supervised learning

INTRO TO MACHINE LEARNING Justin Sebok

Page 2: Intro to modelling-supervised learning

CONTENTSWhat is machine learning?Types of machine learningSupervised learning and examplesUnsupervised learning and examples

Page 5: Intro to modelling-supervised learning

WHAT IS MACHINE LEARNING? Wikipedia: Machine Learning is a subfield of computer science which gives computers the ability to learn without being explicitly programmed.

WTF does that mean?!

Basically, Machine Learning means using “algorithms” that learn from data, improving their predictions by finding patterns in it.

Data → Algorithm → Predictions

Page 6: Intro to modelling-supervised learning

WHAT IS MACHINE LEARNING? “… without being explicitly programmed”

This is what makes machine learning so powerful: rather than requiring specific instructions, as in traditional computing, machine learning allows computers to improve their predictions using only the data inputs.

Page 7: Intro to modelling-supervised learning

TWO MAIN TYPES OF MACHINE LEARNING ALGORITHM

Supervised Learning: we know what we are trying to predict. We use examples that we (and the model) know the answer to in order to “train” the model, which can then generate predictions for examples we don’t know the answer to. Examples: predict the price a house will sell at; identify someone’s gender from a photograph.

Unsupervised Learning: we don’t know what we are trying to predict. We are trying to identify naturally occurring patterns in the data which may be informative. Example: identify “clusters” of customers based on the data we have on them.

Page 11: Intro to modelling-supervised learning

TYPES OF SUPERVISED LEARNING

Supervised learning can be broken down further by the type of problem being solved.

Classification problems: there is a finite, countable number of possible answers; these are categories or classes. There may be as few as 2 or more than 1000 possible classes, but as long as we can identify and count them all, this doesn’t matter. Example: identify a plant’s species.

Regression problems: the feature we are trying to predict is a number on a continuous scale. Example: predict someone’s height.

Page 13: Intro to modelling-supervised learning

INTRO TO A FEW SUPERVISED LEARNING MODELS

Nearest Neighbours (classification and regression)

Decision Trees (classification and regression)

Linear Regression (regression)

Page 14: Intro to modelling-supervised learning

QUICK TERMINOLOGY

Observation: one of the “things” we are looking at. Could be a person, a time, or a place.

Feature: some aspect of the observation that we know. Could be a person’s hair colour, the latitude and longitude of a city, or the number of rooms a house has. May be denoted as x.

Label: the feature of an observation which we are trying to predict. For labelled observations, we already know the answer. May be denoted as y.
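As a tiny illustration of these terms (the house and all its values are invented), one labelled observation might be represented as:

```python
# One labelled observation: a house.
x = {"rooms": 3, "latitude": 51.5, "longitude": -0.13}  # features: things we know
y = 425_000                                             # label: the sale price we want to predict
```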

Page 16: Intro to modelling-supervised learning

NEAREST NEIGHBOURS

Conceptually one of the simplest machine learning algorithms: it uses the proximity (or similarity) of observations to make predictions about them.

Method: for the 1-Nearest Neighbour algorithm, find the closest labelled observation to the unlabelled observation and apply the same label.

While it may seem very simple, it is often very effective. It can be used for classification or regression.
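The method above can be sketched in a few lines of Python (a toy illustration; the points and labels are invented):

```python
import math

def one_nearest_neighbour(labelled, query):
    """Label a query point with the label of its closest labelled observation."""
    closest = min(labelled, key=lambda pair: math.dist(pair[0], query))
    return closest[1]

# (features, label) pairs: two small clusters of points
training = [((1, 1), "red"), ((2, 1), "red"), ((8, 8), "blue"), ((9, 7), "blue")]
print(one_nearest_neighbour(training, (7, 7)))  # closest point is (8, 8) -> "blue"
```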

Page 17: Intro to modelling-supervised learning

1 NEAREST NEIGHBOUR PREDICTIONS

[Figure slides: a scatter plot in which each unlabelled point, marked “?”, is assigned the class of its nearest labelled neighbour.]
Page 23: Intro to modelling-supervised learning

1 NEAREST NEIGHBOUR PREDICTIONS

Here there is some ambiguity: the unlabelled point is an equal distance from both classes.

In this case, for 1-NN we would just flip a coin and choose a class at random.

Page 24: Intro to modelling-supervised learning

1 NEAREST NEIGHBOUR PREDICTIONS

[Figure slides: a regression example with labelled points valued 6, 3, 0, 8, 6, 1.5 and 5. The unlabelled point “?” takes the value of its nearest labelled neighbour, here 8.]

Page 27: Intro to modelling-supervised learning

K-NEAREST NEIGHBOURS

The problem with 1-Nearest Neighbours is that outliers may result in incorrect predictions.

What is an outlier? An outlier is a point which is distant from, or very different to, the other observations. It may be a legitimate data point, or it may be an example of “noise” in the data.

Page 28: Intro to modelling-supervised learning

ANY OUTLIERS HERE?

Page 33: Intro to modelling-supervised learning

K-NEAREST NEIGHBOURS

The problem with 1-Nearest Neighbours is that outliers may result in incorrect predictions. How could we counteract this problem?

Why not try 2-Nearest Neighbours? Simply look at the 2 nearest labelled examples and apply the label that they share. But what happens when we have a tie?

Flip a coin… or use 3-Nearest Neighbours: with only 2 classes, there can be no ties.
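The k-nearest-neighbours idea with a majority vote can be sketched as (a toy illustration; the points are invented):

```python
import math
from collections import Counter

def knn_classify(labelled, query, k=3):
    """Majority vote among the k labelled points closest to the query."""
    nearest = sorted(labelled, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

training = [((1, 1), "red"), ((2, 1), "red"), ((4, 3), "blue"), ((8, 8), "blue")]
print(knn_classify(training, (3, 2)))  # two of its three nearest neighbours are "red"
```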

Page 34: Intro to modelling-supervised learning

3-NEAREST NEIGHBOUR PREDICTIONS

[Figure slides: the unlabelled point “?” is assigned the majority class among its 3 nearest labelled neighbours.]

Page 36: Intro to modelling-supervised learning

3-NEAREST NEIGHBOUR PREDICTIONS

[Figure slide: the regression example again, with labelled points valued 6, 3, 0, 8, 6, 1.5 and 5.]

How can we use the 3-nearest neighbour approach in regression?

Page 37: Intro to modelling-supervised learning

3 NEAREST NEIGHBOUR PREDICTIONS

[Figure slide: the unlabelled point is predicted as 4.67, the mean of the values of its 3 nearest labelled neighbours: 6, 3 and 5.]
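The regression version replaces the vote with an average. A sketch with invented coordinates, arranged so that the three nearest labels are 6, 3 and 5:

```python
import math
from statistics import mean

def knn_regress(labelled, query, k=3):
    """Predict the mean value of the k labelled points closest to the query."""
    nearest = sorted(labelled, key=lambda pair: math.dist(pair[0], query))[:k]
    return mean(value for _, value in nearest)

# (coordinates, value) pairs; the three points near the origin are closest to the query
training = [((0, 0), 6), ((1, 0), 3), ((0, 1), 5), ((9, 9), 0), ((8, 9), 8), ((9, 8), 1.5)]
print(round(knn_regress(training, (0.5, 0.5)), 2))  # mean of 6, 3 and 5 -> 4.67
```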

Page 40: Intro to modelling-supervised learning

SO WHAT K-VALUE DO I USE?

The choice of how many neighbours to use illustrates one of the main trade-offs in machine learning: variance vs bias.

Variance is the error in prediction we get from following our training data too closely: we end up basing our predictions on “random noise” in the data. If we choose too small a k-value, we may have a high level of variance.

Bias is the error in prediction we get from using a simplified model to predict very complex real-world things. If we choose too large a k-value, we may have a high level of bias.
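In practice, k is often chosen by comparing candidate values on held-out data. A minimal sketch (invented 1-D toy data with one noisy outlier):

```python
from statistics import mean

def knn_regress_1d(train, x, k):
    """k-nearest-neighbour regression on 1-D inputs."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return mean(y for _, y in nearest)

# Roughly y = 2x, with one noisy outlier at x = 3
train = [(0, 0), (1, 2), (2, 4), (3, 20), (4, 8), (5, 10)]
held_out = [(1.5, 3), (3.5, 7)]

def validation_error(k):
    return mean(abs(knn_regress_1d(train, x, k) - y) for x, y in held_out)

best_k = min([1, 3, 5], key=validation_error)
print(best_k)  # on this toy data k=1 chases the outlier (variance), k=5 over-smooths (bias)
```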

Page 41: Intro to modelling-supervised learning

VARIANCE VS BIAS

One big part of machine learning is striking the right balance between these two types of error.

Page 44: Intro to modelling-supervised learning

PROBLEM OF DIMENSIONALITY

1 Dimension: 5 observations to fill the space

2 Dimensions: 25 observations to fill the space

3 Dimensions: 125 observations to fill the space

As dimensionality increases, the number of observations required to “fill the space” grows exponentially.
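The counting above is simply 5 raised to the number of dimensions:

```python
points_per_axis = 5  # as on the slides: 5 observations fill 1 dimension

for d in (1, 2, 3, 10):
    print(f"{d} dimension(s): {points_per_axis ** d} observations")
# 1 -> 5, 2 -> 25, 3 -> 125 ... and already 9,765,625 at 10 dimensions
```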

Page 45: Intro to modelling-supervised learning

DECISION TREES

Another quite simple machine learning technique: we attempt to “cut” the space where our observations exist and predict labels based on the sections they end up in.

Page 46: Intro to modelling-supervised learning

DECISION TREES

We can display these cuts in the form of a tree, hence the name.

Here is an example of such a tree used for predicting height

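A tree like the height example above is just a series of nested questions. Sketched as code (the thresholds and heights are invented for illustration):

```python
def predict_height_cm(age_years, is_male):
    """A hand-built decision tree for height: each 'if' is one cut of the space."""
    if age_years < 13:
        return 140.0   # children's region
    if is_male:
        return 178.0   # adult male region
    return 165.0       # adult female region

print(predict_height_cm(30, is_male=True))   # falls into the adult male region
```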

Page 48: Intro to modelling-supervised learning

DECISION TREE - “CUTTING THE SPACE”

This is an example of “cutting the space”

Page 49: Intro to modelling-supervised learning

DECISION TREES

Once we have cut our space into chunks, how do we generate predictions within each chunk?

[Figure slides: for classification, a region predicts its most common class; for regression, a region containing the values 6, 8 and 5 predicts their mean, 6.33.]
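The prediction within one region can be sketched as follows (a minimal illustration: numeric labels give the region's mean, class labels its most common class):

```python
from collections import Counter
from statistics import mean

def leaf_prediction(labels):
    """Predict for a region: mean for numeric labels, majority class otherwise."""
    if all(isinstance(v, (int, float)) for v in labels):
        return mean(labels)
    return Counter(labels).most_common(1)[0][0]

print(round(leaf_prediction([6, 8, 5]), 2))     # 6.33, as on the slide
print(leaf_prediction(["red", "red", "blue"]))  # the majority class, "red"
```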

Page 53: Intro to modelling-supervised learning

DECISION TREE – WHERE DO WE CUT?

Each cut should improve the prediction accuracy by as much as possible

Page 57: Intro to modelling-supervised learning

HOW COULD WE CUT A “REGRESSION” DECISION TREE? Very similarly to the way classification trees are cut: each cut should reduce the difference between the predicted output in an area and the actual training outputs.
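One way to sketch this is brute force: try each midpoint between consecutive x values and keep the cut that minimises the squared error on the two sides (invented 1-D toy data):

```python
def squared_error(values):
    """Total squared distance from the region's mean, the error a cut tries to reduce."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_cut(points):
    """points: (x, y) pairs. Return the threshold with the lowest combined error."""
    xs = sorted({x for x, _ in points})
    candidates = [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])]

    def error(t):
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        return squared_error(left) + squared_error(right)

    return min(candidates, key=error)

print(best_cut([(1, 1), (2, 1), (3, 10), (4, 10)]))  # 2.5: splits the low ys from the high ys
```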

Page 58: Intro to modelling-supervised learning

BIAS AND VARIANCE IN DECISION TREES

What would a decision tree with a high degree of bias look like?

What would a decision tree with a high degree of variance look like?

Page 59: Intro to modelling-supervised learning

LINEAR REGRESSION

I will assume everyone knows the basics of linear regression. While I won’t go into any of the maths, it is very useful to look at it alongside the other models.

Page 60: Intro to modelling-supervised learning

LINEAR REGRESSION

What is a very basic definition of linear regression?
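One basic answer: fit the straight line y = a + b·x that minimises the squared error. A closed-form sketch (the data points are invented):

```python
def fit_line(points):
    """Ordinary least squares for y = a + b * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # slope: covariance of x and y divided by variance of x
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / \
        sum((x - mean_x) ** 2 for x, _ in points)
    a = mean_y - b * mean_x
    return a, b

print(fit_line([(0, 1), (1, 3), (2, 5)]))  # (1.0, 2.0): the points lie exactly on y = 1 + 2x
```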

Page 61: Intro to modelling-supervised learning

LINEAR REGRESSION

What would a linear regression line with a high degree of bias look like?

What would a linear regression line with a high degree of variance look like?

Page 63: Intro to modelling-supervised learning

SPECTRUM OF SUPERVISED LEARNING TECHNIQUES

At one end: no assumptions about the data, but not computationally efficient. At the other: lots of assumptions about the data, but very computationally efficient.

The more assumptions we can make about our data, the more computationally efficient the technique can be.

Page 64: Intro to modelling-supervised learning

SPECTRUM OF SUPERVISED LEARNING TECHNIQUES

No assumptions about data, not computationally efficient: K-Nearest Neighbours

In between: Decision Trees

Lots of assumptions about data, very computationally efficient: Linear Regression