Machine Learning in Practice Lecture 2 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute

Upload: sylvia-golden

Post on 19-Jan-2018


DESCRIPTION

Overview of Machine Learning Process Skills

TRANSCRIPT

Page 1: Machine Learning in Practice Lecture 2 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer…

Machine Learning in Practice
Lecture 2

Carolyn Penstein Rosé
Language Technologies Institute/ Human-Computer Interaction Institute

Page 2:

Plan for the Day

Any questions?
Announcements: First homework assigned
Machine Learning process overview
Learn how to use Weka
Introduce assignment
Introduction to Cross-Validation

Page 3:

Overview of Machine Learning Process Skills

Pages 4-5:

Naïve Approach: When all you have is a hammer…

[Diagram labels: Data, Representation, Target]

Problem: there isn’t one universally best approach!

Pages 6-8:

Slightly less naïve approach: Aimless wandering…

[Diagram labels: Data, Representation, Target]

Problem 1: It takes too long!

Problem 2: You might not realize all of the options that are available to you!

Pages 9-11:

Expert Approach: Hypothesis driven

[Diagram labels: Data, Representation, Target]

You might end up with the same solution in the end, but you’ll get there faster.

Today we’ll start to learn how!

Pages 12-16:

Warm Up Exercise

Every combination of feature values is represented.

What will happen if you try to predict HairColor from the other features?

If you don’t have good features, even the most powerful algorithm won’t be able to learn an accurate prediction rule.

But that doesn’t mean this data set is a hopeless case!

For example, maybe the people who like red and have brown hair like a different shade of red than the ones who have blond hair.

So ask yourself: what information might be hidden or implicit that might allow me to learn a rule?
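The warm-up point can be checked with a small sketch. The table from the slide is not in this transcript, so the feature names and values below (FavoriteColor, Gender) are hypothetical stand-ins; the only property taken from the slide is that every combination of values appears. Under that assumption, no single feature predicts HairColor better than chance.

```python
from collections import Counter
from itertools import product

# Hypothetical feature values; the slide's actual table is not in the transcript.
favorite_color = ["red", "blue"]
gender = ["female", "male"]
hair_color = ["brown", "blond"]

# Every combination of feature values and class label appears exactly once.
rows = list(product(favorite_color, gender, hair_color))


def best_accuracy_from_feature(rows, feature_index, class_index=2):
    """Accuracy when predicting the majority class for each value of one feature."""
    by_value = {}
    for row in rows:
        by_value.setdefault(row[feature_index], Counter())[row[class_index]] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(rows)


for index, name in enumerate(["FavoriteColor", "Gender"]):
    # Each feature value co-occurs equally often with each hair color.
    print(name, best_accuracy_from_feature(rows, index))
```

Both features come out at 0.5 here, which is chance for two hair colors. Finding a hidden feature (such as the shade of red) is what would break the tie.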

Page 17:

Getting a bit more sophisticated…

Pages 18-20:

Example Data Set

We’re going to consider a new algorithm

We’re also going to consider data representation issues

Pages 21-26:

More Complex Algorithm…

Two simple algorithms last time
0R – Predict the majority class
1R – Use the most predictive single feature

Today – Intro to Decision Trees
Today we will stay at a high level
We’ll investigate more details of the algorithm next time

* Only makes 2 mistakes!

What will it do with this example?
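As a reminder of the two simple algorithms, here is a minimal sketch of 0R and 1R in plain Python. It is not Weka's implementation: the function names, the tuple-based row format, and the toy rows (Sky, Wind, Play) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def zero_r(rows, class_index):
    """0R: always predict the majority class, ignoring every feature."""
    majority = Counter(row[class_index] for row in rows).most_common(1)[0][0]
    return lambda row: majority


def one_r(rows, class_index):
    """1R: use the single feature whose value -> majority-class table
    makes the fewest mistakes on the training rows."""
    majority = Counter(row[class_index] for row in rows).most_common(1)[0][0]
    best = None  # (errors, feature_index, table)
    for f in range(len(rows[0])):
        if f == class_index:
            continue
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[f]][row[class_index]] += 1
        table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(row[class_index] != table[row[f]] for row in rows)
        if best is None or errors < best[0]:
            best = (errors, f, table)
    _, feature, table = best
    # Feature values never seen in training fall back to the majority class.
    return lambda row: table.get(row[feature], majority)


def count_mistakes(predict, rows, class_index):
    return sum(predict(row) != row[class_index] for row in rows)


# Hypothetical toy rows: (Sky, Wind, Play)
toy = [
    ("clear", "calm", "yes"),
    ("clear", "windy", "yes"),
    ("clear", "calm", "yes"),
    ("cloudy", "calm", "no"),
    ("cloudy", "windy", "no"),
    ("cloudy", "calm", "yes"),
]
print("0R mistakes:", count_mistakes(zero_r(toy, 2), toy, 2))
print("1R mistakes:", count_mistakes(one_r(toy, 2), toy, 2))
```

On these toy rows 0R makes 2 mistakes and 1R (splitting on Sky) makes 1. A decision tree goes one step further by allowing a second test inside a branch.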

Pages 27-36:

Why is it better?

Not because it is more complex
Sometimes more complexity makes performance worse

What is different in what the three rule representations assume about your data?
0R
1R
Trees

The best algorithm for your data will give you exactly the power you need

Let’s say you know the rule you are trying to learn is a circle and you have these points. What rule would you learn?

Now let’s say you don’t know the shape. What would you learn then?

If you know the shape, you have fewer degrees of freedom – less room to make a mistake.
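One way to make the degrees-of-freedom point concrete: a circle has only three free numbers (center x, center y, radius), so three non-collinear points determine it completely. The sketch below uses the standard circumcenter formula; the function name and the sample points are illustrative assumptions, not taken from the slide's figure.

```python
import math


def circle_through(p1, p2, p3):
    """Return (center_x, center_y, radius) of the circle through three points.

    Raises ValueError if the points are collinear (no unique circle)."""
    (ax, ay), (bx, by), (cx, cy) = p1, p2, p3
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("points are collinear")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.hypot(ax - ux, ay - uy)


# Three points on the unit circle recover center (0, 0) and radius 1.
print(circle_through((1, 0), (0, 1), (-1, 0)))
```

If the shape is unknown, the same three points are consistent with many different boundaries, so there is far more room to pick the wrong one.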

Pages 37-44:

Back to the Opinion Poll Data Set

From http://www.swivel.com/
Example of the kind of data set you could use for your course project
Better to find a larger data set

Callouts on the data table, one per slide:
Who ran the opinion poll
When the poll was conducted
Who the Democratic candidate would be
Who the Republican candidate would be
Who is running against whom
Which party will win
This is what we want to predict

Page 45:

Do you see any redundant information?

Page 46:

Do you see any missing or hidden information?

Pages 47-49:

How could you expand on what’s here?

Add features that describe the source
Add features that describe things that were going on during the time when the poll was taken
Add features that describe personal characteristics of the candidates

Page 50:

What do you think would be the best rule?

Page 51:

What would Weka do with this data?

Page 52:

Using Weka

Start Weka
Open up the Explorer interface

Pages 53-54:

Using Weka

Click on Open File
Open OpinionPoll.csv from the Lectures folder
You can save it as a .arff file
Summary stats for selected attributes are displayed
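For reference, an .arff file is plain text: a @relation line, one @attribute line per column, then @data followed by comma-separated rows. The sketch below shows roughly what saving a CSV as ARFF produces, assuming every column is nominal and no value contains spaces, commas, or quotes. The column names and values are made up, since the contents of OpinionPoll.csv are not in this transcript; in practice Weka does this conversion for you.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text (with a header row) to ARFF text.

    Assumption: every column is nominal and values need no quoting."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for i, name in enumerate(header):
        values = sorted({row[i] for row in data})
        lines.append("@attribute " + name + " {" + ",".join(values) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical two-column sample, not the real OpinionPoll.csv.
sample = "Source,Winner\nPollA,Democrat\nPollB,Republican\n"
arff = csv_to_arff(sample, "OpinionPoll")
print(arff)
```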

Pages 55-56:

Using Weka

Observe interaction between attributes by selecting on interface
Select one attribute here
Select another attribute here

Based on what you see, do you think the sources of the opinion polls were biased?

Pages 57-60:

Using Weka

Go to Classify Panel
Select a classifier
Select the predicted value
Start the evaluation
Observe the results

Pages 61-62:

Looking at the Results

Percent correct
Percent correct, controlling for correct by chance
Performance on individual categories
Confusion matrix

* Right click in Result list and select Save Result Buffer to save performance stats.
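The first two numbers can be recomputed from the confusion matrix. "Percent correct, controlling for correct by chance" is what Weka reports as the Kappa statistic. Below is a minimal sketch using the usual Cohen's kappa formula; the function name and the 2x2 matrix are made-up illustrations, not output from the lecture's run.

```python
def accuracy_and_kappa(confusion):
    """confusion[i][j] = number of instances of actual class i predicted as class j.

    Returns (accuracy, kappa). Kappa is undefined if chance agreement is 1."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Agreement expected by chance, from actual (row) and predicted (column) totals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(n)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical matrix: rows are actual classes, columns are predicted classes.
matrix = [[40, 10],
          [20, 30]]
acc, kappa = accuracy_and_kappa(matrix)
print(f"accuracy={acc:.2f} kappa={kappa:.2f}")
```

Here 70% of instances are correct, but half of that agreement would be expected by chance, so kappa is 0.4.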

Pages 63-64:

Notice the shape of the tree (although the text is too small to read!)

It’s making its decision based only on who the Republican candidate is.

Page 65:

Why did it do that?

Page 66:

Where will it make mistakes?

Page 67:

Notice the more complex rule if we force binary splits

Note that the more complex rule performs worse!

Page 68:

More representation issues…

“Gyre” by Eric Rosé

Page 69:

Low resolution image gives some information

Page 70:

Higher resolution image gives more information

Pages 71-72:

But not if the accuracy is bad

Question: When might that happen?

Page 73:

Low resolution gives more information if the accuracy is higher

Pages 74-75:

Assignment 1

Make sure Weka is set up properly on your machine
Know the basics of using Weka
Information about you…

Page 76:

Information about You

Learning goals
Priority on learning activities
Project goals
Programming competence

Page 77:

Cross-Validation

Pages 80-82:

If Outlook = sunny, no
else if Outlook = overcast, yes
else if Outlook = rainy and Windy = TRUE, no
else yes

Performance on training data?

Performance on testing data?

IMPORTANT! If you evaluate the performance of your rule on the same data you trained on, you won’t get an accurate estimate of how well it will do on new data.
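The rule above can be written as a function and scored on its own training data. The data table is not in this transcript; the rows below assume it is the well-known 14-row nominal weather example distributed with Weka, reduced to the two attributes the rule uses. Treat that as an assumption.

```python
def predict_play(outlook, windy):
    """The rule from the slide, written as a function."""
    if outlook == "sunny":
        return "no"
    elif outlook == "overcast":
        return "yes"
    elif outlook == "rainy" and windy:
        return "no"
    else:
        return "yes"


# (outlook, windy, play): ASSUMED to match the standard 14-row weather data.
weather = [
    ("sunny", False, "no"), ("sunny", True, "no"), ("overcast", False, "yes"),
    ("rainy", False, "yes"), ("rainy", False, "yes"), ("rainy", True, "no"),
    ("overcast", True, "yes"), ("sunny", False, "no"), ("sunny", False, "yes"),
    ("rainy", False, "yes"), ("sunny", True, "yes"), ("overcast", True, "yes"),
    ("overcast", False, "yes"), ("rainy", True, "no"),
]

correct = sum(predict_play(o, w) == play for o, w, play in weather)
print(f"{correct} of {len(weather)} training rows correct")
```

Under that assumption the rule gets 12 of 14 training rows right, consistent with the earlier "Only makes 2 mistakes!" note. This is still performance on the training data, so it says little about how the rule will do on new rows.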

Pages 83-84:

What is cross validation?

Notice that cross validation is for testing only! Not for building the rule!

Pages 85-86:

But then…

If we are satisfied with the performance estimate we get, then we build the model with the WHOLE SET.

If you are not satisfied with the performance you get, then you should try to determine what went wrong, and then evaluate a different model that compensates.

Now let’s see how it works…

Pages 87-93:

Simple Cross Validation

Let’s say your data has attributes A, B, and C
You want to train a rule to predict D
The data is split into 7 parts, numbered 1 through 7. On each fold, one part is the TEST set and the other six are the TRAIN set.

Fold  Train on          Test on  Result
1     2, 3, 4, 5, 6, 7  1        Accuracy1
2     1, 3, 4, 5, 6, 7  2        Accuracy2
3     1, 2, 4, 5, 6, 7  3        Accuracy3
4     1, 2, 3, 5, 6, 7  4        Accuracy4
5     1, 2, 3, 4, 6, 7  5        Accuracy5
6     1, 2, 3, 4, 5, 7  6        Accuracy6
7     1, 2, 3, 4, 5, 6  7        Accuracy7

Finally: Average Accuracy1 through Accuracy7
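The fold-by-fold procedure can be sketched in a few lines of Python. This is an illustration, not what Weka runs internally: the round-robin split, the 0R stand-in learner, and the seven made-up rows are all assumptions.

```python
from collections import Counter


def train_majority(rows):
    """Stand-in learner (0R): a model that always predicts the majority label."""
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    return lambda features: majority


def cross_validate(rows, k, train):
    """Average accuracy over k folds; every row is tested exactly once.

    Assumes len(rows) >= k so that no fold is empty."""
    folds = [rows[i::k] for i in range(k)]  # simple round-robin split
    accuracies = []
    for i, test in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train_rows = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = train(train_rows)
        correct = sum(model(features) == label for features, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


# Hypothetical data: 7 rows of ((A, B, C), D), one row per fold.
data = [((0, 0, 0), "yes"), ((0, 1, 0), "yes"), ((1, 0, 1), "no"),
        ((1, 1, 0), "yes"), ((0, 0, 1), "no"), ((1, 1, 1), "yes"),
        ((0, 1, 1), "yes")]

estimate = cross_validate(data, k=7, train=train_majority)
print(f"estimated accuracy: {estimate:.3f}")

# Cross-validation only produces the estimate. The model you keep
# is trained afterwards on the whole set.
final_model = train_majority(data)
```

Each fold trains a fresh model, so none of the seven fold models is kept. The averaged accuracy is the estimate for the final model trained on all of the data.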

Page 94:

Remember!

If we are satisfied with the performance estimate we get using cross-validation, then we build the model with the WHOLE SET

We don’t use cross-validation to build the model

Page 95:

Why do we do cross validation?

Use cross-validation when you do not have enough data to have completely independent train and test sets

We are trying to estimate what performance you would get if you trained over your whole set and applied that model to an independent set of the same size

We compute that estimate by averaging over folds

Page 96:

Do we have to do all of the folds?

Yes!
The test set on each fold is too small to give you an accurate estimate of performance alone
Variation across folds
Evaluation over part of the data is likely to be misleading

Page 97:

Why do we do cross validation?

Makes the most of your data – large portion used for training
Avoids testing on training data
Testing on training data will overestimate your performance!

But if you do multiple iterations of cross-validation, in some ways you are using insights from your testing data in building your model

Page 98:

Questions about cross-validation from in-person students…

How do you decide how many folds?
How is data divided between folds?
Don’t you need to have a hold-out set to be totally sure you have a good estimate of performance?

Page 99:

Other questions from in-person students…

Do our class projects have to be classification problems per se?
Clustering of pen stroke data
Will we learn to work with time series data in this course?

Page 100:

Questions?