LECTURE 02: EVALUATING MODELS
January 27, 2016
SDS 293: Machine Learning
Announcements / Questions
• Life Sciences and Technology Fair is tomorrow:
3:30-6pm in the Carroll Room
www.smith.edu/lazaruscenter/fairs_scitech.php
• Office hours: does anyone have a conflict?
Outline
• Evaluating Models
• Lab pt. 1 – Introduction to R:
  - Basic Commands
  - Graphics Overview
  - Indexing Data
  - Loading Data
  - Additional Graphical/Numerical Summaries
• Lab pt. 2 – Exploring other datasets (time permitting)
Beyond LR
Stated goal of this course:
explore methods that go beyond standard linear regression
One tool to rule them all…?
Question: why not just teach you the best one first?
Answer: it depends
• No single method dominates all the others
• On a particular data set, for a particular question, one specific method may work well; on a related but not identical dataset or question, another might be better
• Choosing the right approach is arguably the most challenging aspect of doing statistics in practice
• So how do we do it?
Measuring “Quality of Fit”
• One question we might ask: how well do my model’s predictions actually match the observations?
• What we need: a way to measure how close the predicted response is to the true response
• Flashback to your stats training: what do we use in regression?
Mean Squared Error

MSE = (1/n) Σ_{i=1}^{n} (y_i − f̂(x_i))²

where y_i is the true response for the ith observation, f̂(x_i) is the prediction our model gives for the ith observation, and we take the average over all observations of the squared difference.
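The definition above translates directly into code. Here is a minimal sketch in Python (the lab itself uses R, but the notebook setup runs Python too); the data values are made up purely for illustration:

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences
    between the true responses and the model's predictions."""
    assert len(y_true) == len(y_pred)
    n = len(y_true)
    return sum((y - yhat) ** 2 for y, yhat in zip(y_true, y_pred)) / n

# Toy example: true responses and a model's predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.0, 9.5]
print(mse(y_true, y_pred))  # 0.4375
```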
“Training” MSE
• This version of MSE is computed using the training data that was used to fit the model
• Reality check: is this what we care about?
Test MSE
• Better plan: see how well the model does on observations we didn’t train on
• Given some never-before-seen examples, we can just calculate the MSE on those using the same method
• But what if we don’t have any new observations to test?
  - Can we just use the training MSE?
  - Why or why not?
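To make the pitfall concrete, here is a small Python sketch (an illustration, not from the slides) contrasting training and test MSE for a maximally flexible model: a 1-nearest-neighbor regressor, which can memorize its training set. The data values are invented for the example:

```python
def mse(y_true, y_pred):
    n = len(y_true)
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n

def knn1_predict(train_x, train_y, x):
    """1-nearest-neighbor prediction: return the response of the
    closest training point."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

# Made-up training and test data from a noisy linear trend
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.2, 1.9, 3.3, 3.8, 5.1]
test_x  = [1.5, 2.5, 3.5, 4.5]
test_y  = [1.6, 2.4, 3.6, 4.4]

train_mse = mse(train_y, [knn1_predict(train_x, train_y, x) for x in train_x])
test_mse  = mse(test_y,  [knn1_predict(train_x, train_y, x) for x in test_x])

print(train_mse)  # 0.0: the model reproduces its training set exactly
print(test_mse)   # strictly positive: only test MSE reveals the real error
```

The training MSE of 0 says nothing about how the model generalizes, which is exactly why we cannot substitute it for test MSE.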
Example [figure: average training MSE vs. test MSE]
Training vs. Test MSE
• As the flexibility of the statistical learning method increases, we observe:
  - a monotone decrease in the training MSE
  - a U-shape in the test MSE
• Fun fact: this occurs regardless of the data set and statistical method being used
• As flexibility increases, training MSE will decrease, but the test MSE may not (overfitting)
Trade-off between bias and variance
• The U-shaped curve in the Test MSE is the result of two competing properties: bias and variance
• Variance refers to the amount by which the model would change if we estimated it using different training data
• Bias refers to the error that is introduced by approximating a real-life problem (which may be extremely complicated) using a much simpler model
Relationship between bias and variance
• In general, more flexible methods have higher variance
Relationship between bias and variance
• In general, more flexible methods have lower bias
Trade-off between bias and variance
• It is possible to show that the expected test MSE for a given test value x₀ can be decomposed into three terms:

E[(y₀ − f̂(x₀))²] = Var(f̂(x₀)) + [Bias(f̂(x₀))]² + Var(ε)

the variance of our model on the test value, the squared bias of our model on the test value, and the variance of the error terms.
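The decomposition can be checked by simulation. The Python sketch below (an illustration, not from the slides) repeatedly refits a deliberately rigid model, one that always predicts the training mean, on fresh training sets drawn from a known linear model, then compares the directly estimated expected test MSE at a fixed point against variance + squared bias + Var(ε):

```python
import random

random.seed(0)

def true_f(x):
    return 2.0 * x  # the (normally unknown) true regression function

NOISE_SD = 1.0          # sd of the irreducible error terms epsilon
x0 = 2.5                # fixed test point
f_x0 = true_f(x0)       # true mean response at x0

def sample_train(n=20):
    """Draw a fresh training set from the true model plus noise."""
    xs = [random.uniform(0, 3) for _ in range(n)]
    ys = [true_f(x) + random.gauss(0, NOISE_SD) for x in xs]
    return xs, ys

def fit_mean_model(xs, ys):
    """A deliberately rigid model: always predict the training mean
    (low variance, but biased away from the center of the data)."""
    m = sum(ys) / len(ys)
    return lambda x: m

# Refit on many independent training sets; record the prediction at x0
preds = []
for _ in range(2000):
    xs, ys = sample_train()
    preds.append(fit_mean_model(xs, ys)(x0))

mean_pred = sum(preds) / len(preds)
variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
bias_sq = (mean_pred - f_x0) ** 2

# Estimate the expected test MSE at x0 directly, with fresh noise each time
errs = [(f_x0 + random.gauss(0, NOISE_SD) - p) ** 2 for p in preds]
exp_test_mse = sum(errs) / len(errs)

# The two sides of the decomposition should agree up to simulation noise
print(exp_test_mse)
print(variance + bias_sq + NOISE_SD ** 2)
```

For this rigid model the squared-bias term dominates; swapping in a more flexible fit would shrink the bias and grow the variance, which is the trade-off the slides describe.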
Balancing bias and variance
• We know variance and squared bias are always nonnegative (why?)
• There’s nothing we can do about Var(ε): the irreducible error is inherent in the problem
• So we’re looking for a method that minimizes the sum of the first two terms… which are (in some sense) competing
Balancing bias and variance
• It’s easy to build a model with low variance but high bias (how?)
• Just as easy to build one with low bias but high variance (how?)
• The challenge: finding a method for which both the variance and the squared bias are low
• This trade-off is one of the most important recurring themes in this course
What about classification?
• So far, we’ve only talked about how to evaluate the accuracy of a regression model
• The idea of a bias-variance trade-off also translates to the classification setting, but we need some minor modifications to deal with qualitative responses
• For example: we can’t really compute MSE without numerical values, so what can we do instead?
Training error rate

• One common approach is to use the training error rate: the proportion of the times our model incorrectly classifies a training data point:

(1/n) Σ_{i=1}^{n} I(y_i ≠ ŷ_i)

Using an indicator function I, we tally up all the times where the model’s classification ŷ_i was different from the actual class y_i, and take the average.
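As with MSE, the error rate follows directly from the definition. A small Python sketch with made-up labels:

```python
def training_error_rate(y_true, y_pred):
    """Proportion of points the model misclassifies: the average
    of the indicator I(y_i != yhat_i)."""
    n = len(y_true)
    return sum(1 for y, yhat in zip(y_true, y_pred) if y != yhat) / n

# Toy labels: the model gets 2 of 8 classifications wrong
actual    = ["cat", "cat", "dog", "dog", "cat", "dog", "cat", "dog"]
predicted = ["cat", "dog", "dog", "dog", "cat", "cat", "cat", "dog"]
print(training_error_rate(actual, predicted))  # 0.25
```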
Takeaways
• Choosing the “right” level of flexibility is critical for success in both the regression and classification settings
• The bias-variance tradeoff can make this a difficult task
• In Chapter 5, we’ll return to this topic and explore various methods for estimating test error rates
• We’ll then use these estimates to find the optimal level of flexibility for a given ML method
Questions?
Lab pt. 1: Introduction to R
• Basic Commands
• Graphics
• Indexing data
• Loading external data
• Generating summaries
• Playing with real data (time permitting!)
Lab pt. 1: Introduction to R
• Today’s walkthrough (and likely many others) will be run using:
which allows me to build “notebooks” that run live R code (Python, too!) in the browser
• Hint: this is also a nice way to format your homework!
Lab pt. 2: Exploring Other Datasets
• More datasets from the book – the ISLR package
  - Already installed on the Smith RStudio server
  - Working locally? > install.packages('ISLR')
  - Details available at: cran.r-project.org/web/packages/ISLR
  - Dataset descriptions: www.inside-r.org/packages/cran/ISLR/docs
• Real-world data:
  - Olympic Athletes: goo.gl/1aUnJW
  - World Bank Indicators: goo.gl/0QdN9U
  - Airplane Bird Strikes: goo.gl/lFl5ld
  - …and a whole bunch more: goo.gl/kcbqfc
Coming Up
• Next class: Linear Regression 1: Simple and Multiple LR
• For planning purposes: Assignment 1 will be posted next week and will be due the following Wednesday (Feb. 10)