l10 bias variance naive bayes - virginia techf16ece5424/slides/l10_bias_variance... · • matlab...
TRANSCRIPT
![Page 1: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/1.jpg)
ECE 5424: Introduction to Machine Learning
Stefan LeeVirginia Tech
Topics: – (Finish) Model selection– Error decomposition
– Bias-Variance Tradeoff– Classification: Naïve Bayes
Readings: Barber 17.1, 17.2, 10.1-10.3
![Page 2: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/2.jpg)
Administrativia• HW2
– Due: Wed 09/28, 11:55pm– Implement linear regression, Naïve Bayes, Logistic Regression
• Project Proposal due tomorrow by 11:55pm!!
(C) Dhruv Batra 2
![Page 3: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/3.jpg)
Recap of last time
(C) Dhruv Batra 3
![Page 4: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/4.jpg)
Regression
(C) Dhruv Batra 4
![Page 5: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/5.jpg)
(C) Dhruv Batra 5Slide Credit: Greg Shakhnarovich
![Page 6: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/6.jpg)
(C) Dhruv Batra 6Slide Credit: Greg Shakhnarovich
![Page 7: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/7.jpg)
(C) Dhruv Batra 7Slide Credit: Greg Shakhnarovich
![Page 8: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/8.jpg)
What you need to know• Linear Regression
– Model– Least Squares Objective– Connections to Max Likelihood with Gaussian Conditional– Robust regression with Laplacian Likelihood– Ridge Regression with priors– Polynomial and General Additive Regression
(C) Dhruv Batra 8
![Page 9: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/9.jpg)
Plan for Today• (Finish) Model Selection
– Overfitting vs Underfitting– Bias-Variance trade-off
• aka Modeling error vs Estimation error tradeoff
• Naïve Bayes
(C) Dhruv Batra 9
![Page 10: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/10.jpg)
New Topic: Model Selection and Error Decomposition
(C) Dhruv Batra 10
![Page 11: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/11.jpg)
Model Selection• How do we pick the right model class?
• Similar questions– How do I pick magic hyper-parameters?– How do I do feature selection?
(C) Dhruv Batra 11
![Page 12: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/12.jpg)
Errors• Expected Loss/Error
• Training Loss/Error • Validation Loss/Error• Test Loss/Error
• Reporting Training Error (instead of Test) is CHEATING
• Optimizing parameters on Test Error is CHEATING
(C) Dhruv Batra 12
![Page 13: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/13.jpg)
(C) Dhruv Batra 13Slide Credit: Greg Shakhnarovich
![Page 14: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/14.jpg)
(C) Dhruv Batra 14Slide Credit: Greg Shakhnarovich
![Page 15: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/15.jpg)
(C) Dhruv Batra 15Slide Credit: Greg Shakhnarovich
![Page 16: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/16.jpg)
(C) Dhruv Batra 16Slide Credit: Greg Shakhnarovich
![Page 17: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/17.jpg)
(C) Dhruv Batra 17Slide Credit: Greg Shakhnarovich
![Page 18: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/18.jpg)
Overfitting• Overfitting: a learning algorithm overfits the training data if it outputs a solution w when there exists another solution w’ such that:
18(C) Dhruv Batra Slide Credit: Carlos Guestrin
![Page 19: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/19.jpg)
Error Decomposition
(C) Dhruv Batra 19
Reality
![Page 20: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/20.jpg)
Error Decomposition
(C) Dhruv Batra 20
Reality
![Page 21: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/21.jpg)
Error Decomposition
(C) Dhruv Batra 21
Reality
Higher-OrderPotentials
![Page 22: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/22.jpg)
Error Decomposition• Approximation/Modeling Error
– You approximated reality with model
• Estimation Error– You tried to learn model with finite data
• Optimization Error– You were lazy and couldn’t/didn’t optimize to completion
• Bayes Error– Reality just sucks (i.e. there is a lower bound on error for all models, usually non-zero)
(C) Dhruv Batra 22
![Page 23: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/23.jpg)
Bias-Variance Tradeoff• Bias: difference between what you expect to learn and truth– Measures how well you expect to represent true solution– Decreases with more complex model
• Variance: difference between what you expect to learn and what you learn from a from a particular dataset – Measures how sensitive learner is to specific dataset– Increases with more complex model
(C) Dhruv Batra 23Slide Credit: Carlos Guestrin
![Page 24: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/24.jpg)
Bias-Variance Tradeoff• Matlab demo
(C) Dhruv Batra 24
![Page 25: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/25.jpg)
Bias-Variance Tradeoff
• Choice of hypothesis class introduces learning bias– More complex class → less bias– More complex class → more variance
25(C) Dhruv Batra Slide Credit: Carlos Guestrin
![Page 26: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/26.jpg)
(C) Dhruv Batra 26Slide Credit: Greg Shakhnarovich
![Page 27: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/27.jpg)
Learning Curves• Error vs size of dataset
• On board– High-bias curves– High-variance curves
(C) Dhruv Batra 27
![Page 28: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/28.jpg)
Debugging Machine Learning• My algorithm doesn’t work
– High test error
• What should I do?– More training data– Smaller set of features– Larger set of features– Lower regularization– Higher regularization
(C) Dhruv Batra 28
![Page 29: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/29.jpg)
What you need to know• Generalization Error Decomposition
– Approximation, estimation, optimization, bayes error– For squared losses, bias-variance tradeoff
• Errors– Difference between train & test error & expected error– Cross-validation (and cross-val error)– NEVER EVER learn on test data
• Overfitting vs Underfitting
(C) Dhruv Batra 29
![Page 30: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/30.jpg)
New Topic: Naïve Bayes
(your first probabilistic classifier)
(C) Dhruv Batra 30
Classificationx y Discrete
![Page 31: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/31.jpg)
Classification• Learn: h:X! Y
– X – features– Y – target classes
• Suppose you know P(Y|X) exactly, how should you classify?– Bayes classifier:
• Why?
Slide Credit: Carlos Guestrin
![Page 32: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/32.jpg)
Optimal classification
• Theorem: Bayes classifier hBayes is optimal!
– That is
• Proof:
Slide Credit: Carlos Guestrin
![Page 33: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/33.jpg)
Generative vs. Discriminative
• Generative Approach– Estimate p(x|y) and p(y)– Use Bayes Rule to predict y
• Discriminative Approach– Estimate p(y|x) directly OR– Learn “discriminant” function h(x)
(C) Dhruv Batra 33
![Page 34: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/34.jpg)
Generative vs. Discriminative• Generative Approach
– Assume some functional form for P(X|Y), P(Y)– Estimate p(X|Y) and p(Y)– Use Bayes Rule to calculate P(Y| X=x)– Indirect computation of P(Y|X) through Bayes rule– But, can generate a sample, P(X) = �y P(y) P(X|y)
• Discriminative Approach– Estimate p(y|x) directly OR– Learn “discriminant” function h(x)– Direct but cannot obtain a sample of the data, because P(X) is not available
(C) Dhruv Batra 34
![Page 35: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/35.jpg)
Generative vs. Discriminative• Generative:
– Today: Naïve Bayes
• Discriminative:– Next: Logistic Regression
• NB & LR related to each other.
(C) Dhruv Batra 35
![Page 36: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/36.jpg)
How hard is it to learn the optimal classifier?
Slide Credit: Carlos Guestrin
• Categorical Data
• How do we represent these? How many parameters?– Class-Prior, P(Y):
• Suppose Y is composed of k classes
– Likelihood, P(X|Y):• Suppose X is composed of d binary features
• Complex model à High variance with limited data!!!
![Page 37: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/37.jpg)
Independence to the rescue
(C) Dhruv Batra 37Slide Credit: Sam Roweis
![Page 38: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/38.jpg)
The Naïve Bayes assumption• Naïve Bayes assumption:
– Features are independent given class:
– More generally:
• How many parameters now?• Suppose X is composed of d binary features
Slide Credit: Carlos Guestrin(C) Dhruv Batra 38
d
![Page 39: L10 bias variance naive bayes - Virginia Techf16ece5424/slides/L10_bias_variance... · • Matlab demo (C)#DhruvBatra# 24. BiasDVariance#Tradeoff • Choice#of#hypothesisclassintroduceslearning#bias](https://reader031.vdocuments.site/reader031/viewer/2022022116/5c853edd09d3f2b2468bbfea/html5/thumbnails/39.jpg)
The Naïve Bayes Classifier
Slide Credit: Carlos Guestrin(C) Dhruv Batra 39
• Given:– Class-Prior P(Y)– d conditionally independent features X given the class Y– For each Xi, we have likelihood P(Xi|Y)
• Decision rule:
• If assumption holds, NB is optimal classifier!