ECE 5984: Introduction to Machine Learning, Lecture L20 (April 20, 2015)
TRANSCRIPT
ECE 5984: Introduction to Machine Learning
Dhruv Batra Virginia Tech
Topics: – Ensemble Methods: Bagging, Boosting
Readings: Murphy 16.4; Hastie 16
Administrativia • HW3
– Due: April 14, 11:55pm
– You will implement primal & dual SVMs
– Kaggle competition: Higgs Boson Signal vs Background classification
– https://inclass.kaggle.com/c/2015-Spring-vt-ece-machine-learning-hw3
– https://www.kaggle.com/c/higgs-boson
(C) Dhruv Batra 2
Administrativia • Project Mid-Sem Spotlight Presentations
– 9 remaining – Resume in class on April 20th
– Format • 5 slides (recommended) • 4 minute time (STRICT) + 1-2 min Q&A
– Content • Tell the class what you’re working on • Any results yet? • Problems faced?
– Upload slides on Scholar
New Topic: Ensemble Methods
Image Credit: Antonio Torralba
Bagging Boosting
Synonyms • Ensemble Methods
• Learning Mixture of Experts/Committees
• Boosting types – AdaBoost – L2Boost – LogitBoost – <Your-Favorite-keyword>Boost
A quick look back • So far you have learnt
• Regression – Least Squares – Robust Least Squares
• Classification – Linear
• Naïve Bayes • Logistic Regression • SVMs
– Non-linear • Decision Trees • Neural Networks • K-NNs
Recall Bias-Variance Tradeoff • Demo
– http://www.princeton.edu/~rkatzwer/PolynomialRegression/
Bias-Variance Tradeoff
• Choice of hypothesis class introduces learning bias – More complex class → less bias – More complex class → more variance
Slide Credit: Carlos Guestrin
Fighting the bias-variance tradeoff • Simple (a.k.a. weak) learners
– e.g., naïve Bayes, logistic regression, decision stumps (or shallow decision trees)
– Good: Low variance, don’t usually overfit – Bad: High bias, can’t solve hard learning problems
• Sophisticated learners – Kernel SVMs, Deep Neural Nets, Deep Decision Trees – Good: Low bias, have the potential to learn with Big Data – Bad: High variance, difficult to generalize
• Can we combine these properties? – In general, no! – But often, yes…
Slide Credit: Carlos Guestrin
Voting (Ensemble Methods) • Instead of learning a single classifier, learn many classifiers

• Output class: (weighted) vote of each classifier – Classifiers that are most “sure” will vote with more conviction

• With sophisticated learners – Uncorrelated errors → expected error goes down – On average, do better than a single classifier! – Bagging
• With weak learners – each one good at different parts of the input space – On average, do better than single classifier! – Boosting
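The weighted vote described above can be sketched in a few lines. This is a minimal illustration for {-1, +1} labels; the function and variable names are mine, not from the slides:

```python
import numpy as np

def weighted_vote(predictions, alphas):
    """Combine {-1, +1} predictions from several classifiers by a weighted vote.

    predictions: shape (n_classifiers, n_points)
    alphas: per-classifier weights ("conviction"); larger = more sure.
    """
    predictions = np.asarray(predictions, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    scores = alphas @ predictions   # weighted sum of votes for each point
    return np.sign(scores)

# Three classifiers vote on two points; the confident one (alpha = 2) wins.
preds = [[+1, -1],
         [-1, -1],
         [-1, +1]]
print(weighted_vote(preds, [2.0, 0.5, 0.5]))
```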
Bagging • Bagging = Bootstrap Aggregating (train on bootstrap samples, average the outputs)
– On board
– Bootstrap demo: http://wise.cgu.edu/bootstrap/
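Since the algorithm itself is done on the board, here is one way the idea could be sketched in code: draw bootstrap samples (with replacement, same size as the data), fit one model per sample, and average. Using scikit-learn regression trees as the base model is my assumption; the slides don't prescribe one:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X, y, X_test, n_models=25, seed=0):
    """Bagging sketch: average predictions of models trained on
    bootstrap resamples of (X, y). Averaging reduces variance when
    the individual models' errors are not perfectly correlated."""
    rng = np.random.default_rng(seed)
    n = len(X)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # bootstrap sample indices
        tree = DecisionTreeRegressor().fit(X[idx], y[idx])
        preds.append(tree.predict(X_test))
    return np.mean(preds, axis=0)          # the "averaging" in bagging
```

A single deep tree would fit the noise; the bagged average is visibly smoother on noisy data.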
Decision Forests
• Learn many trees & average their outputs
• Will formally visit this in the Bagging lecture
Boosting [Schapire, 1989]

• Pick a class of weak learners: H = {h | h : X → Y}

• You have a black box that picks the best weak learner, by either an
– unweighted sum: h* = argmin_{h∈H} Σ_i L(y_i, h(x_i))
– or a weighted sum: h* = argmin_{h∈H} Σ_i w_i L(y_i, h(x_i))

• On each iteration t
– Compute the error of each point
– Update the weight of each training example based on its error
– Learn a hypothesis h_t
– and a strength for this hypothesis α_t

• Final classifier: f_t(x) = f_{t-1}(x) + α_t h_t(x), where f_{t-1}(x_i) = Σ_{t'=1}^{t-1} α_{t'} h_{t'}(x_i)
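The iteration above can be sketched as a generic loop. The function names, the fixed round count, and the exponential re-weighting rule (the AdaBoost-style choice for classification) are my assumptions; the slide leaves the black box and the weight update abstract:

```python
import numpy as np

def boost(X, y, weak_fit, get_alpha, n_rounds):
    """Generic boosting skeleton for labels y in {-1, +1}.

    weak_fit(X, y, w)      -> hypothesis h_t fit to weighted data (the "black box")
    get_alpha(h, X, y, w)  -> strength alpha_t for that hypothesis
    """
    n = len(y)
    w = np.full(n, 1.0 / n)               # start with uniform example weights
    hypotheses, alphas = [], []
    for _ in range(n_rounds):
        h = weak_fit(X, y, w)             # learn h_t against current weights
        a = get_alpha(h, X, y, w)         # strength alpha_t
        # re-weight: examples the new hypothesis gets wrong gain weight
        w = w * np.exp(-a * y * h(X))
        w = w / w.sum()
        hypotheses.append(h)
        alphas.append(a)
    # final classifier: sign of f_T(x) = sum_t alpha_t h_t(x)
    return lambda Xq: np.sign(sum(a * h(Xq) for h, a in zip(hypotheses, alphas)))
```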
Boosting • Demo
– Matlab demo by Antonio Torralba
– http://people.csail.mit.edu/torralba/shortCourseRLOC/boosting/boosting.html
Types of Boosting
| Loss Name | Loss Formula | Boosting Name |
|---|---|---|
| Regression: Squared Loss | (y − f(x))² | L2Boosting |
| Regression: Absolute Loss | \|y − f(x)\| | Gradient Boosting |
| Classification: Exponential Loss | e^(−y f(x)) | AdaBoost |
| Classification: Log/Logistic Loss | log(1 + e^(−y f(x))) | LogitBoost |
L2 Boosting
• Algorithm – On Board
| Loss Name | Loss Formula | Boosting Name |
|---|---|---|
| Regression: Squared Loss | (y − f(x))² | L2Boosting |
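Since the algorithm itself goes on the board, here is a runnable sketch of L2Boosting: each round fits a weak regressor to the current residuals y − f(x), because the negative gradient of the squared loss (y − f(x))²/2 is exactly the residual. Using shallow scikit-learn trees and a small shrinkage factor as the weak learner is my assumption:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def l2_boost(X, y, n_rounds=50, lr=0.1, depth=2):
    """L2Boosting sketch under squared loss: greedily add weak
    regressors fit to the residuals, f_t = f_{t-1} + lr * h_t."""
    f = np.zeros(len(y))                 # f_0 = 0
    models = []
    for _ in range(n_rounds):
        resid = y - f                    # residuals = negative gradient
        h = DecisionTreeRegressor(max_depth=depth).fit(X, resid)
        f += lr * h.predict(X)           # additive update
        models.append(h)
    return lambda Xq: lr * sum(h.predict(Xq) for h in models)
```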
AdaBoost
• Algorithm – You will derive in HW4!
• Guaranteed to achieve zero training error – With infinite rounds of boosting (assuming no label noise) – Need to do early stopping
| Loss Name | Loss Formula | Boosting Name |
|---|---|---|
| Classification: Exponential Loss | e^(−y f(x)) | AdaBoost |
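The derivation is left for HW4; as a preview, here is a sketch of the standard AdaBoost update rules for labels in {-1, +1}. Using scikit-learn decision stumps as the weak learner is my assumption:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    """AdaBoost sketch (exponential loss); labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(n_rounds):
        # weak learner: a depth-1 tree fit to the weighted data
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # strength of this round
        w = w * np.exp(-alpha * y * pred)       # up-weight the mistakes
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return lambda Xq: np.sign(sum(a * s.predict(Xq)
                                  for s, a in zip(stumps, alphas)))
```

With no label noise and enough rounds the weighted errors keep shrinking, which is why early stopping is needed in practice.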
What you should know • Voting/Ensemble methods
• Bagging – How to sample – Under what conditions is error reduced
• Boosting – General outline – L2Boosting derivation – AdaBoost derivation
Learning Theory
Probably Approximately Correct (PAC) Learning What does it formally mean to learn?
Learning Theory • We have explored many ways of learning from data
• But… – How good is our classifier, really? – How much data do I need to make it “good enough”?
Slide Credit: Carlos Guestrin
A simple setting… • Classification
– N data points
– Finite space H of possible hypotheses
• e.g. decision trees of depth d

• A learner finds a hypothesis h that is consistent with the training data – Gets zero error in training: error_train(h) = 0

• What is the probability that h has more than ε true error? – error_true(h) ≥ ε
Slide Credit: Carlos Guestrin
Generalization error in finite hypothesis spaces [Haussler ’88]

• Even if h makes zero errors on the training data, it may still make errors at test time.

• Theorem: for a finite hypothesis space H, a dataset D of N i.i.d. samples, and 0 < ε < 1, any learned hypothesis h that is consistent on the training data satisfies

P(error_true(h) > ε) ≤ |H| e^(−Nε)

Slide Credit: Carlos Guestrin
Using a PAC bound
• Typically, 2 use cases (set |H| e^(−Nε) ≤ δ and solve):
– 1: Pick ε and δ, get N ≥ (ln |H| + ln(1/δ)) / ε
– 2: Pick N and δ, get ε ≥ (ln |H| + ln(1/δ)) / N

Slide Credit: Carlos Guestrin
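Both use cases of the bound are one-line calculations. A sketch (the function names are mine) that solves |H| e^(−Nε) ≤ δ each way:

```python
import math

def pac_sample_size(H_size, eps, delta):
    """Use case 1: given epsilon and delta, how many samples N suffice?
    Solves |H| * exp(-N * eps) <= delta for N."""
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / eps)

def pac_error(H_size, N, delta):
    """Use case 2: given N and delta, what epsilon is guaranteed?"""
    return (math.log(H_size) + math.log(1 / delta)) / N

# e.g. 2^10 hypotheses, want true error <= 0.1 with 95% confidence
print(pac_sample_size(2**10, 0.1, 0.05))   # -> 100
```

Note the sample size grows only logarithmically in |H| and 1/δ, but linearly in 1/ε.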
Haussler ‘88 bound
• Strengths: – Holds for all (finite) H – Holds for all data distributions
• Weaknesses – Consistent classifier – Finite hypothesis space
Slide Credit: Carlos Guestrin
Generalization bound for |H| hypotheses

• Theorem: for a finite hypothesis space H, a dataset D of N i.i.d. samples, and 0 < ε < 1, any learned hypothesis h satisfies

P(error_true(h) − error_train(h) > ε) ≤ |H| e^(−2Nε²)

Slide Credit: Carlos Guestrin
PAC bound and Bias-Variance tradeoff

• Important: the PAC bound holds for all h ∈ H, but doesn’t guarantee that the algorithm finds the best h!

• Setting the bound equal to δ and moving some terms around: with probability at least 1 − δ,

error_true(h) ≤ error_train(h) + √((ln |H| + ln(1/δ)) / (2N))

Slide Credit: Carlos Guestrin
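The tradeoff is easy to see numerically: the square-root term (the "variance" side) shrinks as N grows and swells as |H| grows. A sketch computing it (function name is mine):

```python
import math

def generalization_gap(H_size, N, delta):
    """With probability >= 1 - delta, for every h in H:
    error_true(h) <= error_train(h) + this gap."""
    return math.sqrt((math.log(H_size) + math.log(1 / delta)) / (2 * N))

# Richer hypothesis class -> smaller train error but a larger gap:
for log2_H in (10, 20, 30):
    print(log2_H, round(generalization_gap(2**log2_H, N=1000, delta=0.05), 4))
```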