Ensemble Methods
Adapted from slides by Todd Holloway, http://abeautifulwww.com/2007/11/23/ensemble-machine-learning-tutorial/
Outline
• The Netflix Prize – success of ensemble methods in the Netflix Prize
• Why Ensemble Methods Work
• Algorithms
  – Bagging
  – Random forests
  – AdaBoost
Definition
Ensemble Classification
Aggregation of the predictions of multiple classifiers with the goal of improving accuracy.
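To make the definition concrete, here is a minimal sketch (not from the original slides) using scikit-learn's VotingClassifier, which aggregates the predictions of three different classifiers by majority vote; the synthetic dataset and all settings are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Aggregate three diverse classifiers by majority ("hard") vote.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
    ("tree", DecisionTreeClassifier(random_state=0)),
], voting="hard")

print(cross_val_score(ensemble, X, y, cv=5).mean())  # ensemble accuracy
```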
Teaser: How good are ensemble methods?
Let’s look at the Netflix Prize Competition…
• Supervised learning task
  – Training data is a set of users and ratings (1, 2, 3, 4, or 5 stars) those users have given to movies
  – Construct a classifier that, given a user and an unrated movie, correctly classifies that movie as either 1, 2, 3, 4, or 5 stars
• $1 million prize for a 10% improvement over Netflix’s current movie recommender/classifier (RMSE = 0.9514)
• Began October 2006
Just three weeks after it began, at least 40 teams had bested the Netflix classifier.
Top teams showed about 5% improvement.
from http://www.research.att.com/~volinsky/netflix/
However, improvement slowed…
Today, the top team has posted an 8.5% improvement.
Ensemble methods are the best performers…
“Thanks to Paul Harrison’s collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75”
Rookies
“My approach is to combine the results of many methods (also two-way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post-processed with kernel ridge regression”
Arek Paterek
http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf
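A minimal sketch of the linear blending Paterek describes, assuming we already have held-out predictions from a few base methods; the prediction matrix and ratings below are made-up numbers for illustration only.

```python
import numpy as np

# Hypothetical held-out predictions: one column per base method,
# one row per (user, movie) pair, plus the true ratings.
P = np.array([[3.8, 4.1, 3.5],
              [2.2, 2.9, 2.4],
              [4.6, 4.4, 4.9],
              [1.9, 1.5, 2.1]])
y = np.array([4.0, 2.0, 5.0, 2.0])

# Least-squares linear regression finds the blending weights.
weights, *_ = np.linalg.lstsq(P, y, rcond=None)
blend = P @ weights

rmse = np.sqrt(np.mean((blend - y) ** 2))
print(weights, rmse)  # the blend typically beats any single column
```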
“When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix’s own system.”
U of Toronto
http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf
Gravity
home.mit.bme.hu/~gtakacs/download/gravity.pdf
“Our common team blends the result of team Gravity and team Dinosaur Planet.”
Might have guessed from the name…
When Gravity and Dinosaurs Unite
And, yes, the top team, from AT&T…
“Our final solution (RMSE = 0.8712) consists of blending 107 individual results.”
BellKor / KorBell
The winner was an ensemble of ensembles (including BellKor).
Some Intuitions on Why Ensemble Methods Work…
Intuitions
• Utility of combining diverse, independent opinions in human decision-making
  – Protective mechanism (e.g. stock portfolio diversity)
• Violation of Ockham’s Razor
  – Identifying the best model requires identifying the proper "model complexity"
See Domingos, P. Occam’s two razors: the sharp and the blunt. KDD. 1998.
Intuitions
Majority vote: suppose we have 5 completely independent classifiers…
– If accuracy is 70% for each, the majority vote is correct with probability
  10(.7^3)(.3^2) + 5(.7^4)(.3) + (.7^5) = 83.7%
– With 101 such classifiers, majority vote accuracy rises to 99.9% (verified in the sketch below)
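These numbers follow directly from the binomial distribution; a short sketch to check them:

```python
from math import comb

def majority_vote_accuracy(n, p):
    """P(majority of n independent classifiers is correct), each with accuracy p."""
    k_min = n // 2 + 1  # smallest winning number of correct votes (n odd)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

print(majority_vote_accuracy(5, 0.7))    # 0.83692... -> the 83.7% above
print(majority_vote_accuracy(101, 0.7))  # ~0.9999   -> the 99.9% above
```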
Strategies
Bagging
– Use different samples of observations and/or predictors (features) of the examples to generate diverse classifiers
– Aggregate classifiers: average in regression, majority vote in classification
Boosting
– Make examples currently misclassified more important (or less, in some cases)
Bagging (Constructing for Diversity)
1. Use random samples of the examples to construct the classifiers
2. Use random feature sets to construct the classifiers
   • Random Decision Forests
• Bagging: Bootstrap Aggregation
Leo Breiman
• Bootstrap: consider the following situation:
  – A random sample $\mathbf{x} = (x_1, \ldots, x_N)$ from an unknown probability distribution $F$
  – We wish to estimate a parameter $\theta = t(F)$
  – We build an estimate $\hat{\theta} = s(\mathbf{x})$
  – What is the standard deviation of $\hat{\theta}$?
Examples: 1) estimate the mean and s.d. of expected prediction error; 2) estimate point-wise confidence bands in smoothing
• Bootstrap:
  – It is completely automatic
  – Requires no theoretical calculations
  – Not based on asymptotic results
  – Available regardless of how complicated the estimator $\hat{\theta}$ is
• Bootstrap algorithm:
  1. Select $B$ independent bootstrap samples $\mathbf{x}^{*1}, \ldots, \mathbf{x}^{*B}$, each consisting of $N$ data values drawn with replacement from $\mathbf{x}$
  2. Evaluate the bootstrap replication corresponding to each bootstrap sample: $\hat{\theta}^*(b) = s(\mathbf{x}^{*b})$, $b = 1, \ldots, B$
  3. Estimate the standard error of $\hat{\theta}$ using the sample standard error of the $B$ estimates
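A minimal NumPy sketch of these three steps, estimating the standard error of the sample mean; the data, the statistic, and B are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100)  # a sample from "unknown" F

def bootstrap_se(x, stat, B=1000, rng=rng):
    """Standard error of stat(x) estimated from B bootstrap replications."""
    n = len(x)
    # Steps 1 + 2: draw each bootstrap sample with replacement, evaluate stat.
    reps = np.array([stat(x[rng.integers(0, n, size=n)]) for _ in range(B)])
    return reps.std(ddof=1)  # step 3: sample standard error of the B estimates

print(bootstrap_se(x, np.mean))         # bootstrap estimate
print(x.std(ddof=1) / np.sqrt(len(x)))  # textbook formula, for comparison
```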
• Bagging: use the bootstrap to improve predictions
  1. Create bootstrap samples; estimate the model from each bootstrap sample
  2. Aggregate predictions (average if regression, majority vote if classification)
• This works best when perturbing the training set can cause significant changes in the estimated model (a sketch follows below)
• For instance, for least squares one can show that variance is decreased while bias is unchanged
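A brief sketch with scikit-learn's BaggingClassifier, whose default base estimator is a decision tree (a high-variance model that benefits from this perturbation); the synthetic data and number of estimators are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

tree = DecisionTreeClassifier(random_state=0)             # one high-variance tree
bag = BaggingClassifier(n_estimators=50, random_state=0)  # 50 bootstrap trees

print(cross_val_score(tree, X, y, cv=5).mean())  # single tree
print(cross_val_score(bag, X, y, cv=5).mean())   # bagged ensemble, usually higher
```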
Random forests
• At every level, choose a random subset of the variables (predictors, not examples) and choose the best split among those attributes
Random forests
• Let the number of training points be M, and the number of variables in the classifier be N.
• For each tree,
  1. Choose a training set by drawing M times with replacement from all M available training cases.
  2. For each node, randomly choose m variables on which to base the decision at that node. Calculate the best split based on these.
Random forests
• Grow each tree as deep as possible; no pruning!
• Out-of-bag data can be used to estimate cross-validation error
  – For each training point, get a prediction by averaging the trees where that point is not included in the bootstrap sample
• Variable importance measures are easy to calculate (both are shown in the sketch below)
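A short sketch with scikit-learn's RandomForestClassifier showing the out-of-bag error estimate and the variable importance measures; the synthetic data and hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)

# max_features is m, the random subset of variables tried at each split;
# trees are grown fully (no pruning) by default.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            oob_score=True, random_state=0)
rf.fit(X, y)

print(rf.oob_score_)            # out-of-bag accuracy (a free CV-like estimate)
print(rf.feature_importances_)  # variable importance, one value per predictor
```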
Boosting
1. Create a sequence of classifiers, giving higher influence to more accurate classifiers
2. At each iteration, make examples currently misclassified more important (they get a larger weight in the construction of the next classifier)
3. Then combine classifiers by weighted vote (weight given by classifier accuracy)
AdaBoost Algorithm
1. Initialize weights: each case gets the same weight, $w_i = 1/N$, $i = 1, \ldots, N$
2. Construct a classifier $g_m$ using the current weights. Compute its weighted error:
   $\varepsilon_m = \frac{\sum_i w_i \, I\{y_i \neq g_m(x_i)\}}{\sum_i w_i}$
3. Compute the classifier influence $\alpha_m = \log\left(\frac{1 - \varepsilon_m}{\varepsilon_m}\right)$ and update the example weights:
   $w_i \leftarrow w_i \times \exp\{\alpha_m I\{y_i \neq g_m(x_i)\}\}$
4. Go to step 2…
Final prediction is a weighted vote, with weights $\alpha_m$.
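The algorithm above translates almost line for line into NumPy; a minimal sketch using decision stumps as the base classifiers $g_m$ (labels mapped to ±1; 20 rounds is an arbitrary choice):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
y = np.where(y == 1, 1, -1)              # labels in {-1, +1}

N, M = len(y), 20
w = np.full(N, 1.0 / N)                  # step 1: equal weights w_i = 1/N
stumps, alphas = [], []

for m in range(M):
    g = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    miss = g.predict(X) != y             # I{y_i != g_m(x_i)}
    eps = w[miss].sum() / w.sum()        # step 2: weighted error
    alpha = np.log((1 - eps) / eps)      # step 3: classifier influence
    w = w * np.exp(alpha * miss)         # upweight misclassified examples
    stumps.append(g)
    alphas.append(alpha)

# Final prediction: vote weighted by each classifier's influence alpha_m
F = sum(a * g.predict(X) for a, g in zip(alphas, stumps))
print((np.sign(F) == y).mean())          # training accuracy of the ensemble
```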
[Figure: classifications (colors) and weights (size) after 1, 3, and 20 iterations of AdaBoost]
from Elder, John. From Trees to Forests and Rule Sets - A Unified Overview of Ensemble Methods. 2007.
AdaBoost
• Advantages
  – Very little code
  – Reduces variance
• Disadvantages
  – Sensitive to noise and outliers. Why?
How was the Netflix prize won?
Gradient boosted decision trees…
Details: http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf
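The winning system was far more elaborate, but a minimal sketch with scikit-learn's GradientBoostingRegressor conveys the core idea: each new tree fits the residual errors of the ensemble built so far (synthetic data, arbitrary hyperparameters).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Sequentially adds shallow trees, each fit to the current residuals.
gbdt = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbdt.fit(X_tr, y_tr)

rmse = mean_squared_error(y_te, gbdt.predict(X_te)) ** 0.5
print(rmse)
```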
Sources
• David Mease. Statistical Aspects of Data Mining. Lecture. http://video.google.com/videoplay?docid=-4669216290304603251&q=stats+202+engEDU&total=13&start=0&num=10&so=0&type=search&plindex=8
• Dietterich, T. G. Ensemble Learning. In The Handbook of Brain Theory and Neural Networks, Second edition (M.A. Arbib, Ed.), Cambridge, MA: The MIT Press, 2002. http://www.cs.orst.edu/~tgd/publications/hbtnn-ensemble-learning.ps.gz
• Elder, John and Giovanni Seni. From Trees to Forests and Rule Sets - A Unified Overview of Ensemble Methods. KDD 2007 tutorial. http://videolectures.net/kdd07_elder_afr/
• Netflix Prize. http://www.netflixprize.com/
• Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press. 1995.