GBM in H2O
H2O – The Open Source Math Engine
H2O and Gradient Boosting
What is Gradient Boosting
gbm is a boosted ensemble of decision trees, fitted in a stagewise forward fashion to minimize a loss function
i.e., a gbm is a sum of decision trees
each new tree corrects the errors of the previous forest
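The "sum of trees, each correcting the previous forest" idea can be sketched in a few lines. This is an illustrative squared-error regression example, not H2O's implementation; the data, depth, and learning rate are invented for the demo.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

n_trees, learning_rate = 50, 0.1
pred = np.full_like(y, y.mean())  # f_0: start from a constant model
trees = []
for _ in range(n_trees):
    residual = y - pred                      # errors of the current forest
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)  # add the new tree's correction
    trees.append(tree)

mse = float(np.mean((y - pred) ** 2))  # training error after boosting
```

Each tree is fit to the residuals of the running sum, so the training error shrinks stagewise as trees are added.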
Why gradient boosting
Performs variable selection during the fitting process
• Highly collinear explanatory variables
- glm backwards/forwards selection is unstable
Interactions: will search to a specified depth
Captures nonlinearities in the data
• ex: airlines on-time performance, where gbm captures a change in 2001 without the analyst having to model it explicitly
Why gradient boosting, more
Will naturally handle unscaled data (unlike glm, particularly with L1/L2 penalties)
Handles ordinal data, e.g. income: [$10k, $20k], ($20k, $40k], ($40k, $100k], ($100k, inf)
Relatively insensitive to long-tailed distributions and outliers
Gradient boosting works well
On the right dataset, gbm classification will outperform both glm and random forest
Demonstrates good performance on varied classification problems
• Hugh Miller, team leader, winner of the KDD Cup 2009 Slow Challenge: gbm was the main model used to predict telco customer churn
• KDD Cup 2013, Author-Paper Identification Challenge: 3 of the 4 winners incorporated gbm
• many Kaggle winners
• results at previous employers
Inference algorithm (simplified)
1. Initialize K predictors f_k,0(x) = 0
2. for m = 1:num_trees
   a. normalize the current predictions (softmax over classes)
   b. for k = 1:num_classes
      i. compute the pseudo-residual r = y_k − p_k
      ii. fit a regression tree to targets r with data X
      iii. for each terminal region, compute the multiplier that minimizes the deviance loss
      iv. f_k,m(x) = f_k,m−1(x) + region multiplier
Regression tree, 1
(Figure: a regression tree partitions the (X1, X2) feature space into rectangular regions R1–R4; the slide shows splits at values 1, 2, and 7.)
Regression tree, 2
A 1-level regression tree has 2 terminal nodes; the split point is chosen to minimize squared error.
(Figure: the slide plots the data, 9 observations, alongside the squared error of each candidate split.)
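Fitting a 1-level tree (a stump) by minimizing squared error can be sketched directly. The 9 observations below are invented for illustration, since the slide's actual data isn't in the transcript.

```python
import numpy as np

# Toy data: 9 observations, one feature.
x = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9.])
y = np.array([1., 1., 2., 2., 2., 7., 8., 8., 9.])

best = None
for split in (x[:-1] + x[1:]) / 2:  # candidate splits between observations
    left, right = y[x <= split], y[x > split]
    # squared error if each terminal node predicts its mean
    sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
    if best is None or sse < best[0]:
        best = (sse, split, left.mean(), right.mean())

sse, split, left_mean, right_mean = best
```

The stump ends up predicting the mean of each side of the best split, which here falls between the low cluster and the high cluster.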
…but has pain points
Slow to fit
Slow to predict
Data size limitations: downsampling is often required
Many implementations are single-threaded
Parameters are difficult to understand
Fit with searching, choose with a holdout set:
• interaction levels / depths: [1, 5, 10, 15]
• trees: [10, 100, 1000, 5000]
• learning rate: [0.1, 0.01, 0.001]
• this is often an overnight job
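The search-then-holdout workflow the slide describes looks roughly like this. The grid values come from the slide (trimmed here so the example runs quickly); the dataset and the scikit-learn estimator are stand-ins, not H2O's API.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.3, random_state=0)

best_score, best_params = -1.0, None
for depth in [1, 5]:          # slide lists depths [1, 5, 10, 15]
    for n_trees in [10, 100]:  # slide lists trees [10, 100, 1000, 5000]
        for lr in [0.1, 0.01]:  # slide lists rates [0.1, 0.01, 0.001]
            model = GradientBoostingClassifier(
                max_depth=depth, n_estimators=n_trees, learning_rate=lr,
                random_state=0,
            ).fit(X_tr, y_tr)
            score = model.score(X_ho, y_ho)  # choose with holdout accuracy
            if score > best_score:
                best_score, best_params = score, (depth, n_trees, lr)
```

With the full grid (4 x 4 x 3 = 48 fits, some with 5000 trees) on a large dataset, it is easy to see why this is often an overnight job on a single-threaded implementation.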
H2O can help
multicore
distributed
parallel
Questions?
gbm intuition
Why should this work well?
Universe is sparse. Life is messy. Data is sparse & messy. - Lao Tzu