Alexey Natekin (dm labs, opendatascience): "Gradient Boosting: ..."
TRANSCRIPT
Gradient Boosting: new stuff, possibilities and tricks
Alex Natekin
Such boosting
wow much learning
Boost our plan for today:
GBM as of May 2017
Inside the black box
Lesser-known capabilities
Main GBM libraries:

Microsoft LightGBM:
•Leaf-wise tree growth
•Histogram-based trees
•Feature & data parallel split search
•Common tasks

xgboost (tree_method = "hist"):
•Regularized tree structure
•(new) histogram-based trees
•Feature parallel split search
•Common tasks + full customization

H2O:
•Vanilla + TONS of tweaks
•Histogram-based optimisation
•Feature parallel split search
•Common tasks, some extensions

CRAN packages:
•Vanilla
•Some tree implementations are plain bad
•As extensible as one wants
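To make the comparison concrete, here is a minimal sketch (mine, not from the slides) of switching on histogram-based training in both leaders: opt-in via tree_method = "hist" in xgboost, the default behaviour in LightGBM. The data and parameter values are placeholders.

    import numpy as np
    import xgboost as xgb
    import lightgbm as lgb

    rng = np.random.RandomState(0)
    X = rng.randn(1000, 10)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # xgboost: histogram-based split finding is opt-in via tree_method="hist"
    dtrain = xgb.DMatrix(X, label=y)
    xgb_model = xgb.train(
        {"objective": "binary:logistic", "tree_method": "hist", "max_depth": 6},
        dtrain,
        num_boost_round=100,
    )

    # LightGBM: histogram-based, leaf-wise growth is the default;
    # num_leaves (not max_depth) is the main complexity control
    lgb_model = lgb.train(
        {"objective": "binary", "num_leaves": 63, "max_bin": 255},
        lgb.Dataset(X, label=y),
        num_boost_round=100,
    )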
Current competition:
Next big boost:
Such challenge
wow much kaggle
• A lot of implementations: great, good and so-so
• Multi-platform solutions outperform all: xgboost, lightgbm and h2o
• There are many niche packages with specialised boosters, losses and tweaks
• GBM benchmarks:
• https://github.com/szilard/benchm-ml
• https://medium.com/data-design/exact-xgboost-and-fast-histogram-xgboost-training-speed-comparison-17f95cee68b5
• https://medium.com/data-design/benchmarking-lightgbm-how-fast-is-lightgbm-vs-xgboost-7b5484746ac4
• Next big thing: GBM on GPU, currently in active development
• https://blog.h2o.ai/2017/05/machine-learning-on-gpus/
• Xgboost also has its GPU implementation, but H2O wrapped it under its framework
GBM as of May 2017
Inside the black box
Variable importance
Partial dependence plots
Distillation and GBM reconstruction
• GBM variable importance:
• Mostly implemented as gains and frequencies across splits. Don't trust them
• Better approach for a black box: shuffle variables and look at the loss change (see the sketch below)
• Nice packages: https://github.com/limexp/xgbfir/ + https://github.com/Far0n/xgbfi
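A minimal sketch of the shuffling approach; the function name, the log-loss choice and the sklearn-style predict_proba interface are my assumptions, not the talk's code.

    import numpy as np
    from sklearn.metrics import log_loss

    def permutation_importance(model, X, y, n_repeats=5, seed=0):
        """Mean loss increase after shuffling each feature; bigger = more important."""
        rng = np.random.RandomState(seed)
        base = log_loss(y, model.predict_proba(X)[:, 1])
        scores = np.zeros(X.shape[1])
        for j in range(X.shape[1]):
            for _ in range(n_repeats):
                X_perm = X.copy()
                # break the link between feature j and the target
                X_perm[:, j] = rng.permutation(X_perm[:, j])
                shuffled = log_loss(y, model.predict_proba(X_perm)[:, 1])
                scores[j] += (shuffled - base) / n_repeats
        return scores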
• Partial dependence plots:
• Just fix all variables to their mean values and plot prediction grids for the chosen variables (see the sketch below)
• Useful for overall model validation and for highlighting strong interactions
• Very useful for validating key features and (chosen) interactions
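A minimal sketch of that recipe: hold every other feature at its mean and sweep the chosen one over a grid. The function and its defaults are illustrative, assuming a model with a predict method.

    import numpy as np

    def partial_dependence(model, X, feature, grid_size=50):
        """Sweep one feature over a grid while all others sit at their means."""
        grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_size)
        X_ref = np.tile(X.mean(axis=0), (grid_size, 1))  # all features at mean
        X_ref[:, feature] = grid                         # vary only the chosen one
        return grid, model.predict(X_ref)

    # grid, pdp = partial_dependence(model, X, feature=3)
    # plt.plot(grid, pdp)  # flat => weak feature; sharp kinks => worth a look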
• GBM distillation and reconstruction:
• Use Xgboost leaf-index predictions (predict_leaf_indices; see the sketch below)
• Fit a sparse linear model on the leaf indicators: Lasso, glmnet, glinternet
• Can actually refit it all
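A minimal Python sketch of the idea: in the xgboost Python API, pred_leaf=True returns per-tree leaf indices, which are one-hot encoded and refit with an L1 linear model (a Python stand-in for the Lasso/glmnet step). Data and regularisation strength are placeholders.

    import numpy as np
    import xgboost as xgb
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    X = rng.randn(500, 8)
    y = (X[:, 0] * X[:, 1] > 0).astype(int)

    dtrain = xgb.DMatrix(X, label=y)
    booster = xgb.train({"objective": "binary:logistic", "max_depth": 3},
                        dtrain, num_boost_round=50)

    # per-sample leaf index in every tree: shape (n_samples, n_trees)
    leaves = booster.predict(dtrain, pred_leaf=True)

    # every (tree, leaf) pair becomes a binary indicator feature
    Z = OneHotEncoder(handle_unknown="ignore").fit_transform(leaves)

    # sparse L1 linear model over leaf indicators: a refittable stand-in for the GBM
    distilled = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    distilled.fit(Z, y)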
Inside the black box
Random cool stuff
Varying tree complexity
Tuning: Discrete random FTW
RL: boosting for Minecraft
• You can tweak GBM a lot:
• Changing tree depth across iterations: smaller trees first, deeper ones afterwards (see the sketch below)
• The same applies to other parameters (deeper trees might need more randomness)
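A minimal sketch of the shallow-first schedule using xgboost's training continuation (xgb_model=); the stage sizes and depths are illustrative choices of mine, not a recipe from the talk.

    import numpy as np
    import xgboost as xgb

    rng = np.random.RandomState(0)
    X = rng.randn(1000, 10)
    y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)
    dtrain = xgb.DMatrix(X, label=y)

    base = {"objective": "binary:logistic", "eta": 0.1}

    # stage 1: 100 shallow trees capture the broad structure
    model = xgb.train({**base, "max_depth": 2}, dtrain, num_boost_round=100)

    # stage 2: continue the same booster with deeper, more randomised trees
    model = xgb.train({**base, "max_depth": 6, "subsample": 0.7},
                      dtrain, num_boost_round=100,
                      xgb_model=model)  # resume from the stage-1 booster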
• Tuning GBM:
• Better to tune alpha/eta/shrinkage with a fixed number of trees
• Packages bring more hyperparameter tweaks; histogram resolution is often useful
• Discrete random search works really well and significantly decreases tuning time (see the sketch below)
• H2O has it off the shelf: https://blog.h2o.ai/2016/06/h2o-gbm-tuning-tutorial-for-r/
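A minimal sketch of discrete random search: sample whole configurations from small discrete grids instead of exhaustively crossing them. The grid values are placeholders.

    import random

    grid = {
        "learning_rate": [0.3, 0.1, 0.05, 0.01],
        "max_depth": [3, 5, 7, 9],
        "subsample": [0.6, 0.8, 1.0],
        "colsample_bytree": [0.6, 0.8, 1.0],
        "max_bin": [64, 256, 1024],  # histogram resolution, often worth tuning
    }

    random.seed(42)
    candidates = [{k: random.choice(v) for k, v in grid.items()}
                  for _ in range(25)]

    # evaluate each candidate with a fixed number of trees + early stopping,
    # e.g. xgb.cv(params, dtrain, num_boost_round=500,
    #             early_stopping_rounds=25), and keep the best configuration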
• GBM for random cool tasks:
• Strange yet working demo for RL and Minecraft: https://arxiv.org/pdf/1603.04119.pdf
• Some custom GBMs for NER with CRF: http://proceedings.mlr.press/v38/chen15b.pdf
Random cool stuff
1. LightGBM seems like the go-to 2017 GBM library
2. Expect lots of cool news on GPUs (especially H2O + Xgboost)
3. Don’t forget about model inspection and PDP
4. We have the distillation capabilities, why are we not using them?
5. Random search helps a lot with tuning
Summary
Thanks!