XGBOOST: A SCALABLE TREE BOOSTING SYSTEM
ADVISOR: JIA-LING KOH
SPEAKER: YIN-HSIANG LIAO
2018/04/17, FROM KDD 2016


TRANSCRIPT

Page 1:

XGBOOST: A SCALABLE TREE BOOSTING SYSTEM
ADVISOR: JIA-LING KOH
SPEAKER: YIN-HSIANG LIAO
2018/04/17, FROM KDD 2016

Page 2:

Outline

Introduction

Method

Experiment

Conclusion

2

Page 3:

Introduction

Regression tree

CART (Gini)

Boosting

Ensemble method: an iterative procedure that adaptively changes the distribution of training examples.

AdaBoost

3
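(Background, not on the slide:) CART chooses splits by minimizing the Gini impurity,

    \mathrm{Gini}(t) = 1 - \sum_{c=1}^{C} p_c^2 ,

where p_c is the fraction of class-c examples at node t; AdaBoost then reweights the training examples so that later learners focus on the ones misclassified so far.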

Page 4:

Introduction

The most important factor of XGBoost:

Scalability.

Billions of examples.

4

Page 5:

Introduction

A practical choice:

17 out of 29 winning solutions on Kaggle in 2015.

All top-10 teams used XGBoost in KDD Cup 2015.

T-brain: used by the top-3 teams.

Ad click through rate prediction, malware classification, customer behavior prediction, etc.

5

Page 6:

Method

Tree ensemble model:

The prediction for an example is the sum of the leaf weights it receives from each tree.

6
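The slide's formula did not survive extraction; reconstructed from the KDD 2016 paper, the tree-ensemble prediction is

    \hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F},

where \mathcal{F} = \{ f(x) = w_{q(x)} \} is the space of regression trees: q maps an example to a leaf index and w is the vector of leaf weights, so each tree contributes the weight of the leaf the example falls into.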

Page 7:

Method

Regularized objective function:

Differentiable convex loss function.

Model complexity: number of leaves + L2 norm of the leaf weights.

7

Objective function
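Reconstructed from the paper, the regularized objective these annotations refer to is

    \mathcal{L}(\phi) = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2 ,

where l is the differentiable convex loss, T the number of leaves, and w the leaf weights.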

Page 8:

Method

Gradient tree boosting:

The model is trained in an additive manner.


8

Objective function
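The formula on this slide is lost to extraction; from the paper, the objective minimized at iteration t is

    \mathcal{L}^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t),

i.e. the new tree f_t is chosen greedily to most improve the current prediction \hat{y}_i^{(t-1)}.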

Page 9:

Method

Additive training (boosting)

9

Objective function

Page 10:

Method

Taylor expansion:

10

Objective function
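Reconstructed from the paper, the second-order Taylor expansion of that objective is

    \mathcal{L}^{(t)} \simeq \sum_i \big[ l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \big] + \Omega(f_t),
    g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)}), \qquad h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)}),

and dropping the constant term gives \tilde{\mathcal{L}}^{(t)} = \sum_i \big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \big] + \Omega(f_t).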

Page 11:

Method

I_j: the instance set of leaf j (all x_i assigned to leaf j).

T: the number of leaves.

11

Objective function
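Grouping instances by the leaf they land in, I_j = \{ i \mid q(x_i) = j \}, the objective becomes (from the paper)

    \tilde{\mathcal{L}}^{(t)} = \sum_{j=1}^{T} \Big[ \big(\sum_{i \in I_j} g_i\big) w_j + \tfrac{1}{2} \big(\sum_{i \in I_j} h_i + \lambda\big) w_j^2 \Big] + \gamma T,

a sum of T independent quadratics in the leaf weights w_j.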

Page 12:

Method

For a fixed tree q, the optimal weight is:

12

Objective function

Page 13:

Method

For a fixed tree q, the optimal weight is:

The corresponding optimal value is:

13

Objective function
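Reconstructed from the paper, the optimal weight of leaf j and the corresponding optimal objective value are

    w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}, \qquad \tilde{\mathcal{L}}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{\big(\sum_{i \in I_j} g_i\big)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T .

The second expression scores a whole tree structure q, much like an impurity score.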

Page 14:

Method

From here on, if the tree structure is known, we can compute its optimal value.

The problem becomes: which tree is best?

The split gain compares the left subtree, the right subtree, and the parent.

Loss reduction: the larger the better; it can be negative.

Greedy strategy.

14

Objective function
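Reconstructed from the paper, with G and H the sums of g_i and h_i over the instance sets of the left child (L), the right child (R), and their union (the parent), the loss reduction of a candidate split is

    \mathcal{L}_{\mathrm{split}} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma .

Because of the -\gamma term, the gain can be negative, which is exactly what lets the greedy strategy stop splitting.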

Page 15:

Method

Further techniques for preventing overfitting:

Shrinkage.

Column subsampling.

15

Objective function
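A minimal sketch of how these two techniques are exposed as parameters in the open-source xgboost package (the synthetic data and parameter values below are arbitrary examples of mine, not from the slides):

    import numpy as np
    import xgboost as xgb

    # Tiny synthetic regression problem, just to make the sketch runnable.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)

    dtrain = xgb.DMatrix(X, label=y)
    params = {
        "objective": "reg:squarederror",
        "eta": 0.1,               # shrinkage: scales each new tree's contribution
        "colsample_bytree": 0.8,  # column subsampling: fraction of features used per tree
        "gamma": 1.0,             # the gamma above: penalty per leaf
        "lambda": 1.0,            # the lambda above: L2 penalty on leaf weights
    }
    booster = xgb.train(params, dtrain, num_boost_round=50)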

Page 16:

Method

Basic Exact Greedy Algorithm.

Approximate Algorithm.

Global

Local

16

Split Finding
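In the open-source package, the exact and approximate split-finding algorithms are selected with the tree_method parameter; a hedged sketch (synthetic data and settings are mine):

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)
    dtrain = xgb.DMatrix(X, label=y)

    base = {"objective": "reg:squarederror", "eta": 0.1}
    # "exact" enumerates every possible split; "approx" bins features into quantile-based
    # candidate splits (global vs. local is about how often the candidates are re-proposed).
    booster_exact = xgb.train({**base, "tree_method": "exact"}, dtrain, num_boost_round=50)
    booster_approx = xgb.train({**base, "tree_method": "approx"}, dtrain, num_boost_round=50)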

Page 17:

Method

Basic Exact Greedy Algorithm:

17

Split Finding


When to stop?
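A compact sketch of the exact greedy split search for one node and one feature, using the gain formula above (my own illustration, not the paper's code; names are made up). Splitting stops when no candidate has positive gain:

    def best_split(gh, lam=1.0, gamma=1.0):
        """gh: list of (feature_value, g_i, h_i), one entry per instance at this node."""
        gh = sorted(gh)                          # sort once by feature value
        G = sum(g for _, g, _ in gh)
        H = sum(h for _, _, h in gh)
        score = lambda gs, hs: gs * gs / (hs + lam)
        best_gain, best_value = 0.0, None
        GL = HL = 0.0
        for value, g, h in gh[:-1]:              # every gap between sorted values is a candidate
            GL += g; HL += h
            GR, HR = G - GL, H - HL
            gain = 0.5 * (score(GL, HL) + score(GR, HR) - score(G, H)) - gamma
            if gain > best_gain:                 # keep the best positive-gain split, if any
                best_gain, best_value = gain, value
        return best_gain, best_value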

Page 18:

Method

The exact greedy algorithm is good because it enumerates all possible splits, but when the data cannot fit in memory, thrashing slows the system down.

Approximations:

18

Split Finding

Page 19:

Method

Local vs. global proposals:

Global: fewer proposal steps, but more candidate points are needed per proposal.

19

Split Finding

Page 20:

Method

Weighted quantile sketch:

Candidate points are chosen so that each interval has the same "impact" on the objective function.

20

Split Finding
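Reconstructed from the paper: with D_k = \{(x_{1k}, h_1), \ldots, (x_{nk}, h_n)\} the k-th feature values paired with the second-order gradients, the rank function is

    r_k(z) = \frac{1}{\sum_{(x,h)\in D_k} h} \sum_{(x,h)\in D_k,\ x < z} h ,

and candidate points \{s_{k,1}, \ldots, s_{k,l}\} are chosen so that |r_k(s_{k,j}) - r_k(s_{k,j+1})| < \varepsilon. The weight is h_i because the second-order objective can be rewritten as a weighted squared loss, \sum_i \tfrac{1}{2} h_i (f_t(x_i) - g_i/h_i)^2 + \Omega(f_t) + \text{const}.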

Page 21:

Method

Sparsity-aware split finding:

Possible reasons for sparsity:

Missing values

Frequent zero entries

Artifacts of feature engineering (e.g. one-hot encoding)

Solution: a default direction at each split

21

Split Finding

Page 22:

Method

22

Split Finding

Sort criterion: missing values last.

Learn the best default direction (per feature split).

Page 23:

Method

Non-present entries are treated as missing values.

The algorithm only iterates over present (non-missing) entries.

About 50x faster than the naive version on the Allstate dataset.

23

Split Finding
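A hedged sketch of the sparsity-aware idea: enumerate only the non-missing entries, and score every candidate split twice, once sending all missing instances right and once sending them left, keeping whichever default direction gives the larger gain (my own illustration; names are made up):

    def best_split_sparse(present, G, H, lam=1.0, gamma=1.0):
        """present: (feature_value, g_i, h_i) for non-missing entries only.
        G, H: gradient/hessian sums over ALL instances at this node (missing included)."""
        present = sorted(present)
        Gp = sum(g for _, g, _ in present)       # sums over present entries only
        Hp = sum(h for _, _, h in present)
        score = lambda gs, hs: gs * gs / (hs + lam)
        best = (0.0, None, None)                 # (gain, threshold, default direction)
        gl = hl = 0.0
        for value, g, h in present[:-1]:
            gl += g; hl += h
            # Candidate 1: missing instances default to the RIGHT child.
            gain_r = 0.5 * (score(gl, hl) + score(G - gl, H - hl) - score(G, H)) - gamma
            # Candidate 2: missing instances default to the LEFT child.
            gain_l = 0.5 * (score(G - (Gp - gl), H - (Hp - hl))
                            + score(Gp - gl, Hp - hl) - score(G, H)) - gamma
            for gain, direction in ((gain_r, "right"), (gain_l, "left")):
                if gain > best[0]:
                    best = (gain, value, direction)
        return best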

Page 24:

Method

The most time-consuming part: sorting.

Sort just once.

Store the data in in-memory units called blocks.

24

System Design

Page 25:

Method

CSC format (compressed sparse column):

Ex: each block stores feature columns in CSC form, with each column sorted by feature value.

Different blocks can be distributed across machines, or stored on disk in the out-of-core setting.

25

System Design
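A small illustration (mine, not from the slide) of the CSC layout with scipy, which is the storage idea behind the blocks:

    import numpy as np
    from scipy.sparse import csc_matrix

    # Toy feature matrix; absent entries are simply not stored in the sparse structure.
    X = csc_matrix(np.array([[1.0, 0.0, 3.0],
                             [0.0, 2.0, 0.0],
                             [4.0, 0.0, 5.0]]))
    # In CSC form each column's stored values and their row indices sit contiguously,
    # so one feature can be scanned independently of the others; XGBoost additionally
    # keeps each column pre-sorted by feature value so split finding never re-sorts.
    col = 0
    start, end = X.indptr[col], X.indptr[col + 1]
    print(X.indices[start:end], X.data[start:end])   # row indices and values of feature 0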

Page 26:

Method

The block structure helps split finding.

However, it causes non-contiguous memory access when fetching the gradient statistics.

Solution: allocate an internal buffer in each thread.

26

System Design

Page 27:

Method

Block size matters (the maximum number of examples per block).

Small blocks result in too little work per thread (inefficient parallelization).

Large blocks lead to cache misses.

27

System Design

Balance!

Page 28:

Method

Out-of-core computation:

Block compression

Ex: [0, 2, 2, 0, 1, 2]

Block sharding

A prefetch thread is assigned to each disk.

28

System Design
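One concrete piece of the block compression scheme described in the paper is storing row indices as 16-bit offsets from the block's first row, which is why a block holds at most 2^16 examples; a hedged sketch of just that piece (my own code, not the library's):

    import numpy as np

    def compress_row_indices(row_indices, block_start):
        """Store absolute row indices as 16-bit offsets from the block's first row."""
        offsets = np.asarray(row_indices) - block_start
        assert offsets.max() < 2**16              # a block holds at most 2^16 examples
        return offsets.astype(np.uint16)

    def decompress_row_indices(offsets, block_start):
        return offsets.astype(np.int64) + block_start

    rows = [100_000, 100_003, 100_042]
    packed = compress_row_indices(rows, block_start=100_000)
    print(packed, decompress_row_indices(packed, block_start=100_000))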

Page 29:

Experiment

The open source package:

GitHub.com/dmlc/xgboost

29
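For completeness, a minimal end-to-end usage sketch of the package (my own example, not from the slides; the sklearn-style wrapper is one of several interfaces):

    import numpy as np
    from xgboost import XGBClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy binary target

    clf = XGBClassifier(n_estimators=100, max_depth=4, learning_rate=0.1)
    clf.fit(X, y)
    print(clf.predict(X[:5]))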

Page 30:

Experiment

Classification:

GBM only expands one branch of a tree.

The other two systems grow full trees.

30

Page 31:

Experiment

Learning to rank:

pGBRT: the best previously published system.

pGBRT only supports the approximate algorithm.

31

Page 32:

Experiment

Out-of-core experiment:

Compression gives about a 3x speedup.

Sharding onto two disks gives a further 2x speedup.

32

Page 33:

Conclusion

The most important feature: scalability!

Lessons from building XGBoost:

Sparsity awareness, the weighted quantile sketch, cache-aware access, and parallelization.

33

System Design

Fin.