XGBOOST: A SCALABLE TREE BOOSTING SYSTEM
ADVISOR: JIA-LING KOH   SPEAKER: YIN-HSIANG LIAO   2018/04/17, FROM KDD 2016
Outline
Introduction
Method
Experiment
Conclusion
Introduction
Regression tree
CART (split by Gini index)
Boosting
An ensemble method: an iterative procedure that adaptively changes the distribution of training examples.
AdaBoost
Introduction
The most important factor behind XGBoost: scalability.
It scales to billions of examples.
Introduction
A practical choice:
Used by 17 out of 29 winning solutions on Kaggle in 2015.
All of the top-10 teams in KDD Cup 2015 used XGBoost.
T-brain: used by the top-3 teams.
Applications: ad click-through rate prediction, malware classification, customer behavior prediction, etc.
Method
Tree ensemble model:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F}$$
The prediction for instance i is the sum of the leaf weights assigned to it by each of the K trees.
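As a toy illustration of this sum-of-trees prediction (a minimal sketch, not XGBoost's internal representation; the stump trees and their weights below are made up):

```python
# Minimal sketch: an ensemble prediction is the sum of leaf weights,
# one leaf weight contributed by each tree. Toy stumps for illustration.

def stump(feature, threshold, left_weight, right_weight):
    """Return a one-split tree: x -> leaf weight."""
    def predict(x):
        return left_weight if x[feature] < threshold else right_weight
    return predict

# A tiny "ensemble" of two hypothetical trees.
trees = [
    stump(feature=0, threshold=0.5, left_weight=-0.2, right_weight=0.3),
    stump(feature=1, threshold=1.0, left_weight=0.1, right_weight=-0.1),
]

def ensemble_predict(x):
    # y_hat(x) = sum_k f_k(x)
    return sum(tree(x) for tree in trees)

print(ensemble_predict([0.7, 0.4]))  # 0.3 + 0.1 = 0.4
```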
Method
Regularized objective function:
$$\mathcal{L}(\phi) = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2$$
l is a differentiable convex loss function; Ω penalizes model complexity, where T is the number of leaves and w are the leaf weights.
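A minimal numeric sketch of this objective, assuming squared-error loss for l and made-up leaf weights; function names such as `complexity` are mine, not the paper's:

```python
import numpy as np

# Sketch of L = sum_i l(y_hat_i, y_i) + sum_k [gamma*T_k + 0.5*lambda*||w_k||^2],
# with squared-error loss assumed for illustration.

def complexity(leaf_weights, gamma=1.0, lam=1.0):
    """Omega(f) = gamma * (number of leaves) + 0.5 * lambda * ||w||^2."""
    w = np.asarray(leaf_weights, dtype=float)
    return gamma * w.size + 0.5 * lam * np.sum(w ** 2)

def objective(y, y_hat, trees_leaf_weights, gamma=1.0, lam=1.0):
    loss = np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2)   # convex, differentiable
    penalty = sum(complexity(w, gamma, lam) for w in trees_leaf_weights)
    return loss + penalty

print(objective(y=[1.0, 0.0], y_hat=[0.8, 0.1],
                trees_leaf_weights=[[-0.2, 0.3], [0.1, -0.1]]))
```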
Objective function
Method
Gradient tree boosting: the model is trained in an additive manner.
$$\mathcal{L}^{(t)} = \sum_i l\bigl(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\bigr) + \Omega(f_t)$$
At step t, the usual prediction $\hat{y}_i^{(t-1)}$ is kept fixed and the tree $f_t$ that most improves the objective is added.
Method
Additive training (boosting): $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$
Method
Second-order Taylor expansion of the objective:
$$\mathcal{L}^{(t)} \simeq \sum_i \bigl[\, l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \,\bigr] + \Omega(f_t)$$
where $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ are the first- and second-order gradients of the loss.
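For concreteness, a small sketch of the first- and second-order gradients g_i and h_i, assuming squared-error loss (the paper's derivation is loss-agnostic):

```python
import numpy as np

# For l = (y - y_hat)^2 the gradients w.r.t. the previous prediction are
# g_i = 2*(y_hat_i - y_i) and h_i = 2 (a constant).

def grad_hess_squared_error(y, y_hat_prev):
    y = np.asarray(y, dtype=float)
    y_hat_prev = np.asarray(y_hat_prev, dtype=float)
    g = 2.0 * (y_hat_prev - y)          # g_i = d l / d y_hat
    h = np.full_like(y, 2.0)            # h_i = d^2 l / d y_hat^2
    return g, h

g, h = grad_hess_squared_error(y=[1.0, 0.0, 1.0], y_hat_prev=[0.5, 0.2, 0.9])
print(g, h)   # [-1.   0.4 -0.2] [2. 2. 2.]
```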
Method
Define $I_j = \{\, i \mid q(x_i) = j \,\}$, the instance set of leaf j (the instances $x_i$ that fall in leaf j), and let T be the number of leaves. Grouping the Taylor-expanded objective by leaf gives
$$\tilde{\mathcal{L}}^{(t)} = \sum_{j=1}^{T} \Bigl[ \Bigl(\sum_{i \in I_j} g_i\Bigr) w_j + \tfrac{1}{2} \Bigl(\sum_{i \in I_j} h_i + \lambda\Bigr) w_j^2 \Bigr] + \gamma T$$
Method
For a fixed tree structure q, the optimal weight of leaf j is
$$w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad G_j = \sum_{i \in I_j} g_i,\quad H_j = \sum_{i \in I_j} h_i$$
and the corresponding optimal objective value, which scores the quality of the tree structure, is
$$\tilde{\mathcal{L}}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$
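A sketch of these two formulas in code, assuming the per-instance g_i, h_i and a fixed leaf assignment are given (the helper name `leaf_weights_and_score` is mine):

```python
import numpy as np

def leaf_weights_and_score(g, h, leaf_of, T, lam=1.0, gamma=1.0):
    """Optimal leaf weights w_j* and the structure score for a fixed tree q."""
    g, h, leaf_of = map(np.asarray, (g, h, leaf_of))
    G = np.bincount(leaf_of, weights=g, minlength=T)   # G_j = sum of g_i in leaf j
    H = np.bincount(leaf_of, weights=h, minlength=T)   # H_j = sum of h_i in leaf j
    w_star = -G / (H + lam)                            # w_j* = -G_j / (H_j + lambda)
    score = -0.5 * np.sum(G ** 2 / (H + lam)) + gamma * T
    return w_star, score

w, score = leaf_weights_and_score(
    g=[-1.0, 0.4, -0.2], h=[2.0, 2.0, 2.0], leaf_of=[0, 1, 0], T=2)
print(w, score)
```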
Method
So, once the tree structure is known, we can compute its optimal value. The problem becomes: which tree is the best?
In practice, trees are grown greedily, one split at a time. The loss reduction of splitting a leaf into left and right children is
$$\mathcal{L}_{split} = \frac{1}{2}\Bigl[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \Bigr] - \gamma$$
i.e., (left subtree + right subtree) minus the parent. The larger the better; the value can be negative.
Greedy strategy: at each node, take the split with the largest loss reduction.
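A sketch of the greedy split search on a single feature using this gain formula (a simplified illustration, not the library's implementation; `best_split_1d` is a made-up name):

```python
import numpy as np

def gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Loss reduction of a split: left + right - parent, minus gamma."""
    return 0.5 * (G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam)
                  - (G_L + G_R)**2 / (H_L + H_R + lam)) - gamma

def best_split_1d(x, g, h, lam=1.0, gamma=0.0):
    """Exact greedy: sort by feature value, scan, accumulate G_L and H_L."""
    order = np.argsort(x)
    x, g, h = np.asarray(x)[order], np.asarray(g)[order], np.asarray(h)[order]
    G, H = g.sum(), h.sum()
    G_L = H_L = 0.0
    best = (-np.inf, None)
    for i in range(len(x) - 1):
        G_L += g[i]; H_L += h[i]
        if x[i] == x[i + 1]:
            continue                      # cannot split between equal values
        score = gain(G_L, H_L, G - G_L, H - H_L, lam, gamma)
        if score > best[0]:
            best = (score, (x[i] + x[i + 1]) / 2.0)
    return best                           # (best gain, split threshold)

print(best_split_1d(x=[0.1, 0.4, 0.8, 0.9],
                    g=[-1.0, -0.8, 0.6, 0.9],
                    h=[2.0, 2.0, 2.0, 2.0]))
```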
Method
Two further techniques to prevent overfitting:
Shrinkage: scale the weights of each newly added tree by a factor η (the learning rate).
Column subsampling: build each tree on a random subset of the features.
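In the xgboost library these two knobs correspond to the `eta` (shrinkage) and `colsample_bytree` (column subsampling) parameters; a small illustrative configuration on random data, with arbitrary values:

```python
import numpy as np
import xgboost as xgb

# Illustrative configuration only; data is random and values are arbitrary.
X = np.random.rand(100, 5)
y = np.random.rand(100)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "eta": 0.1,                # shrinkage: scale each new tree's contribution
    "colsample_bytree": 0.8,   # column subsampling per tree
    "lambda": 1.0,             # L2 penalty on leaf weights
    "gamma": 0.0,              # penalty per leaf
    "max_depth": 4,
}
booster = xgb.train(params, dtrain, num_boost_round=50)
```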
Method
Split-finding algorithms:
Basic exact greedy algorithm.
Approximate algorithm, which proposes candidate split points either globally (once per tree) or locally (re-proposed after each split).
Split Finding
Method
Basic exact greedy algorithm: for each feature, sort the instances by feature value and scan them in order, enumerating every possible split and accumulating the gradient statistics needed for the gain.
When to stop?
Method
The exact greedy algorithm is good because it considers all possible splits, but when the data cannot fit in memory, thrashing slows the system down.
Approximation: propose only a limited set of candidate split points instead of enumerating all of them.
Method
Local vs. global proposal:
Global: candidate points are proposed once per tree; fewer proposal steps, but more candidate points are needed.
Local: candidates are re-proposed after each split; more proposal steps, but fewer candidate points suffice.
Method
Weighted quantile sketch: choose candidate split points so that each interval between candidates carries the same "impact" on the objective function. The second-order gradient h_i acts as the weight of instance i, since the objective can be rewritten as a weighted squared loss.
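A simplified illustration of the idea, using exact weighted quantiles on in-memory data rather than the paper's mergeable sketch data structure; h_i plays the role of the instance weight:

```python
import numpy as np

def weighted_quantile_candidates(x, h, eps=0.25):
    """Propose candidates so each interval holds ~eps of the total h-weight."""
    order = np.argsort(x)
    x, h = np.asarray(x, dtype=float)[order], np.asarray(h, dtype=float)[order]
    rank = np.cumsum(h) / h.sum()            # weighted rank in (0, 1]
    targets = np.arange(eps, 1.0, eps)       # e.g. 0.25, 0.5, 0.75
    idx = np.searchsorted(rank, targets)
    return np.unique(x[idx])                 # candidate split values

x = [0.1, 0.3, 0.5, 0.7, 0.9, 1.1]
h = [1.0, 1.0, 4.0, 1.0, 1.0, 1.0]           # the heavy instance pulls candidates toward it
print(weighted_quantile_candidates(x, h))
```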
Method
Sparsity-aware split finding. Possible reasons for sparsity:
Missing values
Frequent zero entries
Artifacts of feature engineering (such as one-hot encoding)
Solution: learn a default direction in each tree node.
Method
Sparsity-aware split finding:
Sort criterion: missing values go last; splits are enumerated only over the present values.
Learn the best default direction for each feature by trying both options, sending the missing entries left and then right, and keeping the one with the larger gain.
Method
Non-present entries are treated as missing values.
The algorithm only iterates over present entries, so its cost is linear in the number of non-missing entries.
About 50x faster than the naive version on the Allstate dataset.
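A sketch of how the default direction can be learned for one fixed split, assuming missing entries are encoded as NaN (a simplified stand-in for the paper's procedure, with made-up helper names):

```python
import numpy as np

def gain(G_L, H_L, G_R, H_R, lam=1.0):
    return 0.5 * (G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam)
                  - (G_L + G_R)**2 / (H_L + H_R + lam))

def best_default_direction(x, g, h, threshold, lam=1.0):
    """For a fixed split threshold, send the missing entries (NaN) left and then
    right, and keep the direction that gives the larger gain."""
    x, g, h = (np.asarray(a, dtype=float) for a in (x, g, h))
    present = ~np.isnan(x)
    xp, gp, hp = x[present], g[present], h[present]
    go_left = xp < threshold
    G_L, H_L = gp[go_left].sum(), hp[go_left].sum()
    G_R, H_R = gp[~go_left].sum(), hp[~go_left].sum()
    G_m, H_m = g[~present].sum(), h[~present].sum()          # stats of missing entries
    gain_left = gain(G_L + G_m, H_L + H_m, G_R, H_R, lam)    # missing go left
    gain_right = gain(G_L, H_L, G_R + G_m, H_R + H_m, lam)   # missing go right
    return ("left", gain_left) if gain_left >= gain_right else ("right", gain_right)

print(best_default_direction(
    x=[0.1, np.nan, 0.8, np.nan, 0.9],
    g=[-1.0, -0.7, 0.6, 0.8, 0.9],
    h=[2.0, 2.0, 2.0, 2.0, 2.0],
    threshold=0.5))
```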
Method
The most time-consuming part of tree learning is sorting the data.
Idea: sort each column just once, and store the data in an in-memory unit called a block.
System Design
Method
Each block stores the data in compressed sparse column (CSC) format, with each column pre-sorted by feature value.
Different blocks can be distributed across machines, or stored on disk in the out-of-core setting.
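A toy sketch of the idea behind the block layout (not the actual CSC implementation): sort each feature column once, keep the row indices in sorted order, and reuse them for every subsequent scan:

```python
import numpy as np

# Toy sketch: pre-sort each feature column once and keep, for each feature,
# the row indices in sorted order. Split finding can then scan these index
# lists repeatedly without re-sorting.
X = np.array([[0.9, 1.2],
              [0.1, 3.4],
              [0.5, 0.7]])

# "Block": for each feature, row indices sorted by that feature's value.
sorted_index = {j: np.argsort(X[:, j]) for j in range(X.shape[1])}

g = np.array([0.3, -1.0, 0.2])   # gradient statistics, stored per row

# A linear scan over feature 0 in sorted order visits the gradient array
# through the pre-sorted indices:
for i in sorted_index[0]:
    print(i, X[i, 0], g[i])
```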
Method
The block structure helps split finding, but reading gradient statistics in sorted-index order is a non-contiguous memory access pattern, which causes cache misses.
Solution (cache-aware prefetching): allocate an internal buffer in each thread, fetch the gradient statistics into it in mini-batches, and accumulate from the buffer.
Method
Block size matters (it bounds the maximum number of examples per block).
Blocks that are too small give each thread too little work and make parallelization inefficient.
Blocks that are too large cause cache misses, because the gradient statistics no longer fit in the CPU cache.
Balance! (The paper settles on 2^16 examples per block.)
Method
Out-of-core computation:
Block compression: blocks are compressed by column on disk and decompressed on the fly by an independent thread while being read. Ex: [0, 2, 2, 0, 1, 2]
Block sharding: the data is sharded onto multiple disks, and a prefetch thread is assigned to each disk.
Experiment
Classification:
GBM (R) is fast partly because it expands only one branch of a tree.
The other two systems compared (XGBoost and scikit-learn) expand the full tree.
Experiment
Learning to rank:
Compared against pGBRT, the best previously published system for this task.
pGBRT only supports the approximate algorithm.
Experiment
Out-of-core experiment:
Block compression gives about a 3x speedup.
Sharding onto two disks gives about a 2x speedup.
Conclusion
The most important feature: scalability!
Lessons from building XGBoost:
Sparsity-aware split finding, the weighted quantile sketch, cache-aware access, and parallelization are what make it scale.
Fin.