gbm.more gbm in h2o


Uploaded by srisatish-ambati, 26-Jan-2015


Page 1: Gbm.more GBM in H2O

H2O – The Open Source Math Engine

H2O and Gradient Boosting

Page 2: Gbm.more GBM in H2O

What is Gradient Boosting

gbm is a boosted ensemble of decision trees, fitted in a forward stagewise fashion to minimize a loss function

i.e. gbm is a sum of decision trees

each new tree corrects errors of the previous forest
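As a rough sketch of that idea (not H2O's implementation), the loop below boosts shallow scikit-learn regression trees on squared-error residuals, so each new tree is fit to the errors of the forest built so far; the synthetic data and parameter values are only illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)   # noisy nonlinear target

learn_rate, num_trees = 0.1, 100
pred = np.zeros_like(y)                    # start from a zero predictor
trees = []
for m in range(num_trees):
    residual = y - pred                    # errors of the current forest
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learn_rate * tree.predict(X)   # the new tree corrects those errors
    trees.append(tree)

print("final training MSE:", np.mean((y - pred) ** 2))
```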

Page 3: Gbm.more GBM in H2O

Why gradient boosting

Performs variable selection during the fitting process
• Highly collinear explanatory variables
  - glm: backwards/forwards selection is unstable

Interactions: will search to a specified depth

Captures nonlinearities in the data
• e.g. airlines on-time performance: gbm captures a change in 2001 without the analyst having to specify it

Page 4: Gbm.more GBM in H2O

Why gradient boosting, more

Will naturally handle unscaled data (unlike glm, particularly with L1/L2 penalties)

Handles ordinal data, e.g. income: [$10k, $20k], ($20k, $40k], ($40k, $100k], ($100k, inf)

Relatively insensitive to long-tailed distributions and outliers

Page 5: Gbm.more GBM in H2O

gradient boosting works well

On the right dataset, gbm classification will outperform both glm and random forest

Demonstrates good performance on various classification problems
• Hugh Miller, team leader of the winning KDD Cup 2009 Slow Challenge entry: gbm was the main model used to predict telco customer churn

• KDD Cup 2013 - Author-Paper Identification Challenge - 3 of the 4 winners incorporated gbm

• many kaggle winners

• results at previous employers

Page 6: Gbm.more GBM in H2O

Inference algorithm (simplified)

1. Initialize K predictors f_{k,m=0}(x)

2. for m = 1:num_trees
   a. normalize the current predictions into class probabilities p_k
   b. for k = 1:num_classes
      i. compute the pseudo-residual r = y – p_k
      ii. fit a regression tree to targets r with data X
      iii. for each terminal region, compute the multiplier that minimizes the deviance loss
      iv. f_{k,m+1}(x) = f_{k,m}(x) + region multiplier

(a code sketch of this loop follows)
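A minimal Python sketch of the loop above, not H2O's implementation: scikit-learn regression trees serve as base learners, integer class labels 0..K-1 are assumed, and the per-region deviance multiplier of step iii is approximated by a single learning-rate shrinkage.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm_multiclass(X, y, num_classes, num_trees=100, learn_rate=0.1, max_depth=3):
    n = X.shape[0]
    Y = np.eye(num_classes)[y]            # one-hot targets, shape (n, K)
    F = np.zeros((n, num_classes))        # step 1: initialize the K predictors
    forests = []
    for m in range(num_trees):            # step 2
        # a. normalize current predictions into class probabilities (softmax)
        P = np.exp(F - F.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        stage = []
        for k in range(num_classes):      # b. loop over classes
            r = Y[:, k] - P[:, k]         # i.  pseudo-residual r = y - p_k
            tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)   # ii.
            # iii./iv. a full implementation refits each terminal region with a
            # one-step Newton multiplier for the multinomial deviance; here a
            # global shrinkage stands in for that per-region multiplier.
            F[:, k] += learn_rate * tree.predict(X)
            stage.append(tree)
        forests.append(stage)
    return forests
```

`forests` holds num_trees stages of K trees each; prediction is the softmax of the accumulated tree sums for a new x.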

Page 7: Gbm.more GBM in H2O

Regression tree, 1

[Figure: a regression tree partitions the (X1, X2) plane into rectangular regions R1–R4; the split values 2, 7, 1 are from the original diagram]
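To make that picture concrete, here is a small sketch (synthetic data, made-up split locations) that fits a depth-2 regression tree on two features and prints the axis-aligned splits that carve the (X1, X2) plane into four regions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 2))                         # features X1, X2
y = np.where(X[:, 0] < 5, 1.0, 3.0) + np.where(X[:, 1] < 4, 0.0, 2.0)

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
# print the learned partition: each leaf corresponds to one region R_i
print(export_text(tree, feature_names=["X1", "X2"]))
```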

Page 8: Gbm.more GBM in H2O

Regression tree, 2

1-level regression tree: 2 terminal nodes, split decision: minimize squared error

[Figure: the data (9 observations) and the resulting errors]
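A hedged sketch of that split rule: try every candidate threshold on one feature and keep the one that minimizes the total squared error of the two leaf means. The 9-observation dataset below is made up for illustration.

```python
import numpy as np

def best_stump_split(x, y):
    best = (None, np.inf)                      # (threshold, total squared error)
    for t in np.unique(x)[:-1]:                # candidate split points
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[1]:
            best = (t, sse)
    return best

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)        # 9 observations
y = np.array([1.2, 1.0, 1.3, 0.9, 4.8, 5.1, 5.0, 4.7, 5.2])
print(best_stump_split(x, y))                  # threshold with the lowest squared error
```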

Page 9: Gbm.more GBM in H2O

but has pain points

Slow to fit

Slow to predict

Data size limitations: often downsampling required

Many implementations are single-threaded

Parameters are difficult to understand

Fit with searching, choose with holdout (a sketch of this search follows the list):
• Interaction levels / depths: [1, 5, 10, 15]
• trees: [10, 100, 1000, 5000]
• learning rate: [0.1, 0.01, 0.001]
• this is often an overnight job
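A sketch of that grid-plus-holdout search, using scikit-learn's GradientBoostingClassifier as a stand-in for any single-machine gbm implementation; the grids mirror the bullets above, with the 5000-tree setting dropped so the example finishes in reasonable time.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.25, random_state=0)

grid = {
    "max_depth": [1, 5, 10, 15],          # interaction levels / depths
    "n_estimators": [10, 100, 1000],      # number of trees (5000 omitted to keep this quick)
    "learning_rate": [0.1, 0.01, 0.001],
}
search = GridSearchCV(GradientBoostingClassifier(), grid, cv=3, n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_hold, y_hold))   # choose with the holdout set
```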

Page 10: Gbm.more GBM in H2O

h2o can help

multicore

distributed

parallel
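For reference, a minimal sketch of the same kind of model in the h2o Python client; the import path and parameter names (H2OGradientBoostingEstimator, ntrees, max_depth, learn_rate) are from current h2o-py and may differ from the API this deck originally targeted, and the toy data stands in for a real distributed import.

```python
import h2o
import pandas as pd
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from sklearn.datasets import make_classification

h2o.init()   # starts or attaches to a local multi-threaded H2O cluster

# toy in-memory data; in practice h2o.import_file() loads data in parallel across the cluster
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
df = pd.DataFrame(X, columns=[f"x{i}" for i in range(10)])
df["y"] = y
frame = h2o.H2OFrame(df)
frame["y"] = frame["y"].asfactor()          # treat the target as categorical

gbm = H2OGradientBoostingEstimator(ntrees=100, max_depth=5, learn_rate=0.1)
gbm.train(x=[f"x{i}" for i in range(10)], y="y", training_frame=frame)
print(gbm.auc(train=True))
```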

Page 11: Gbm.more GBM in H2O

Questions?

Page 12: Gbm.more GBM in H2O

gbm intuition

Why should this work well?

Page 13: Gbm.more GBM in H2O

Universe is sparse. Life is messy. Data is sparse & messy. - Lao Tzu