Decision Tree Ensembles (Random Forest & Gradient Boosting) by Devin Didericksen

Page 1: Decision Tree Ensembles

Decision Tree Ensembles (Random Forest & Gradient Boosting)
by Devin Didericksen

Page 2: Decision Tree Ensembles

Sponsors

Page 3: Decision Tree Ensembles

About Me

Education:
● B.S. Math - University of Utah
● M.S. Statistics - Utah State University

Career:
● Sr. Pricing Strategist - Overstock.com

Hobbies:
● Traveling, movies, food, and my two rabbits

[Photo: my wife, Sara, and me]


Page 7: Decision Tree Ensembles

What Some of the Leading Data Scientists Say

“Random Forests have been the most successful general-purpose algorithm in modern times” - Jeremy Howard, CEO of Enlitic

“Gradient Boosting Machines are the best!” - Gilberto Titericz, #1 Ranked Kaggler

“When in doubt, use GBM” - Owen Zhang, #2 Ranked Kaggler, Chief Product Officer at DataRobot

“We achieve state of the art accuracy and demonstrate improved generalization [with Random Forest]” - Microsoft Xbox Kinect Research Team on human pose recognition


Page 14: Decision Tree Ensembles

Kaggle in a Nutshell

Kaggle is a data competition marketplace:
● 400,000+ members across the globe
● Companies will sponsor a competition using their own data
● Prizes have ranged from an interview to $500k (but usually around $20k)
● Competition data is sliced into training, validation, and holdout sets (a sketch follows below)
● Labels are hidden from the validation and holdout sets
● Winners are determined by performance against the holdout set
● Competitions are very competitive!
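The slicing diagram itself is not in the transcript, but the idea fits in a few lines of R. This is a minimal sketch with assumed proportions, for a data frame df whose label column is Survived:

    # Competition-style split: train / validation / holdout,
    # with the label hidden from the validation and holdout sets.
    set.seed(42)
    idx <- sample(c("train", "valid", "holdout"), nrow(df),
                  replace = TRUE, prob = c(0.6, 0.2, 0.2))
    train   <- df[idx == "train", ]
    valid   <- df[idx == "valid",   setdiff(names(df), "Survived")]  # label hidden
    holdout <- df[idx == "holdout", setdiff(names(df), "Survived")]  # label hidden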


Page 16: Decision Tree Ensembles

Last 10 Kaggle Competitions

Competition              | Prize Money | End Date | Best Single Model
SpringLeaf Marketing     | $100,000    | 10/19    | Gradient Boosting
Dato Truly Native        | $10,000     | 10/14    | Gradient Boosting
Flavours of Physics      | $15,000     | 10/12    | Gradient Boosting
Recruit Coupon Purchase  | $50,000     | 9/30     | Gradient Boosting
Caterpillar Tube Pricing | $30,000     | 8/31     | Gradient Boosting
Grasp-and-Lift EEG       | $10,000     | 8/31     | Recurrent Convolutional Neural Net
Liberty Mutual Property  | $25,000     | 8/28    | Gradient Boosting
Drawbridge Cross-Device  | $10,000     | 8/24    | Gradient Boosting
Avito Context Ad Clicks  | $20,000     | 7/28    | Gradient Boosting
Diabetic Retinopathy     | $100,000    | 7/27    | Convolutional Neural Net

*It doesn’t take a data scientist to identify a trend here

Page 17: Decision Tree Ensembles

Kaggle Competition - Titanic

Can you predict which people survived the Titanic tragedy?

Page 18: Decision Tree Ensembles

Kaggle Titanic Data

PassengerId | Survived | Pclass | Name                                                 | Sex    | Age | SibSp | Parch | Ticket           | Fare    | Cabin | Embarked
1           | 0        | 3      | Braund, Mr. Owen Harris                              | male   | 22  | 1     | 0     | A/5 21171        | 7.25    |       | S
2           | 1        | 1      | Cumings, Mrs. John Bradley (Florence Briggs Thayer)  | female | 38  | 1     | 0     | PC 17599         | 71.2833 | C85   | C
3           | 1        | 3      | Heikkinen, Miss. Laina                               | female | 26  | 0     | 0     | STON/O2. 3101282 | 7.925   |       | S
4           | 1        | 1      | Futrelle, Mrs. Jacques Heath (Lily May Peel)         | female | 35  | 1     | 0     | 113803           | 53.1    | C123  | S
5           | 0        | 3      | Allen, Mr. William Henry                             | male   | 35  | 0     | 0     | 373450           | 8.05    |       | S
6           | 0        | 3      | Moran, Mr. James                                     | male   | NA  | 0     | 0     | 330877           | 8.4583  |       | Q
7           | 0        | 1      | McCarthy, Mr. Timothy J                              | male   | 54  | 0     | 0     | 17463            | 51.8625 | E46   | S

Page 19: Decision Tree Ensembles

Kaggle Titanic Data - Training Variable Selection

The same table as the previous slide, annotated for modeling: Survived is the label, and PassengerId, Name, Ticket, and Cabin are dropped.

Page 20: Decision Tree Ensembles

Kaggle Titanic Data - Training Set

Survived | Pclass | Sex    | Age | SibSp | Parch | Fare    | Embarked
0        | 3      | male   | 22  | 1     | 0     | 7.25    | S
1        | 1      | female | 38  | 1     | 0     | 71.2833 | C
1        | 3      | female | 26  | 0     | 0     | 7.925   | S
1        | 1      | female | 35  | 1     | 0     | 53.1    | S
0        | 3      | male   | 35  | 0     | 0     | 8.05    | S
0        | 3      | male   | NA  | 0     | 0     | 8.4583  | Q
0        | 1      | male   | 54  | 0     | 0     | 51.8625 | S
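In R, the variable selection above takes a couple of lines. A minimal sketch, assuming Kaggle's train.csv is in the working directory:

    # Read the Titanic training file and drop the ID- and text-like columns.
    train <- read.csv("train.csv", stringsAsFactors = TRUE)
    drops <- c("PassengerId", "Name", "Ticket", "Cabin")
    train <- train[, !(names(train) %in% drops)]
    str(train)  # Survived, Pclass, Sex, Age, SibSp, Parch, Fare, Embarked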

Page 21: Decision Tree Ensembles

Decision Tree Pioneers

● Leo Breiman, PhD (University of California, Berkeley) - CART, Random Forest, Gradient Boosting
● Adele Cutler, PhD (Utah State University) - Random Forest
● Jerome Friedman, PhD (Stanford University) - Gradient Boosting

Page 22: Decision Tree Ensembles

Simple Decision Tree

Page 23: Decision Tree Ensembles

Classification and Regression Tree (CART)

Titanic Survival Classification Tree

Page 24: Decision Tree Ensembles

Classification and Regression Tree (CART)

Like Mr. Bean’s car, CART is
● Super Simple - Trees are often easier to interpret than linear models.
● Very Efficient - The computational cost is minimal.
● Weak - A single tree has low predictive power on its own; it belongs to the class of models called “weak learners”.
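For flavor, here is a minimal rpart fit on the training set built earlier (the linked repository contains the author's actual code; these settings are illustrative):

    # Fit a classification tree like the Titanic survival tree shown above.
    library(rpart)
    cart <- rpart(factor(Survived) ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked,
                  data = train, method = "class")  # "class" = classification
    plot(cart); text(cart)                          # draw the tree and its splits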


Page 27: Decision Tree Ensembles

Random Forest

Survived | Pclass | Sex    | Age | SibSp | Parch | Fare    | Embarked
0        | 3      | male   | 22  | 1     | 0     | 7.25    | S
1        | 1      | female | 38  | 1     | 0     | 71.2833 | C
1        | 3      | female | 26  | 0     | 0     | 7.925   | S
1        | 1      | female | 35  | 1     | 0     | 53.1    | S
0        | 3      | male   | 35  | 0     | 0     | 8.05    | S
0        | 3      | male   | NA  | 0     | 0     | 8.4583  | Q
0        | 1      | male   | 54  | 0     | 0     | 51.8625 | S
0        | 3      | male   | 2   | 3     | 1     | 21.075  | S
1        | 3      | female | 27  | 0     | 2     | 11.1333 | S
1        | 2      | female | 14  | 1     | 0     | 30.0708 | C

1. Randomly sample the rows (with replacement), then grow a tree, sampling a subset of the columns (without replacement) at each node.
2. Repeat many times (1,000+ trees).
3. Ensemble the trees by majority vote (i.e., if 300 out of 1,000 trees predict that a given passenger dies, the estimated probability of death is 30%). A code sketch follows below.
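These three steps are what randomForest() automates. A minimal sketch, assuming the train data frame from earlier (the mtry value is illustrative):

    # Steps 1-3 in one call: bootstrap rows for each tree, sample mtry
    # columns at each split, then combine the votes across all trees.
    library(randomForest)
    rf <- randomForest(factor(Survived) ~ ., data = train,
                       ntree = 1000,           # step 2: grow 1,000+ trees
                       mtry  = 3,              # columns sampled at each split
                       na.action = na.omit)    # Age has missing values
    head(predict(rf, type = "prob"))           # step 3: vote fractions as probabilities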

Page 28: Decision Tree Ensembles

Random Forest - Tree 1

Page 29: Decision Tree Ensembles

Random Forest - Tree 2

Page 30: Decision Tree Ensembles

Random Forest - Several Trees

(Each of these slides repeats the training table above to illustrate the different random sample of rows and columns behind each tree.)

Page 31: Decision Tree Ensembles

Random Forest

Like a Honda CR-V, Random Forest is
● Versatile - It can do classification, regression, missing-value imputation, clustering, and feature importance, and it works well on most data sets right out of the box.
● Efficient - Trees can be grown in parallel.
● Low Maintenance - There is only one key parameter to tune: the number of columns to sample at each split. A short example follows below.
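Two of those bullets in code form: rfImpute() is Random Forest's own missing-value imputation, and tuneRF() searches the one key parameter. A sketch, again assuming the train data frame from earlier:

    # Versatility and low maintenance in practice.
    library(randomForest)
    y <- factor(train$Survived)
    x <- train[, setdiff(names(train), "Survived")]
    x <- rfImpute(x, y)[, -1]            # impute NAs (drop the prepended response)
    tuneRF(x, y, ntreeTry = 500,         # search mtry, the lone key parameter
           stepFactor = 1.5, improve = 0.01)
    rf <- randomForest(x, y, ntree = 1000, importance = TRUE)
    varImpPlot(rf)                       # feature importance, another built-in use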

Page 32: Decision Tree Ensembles

Gradient Boosting Machines (GBM)

Page 33: Decision Tree Ensembles

Gradient Boosting Machines (GBM) - Tree 1

Page 34: Decision Tree Ensembles

Gradient Boosting Machines (GBM) - Boost 1

Page 35: Decision Tree Ensembles

Gradient Boosting Machines (GBM) - Tree 2

Page 36: Decision Tree Ensembles

Gradient Boosting Machines (GBM) - Boost 2

Page 37: Decision Tree Ensembles

Gradient Boosting Machines (GBM) - Tree 3

Page 38: Decision Tree Ensembles

Gradient Boosting Machines (GBM) - Probabilities


Page 40: Decision Tree Ensembles

Gradient Boosting Machines (GBM)

Given this process, how quickly do you think this will overfit?

The surprising answer is: not very fast. There is even theoretical work (the margin-based analysis of boosting) supporting this.

Page 41: Decision Tree Ensembles

Gradient Boosting Machines (GBM)

Like the original Hummer, GBM is
● Powerful - On most real-world data sets it is hard to beat in predictive power, and it handles missing values natively.
● High Maintenance - There are many parameters to tune, and extra precautions must be taken to prevent overfitting.
● Expensive - Boosting is inherently sequential and computationally expensive, although GBM is a lot faster than it used to be with the arrival of XGBoost. A short example follows below.
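A minimal xgboost sketch on the same training set (parameter values are illustrative; choosing them well is exactly the "high maintenance" part):

    # Gradient boosting with xgboost; NAs in Age can stay in the matrix
    # because GBM handles missing values natively.
    library(xgboost)
    y <- train$Survived
    X <- data.matrix(train[, setdiff(names(train), "Survived")])  # factors -> codes
    bst <- xgboost(data = X, label = y,
                   nrounds   = 200,     # boosting iterations (sequential trees)
                   max_depth = 4,       # keep each tree a weak learner
                   eta       = 0.1,     # learning rate: shrink each boost step
                   objective = "binary:logistic", verbose = 0)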

Page 42: Decision Tree Ensembles

Deep Learning

Deep learning is like the 2008 Tesla Roadster…

The technology holds a lot of promise for the future, but in its current form it is usually not the most practical tool for most problems (other than speech and image recognition).

Page 43: Decision Tree Ensembles

Kaggle Titanic - R Code

Some (simple) CART and Random Forest R code can be found here:

https://github.com/didericksen/titanic/blob/master/titanic.R