L9. Real World Machine Learning - Cooking Predictions

Cooking Predictions A real case in the hotel sector Andrés González Big Data Prediction Manager [email protected] Twitter: @data_lytics

Upload: machine-learning-valencia

Posted on 18-Feb-2017

Category: Data & Analytics



TRANSCRIPT

Page 1: L9. Real World Machine Learning - Cooking Predictions

Cooking Predictions

A real case in the hotel sector

Andrés González Big Data Prediction Manager

[email protected] Twitter: @data_lytics

Page 2: L9. Real World Machine Learning - Cooking Predictions

CleverTask Solutions SL - Big Data Business Unit 3

Agenda

1. Business Need
2. “Cooking” Predictions
3. Gathering ingredients
4. Cleaning and Transforming
5. The recipe (the model)
6. Tasting the dish

Page 3: L9. Real World Machine Learning - Cooking Predictions

Hotel Sector

• % room occupancy
• Cancellation risk
• Income

Page 4: L9. Real World Machine Learning - Cooking Predictions

Business Need

Predict the client’s NATIONALITY BEFORE the client checks in

Page 5: L9. Real World Machine Learning - Cooking Predictions

Staff Arrangement

Languages

Page 6: L9. Real World Machine Learning - Cooking Predictions

Prepare Activities

Page 7: L9. Real World Machine Learning - Cooking Predictions

Kitchen Arrangement

Page 8: L9. Real World Machine Learning - Cooking Predictions

Customize Stay

Page 9: L9. Real World Machine Learning - Cooking Predictions

In short, because… Details Make the Difference

Page 10: L9. Real World Machine Learning - Cooking Predictions

Machine Learning basics

Page 11: L9. Real World Machine Learning - Cooking Predictions

Machine Learning basics

Can you find patterns in this data?

Page 12: L9. Real World Machine Learning - Cooking Predictions

Machine Learning basics

Historical Data → Training → Prediction

New Data → Re-Training
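
As a minimal sketch of the cycle on this slide (train on historical data, predict, re-train when new data arrives), the toy model below memorises the most frequent label per feature value. The class name `MajorityModel`, the booking-channel feature and the sample rows are hypothetical illustrations, not the model used in the talk.

```python
from collections import Counter, defaultdict


class MajorityModel:
    """Toy classifier: predicts the most frequent label seen for a feature value."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # feature value -> label counts
        self.overall = Counter()            # label counts over all rows

    def train(self, rows):
        """rows: iterable of (feature_value, label). Call again to re-train."""
        for value, label in rows:
            self.counts[value][label] += 1
            self.overall[label] += 1
        return self

    def predict(self, value):
        """Most frequent label for this value, else the overall most frequent."""
        if value in self.counts:
            return self.counts[value].most_common(1)[0][0]
        return self.overall.most_common(1)[0][0]


# Hypothetical historical data: (booking channel, nationality)
historical = [("web_es", "ES"), ("web_es", "ES"), ("web_uk", "GB"), ("web_es", "FR")]
model = MajorityModel().train(historical)
print(model.predict("web_es"))  # most frequent label seen for web_es
print(model.predict("web_de"))  # unseen value: falls back to the overall majority

# New data arrives: re-train by adding it to what the model has already seen
model.train([("web_de", "DE"), ("web_de", "DE")])
print(model.predict("web_de"))
```

The point of the sketch is only the loop itself: predictions for `web_de` change once the new rows have been folded in.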

Page 14: L9. Real World Machine Learning - Cooking Predictions

“Cooking” Predictions

1. Go to the market to buy ingredients
2. Cleaning
3. Transforming
4. Cooking
5. Tasting the Dish

Page 15: L9. Real World Machine Learning - Cooking Predictions

“Cooking” Predictions

1. Gathering RAW data
2. Cleaning Data
3. Transforming and Feature Engineering
4. Training the Model
5. Evaluating Prediction Quality

Page 17: L9. Real World Machine Learning - Cooking Predictions

Where does the data come from?

Own Website

Partner Websites

RAW Data

Page 18: L9. Real World Machine Learning - Cooking Predictions

RAW Data

One year of historical reservation data (.xlsx file)

Characteristics
• 260,000 reservations
• 80 fields: 57 categorical, 9 numeric, 10 date, 3 text, 1 incorrect field
• Size: 150 MB

Page 19: L9. Real World Machine Learning - Cooking Predictions

RAW Data

Page 21: L9. Real World Machine Learning - Cooking Predictions

The Process

1. Gathering Data → “Dirty” RAW Data
2. Cleaning → “Clean” Data
3. Transformation and Feature Engineering → New Fields, Calculated Fields
4. Model

Page 22: L9. Real World Machine Learning - Cooking Predictions

Data Cleaning

Page 28: L9. Real World Machine Learning - Cooking Predictions

Data Cleaning

Row Deletion
• Reservations without check-in
• Cancelled reservations
• Rows with errors

Column Deletion
• IDs vs names
• Columns with little data

Other Actions
• Give dates a consistent format
• Delete accents
• Transform .xlsx -> .csv
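
The cleaning actions listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline from the talk: the column names (`STATUS`, `CHECKIN_DATE`, `ID`, `COUNTRY`), the `CANCELLED` marker and the day/month/year input format are all hypothetical, and reading the .xlsx file itself is left out because it needs a third-party library such as openpyxl or pandas.

```python
import csv
import io
import unicodedata
from datetime import datetime


def strip_accents(text):
    """Remove accents: 'España' -> 'Espana'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_rows(rows, drop_columns, date_columns, date_format="%d/%m/%Y"):
    """Apply the slide's cleaning steps to a list of dict rows (string values)."""
    cleaned = []
    for row in rows:
        # Row deletion: cancelled reservations and reservations without check-in
        if row.get("STATUS") == "CANCELLED" or not row.get("CHECKIN_DATE"):
            continue
        try:
            # Other actions: give dates one format (ISO 8601 here)
            for col in date_columns:
                parsed = datetime.strptime(row[col], date_format).date()
                row = {**row, col: parsed.isoformat()}
        except ValueError:
            continue  # Row deletion: rows with errors (unparseable date)
        # Column deletion, plus accent removal on the remaining values
        cleaned.append(
            {k: strip_accents(v) for k, v in row.items() if k not in drop_columns}
        )
    return cleaned


# Hypothetical raw rows; only the first one survives the cleaning
raw = [
    {"ID": "1", "STATUS": "OK", "CHECKIN_DATE": "03/07/2016", "COUNTRY": "España"},
    {"ID": "2", "STATUS": "CANCELLED", "CHECKIN_DATE": "04/07/2016", "COUNTRY": "Francia"},
    {"ID": "3", "STATUS": "OK", "CHECKIN_DATE": "", "COUNTRY": "Italia"},
    {"ID": "4", "STATUS": "OK", "CHECKIN_DATE": "31/02/2016", "COUNTRY": "Alemania"},
]
clean = clean_rows(raw, drop_columns={"ID"}, date_columns=["CHECKIN_DATE"])

# Write the result as CSV (into a string buffer here instead of a file)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(clean[0].keys()))
writer.writeheader()
writer.writerows(clean)
print(buffer.getvalue())
```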

Page 29: L9. Real World Machine Learning - Cooking Predictions

Clean Dataset

Dirty
• 260,000 reservations
• 80 fields: 57 categorical, 9 numeric, 10 date, 3 text, 1 incorrect field
• Size: 150 MB

Clean
• 150,000 reservations
• 46 fields: 26 categorical, 9 numeric, 10 date, 1 text
• Size: 75 MB

Page 30: L9. Real World Machine Learning - Cooking Predictions

The Process

1. Gathering Data → “Dirty” RAW Data
2. Cleaning → “Clean” Data
3. Transformations and Feature Engineering → New Fields, Calculated Fields
4. Model

Page 31: L9. Real World Machine Learning - Cooking Predictions

Transformations

Country Grouping

• A lot of countries to predict (210)
• Some countries have very few instances
• Grouping objective: min. 1% of total instances
• Does not affect business objective
• Total number of groups: 20
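
One simple way to reach a minimum share per class is to fold every country below the threshold into a single bucket. The sketch below does that; it is an assumption for illustration, since the slide does not say how the 20 groups were actually built (they may well be regional groups rather than one catch-all). The function name, the `OTHER` label and the sample counts are hypothetical.

```python
from collections import Counter


def group_rare_countries(countries, min_share=0.01, other_label="OTHER"):
    """Replace countries below min_share of all instances with other_label."""
    counts = Counter(countries)
    total = len(countries)
    keep = {c for c, n in counts.items() if n / total >= min_share}
    return [c if c in keep else other_label for c in countries]


# Hypothetical sample: 1000 reservations, two countries below 1%
sample = ["ES"] * 600 + ["GB"] * 300 + ["DE"] * 89 + ["IS"] * 6 + ["MT"] * 5
grouped = group_rare_countries(sample)
print(Counter(grouped))
```

Note that a single catch-all bucket does not by itself guarantee the 1% minimum; if the bucket ends up too small it would need to be merged with another group.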

New Fields

• RESERV_ANTICIPATION (calculated): days between the reservation date and the check-in date
• COUNTRY_HOTEL (name of the country)
• HOTEL_STARS (1-5)
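
A small sketch of how these three fields might be derived for one reservation. The input field names (`HOTEL_ID`, `RESERVATION_DATE`, `CHECKIN_DATE`) and the `hotel_info` lookup table are hypothetical. The anticipation is computed here as check-in date minus reservation date so that it is a positive number of days; that sign convention is an assumption.

```python
from datetime import date


def add_new_fields(reservation, hotel_info):
    """Return a copy of the reservation with the three new fields added."""
    hotel = hotel_info[reservation["HOTEL_ID"]]
    anticipation = (reservation["CHECKIN_DATE"] - reservation["RESERVATION_DATE"]).days
    return {
        **reservation,
        "RESERV_ANTICIPATION": anticipation,  # days booked in advance
        "COUNTRY_HOTEL": hotel["country"],
        "HOTEL_STARS": hotel["stars"],
    }


# Hypothetical lookup table and reservation
hotel_info = {"H1": {"country": "Spain", "stars": 4}}
reservation = {
    "HOTEL_ID": "H1",
    "RESERVATION_DATE": date(2016, 5, 1),
    "CHECKIN_DATE": date(2016, 7, 3),
}
enriched = add_new_fields(reservation, hotel_info)
print(enriched["RESERV_ANTICIPATION"], enriched["COUNTRY_HOTEL"], enriched["HOTEL_STARS"])
```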

Page 32: L9. Real World Machine Learning - Cooking Predictions

Clean Dataset

Dirty
• 260,000 reservations
• 80 fields
• Size: 150 MB

Clean
• 150,000 reservations
• 46 fields
• Size: 75 MB

Transformed
• 150,000 records
• 49 fields
• Size: 80 MB

Page 33: L9. Real World Machine Learning - Cooking Predictions

What is Feature Engineering

Extract signal from noise

Page 34: L9. Real World Machine Learning - Cooking Predictions

Feature Engineering Techniques

• Detect fields (features) that are predictors (signal) and bypass those that are not (noise)
  • Dependent fields (pax, days, pax*days)
  • Needless fields (reservation number)
  • Fields with very little data
  • Random fields (minute and second of reservation)

• Domain knowledge
• Experience
• Recursive cycle
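
Some of the noise categories above can be screened for mechanically before domain knowledge takes over. The following is a rough sketch with hypothetical thresholds and field names; it only produces candidates for review, since a genuinely informative numeric field can also be nearly unique per row.

```python
def screen_fields(rows, max_missing=0.9, max_unique=0.95):
    """Return {field: reason} for fields that look like noise rather than signal."""
    flagged = {}
    n = len(rows)
    for field in rows[0].keys():
        values = [row[field] for row in rows]
        present = [v for v in values if v not in (None, "")]
        if 1 - len(present) / n > max_missing:
            flagged[field] = "very little data"
        elif present and len(set(present)) / len(present) > max_unique:
            flagged[field] = "almost unique per row (ID-like or random)"
    return flagged


def is_product_of(rows, target, a, b):
    """True if target == a * b on every row, i.e. target is a dependent field."""
    return all(row[target] == row[a] * row[b] for row in rows)


# Hypothetical rows: an ID, two base fields, their product, and a sparse text field
rows = [
    {
        "RESERVATION_ID": 1000 + i,
        "PAX": 1 + i % 4,
        "DAYS": 1 + i % 5,
        "PAX_DAYS": (1 + i % 4) * (1 + i % 5),
        "NOTES": "late arrival" if i == 0 else "",
    }
    for i in range(20)
]
print(screen_fields(rows))
print(is_product_of(rows, "PAX_DAYS", "PAX", "DAYS"))
```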

Page 35: L9. Real World Machine Learning - Cooking Predictions

Recursive Feature Engineering

Field Selection → Algorithm Adjustment → Prediction → Quality Evaluation → (back to Field Selection)
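
The cycle can be illustrated with a greedy loop that keeps adding the field that most improves accuracy and stops when nothing helps. This is a sketch under assumptions, not the procedure from the talk: the stand-in model is a simple majority-vote lookup, the algorithm adjustment step is omitted, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict


def accuracy_with_fields(train, valid, fields, target):
    """Train a majority-vote lookup on `fields` and return accuracy on `valid`."""
    votes = defaultdict(Counter)
    for row in train:
        votes[tuple(row[f] for f in fields)][row[target]] += 1
    default = Counter(row[target] for row in train).most_common(1)[0][0]
    hits = 0
    for row in valid:
        key = tuple(row[f] for f in fields)
        predicted = votes[key].most_common(1)[0][0] if key in votes else default
        hits += predicted == row[target]
    return hits / len(valid)


def select_fields(train, valid, candidates, target):
    """Greedy cycle: add the field that most improves accuracy, stop when none does."""
    selected, best = [], accuracy_with_fields(train, valid, [], target)
    improved = True
    while improved:
        improved = False
        for field in [f for f in candidates if f not in selected]:
            score = accuracy_with_fields(train, valid, selected + [field], target)
            if score > best:
                best, best_field, improved = score, field, True
        if improved:
            selected.append(best_field)
    return selected, best


# Hypothetical data: MARKET determines the label, MINUTE is a random-like field
pattern = [("uk", "GB"), ("es", "ES"), ("de", "DE"), ("es", "ES")] * 10
rows = [
    {"MARKET": market, "MINUTE": i % 7, "NATIONALITY": nationality}
    for i, (market, nationality) in enumerate(pattern)
]
train, valid = rows[:32], rows[32:]
print(select_fields(train, valid, ["MARKET", "MINUTE"], "NATIONALITY"))
```

In practice the set used to choose fields should be kept separate from the final test set, otherwise the reported quality is optimistic.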

Page 36: L9. Real World Machine Learning - Cooking Predictions

Clean Dataset

Dirty
• 260,000 reservations
• 80 fields
• Size: 150 MB

Clean
• 150,000 reservations
• 46 fields
• Size: 75 MB

Transformed
• 150,000 records
• 49 fields
• Size: 80 MB

Final Dataset
• 150,000 records
• 10 fields
• Size: 55 MB

Page 38: L9. Real World Machine Learning - Cooking Predictions

The Process

1. Gathering Data → “Dirty” RAW Data
2. Cleaning → “Clean” Data
3. Transformation and Feature Engineering → New Fields, Calculated Fields
4. Modeling

Page 39: L9. Real World Machine Learning - Cooking Predictions

Modeling

Training

Learning

Page 40: L9. Real World Machine Learning - Cooking Predictions

Modeling

Page 42: L9. Real World Machine Learning - Cooking Predictions

Quality Evaluation

Dataset (100%)
• 80% Training → Model
• 20% Test → Evaluation
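
A minimal sketch of the 80/20 split shown on this slide, using only the standard library. The function name, the fixed seed and the use of integers as stand-ins for reservations are assumptions for illustration.

```python
import random


def train_test_split(rows, test_share=0.2, seed=42):
    """Shuffle a copy of rows and split it into (training, test) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_share))
    return shuffled[:cut], shuffled[cut:]


reservations = list(range(150_000))  # stand-ins for the 150,000 clean reservations
training, test = train_test_split(reservations)
print(len(training), len(test))  # 120000 30000
```

Shuffling before the cut matters when the source file is ordered (for example by date); the model is then fitted on `training` only and evaluated on `test`.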

Page 43: L9. Real World Machine Learning - Cooking Predictions

Quality Evaluation

Accuracy

Confusion Matrix
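
For reference, both measures named on this slide can be computed in a few lines. The labels and predictions below are made up for illustration and are not results from the talk.

```python
from collections import Counter


def accuracy(real, predicted):
    """Share of predictions that match the real label."""
    return sum(r == p for r, p in zip(real, predicted)) / len(real)


def confusion_matrix(real, predicted):
    """Counter keyed by (real label, predicted label)."""
    return Counter(zip(real, predicted))


# Hypothetical evaluation data
real = ["ES", "ES", "GB", "GB", "DE", "DE"]
predicted = ["ES", "GB", "GB", "GB", "ES", "DE"]
matrix = confusion_matrix(real, predicted)
labels = sorted(set(real) | set(predicted))

print(f"accuracy = {accuracy(real, predicted):.2f}")
print("real/pred " + " ".join(f"{label:>3}" for label in labels))
for r in labels:
    print(f"{r:>9} " + " ".join(f"{matrix[(r, p)]:>3}" for p in labels))
```

Rows are real labels and columns are predicted labels, so the diagonal holds the correct predictions and every off-diagonal cell shows which nationalities get confused with each other.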

Page 44: L9. Real World Machine Learning - Cooking Predictions

Quality Evaluation

54% 75%

Page 45: L9. Real World Machine Learning - Cooking Predictions

Quality Evaluation

Predicted vs Real Distribution
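
Comparing the share of each nationality in the real labels with its share in the predictions can be sketched as below. The function names and the sample data are hypothetical; the chart from the slide is not reproduced here.

```python
from collections import Counter


def distribution(labels):
    """Share of each label, as a dict {label: fraction}."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


def compare_distributions(real, predicted):
    """Rows of (label, real share, predicted share), largest real share first."""
    real_d, pred_d = distribution(real), distribution(predicted)
    labels = sorted(set(real_d) | set(pred_d), key=lambda l: -real_d.get(l, 0.0))
    return [(l, real_d.get(l, 0.0), pred_d.get(l, 0.0)) for l in labels]


# Hypothetical data: the model over-predicts ES and never predicts DE
real = ["ES"] * 5 + ["GB"] * 3 + ["DE"] * 2
predicted = ["ES"] * 7 + ["GB"] * 3
for label, real_share, predicted_share in compare_distributions(real, predicted):
    print(f"{label}: real {real_share:.0%}, predicted {predicted_share:.0%}")
```

A view like this complements accuracy: a model can score reasonably overall while systematically under-predicting the smaller groups.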

Page 46: L9. Real World Machine Learning - Cooking Predictions

Cooking Predictions

80%

20%

1. Go to the market to buy ingredients
2. Cleaning
3. Transforming
4. Cooking
5. Tasting the Dish

Page 47: L9. Real World Machine Learning - Cooking Predictions

Cooking Predictions

80%

20%

1. Gathering RAW data
2. Cleaning Data
3. Transforming and Feature Engineering
4. Training the Model
5. Evaluating Prediction Quality

Page 48: L9. Real World Machine Learning - Cooking Predictions

Other Techniques

• Ensembles
• Clusters
• Weight Analysis
• Anomaly Detection

Page 49: L9. Real World Machine Learning - Cooking Predictions

END

email: [email protected]

Twitter: @data_lytics

www.clevertask.com