Submit Predictions

Upload: dieter

Post on 11-Feb-2016

DESCRIPTION

Goal: Predict who survived the Titanic disaster. Hypotheses: Women and children first. Get Data: Read dataset into Excel, R, etc. Data Management: Some age data missing; analyze gender only. Statistics & Analysis: Survival rate of 74% for women, 19% for men. Submit Predictions: 320 / 418 = 76.5%.

TRANSCRIPT

Page 1: Submit Predictions

Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Some age data missing; analyze gender only
Statistics & Analysis: Survival rate of 74% for women, 19% for men
Submit Predictions: 320 / 418 = 76.5%
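The gender-only rule and the score it earned can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's actual workflow: the function names and the sample passengers are hypothetical, and only the 320 and 418 figures come from the slide.

```python
def predict_survival(sex: str) -> int:
    """Gender-only rule: predict 1 (survived) for women, 0 for men."""
    return 1 if sex.strip().lower() == "female" else 0


def accuracy(n_correct: int, n_total: int) -> float:
    """Share of predictions that matched the true outcome."""
    return n_correct / n_total


# Hypothetical passengers, for illustration only.
sample = ["female", "male", "male", "female"]
predictions = [predict_survival(s) for s in sample]

# Score reported on the slide: 320 correct out of 418 test passengers.
score = accuracy(320, 418)
print(predictions, f"{score:.4f}")
```

The rule ignores every variable except sex, which is why it works even though some ages are missing.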

Page 2: Submit Predictions

Predictor Variables

Variable   Description                          Type                   Hypothesis
pclass     Passenger Class                      Categorical, Ordinal   1st class 3rd
name       Name                                 Text
Sex        Sex                                  Categorical
age        Age                                  Numeric
sibsp      Number of Siblings/Spouses Aboard    Integer
parch      Number of Parents/Children Aboard    Integer
ticket     Ticket Number                        Text
fare       Passenger Fare                       Numeric
cabin      Cabin                                Text
embarked   Port of Embarkation                  Categorical

Page 3: Submit Predictions

Age

All: N = 891
Missing: N = 177
Data: N = 714

[Chart: x-axis 0 to 90, y-axis 0 to 20; series: Survived, Not]

Page 4: Submit Predictions

Decision Trees

The decision tree looks for the split of the sample at each node that leads to the most differentiation on Y.

• Dependent variable (Y): continuous or categorical
• Independent variables (X's): continuous or categorical

[Diagram: Survived, split into "Age Less Than X" and "Age Greater Than X"]
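The search for the age threshold X can be sketched as follows. The slide does not say how "differentiation on Y" is measured, so this sketch assumes Gini impurity, one common choice; the function names and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(ages: List[float], survived: List[int]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted impurity) of the best 'age < threshold' split."""
    n = len(ages)
    best_x, best_score = None, gini(survived)  # baseline: no split
    values = sorted(set(ages))
    # Candidate thresholds: midpoints between consecutive distinct ages.
    for lo, hi in zip(values, values[1:]):
        x = (lo + hi) / 2.0
        left = [y for a, y in zip(ages, survived) if a < x]
        right = [y for a, y in zip(ages, survived) if a >= x]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_x, best_score = x, score
    return best_x, best_score


# Hypothetical toy data, not taken from the Titanic dataset.
ages = [4, 8, 25, 30, 45, 60]
survived = [1, 1, 0, 0, 1, 0]
threshold, impurity = best_split(ages, survived)
```

A lower weighted impurity means the two sides of the split differ more in their survival rates.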

Page 5: Submit Predictions

Age

[Chart: series A, B, Delta, N]

Page 6: Submit Predictions

Decision Trees

• Maximize data likelihood (minimize deviance).
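As a rough illustration of the deviance criterion, the sketch below computes the binomial deviance of a node, that is, minus twice the log-likelihood of its 0/1 labels under the node's own survival rate. The function names and the labels are hypothetical, and the binomial form is an assumption suited to a yes/no outcome such as survival.

```python
import math
from typing import List


def node_deviance(labels: List[int]) -> float:
    """Binomial deviance of one node: -2 * log-likelihood at the node's own rate."""
    n = len(labels)
    k = sum(labels)
    if k == 0 or k == n:
        return 0.0  # pure node: likelihood is 1, deviance is 0
    p = k / n
    return -2.0 * (k * math.log(p) + (n - k) * math.log(1.0 - p))


def split_deviance(left: List[int], right: List[int]) -> float:
    """Total deviance after a split is the sum over the two child nodes."""
    return node_deviance(left) + node_deviance(right)


# Hypothetical labels (1 = survived), for illustration only.
parent = [1, 1, 0, 0, 1, 0]
left, right = [1, 1], [0, 0, 1, 0]
reduction = node_deviance(parent) - split_deviance(left, right)
```

A positive reduction means the split raises the likelihood of the observed outcomes, so the split with the largest reduction is preferred.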

Page 7: Submit Predictions

Prediction and Missing Values

Variable   Description
pclass     Passenger Class
name       Name
Sex        Sex
age        Age
sibsp      Number of Siblings/Spouses Aboard
parch      Number of Parents/Children Aboard
ticket     Ticket Number
fare       Passenger Fare
cabin      Cabin
embarked   Port of Embarkation

Correlation or association of age with other variables?
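One way to explore that question is to correlate age with another numeric variable, using only the rows where age is known. The sketch below is an assumption about how such a check might look, not something shown in the slides; the helper names and the rows are hypothetical.

```python
import math
from typing import List, Optional


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def correlation_with_age(ages: List[Optional[float]], other: List[float]) -> float:
    """Correlate age with another variable, using only rows where age is known."""
    pairs = [(a, o) for a, o in zip(ages, other) if a is not None]
    return pearson([a for a, _ in pairs], [o for _, o in pairs])


# Hypothetical rows; None marks a missing age.
ages = [22.0, None, 38.0, 4.0, None, 54.0]
pclass = [3, 3, 1, 3, 2, 1]
r = correlation_with_age(ages, pclass)
```

A strong correlation would suggest that the other variable could help estimate the missing ages. The helper assumes both variables actually vary; a constant column would cause a division by zero.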

Page 8: Submit Predictions

Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Some age data missing; analyze gender only
Statistics & Analysis: Survival rate of 74% for women, 19% for men
Submit Predictions: 320 / 418 = 76.5%

Page 9: Submit Predictions

Gender

Page 10: Submit Predictions

Gender and Age

• The tree grows by optimizing only the split at the current node rather than optimizing the entire tree
• The tree stops when a further split becomes ineffective

[Chart: Female Survival %, y-axis 0% to 100%, x-axis 0 to 70]
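The greedy growth and the stopping rule described above can be sketched as a short recursive function. This is a simplified illustration under stated assumptions: Gini impurity as the split measure, a hypothetical `min_gain` threshold as the definition of "ineffective", and made-up passengers.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[Dict[str, float], int]  # (features, survived label)


def gini(rows: List[Row]) -> float:
    """Gini impurity of the 0/1 labels in a node."""
    if not rows:
        return 0.0
    p = sum(y for _, y in rows) / len(rows)
    return 2.0 * p * (1.0 - p)


def best_split(rows: List[Row]) -> Optional[Tuple[float, str, float]]:
    """Greedy step: best (gain, feature, threshold) for this node only."""
    best = None
    parent = gini(rows)
    for feature in rows[0][0]:
        values = sorted({x[feature] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for r in rows if r[0][feature] < t]
            right = [r for r in rows if r[0][feature] >= t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            gain = parent - child
            if best is None or gain > best[0]:
                best = (gain, feature, t)
    return best


def grow(rows: List[Row], min_gain: float = 0.01, min_size: int = 2):
    """Grow the tree one node at a time; stop when a split is ineffective."""
    split = best_split(rows) if len(rows) >= min_size else None
    if split is None or split[0] < min_gain:
        # Leaf: store the survival rate of the rows that reached it.
        return sum(y for _, y in rows) / len(rows)
    _, feature, t = split
    left = [r for r in rows if r[0][feature] < t]
    right = [r for r in rows if r[0][feature] >= t]
    return {"feature": feature, "threshold": t,
            "left": grow(left, min_gain, min_size),
            "right": grow(right, min_gain, min_size)}


# Hypothetical passengers: female is 1.0 for women, 0.0 for men.
data: List[Row] = [
    ({"female": 1.0, "age": 30.0}, 1),
    ({"female": 1.0, "age": 8.0}, 1),
    ({"female": 1.0, "age": 50.0}, 1),
    ({"female": 0.0, "age": 40.0}, 0),
    ({"female": 0.0, "age": 25.0}, 0),
    ({"female": 0.0, "age": 5.0}, 1),
]
tree = grow(data)
```

Each call to `grow` looks only at the rows in its own node, so an early split is never revisited even if a different choice would have produced a better tree overall.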

Page 11: Submit Predictions

Prediction: Gender + Age

Page 12: Submit Predictions

Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Some age data missing; analyze gender only
Statistics & Analysis
Submit Predictions

Page 13: Submit Predictions

Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Age + Gender
Statistics & Analysis
Submit Predictions

Page 14: Submit Predictions

Kitchen Sink

Page 15: Submit Predictions

Kitchen Sink

Page 16: Submit Predictions

Decision Trees

• Popular implementations
  • CART: Classification And Regression Tree
  • CHAID: CHi-squared Automatic Interaction Detector
• CHAID allows multiple-branch splits, giving a wider tree
• CART uses binary splits
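The difference between a multiway and a binary split can be illustrated with the chi-squared statistic that gives CHAID its name. The sketch below computes only that statistic for two ways of splitting on passenger class; it is not a full CHAID or CART implementation (no category merging, p-values, or pruning), and the data and names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List


def chi_squared(groups: List[Hashable], outcomes: List[int]) -> float:
    """Pearson chi-squared statistic for a groups-by-outcome contingency table."""
    n = len(groups)
    observed = Counter(zip(groups, outcomes))
    group_totals = Counter(groups)
    outcome_totals = Counter(outcomes)
    stat = 0.0
    for g, g_n in group_totals.items():
        for o, o_n in outcome_totals.items():
            expected = g_n * o_n / n
            stat += (observed[(g, o)] - expected) ** 2 / expected
    return stat


# Hypothetical passengers: class (1, 2, 3) and outcome (1 = survived).
pclass = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
survived = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]

# CHAID-style multiway split: one branch per class.
multiway = chi_squared(pclass, survived)

# CART-style binary split: 1st class against everyone else.
binary = chi_squared(["1st" if c == 1 else "other" for c in pclass], survived)
```

The two statistics have different degrees of freedom (2 and 1 here), so they are not directly comparable as raw numbers; CHAID compares adjusted p-values instead.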

Page 17: Submit Predictions