Titanic dataset analysis


Uploaded by sumit-saini, posted on 13-Jan-2017


Integrated Analysis - Decision Tree and K-means clustering using Tableau & R

Sumit Kumar Saini Page 1

Analysis of the Titanic dataset to identify the attributes most important to passenger survival.

Decision Tree classification using R

The misclassification rate of the current tree model is 0.21.
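A misclassification rate of 0.21 means roughly 21% of the passengers evaluated were assigned the wrong class by the tree. The sketch below shows that calculation on hypothetical labels (1 = survived, 0 = did not survive); the source does not say whether its rate was measured on training data or on a held-out set.

```python
def misclassification_rate(actual, predicted):
    """Fraction of observations whose predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    errors = sum(1 for a, p in zip(actual, predicted) if a != p)
    return errors / len(actual)


# Hypothetical labels, not taken from the analysis.
actual = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
print(misclassification_rate(actual, predicted))  # 0.2
```

Equivalently, the rate is 1 minus accuracy.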

Sex is the first variable used for splitting, i.e. it forms the root node of the tree.
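A CART-style tree picks, at each node, the split that most reduces class impurity, so the root split is the single most informative one. The source only says the tree was built in R; assuming a CART-style learner, the sketch below scores one-vs-rest binary splits with Gini impurity on a hypothetical eight-passenger sample. Real implementations such as rpart search categorical levels more generally, so treat this as an illustration of the idea rather than the author's procedure.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(rows, target, features):
    """Try every (feature == value) vs rest split; return (gain, feature, value) of the best."""
    n = len(rows)
    parent = gini([r[target] for r in rows])
    best = None
    for f in features:
        for v in sorted({r[f] for r in rows}, key=str):
            left = [r[target] for r in rows if r[f] == v]
            right = [r[target] for r in rows if r[f] != v]
            if not left or not right:
                continue  # not a real split
            weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
            gain = parent - weighted
            if best is None or gain > best[0]:
                best = (gain, f, v)
    return best


# Hypothetical passengers, not the real Titanic records.
rows = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "female", "Pclass": 2, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "male", "Pclass": 1, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 2, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 1},
]
print(best_binary_split(rows, "Survived", ["Sex", "Pclass"]))  # (0.28125, 'Sex', 'female')
```

In this toy sample the Sex split removes far more impurity (0.28125) than any Pclass split, which is why it would be chosen first.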

Top 6 variables from the decision tree, in order of importance:
1) Survived (the target variable being predicted)
2) Sex
3) Pclass
4) Age
5) Embarked
6) SibSp
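One common way to turn a fitted tree into an importance ranking is to add up, per variable, the impurity decrease achieved at every node where that variable splits. This is an assumption about how the ranking was produced; R's rpart reports a similar `variable.importance` that also credits surrogate splits, which this simplified sketch ignores. The per-node values are hypothetical, and the target `Survived` is left out because it is what the tree predicts rather than a predictor.

```python
from collections import defaultdict


def rank_importance(splits):
    """Rank features by total impurity decrease.

    `splits` holds one (feature, decrease) pair per internal node of a fitted tree,
    where `decrease` is already weighted by the share of observations reaching that node.
    """
    totals = defaultdict(float)
    for feature, decrease in splits:
        totals[feature] += decrease
    # Largest total decrease first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical node-level decreases; these are NOT the author's fitted tree.
splits = [
    ("Sex", 0.140),
    ("Pclass", 0.040),
    ("Age", 0.030),
    ("Pclass", 0.020),
    ("Embarked", 0.010),
    ("SibSp", 0.005),
]

for name, score in rank_importance(splits):
    print(f"{name:<9}{score:.3f}")
```

Note that a variable used at several nodes (Pclass here) accumulates credit from each of them.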


K-means Clustering

Six clusters were chosen, based on the plot below.
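The source does not say what the plot shows. A common choice is an elbow plot of total within-cluster sum of squares (WSS) against k; assuming that is what was used, the sketch below computes those values with a minimal Lloyd's k-means and a deterministic farthest-first start on hypothetical 2-D points. R's `kmeans()` defaults to the Hartigan-Wong algorithm with random starting centres, so its numbers can differ.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initial centres: start at points[0], then repeatedly take the farthest point."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist2(p, c) for c in centres)))
    return centres


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; returns (centres, labels)."""
    centres = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centre.
        labels = [min(range(k), key=lambda j: dist2(p, centres[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Move the centre to the mean of its members; keep it if the cluster is empty.
            new.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else centres[j])
        if new == centres:
            break
        centres = new
    return centres, labels


def wss(points, centres, labels):
    """Total within-cluster sum of squares, the y-axis of an elbow plot."""
    return sum(dist2(p, centres[lab]) for p, lab in zip(points, labels))


# Hypothetical data: three well-separated groups of four points.
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (1, 8), (1, 9), (2, 8), (2, 9)]
for k in range(1, 6):
    centres, labels = kmeans(points, k)
    print(k, round(wss(points, centres, labels), 2))
```

With three separated groups the WSS curve drops steeply up to k = 3 and then flattens; the bend ("elbow") marks the chosen k. Applying this to the Titanic attributes would first require numeric, ideally scaled, features, which the source does not describe.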


Visualization in Tableau connected to R

Analysis of all attributes, split by cluster


Cluster 1
i) Survivors embarked only at Southampton (Embarked = Southampton).
ii) The median age of survivors in this cluster is 28 years.
iii) This cluster consists only of males, i.e. Sex N = 1.
iv) All passengers belong to class 3.

Cluster 4
i) Survivors embarked only at Southampton (Embarked = Southampton).
ii) The median age of survivors in this cluster is 50 years.
iii) This cluster contains both males and females, but only males survived, i.e. Sex N = 1.
iv) Passengers belong to classes 2 and 3, but survivors come from class 3 only.
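Each of these statements reduces to filtering one cluster's survivors and summarising their attributes. The sketch assumes every passenger record already carries its k-means label in a hypothetical `cluster` field alongside the Titanic columns (Age, Sex, Pclass, Embarked, Survived); the rows are made up for illustration.

```python
from statistics import median


def cluster_profile(passengers, cluster):
    """Summarise the survivors of one k-means cluster."""
    survivors = [p for p in passengers
                 if p["cluster"] == cluster and p["Survived"] == 1]
    # Age is missing for some Titanic passengers, so skip None values.
    ages = [p["Age"] for p in survivors if p["Age"] is not None]
    return {
        "n_survivors": len(survivors),
        "median_age": median(ages) if ages else None,
        "embarked": sorted({p["Embarked"] for p in survivors}),
        "sex": sorted({p["Sex"] for p in survivors}),
        "pclass": sorted({p["Pclass"] for p in survivors}),
    }


# Hypothetical records, not the author's data.
passengers = [
    {"cluster": 1, "Survived": 1, "Age": 22, "Sex": "male", "Pclass": 3, "Embarked": "S"},
    {"cluster": 1, "Survived": 1, "Age": 30, "Sex": "male", "Pclass": 3, "Embarked": "S"},
    {"cluster": 1, "Survived": 1, "Age": 41, "Sex": "male", "Pclass": 3, "Embarked": "S"},
    {"cluster": 1, "Survived": 1, "Age": None, "Sex": "male", "Pclass": 3, "Embarked": "S"},
    {"cluster": 1, "Survived": 0, "Age": 19, "Sex": "male", "Pclass": 3, "Embarked": "Q"},
    {"cluster": 4, "Survived": 1, "Age": 50, "Sex": "male", "Pclass": 3, "Embarked": "S"},
    {"cluster": 4, "Survived": 0, "Age": 47, "Sex": "female", "Pclass": 2, "Embarked": "S"},
]
print(cluster_profile(passengers, 1))
```

A result such as `'embarked': ['S']` corresponds to a statement like "survivors embarked only at Southampton".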

Analysis of all clusters, split by whether passengers survived.


Clusters with the highest chances of survival: clusters 2 and 5
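Ranking clusters by survival chance amounts to computing each cluster's survival rate (survivors divided by cluster size) and sorting. A minimal sketch on made-up (cluster, survived) pairs:

```python
from collections import defaultdict


def survival_rates(records):
    """Map each cluster to the share of its members with survived == 1."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for cluster, survived in records:
        totals[cluster] += 1
        survivors[cluster] += survived
    return {c: survivors[c] / totals[c] for c in totals}


def top_clusters(records, n=2):
    """Return the n clusters with the highest survival rate, best first."""
    rates = survival_rates(records)
    return sorted(rates, key=rates.get, reverse=True)[:n]


# Hypothetical (cluster, survived) pairs, not the author's results.
records = [(1, 0), (1, 0), (1, 1), (2, 1), (2, 1), (2, 0), (3, 0), (3, 0),
           (5, 1), (5, 1), (5, 1), (5, 0), (6, 0), (6, 1)]
print(top_clusters(records))  # [5, 2]
```

Using rates rather than raw survivor counts keeps a large cluster from ranking high simply because it has more members.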

Clusters categorized by Sex


Clusters categorized by Sex and Pclass

Clusters categorized by Sex and AgeCat


Clusters categorized by Sex and Embarked
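Each of these Tableau views is essentially a cross-tabulation of cluster membership against one or two attributes, split by survival. A plain-Python equivalent, assuming records with a hypothetical `cluster` field next to the Titanic columns, is a counter keyed by each combination of values:

```python
from collections import Counter


def crosstab(passengers, *attrs):
    """Count passengers per (cluster, <attr values...>, Survived) combination."""
    return Counter(
        (p["cluster"], *(p[a] for a in attrs), p["Survived"]) for p in passengers
    )


# Hypothetical records, not the author's data.
passengers = [
    {"cluster": 2, "Sex": "female", "Embarked": "S", "Survived": 1},
    {"cluster": 2, "Sex": "female", "Embarked": "S", "Survived": 1},
    {"cluster": 2, "Sex": "male", "Embarked": "C", "Survived": 0},
    {"cluster": 5, "Sex": "female", "Embarked": "S", "Survived": 1},
    {"cluster": 5, "Sex": "male", "Embarked": "Q", "Survived": 0},
]
table = crosstab(passengers, "Sex", "Embarked")
for key in sorted(table):
    print(key, table[key])
```

Passing a single attribute, e.g. `crosstab(passengers, "Sex")`, gives the simpler "clusters categorized by Sex" view.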

Based on the visualizations above, I derived the following matrix:


Attribute                         Cluster 2             Cluster 5
Ideal gender                      Female                Female
Ideal passenger class             Female / Pclass 1     Female / Pclass 3
Ideal age category                17-32                 17-32
Ideal embarkation point           Female / Southampton  Female / Southampton
Ideal number of siblings/spouses  Female / SibSp 0      Female / SibSp 0
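Each "ideal" cell reads as the most typical value of an attribute among a cluster's survivors, and entries such as "Female / Pclass 1" suggest the attribute was read off among female survivors. Assuming that interpretation, the sketch takes the mode of each attribute, optionally restricted to one sex. The records, the `cluster` field, and the precomputed `AgeCat` labels are hypothetical; the source gives no age-category boundaries beyond the 17-32 band.

```python
from collections import Counter


def ideal_profile(passengers, cluster, attrs, sex=None):
    """Most common value of each attribute among one cluster's survivors.

    If `sex` is given, only survivors of that sex are considered. Ties go to the
    value seen first, following Counter.most_common ordering.
    """
    survivors = [
        p for p in passengers
        if p["cluster"] == cluster and p["Survived"] == 1
        and (sex is None or p["Sex"] == sex)
    ]
    if not survivors:
        return {}
    return {a: Counter(p[a] for p in survivors).most_common(1)[0][0] for a in attrs}


# Hypothetical records, not the author's data.
passengers = [
    {"cluster": 2, "Survived": 1, "Sex": "female", "Pclass": 1, "AgeCat": "17-32", "Embarked": "S", "SibSp": 0},
    {"cluster": 2, "Survived": 1, "Sex": "female", "Pclass": 1, "AgeCat": "33-48", "Embarked": "C", "SibSp": 0},
    {"cluster": 2, "Survived": 1, "Sex": "female", "Pclass": 2, "AgeCat": "17-32", "Embarked": "S", "SibSp": 1},
    {"cluster": 2, "Survived": 1, "Sex": "male", "Pclass": 3, "AgeCat": "33-48", "Embarked": "Q", "SibSp": 2},
    {"cluster": 2, "Survived": 0, "Sex": "male", "Pclass": 1, "AgeCat": "33-48", "Embarked": "S", "SibSp": 0},
]
attrs = ["Pclass", "AgeCat", "Embarked", "SibSp"]
print(ideal_profile(passengers, 2, attrs, sex="female"))
```

Conditioning on sex changes which value wins whenever male and female survivors differ, so it is worth stating which convention the matrix uses.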