Post on 19-Jan-2016

Advanced Network Database Lab

Kaggle Competition

Prudential Life Insurance Assessment

Can you make buying life insurance easier?

Registration

• Site: https://www.kaggle.com/competitions

• Account: IKDD1(Group Number)

Prudential

• Prudential Financial, Inc.

• An American Fortune Global 500 and Fortune 500 company

• https://www.prulife.com.tw/page/index.htm

• Prize money: $30,000

Data Attribute

• Nominal type

• Numbers may be used to represent the variables, but the numbers do not have any numerical value or relationship to each other.
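
Because nominal codes carry no order, one common way to feed them to a classifier is one-hot encoding. The following is a minimal sketch in plain Python; the helper name `one_hot` and the sample codes are made up for illustration and do not come from the competition data.

```python
def one_hot(values):
    """Turn nominal codes into 0/1 indicator rows, one column per category."""
    categories = sorted(set(values))  # column order only; it implies no ranking
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical nominal column: the codes 1, 2, 3 are labels, not quantities.
product_codes = [3, 1, 3, 2]
categories, encoded = one_hot(product_codes)

for code, row in zip(product_codes, encoded):
    print(code, row)
```

After encoding, code 3 is no longer treated as "three times" code 1; each category simply gets its own column.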

Classification

Prediction

Decision Tree

Sklearn – Python tool

• Simple and efficient tools for data mining and data analysis!

• Decision tree URL: http://scikit-learn.org/stable/modules/tree.html
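
As a rough illustration of the decision tree tool, the sketch below fits scikit-learn's `DecisionTreeClassifier` on a tiny made-up table. The feature values, labels, and the `max_depth` setting are assumptions chosen for the example, not settings from the course or the competition.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data (invented): each row is [feature_a, feature_b].
X_train = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
# Toy class labels (invented); here they depend only on feature_a.
y_train = [1, 1, 2, 2, 3, 3]

# max_depth and random_state are illustrative choices, not recommendations.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Predict the class of two rows the tree has not been asked about yet.
X_test = [[0, 1], [2, 0]]
predictions = clf.predict(X_test)
print(list(predictions))
```

For the real data set, the training table would be loaded from the competition files instead of typed in by hand.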

Homework 1

• Registration

• Apply a simple algorithm to build the classifier

• To predict the "Response" variable for each Id in the test set

• Submit the result to Kaggle

• Deadline: next Thursday (12/31)
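
The prediction and submission steps above end in a file with one predicted "Response" per test "Id". Below is a minimal sketch of writing such a file with Python's standard `csv` module. The helper name `write_submission`, the header row layout, and the sample ids and predictions are assumptions for illustration; check the competition's sample submission for the exact format.

```python
import csv
import io


def write_submission(ids, responses, out):
    """Write one 'Id,Response' row per test record to a file-like object."""
    if len(ids) != len(responses):
        raise ValueError("ids and responses must have the same length")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "Response"])  # assumed header layout
    for row_id, response in zip(ids, responses):
        writer.writerow([row_id, int(response)])


# Hypothetical ids and predictions, written to memory instead of disk.
buffer = io.StringIO()
write_submission([1, 3, 4], [8, 2, 6], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To produce a real file, pass `open("submission.csv", "w", newline="")` as `out` instead of the in-memory buffer.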

Homework 2

• Improve your prediction results

• Oral report

• Deadline: next Thursday (1/7)

Homework 3 (Final project)

• Try different algorithms to build the best classifier

• Submit the result to Kaggle

Final project

• Deadline: 1/14 23:59

• Submission:

  • Submit the results to Kaggle

  • Email your project to cwchang.ncku@gmail.com

  • Project file content: code, prediction result, report

Report

• The details of your best method

• The description of the methods that you tried

• The important attributes or surprising features you found

Grading

• Homework 1: 20%

• Homework 2: 10%

• Final project: 70%

  • Ranking: 20%

  • Algorithm and coding: 25%

  • Report: 25%

XGBoost

• General-purpose gradient boosting library that includes generalized linear models and gradient-boosted decision trees

• SITE: http://dmlc.ml/
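
To give a feel for the gradient boosting idea, here is a toy sketch in plain Python. It is not XGBoost and does not use its API: it boosts one-split trees (stumps) on a single numeric feature with squared-error loss, where each round fits the current residuals. All names, the data, and the parameter values are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split on x that best fits the residuals."""
    best = None
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Start from the mean, then repeatedly add a stump fitted to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and every stump's scaled contribution for one x."""
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in stumps
    )


# Invented toy data: one numeric feature and a numeric target.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 2.0, 5.0, 6.0, 6.0]
model = boost(xs, ys)

mean_y = sum(ys) / len(ys)
baseline_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)
boosted_mse = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(baseline_mse, boosted_mse)
```

Each round corrects part of what the previous rounds got wrong, which is why the training error of the boosted model ends up below that of the constant baseline.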

tslm

• A linear model with time series components

• SITE: http://www.inside-r.org/packages/cran/forecast/docs/tslm

H2o.randomForest

• Random Forest (RF) is a powerful classification tool. When given a set of data, RF generates a forest of classification trees, rather than a single classification tree. Each of these trees generates a classification for a given set of attributes. The classification from each H2O tree can be thought of as a vote; the class with the most votes determines the classification.

• SITE: http://docs.h2o.ai/h2oclassic/datascience/rf.html
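
The voting step described above can be sketched in a few lines of plain Python. This is only an illustration of majority voting, not H2O's implementation; the function name `majority_vote` and the sample votes are made up.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    counts = Counter(tree_predictions)
    # On a tie, most_common keeps the class that was seen first.
    return counts.most_common(1)[0][0]


# Hypothetical votes from five trees for one applicant.
votes = [2, 8, 8, 5, 8]
forest_prediction = majority_vote(votes)
print(forest_prediction)
```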