gmdh-based feature ranking and selection for improved classification of medical data




Page 1: GMDH-based feature ranking and selection for improved classification of medical data

Intelligent Database Systems Lab

國立雲林科技大學 National Yunlin University of Science and Technology

GMDH-based feature ranking and selection for improved classification of medical data

Advisor: Dr. Hsu

Presenter: Yu-San Hsieh

Author: R.E. Abdel-Aal

2005. BI.456-468


N.Y.U.S.T.

I. M.

Outline

─ Motivation
─ Objective
─ Method
─ Material
─ Results
─ Conclusions


Motivation

Accuracy is very important in classifiers used for medical applications.


Objective

Improve the classification performance on medical data.


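Method

First stage – feature ranking ─ GMDH algorithm

1. Representation

2. Selection and stopping

[Figure: GMDH network with inputs x1, x2, x3, x4 and output y; each layer forms candidate units z_1 … z_{m(m-1)/2} from pairs of the m inputs.]

[Figure: squared error versus iteration, showing the candidate-unit errors r_1^2, r_2^2, …, r_{m(m-1)/2}^2 and the smallest of them, r_min.]

An increasing r_min means the model is becoming too complex:

1. It overfits the estimation data.

2. It performs poorly on the new selection data.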
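The representation and selection/stopping steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: it assumes each unit is the usual two-input quadratic polynomial, that the error r_j^2 is the squared error on the selection data normalized by the sum of y^2, and that the function names (`quad_features`, `gmdh_layer`, `gmdh_fit`) and the `keep` and `max_layers` parameters are hypothetical.

```python
import itertools
import numpy as np

def quad_features(a, b):
    """Design matrix for z = c0 + c1*a + c2*b + c3*a*b + c4*a^2 + c5*b^2."""
    return np.column_stack([np.ones_like(a), a, b, a * b, a**2, b**2])

def gmdh_layer(X_est, y_est, X_sel, y_sel):
    """Fit one unit per input pair on the estimation data and score it
    on the selection data. Returns units sorted by ascending error."""
    units = []
    for i, j in itertools.combinations(range(X_est.shape[1]), 2):
        A = quad_features(X_est[:, i], X_est[:, j])
        coef, *_ = np.linalg.lstsq(A, y_est, rcond=None)
        z_sel = quad_features(X_sel[:, i], X_sel[:, j]) @ coef
        # Assumed criterion: normalized squared error on the selection data.
        r2 = np.sum((y_sel - z_sel) ** 2) / np.sum(y_sel ** 2)
        units.append((r2, i, j, coef))
    units.sort(key=lambda u: u[0])
    return units

def gmdh_fit(X_est, y_est, X_sel, y_sel, keep=4, max_layers=5):
    """Stack layers until r_min stops decreasing; returns r_min per layer."""
    history = []
    for _ in range(max_layers):
        units = gmdh_layer(X_est, y_est, X_sel, y_sel)
        if not units:
            break  # fewer than two inputs left, no pairs to combine
        r_min = units[0][0]
        if history and r_min >= history[-1]:
            break  # r_min started to rise: the model is getting too complex
        history.append(r_min)
        best = units[:keep]
        # Outputs of the surviving units become the inputs of the next layer.
        X_est, X_sel = (
            np.column_stack([quad_features(X[:, i], X[:, j]) @ c
                             for _, i, j, c in best])
            for X in (X_est, X_sel)
        )
    return history
```

The input features that appear in the surviving units are the ones the network has effectively selected, which is what makes the approach usable for ranking.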


Method

First stage – feature ranking ─ AIM abductive network

1. Representation

2. Selection and stopping: avoid overfitting by using the CPM (complexity penalty multiplier) control

   1. CPM > 1: simpler models that are less accurate but generalize better.

   2. CPM < 1: more complex models that may overfit the training data and degrade actual prediction performance.
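The effect of CPM can be illustrated with a small Python sketch. The slides do not state the criterion that CPM enters, so the form below is an assumption: a predicted-squared-error style score, fitting error plus CPM × (2K/N) × a prior noise variance, where K is the number of model coefficients and N the number of training samples. The function names and the candidate numbers are hypothetical and purely illustrative.

```python
def penalized_error(fse, n_coef, n_samples, noise_var, cpm=1.0):
    """Assumed criterion: fitting squared error plus a complexity penalty
    that grows with the number of coefficients and is scaled by CPM."""
    return fse + cpm * (2.0 * n_coef / n_samples) * noise_var

def pick_model(candidates, n_samples, noise_var, cpm=1.0):
    """candidates: list of (name, fitting_squared_error, n_coefficients).
    Returns the name of the candidate with the lowest penalized error."""
    best = min(
        candidates,
        key=lambda c: penalized_error(c[1], c[2], n_samples, noise_var, cpm),
    )
    return best[0]

# Illustrative candidates only: more coefficients, lower fitting error.
candidates = [("simple", 0.30, 3), ("medium", 0.22, 10), ("complex", 0.20, 30)]
```

With these made-up numbers, 100 samples, and a noise variance of 0.25, raising `cpm` to 5 favours the "simple" candidate, while lowering it to 0.1 favours the "complex" one, matching the two cases listed above.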


Method

Second stage – feature selection

─ As the number of selected features k increases, performance on an evaluation dataset first improves and then starts to deteriorate as the model overfits the training data.

─ A compact m-feature subset can be obtained by taking the first m features from the top of the ranking list. Ex: with ranking list {2,6,7,8,1,5,3,4,9}, the selected 6-feature subset is {2,6,7,8,1,5}.

─ The optimum subset of features is determined by repeatedly forming subsets of k features, starting from the top of the ranking list. Ex: with ranking list {2,6,7,8,1,5,3,4,9}, the best subset is chosen from among {2,6,7,8,1,5}, {6,7,8,1,5,3}, …
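The selection step can be sketched as a loop over prefixes of the ranking list. This is a minimal sketch, assuming subsets are always taken from the top of the list and that a higher evaluation score is better; `best_prefix_subset` and the `evaluate` callable are hypothetical names, and in practice `evaluate` would train a classifier on the subset and score it on the evaluation dataset.

```python
from typing import Callable, List, Sequence, Tuple

def best_prefix_subset(
    ranking: Sequence[int],
    evaluate: Callable[[Sequence[int]], float],
) -> Tuple[List[int], float]:
    """Try the top-1, top-2, ..., top-n features of the ranking list and
    return the subset with the highest evaluation score."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranking) + 1):
        score = evaluate(ranking[:k])
        if score > best_score:  # keep the smallest subset on ties
            best_k, best_score = k, score
    return list(ranking[:best_k]), best_score

# Toy stand-in for a real evaluation: the score peaks at six features.
ranking = [2, 6, 7, 8, 1, 5, 3, 4, 9]
subset, score = best_prefix_subset(ranking, lambda s: -abs(len(s) - 6))
```

Because the score rises and then falls as k grows, scanning every prefix and keeping the best one avoids stopping at a local bump in the evaluation curve.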


Material

Two standard medical diagnosis datasets from the UCI Machine Learning Repository were used for this study.

─ Wisconsin breast cancer dataset

─ Cleveland heart disease dataset

[Figure: data split, 70% / 30%.]


Results

The breast cancer data

─ Ranking for the feature set: {2,6,7,8,1,5,3,4,9}

[Figure: features selected versus features ranked; values shown: 7, 5, 9.]


Results

Rough set data analysis of dataset

[Figure: two panels, each annotated "Overfitting" and "3%".]


Results

[Figure: two panels annotated "Standard error↓" and "3%"; one annotation "AUC↑".]


Results

The heart disease data

─ Ranking for the feature set: {13,12,9,3,2,10,8,4,5,11,1,7,6}

[Figure: features selected versus features ranked.]


Results

[Figure: annotated "3%", "6%", and "Overfitting".]


Results

[Figure: two panels, each annotated "AUC↑".]

Requires less than half the number of input features

Models using the reduced feature set will be more efficient.


Conclusions

─ Improved implementation and performance of classifiers for medical screening and diagnosis.

─ Feature reduction is particularly useful with high-dimensional data characterized by a large number of features and relatively few training examples.


My opinion

─ Advantage: Preprocess

─ Disadvantage:

─ Apply: Clustering, Association Rule, …