
Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

An Integrated Machine Learning Approach to Stroke Prediction

Presenter: Tsai Tzung Ruei

Authors: Aditya Khosla, Yu Cao, Cliff Chiung-Yu Lin, Hsu-Kuang Chiu, Junling Hu, Honglak Lee

SIGKDD 2010

National Yunlin University of Science and Technology

Outline

• Motivation
• Objective
• Methodology
• Experiments
• Conclusion
• Comments

Motivation

Most previous stroke prediction models have used features (risk factors) that were verified by clinical trials or selected manually by medical experts.

High-performance machine learning algorithms such as SVM and logistic regression had not previously been explored for this task.

Objective

To propose a novel automatic feature selection algorithm that selects robust features based on our proposed heuristic: conservative mean.

To present a margin-based censored regression algorithm that combines the concept of margin-based classifiers with censored regression to achieve a better concordance index than the Cox model.
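The concordance index (C-index) named in this objective measures how well a model's predicted risks order subjects by their time to event when some follow-up times are censored. The slides do not say which variant the paper uses; the Python sketch below assumes Harrell's standard definition, and the function name and arguments are illustrative only.

```python
from itertools import permutations
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risk: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data (assumed variant).

    times:  observed time for each subject (event time or censoring time)
    events: 1 if the event (e.g. stroke) was observed, 0 if censored
    risk:   predicted risk score; a higher score means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in permutations(range(len(times)), 2):
        # (i, j) is comparable only when i had an observed event strictly
        # before j's observed time; otherwise their true order is unknown.
        if events[i] == 1 and times[i] < times[j]:
            comparable += 1
            if risk[i] > risk[j]:
                concordant += 1.0   # higher risk for the earlier event
            elif risk[i] == risk[j]:
                concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Perfectly ordered risks; the third subject (time 3) is censored and only
# appears as the later member of comparable pairs.
print(concordance_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1]))  # prints 1.0
```

For a model that predicts a time to event rather than a risk, such as a censored regression model, the negated predicted times can be passed as `risk`.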

Methodology


Missing Data Imputation
• Column mean
• Column median
• Imputation through linear regression
• Regularized Expectation Maximization (EM)

Feature Selection
• Forward feature selection
• L1-regularized logistic regression
• Conservative mean feature selection

Learning Algorithms for Prediction
• Margin-based Censored Regression
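The two simplest imputation strategies listed above, column mean and column median, can be sketched in Python with NumPy as follows. The helper name `impute_columns` is hypothetical, and the regression-based and regularized EM imputations are not shown.

```python
import numpy as np


def impute_columns(X, strategy="mean"):
    """Fill missing values (NaN) in each column with that column's mean or median.

    Assumes every column has at least one observed value.
    """
    X = np.array(X, dtype=float)          # copy so the caller's data is untouched
    if strategy == "mean":
        fill = np.nanmean(X, axis=0)      # per-column mean over observed entries
    elif strategy == "median":
        fill = np.nanmedian(X, axis=0)    # per-column median over observed entries
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    rows, cols = np.nonzero(np.isnan(X))  # positions of the missing entries
    X[rows, cols] = fill[cols]
    return X


X = [[1.0, np.nan],
     [2.0, 4.0],
     [9.0, 8.0],
     [np.nan, 9.0]]
print(impute_columns(X, "mean"))    # NaNs become 4.0 (column 0) and 7.0 (column 1)
print(impute_columns(X, "median"))  # NaNs become 2.0 (column 0) and 8.0 (column 1)
```

The median is less sensitive than the mean to a few extreme values, which is visible in column 0 here.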

Methodology

Conservative mean feature selection

Takes into account the variance of prediction performance across cross-validation folds, not only its average.

Evaluates the performance of each feature individually.
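A minimal sketch of the per-feature scoring idea, assuming that the conservative mean of a feature is its average cross-validation score minus one standard deviation across folds (the slides only say that both the average and the variance are considered). The per-fold scores are taken as given, and all names and numbers are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def conservative_mean(fold_scores: Sequence[float]) -> float:
    """Assumed heuristic: average fold score minus one standard deviation.

    A feature whose score varies a lot between folds is penalized relative to
    one with the same average but stable performance.
    """
    return statistics.fmean(fold_scores) - statistics.pstdev(fold_scores)


def rank_features(per_feature_scores: Dict[str, Sequence[float]],
                  k: int) -> List[Tuple[str, float]]:
    """Score each feature on its own and keep the k best by conservative mean.

    per_feature_scores maps a feature name to the validation scores (e.g. AUC)
    of a model trained on that feature alone, one score per fold.
    """
    scored = [(name, conservative_mean(s)) for name, s in per_feature_scores.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical per-fold AUCs: feature_b has the same mean as feature_a
# but is unstable across folds, so it ranks last.
scores = {
    "feature_a": [0.70, 0.70, 0.70],
    "feature_b": [0.90, 0.50, 0.70],
    "feature_c": [0.80, 0.78, 0.82],
}
print([name for name, _ in rank_features(scores, k=2)])  # prints ['feature_c', 'feature_a']
```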

[Figure: features shown include Age, Calculated hypertension status, and Left ventricular mass]

Methodology

Conservative mean feature selection

[Figure labels: VECTOR; Age; Left ventricular mass; Calculated hypertension status]

Methodology

Learning Algorithms for Prediction: Margin-based Censored Regression

[Figure: SVM illustration with classes labeled True and False]
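The slides give no formula for Margin-based Censored Regression. The Python sketch below is only one plausible reading of "margin-based classifiers combined with censored regression", not the authors' formulation: a linear model predicts time to stroke, observed events get an SVM-style epsilon-insensitive loss, right-censored subjects are penalized only when the prediction falls before their censoring time, and training uses plain subgradient descent. All names and hyperparameters (`eps`, `lam`, `lr`, `epochs`) are assumptions.

```python
import numpy as np


def censored_margin_loss(pred, t, event, eps=0.1):
    """Per-sample loss under an assumed margin-based censored formulation.

    event == 1 (event observed at t): epsilon-insensitive loss max(0, |pred - t| - eps).
    event == 0 (right-censored at t): the true time is only known to be >= t,
    so only predictions earlier than t - eps are penalized.
    """
    pred = np.asarray(pred, dtype=float)
    t = np.asarray(t, dtype=float)
    event = np.asarray(event, dtype=int)
    too_early = np.maximum(0.0, t - pred - eps)
    too_late = np.maximum(0.0, pred - t - eps)
    return np.where(event == 1, too_early + too_late, too_early)


def fit_margin_censored(X, t, event, eps=0.1, lam=0.01, lr=0.01, epochs=500):
    """Fit a linear time predictor w.x + b by subgradient descent on the
    mean loss above plus an L2 penalty (lam / 2) * ||w||^2."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    event = np.asarray(event, dtype=int)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        pred = X @ w + b
        g = np.zeros(n)                                # d(loss_i)/d(pred_i)
        g[t - pred - eps > 0] = -1.0                   # predicted too early
        g[(pred - t - eps > 0) & (event == 1)] = 1.0   # too late, uncensored only
        w -= lr * (X.T @ g / n + lam * w)
        b -= lr * g.mean()
    return w, b


X = [[1.0], [2.0], [3.0], [4.0]]
t = [2.0, 4.0, 6.0, 8.0]
event = [1, 1, 0, 1]                                   # third subject is censored
w, b = fit_margin_censored(X, t, event)
pred = np.asarray(X) @ w + b
print(w, b, censored_margin_loss(pred, t, event).mean())
```

The one-sided penalty is what distinguishes this from ordinary support vector regression: a censored subject predicted to remain stroke-free beyond its censoring time incurs no loss.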

Experiments

Data Imputation

Feature Selection

[Figure: pipeline overview (Missing Data Imputation, Feature Selection, Learning Algorithms for Prediction)]

Experiments

Stroke Prediction

[Figure: pipeline overview (Missing Data Imputation, Feature Selection, Learning Algorithms for Prediction)]

Experiments

Identifying Risk Factors

Conclusion

Contributions
• An extensive evaluation of data imputation, feature selection, and prediction on medical data, with comparisons against the Cox proportional hazards model.
• A novel feature selection algorithm, Conservative Mean feature selection, that outperforms both the L1-regularized Cox model and L1-regularized logistic regression on the CHS dataset.
• A novel risk prediction algorithm, Margin-based Censored Regression, that outperforms the Cox model given the same set of features.
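The Cox proportional hazards baseline in these comparisons is fit by maximizing its partial likelihood, and the L1-regularized variant adds an L1 penalty on the coefficients. A minimal Python sketch of that objective, assuming no tied event times; the function name and the `l1` parameter are illustrative, and no optimizer is included.

```python
import numpy as np


def cox_neg_log_partial_likelihood(beta, X, times, events, l1=0.0):
    """Negative log partial likelihood of a Cox model, plus an optional L1 penalty.

    Assumes no tied event times (ties would need a Breslow or Efron correction).
    """
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    eta = X @ beta                                 # log relative hazard per subject
    nll = 0.0
    for i in np.flatnonzero(events == 1):          # only observed events contribute
        at_risk = times >= times[i]                # subjects still under observation at t_i
        nll -= eta[i] - np.logaddexp.reduce(eta[at_risk])
    return nll + l1 * np.abs(beta).sum()


# With beta = 0 every risk-set member is equally likely, so the value is
# log(3) + log(2) + log(1) = log(6).
print(cox_neg_log_partial_likelihood([0.0], [[1.0], [2.0], [3.0]], [1, 2, 3], [1, 1, 1]))
```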

Comments

Advantage: The structure of this paper is very clear.

Drawback: ……

Application: classification