Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
An Integrated Machine Learning Approach to Stroke Prediction
Presenter: Tsai Tzung Ruei
Authors: Aditya Khosla, Yu Cao, Cliff Chiung-Yu Lin, Hsu-Kuang Chiu, Junling Hu, Honglak Lee
SIGKDD 2010
國立雲林科技大學 National Yunlin University of Science and Technology
Outline
• Motivation
• Objective
• Methodology
• Experiments
• Conclusion
• Comments
Motivation
Most previous prediction models have adopted features (risk factors) that are verified by clinical trials or selected manually by medical experts.
In the past, high-performance machine learning algorithms such as SVM and logistic regression were not explored.
Objective
To propose a novel automatic feature selection algorithm that selects robust features based on our proposed heuristic: conservative mean.
To present a margin-based censored regression algorithm that combines the concept of margin-based classifiers with censored regression to achieve a better concordance index than the Cox model.
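The concordance index used as the yardstick here can be sketched as follows. This is a minimal Harrell-style version, not code from the paper: the function name `concordance_index`, its arguments, and the choice to skip pairs with tied times are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Fraction of comparable subject pairs whose predicted risks are ordered correctly.

    times  : observed time per subject (event time or censoring time)
    events : 1 if the event (e.g. stroke) was observed, 0 if censored
    risks  : predicted risk score; higher means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the earlier time is an observed event.
        # Tied times are skipped (a simplification assumed for this sketch).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # earlier event received the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ranking.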
Methodology
Missing Data Imputation
• Column mean
• Column median
• Imputation through linear regression
• Regularized Expectation Maximization (EM)
Feature Selection
• Forward feature selection
• L1-regularized logistic regression
• Conservative mean feature selection
Learning Algorithms for Prediction
• Margin-based Censored Regression
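The two simplest imputation options above, column mean and column median, can be sketched as follows. The function name `impute_columns`, the use of `None` for missing entries, and the list-of-rows layout are assumptions made for this illustration, not details from the slides.

```python
from statistics import mean, median


def impute_columns(rows, strategy="mean"):
    """Fill None entries with the mean or median of the observed values in the same column."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    fill = mean if strategy == "mean" else median
    n_cols = len(rows[0])
    # One fill value per column, computed from the non-missing entries only.
    fills = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        fills.append(fill(observed))
    # Return a new table; the input rows are left unchanged.
    return [[fills[c] if r[c] is None else r[c] for c in range(n_cols)]
            for r in rows]
```

The median variant is less sensitive to outlying measurements than the mean variant. A column with no observed values at all makes `statistics` raise an error, which this sketch does not handle.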
Methodology
Conservative mean feature selection
To consider the variance across different folds along with the average of the prediction performance.
To evaluate the performance of each feature individually.
[Figure: example features: age, calculated hypertension status, left ventricular mass]
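The conservative mean idea can be sketched as below. The slides do not give the formula, so this sketch assumes the heuristic is the mean per-fold score minus a multiple of the standard deviation across folds. The names `conservative_mean`, `rank_features`, and `penalty` are hypothetical, and the scores in the usage note are made-up toy values.

```python
from statistics import mean, pstdev


def conservative_mean(fold_scores, penalty=1.0):
    """Mean score across folds, reduced by the spread across folds (assumed form)."""
    return mean(fold_scores) - penalty * pstdev(fold_scores)


def rank_features(scores_by_feature, top_k):
    """Rank features, each evaluated individually, by conservative mean; keep the best top_k."""
    ranked = sorted(
        scores_by_feature,
        key=lambda name: conservative_mean(scores_by_feature[name]),
        reverse=True,
    )
    return ranked[:top_k]
```

With toy per-fold scores such as `{"age": [0.70, 0.72, 0.71, 0.69, 0.73], "noisy": [0.90, 0.50, 0.85, 0.55, 0.80]}`, the feature `noisy` has the higher plain mean, but `age` ranks first because its performance is stable across folds.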
Methodology
Conservative mean feature selection
[Figure: feature vector; labels: age, left ventricular mass, calculated hypertension status]
Methodology
Learning Algorithms for Prediction
Margin-based Censored Regression
[Figure: SVM; labels: true, false]
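The slides name Margin-based Censored Regression but do not state its objective. The sketch below is one plausible reading, offered only as an assumption: a linear model predicts time to event, subjects with an observed event pay an epsilon-insensitive penalty on either side, and censored subjects are penalized only when the prediction falls before the censoring time. The name `mcr_loss` and the parameters `eps` and `reg` are hypothetical.

```python
def mcr_loss(w, b, X, times, events, eps=0.0, reg=0.0):
    """Assumed margin-style loss for censored regression (illustrative, not the paper's definition).

    w, b   : weights and bias of a linear time-to-event model
    X      : list of feature vectors
    times  : observed time per subject (event or censoring time)
    events : 1 if the event was observed, 0 if censored
    eps    : width of the penalty-free margin
    reg    : strength of the L2 penalty on w
    """
    total = reg * sum(wi * wi for wi in w)
    for x, t, e in zip(X, times, events):
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        if e:
            # Observed event: penalize error in either direction beyond eps.
            total += max(0.0, abs(pred - t) - eps)
        else:
            # Censored: the true event time is at least t, so only
            # predictions earlier than t (beyond eps) are penalized.
            total += max(0.0, (t - pred) - eps)
    return total
```

The one-sided term is what lets censored subjects contribute to training without pretending their event time is known.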
Experiments
Data Imputation
Feature Selection
[Figure: pipeline of missing data imputation, feature selection, and learning algorithms for prediction]
Experiments
Stroke Prediction
Conclusion
Contribution
An extensive evaluation of the problems of data imputation, feature selection and prediction in medical data, with comparisons against the Cox proportional hazards model.
A novel feature selection algorithm, Conservative Mean feature selection, that outperforms both the L1-regularized Cox model and L1-regularized logistic regression on the CHS dataset.
A novel risk prediction algorithm, Margin-based Censored Regression, that outperforms the Cox model given the same set of features.