Blackbox classifiers for preoperative discrimination between malignant and benign ovarian tumors
C. Lu¹, T. Van Gestel¹, J. A. K. Suykens¹, S. Van Huffel¹, I. Vergote², D. Timmerman²
¹ Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium; ² Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium
Variable (symbol)                    Benign         Malignant
Demographic
  Age (age)                          45.6 ± 15.2    56.9 ± 14.6
  Postmenopausal (meno)              31.0 %         66.0 %
Serum marker
  CA 125 (log) (l_ca125)             3.0 ± 1.2      5.2 ± 1.5
CDI
  High color score (colsc3,4)        19.0 %         77.3 %
Morphologic
  Abdominal fluid (asc)              32.7 %         67.3 %
  Bilateral mass (bilat)             13.3 %         39.0 %
  Unilocular cyst (un)               45.8 %         5.0 %
  Multiloc/solid cyst (mulsol)       10.7 %         36.2 %
  Solid (sol)                        8.3 %          37.6 %
  Smooth wall (smooth)               56.8 %         5.7 %
  Irregular wall (irreg)             33.8 %         73.2 %
  Papillations (pap)                 12.5 %         53.2 %
Demographic, serum marker, color Doppler imaging and morphologic variables
Biplot of Ovarian Tumor Data: visualizing the correlations between the variables and the relations between the variables and the clusters.
1. Introduction
Ovarian masses are a common problem in gynecology. A reliable test for preoperative discrimination between benign and malignant ovarian tumors is of considerable help to clinicians in choosing appropriate treatments for patients. In this study, we develop and evaluate several blackbox models, in particular multi-layer perceptrons (MLPs) and least squares support vector machines (LS-SVMs), both within the Bayesian evidence framework, to preoperatively predict the malignancy of ovarian tumors. Model performance is assessed via Receiver Operating Characteristic (ROC) curve analysis.
2. Data
Biplot legend: o = benign case, x = malignant case.
ROC curves are constructed by plotting the sensitivity (true positive rate) versus 1 − specificity (false positive rate) for varying probability cutoff levels; they visualize the trade-off between the sensitivity and the specificity of a test.
The area under the ROC curve (AUC) measures the probability that the classifier assigns a higher score to a randomly chosen event (malignant case) than to a randomly chosen nonevent (benign case).
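As an illustration of how an ROC curve and its AUC can be computed from predicted probabilities, here is a minimal pure-Python sketch. The function names and the toy scores are hypothetical; it simply sweeps the cutoff over every distinct score and shows that the trapezoidal area under the resulting curve equals the pairwise (Mann-Whitney) probability that an event outscores a nonevent.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs, one per distinct cutoff.

    labels: 1 = event (malignant), 0 = nonevent (benign).
    A case is called positive when its score is >= the cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(score of a random event > score of a random nonevent); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of malignancy and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
```

Both `auc_trapezoid(curve)` and `auc_mann_whitney(scores, labels)` give 8/9 here; the pairwise form makes the probabilistic reading of the AUC explicit, while the curve form is what is plotted.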
Patient data: University Hospitals Leuven, 1994-1999; 425 records, 25 features; 32% malignant.
Procedure of developing models to predict the malignancy of ovarian tumors:
Data exploration: univariate analysis (descriptive statistics), preprocessing, and multivariate analysis (PCA, factor analysis).
Input variable selection: stepwise logistic regression, and forward selection maximizing the evidence of Bayesian LS-SVMs (RBF and linear kernels).
Model development: model building (Bayesian LS-SVM with sparse approximation; Bayesian MLP) and model evaluation (ROC analysis with AUC; temporal and randomized cross-validation).
Goal: find a model with high sensitivity for malignancy and a low false positive rate that provides a probability of malignancy for each individual patient.
3. Methods
4. Bayesian MLPs and Bayesian LS-SVMs for classification
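LS-SVM Classifier (Van Gestel, Suykens 2002)
The following model is taken:
$$f(x) = w^T \varphi(x) + b,$$
$$\min_{w,b,e} J(w,b) = \frac{\mu}{2} w^T w + \frac{\zeta}{2} \sum_{i=1}^{N} e_i^2 \quad \text{s.t.} \quad y_i [w^T \varphi(x_i) + b] = 1 - e_i, \quad i = 1, \dots, N,$$
with regularizer $w^T w$ and regularization constant $\gamma = \zeta / \mu$.
Denote $Y = [y_1, \dots, y_N]^T$, $1_v = [1, \dots, 1]^T$, $e = [e_1, \dots, e_N]^T$, $\alpha = [\alpha_1, \dots, \alpha_N]^T$, and $\Omega_{ij} = y_i y_j \varphi(x_i)^T \varphi(x_j) = y_i y_j K(x_i, x_j)$, e.g.
RBF kernel: $K(x, z) = \exp\{-\|x - z\|_2^2 / \sigma^2\}$
Linear kernel: $K(x, z) = z^T x$
The problem is solved in the dual space, resulting in the linear system
$$\begin{bmatrix} 0 & Y^T \\ Y & \Omega + \gamma^{-1} I_N \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_v \end{bmatrix}.$$
Resulting classifier:
$$y(x) = \mathrm{sign}\Big[\sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b\Big].$$
Computing posterior class probabilities
Introduce new error variables $\hat e_\bullet = w^T(\varphi(x) - \hat m_\bullet)$, with $\hat m_\bullet$ the center of class $\bullet \in \{+1, -1\}$ in feature space. Then
$$p(x \mid y = \bullet, D, H) = \frac{1}{\sqrt{2\pi(\zeta_\bullet^{-1} + \sigma_{e_\bullet}^2)}} \exp\left(-\frac{m_{e_\bullet}^2}{2(\zeta_\bullet^{-1} + \sigma_{e_\bullet}^2)}\right),$$
where $m_{e_\bullet} = w_{MP}^T(\varphi(x) - \hat m_\bullet)$ and $\zeta_\bullet^{-1} + \sigma_{e_\bullet}^2$ is the variance of $\hat e_\bullet$ due to target noise and uncertainty in $w$. The posterior class probability follows from Bayes' rule,
$$P(y \mid x, D, H) = \frac{p(x \mid y, D, H)\, P(y)}{\sum_{y' \in \{-1, +1\}} p(x \mid y', D, H)\, P(y')},$$
with $P(y)$ the prior class probability.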
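To make the dual formulation concrete, the following is a minimal pure-Python sketch that builds and solves the LS-SVM linear system [[0, Y^T], [Y, Ω + I/γ]] [b; α] = [0; 1_v] and evaluates sign[Σ α_i y_i K(x, x_i) + b]. It assumes fixed, hand-picked γ and σ² (the Bayesian inference of μ, ζ and the kernel parameter is not shown), and the helper names (`solve`, `train_lssvm`, `predict`) and the toy data are hypothetical, not the authors' code.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf_kernel(x: Vector, z: Vector, sigma2: float) -> float:
    """RBF kernel K(x, z) = exp(-||x - z||^2 / sigma^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / sigma2)


def linear_kernel(x: Vector, z: Vector) -> float:
    """Linear kernel K(x, z) = z^T x."""
    return sum(a * b for a, b in zip(x, z))


def solve(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting; A and rhs are overwritten."""
    n = len(rhs)
    for col in range(n):
        # Pivoting is required: the (0, 0) entry of the LS-SVM system is zero.
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (rhs[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def train_lssvm(X: Sequence[Vector], y: Sequence[int], kernel: Kernel,
                gamma: float) -> Tuple[List[float], float]:
    """Solve [[0, Y^T], [Y, Omega + I/gamma]] [b; alpha] = [0; 1_v]; return (alpha, b)."""
    n = len(y)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        A[0][i + 1] = A[i + 1][0] = float(y[i])
        for j in range(n):
            A[i + 1][j + 1] = y[i] * y[j] * kernel(X[i], X[j])  # Omega_ij
        A[i + 1][i + 1] += 1.0 / gamma
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[1:], sol[0]


def predict(x: Vector, X: Sequence[Vector], y: Sequence[int],
            alpha: Sequence[float], b: float, kernel: Kernel) -> int:
    """y(x) = sign[sum_i alpha_i y_i K(x, x_i) + b], with 0 mapped to +1."""
    latent = sum(a * yi * kernel(x, xi) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if latent >= 0 else -1


def rbf(u: Vector, v: Vector) -> float:
    return rbf_kernel(u, v, sigma2=1.0)  # hypothetical kernel width


# Hypothetical 1-D toy data: two cases per class.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]
alpha, b = train_lssvm(X, y, rbf, gamma=10.0)
```

Because every training point gets a nonzero α_i (the solution is not sparse), a pruning step such as the sparse approximation mentioned in the procedure is a natural follow-up for larger data sets.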
Bayesian Evidence Framework: inferences are divided into distinct levels.
Model $H$: for the MLP, the network structure, e.g. the number of hidden neurons; for the LS-SVM, the kernel parameter, e.g. $\sigma$ for the RBF kernel.
Level 1: infer $w, b$ for given $\mu, \zeta, H$:
$$p(w, b \mid D, \mu, \zeta, H) = \frac{p(D \mid w, b, \mu, \zeta, H)\, p(w, b \mid \mu, \zeta, H)}{p(D \mid \mu, \zeta, H)} \propto \exp(-J(w, b))$$
=> the maximum a posteriori estimates $w_{MP}$ and $b_{MP}$ are the solution of the basic MLP/LS-SVM classifier.
Level 2: infer the hyperparameters $\mu, \zeta$:
$$p(\mu, \zeta \mid D, H) = \frac{p(D \mid \mu, \zeta, H)\, p(\mu, \zeta \mid H)}{p(D \mid H)}$$
Level 3: compare models:
$$p(H_j \mid D) = \frac{p(D \mid H_j)\, p(H_j)}{p(D)}$$
and choose the $H_j$ that maximizes the model evidence $p(D \mid H_j)$.
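The level-3 comparison p(H_j | D) ∝ p(D | H_j) p(H_j) can be illustrated with a short sketch that turns log evidences into posterior model probabilities. The log-evidence values below are invented for illustration (computing the real evidence requires the level-1 and level-2 inference); the normalizer is computed with the log-sum-exp trick because raw evidences underflow.

```python
import math
from typing import Dict, Optional


def model_posteriors(log_evidence: Dict[str, float],
                     prior: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return p(H_j | D) for each model H_j.

    log_evidence maps a model name to log p(D | H_j); prior maps it to p(H_j)
    (uniform if omitted). p(D) is the sum over models, computed via log-sum-exp.
    """
    if prior is None:
        prior = {h: 1.0 / len(log_evidence) for h in log_evidence}
    log_joint = {h: le + math.log(prior[h]) for h, le in log_evidence.items()}
    shift = max(log_joint.values())
    log_norm = shift + math.log(sum(math.exp(v - shift) for v in log_joint.values()))
    return {h: math.exp(v - log_norm) for h, v in log_joint.items()}


# Hypothetical log evidences for three RBF kernel widths (not values from the study).
log_ev = {"sigma2=0.5": -120.0, "sigma2=1": -115.0, "sigma2=2": -118.0}
post = model_posteriors(log_ev)
best = max(post, key=post.get)  # with a uniform prior, this is the max-evidence model
```

With equal model priors, ranking by posterior is the same as ranking by evidence, which is why the procedure can simply pick the H_j with the largest p(D | H_j).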
Consider a binary classification problem, given $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^p$, $y_i \in \{0, 1\}$ in the case of the MLP and $y_i \in \{-1, +1\}$ in the case of the LS-SVM.
MLP Classifiers (MacKay 1992)
Consider the one-hidden-layer MLP:
$$f(x) = g(a(x, w)), \qquad a(x, w) = w^{(2)} g'(w^{(1)} x + b^{(1)}) + b^{(2)},$$
with the activation function of the hidden layer $g'(a) = \tanh(a) = \frac{\exp(a) - \exp(-a)}{\exp(a) + \exp(-a)}$ and, for the output layer, the logistic function $g(a) = \frac{1}{1 + \exp(-a)}$.
Training:
$$\min_{w, b} J(w, b) = \frac{\alpha}{2} w^T w + G, \qquad G = -\sum_{i=1}^{N} \{y_i \log f(x_i) + (1 - y_i) \log(1 - f(x_i))\},$$
with regularizer $w^T w$ and the cross-entropy error function $G$.
The posterior class probability can be approximated as
$$P(y = 1 \mid x, D, H) \approx g\Big(\kappa(s)\, a(x, w_{MP}) + \log\frac{P(y = 1)}{P(y = 0)} - \log\frac{N_1}{N_0}\Big),$$
where $\kappa(s) = 1/\sqrt{1 + \pi s^2 / 8}$, $s^2 = \mathrm{var}(a \mid x, D)$, $N_1$ and $N_0$ are the numbers of training cases in classes 1 and 0, and $P(y)$ is the prior class probability.
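The moderated MLP output can be sketched in a few lines. This assumes the approximation P(y=1 | x, D, H) ≈ g(κ(s) a_MP + log[P(y=1)/P(y=0)] − log[N_1/N_0]) with κ(s) = 1/sqrt(1 + π s²/8); the prior-correction term (swapping the training class ratio N_1/N_0 for the prior ratio) is our reading of the formula and should be treated as an assumption. The MAP activation `a_mp` and its variance `s2` are assumed to come from a trained Bayesian MLP; computing them is not shown, and the function names are hypothetical.

```python
import math


def logistic(a: float) -> float:
    """Output-layer activation g(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def kappa(s2: float) -> float:
    """Moderation factor kappa(s) = 1 / sqrt(1 + pi * s^2 / 8), s2 = var(a | x, D)."""
    return 1.0 / math.sqrt(1.0 + math.pi * s2 / 8.0)


def mlp_posterior(a_mp: float, s2: float, prior_pos: float,
                  n_pos: int, n_neg: int) -> float:
    """Approximate P(y = 1 | x, D, H) from the MAP activation a_mp.

    Assumed prior correction: replace the training class ratio n_pos / n_neg
    by the prior ratio prior_pos / (1 - prior_pos) on the logit scale.
    """
    logit = kappa(s2) * a_mp
    logit += math.log(prior_pos / (1.0 - prior_pos)) - math.log(n_pos / n_neg)
    return logistic(logit)
```

The moderation pulls uncertain predictions (large s²) toward 0.5, while the log-ratio term shifts every prediction when the intended class priors differ from the class proportions seen in training.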
Computing posterior class probabilities for minimum-risk decision making:
incorporate the different misclassification costs into the class priors. For example,
set the adjusted prior probabilities for the malignant and benign classes to 2/3 and 1/3, respectively.
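The cost-adjusted prior P'(y=+1) = P(y=+1) c_+ / (P(y=+1) c_+ + P(y=−1) c_−), where c_+ and c_− are the costs of misclassifying a case from class '+' and '−', can be computed as below. The input values are hypothetical: equal original priors with a malignant case twice as costly to miss is one setting that reproduces the 2/3 and 1/3 mentioned above, but the poster does not state which values were used.

```python
from typing import Tuple


def cost_adjusted_priors(p_pos: float, c_pos: float, c_neg: float) -> Tuple[float, float]:
    """Return (P'(y=+1), P'(y=-1)) with
    P'(y=+1) = P(y=+1) c_+ / (P(y=+1) c_+ + P(y=-1) c_-).

    c_pos / c_neg: cost of misclassifying a case from class '+' / '-'.
    """
    p_neg = 1.0 - p_pos
    adj_pos = p_pos * c_pos / (p_pos * c_pos + p_neg * c_neg)
    return adj_pos, 1.0 - adj_pos


# Hypothetical setting: equal original priors, missing a malignancy costs twice as much.
malignant, benign = cost_adjusted_priors(0.5, 2.0, 1.0)
```

Feeding these adjusted priors into the posterior computation and then thresholding at 0.5 is equivalent to minimum-risk decision making under the stated costs.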
5. Experimental results
RMI (risk of malignancy index) = score_morph × score_meno × CA125
Training set: data from the first 265 treated patients
Test set: data from the 160 most recently treated patients
Performance from temporal validation (ROC curves on the test set)

Model type       AUC      Cutoff   Accuracy (%)   Sensitivity (%)   Specificity (%)
RMI              0.8733   0.4      78.13          74.07             80.19
                          0.3      76.88          81.48             74.53
MLP (10-2-1)     0.9174   0.4      83.13          81.48             83.96
                          0.3      81.87          83.33             81.13
LS-SVM (LIN)     0.9141   0.4      81.25          77.78             83.02
                          0.3      81.88          83.33             81.13
LS-SVM (RBF)     0.9184   0.4      83.13          81.48             83.96
                          0.3      84.38          85.19             83.96

Performance on the test set
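Each row of the table corresponds to thresholding the predicted probability of malignancy at one cutoff. A minimal sketch of that computation follows; the function name and the toy probabilities are hypothetical and only show how lowering the cutoff trades specificity for sensitivity.

```python
from typing import Dict, Sequence


def metrics_at_cutoff(probs: Sequence[float], labels: Sequence[int],
                      cutoff: float) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity (in %) when a case is called
    malignant (1) if its predicted probability is >= cutoff."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        called_malignant = p >= cutoff
        if y == 1 and called_malignant:
            tp += 1
        elif y == 1:
            fn += 1
        elif not called_malignant:
            tn += 1
        else:
            fp += 1
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }


# Hypothetical predictions: three malignant (1) and three benign (0) cases.
probs = [0.9, 0.6, 0.35, 0.5, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
at_04 = metrics_at_cutoff(probs, labels, 0.4)
at_03 = metrics_at_cutoff(probs, labels, 0.3)
```

Here moving the cutoff from 0.4 to 0.3 raises the sensitivity from 66.7% to 100% while the specificity stays at 66.7%, mirroring the kind of trade-off reported for the two cutoffs in the table.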
Input variable selection
The forward selection procedure tries to maximize the model evidence of the LS-SVM given a certain type of kernel.
Ten variables were selected using the RBF kernel:
l_ca125, pap, sol, colsc3, bilat, meno, asc, shadows, colsc4, irreg
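The greedy structure of forward selection can be sketched independently of the LS-SVM: at each step, add the candidate variable that increases the score most, and stop when no addition helps. In the poster the score is the LS-SVM model evidence; in this sketch it is a hypothetical additive placeholder over made-up variables `x1` to `x4`, so the selected set says nothing about the study's variables.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


def forward_selection(candidates: Sequence[str],
                      score: Callable[[FrozenSet[str]], float],
                      max_vars: int) -> Tuple[List[str], float]:
    """Greedy forward selection.

    Repeatedly adds the variable whose inclusion gives the highest score;
    stops when no remaining variable improves on the current best score
    or when max_vars variables have been selected.
    """
    selected: List[str] = []
    best_score = float("-inf")
    while len(selected) < max_vars:
        best_var: Optional[str] = None
        for var in candidates:
            if var in selected:
                continue
            s = score(frozenset(selected + [var]))
            if s > best_score:  # must beat the best score found so far
                best_score, best_var = s, var
        if best_var is None:
            break
        selected.append(best_var)
    return selected, best_score


# Hypothetical additive score standing in for the (log) model evidence.
weights = {"x1": 3.0, "x2": 2.0, "x3": 1.5, "x4": -0.5}
chosen, final = forward_selection(list(weights),
                                  lambda s: sum(weights[v] for v in s),
                                  max_vars=10)
```

With a real evidence-based score, each call to `score` would require training an LS-SVM and inferring its hyperparameters on the candidate subset, which is what makes the procedure computationally demanding.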
Cost-adjusted prior:
$$P'(y = +1) = \frac{P(y = +1)\, c_+}{P(y = +1)\, c_+ + P(y = -1)\, c_-},$$
where $c_+$ and $c_-$ denote the cost of misclassifying a case from class '+' and '-', respectively.
The forward selection procedure, which tries to maximize the evidence of the LS-SVM model, is able to identify the important variables.
The performance of LS-SVMs and MLPs is comparable. Both models have the potential to give reliable
preoperative predictions of the malignancy of ovarian tumors. A larger-scale validation is needed.
References
1. C. Lu, T. Van Gestel, et al., Preoperative prediction of malignancy of ovarian tumors using least squares support vector machines (2002), submitted paper.
2. D. Timmerman, H. Verrelst, et al., Artificial neural network models for the preoperative discrimination between malignant and benign adnexal masses, Ultrasound Obstet Gynecol (1999).
3. J.A.K. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural Processing Letters (1999), 9(3).
4. T. Van Gestel, J.A.K. Suykens, et al., Bayesian framework for least squares support vector machine classifiers, Gaussian processes and kernel Fisher discriminant analysis, Neural Computation (2002), 14(5).
5. D.J.C. MacKay, The evidence framework applied to classification networks, Neural Computation (1992), 4(5).
Performance from randomized cross-validation (30 runs)

Model type       mAUC (SD)          Cutoff   Accuracy (%)   Sensitivity (%)   Specificity (%)
RMI              0.8882 (0.0318)    100      82.65          81.73             83.06
                                    80       81.10          83.87             79.85
MLP (10-2-1)     0.9409 (0.0198)    0.6      84.46          87.20             83.21
                                    0.5      82.17          90.80             78.24
LS-SVM (LIN)     0.9405 (0.0236)    0.5      84.31          87.40             82.91
                                    0.4      82.77          90.47             79.27
LS-SVM (RBF)     0.9424 (0.0232)    0.5      84.85          86.53             84.09
                                    0.4      83.52          90.00             80.58
The data are randomly separated into a training set (n = 265) and a test set (n = 160), stratified so that #benign : #malignant ≈ 2:1 in each training and test set; this is repeated 30 times.
Averaged performance over the 30 validation runs.
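The randomized validation scheme can be sketched as repeated stratified splitting followed by averaging a per-run metric. This is a minimal sketch under stated assumptions: "stratified" is taken to mean that each class contributes its proportional share of the test set, the seed and function names are hypothetical, and model training and AUC computation per run are left out.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], n_test: int,
                     rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly split case indices into (train, test), preserving class shares.

    Each class contributes round(n_test * class_size / n) cases to the test set,
    so the test size can differ from n_test by rounding.
    """
    n = len(labels)
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        test.extend(idx[:round(n_test * len(idx) / n)])
    test_set = set(test)
    train = [i for i in range(n) if i not in test_set]
    return train, sorted(test)


def repeated_splits(labels: Sequence[int], n_test: int, runs: int,
                    seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Generate `runs` independent stratified splits (fixed seed for reproducibility)."""
    rng = random.Random(seed)
    return [stratified_split(labels, n_test, rng) for _ in range(runs)]


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, e.g. of the per-run test AUCs."""
    return statistics.mean(values), statistics.stdev(values)
```

In use, a model would be trained on each training part, its AUC computed on the matching test part, and `mean_sd` applied to the 30 AUCs to obtain an entry of the form mAUC (SD).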
6. Conclusions