Optimization and Data Mining in Epilepsy Research
W. Art Chaovalitwongse
Assistant Professor
Industrial and Systems Engineering
Rutgers University
Acknowledgements
Comprehensive Epilepsy Center, St. Peter’s University Hospital: Rajesh C. Sachdeo, MD; Deepak Tikku, MD
Brain Institute, University of Florida: Panos M. Pardalos, PhD; J. Chris Sackellares, MD; Paul R. Carney, MD
Bioengineering, Arizona State University: Leonidas D. Iasemidis, PhD
Agenda
Background: Epilepsy
Electroencephalogram (EEG) Time Series
Chaos Theory: Dimensionality Reduction
Seizure Prediction
  Feature Selection
  Process Monitoring
Concluding Remarks
Facts About Epilepsy
At least 2 million Americans and another 40-50 million people worldwide (about 1% of the population) suffer from epilepsy.
Epilepsy is the second most common brain disorder (after stroke).
The hallmark of epilepsy is recurrent seizures.
Epileptic seizures occur when a massive group of neurons in the cerebral cortex suddenly begins to discharge in a highly organized rhythmic pattern.
Epileptic Seizures
Seizures usually occur spontaneously, in the absence of external triggers.
Seizures cause temporary disturbances of brain functions such as motor control, responsiveness and recall which typically last from seconds to a few minutes.
Seizures may be followed by a post-ictal period of confusion or impaired sensorium that can persist for several hours.
Rationale
Based on 1995 estimates, epilepsy imposes an annual economic burden of $12.5 billion in the U.S. in associated health care costs and losses in employment, wages, and productivity.
Cost per patient ranged from $4,272 for persons with remission after initial diagnosis and treatment to $138,602 for persons with intractable and frequent seizures.
How To Fight Epilepsy
Anti-Epileptic Drugs (AEDs)
  Mainstay of epilepsy treatment
  Approximately 25 to 30% of patients remain unresponsive
Epilepsy surgery
  Requires long-term invasive EEG monitoring
  50% of pre-surgical candidates do not undergo resective surgery
    Multiple epileptogenic zones
    Epileptogenic zone located in functional brain tissue
  Only 60% of surgery cases result in seizure freedom
Electrical Stimulation (Vagus nerve stimulator)
  Parameters (amplitude and duration of stimulation) arbitrarily adjusted
  As effective as one additional AED dose
  Side effects
Seizure Prediction?
Vagus Nerve Stimulator
Open Problems
Is the seizure occurrence random?
If not, can seizures be predicted?
If yes, are there seizure precursors preceding seizures?
If yes, what measurement can be used to indicate these precursors?
Does normal brain activity differ from abnormal brain activity?
Electroencephalogram (EEG)
…is a tool for evaluating the physiological state of the brain.
…offers excellent spatial and temporal resolution to characterize rapidly changing electrical activity of brain activation.
…captures voltage potentials produced by brain cells while communicating.
In an EEG, electrodes are implanted deep in the brain or placed on the scalp over multiple areas of the brain to detect and record patterns of electrical activity and check for abnormalities.
From Microscopic to Macroscopic Level (Electroencephalogram - EEG)
Depth and Subdural electrode placement for EEG recordings
[Figure: electrode placement diagram; electrode groups labeled LOF, ROF, LTD, RTD, LST, RST]
Scalp EEG Data Acquisition
EEG Data Acquisition
Typical EEG Time Series Data
Goals of Research
Test the hypothesis that seizures are not a random process.
Employ data mining techniques to differentiate normal and abnormal EEGs
Employ quantitative analysis to identify seizure pre-cursors
Demonstrate that seizures could be predicted
Develop a closed-loop seizure control device (Brain Pacemaker)
10-second EEGs: Seizure Evolution
[Figure panels: Normal, Pre-Seizure, Seizure, Post-Seizure]
Dimensionality Reduction
The brain is a non-stationary system.
EEG time series is non-stationary.
With 200 Hz sampling, 1 hour of EEG comprises 200*60*60*30 = 21,600,000 data points = 43.2 MB (assuming 16 bits per data point).
1 day = 1 hour*24
1 week = 1 hour*168
20 patients = 1 hour*3360
Kilobytes → Megabytes → Gigabytes → Terabytes
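The storage arithmetic above can be reproduced with a few lines of Python. This is only a sketch: it assumes that the factor 30 in the slide is the number of recorded channels and that each data point occupies 2 bytes (16 bits); the function names are illustrative.

```python
SAMPLING_HZ = 200        # samples per second, from the slide
CHANNELS = 30            # assumption: the factor 30 is the channel count
BYTES_PER_POINT = 2      # assumption: 16 bits per data point


def eeg_points(hours, patients=1):
    """Number of data points recorded over `hours` hours per patient."""
    return SAMPLING_HZ * 60 * 60 * CHANNELS * hours * patients


def eeg_megabytes(hours, patients=1):
    """Raw storage in megabytes (1 MB = 10**6 bytes)."""
    return eeg_points(hours, patients) * BYTES_PER_POINT / 1e6


for label, hours, patients in [
    ("1 hour", 1, 1),
    ("1 day", 24, 1),
    ("1 week", 168, 1),
    ("20 patients, 1 week each", 168, 20),
]:
    print(f"{label}: {eeg_megabytes(hours, patients):,.1f} MB")
```

Under these assumptions one week for 20 patients already comes to roughly 145 GB, which is the motivation for reducing each 10.24-second epoch to a single dynamical measure.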
Dimensionality Reduction Using Chaos Theory
Chaos in Brain? Chaos in Stock Market? Chaos in Foreign Exchanges (Swedish Currency)?
Measure the brain dynamics from EEG time series.
Apply dynamical measures (based on chaos theory) to non-overlapping EEG epochs of 10.24 seconds = 2048 points.
Maximum Short-Term Lyapunov Exponent (STLmax)
  measures the average uncertainty along the local eigenvectors and phase differences of an attractor in the phase space
  measures the chaoticity of the brain waves
1. Embed the data set (EEG): $X_i = (x(t_i), x(t_i+\tau), \ldots, x(t_i+(p-1)\tau))^T$, where $\tau$ is the selected time lag between the components of each vector in the phase space, $p$ is the selected dimension of the embedding phase space, and $t_i \in [1, T-(p-1)\tau]$.
2. Pick a point $x(t_0)$ somewhere in the middle of the trajectory. Find that point's nearest neighbor. Call that point $z_0(t_0)$.
3. Compute $|z_0(t_0) - x(t_0)| = L_0$.
4. Follow the "difference trajectory" (the dashed line) forwards in time, computing $|z_0(t_i) - x(t_i)| = L_0(i)$ and incrementing $i$, until $L_0(i) > \varepsilon$. Call that value $L_0'$ and that time $t_1$.
5. Find $z_1(t_1)$, the nearest neighbor of $x(t_1)$, and go to step 3. Repeat the procedure to the end of the fiduciary trajectory $t = t_n$, keeping track of the $L_i$ and $L_i'$.

The exponent is then estimated as
$$L_{max} = \frac{1}{N\Delta t}\sum_{i=0}^{M-1}\log_2\frac{L_i'}{L_i},$$
where $M$ is the number of times we went through the loop above, and $N$ is the number of time-steps in the fiduciary trajectory, so that $N\Delta t = t_n - t_0$.
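The first two steps (delay embedding and the nearest-neighbor search) can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: the function names, the values of p and τ, and the `min_sep` guard that excludes temporally adjacent points are all assumptions made for the example.

```python
import numpy as np


def delay_embed(x, p, tau):
    """Return the matrix whose rows are X_i = (x(t_i), x(t_i+tau), ..., x(t_i+(p-1)tau))."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (p - 1) * tau
    if n_vectors <= 0:
        raise ValueError("time series too short for this p and tau")
    return np.column_stack([x[k * tau : k * tau + n_vectors] for k in range(p)])


def nearest_neighbor(X, i, min_sep=1):
    """Index and distance of the embedded point closest to X[i].

    Points closer than `min_sep` samples in time (including X[i] itself)
    are excluded; this guard is an assumption of the sketch.
    """
    dist = np.linalg.norm(X - X[i], axis=1)
    dist[np.abs(np.arange(len(X)) - i) < min_sep] = np.inf
    j = int(np.argmin(dist))
    return j, float(dist[j])


# Illustrative epoch: 2048 points of a noisy sine (not real EEG).
rng = np.random.default_rng(0)
epoch = np.sin(0.05 * np.arange(2048)) + 0.01 * rng.standard_normal(2048)
X = delay_embed(epoch, p=7, tau=4)          # p and tau are illustrative values
j, L0 = nearest_neighbor(X, len(X) // 2, min_sep=20)
print(X.shape, j, L0)
```

The remaining steps would follow the pair of trajectories forward until their separation exceeds ε, record the initial and final separations, and then replace the neighbor.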
2-D Example: Circle of initial conditions evolves into an ellipse.
STLmax Profiles
Pre-Ictal Ictal Post-Ictal
Hidden Synchronization Patterns
By paired-T statistic: per electrode, for EEG signal epochs i and j, suppose their STLmax values in the epochs (of length 60 points, 10 minutes) are

$$L_i = \{STL\max_i^1, STL\max_i^2, \ldots, STL\max_i^{60}\}$$
$$L_j = \{STL\max_j^1, STL\max_j^2, \ldots, STL\max_j^{60}\}$$
$$D_{ij} = L_i - L_j = \{d_{ij}^1, d_{ij}^2, \ldots, d_{ij}^{60}\} = \{STL\max_i^1 - STL\max_j^1, STL\max_i^2 - STL\max_j^2, \ldots, STL\max_i^{60} - STL\max_j^{60}\}$$

Then, we calculate the average value, $\bar{D}_{ij}$, and the sample standard deviation, $\hat{\sigma}_d$, of $D_{ij}$.

The T-index between EEG signal epochs i and j is defined as
$$T_{ij} = \frac{|\bar{D}_{ij}|}{\hat{\sigma}_d / \sqrt{60}}$$
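As a rough illustration, the T-index of one window can be computed with NumPy as follows. The sketch takes the absolute value of the mean difference and uses the sample standard deviation (ddof=1); the function name and the example data are hypothetical.

```python
import numpy as np


def t_index(stlmax_i, stlmax_j):
    """Paired T-index between two STLmax profiles over the same window."""
    d = np.asarray(stlmax_i, dtype=float) - np.asarray(stlmax_j, dtype=float)
    n = d.size                      # 60 points for a 10-minute window
    return abs(d.mean()) / (d.std(ddof=1) / np.sqrt(n))


# Made-up profiles of 60 STLmax values each (not patient data).
site_i = np.array([1.0, 3.0] * 30)
site_j = np.zeros(60)
print(t_index(site_i, site_j))
```

A small T-index means the two sites have statistically similar STLmax values in the window, which is what the slides call convergence.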
How similar are they? Statistics to quantify the convergence of STLmax
Statistically Quantifying the Convergence
IID (Independent and Identically Distributed) Test
Assumption 1: Within a window of 30 STLmax points, the differences of STLmax values (Dij) between two electrode sites i and j are independent.
To verify this assumption, we employed the “portmanteau” test of white noise developed by Ljung and Box.
Assumption 2: Within a window of 60 points, the differences of STLmax values between two electrode sites i and j are normally distributed.
To verify this assumption, we employed the Shapiro-Wilk W test, which is a well-established and powerful test of departure from normality.
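For reference, the Ljung-Box portmanteau statistic used for Assumption 1 is Q = n(n+2) Σ_{k=1..h} ρ_k² / (n−k), where ρ_k is the lag-k sample autocorrelation; under the white-noise hypothesis Q is approximately chi-square distributed with h degrees of freedom. A minimal NumPy sketch is given below. The number of lags h = 5 and the example series are assumptions of the sketch, not values from the slides. For Assumption 2, a ready-made Shapiro-Wilk test is available as `scipy.stats.shapiro`.

```python
import numpy as np


def ljung_box_q(d, lags):
    """Ljung-Box Q statistic of a series d for lags 1..lags."""
    d = np.asarray(d, dtype=float)
    n = d.size
    c = d - d.mean()
    denom = np.sum(c ** 2)
    total = 0.0
    for k in range(1, lags + 1):
        rho_k = np.sum(c[k:] * c[:-k]) / denom   # lag-k sample autocorrelation
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


CHI2_95_DF5 = 11.07   # 95% critical value of chi-square with 5 degrees of freedom

# A steadily drifting difference series is clearly not independent.
drifting = np.arange(30, dtype=float)
q_drift = ljung_box_q(drifting, lags=5)
print(q_drift, q_drift > CHI2_95_DF5)
```

If Q exceeds the critical value, independence of the differences within the window is rejected at the 5% level.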
Convergence of STLmax
Models
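$$\frac{dx_i(t)}{dt} = -\omega_i y_i - z_i + \sum_{j=1,\, j \neq i}^{N} \left(\varepsilon_{i,j}\, x_j - \varepsilon'_{i,j}\, x_i\right) \quad (1)$$
$$\frac{dy_i(t)}{dt} = \omega_i x_i + \alpha_i y_i \quad (2)$$
$$\frac{dz_i(t)}{dt} = \beta_i x_i + z_i (x_i - \gamma_i) \quad (3)$$

$\omega_i$, $\alpha_i$, $\beta_i$, and $\gamma_i$ are intrinsic parameters.
$\varepsilon_{i,j}$ and $\varepsilon'_{i,j}$ are directional coupling strengths.
N = number of oscillators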
Homoclinic Chaos (Silnikov’s Theorem):
Rössler systems, Lorenz systems, population dynamical systems
STLmax versus time and coupling
Why Feature Selection?
Not every electrode site shows the convergence.
Feature Selection: Select the electrodes that are most likely to show the convergence preceding the next seizure.
Optimization: We apply optimization techniques to find a group of electrode sites such that …
  They are the most converged (in STLmax) electrode sites during the 10-min window before the seizure.
  They show the dynamical resetting (diverged in STLmax) during the 10-min window after the seizure.
Such electrode sites are defined as “critical electrode sites”.
Hypothesis: The critical electrode sites should be most likely to show the convergence in STLmax again before the next seizure.
Optimization Problem
Multi-Quadratic Integer Programming
To select critical electrode sites, we formulated this problem as a multi-quadratic integer (0-1) programming (MQIP) problem with …
  an objective function to minimize the average T-index among electrode sites
  a linear constraint to identify the number of critical electrode sites
  a quadratic constraint to ensure that the selected electrode sites show the dynamical resetting
Problem $P_1$:
$$\min f(x) = x^T Q x$$
$$\text{s.t.} \quad \sum_{i=1}^{n} x_i = b$$
$$x^T D x \geq \alpha$$
$$x_i \in \{0,1\}, \quad i = 1, \ldots, n$$
x is an n-dimensional column vector (decision variables), where each xi represents the electrode site i. xi = 1 if electrode i is selected to be one of the critical electrode sites; xi = 0 otherwise.
Q is an (n × n) matrix, whose each element qij represents the T-index between electrodes i and j during the 10-minute window before a seizure.
b is an integer constant (the number of critical electrode sites).
D is an (n × n) matrix, whose each element dij represents the T-index between electrodes i and j during the 10-minute window after a seizure.
α = 2.662*b*(b-1), a constant. 2.662 is the critical value of the T-index, as previously defined, to reject H0: “two brain sites acquire identical STLmax values within a 10-minute window”.
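For a handful of electrodes the selection problem can be solved by plain enumeration, which makes the roles of Q, D, b, and α concrete. The sketch below is not the solution method discussed in the slides (which reformulate the problem as a mixed-integer linear program); the function name and the toy matrices are made up for illustration.

```python
from itertools import combinations

import numpy as np


def select_critical_sites(Q, D, b, t_crit=2.662):
    """Enumerate all subsets of size b and return the feasible one minimizing x^T Q x."""
    n = Q.shape[0]
    alpha = t_crit * b * (b - 1)
    best_subset, best_value = None, np.inf
    for subset in combinations(range(n), b):
        x = np.zeros(n)
        x[list(subset)] = 1.0
        if x @ D @ x < alpha:          # resetting constraint x^T D x >= alpha violated
            continue
        value = x @ Q @ x              # total pre-seizure T-index among selected sites
        if value < best_value:
            best_subset, best_value = subset, value
    return best_subset, best_value


# Toy symmetric T-index matrices for 4 electrodes (made-up numbers).
Q = np.array([[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 0.5], [5, 5, 0.5, 0]], dtype=float)
D = np.array([[0, 4, 4, 4], [4, 0, 4, 4], [4, 4, 0, 1], [4, 4, 1, 0]], dtype=float)
print(select_critical_sites(Q, D, b=2))
```

In this toy instance sites 2 and 3 are the most converged before the seizure but fail the resetting constraint afterwards, so sites 0 and 1 are selected. Enumeration grows combinatorially with n, which is why an integer programming formulation is needed in practice.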
Notation and Modeling
Conventional Linearization Approach for Multi-Quadratic 0-1 Problem
For each product $x_i x_j$, we introduce a new 0-1 variable $x_{ij} = x_i x_j$ $(i \neq j)$.
Note that $x_{ii} = x_i^2 = x_i$ for $x_i \in \{0,1\}$.
The equivalent linear 0-1 problem is given by:
$$\min \sum_i \sum_j q_{ij} x_{ij}$$
$$\text{s.t.} \quad Ax = b$$
$$x_{ij} \leq x_i, \quad \text{for } i,j = 1, \ldots, n \ (i \neq j)$$
$$x_{ij} \leq x_j, \quad \text{for } i,j = 1, \ldots, n \ (i \neq j)$$
$$x_i + x_j - 1 \leq x_{ij}, \quad \text{for } i,j = 1, \ldots, n \ (i \neq j)$$
$$\sum_i \sum_j d_{ij} x_{ij} \geq \alpha$$
$$x_i \in \{0,1\}, \quad 0 \leq x_{ij} \leq 1, \quad i,j = 1, \ldots, n$$
Note that the number of continuous variables has been increased to $O(n^2)$.
Note that this problem formulation is computationally inefficient as n increases.
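A quick check shows why the three linear inequalities are enough: for binary x_i and x_j they leave exactly one admissible value for x_ij, namely the product x_i x_j. The helper name below is illustrative.

```python
from itertools import product


def linearized_range(xi, xj):
    """Interval of x_ij allowed by x_ij <= x_i, x_ij <= x_j, x_i + x_j - 1 <= x_ij, 0 <= x_ij <= 1."""
    lower = max(0, xi + xj - 1)
    upper = min(1, xi, xj)
    return lower, upper


for xi, xj in product((0, 1), repeat=2):
    lower, upper = linearized_range(xi, xj)
    print(xi, xj, (lower, upper), xi * xj)
```

The price of this construction is one extra variable and three extra constraints per pair (i, j), which is the O(n²) growth noted above.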
KKT Conditions Approach
Consider the quadratic 0-1 programming problem
$$\min f(x) = x^T Q x \quad \text{s.t.} \quad Ax = b, \quad x_i \in \{0,1\}, \ i = 1, \ldots, n,$$
where Q is an (n × n) matrix, b is an integer constant, x is an n-dimensional column vector, and $e^T = (1,1,\ldots,1)$.
Relaxing the 0-1 requirement to $x \geq 0$ gives
$$\min f(x) = x^T Q x \quad \text{s.t.} \quad Ax = b, \quad x_i \geq 0, \ i = 1, \ldots, n,$$
and we then have the following KKT conditions:
$$Qx - u \cdot e - y = 0$$
$$Ax = b$$
$$y^T x = 0$$
$$x \geq 0, \quad u \geq 0, \quad y \geq 0$$
Add slack variables a and define s = u.e + a. Minimizing the slack variables, we can formulate this problem as a mixed-integer linear problem.
Note that this problem formulation is an efficient approach, as n increases, because it has the SAME number of 0-1 variables (n), and 2n additional continuous variables.
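KKT Conditions Approach
Minimizing the slack variables:
$$\min e^T s$$
$$\text{s.t.} \quad Qx - y - s = 0$$
$$Ax = b$$
$$y^T x = 0$$
$$x \geq 0, \quad s \geq 0, \quad y \geq 0$$
Fix $x \in \{0,1\}$:
$$y^T x = 0 \iff y \leq M(1 - x)$$
The problem then becomes
$$\min e^T s$$
$$\text{s.t.} \quad Qx - y - s = 0$$
$$Ax = b$$
$$y \leq M(1 - x)$$
where $s \geq 0$, $y \geq 0$, $x \in \{0,1\}$, and $M = \max_i \sum_j |q_{ij}| = \|Q\|_\infty$.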
Connections Between QIP problems and MILP problems
For any matrix Q where $q_{ij} \geq 0$, we want to prove that $P$ and $\bar{P}$ are equivalent.

Problem $P$:
$$\min f(x) = x^T Q x \quad \text{s.t.} \quad Ax = b, \quad x_i \in \{0,1\}, \ i = 1, \ldots, n$$

Problem $\bar{P}$:
$$\min e^T s$$
$$\text{s.t.} \quad Qx - y - s = 0 \quad (1)$$
$$Ax = b \quad (2)$$
$$y \leq M(1 - x) \iff y^T x = 0 \quad (3)$$
$$s \geq 0, \quad y \geq 0, \quad x \in \{0,1\} \quad (4)$$
where $M = \max_i \sum_j |q_{ij}| = \|Q\|_\infty$.

Theorem 1: "$P$ has an optimal solution $x^0$ iff there exist $y^0$, $s^0$ such that $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$."

PROOF. Necessity. If $x^0$ is an optimal solution to $P$, it is obvious that there exist $y, s$ with $y \geq 0$, $s \geq 0$ such that $Qx^0 - y - s = 0$ (1) and $y^T x^0 = 0$ (3). Choose $y^0$ and $s^0$ from the above defined set of $y$ and $s$ such that $e^T s$ is minimized. Let us show that $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$.

Multiplying (1) by $(x^0)^T$, we obtain $(x^0)^T Q x^0 - (x^0)^T y^0 - (x^0)^T s^0 = 0$. Note that from (3), $(x^0)^T y^0 = (y^0)^T x^0 = 0$. We then have $(x^0)^T Q x^0 = (x^0)^T s^0$.

We know that $x^0 = \arg\min x^T Q x$, s.t. $Ax = b$, $x \in \{0,1\}$. If we can prove that $e^T s^0 = (x^0)^T s^0$ (5), then $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$. Written out, (5) reads
$$e^T s^0 = (x^0)^T s^0 \iff s_1 + s_2 + \ldots + s_n = x_1 s_1 + x_2 s_2 + \ldots + x_n s_n.$$

To prove $e^T s^0 = (x^0)^T s^0$ (5), it is sufficient to show that, for any $i$, if $x_i^0 = 0$, then $s_i^0 = 0$. We can prove this statement by contradiction.

Proof: Assume that, given $(x^0, y^0, s^0)$ that is an optimal solution to $\bar{P}$, $x_i^0 = 0$ and $s_i^0 > 0$ for some $i$ ($e^T s^0$ is minimized). For any such $i$, define vectors $\tilde{y}$ and $\tilde{s}$ with $\tilde{y}_i = y_i^0 + s_i^0$ and $\tilde{s}_i = 0$, all other components being unchanged. It is clear that $(x^0, \tilde{y}, \tilde{s})$ satisfies all constraints (1) - (4) in $\bar{P}$. Thus, $(x^0, \tilde{y}, \tilde{s})$ is feasible and $e^T \tilde{s} < e^T s^0$, so $e^T s^0$ is not minimal. This fact contradicts our initial assumption that $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$.

Sufficiency. The proof is similar.
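The core of the argument can be checked numerically on a small instance: for any fixed 0-1 vector x and nonnegative Q, the smallest value of e^T s allowed by constraints (1), (3), and (4) equals x^T Q x. The sketch below leaves out the constraint on Ax because it is identical in both problems; the function name and the random matrix are hypothetical.

```python
from itertools import product

import numpy as np


def min_slack_sum(Q, x):
    """Smallest e^T s for a fixed 0-1 vector x, assuming every q_ij >= 0."""
    M = np.abs(Q).sum(axis=1).max()      # M = max_i sum_j |q_ij|
    qx = Q @ x
    y = np.minimum(qx, M * (1 - x))      # largest y allowed by y <= M(1 - x) and s >= 0
    s = qx - y                           # constraint (1): Qx - y - s = 0
    return s.sum()


rng = np.random.default_rng(0)
Q = rng.uniform(0.0, 5.0, size=(5, 5))   # hypothetical nonnegative matrix

mismatches = 0
for bits in product((0, 1), repeat=5):
    x = np.array(bits, dtype=float)
    if not np.isclose(min_slack_sum(Q, x), x @ Q @ x):
        mismatches += 1
print("mismatches:", mismatches)
```

Where x_i = 1 the bound forces y_i = 0, so s_i carries the whole i-th component of Qx; where x_i = 0 the component can be absorbed by y_i, so s_i = 0. Summing gives x^T Q x, which is step (5) of the proof.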
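Theoretical Results: MILP formulation for MQIP problem
Consider the MQIP problem. We proved that the MQIP program is EQUIVALENT to a MILP problem with the SAME number of integer variables.

Problem $P_1$:
$$\min f(x) = x^T Q x$$
$$\text{s.t.} \quad Ax = b$$
$$x^T D x \geq \alpha$$
$$x_i \in \{0,1\}, \quad i = 1, \ldots, n$$

Problem $\bar{P}_1$:
$$\min e^T s$$
$$\text{s.t.} \quad Qx - y - s = 0 \quad (1)$$
$$Ax = b \quad (2)$$
$$y \leq M(1 - x) \quad (3)$$
$$Dx - z \geq 0 \quad (4)$$
$$e^T z \geq \alpha \quad (5)$$
$$z \leq M' x \quad (6)$$
$$s, y, z \geq 0, \quad x \in \{0,1\} \quad (7)$$
where $M = \max_i \sum_j |q_{ij}| = \|Q\|_\infty$ and $M' = \max_i \sum_j |d_{ij}| = \|D\|_\infty$.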
Theorem 2: "$P_1$ has an optimal solution $x^0$ iff there exist $y^0$, $s^0$, $z^0$ such that $(x^0, y^0, s^0, z^0)$ is an optimal solution to $\bar{P}_1$."

PROOF. Necessity. From the proof of Theorem 1, to prove Theorem 2 we only need to show that if $x^0$ is an optimal solution to problem $P_1$, then there exists a vector $z^0$ (s.t. $z_i^0 \geq 0$) such that the following constraints are satisfied:
$$Dx^0 - z^0 \geq 0 \quad (1)$$
$$e^T z^0 \geq \alpha \quad (2)$$
$$z^0 \leq M' x^0 \quad (3)$$
From (3), note that if $x_i^0 = 0$ then we have $z_i^0 = 0$ (the proof is similar to the one in Theorem 1). Then we obtain $e^T z^0 = (x^0)^T z^0$ (4).
Since $z_i^0$ is a real number and every element of the matrix $D$ is nonnegative, for all $i$ where we have $x_i^0 = 1$, we can choose $z_i^0 \geq 0$ such that $z_i^0 = (Dx^0)_i$. We then satisfy (1) and (3).
Multiplying (1) by $(x^0)^T$, from (4) we obtain $(x^0)^T D x^0 = (x^0)^T z^0 = e^T z^0$.
Since $x^0$ is an optimal solution to $P_1$, (2) is satisfied: $(x^0)^T D x^0 = e^T z^0 \geq \alpha$.

Sufficiency. The proof is similar.
Reference:
• P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney, O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal Dynamics of Intracranial EEG. Mathematical Programming, 101(2): 365-385, 2004.
Empirical Results: Performance on Larger Problems
Reference:
• W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. Reduction of Multi-Quadratic 0-1 Programming Problems to Linear Mixed 0-1 Programming Problems. Operations Research Letters, 32(6): 517-522, 2004.
Empirical Results: Performance on Larger Problems
Hypothesis: The critical electrode sites should be most likely to show the convergence in STLmax (drop in T-index below the critical value) again before the next seizure.
The critical electrode sites are electrode sites that
  are the most converged (in STLmax) electrode sites during the 10-min window before the seizure
  show the dynamical resetting (diverged in STLmax) during the 10-min window after the seizure
Simulation: Based on 3 patients with 20 seizures, we compare the probability of showing the convergence in STLmax (drop in T-index below the critical value) before the next seizure between
  critical electrode sites
  randomly selected electrode sites (5,000 times)
Hypothesis Testing - Simulation
Optimal VS Non-Optimal
Simulation - Results
How to automate the system
EEG Signals
Continuously calculate STLmax from multi-channel EEG.
Select critical electrode sites after every subsequent seizure.
Monitor the average T-index of the critical electrodes.
Give a warning when: the T-index value is greater than 5, then drops to a value of 2.662 or less.
ASWA
Automated Seizure Warning System
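The warning rule (average T-index above 5, followed by a drop to 2.662 or below) amounts to a small two-state monitor. The sketch below is one possible reading of that rule: it assumes that after a warning the T-index must rise above 5 again before another warning can be issued, which the slides do not state. The function name and sample values are illustrative.

```python
def seizure_warnings(t_index_series, upper=5.0, critical=2.662):
    """Return the positions at which a warning would be issued."""
    warnings = []
    armed = False                      # True once the T-index has exceeded `upper`
    for k, t in enumerate(t_index_series):
        if t > upper:
            armed = True
        elif armed and t <= critical:
            warnings.append(k)
            armed = False              # assumption: re-arm only after exceeding `upper` again
    return warnings


# Made-up average T-index profile of the critical electrodes.
profile = [6.0, 4.0, 3.0, 2.5, 2.0, 6.0, 2.6, 3.0, 2.0]
print(seizure_warnings(profile))
```

In the automated system this monitor would run on the average T-index of the critical electrodes, which are reselected after every seizure.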
Data Characteristics
Performance Evaluation for ASWS
To test this algorithm, a warning was considered to be true if a seizure occurred within 3 hours after the warning.
Sensitivity = (# of accurately predicted seizures) / (# of seizures analyzed)
False Prediction Rate = average number of false warnings per hour
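These two measures can be computed from lists of warning and seizure times. The sketch below is a simplified reading of the definitions: times are in hours, a seizure counts as predicted if any warning precedes it by at most the 3-hour horizon, and a warning is false if no seizure follows it within that horizon. The function name and the example times are made up.

```python
def evaluate_warnings(warning_times, seizure_times, total_hours, horizon=3.0):
    """Return (sensitivity, false predictions per hour) for one recording."""
    predicted = sum(
        any(0 < s - w <= horizon for w in warning_times) for s in seizure_times
    )
    false_warnings = sum(
        not any(0 < s - w <= horizon for s in seizure_times) for w in warning_times
    )
    sensitivity = predicted / len(seizure_times)
    false_prediction_rate = false_warnings / total_hours
    return sensitivity, false_prediction_rate


# Hypothetical 40-hour recording with two seizures and three warnings.
print(evaluate_warnings([1.0, 10.0, 20.0], [3.0, 30.0], total_hours=40.0))
```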
Performance characteristics of the automated seizure warning algorithm with the best parameter settings on the training data set.
Training Results
ROC curve (receiver operating characteristic) is used to indicate an appropriate trade-off that one can achieve between:
the false positive rate (1-Specificity, plotted on X-axis) that needs to be minimized
the detection rate (Sensitivity, plotted on Y-axis) that needs to be maximized.
RECEIVER OPERATING CHARACTERISTICS (ROC)
ROC curve analysis for the best parameter settings of 10 patients
Test Results
Performance characteristics of automated seizure warning algorithm with the best parameter settings on testing data set.
Validation of the ASWS algorithm
Temporal Properties
  Surrogate Seizure Time Data Set
  100 Surrogate Data Sets
Spatial Properties
  Non-Optimized ASWS – Selecting non-optimal electrode sites
  100 Randomly Selected Electrodes
Prediction Scores: ASWS
Prediction Scores: Surrogate Data and Non-Optimized ASWS
W. Chaovalitwongse, L.D. Iasemidis, P.M. Pardalos, P.R. Carney, D.-S. Shiau, and J.C. Sackellares. A Robust Method for Studying the Dynamics of the Intracranial EEG: Application to Epilepsy. Epilepsy Research, 64, 93-133, 2005.
Prediction Scores: Surrogate Data and Non-Optimal ASWS
Concluding Remarks
Overview of Epilepsy Research
Applications of Data Mining and Optimization Techniques
Interplay between theory and application
The first online real-time seizure prediction system
Seizure Prediction
  Predicting ~70% of temporal lobe seizures on average
  Giving a false alarm rate of ~0.16 per hour on average
Ongoing and Future Research
  Classification of EEGs from normal and epileptic patients
  Classification of abnormal brain activity
  Cluster analysis of epileptic brains
  Analysis on scalp EEGs
Reference
W. Chaovalitwongse, L.D. Iasemidis, P.M. Pardalos, P.R. Carney, D.-S. Shiau, and J.C. Sackellares. A Robust Method for Studying the Dynamics of the Intracranial EEG: Application to Epilepsy. Epilepsy Research, 64, 93-133, 2005.
W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. EEG Classification in Epilepsy. To appear in Annals of Operations Research.
W. Chaovalitwongse and P.M. Pardalos. Optimization Approaches to Characterize the Hidden Dynamics of the Epileptic Brain: Seizure Prediction and Localization. To appear in SIAG/OPT Views-and-News.
W. Chaovalitwongse, P.M. Pardalos, L.D. Iasemidis, D.-S. Shiau, and J.C. Sackellares. Dynamical Approaches and Multi-Quadratic Integer Programming for Seizure Prediction. Optimization Methods and Software, 20(2-3): 383-394, 2005.
L.D. Iasemidis, P.M. Pardalos, D.-S. Shiau, W. Chaovalitwongse, K. Narayanan, A. Prasad, K. Tsakalis, P.R. Carney, and J.C. Sackellares. Long Term Prospective On-Line Real-Time Seizure Prediction. Clinical Neurophysiology, 116(3): 532-544, 2005.
P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney, O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal Dynamics of Intracranial EEG. Mathematical Programming, 101(2): 365-385, 2004. (INFORMS Pierskalla Best Paper Award 2004)
W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. A New Linearization Technique for Multi-Quadratic 0-1 Programming Problems. Operations Research Letters, 32(6): 517-522, 2004. (Rank 5th in Top 25 Articles in Operations Research Letters)
Questions?
Thank you
Classification of Brain Activity
Phase Profiles
Entropy H of Attractor
Classification of Physiological States
Nearest Neighbor Time Series Classification
Normal
Pre-Seizure Post-Seizure
A
By paired-T statistic: per electrode, for EEG signal epochs i and j, suppose their STLmax values in the epochs (of length 30 points, 5 minutes) are

$$L_i = \{STL\max_i^1, STL\max_i^2, \ldots, STL\max_i^{30}\}$$
$$L_j = \{STL\max_j^1, STL\max_j^2, \ldots, STL\max_j^{30}\}$$
$$D_{ij} = L_i - L_j = \{d_{ij}^1, d_{ij}^2, \ldots, d_{ij}^{30}\} = \{STL\max_i^1 - STL\max_j^1, STL\max_i^2 - STL\max_j^2, \ldots, STL\max_i^{30} - STL\max_j^{30}\}$$

Then, we calculate the average value, $\bar{D}_{ij}$, and the sample standard deviation, $\hat{\sigma}_d$, of $D_{ij}$.

The T-index between EEG signal epochs i and j is defined as
$$T_{ij} = \frac{|\bar{D}_{ij}|}{\hat{\sigma}_d / \sqrt{30}}$$
Similarity Measure for EEG Time Series – T-test
T-Statistics Distance
The T-index, Txy, between the time series x and y is then defined as:
$$T_{xy} = \frac{|E[X] - E[Y]|}{\sigma_{xy} / \sqrt{n}}$$
where E[ ] denotes the average of the value within an epoch of the time series, n is the length of the time series epoch, and σxy is the sample standard deviation of the difference in value of x and y.
Asymptotically, the Txy index follows a t-distribution with n-1 degrees of freedom.
Nearest Neighbor Classification Rules
Given an unknown-state epoch of EEG signals A, we calculate statistical distances between the EEG epoch and the groups of Normal, Pre-Seizure, and Post-Seizure EEGs in our database.
EEG sample A will be classified in the group of patient’s states (normal, pre-seizure, and post-seizure) that yields the minimum T-index distance.
Multiple Electrodes = Multiple Decisions
  Averaging
  Voting (majority voting: selects the action with the maximum number of votes)
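The classification rule and the majority vote across electrodes can be sketched as follows. This is an illustration under stated assumptions: the distance from an epoch to a group is taken to be the mean T-index over that group's reference epochs (the slides do not specify how the group distance is aggregated), and all names and the synthetic data are hypothetical.

```python
from collections import Counter

import numpy as np


def t_distance(x, y):
    """T-index distance between two epochs of equal length."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return abs(d.mean()) / (d.std(ddof=1) / np.sqrt(d.size))


def classify_channel(epoch, references):
    """Pick the state whose reference epochs are closest on average (assumed aggregation)."""
    mean_dist = {
        state: np.mean([t_distance(epoch, ref) for ref in refs])
        for state, refs in references.items()
    }
    return min(mean_dist, key=mean_dist.get)


def classify_by_voting(epochs, references_per_channel):
    """One vote per electrode; the state with the most votes wins."""
    votes = Counter(
        classify_channel(e, r) for e, r in zip(epochs, references_per_channel)
    )
    return votes.most_common(1)[0][0]


# Synthetic example: three channels, made-up signal levels per state.
rng = np.random.default_rng(1)
levels = {"normal": 5.0, "pre-seizure": 3.0, "post-seizure": 7.0}


def fake_epoch(level):
    return level + 0.1 * rng.standard_normal(30)


references_per_channel = [
    {state: [fake_epoch(level) for _ in range(4)] for state, level in levels.items()}
    for _ in range(3)
]
unknown = [fake_epoch(3.0) for _ in range(3)]
label = classify_by_voting(unknown, references_per_channel)
print(label)
```

The averaging alternative would instead average the distances over electrodes before taking the minimum.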
Preliminary Data Set
132 5-minute epochs of pre-seizure EEGs
132 5-minute epochs of post-seizure EEGs
300 5-minute epochs of normal EEGs

Pre-seizure = 0-30 minutes before seizure
Post-seizure = 2-10 minutes after seizure
Normal = 10 hours away from seizure
Probability of Correct Classifications
Patient State Classification (Voting - Lmax+Phase) - Sensitivity
[Bar chart: percentage of classified type for each actual state]

                      Actual state
Classified as    Pre-ictal   Post-ictal   Inter-ictal
Pre-ictal          95.65%      22.73%       25.00%
Post-ictal          4.35%      72.73%       10.00%
Inter-ictal         0.00%       4.55%       65.00%
Metrics for Performance Evaluation
                          PREDICTED CLASS
                          Class=Yes   Class=No
ACTUAL CLASS  Class=Yes   a           b
              Class=No    c           d

a: TP (true positive); b: FN (false negative);
c: FP (false positive); d: TN (true negative)
Sensitivity and Specificity
Sensitivity measures the fraction of positive cases that are classified as positive.
Specificity measures the fraction of negative cases classified as negative.
Sensitivity = TP/(TP+FN)
Specificity = TN/(TN+FP)
Sensitivity can be considered as a detection (prediction or classification) rate that one wants to maximize.
Maximize the probability of correctly classifying patient states.
False positive rate can be considered as 1-Specificity which one wants to minimize.
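In code, the two measures and the false positive rate follow directly from the four confusion-matrix counts. The function name and the counts in the example are made up for illustration.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity, false positive rate) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)       # fraction of positive cases classified as positive
    specificity = tn / (tn + fp)       # fraction of negative cases classified as negative
    return sensitivity, specificity, 1.0 - specificity


sens, spec, fpr = sensitivity_specificity(tp=8, fn=2, fp=1, tn=9)
print(sens, spec, fpr)
```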
ROC curve (receiver operating characteristic) is used to indicate an appropriate trade-off that one can achieve between:
the false positive rate (1-Specificity, plotted on X-axis) that needs to be minimized
the detection rate (Sensitivity, plotted on Y-axis) that needs to be maximized.
RECEIVER OPERATING CHARACTERISTICS (ROC)
ROC – Performance Characteristics
[Figure: ROC for Different Classification Methods; Sensitivity (Y-axis) versus 1-Specificity (X-axis), both from 0 to 1; curves shown across successive slides: Entropy, Phase, Lmax, Average, Voting, L+P+E, L+P]
Sensitivity = 95.7%
Specificity = 75.4%
Results
Any More Sophisticated Method?
Support Vector Machines: 2-Class Linearly Separable Case
Mathematical Modeling
Leave-one-out Cross Validation
Cross-validation can be seen as a way of applying partial information about the applicability of alternative classification strategies.
K-fold cross validation:
  Divide all the data into k subsets of equal size.
  Train a classifier using k-1 groups of training data.
  Test the classifier on the omitted subset.
  Iterate k times.
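The k-fold procedure can be sketched as an index generator; leave-one-out cross validation is the special case k = number of samples. The helper name is illustrative, and when the sample count is not divisible by k the folds differ in size by at most one.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


for train, test in k_fold_indices(6, 3):
    print("train:", train, "test:", test)

# Leave-one-out: every sample is held out exactly once.
loo = list(k_fold_indices(4, 4))
```

A classifier would be trained on each `train` list and scored on the matching `test` list, and the k scores averaged.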
Classification Results
QP for Clustering
Clustering Epileptic Brains
Hierarchical Clustering
[Figure: dendrogram over points a, b, c, d, e. Agglomerative direction: singletons are merged into {a, d} and {b, c}, then {b, c, e}, then {a, b, c, d, e}. Divisive direction: the same hierarchy split from the top down.]
Clustering via Concave Quadratic Programming (CCQP)
Formulate a clustering problem as a Quadratic Integer Program (QIP),
where A is an n × n T-index matrix of pairwise distances,
λ is a parameter adjusting the degree of similarity within a cluster, and
xi is a 0-1 decision variable indicating whether or not point i is selected (assigned) to be in the cluster.
Advantages
  In some instances λ is large enough to make the quadratic function concave.
  The QIP can then be converted to a continuous problem (minimizing a concave quadratic function over a sphere).
CCQP Algorithm
Patient 1: Box Plot of Average Solution
Lmax
Patient 1: Box Plots of Average Solution
Lmax Phase
Patient 2: Box Plots of Average Solution
Lmax Phase
Kruskal-Wallis Test
…is a nonparametric version of the one-way ANOVA
…is an extension of the Wilcoxon rank sum test to more than two groups
…compares samples from two or more groups.
…compares the medians of the samples in X, and returns the p-value for the null hypothesis that all samples are drawn from the same population (or equivalently, from different populations with the same distribution).
Assumptions
The Kruskal-Wallis test makes the following assumptions about the data in X:
  All samples come from populations having the same continuous distribution, apart from possibly different locations due to group effects.
  All observations are mutually independent.
The classical one-way ANOVA test replaces the first assumption with the stronger assumption that the populations have normal distributions.
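A minimal example of running the test in Python is shown below, assuming SciPy is available and using `scipy.stats.kruskal`. The three groups are made-up numbers, not the clustering results from the slides.

```python
from scipy.stats import kruskal

# Three hypothetical groups of measurements with clearly different locations.
group_a = [1, 2, 3, 4, 5]
group_b = [11, 12, 13, 14, 15]
group_c = [21, 22, 23, 24, 25]

statistic, p_value = kruskal(group_a, group_b, group_c)
print(statistic, p_value)
```

A small p-value rejects the null hypothesis that all groups are drawn from the same population.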
T-test
Test the hypothesis of the difference in means of two samples.
Determine whether two samples, x and y, could have the same mean when the standard deviations are unknown but assumed equal.
Asymptotically, Txy index follows a t-distribution with n-1 degrees of freedom.
Results – Significance Level
Concluding Remarks
Overview of Epilepsy Research
Applications of Data Mining and Optimization Techniques
Interplay between theory and application
Quadratic Programming for Feature Selection
Quadratic Programming for Clustering
Long-Term Monitoring Analysis