Post on 22-Dec-2015

Optimization and Data Mining in Epilepsy Research

W. Art Chaovalitwongse

Assistant Professor

Industrial and Systems Engineering

Rutgers University

Acknowledgements

Comprehensive Epilepsy Center, St. Peter’s University Hospital: Rajesh C. Sachdeo, MD; Deepak Tikku, MD

Brain Institute, University of Florida: Panos M. Pardalos, PhD; J. Chris Sackellares, MD; Paul R. Carney, MD

Bioengineering, Arizona State University: Leonidas D. Iasemidis, PhD

Agenda

Background: Epilepsy; Electroencephalogram (EEG) Time Series

Chaos Theory: Dimensionality Reduction

Seizure Prediction: Feature Selection; Process Monitoring

Concluding Remarks

Facts About Epilepsy

At least 2 million Americans, and another 40-50 million people worldwide (about 1% of the population), suffer from epilepsy.

Epilepsy is the second most common brain disorder (after stroke).

The hallmark of epilepsy is recurrent seizures. Epileptic seizures occur when a massive group of neurons in the cerebral cortex suddenly begins to discharge in a highly organized rhythmic pattern.

Epileptic Seizures

Seizures usually occur spontaneously, in the absence of external triggers.

Seizures cause temporary disturbances of brain functions such as motor control, responsiveness and recall which typically last from seconds to a few minutes.

Seizures may be followed by a post-ictal period of confusion or impaired sensorium that can persist for several hours.

Rationale

Based on 1995 estimates, epilepsy imposes an annual economic burden of $12.5 billion in the U.S. in associated health care costs and losses in employment, wages, and productivity.

Cost per patient ranged from $4,272 for persons with remission after initial diagnosis and treatment to $138,602 for persons with intractable and frequent seizures.

How to Fight Epilepsy

Anti-Epileptic Drugs (AEDs): the mainstay of epilepsy treatment; approximately 25 to 30% of patients remain unresponsive.

Epilepsy surgery: requires long-term invasive EEG monitoring; 50% of pre-surgical candidates do not undergo resective surgery (for example, because of multiple epileptogenic zones or an epileptogenic zone located in functional brain tissue); only 60% of surgery cases result in seizure freedom.

Electrical stimulation (vagus nerve stimulator): parameters (amplitude and duration of stimulation) are arbitrarily adjusted; as effective as one additional AED dose; side effects.

Seizure Prediction?

Vagus Nerve Stimulator

Open Problems

Is seizure occurrence random?

If not, can seizures be predicted?

If yes, are there seizure precursors preceding seizures?

If yes, what measurement can be used to indicate these precursors?

Does normal brain activity differ from abnormal brain activity?

Electroencephalogram (EEG)

…is a tool for evaluating the physiological state of the brain.

…offers excellent spatial and temporal resolution to characterize the rapidly changing electrical activity of the brain.

…captures the voltage potentials produced by brain cells as they communicate.

In an EEG recording, electrodes are implanted in deep brain structures or placed on the scalp over multiple areas of the brain to detect and record patterns of electrical activity and check for abnormalities.

From Microscopic to Macroscopic Level (Electroencephalogram - EEG)

Figure: Depth and subdural electrode placement for EEG recordings (electrode labels: LOF, ROF, LTD, RTD, LST, RST)

Scalp EEG Data Acquisition

EEG Data Acquisition

Typical EEG Time Series Data

Goals of Research

Test the hypothesis that seizures are not a random process.

Employ data mining techniques to differentiate normal and abnormal EEGs

Employ quantitative analysis to identify seizure pre-cursors

Demonstrate that seizures can be predicted.

Develop a closed-loop seizure control device (brain pacemaker).

Figure: 10-second EEGs showing seizure evolution (Normal, Pre-Seizure, Seizure, Post-Seizure)

Dimensionality Reduction

The brain is a non-stationary system, and EEG time series are non-stationary. With 200 Hz sampling, 1 hour of EEG recorded from 30 channels comprises 200*60*60*30 = 21,600,000 data points = 43.2 MB (assuming 16-bit, i.e. 2-byte, samples). 1 day = 1 hour*24; 1 week = 1 hour*168; 20 patients monitored for 1 week each = 1 hour*3360.

Kilobytes → Megabytes → Gigabytes → Terabytes
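The storage arithmetic above can be checked with a few lines of Python. This is a sketch that assumes 30 channels and 2-byte samples, the values implied by the 200*60*60*30 and 43.2 MB figures; the function name is hypothetical.

```python
def eeg_volume(hours, fs_hz=200, channels=30, bytes_per_sample=2):
    """Return (data points, bytes) for a multichannel EEG recording.

    Assumes 30 channels and 16-bit (2-byte) samples, as implied by the
    200*60*60*30 = 21,600,000 points = 43.2 MB example.
    """
    points = fs_hz * 60 * 60 * hours * channels
    return points, points * bytes_per_sample


for label, hours in [("1 hour", 1), ("1 day", 24), ("1 week", 168),
                     ("20 patients x 1 week", 3360)]:
    points, size = eeg_volume(hours)
    print(f"{label:>22}: {points:,} points, {size / 1e9:,.2f} GB")
```

The last line works out to roughly 145 GB of raw samples, which is why the data volume moves from megabytes toward terabytes as monitoring is extended.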

Dimensionality Reduction Using Chaos Theory

Chaos in the brain? Chaos in the stock market? Chaos in foreign exchange (the Swedish currency)? We measure brain dynamics from EEG time series by applying dynamical measures (based on chaos theory) to non-overlapping EEG epochs of 10.24 seconds = 2048 points.

Maximum Short-Term Lyapunov Exponent (STLmax)

…measures the average uncertainty along the local eigenvectors and phase differences of an attractor in the phase space.

…measures the chaoticity of the brain waves.

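STLmax is estimated as $L_{max} = \frac{1}{N\Delta t}\sum_{k=0}^{M-1}\log_2\frac{L_k'}{L_k}$, where M is the number of times we went through the loop below, N is the number of time steps in the fiducial trajectory, and $N\Delta t = t_n - t_0$.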

1. Embed the data set (EEG): $X_i = (x(t_i), x(t_i+\tau), \dots, x(t_i+(p-1)\tau))^T$, where τ is the selected time lag between the components of each vector in the phase space, p is the selected dimension of the embedding phase space, and $t_i \in [1, T-(p-1)\tau]$.

2. Pick a point $x(t_0)$ somewhere in the middle of the trajectory. Find that point's nearest neighbor and call it $z_0(t_0)$.

3. Compute $|z_0(t_0) - x(t_0)| = L_0$.

4. Follow the "difference trajectory" (the dashed line in the figure) forward in time, computing $|z_0(t_i) - x(t_i)| = L_0(i)$ and incrementing i until $L_0(i) > \varepsilon$. Call that value $L_0'$ and that time $t_1$.

5. Find $z_1(t_1)$, the "nearest neighbor" of $x(t_1)$, and go to step 3. Repeat the procedure to the end of the fiducial trajectory, $t = t_n$, keeping track of the $L_i$ and $L_i'$.
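The five steps above can be sketched in Python with NumPy. This is a simplified, hypothetical illustration of a Wolf-style largest-Lyapunov estimate, not the exact STLmax implementation behind these results: it omits the orientation constraint usually applied when a replacement neighbor is chosen, starts the fiducial trajectory at the first embedded point rather than in the middle, and assumes a replacement threshold ε equal to 10% of the standard deviation of the embedded data. The defaults p = 7 and τ = 4 samples are likewise assumptions for illustration.

```python
import numpy as np


def delay_embed(x, p, tau):
    """Step 1: rows are X_i = (x(t_i), x(t_i + tau), ..., x(t_i + (p-1) tau))."""
    n = len(x) - (p - 1) * tau
    return np.stack([x[k * tau:k * tau + n] for k in range(p)], axis=1)


def nearest_neighbor(X, i, exclude):
    """Index of the point closest to X[i], ignoring temporally adjacent points.

    The last point is also excluded so that the neighbor can be followed
    forward for at least one step.
    """
    d = np.linalg.norm(X - X[i], axis=1)
    d[max(0, i - exclude):i + exclude + 1] = np.inf
    d[-1] = np.inf
    return int(np.argmin(d))


def lyapunov_wolf(x, p=7, tau=4, dt=1 / 200, eps_frac=0.1):
    """Largest Lyapunov exponent (bits per unit time) from a scalar series."""
    X = delay_embed(np.asarray(x, dtype=float), p, tau)
    n = len(X)
    eps = eps_frac * X.std()   # assumed replacement threshold epsilon
    exclude = p * tau          # assumed temporal exclusion window
    i = 0                      # fiducial point x(t_0)
    j = nearest_neighbor(X, i, exclude)  # Step 2: neighbor z_0(t_0)
    log_sum, steps = 0.0, 0
    while i + 1 < n and j + 1 < n:
        L = np.linalg.norm(X[j] - X[i])  # Step 3: L_k
        k = 1
        # Step 4: follow the difference trajectory until it exceeds eps.
        while (i + k + 1 < n and j + k + 1 < n
               and np.linalg.norm(X[j + k] - X[i + k]) <= eps):
            k += 1
        L_prime = np.linalg.norm(X[j + k] - X[i + k])  # L_k'
        if L > 0 and L_prime > 0:
            log_sum += np.log2(L_prime / L)
        i += k
        steps += k
        j = nearest_neighbor(X, i, exclude)  # Step 5: replacement neighbor
    # L_max = sum(log2(L_k' / L_k)) / (N * dt)
    return log_sum / (steps * dt) if steps else float("nan")
```

For a 10.24-second epoch sampled at 200 Hz (2048 points), `lyapunov_wolf(epoch)` returns an estimate in bits per second; the accumulated `steps * dt` plays the role of $N\Delta t$ in the formula.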

2-D Example: Circle of initial conditions evolves into an ellipse.

STLmax Profiles

Figure panels: Pre-Ictal, Ictal, Post-Ictal

Hidden Synchronization Patterns

By paired-T statistic: per electrode, for EEG signal epochs i and j, suppose their STLmax values in the epochs (of length 60 points, 10 minutes) are

$L^i = \{STL^i_{max,1}, STL^i_{max,2}, \dots, STL^i_{max,60}\}$ and $L^j = \{STL^j_{max,1}, STL^j_{max,2}, \dots, STL^j_{max,60}\}$.

Their pointwise differences are

$D_{ij} = L^i - L^j = \{d^{ij}_1, d^{ij}_2, \dots, d^{ij}_{60}\} = \{STL^i_{max,1} - STL^j_{max,1}, \dots, STL^i_{max,60} - STL^j_{max,60}\}$.

Then we calculate the average value, $\bar{D}_{ij}$, and the sample standard deviation, $\hat{\sigma}_d$, of $D_{ij}$. The T-index between EEG signal epochs i and j is defined as

$T_{ij} = \frac{|\bar{D}_{ij}|}{\hat{\sigma}_d / \sqrt{60}}$.

How similar are they? Statistics to quantify the convergence of STLmax
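The T-index definition above can be computed directly. A minimal NumPy sketch, assuming the two inputs are equal-length STLmax profiles from the same window (60 values for a 10-minute window); the function names are hypothetical. For reference, 2.662 is approximately the two-sided α = 0.01 critical value of a t-distribution with 59 degrees of freedom.

```python
import numpy as np


def t_index(stl_i, stl_j):
    """Paired T-index between two STLmax profiles of equal length.

    T = |mean(D)| / (sd(D) / sqrt(n)), where D = stl_i - stl_j and sd is the
    sample standard deviation (ddof=1).
    """
    d = np.asarray(stl_i, dtype=float) - np.asarray(stl_j, dtype=float)
    n = d.size
    return abs(d.mean()) / (d.std(ddof=1) / np.sqrt(n))


T_CRITICAL = 2.662  # critical value used in the text for 60-point windows


def converged(stl_i, stl_j, t_crit=T_CRITICAL):
    """True if the two sites' STLmax values cannot be distinguished at t_crit."""
    return t_index(stl_i, stl_j) <= t_crit
```

A T-index at or below the critical value means the hypothesis of identical mean STLmax cannot be rejected, which is what the text calls convergence.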

Statistically Quantifying the Convergence

IID (Independent and Identically Distributed) Test

Assumption 1: Within a window of 30 STLmax points, the differences of STLmax values (Dij) between two electrode sites i and j are independent.

To verify this assumption, we employ the "portmanteau" test of white noise developed by Ljung and Box.

Assumption 2: Within a window of 60 points, the differences of STLmax values between two electrode sites i and j are normally distributed.

To verify this assumption, we employ the Shapiro-Wilk W test, a well-established and powerful test of departure from normality.
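The Ljung-Box portmanteau statistic used for Assumption 1 is short enough to write out. The sketch below (NumPy only; the function name and the default of 5 lags are assumptions) computes $Q = n(n+2)\sum_{k=1}^{h}\hat\rho_k^2/(n-k)$, which is approximately χ² with h degrees of freedom under the white-noise null. For Assumption 2, SciPy's `scipy.stats.shapiro` implements the Shapiro-Wilk W test.

```python
import numpy as np


def ljung_box_q(x, h=5):
    """Ljung-Box Q statistic for lags 1..h.

    Under the null hypothesis that x is white noise, Q is approximately
    chi-square distributed with h degrees of freedom; a large Q rejects
    independence.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    q = 0.0
    for k in range(1, h + 1):
        rho_k = np.dot(xc[:-k], xc[k:]) / denom  # sample autocorrelation at lag k
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q


rng = np.random.default_rng(0)
noise = rng.standard_normal(30)        # stand-in for 30 differences D_ij
trend = np.linspace(0.0, 3.0, 30)      # strongly autocorrelated series
print(ljung_box_q(noise), ljung_box_q(trend))
```

Q is then compared with the χ² critical value for h lags (about 11.07 for h = 5 at α = 0.05).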

Convergence of STLmax

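Models

N = number of oscillators. Each oscillator i = 1, ..., N has state variables $x_i$, $y_i$, $z_i$ whose dynamics are given by three differential equations: (1) for $dx_i(t)/dt$, (2) for $dy_i(t)/dt$, and (3) for $dz_i(t)/dt$. The coupling to the other oscillators j = 1, ..., N enters equation (1).

The model has intrinsic parameters, and its directional coupling strengths come in unprimed and primed pairs.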

Homoclinic Chaos (Silnikov’s Theorem):

Rössler systems, Lorenz systems, population dynamical systems

STLmax versus time and coupling

Why Feature Selection?

Not every electrode site shows the convergence. Feature selection: select the electrodes that are most likely to show the convergence preceding the next seizure.

Optimization: we apply optimization techniques to find a group of electrode sites such that:

they are the most converged (in STLmax) electrode sites during the 10-min window before the seizure, and

they show dynamical resetting (divergence in STLmax) during the 10-min window after the seizure.

Such electrode sites are defined as "critical electrode sites".

Hypothesis:

The critical electrode sites should be the most likely to show the convergence in STLmax again before the next seizure.

Optimization Problem

Multi-Quadratic Integer Programming

To select critical electrode sites, we formulate this problem as a multi-quadratic integer (0-1) programming (MQIP) problem with:

an objective function to minimize the average T-index among electrode sites,

a linear constraint that fixes the number of critical electrode sites, and

a quadratic constraint to ensure that the selected electrode sites show dynamical resetting.

Problem $P_1$:

$\min f(x) = x^T Q x$

s.t. $\sum_{i=1}^{n} x_i = b$,

$x^T D x \ge \alpha$,

$x_i \in \{0,1\}, \quad i = 1, \dots, n$

x is an n-dimensional column vector of decision variables, where each $x_i$ represents electrode site i: $x_i = 1$ if electrode i is selected to be one of the critical electrode sites, and $x_i = 0$ otherwise.

Q is an (n × n) matrix whose element $q_{ij}$ represents the T-index between electrodes i and j during the 10-minute window before a seizure.

b is an integer constant (the number of critical electrode sites). D is an (n × n) matrix whose element $d_{ij}$ represents the T-index between electrodes i and j during the 10-minute window after a seizure.

α = 2.662*b*(b-1) is a constant. 2.662 is the critical value of the T-index, as previously defined, to reject H0: "two brain sites acquire identical STLmax values within the 10-minute window". Since $x^T D x$ sums $d_{ij}$ over the b(b-1) ordered pairs of distinct selected sites (taking $d_{ii} = 0$), this constraint requires the average post-seizure T-index among the selected sites to be at least 2.662.

Notation and Modeling
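For a handful of electrodes, problem $P_1$ can be solved by enumeration, which gives a reference point for the optimization approaches that follow. This is a hypothetical brute-force sketch, not the authors' solver; it assumes symmetric T-index matrices Q and D with zero diagonals, and its cost grows combinatorially with n.

```python
from itertools import combinations

import numpy as np


def select_critical_sites(Q, D, b, t_crit=2.662):
    """Enumerate all b-subsets of electrodes and solve P1 exactly.

    Minimizes x^T Q x subject to sum(x) = b and x^T D x >= alpha,
    with alpha = t_crit * b * (b - 1). Returns (sites, objective), or
    (None, inf) if no subset shows dynamical resetting.
    """
    Q, D = np.asarray(Q, dtype=float), np.asarray(D, dtype=float)
    n = Q.shape[0]
    alpha = t_crit * b * (b - 1)
    best_sites, best_val = None, np.inf
    for sites in combinations(range(n), b):
        x = np.zeros(n)
        x[list(sites)] = 1.0
        if x @ D @ x < alpha:   # resetting constraint violated
            continue
        val = x @ Q @ x          # sum of pre-seizure T-indices
        if val < best_val:
            best_sites, best_val = sites, val
    return best_sites, best_val
```

In a toy example where sites 2 and 3 are the most converged before the seizure but do not reset afterwards, the resetting constraint excludes them and the next-best pair is chosen instead.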

Conventional Linearization Approach for Multi-Quadratic 0-1 Problem

For each product $x_i x_j$, we introduce a new 0-1 variable $x_{ij} = x_i x_j$ ($i \ne j$). Note that $x_{ii} = x_i^2 = x_i$ for $x_i \in \{0,1\}$.

The equivalent linear 0-1 problem is given by:

$\min \sum_{i,j} q_{ij} x_{ij}$

s.t. $Ax = b$,

$\sum_{i,j} d_{ij} x_{ij} \ge \alpha$,

$x_{ij} \le x_i$, for $i, j = 1, \dots, n$ ($i \ne j$),

$x_{ij} \le x_j$, for $i, j = 1, \dots, n$ ($i \ne j$),

$x_i + x_j - 1 \le x_{ij}$, for $i, j = 1, \dots, n$ ($i \ne j$),

$x_i \in \{0,1\}$, $0 \le x_{ij} \le 1$, $i, j = 1, \dots, n$.

Note that the number of continuous variables has been increased to $O(n^2)$, so this problem formulation becomes computationally inefficient as n increases.
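The three inequalities above are what make the linearization exact for binary x. A small check (helper names are hypothetical) enumerates $x_i, x_j \in \{0,1\}$ and shows that the feasible interval for $x_{ij}$ collapses to the single value $x_i x_j$; it also counts the variables and product constraints for ordered pairs $i \ne j$, to make the $O(n^2)$ growth concrete.

```python
from itertools import product


def xij_interval(xi, xj):
    """Feasible range of x_ij under x_ij <= x_i, x_ij <= x_j,
    x_i + x_j - 1 <= x_ij and 0 <= x_ij <= 1."""
    lower = max(0, xi + xj - 1)
    upper = min(1, xi, xj)
    return lower, upper


for xi, xj in product((0, 1), repeat=2):
    lo, hi = xij_interval(xi, xj)
    assert lo == hi == xi * xj  # the product is forced exactly
    print(f"x_i={xi}, x_j={xj}: x_ij in [{lo}, {hi}]")


def linearized_size(n):
    """(0-1 variables, added continuous variables, product constraints)."""
    pairs = n * (n - 1)  # ordered pairs i != j
    return n, pairs, 3 * pairs


print(linearized_size(30))  # e.g. 30 electrode sites
```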

KKT Conditions Approach

Consider the quadratic 0-1 programming problem

$P$: $\min f(x) = x^T Q x$ s.t. $Ax = b$, $x_i \in \{0,1\}, i = 1, \dots, n$,

where Q is an (n × n) matrix, b is an integer constant, x is an n-dimensional column vector, and $e^T = (1, 1, \dots, 1)$.

Relaxing $x_i \in \{0,1\}$ to $x \ge 0$ gives

$\min f(x) = x^T Q x$ s.t. $Ax = b$, $x \ge 0$,

and we then have the following KKT conditions (with μ the multiplier of the cardinality constraint $e^T x = b$):

$Qx - \mu e - y = 0$, $Ax = b$, $y^T x = 0$, $x \ge 0$, $y \ge 0$.

Add slack variables a and define $s = \mu e + a$. Minimizing the slack variables, we can formulate this problem as the mixed 0-1 program below. This formulation is efficient as n increases, because it has the SAME number of 0-1 variables (n) and only 2n additional continuous variables.

KKT Conditions Approach

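$\min e^T s$

s.t. $Qx - y - s = 0$,

$Ax = b$,

$y \le M(e - x)$,

$s \ge 0$, $y \ge 0$, $x \in \{0,1\}^n$,

where $M = \max_i \sum_j q_{ij} = \|Q\|_\infty$.

This is the mixed 0-1 version of the slack-minimization problem obtained from the KKT conditions:

$\min e^T s$ s.t. $Qx - y - s = 0$, $Ax = b$, $y^T x = 0$, $x \ge 0$, $s \ge 0$, $y \ge 0$.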

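Fix $x \in \{0,1\}^n$. Then, for $y \ge 0$, $y^T x = 0 \iff y \le M(e - x)$: both force $y_i = 0$ whenever $x_i = 1$, and when $x_i = 0$ the bound $y_i \le M$ is never binding, because $Qx - y - s = 0$ and $s \ge 0$ give $y_i \le (Qx)_i \le \sum_j q_{ij} \le M$ (for $q_{ij} \ge 0$).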

For any matrix Q with $q_{ij} \ge 0$, we want to prove that $P$ and $\bar{P}$ are equivalent:

Connections Between QIP problems and MILP problems

Problem $\bar{P}$:

$\min e^T s$

s.t. $Qx - y - s = 0$ (1)

$Ax = b$ (2)

$y - M(e - x) \le 0$ (3)

$s \ge 0$, $y \ge 0$, $x \in \{0,1\}^n$ (4)

where $M = \max_i \sum_j q_{ij} = \|Q\|_\infty$.

Problem $P$:

$\min f(x) = x^T Q x$

s.t. $Ax = b$, $x_i \in \{0,1\}, i = 1, \dots, n$

$P$ and $\bar{P}$ are equivalent.

Theorem 1: "$P$ has an optimal solution $x^0$ iff there exist $y^0, s^0$ such that $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$."

PROOF. Necessity: If $x^0$ is an optimal solution to $P$, it is obvious that there exist $y, s \ge 0$ such that $Qx^0 - y - s = 0$ (1) and $y - M(e - x^0) \le 0$ (3) (for example, $y = 0$, $s = Qx^0$). Choose $y^0$ and $s^0$ from this set such that $e^T s^0$ is minimized. Let us show that $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$.

Multiplying (1) by $(x^0)^T$, we obtain $(x^0)^T Q x^0 - (x^0)^T y^0 - (x^0)^T s^0 = 0$. Note that from (3), $(x^0)^T y^0 = 0$. We then have $(x^0)^T Q x^0 = (x^0)^T s^0$.

We know that $x^0 = \arg\min \{x^T Q x : Ax = b, x \in \{0,1\}^n\}$. If we can prove that $e^T s^0 = (x^0)^T s^0$ (5), then $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}$: for any feasible $(x, y, s)$ of $\bar{P}$, the same argument gives $x^T y = 0$, so $e^T s \ge x^T s = x^T Q x \ge (x^0)^T Q x^0 = (x^0)^T s^0 = e^T s^0$.

Since $e^T s^0 = s^0_1 + s^0_2 + \dots + s^0_n$ and $(x^0)^T s^0 = x^0_1 s^0_1 + x^0_2 s^0_2 + \dots + x^0_n s^0_n$, to prove (5) it is sufficient to show that, for any i, if $x^0_i = 0$ then $s^0_i = 0$. We prove this statement by contradiction.

Proof: Assume that $x^0_i = 0$ and $s^0_i > 0$ for some i. Define vectors $\bar{y}$ and $\bar{s}$ equal to $y^0$ and $s^0$ except that $\bar{y}_i = y^0_i + s^0_i$ and $\bar{s}_i = 0$. It is clear that $(x^0, \bar{y}, \bar{s})$ satisfies all constraints (1)-(4) in $\bar{P}$: (1) is unchanged because $\bar{y}_i + \bar{s}_i = y^0_i + s^0_i$, and (3) holds because $\bar{y}_i = (Qx^0)_i \le \sum_j q_{ij} \le M = M(1 - x^0_i)$. Thus $(x^0, \bar{y}, \bar{s})$ is feasible and $e^T \bar{s} = e^T s^0 - s^0_i < e^T s^0$, which contradicts the choice of $(y^0, s^0)$ with $e^T s^0$ minimal.

Sufficiency: The proof is similar.
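Theorem 1 can be sanity-checked numerically on a small instance. For a fixed binary x, the constraints of $\bar{P}$ force $y_i = 0$ and $s_i = (Qx)_i$ where $x_i = 1$, while where $x_i = 0$ all of $(Qx)_i$ can be moved into $y_i$ (it never exceeds M); so the smallest attainable $e^T s$ is $x^T Q x$. The sketch below, with a hypothetical random nonnegative Q and $Ax = b$ taken as $\sum_i x_i = b$, compares the brute-force optimum of $P$ with the optimum of $\bar{P}$ computed this way. It is a check of the argument, not a MILP solver.

```python
from itertools import product

import numpy as np


def min_slack(Q, x):
    """Smallest e^T s over (y, s) feasible in P-bar for a fixed binary x.

    Where x_i = 1, constraint (3) forces y_i = 0, so s_i = (Qx)_i.
    Where x_i = 0, y_i = (Qx)_i <= M is feasible, so s_i = 0.
    """
    Qx = Q @ x
    M = Q.sum(axis=1).max()  # M = ||Q||_inf (max row sum)
    y = np.where(x == 1, 0.0, Qx)
    s = Qx - y
    assert np.all(y <= M * (1 - x) + 1e-12) and np.all(s >= 0)
    return s.sum()


rng = np.random.default_rng(0)
n, b = 6, 3
Q = rng.uniform(0, 5, size=(n, n))
Q = (Q + Q.T) / 2  # symmetric, nonnegative
feasible = [np.array(x, dtype=float)
            for x in product((0, 1), repeat=n) if sum(x) == b]

qip_opt = min(x @ Q @ x for x in feasible)
milp_opt = min(min_slack(Q, x) for x in feasible)
print(qip_opt, milp_opt)
```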

Consider the MQIP problem. We proved that the MQIP problem is EQUIVALENT to a MILP problem with the SAME number of integer variables.

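Theoretical Results: MILP Formulation for MQIP Problem

Problem $P_1$:

$\min f(x) = x^T Q x$

s.t. $Ax = b$,

$x^T D x \ge \alpha$,

$x_i \in \{0,1\}, i = 1, \dots, n$

Problem $\bar{P}_1$:

$\min e^T s$

s.t. $Qx - y - s = 0$ (1)

$Ax = b$ (2)

$y \le M(e - x)$ (3)

$Dx - z \ge 0$ (4)

$z \le M' x$ (5)

$e^T z \ge \alpha$ (6)

$s, y, z \ge 0$, $x \in \{0,1\}^n$ (7)

where $M = \max_i \sum_j q_{ij} = \|Q\|_\infty$ and $M' = \max_i \sum_j d_{ij} = \|D\|_\infty$.

$P_1$ and $\bar{P}_1$ are equivalent.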

Theorem 2: "$P_1$ has an optimal solution $x^0$ iff there exist $y^0, s^0, z^0$ such that $(x^0, y^0, s^0, z^0)$ is an optimal solution to $\bar{P}_1$."

PROOF. Necessity: From the proof of Theorem 1, to prove Theorem 2 we only need to show that if $x^0$ is an optimal solution to problem $P_1$, then there exists a vector $z^0 \ge 0$ such that the following constraints are satisfied:

$Dx^0 - z^0 \ge 0$ (1)

$e^T z^0 \ge \alpha$ (2)

$z^0 \le M' x^0$ (3)

From (3) and $z^0 \ge 0$, note that if $x^0_i = 0$ then $z^0_i = 0$. Then we obtain $(x^0)^T z^0 = e^T z^0$ (4).

Since every element of the matrix D is nonnegative, for all i where $x^0_i = 1$ we can choose $z^0_i \ge 0$ such that $z^0_i = (Dx^0)_i \le \sum_j d_{ij} \le M'$; setting $z^0_i = 0$ where $x^0_i = 0$, we then satisfy (1) and (3).

Multiplying (1) by $(x^0)^T$, which gives equality because $z^0_i = (Dx^0)_i$ wherever $x^0_i = 1$, and using (4), we obtain $(x^0)^T D x^0 = (x^0)^T z^0 = e^T z^0$. Since $x^0$ is an optimal solution to $P_1$, it satisfies $(x^0)^T D x^0 \ge \alpha$, so (2) is satisfied: $e^T z^0 = (x^0)^T D x^0 \ge \alpha$.

Sufficiency: The proof is similar.

Reference:

• P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney, O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal Dynamics of Intracranial EEG. Mathematical Programming, 101(2): 365-385, 2004.

Empirical Results: Performance on Larger Problems

Reference:

• W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. Reduction of Multi-Quadratic 0-1 Programming Problems to Linear Mixed 0-1 Programming Problems. Operations Research Letters, 32(6): 517-522, 2004.

Empirical Results: Performance on Larger Problems

Hypothesis: The critical electrode sites should be the most likely to show the convergence in STLmax (a drop in T-index below the critical value) again before the next seizure.

The critical electrode sites are the electrode sites that are the most converged (in STLmax) during the 10-min window before the seizure and that show dynamical resetting (divergence in STLmax) during the 10-min window after the seizure.

Simulation:

Based on 3 patients with 20 seizures, we compare the probability of showing the convergence in STLmax (a drop in T-index below the critical value) before the next seizure between two groups of electrode sites: the critical electrode sites, and randomly selected electrode sites (selected 5,000 times).

Hypothesis Testing - Simulation

Optimal VS Non-Optimal

Simulation - Results

How to automate the system

EEG signals → continuously calculate STLmax from multi-channel EEG → select critical electrode sites after every subsequent seizure → monitor the average T-index of the critical electrodes → give a warning when the T-index value is greater than 5, then drops to a value of 2.662 or less.

ASWA

Automated Seizure Warning System
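The warning rule in the flow above behaves like a small state machine: the monitor is armed once the average T-index of the critical electrodes exceeds 5, and a warning is issued when it then falls to 2.662 or less. The Python sketch below additionally assumes, beyond what the text states, that the monitor must rise above 5 again before it can issue another warning; the function name is hypothetical.

```python
def seizure_warnings(avg_t_index, upper=5.0, lower=2.662):
    """Return the indices at which a warning is issued.

    avg_t_index: sequence of average T-index values of the critical
    electrodes, one per STLmax update.
    """
    warnings = []
    armed = False
    for k, t in enumerate(avg_t_index):
        if t > upper:
            armed = True        # sites have diverged: arm the detector
        elif armed and t <= lower:
            warnings.append(k)  # convergence after divergence: warn
            armed = False       # assumed: re-arm only after rising above upper
    return warnings


print(seizure_warnings([6.0, 4.0, 3.0, 2.5, 2.0, 6.1, 2.0]))  # [3, 6]
```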

Data Characteristics

Performance Evaluation for ASWS

To test this algorithm, a warning was considered to be true if a seizure occurred within 3 hours after the warning.

Sensitivity = (# of seizures predicted accurately) / (# of seizures analyzed)

False Prediction Rate = average number of false warnings per hour
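The two performance measures can be computed from warning and seizure times. A minimal sketch, assuming times in hours, the 3-hour horizon stated above, and a false prediction rate taken over the total recording length (the text does not say whether any periods are excluded from the denominator); names are hypothetical.

```python
def evaluate_warnings(warning_times, seizure_times, total_hours, horizon=3.0):
    """Sensitivity and false prediction rate (false warnings per hour).

    A warning is true if a seizure occurs within `horizon` hours after it.
    A seizure counts as predicted if at least one warning precedes it
    within `horizon` hours.
    """
    def is_true(w):
        return any(w <= s <= w + horizon for s in seizure_times)

    false_warnings = sum(1 for w in warning_times if not is_true(w))
    predicted = sum(
        1 for s in seizure_times
        if any(s - horizon <= w <= s for w in warning_times)
    )
    sensitivity = predicted / len(seizure_times)
    false_rate = false_warnings / total_hours
    return sensitivity, false_rate


sens, fpr = evaluate_warnings([1.0, 5.0, 10.0], [2.5, 20.0], total_hours=24.0)
print(f"sensitivity = {sens:.2f}, false predictions/hour = {fpr:.3f}")
```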

Performance characteristics of the automated seizure warning algorithm with the best parameter settings on the training data set.

Training Results

ROC curve (receiver operating characteristic) is used to indicate an appropriate trade-off that one can achieve between:

the false positive rate (1-Specificity, plotted on X-axis) that needs to be minimized

the detection rate (Sensitivity, plotted on Y-axis) that needs to be maximized.
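An ROC curve is traced by sweeping a decision threshold over a classifier's scores and recording (1 - Specificity, Sensitivity) at each step. A generic Python sketch, assuming binary labels (1 = positive) and that higher scores mean "more likely positive"; it is not tied to the warning algorithm's own parameters, which the text does not list, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return ROC points (false positive rate, sensitivity) from (0, 0) to (1, 1).

    Tied scores are processed together so that each distinct threshold
    contributes one point.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    pos = sum(1 for _, y in pairs if y == 1)
    neg = len(pairs) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts, auc(pts))
```

Points closer to the upper-left corner correspond to the desired trade-off: high sensitivity at a low false positive rate.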

RECEIVER OPERATING CHARACTERISTICS (ROC)

ROC curve analysis for the best parameter settings of 10 patients

Test Results

Performance characteristics of the automated seizure warning algorithm with the best parameter settings on the testing data set.

Validation of the ASWS algorithm

Temporal properties: surrogate seizure time data set (100 surrogate data sets)

Spatial properties: non-optimized ASWS, selecting non-optimal electrode sites (100 randomly selected electrodes)
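The text does not specify how the 100 surrogate seizure time data sets were generated. One common approach, assumed here purely for illustration, keeps the number of seizures and the distribution of inter-seizure intervals but randomly permutes the intervals, so that any prediction skill tied to the true seizure timing should disappear. The function name and the use of the recording start as the time origin are assumptions.

```python
import numpy as np


def surrogate_seizure_times(seizure_times, n_surrogates=100, seed=0):
    """Surrogate seizure-time sets built by shuffling inter-seizure intervals.

    Each surrogate has the same number of seizures, the same set of
    intervals (measured from time 0), and the same final seizure time.
    """
    times = np.sort(np.asarray(seizure_times, dtype=float))
    intervals = np.diff(np.concatenate(([0.0], times)))
    rng = np.random.default_rng(seed)
    return [np.cumsum(rng.permutation(intervals)) for _ in range(n_surrogates)]


for s in surrogate_seizure_times([3.0, 10.0, 12.5, 30.0], n_surrogates=3):
    print(s)
```

The warning algorithm would then be scored against each surrogate set exactly as against the real seizure times.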

Prediction Scores: ASWS

Prediction Scores: Surrogate Data and Non-Optimized ASWS

W. Chaovalitwongse, L.D. Iasemidis, P.M. Pardalos, P.R. Carney, D.-S. Shiau, and J.C. Sackellares. A Robust Method for Studying the Dynamics of the Intracranial EEG: Application to Epilepsy. Epilepsy Research, 64, 93-133, 2005.

Prediction Scores: Surrogate Data and Non-Optimal ASWS

Concluding Remarks

Overview of epilepsy research

Applications of data mining and optimization techniques

Interplay between theory and application

The first online real-time seizure prediction system

Seizure prediction: predicting ~70% of temporal lobe seizures on average, with a false alarm rate of ~0.16 per hour on average

Ongoing and Future Research

Classification of EEGs from normal and epileptic patients

Classification of abnormal brain activity

Cluster analysis of epileptic brains

Analysis of scalp EEGs

W. Chaovalitwongse, L.D. Iasemidis, P.M. Pardalos, P.R. Carney, D.-S. Shiau, and J.C. Sackellares. A Robust Method for Studying the Dynamics of the Intracranial EEG: Application to Epilepsy. Epilepsy Research, 64, 93-133, 2005.

W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. EEG Classification in Epilepsy. To appear in Annals of Operations Research.

W. Chaovalitwongse and P.M. Pardalos. Optimization Approaches to Characterize the Hidden Dynamics of the Epileptic Brain: Seizure Prediction and Localization. To appear in SIAG/OPT Views-and-News.

W. Chaovalitwongse, P.M. Pardalos, L.D. Iasemidis, D.-S. Shiau, and J.C. Sackellares. Dynamical Approaches and Multi-Quadratic Integer Programming for Seizure Prediction. Optimization Methods and Software, 20(2-3): 383-394, 2005.

L.D. Iasemidis, P.M. Pardalos, D.-S. Shiau, W. Chaovalitwongse, K. Narayanan, A. Prasad, K. Tsakalis, P.R. Carney, and J.C. Sackellares. Long Term Prospective On-Line Real-Time Seizure Prediction. Clinical Neurophysiology, 116(3): 532-544, 2005.

P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney, O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal Dynamics of Intracranial EEG. Mathematical Programming, 101(2): 365-385, 2004. (INFORMS Pierskalla Best Paper Award 2004)

W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. A New Linearization Technique for Multi-Quadratic 0-1 Programming Problems. Operations Research Letters, 32(6): 517-522, 2004. (Rank 5th in Top 25 Articles in Operations Research Letters)

Reference

Questions?

Thank you

Classification of Brain Activity

Phase Profiles

Entropy H of Attractor

Classification of Physiological States

Nearest Neighbor Time Series Classification

Figure: an unknown EEG epoch A compared with Normal, Pre-Seizure, and Post-Seizure EEGs

By paired-T statistic: per electrode, for EEG signal epochs i and j, suppose their STLmax values in the epochs (of length 30 points, 5 minutes) are

$L^i = \{STL^i_{max,1}, STL^i_{max,2}, \dots, STL^i_{max,30}\}$ and $L^j = \{STL^j_{max,1}, STL^j_{max,2}, \dots, STL^j_{max,30}\}$.

Their pointwise differences are

$D_{ij} = L^i - L^j = \{d^{ij}_1, d^{ij}_2, \dots, d^{ij}_{30}\} = \{STL^i_{max,1} - STL^j_{max,1}, \dots, STL^i_{max,30} - STL^j_{max,30}\}$.

Then we calculate the average value, $\bar{D}_{ij}$, and the sample standard deviation, $\hat{\sigma}_d$, of $D_{ij}$. The T-index between EEG signal epochs i and j is defined as

$T_{ij} = \frac{|\bar{D}_{ij}|}{\hat{\sigma}_d / \sqrt{30}}$.

Similarity Measure for EEG Time Series – T-test

T-Statistics Distance

The T-index, $T_{xy}$, between the time series x and y is then defined as

$T_{xy} = \frac{|E[X] - E[Y]|}{\sigma_{xy} / \sqrt{n}}$,

where E[·] denotes the average of the values within an epoch of the time series, n is the length of the time series epoch, and $\sigma_{xy}$ is the sample standard deviation of the difference in value of x and y.

Asymptotically, the $T_{xy}$ index follows a t-distribution with n-1 degrees of freedom.

Nearest Neighbor Classification Rules

Given an unknown-state epoch of EEG signals A, we calculate statistical distances between the EEG epoch and the groups of Normal, Pre-Seizure, and Post-Seizure EEGs in our database.

EEG sample A is classified into the patient state (normal, pre-seizure, or post-seizure) whose group yields the minimum T-index distance.

Multiple electrodes = multiple decisions, combined by averaging or by voting (majority voting selects the class with the maximum number of votes).
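The nearest-neighbor rule with per-electrode decisions can be sketched as follows. Assumptions, since the slides do not fix them: each epoch is an array of per-electrode profiles (for example STLmax values), the distance from an epoch to a class is the smallest T-index to any stored epoch of that class, and ties in the vote are not treated specially. Function names are hypothetical.

```python
from collections import Counter

import numpy as np


def t_index(x, y):
    """T-index distance: |mean(x - y)| / (sd(x - y) / sqrt(n))."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return abs(d.mean()) / (d.std(ddof=1) / np.sqrt(d.size))


def classify_epoch(epoch, database, combine="voting"):
    """Assign an unknown epoch (electrodes x points) to a patient state.

    database maps a state label to a list of reference epochs with the
    same shape as `epoch`.
    """
    labels = list(database)
    # dist[e, c]: nearest-neighbor T-index of electrode e to class c
    dist = np.array([
        [min(t_index(epoch[e], ref[e]) for ref in database[label])
         for label in labels]
        for e in range(len(epoch))
    ])
    if combine == "average":
        # Average the distances over electrodes, then take the closest class.
        return labels[int(np.argmin(dist.mean(axis=0)))]
    # Each electrode votes for its closest class; the majority wins.
    votes = Counter(labels[int(c)] for c in dist.argmin(axis=1))
    return votes.most_common(1)[0][0]
```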

Preliminary Data Set

132 5-minute epochs of pre-seizure EEGs; 132 5-minute epochs of post-seizure EEGs; 300 5-minute epochs of normal EEGs.

Pre-seizure = 0-30 minutes before a seizure; post-seizure = 2-10 minutes after a seizure; normal = 10 hours away from a seizure.

Probability of Correct Classifications

Figure: Patient State Classification (Voting - Lmax+Phase) - Sensitivity. Percentage of epochs of each actual state (rows) assigned to each classified type (columns):

State | Classified Pre-ictal | Classified Post-ictal | Classified Inter-ictal
Pre-ictal | 95.65% | 4.35% | 0.00%
Post-ictal | 22.73% | 72.73% | 4.55%
Inter-ictal | 25.00% | 10.00% | 65.00%

Metrics for Performance Evaluation

PREDICTED CLASS (columns) vs. ACTUAL CLASS (rows):

ACTUAL \ PREDICTED | Class=Yes | Class=No
Class=Yes | a | b
Class=No | c | d

a: TP (true positive); b: FN (false negative); c: FP (false positive); d: TN (true negative)

Sensitivity and Specificity

Sensitivity measures the fraction of positive cases that are classified as positive.

Specificity measures the fraction of negative cases classified as negative.

Sensitivity = TP/(TP+FN)

Specificity = TN/(TN+FP)

Sensitivity can be considered as a detection (prediction or classification) rate that one wants to maximize.

Maximize the probability of correctly classifying patient states.

The false positive rate can be considered as 1 - Specificity, which one wants to minimize.
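A direct transcription of the two formulas, with the confusion-matrix counts a (TP), b (FN), c (FP), d (TN) obtained from actual and predicted labels. For a three-state problem such as pre-ictal/post-ictal/inter-ictal, the sketch assumes a one-vs-rest view in which one state is treated as the positive class; the function name and the example labels are hypothetical.

```python
def sensitivity_specificity(actual, predicted, positive):
    """Return (sensitivity, specificity) treating `positive` as Class=Yes."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # a: true positive
            else:
                fn += 1  # b: false negative
        elif p == positive:
            fp += 1      # c: false positive
        else:
            tn += 1      # d: true negative
    return tp / (tp + fn), tn / (tn + fp)


actual = ["pre", "pre", "pre", "post", "inter", "inter"]
predicted = ["pre", "pre", "post", "pre", "inter", "post"]
sens, spec = sensitivity_specificity(actual, predicted, positive="pre")
print(sens, spec, 1 - spec)  # 1 - specificity is the false positive rate
```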


ROC – Performance Characteristics

Figure: ROC for different classification methods (x-axis: 1-Specificity; y-axis: Sensitivity). Curve labels: Lmax, Phase, Entropy, L+P, L+P+E; decision rules: Average, Voting.

Sensitivity = 95.7%, Specificity = 75.4%

Results

Any More Sophisticated Method?

Support Vector Machines: 2-Class Linearly Separable Case

Mathematical Modeling

Leave-one-out Cross Validation

Cross-validation can be seen as a way of applying partial information about the applicability of alternative classification strategies.

K-fold cross-validation: divide all the data into k subsets of (approximately) equal size. Train a classifier using k-1 subsets as training data, and test it on the omitted subset. Iterate k times, leaving out each subset once. Leave-one-out cross-validation is the special case k = n.
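The k-fold procedure, including leave-one-out as the special case k = n, can be sketched without any library. The classifier is passed in as a function; the 1-nearest-neighbor example and all names are illustrative assumptions. The folds here are contiguous blocks; in practice one would usually shuffle the data (or stratify by class) first.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k folds whose sizes differ by at most one."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate(X, y, k, fit_predict):
    """Fraction of samples classified correctly over the k held-out folds."""
    correct = 0
    for train, test in k_fold_splits(len(y), k):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(y)


def one_nn(X_train, y_train, X_test):
    """1-nearest-neighbor on scalar features (illustrative classifier)."""
    return [y_train[min(range(len(X_train)), key=lambda j: abs(X_train[j] - x))]
            for x in X_test]


X = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
y = [0, 0, 0, 1, 1, 1]
print(cross_validate(X, y, k=len(y), fit_predict=one_nn))  # leave-one-out
```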

Classification Results

QP for Clustering

Clustering Epileptic Brains

Hierarchical Clustering

Figure: agglomerative and divisive hierarchical clustering of the points a, b, c, d, e (leaf order a, d, e, c, b; intermediate clusters {a, d}, {b, c}, {b, c, e}; root {a, b, c, d, e})

Clustering via Concave Quadratic Programming (CCQP): formulate a clustering problem as a Quadratic Integer Program (QIP),

where A is an n × n T-index matrix of pairwise distances,

λ is a parameter adjusting the degree of similarity within a cluster, and

$x_i$ is a 0-1 decision variable indicating whether or not point i is selected (assigned) to be in the cluster.

Advantages: in some instances, λ is large enough to make the quadratic function concave. The QIP can then be converted to a continuous problem (minimizing a concave quadratic function over a sphere).

CCQP Algorithm

Patient 1: Box Plot of Average Solution

Lmax

Patient 1: Box Plots of Average Solution

Lmax Phase

Patient 2: Box Plots of Average Solution

Lmax Phase

Kruskal-Wallis Test

…is a nonparametric version of the one-way ANOVA.

…is an extension of the Wilcoxon rank sum test to more than two groups.

…compares samples from two or more groups.

…compares the medians of the samples in X, and returns the p-value for the null hypothesis that all samples are drawn from the same population (or, equivalently, from different populations with the same distribution).

Assumptions

The Kruskal-Wallis test makes the following assumptions about the data in X:

All samples come from populations having the same continuous distribution, apart from possibly different locations due to group effects.

All observations are mutually independent.

The classical one-way ANOVA test replaces the first assumption with the stronger assumption that the populations have normal distributions.
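The Kruskal-Wallis statistic itself is a short computation on pooled ranks, $H = \frac{12}{N(N+1)}\sum_g \frac{R_g^2}{n_g} - 3(N+1)$, divided by a tie correction. The dependency-free sketch below computes H; under the null hypothesis H is approximately χ² with (number of groups - 1) degrees of freedom, and `scipy.stats.kruskal` returns both H and the p-value. The function names are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_g
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n_total ** 3 - n_total))


print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```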

T-test: test the hypothesis of the difference in means of two samples.

Determine whether two samples, x and y, could have the same mean when the standard deviations are unknown but assumed equal.

Asymptotically, Txy index follows a t-distribution with n-1 degrees of freedom.

Results – Significance Level

Concluding Remarks

Overview of epilepsy research

Applications of data mining and optimization techniques

Interplay between theory and application

Quadratic programming for feature selection

Quadratic programming for clustering

Long-term monitoring analysis
