knowledge based system



Knowledge-Based Systems 81 (2015) 56–64

Contents lists available at ScienceDirect

Knowledge-Based Systems

journal homepage: www.elsevier.com/locate/knosys

Computer-aided diagnosis of diabetic subjects by heart rate variability signals using discrete wavelet transform method

U. Rajendra Acharya a,b, Vidya K. Sudarshan a,⇑, Dhanjoo N. Ghista c, Wei Jie Eugene Lim a, Filippo Molinari d, Meena Sankaranarayanan e

a Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore 599489, Singapore

b Department of Biomedical Engineering, Faculty of Engineering, University of Malaya, Malaysia

c University 2020 Foundation, MA, USA

d Biolab, Department of Electronics and Telecommunications, Politecnico di Torino, Torino, Italy

e Department of Mathematics, Anand Institute of Higher Technology, Kazhipattur, Chennai 603 103, India

⇑ Corresponding author. Tel.: +65 64608393. E-mail address: [email protected] (K.S. Vidya).

http://dx.doi.org/10.1016/j.knosys.2015.02.005

0950-7051/© 2015 Elsevier B.V. All rights reserved.

Article info

Article history: Received 9 June 2014; Received in revised form 3 February 2015; Accepted 5 February 2015; Available online 12 February 2015

Keywords: Diabetes; HRV; Classifier; DWT; Feature extraction; Feature ranking

Abstract

Diabetes Mellitus (DM), a chronic lifelong condition, is characterized by increased blood sugar levels. As there is no cure for DM, the major focus lies on controlling the disease. Therefore, DM diagnosis and treatment are of great importance. The most common complications of DM include retinopathy, neuropathy, nephropathy and cardiomyopathy. Diabetes causes cardiovascular autonomic neuropathy that affects the Heart Rate Variability (HRV). Hence, in the absence of other causes, HRV analysis can be used to diagnose diabetes. The present work aims at developing an automated system for classification of normal and diabetes classes by using the heart rate (HR) information extracted from Electrocardiogram (ECG) signals. The spectral analysis of HRV recognizes patients with autonomic diabetic neuropathy, and gives an earlier diagnosis of impairment of the Autonomic Nervous System (ANS). The HRV spectral indices obtained by using the Discrete Wavelet Transform (DWT) method correlate significantly with the impaired ANS. Herein, in order to diagnose and detect DM automatically, we have performed DWT decomposition up to 5 levels, and extracted the energy, sample entropy, approximate entropy, kurtosis and skewness features at various detailed coefficient levels of the DWT. We have extracted relative wavelet energy and entropy features up to the 5th level of DWT coefficients extracted from HR signals. These features are ranked by using various ranking methods, namely, Bhattacharyya space algorithm, t-test, Wilcoxon test, Receiver Operating Characteristic (ROC) curve and entropy. The ranked features are then fed into different classifiers, which include Decision Tree (DT), K-Nearest Neighbor (KNN), Naïve Bayes (NBC) and Support Vector Machine (SVM). Our results have shown maximum diagnostic differentiation performance by using a minimum number of features. With our system, we have obtained an average accuracy of 92.02%, sensitivity of 92.59% and specificity of 91.46%, by using DT classifier with ten-fold cross validation.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

According to the International Diabetes Federation (IDF), it is estimated that in 2013 a total of 381 million people were diagnosed with diabetes across the globe, of whom 23 million are from Southeast Asian countries [26]. Due to lack of finance or access to healthcare, most of the populations around the world are unaware that they may be suffering from diabetes [26]. Statistics show that around 1.9 million people are diagnosed with diabetes in the USA every year and 79 million have pre-diabetic conditions [7]. By 2030, the number of diabetes subjects is estimated to almost double (prevalence of 2.8% in 2000 and 4.4% in 2030), as its incidence is increasing rapidly every year (Sarah et al. [47]). Diabetes and its complications have shown a notable impact on individuals, families, health systems and countries' economies. The USA alone spends around $245 billion annually on diagnosed diabetes patients. It is predicted that by 2050, 1 in 3 American adults may have diabetes if the current trend continues [7,10].

Diabetes mellitus (DM) is a condition defined by a hyperglycemic state (elevated blood glucose level), which in turn leads to microvascular and macrovascular damage [60]. Even though finding a cure for DM is difficult, emphasis is laid on its early diagnosis. In this regard, it is well known that a person with diabetes exhibits autonomic neuropathy (AN), damage to the nervous system, or cardiovascular autonomic neuropathy (CAN), a well-known complication of DM that affects the central and peripheral vascular systems and causes abnormalities in the heart rate signal [1]. Thus, diabetes can also be diagnosed by studying the heart rate variability.

Concerning heart rate variability, the heart rate (HR), a non-stationary/nonlinear signal, is obtained by calculating the time elapsed between two ventricular contractions, that is, the time between two consecutive R-waves (R–R interval) on the ECG signal [27]. HR Variability (HRV) analysis is one of the reliable methods for quantifying physiological dysfunction in terms of the condition of the sympathetic and parasympathetic nervous systems [6,25]. The analysis of HRV enables us to evaluate overall cardiac health in terms of heart rate regulation, based on the status of the autonomic nervous system responsible for regulating cardiac activity [37].
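As an illustration of how a heart rate series follows from R-wave locations, the sketch below converts R-peak times into R–R intervals and instantaneous heart rate. It is a minimal example rather than the processing chain used in this study; the function names and the peak times are hypothetical.

```python
def rr_intervals(r_peak_times_s):
    """Return successive R-R intervals (seconds) from R-peak times (seconds)."""
    return [t1 - t0 for t0, t1 in zip(r_peak_times_s, r_peak_times_s[1:])]


def instantaneous_hr(rr_s):
    """Convert each R-R interval (seconds) to heart rate in beats per minute."""
    return [60.0 / rr for rr in rr_s]


# Hypothetical R-peak times in seconds (illustrative values only).
peaks = [0.0, 0.8, 1.6, 2.5, 3.25]
rr = rr_intervals(peaks)
hr = instantaneous_hr(rr)
print(rr)
print(hr)
```

The variability of the `rr` (or `hr`) series from beat to beat is what the HRV features in the following sections quantify.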

Spectral analysis of the short-term HRV enables quantitative evaluation of the neurologic oscillations, and delivers values for neural regulation of heart rate [9,51,34]. The spectral analysis of HRV (spectral parameters like the power spectrum of the HRV signal) recognizes patients with autonomic diabetic neuropathy, and gives an earlier diagnosis of impairment of the autonomic nervous system (ANS) [15,21]. Significant correlations are observed between impaired ANS and the HRV indices obtained by spectral analyses using nonparametric and parametric methods, namely, the fast Fourier Transform (FFT) and autoregressive (AR) methods, respectively [12]. Time–frequency domain analysis of HRV makes it easier to quantify the ANS activity in DM subjects [48]. Even though the autonomic functions are better assessed by using frequency domain features, the accuracy of spectral power is limited by the low signal-to-noise ratio [6].

Nonlinear dynamic techniques are used in HRV signal analysis to circumvent the limitations of time and frequency domain analysis [4]. Nonlinear methods are needed for the analysis of nonlinear signals and systems [35]. The nonlinear methods have been applied in HRV analysis [50,30] to predict diabetes [14,5,20] and cardiovascular disease (CVD) [23]. Nonlinear techniques can be coupled to frequency analysis techniques. Among all of these techniques, the DWT has the advantage of providing multiple resolutions. This method provides discrimination between two different signals with the same spectrum magnitude, thus distinguishing the subtle changes in the signals [17,2,56].

In normal and diabetic subjects, the HRV signal has been used to study and measure the activity and symptoms of the cardiac parasympathetic nervous system [41]. That study reported that diabetic subjects exhibit diminished cardiac parasympathetic activity before the appearance of autonomic neuropathy symptoms. Several studies (Table 3) have reported that diabetic patients are characterized by reduced HRV, with less information about HRV across the spectrum of blood glucose levels. In 2000, Singh et al. [52] studied the correlation between hyperglycemia (increased blood glucose level) and reduced HRV. They reported reduced HRV variables in DM subjects and in subjects with impaired blood (plasma) glucose levels by using time domain features.

Awdah et al. [11] studied diabetic subjects with and without autonomic neuropathy by using the time domain analysis of HRV. Their results showed a significant decrease in all the time domain measures for diabetic subjects with and without diabetic neuropathy compared to the control class. In 2005, Flynn et al. [22] used detrended fluctuation analysis (DFA) to study the HRV changes over short-time ECG recordings of 20 min. Their study reported reduced values of HRV for diabetic subjects. Chemla et al. [16] used autoregressive (AR) methods to study the HRV spectral components in diabetic patients. They found that diabetic subjects exhibit decreased spectral values, and that the FFT method is more suitable for evaluation of short-term HRV spectral components in diabetic subjects.

Analysis in the time and frequency domain of the RR interval has been carried out by Ahmad Seyd et al. [8], to quantify the autonomic nervous system (ANS) in DM patients. Significant differences in high frequency (HF) power, very low frequency (VLF) power and low frequency (LF) power were noted between DM patients and normal classes in the frequency domain analysis of extracted data (NN interval, that is, normal-to-normal interval). This study also observed significant differences in the time domain analysis of the root mean square of successive NN interval differences (RMSSD) and the standard deviation of NN intervals (SDNN) between the DM and control groups.

The multiscale entropy (MSE) analysis method has also been used to diagnose autonomic dysregulation in DM patients by Trunkvalterova et al. [58]. Their study analyzed the heart rate (HR) signal and the systolic and diastolic blood pressure (SBP and DBP) signals in both normal and diabetic subjects, to evaluate the SampEn and linear measures. They reported that in young patients with DM, the changes in cardiovascular control were detected by the MSE analysis of SBP and DBP oscillations and HR signals. The relationship between HRV and duration of type 2 diabetes based on sex differences was studied by Nolan et al. [38]. This study reported an inverse relationship between the Type 1 and Type 2 diabetes duration and HRV measures among male subjects only. The inverse association of HRV with increasing age of diabetes diagnosis, as well as with increasing severity of coronary heart disease risk and obesity, was observed in female subjects.

Then in 2012, Faust et al. [20] used time and frequency domain and nonlinear methods to study the HRV signals of both diabetic and normal subjects; they proposed unique ranges for various features of the two classes. The HRV parameters in diabetic and non-diabetic patients with renal transplantation have been investigated in the time and frequency domain by Kirvela et al. [31]; their results highlighted that in end-stage diabetic nephropathy patients, autonomic neuropathy is the main cause of severe impairment of HRV, with co-existing heart disease being a partial cause.

Recently, a novel Diabetic Integrated Index (DII) has been developed by Acharya et al. [3], by using nonlinear parameters extracted from the HRV signal. This DII is a single number that can distinguish and classify the two classes. They also reported that the AdaBoost classifier yielded a high classification accuracy of 86% for the two classes (normal and diabetic). In this research group, Swapna et al. [54] used Higher Order Spectral (HOS) features to classify diabetic patients from normal subjects; their method reported the highest accuracy, sensitivity and specificity of 90.5%, 85.7% and 95.2% respectively, by using a Gaussian mixture model classifier. The magnitude plots of the HOS bispectrum obtained from HRV signals have been subjected to principal component analysis for feature reduction [28]. These principal components with an SVM classifier reported an accuracy of 79.93%. However, Acharya et al. [5] reported 90% accuracy, 92.5% sensitivity and 88.7% specificity with an AdaBoost classifier coupled with four nonlinear features. Pachori et al. [42] (in press) proposed a new nonlinear method based on Empirical Mode Decomposition (EMD) to discriminate between normal and diabetic RR-interval signals. In their proposed method, EMD decomposes the RR-interval signal into intrinsic mode functions (IMFs) from which five features (Fourier–Bessel series expansion, amplitude modulation bandwidth, frequency modulation bandwidths, analytic signal representation and second order difference plot) are extracted. The study results show that the extracted features exhibit statistically significant differences between normal and diabetic classes.

In our present work, in order to automatically diagnose and detect DM, we have performed DWT decomposition up to 5 levels and have extracted the energy, sample entropy, approximate entropy, kurtosis and skewness features at various detailed coefficient levels of the DWT. Fig. 1 shows an overview of our proposed methodology for diabetic HR signal classification. In the off-line system, normal and diabetes RR signal data are analyzed by DWT, performed up to 5 levels of decomposition. Energy, sample entropy, approximate entropy, kurtosis and skewness features are extracted from each level of the detailed coefficients of the DWT. Then, these features are ranked by using the Bhattacharyya space algorithm, t-test, Wilcoxon test, Receiver Operating Characteristic (ROC) curve, and entropy methods. The ranked features are fed to DT, KNN, NBC and SVM classifiers to obtain the highest classification performance using a minimum number of features. In the on-line system, decomposition up to five levels is performed by using the DWT method and the features (energy, ApEn, SampEn, kurtosis, and skewness) are extracted. These features are ranked and fed to the selected classifiers for automated classification as normal or DM.

The flow of the paper is as follows. Section 2 delineates (i) the data acquisition process and pre-processing, (ii) the feature extraction and feature ranking methods, and (iii) classification. The results of this novel diagnostic system are presented in Section 3. The discussion of the results is carried out in Section 4, and the conclusion is provided in Section 5.

2. Methods for HRV analysis

2.1. Data acquisition/pre-processing

The electrocardiogram (ECG) signals were acquired from 30 subjects (15 subjects with DM and 15 healthy subjects) in a relaxed supine position for 60 min. The ECG recordings were performed by using BIOPAC™ (Aero Camino Goleta, CA, USA) equipment, and the AcqKnowledge software built into the equipment was used to convert the recordings into heart rate time series. Fig. 2 shows the RR signals of normal and DM patients. The ECG sampling rate was set to 500 Hz. A total of 81 datasets from 15 diabetic subjects (10 male and 5 female) and 82 datasets from 15 normal subjects (8 male and 7 female) were used in this study, with each dataset having 1000 samples. All the subjects were informed about the aim of the study and signed an informed consent before being examined. The study received approval from the Kasturba Medical Hospital, Manipal, India. A band-reject filter with a center frequency of 50 Hz was used to remove the power-line interference noise. RR points were detected using the Pan and Tompkins algorithm [40].

Fig. 1. Proposed system.

2.2. Feature extraction

Feature extraction is a crucial step in biomedical signal analysis and interpretation. We have performed DWT on the HR signals up to five levels, and extracted the features Energy (E), Approximate Entropy (ApEn), Sample Entropy (SampEn), Kurtosis (Kur) and Skewness (Skw) from these different levels of DWT coefficients. The DWT method and the features extracted are described briefly in the following sections.

2.2.1. Discrete Wavelet Transform (DWT)

The DWT transforms the signal from the time domain to the wavelet domain and delivers different coefficient values. In the DWT, the given heart rate signal is passed through a high pass and a low pass filter. Once filtering is done, half of the samples are eliminated, as the output is sub-sampled by 2. This is the first level of decomposition. The low pass filter coefficients are then subjected to low pass and high pass filtering again, and this procedure is repeated for the different levels of decomposition. At each level, the number of samples and the frequency band are halved [55]. This converts a signal into low pass (approximate) coefficients and high pass (detailed) coefficients. In this work, we have used the db8 mother wavelet function [17]. We performed DWT on HR signals up to five levels, and then extracted the features of energy, ApEn, SampEn, kurtosis, and skewness.

In this work, A5 is the fifth level of the approximate coefficients and D1–D5 correspond to the first to fifth level detailed coefficients. Fig. 3 shows the DWT performed on RR interval signals of normal and DM patients.
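The filter-and-downsample recursion described above can be sketched in a few lines. Note that the sketch deliberately swaps in the Haar wavelet, whose two-tap filters are trivial to write down, in place of the db8 wavelet used in this work; the function names, the padding rule for odd lengths and the synthetic test signal are assumptions made only for illustration.

```python
import math


def haar_step(x):
    """One DWT level: low/high-pass Haar filtering followed by downsampling by 2."""
    if len(x) % 2:              # assumption: pad odd-length input with its last sample
        x = x + [x[-1]]
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_dwt(x, levels=5):
    """Return (A_levels, [D1, ..., D_levels]); only the approximation is re-filtered."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


# Synthetic 1000-sample signal standing in for one RR dataset (illustrative only).
signal = [math.sin(0.05 * n) + 0.1 * math.sin(1.3 * n) for n in range(1000)]
a5, ds = haar_dwt(signal, levels=5)
print([len(d) for d in ds], len(a5))  # [500, 250, 125, 63, 32] 32
```

The printed lengths show the halving of the number of samples at each level; with a longer filter such as db8 the band lengths differ slightly because of boundary handling.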


Fig. 2. Typical RR interval signals: (a) normal subject and (b) diabetic subject.

2.2.2. Energy (E)

It is computed from the squared DWT coefficients of the heart rate signal at a given decomposition level.
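A minimal sketch of the energy feature follows, under the assumption that the energy of a sub-band is the sum of its squared coefficients and that the relative wavelet energy is each band's share of the total. The function names and the coefficient values are hypothetical.

```python
def band_energy(coeffs):
    """Energy of one sub-band: sum of squared DWT coefficients (assumed definition)."""
    return sum(c * c for c in coeffs)


def relative_energies(bands):
    """Share of the total energy held by each band; assumes the total is non-zero."""
    energies = {name: band_energy(c) for name, c in bands.items()}
    total = sum(energies.values())
    return {name: e / total for name, e in energies.items()}


# Hypothetical coefficients for two detail bands and one approximation band.
bands = {"D1": [0.1, -0.2, 0.05], "D2": [0.3, -0.1], "A2": [1.0, 0.9]}
print(relative_energies(bands))
```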

2.2.3. Approximate Entropy (ApEn)

It is a method used to quantify the amount of regularity and unpredictability of signal variations [43]. This regularity statistic has potential application in the analysis of ECG and heart rate time series [44]. A signal varying rhythmically has a small ApEn, and vice versa. Herein, we have used the ApEn formula proposed by Pincus et al. [45].
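The following sketch implements the commonly cited form of ApEn (template matching with self-matches counted, using the maximum absolute difference as distance). The embedding dimension m = 2 and the tolerance r = 0.2 times the standard deviation are conventional choices assumed here; the values used in this work are not stated.

```python
import math
import random
import statistics


def _phi(u, m, r):
    """Average log-fraction of length-m templates within tolerance r (self-match included)."""
    n = len(u) - m + 1
    templates = [u[i:i + m] for i in range(n)]
    total = 0.0
    for a in templates:
        matches = sum(1 for b in templates
                      if max(abs(p - q) for p, q in zip(a, b)) <= r)
        total += math.log(matches / n)
    return total / n


def approximate_entropy(u, m=2, r=0.2):
    """ApEn(m, r) = phi_m(r) - phi_{m+1}(r); r is an absolute tolerance."""
    return _phi(u, m, r) - _phi(u, m + 1, r)


regular = [0.0, 1.0] * 50                        # strictly alternating signal
random.seed(0)
irregular = [random.random() for _ in range(100)]
for name, sig in (("regular", regular), ("irregular", irregular)):
    tol = 0.2 * statistics.pstdev(sig)           # assumed tolerance rule
    print(name, approximate_entropy(sig, m=2, r=tol))
```

The rhythmic signal yields an ApEn close to zero, while the irregular one yields a clearly larger value, in line with the description above.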

2.2.4. Sample Entropy (SampEn)

It is a modification of approximate entropy used for the assessment of complexity and regularity of physiological time series [57]. Unlike ApEn, SampEn is largely independent of data length and performs consistently well. A signal with more repeating patterns will have a small SampEn, and vice versa.
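SampEn can be sketched in the same style. The version below follows the usual definition, in which self-matches are excluded and the same number of templates is used for lengths m and m + 1; the parameters m = 2 and r = 0.2 times the standard deviation are again assumptions, as is the choice to return infinity when no matches are found.

```python
import math
import random
import statistics


def sample_entropy(u, m=2, r=0.2):
    """SampEn(m, r) = -ln(A / B), with A and B counted over distinct template pairs."""
    n = len(u)

    def count_pairs(length):
        # Use the first n - m templates for both lengths and skip self-matches.
        templates = [u[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(p - q) for p, q in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_pairs(m)
    a = count_pairs(m + 1)
    if a == 0 or b == 0:
        return float("inf")     # assumption: report infinity when undefined
    return -math.log(a / b)


regular = [0.0, 1.0] * 50
random.seed(0)
irregular = [random.random() for _ in range(100)]
print(sample_entropy(regular, r=0.2 * statistics.pstdev(regular)))
print(sample_entropy(irregular, r=0.2 * statistics.pstdev(irregular)))
```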

2.2.5. Kurtosis and skewness

These two values are used to assess the probability distribution of the signal series [46]. Kurtosis indicates whether the data are peaked or flat relative to the normal distribution. Skewness measures the asymmetry of the tails of the distribution. The kurtosis (Kur) and skewness (Skw) are defined as

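$$\mathrm{Kur} = \frac{E\left[(X - \mu)^4\right]}{\sigma^4} \qquad (1)$$

$$\mathrm{Skw} = \frac{E\left[(X - \mu)^3\right]}{\sigma^3} \qquad (2)$$

where X denotes the signal samples (treated as a random variable), μ is the mean value of the data set, and σ represents the standard deviation of the data set.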
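Eqs. (1) and (2) can be evaluated directly from the samples, as in the sketch below. It assumes the population (divide-by-N) moment estimators and applies no further normalization; the function names and the sample values are illustrative.

```python
import math


def central_moment(x, k):
    """k-th central moment E[(X - mu)^k], estimated with a divide-by-N average."""
    mu = sum(x) / len(x)
    return sum((v - mu) ** k for v in x) / len(x)


def skewness(x):
    """Skw = E[(X - mu)^3] / sigma^3, as in Eq. (2)."""
    sigma = math.sqrt(central_moment(x, 2))
    return central_moment(x, 3) / sigma ** 3


def kurtosis(x):
    """Kur = E[(X - mu)^4] / sigma^4, as in Eq. (1)."""
    sigma = math.sqrt(central_moment(x, 2))
    return central_moment(x, 4) / sigma ** 4


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 2.0, 3.0, 4.0, 10.0]
print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tailed), kurtosis(right_tailed))
```

The symmetric sample has zero skewness, while the sample with one large value has positive skewness and a larger kurtosis.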

2.3. Feature ranking

Ranking methods are among the fastest approaches to the feature selection problem. Feature ranking is used to select a subset of features that reduces the complexity of the classifier without degrading its performance. In our work, different feature ranking methods, namely, the Bhattacharyya space algorithm, t-test, Wilcoxon test, Receiver Operating Characteristic (ROC) curve, and entropy, are used to rank the significant features. These feature ranking methods are briefly explained below.

2.3.1. Bhattacharyya method

In this method, the features are ranked according to their ability to discriminate the training data. The Bhattacharyya ranking method yields a single evaluation route, thereby reducing the number of classifications required when adding each feature [29].

2.3.2. t-test

The Student's t-test is used to determine whether the means of two sets differ [13]. The test gives the p-value and t-value of each feature extracted for the two groups of data. Statistically, a low p-value is preferred (p < 0.05), and the higher the t-value, the better the ranking. Hence, in this work, the low p-value features are selected and the t-values are used to rank them.
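As a worked example, the t-value of a feature can be recomputed from its group means and standard deviations. The sketch assumes the equal-variance (pooled) form of the two-sample t-statistic and group sizes of 82 normal and 81 diabetic datasets (Section 2.1); with these assumptions, the ApEn_D1 row of Table 1 is reproduced to within rounding. The function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t-statistic with pooled variance (equal-variance assumption)."""
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se


# ApEn_D1 from Table 1: normal 0.878684 +/- 0.09963, diabetes 0.766539 +/- 0.226049.
t = pooled_t(0.878684, 0.09963, 82, 0.766539, 0.226049, 81)
print(round(t, 2))  # 4.11
```

Ranking by the absolute t-value then places features with well-separated group means first.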

2.3.3. Wilcoxon test

It assesses the difference between two related samples. This is a paired test that is suitable for comparing two different measurement sets made on the same data [61].

2.3.4. Receiver Operating Characteristic (ROC) curve method

In this method, the sensitivity and specificity of a diagnostic test are evaluated at different threshold values to obtain the ROC curve, which is plotted as sensitivity versus 1 − specificity. A test that perfectly discriminates between the two groups would yield a curve passing through the upper-left corner of the plot; the soundness of a test can then be assessed by determining the area under the curve. In practice, the area varies between 0.5 and 1; the closer the area is to 1, the better the test, and the closer it is to 0.5, the worse the test [39].

Fig. 3. Typical DWT plots of RR interval signals: (a) normal and (b) diabetic subject.
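The area under the ROC curve of a single feature can be computed without tracing the curve, because it equals the probability that a randomly chosen sample of one class scores higher than a randomly chosen sample of the other (ties counted as one half). The sketch below uses this pairwise form; the function names and feature values are hypothetical, and folding the area around 0.5 for ranking is an assumption.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve as P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def ranking_score(pos_scores, neg_scores):
    """Direction-free separability: an AUC of 0.2 separates as well as 0.8."""
    auc = roc_auc(pos_scores, neg_scores)
    return max(auc, 1.0 - auc)


# Hypothetical values of one feature for the two classes.
diabetic = [0.70, 0.75, 0.78, 0.90]
normal = [0.85, 0.88, 0.86, 0.92]
print(roc_auc(diabetic, normal), ranking_score(diabetic, normal))
```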

2.3.5. Entropy based test

This method is based on the fact that entropy is lower for an orderly layout and higher for a disorderly layout. In this method, the features are ranked in descending order of relevance, by finding the descending order of the entropies after removing each feature one at a time [18].

2.4. Classification

In our work, we have used the ten-fold cross validation method to evaluate the classifiers [2]. Our main objective is to obtain the best classification accuracy by using the minimum number of ranked features, and to identify the best classifier. In this method, the whole set of ranked features is first divided into 10 equal parts; nine parts (147 data files) are used for training the classifier, and the trained classifier is then applied to the one remaining part (16 data files) to evaluate its performance. This whole process is repeated 10 times, taking a different part for testing each time. The classifier performance is measured by the average value over the ten folds. The different classifiers used in our study are explained below.
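The fold bookkeeping can be sketched as follows. This is a plain, unshuffled split written for illustration; whether the data were shuffled or stratified in this work is not stated, and the function name is hypothetical. With 163 data files, three folds hold 17 test files and seven hold 16.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs with fold sizes differing by at most one."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


sizes = [len(test) for _, test in k_fold_indices(163)]
print(sizes)  # [17, 17, 17, 16, 16, 16, 16, 16, 16, 16]
```

Each sample appears in exactly one test fold, so averaging the per-fold scores uses every data file once for testing.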

2.4.1. Decision Tree (DT)

This classifier uses the significant features from the training data to construct a tree [33]. The two classes are defined by using the rules extracted from the constructed tree. The class of the test data is then determined using these rules. The main advantage of this classifier is its ability to break down a complex decision-making process into a collection of simpler decisions, thereby providing a solution which is often easier to interpret. There may be difficulties involved in designing an optimal DT classifier. The performance of a DT classifier strongly depends on how well the tree is designed.

2.4.2. K-Nearest Neighbor (KNN)

It is a simple classifier that determines the k nearest neighbors of a test sample by using the minimum distance between the test sample and the training data [32]. The test sample is assigned the most common class among its k nearest neighbors. This classifier has poor run-time performance when the training set is large. In this work, we have used k = 3.
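A minimal sketch of the k = 3 majority vote is given below. The Euclidean distance is an assumption (the distance measure is not specified here), and the two-dimensional feature vectors and labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Assign the majority label among the k training points closest to query."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training vectors (illustrative values only).
train_x = [(0.88, 0.05), (0.85, 0.03), (0.90, 0.02),
           (0.75, 0.14), (0.70, 0.20), (0.78, 0.10)]
train_y = ["normal", "normal", "normal", "diabetic", "diabetic", "diabetic"]
print(knn_predict(train_x, train_y, (0.86, 0.04)))
print(knn_predict(train_x, train_y, (0.72, 0.18)))
```

Every prediction scans the whole training set, which is the source of the poor run-time performance noted above for large training sets.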

2.4.3. Naive Bayes Classifier (NBC)

It is a probabilistic classifier which works on the principle of Bayes' theorem, and on the assumption that the features are independent random variables [24]. The main advantage of this classifier is that it requires only a small amount of training data to estimate the parameters (means and variances of the variables) required for classification. The most important downside of this classifier is its strong feature independence assumption.

Table 1. Range (Mean ± Standard Deviation) of features extracted from normal and diabetic RR interval signals.

| Features | Normal mean | Normal SD | Diabetes mean | Diabetes SD | p-value | t-value |
|---|---|---|---|---|---|---|
| ApEn_D1 | 0.878684 | 0.09963 | 0.766539 | 0.226049 | 6.37E-05 | 4.106849 |
| Kur_D3 | 0.050443 | 0.049304 | 0.14026 | 0.205666 | 0.000174 | 3.8445 |
| SampEn_A5 | 0.414618 | 0.141522 | 0.348454 | 0.106587 | 0.000946 | 3.368416 |
| Kur_D2 | 0.027187 | 0.042692 | 0.098319 | 0.19369 | 0.00142 | 3.246782 |
| Kur_D1 | 0.021703 | 0.052228 | 0.079683 | 0.157082 | 0.001826 | 3.169823 |
| ApEn_D3 | 0.850749 | 0.084797 | 0.785439 | 0.168723 | 0.002089 | 3.128092 |
| ApEn_D2 | 0.859135 | 0.108898 | 0.786305 | 0.214654 | 0.006906 | 2.736556 |
| Kur_D4 | 0.097679 | 0.090308 | 0.150161 | 0.166174 | 0.013084 | 2.509336 |
| ApEn_A5 | 0.665725 | 0.172749 | 0.609154 | 0.147757 | 0.026095 | 2.245531 |
| SampEn_D1 | 0.712377 | 0.113272 | 0.646319 | 0.251198 | 0.031581 | 2.168592 |
| Skw_D2 | 0.397354 | 0.039024 | 0.428037 | 0.138865 | 0.055938 | 1.92543 |
| ApEn_D4 | 0.731534 | 0.147349 | 0.688021 | 0.191352 | 0.105526 | 1.627786 |
| E_D1 | 0.002803 | 0.003401 | 0.019003 | 0.113083 | 0.196578 | 1.296734 |
| Skw_A5 | 0.503106 | 0.121693 | 0.528192 | 0.143809 | 0.23085 | 1.202721 |
| ApEn_D5 | 0.667284 | 0.158245 | 0.637826 | 0.183521 | 0.273877 | 1.097925 |
| E_A5 | 7.58E-05 | 2.93E-05 | 0.012382 | 0.111107 | 0.317341 | 1.003052 |
| E_D5 | 0.000253 | 0.000313 | 0.01238 | 0.111107 | 0.324434 | 0.988411 |
| Skw_D1 | 0.636675 | 0.052334 | 0.620989 | 0.133937 | 0.325119 | 0.987009 |
| E_D4 | 0.000343 | 0.00035 | 0.012399 | 0.111105 | 0.327228 | 0.982703 |
| E_D2 | 0.002461 | 0.002364 | 0.014506 | 0.111103 | 0.32778 | 0.981578 |
| Skw_D3 | 0.424324 | 0.042139 | 0.438184 | 0.121263 | 0.330019 | 0.977032 |
| E_D3 | 0.001619 | 0.001354 | 0.01256 | 0.111088 | 0.37381 | 0.891839 |
| Skw_D4 | 0.391949 | 0.066628 | 0.380989 | 0.12243 | 0.478117 | 0.710994 |
| Skw_D5 | 0.436783 | 0.188226 | 0.455444 | 0.163141 | 0.499988 | 0.676036 |
| Kur_D5 | 0.193536 | 0.179643 | 0.204018 | 0.163548 | 0.697491 | 0.389405 |
| Kur_A5 | 0.145777 | 0.130321 | 0.151545 | 0.177971 | 0.813513 | 0.236283 |

2.4.4. Support Vector Machine (SVM)

It is one of the most widely used classifiers; it constructs a hyper-plane in a feature space that separates the training data into two classes [19]. If the data are not linearly separable, kernel functions are used to map the original input data to a higher dimensional feature space where the features might become linearly separable. This work concerns polynomial kernel functions of order 1, 2 and 3 and radial basis function (RBF) kernels. We have used the Least Squares SVM (LS-SVM) in this work [53]. A major advantage of SVM is that it mitigates the curse of dimensionality and the local-minima problem of traditional machine learning. Its generalization ability is strong when dealing with small sample sizes. The biggest limitation of the SVM lies in the choice of the kernel, and the most serious one from a practical point of view is the high algorithmic complexity and extensive memory requirements.

3. Results

In our work, we have extracted a total of twenty-six featuresfrom HRV signals by using the DWT method. Table 1 shows the

Table 2Results of classification by using various classifiers (features ranked using t-test method).

Classifiers Features TP TN FP

DT 8 75 76 6KNN 5 74 76 6NBC 13 24 78 4SVM Polynomial 1 4 57 78 4SVM Polynomial 2 6 67 72 10SVM Polynomial 3 6 75 67 15

results of statistical analysis. The results of automated detectionand classification of HRV signals of DM subjects are tabulated inTable 2. A ten-fold cross validation has been performed on theranked features by using different ranking methods which resultedin an average accuracy of 92.02%, sensitivity of 92.59% andspecificity of 91.46% is shown in Table 2.

Fig. 4 shows the plot of accuracy versus number of features forvarious ranking methods. It clearly shows that the t-test methodyields the highest classification accuracy for 21 ranked features,beyond which there is a drop in the accuracy level. Fig. 5 showsthe plot of average accuracy (%), sensitivity (%) and specificity (%)versus different folds of ten-fold cross-validation for DT classifier.

It can be noted from Table 1 that all the entropies in the different levels of detailed coefficients have decreased for the diabetic class due to a decrease in the variability. Also, the kurtosis, skewness and energy of the detailed coefficients have higher values for the diabetic class than for the normal class.

4. Discussion

In our work, we have developed an automated DM diagnostic system by extracting the energy and entropy features of the first five levels of detailed coefficients of DWT. Table 3 provides a summary of the studies that discriminate DM by using HRV analysis.
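The feature extraction described above can be sketched as follows: a five-level DWT is applied, and statistics are computed on the detail coefficients of each level. The code is a sketch under stated assumptions, not the authors' pipeline. The Haar wavelet is chosen only for brevity (this section does not name the mother wavelet), skewness and kurtosis use population moments, and the input is a synthetic series standing in for an HRV signal.

```python
import math

def haar_dwt(signal, levels=5):
    """Return (details, approx); details[k] holds the level-(k+1) coefficients."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        if len(approx) % 2:          # drop a trailing sample so pairs line up
            approx = approx[:-1]
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2.0) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    return details, approx

def energy(c):
    """Sum of squared coefficients."""
    return sum(v * v for v in c)

def central_moment(c, order):
    mu = sum(c) / len(c)
    return sum((v - mu) ** order for v in c) / len(c)

def skewness(c):
    m2 = central_moment(c, 2)
    return central_moment(c, 3) / m2 ** 1.5 if m2 > 0 else 0.0

def kurtosis(c):
    m2 = central_moment(c, 2)
    return central_moment(c, 4) / m2 ** 2 if m2 > 0 else 0.0

# Synthetic stand-in for an RR-interval series; not real patient data.
rr = [0.8 + 0.05 * math.sin(2 * math.pi * i / 8)
      + 0.02 * math.sin(2 * math.pi * i / 3) for i in range(256)]
details, approx = haar_dwt(rr, levels=5)
for level, d in enumerate(details, start=1):
    print(level, len(d), energy(d), skewness(d), kurtosis(d))
```

Because the Haar transform used here is orthonormal, the energies of the detail levels plus the energy of the final approximation add up to the energy of the input, which is a convenient sanity check on any implementation.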

Our results show that the entropy features (namely the variables ApEn and SampEn of Table 1) are always statistically lower

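Table 2 (continued)

Classifiers | FN | Sensitivity (%) | Specificity (%) | Accuracy (%)
DT | 6 | 92.59 | 92.68 | 92.64
KNN | 7 | 91.36 | 92.68 | 92.02
NBC | 57 | 29.63 | 95.12 | 62.58
SVM Polynomial 1 | 24 | 70.37 | 95.12 | 82.82
SVM Polynomial 2 | 14 | 82.72 | 87.80 | 85.28
SVM Polynomial 3 | 6 | 92.59 | 81.71 | 87.12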


Table 3
Studies conducted to discriminate normal and diabetic subjects using HRV signals.

Authors | Methods | Features | Classifier, number of features | Performance
Pfeifer et al. [41] | Time domain | RR variations | Nil, one | Supine HRV during a beta-adrenergic blockade and deep respiratory rate can effectively estimate parasympathetic nervous activity in diabetic and control subjects
Singh et al. [52] | Time and frequency domain | SDNN, high and Low Frequency (LF) power, LF/HF | Nil, four | LF power and LF/HF ratio were lower in DM
Awdah et al. [11] | Time domain | SDRR, NN50, RMSSD, pNN50%, etc. | Nil, eight | Decreased with diabetes
Faust et al. [20] | Time, frequency domain and nonlinear | All features | Nil, Time domain: seven; Freq domain: three; Nonlinear: twenty-two | Decreased with diabetes
Kirvela et al. [31] | Time and frequency domain | All time domain and frequency domain features | Nil, All time and frequency domain features | Diminished HRV has been observed in diabetic autonomic neuropathy
Flynn et al. [22] | Detrended fluctuation analysis | Short range correlation (α1) | Nil, One | Short range correlation (α1) decreases for diabetes subjects
Chemla et al. [16] | FFT and Autoregressive spectral analysis | LF/HF ratio, LF(nu), and HF(nu) | Nil, Three | Decreased value for diabetes subjects
Schroeder et al. [49] | Time domain | SD, root mean square of successive differences in normal-to-normal R-R intervals | Nil, Three | Decreased value for diabetes subjects
Ahamed Seyd et al. [8] | Time and frequency domain | All time domain and frequency domain features | Nil, Time domain: nine; Freq domain: eleven | All parameters reduced with diabetes
Trunkvalterova et al. [58] | Nonlinear | Multiscale entropy (MSE) | Nil, MSE | MSE was significantly reduced on scales 2 and 3 in DM
Nolan et al. [38] | Time and frequency domain | High frequency (HF) power, root mean square of successive differences between R–R intervals, total R–R variability | Nil, Three | Between HRV measures, duration of Type 1 and Type 2 diabetes relationship is inverse
Acharya et al. [3] | Nonlinear | RQA features, Correlation dimension, long term variability | Perceptron-AdaBoost, five | Diabetes Index, Accuracy: 86%, Sensitivity: 87.5%, Specificity: 84.6%
Swapna et al. [54] | HOS | Bispectrum moments, entropies and weighted centers | GMM, eight | Accuracy: 90.5%, Sensitivity: 85.7%, Specificity: 95.2%
Jian et al. [28] | HOS | PCA features | SVM | Accuracy: 79.93%
Acharya et al. [5] | Nonlinear | RQA features, ApEn | Least Squares-AdaBoost, four | Accuracy: 90.0%, Sensitivity: 92.5%, Specificity: 88.7%
Pachori et al. [42] (in press) | Nonlinear | Mean frequency using Fourier–Bessel series expansion, two bandwidth parameters (amplitude modulation and frequency modulation bandwidths) and Analytic Signal Representation (ASR) and Second Order Difference Plot (SODP) | Kruskal–Wallis statistical test, five | Features provide statistically significant difference between diabetic and normal classes
This work | DWT | Entropies, energy, skewness and kurtosis | DT, eight | Accuracy: 92.02%, Sensitivity: 92.59%, Specificity: 91.46%

62 U. Rajendra Acharya et al. / Knowledge-Based Systems 81 (2015) 56–64

for DM as compared to controls. This is in accordance with some very recent studies that showed how diabetes reduced the entropy of the EMG signals [59] and of the near-infrared signals measuring muscle metabolism [36]. This decreased signal entropy was found both in the electrical activation of the muscles and during muscle metabolism, suggesting that diabetes might alter the muscle fiber conduction velocity and membrane functioning. We believe the entropy of the signal is also a very important parameter when analyzing the HRV signal, because it might directly reflect a neuromuscular effect of DM.
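The link between lower entropy and reduced variability can be illustrated with sample entropy (SampEn) [57]. The sketch below follows the usual definition, the negative logarithm of the ratio of template matches of length m + 1 to matches of length m, with self-matches excluded. The embedding dimension m = 2 and the tolerance r = 0.2 times the standard deviation are common defaults and are assumptions here, as are the function name and the synthetic test signals.

```python
import math
import random
from statistics import pstdev

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn = ln(B / A), with B and A the match counts at lengths m and m + 1."""
    n = len(x)
    r = r_factor * pstdev(x)

    def count_matches(length):
        # n - m templates for both lengths, compared with the Chebyshev distance
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")   # too few matches to form an estimate
    return math.log(b / a)

# A perfectly periodic series versus an irregular one (both synthetic).
regular = [0.0, 1.0] * 50
rng = random.Random(0)
irregular = [rng.random() for _ in range(200)]
print(sample_entropy(regular), sample_entropy(irregular))
```

The periodic series yields an entropy of zero because every template that matches at length m still matches at length m + 1, whereas the irregular series yields a clearly positive value. A drop in the entropy of a signal therefore indicates a more regular, less variable signal.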

This newly developed system has the following advantages:

(a) The developed software is repeatable and not prone to any inter/intra-observer variability.

(b) This diagnostic tool will eliminate the need for repeated tests to confirm DM, and thereby provide a more reliable and faster diagnosis.

(c) This method is highly effective in situations where a lot of data must be collected over long durations to understand and identify the abnormality.

(d) Our method performed better than the rest of the techniques reported in Table 3.

(e) The proposed system is robust (ten-fold stratified cross-validation) and reduces the burden on the clinicians.
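The ten-fold stratified cross-validation mentioned in item (e) can be sketched as below. Each class is shuffled and dealt out across the folds separately, so every fold keeps roughly the class proportions of the full data set. The function names and the seed are illustrative; the class sizes in the example (81 diabetic, 82 normal) are those implied by the confusion counts of Table 2 (TP + FN and TN + FP).

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds, class by class."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)   # deal indices out round-robin
    return folds

def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices), using each fold once as test set."""
    folds = stratified_folds(labels, k, seed)
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]

# 1 = diabetic, 0 = normal
labels_demo = [1] * 81 + [0] * 82
for train, test in train_test_splits(labels_demo):
    n_dm = sum(labels_demo[i] for i in test)
    print(len(train), len(test), n_dm, len(test) - n_dm)
```

Performance measures are then computed on each test fold and averaged over the ten folds.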


Fig. 4. Plot of accuracy versus number of features for the various ranking methods.

Fig. 5. Plot of average accuracy (%), sensitivity (%) and specificity (%) versus different folds of ten-fold cross-validation for DT classifier.


5. Conclusion

Diabetes is identified as one of the rapidly growing health concerns in rural and urban areas of developed and developing countries. Earlier intervention and continued treatment help to keep diabetes under control. In this work, we have provided a tutorial on how diabetes is associated with cardiovascular autonomic neuropathy, which affects HRV. Hence, we can detect diabetes by carrying out HRV spectral analysis. We have presented an automated DM detection system that uses DWT features (energy and entropy) extracted from the HRV signals. Using the presented method, we have obtained an accuracy, sensitivity and specificity of 92.02%, 92.59% and 91.46%, respectively, by using the DT classifier. The proposed method can be further extended to develop a CAD system which can assist clinicians in screening diabetic subjects.

References

[1] I.V. Aaron, E.M. Raelene, D.M. Braxton, R. Roy, Diabetic autonomic neuropathy, Diabetes Care 26 (2003) 1553–1579.

[2] U.R. Acharya, S. Vinitha Sree, C.A. Ang Peng, S.S. Jasjit, Use of principal component analysis for automatic classification of epileptic EEG activities in wavelet framework, Expert Syst. Appl. 39 (2012) 9072–9078.

[3] U.R. Acharya, O. Faust, S. Vinitha Sree, D.N. Ghista, S. Dua, P. Joseph, A.V.I. Thajudin, N. Janarthanan, T. Tamura, An integrated diabetic index using heart rate variability signal features for diagnosis of diabetes, Comput. Method Biomech. Biomed. Eng. 16 (2013) 222–234.

[4] U.R. Acharya, N. Kannathal, S.M. Krishna, Comprehensive analysis of cardiac health using heart rate signals, Physiol. Meas. 25 (2004) 1139–1151.

[5] U.R. Acharya, O. Faust, N.A. Kadri, J.S. Suri, W. Yu, Automated identification of normal and diabetes heart rate signals using nonlinear measures, Comput. Biol. Med. 43 (10) (2013) 1523–1529.

[6] U.R. Acharya, K.P. Joseph, N. Kannathal, M.L. Choo, J.S. Suri, Heart rate variability: a review, Med. Biol. Eng. Comput. 44 (2006) 1031–1051.

[7] American Diabetes Association (ADA) Fast Facts Data and statistics about diabetes, 2013.

[8] P.T. Ahamed Seyd, T.V.I. Ahamed, J. Jeevamma, P.K. Joseph, Time and frequency domain analysis of heart rate variability and their correlations in diabetes mellitus, World Acad. Sci., Eng. Technol. 2 (2008) 583–586.

[9] S. Akselrod, D. Gordon, J.B. Madwed, D.C. Snidman, R.J. Cohen, Hemodynamic regulation: investigation by spectral analysis, Am. J. Physiol. 249 (1985) 867–875.

[10] American Diabetes Association (ADA), Diagnosis and classification of diabetes mellitus, Diabetes Care, vol. 27, 2004.

[11] A. Awdah, A. Nabil, S. Ahmad, Q. Reem, A. Khidir, Time-domain analysis of heart rate variability in diabetic patients with and without autonomic neuropathy, Ann. Saudi Med. 22 (2002) 5–6.

[12] P. Aurelien, R. Manuel, A.J. Sophie, B. Claire De, Anre Denjean, Spectral analysis of heart rate variability interchangeability between autoregressive analysis and fast Fourier transform, J. Electrocardiol. 39 (2006) 31–37.

[13] J.F. Box, Guinness, Gosset, Fisher, and small samples, Statist. Sci. 2 (1987) 45–52.

[14] Roy Bhaskar, G. Sobhendu, Nonlinear methods to assess changes in heart rate variability in type 2 diabetic patients, Arq. Bras. Cardiol. (2013).

[15] S. Cerutti, A. Bianchi, B. Bontempi, G. Comi, Power spectrum analysis of heart rate variability signal in the diagnosis of diabetic neuropathy, Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society 1 (1989) 12–13.


[16] D. Chemla, J. Young, F. Badilini, P. Maison-Blanche, H. Affres, Y. Lecarpentier, P. Chanson, Comparison of fast Fourier transform and auto-regressive spectral analysis for the study of heart rate variability in diabetic patients, Int. J. Cardiol. 104 (3) (2005) 307–313.

[17] G. Donna, U.R. Acharya, J.M. Roshan, S. Vinitha Sree, T.C. Lim, A.V.I. Thajudin, J.S. Suri, Automated diagnosis of coronary artery disease affected patients using LDA, PCA, ICA and discrete wavelet transform, Knowledge Based Syst. 37 (2012) 274–282.

[18] M. Dash, H. Liu, Handling large unsupervised data via dimensionality reduction, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 1999.

[19] E. Osuna Edgar, F. Robert, G. Federico, Support vector machines: training and applications, technical report, MIT AI Lab. Centre for Biological and Computational Learning, March 1997.

[20] O. Faust, U.R. Acharya, F. Molinari, S. Chattopadhyay, T. Tamura, Linear and non-linear analysis of cardiac health in diabetic subjects, Biomed. Signal Process. Control 7 (3) (2012) 295–302.

[21] Federico Bellavere, Italo Balzani, Giovanni De Masi, Maurizio Carraro, Pasquale Carenza, Claudio Cobelli, Karl Thomaseth, Power spectral analysis of heart rate variations improves assessment of diabetic cardiac autonomic neuropathy, Diabetes 41 (1992) 633–640.

[22] A.C. Flynn, H.F. Jelinek, M. Smith, Heart rate variability analysis: a useful assessment tool for diabetes associated cardiac dysfunction in rural and remote areas, Aust. J. Rural Health 3 (2) (2005) 77–82.

[23] F.J. Herbert, M.I. Hasan, A.A. Hayder, H.K. Ahsan, Association of cardiovascular risk using nonlinear heart rate variability measures with the Framingham risk score in a rural population, Front. Physiol., Comput. Physiol. Med. (2013) 4.

[24] J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann, Waltham, MA, 2005.

[25] Chu Duc Hoang Chu, Phan Kien Nguyen, Viet Dung Nguyen, A review of heart rate variability and its applications, APCBEE Proc. 7 (2013) 80–85.

[26] International Diabetes Federation Diabetes Atlas, sixth ed., 2013.

[27] Constant Isabelle, Dominique Laude, Isabelle Murat, Jean-Luc Elghozi, Pulse rate variability is not a surrogate for heart rate variability, Clin. Sci. 97 (1999) 391–397.

[28] Wei Jian Lee, Cheng Lim Teik, Automated detection of diabetes by means of higher order spectral features obtained from heart rate signals, J. Med. Imaging Health Inform. 3 (2013) 440–447.

[29] T. Kailath, The divergence and Bhattacharyya distance measures in signal selection, IEEE Trans. Commun. Technol. 15 (1) (1967) 52–60.

[30] G. Kheder, A. Kachouri, R. Taleb, M.M. Ben, M. Samet, Feature extraction by wavelet transforms to analyse the heart rate variability during two meditation technique, 6th WSEAS International Conference on Circuits, Systems, Electronics, Control and Signal Processing, 2007.

[31] M. Kirvela, K. Salmela, L. Toivonen, A.M. Koivusalo, L. Lindgren, Heart rate variability in diabetic and non-diabetic renal transplant patients, Acta Anaesthesiol. Scand. 40 (7) (1996) 804–808.

[32] D.T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, KNN, Wiley Interscience, New Jersey, USA, 2004, pp. 90–106 (Chapter 5).

[33] D.T. Larose, Decision trees, Chapter 6 in Discovering Knowledge in Data: An Introduction to Data Mining, Wiley Interscience, Hoboken, N, 2004, pp. 108–126.

[34] A. Malliani, F. Lombardi, M. Pagani, Power spectrum analysis of heart rate variability: a tool to explore neural regulatory mechanisms, Brit. Heart J. 71 (1994) 1–2.

[35] A. Metin, Nonlinear Biomedical Signal Processing. Fuzzy Logic, Neural Networks, and New Algorithms, vol. 1, IEEE Press, Fuzzy logic, 2000.

[36] F. Molinari, U.R. Acharya, R.J. Martis, R. De Luca, G. Petraroli, W. Liboni, Entropy analysis of muscular near-infrared spectroscopy (NIRS) signals during exercise programme of type 2 diabetic patients: quantitative assessment of muscle metabolic pattern, Comp. Method Prog. Biomed. 112 (2013) 518–528.

[37] Karim Nasim, Hasan Jahan Ara, Ali Syed Sanowar, Heart rate variability – a review, J. Basic Appl. Sci. 7 (2011) 71–77.

[38] R.P. Nolan, S.M. Barry-Bianchi, A.E. Mechetiuc, M.H. Chen, Sex-based differences in the association between duration of type 2 diabetes, Diab. Vasc. Dis. Res. 6 (2009) 276–282.

[39] N.A. Obuchowski, Receiver operating characteristic curves and their use in radiology, Radiology 229 (2003) 3–8.

[40] J. Pan, W.J. Tompkins, A real time QRS detection algorithm, IEEE Trans. Biomed. Eng. 32 (3) (1985) 230–236.

[41] M.A. Pfeifer, D. Cook, J. Brodsky, D. Tice, A. Reenan, S. Swedine, J.B. Halter, D. Porte, Quantitative evaluation of cardiac parasympathetic activity in normal and diabetic man, Diabetes 31 (4) (1982) 339–345.

[42] R.B. Pachori, P. Avinash, K. Shashank, R. Sharma, U.R. Acharya, Application of empirical mode decomposition for analysis of normal and diabetic RR-interval signals, Expert Systems with Applications, 2015 (in press).

[43] S.M. Pincus, Approximate entropy as a measure of system complexity, Proc. Nat. Acad. Sci. 88 (1991) 2297–2301.

[44] S.M. Pincus, I.M. Gladstone, A.E. Richard, A regularity statistic for medical data analysis, J. Clin. Monit. (1991) 7.

[45] S.M. Pincus, D.L. Keefe, Quantification of hormone pulsatility via an approximate entropy algorithm, Am. J. Physiol. 262 (1992) E741–E754.

[46] Shi Ping, Hu Sijung, Z. Yisheng, A preliminary attempt to understand compatibility of photoplethysmographic pulse rate variability with electrocardiogramic heart rate variability, J. Med. Biol. Eng. 28 (2008) 173–180.

[47] W. Sarah, S. Richard, R. Gojka, G. Anders, K. Hilary, Global prevalence of diabetes estimates for the year 2000 and projections for 2030, Diabetes Care 27 (2004) 1047–1053.

[48] Sarika Tale, T.R. Sontakke, Time-frequency analysis of heart rate variability signal in prognosis of type 2 diabetic autonomic neuropathy, 2011. International Conference on Biomedical Engineering and Technology, vol. 11, 2011.

[49] E.B. Schroeder, L.E. Chambless, D. Liao, R.J. Prineas, J.W. Evans, W.D. Rosamond, G. Heiss, Diabetes, glucose, insulin, and heart rate variability: the atherosclerosis risk in communities (ARIC) study, Diabetes Care 28 (3) (2005) 668–674.

[50] A. Schumacher, Linear and nonlinear approaches to the analysis of RR interval variability, Biol. Res. Nursing 5 (2004) 211–221.

[51] B. Pomeranz, R.J.B. Macaulay, M.A. Caudill, I. Kutz, D. Adam, K.M. Kilborn, A.C. Barger, D.C. Shannon, R.J. Cohen, H. Benson, Assessment of autonomic function in humans by heart rate spectral analysis, Am. J. Physiol. 248 (1985) 151–153.

[52] J.P. Singh, M.G. Larson, C.J. O’Donnell, P.F. Wilson, H. Tsuji, D.M. Lloyd-Jones, D. Levy, Association of hyperglycemia with reduced heart rate variability (the Framingham heart study), Am. J. Cardiol. 86 (3) (2000) 309–312.

[53] J.A.K. Suykens, J. Vandewalle, Least square support vector machine classifiers, Neural Process. Lett. 9 (1999) 293–300.

[54] G. Swapna, U.R. Acharya, V.S. Sree, J.S. Suri, Automated diagnosis of diabetes using higher order spectra features extracted from heart rate signals, Intell. Data Anal. 17 (2) (2013) 309–326.

[55] M. Ratnakar, K.S. Sunil, J. Nitisha, Signal filtering using discrete wavelet transform, Int. J. Recent Trends Eng. (2009) 2.

[56] J.M. Roshan, U.R. Acharya, C.M. Lim, ECG beat classification using PCA, LDA, ICA, and discrete wavelet transform, Biomed. Signal Process. Control 8 (2013) 437–448.

[57] J.S. Richman, J.R. Moorman, Physiological time-series analysis using approximate entropy and sample entropy, Am. J. Physiol. Heart Circ. Physiol. 278 (2000) 2039–2049.

[58] Z. Trunkvalterova, M. Javorka, I. Tonhajzerova, J. Javorkova, Z. Lazarova, K. Javorka, M. Baumert, Reduced short-term complexity of heart rate and blood pressure dynamics in patients with diabetes mellitus type 1: multiscale entropy analysis, Physiol. Meas. 29 (7) (2008) 817–828.

[59] K. Watanabe, T. Miyamoto, Y. Tanaka, K. Fukuda, T. Moritani, Type 2 diabetes mellitus patients manifest characteristic spatial EMG potential distribution pattern during sustained isometric contraction, Diab. Res. Clin. Pract. 97 (3) (2012) 468–473.

[60] WHO Consultation: definition and diagnosis of diabetes mellitus and intermediate hyperglycemia, 2006.

[61] F. Wilcoxon, Individual comparisons by ranking methods, Biometric Bull. 1 (1945) 80–83.