[ieee 2013 fifth international conference on computational intelligence, communication systems and...

Premature Ventricular Contraction Arrhythmia Detection and Classification with Gaussian Process and S Transform

Yacoub Bazi, Haikel Hichri, Naif Alajlan, Nassim Ammour

Advanced Lab for Intelligent Systems Research (ALISR)

College of Computer and Information Sciences, King Saud University Riyadh, Saudi Arabia, 11543

Abstract—This paper presents an efficient Bayesian classification system based on Gaussian process classifiers (GPC) for detecting premature ventricular contraction (PVC) beats in electrocardiographic (ECG) signals. Compared with SVM classifiers, GPCs have the advantage that their kernel parameters are selected automatically by a Bayesian estimation procedure based on the Laplace approximation. We also propose to feed the classifier with different representations of the ECG signals based on morphology, the discrete wavelet transform, and the S-transform. To our knowledge, the latter representation has not been used for ECG signals before. The experimental results obtained on the 48 records (i.e., 109887 heartbeats) of the MIT-BIH arrhythmia database show that the GP classifier outperforms the SVM in overall accuracy for all the feature representations adopted in this work, and that, when combined with the S-transform and trained with only 600 beats from the PVC and Non-PVC classes, it achieves an overall accuracy and a sensitivity above 96% on the whole 48 recordings.

Keywords- ECG; Gaussian process classification; S-transform; PVC arrhythmia detection;

I. INTRODUCTION

The automatic detection and classification of ECG arrhythmias such as premature ventricular contraction (PVC) is essential for the treatment of patients with heart disease. In this context, many classification systems have been proposed in the literature [1]-[4]. Among these methods, the support vector machine (SVM) classifier has been shown to be a promising tool for classifying ECG signals [1],[4]. Results obtained on different records from the MIT-BIH arrhythmia database reveal that it achieves better classification accuracy than other state-of-the-art methods such as the k-nearest neighbor classifier and artificial neural networks. In addition, it has been shown to be less sensitive to the curse of dimensionality (i.e., the Hughes effect) than traditional classifiers. This is mainly due to the maximal margin principle on which it is based [5], which avoids the explicit estimation of the statistical distribution of the classes in the high-dimensional feature space in which the classification task is carried out.

The aforementioned properties strongly motivate the exploration of other kernel-based methods that have been little or not yet investigated for the classification of ECG signals. In this context, Gaussian process classifiers (GPC) [6]-[8] appear to be a potentially interesting solution. Compared with SVM, they have the advantage of providing probabilistic outputs rather than discriminant function values. They are statistical classifiers that permit a fully Bayesian treatment of the considered classification problem, and they have gained prominence in recent years as a powerful theoretical framework for Bayesian classification. In particular, they can use the evidence (marginal likelihood) to solve the model selection problem in a completely automatic way. The main idea of GPCs is to assume that the probability that an input sample belongs to a given class is monotonically related to the value of some latent function at that sample. This monotonic relationship is defined by a so-called squashing function. A Gaussian process prior, characterized by a covariance matrix embedding a set of hyperparameters, is placed on the latent function. Inference is made by integrating over the latent function. Since this integral is analytically intractable, solutions based on Monte Carlo sampling or on analytical approximations are adopted. Two key analytical approximation algorithms are the Laplace approximation and expectation propagation (EP).

In this paper, we investigate the capabilities of GPC for the automatic detection of premature ventricular contractions (PVCs) in ECG signals. In particular, we feed this classifier with different kinds of ECG signal representations (or features): 1) the standard temporal signal morphology, 2) the discrete wavelet transform, and 3) the S-transform. The aim is to find which feature representation, combined with GPC, is the most accurate for discriminating PVC from Non-PVC beats. The experimental results obtained on 48 records of the MIT-BIH arrhythmia database confirm the effectiveness of the proposed classification system.

II. ECG SIGNAL PREPROCESSING

Before feature extraction, the ECG signal must be preprocessed, as it is often contaminated by different kinds of noise such as powerline interference, baseline wander, and electrode motion artifacts [16],[17]. First, noise can be removed with a bandpass filter [18]. Then, a QRS detection step is applied to locate the Q, R, and S points, which are present in every heartbeat. The filtered signal is then segmented into individual heartbeats (e.g., by R–R or P–P segmentation), as illustrated in Fig. 1. Here, we only need to obtain a complete heartbeat, and the onset of the segmentation is not important, as the proposed feature is invariant to positional shifts of the different parts of the 1-D signal. A final step is to normalize the heartbeat vectors, i.e., to resample them so that they all have the same number of samples.

The resulting set of normalized ECG vectors can be passed to a classifier directly. We call these ECG vectors temporal signal morphology feature vectors.

2013 Fifth International Conference on Computational Intelligence, Communication Systems and Networks

978-0-7695-5042-8/13 $26.00 © 2013 IEEE

DOI 10.1109/CICSYN.2013.44

Figure 1. Two segmented heartbeats with different sampling rates: (a) R–R segmentation; (b) P–P segmentation. The sampling rates are 1000 Hz and 360 Hz, respectively.
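The R–R segmentation and length-normalization steps of this section can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the R-peak sample indices are already available from a QRS detector (such as the Ecgpuwave tool used in Section V) and that linear interpolation is used for resampling. The function names are hypothetical, the 300-sample target length is taken from Section V, and the bandpass filtering step is not shown.

```python
import numpy as np


def segment_rr(signal, r_peaks):
    """Cut a 1-D ECG signal into beats that span consecutive R peaks (R-R segmentation)."""
    return [signal[r_peaks[i]:r_peaks[i + 1]] for i in range(len(r_peaks) - 1)]


def resample_beat(beat, n_samples=300):
    """Linearly resample one beat onto n_samples uniformly spaced points."""
    old_t = np.linspace(0.0, 1.0, num=len(beat))
    new_t = np.linspace(0.0, 1.0, num=n_samples)
    return np.interp(new_t, old_t, beat)


def normalize_beats(signal, r_peaks, n_samples=300):
    """Return an (n_beats, n_samples) matrix of length-normalized heartbeats."""
    beats = segment_rr(np.asarray(signal, dtype=float), r_peaks)
    return np.vstack([resample_beat(b, n_samples) for b in beats])


# Synthetic signal sampled at 360 Hz with three assumed R-peak positions.
t = np.arange(1000) / 360.0
ecg = np.sin(2 * np.pi * 1.2 * t)
beats = normalize_beats(ecg, r_peaks=[10, 370, 700])
print(beats.shape)  # (2, 300)
```

Each row of the resulting matrix is one temporal signal morphology feature vector. The first and last samples of every beat are preserved exactly by the interpolation.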

III. FEATURE EXTRACTION METHODS

Feature extraction is an important step that transforms the raw ECG signal (after the preprocessing step described in the previous section) into another domain where the representation of the heartbeats is more invariant. In this study, we considered different feature vectors characterizing the heartbeats: the standard temporal signal morphology (Morpho) [1], the discrete wavelet transform (DWT) [10], and the S-transform (ST) [12]. Brief descriptions of each method are given in the following subsections.

Finally, to each of the ECG feature representations we added three temporal features: 1) the QRS complex duration, 2) the RR interval (i.e., the time span between two consecutive R points, representing the distance between the QRS peaks of the present and previous beats), and 3) the RR interval averaged over the last ten beats.
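The three temporal features can be computed from beat annotations as in the sketch below. It assumes that R-peak times and QRS onset/offset times (in seconds) are supplied by a wave delineator. The function name, and the choice to average over fewer than ten intervals for the first beats of a record, are assumptions for illustration.

```python
import numpy as np


def temporal_features(r_times, qrs_onsets, qrs_offsets, n_avg=10):
    """Per-beat [QRS duration, RR interval, mean RR over the last n_avg beats].

    Times are in seconds. The first beat is skipped because it has no previous R peak.
    """
    r = np.asarray(r_times, dtype=float)
    qrs = np.asarray(qrs_offsets, dtype=float) - np.asarray(qrs_onsets, dtype=float)
    rr = np.diff(r)  # rr[i - 1] = r[i] - r[i - 1], the RR interval of beat i
    rows = []
    for i in range(1, len(r)):
        recent = rr[max(0, i - n_avg):i]  # up to n_avg RR intervals ending at beat i
        rows.append([qrs[i], rr[i - 1], recent.mean()])
    return np.array(rows)


feats = temporal_features(
    r_times=[0.0, 0.8, 1.6, 2.5],
    qrs_onsets=[-0.05, 0.75, 1.55, 2.45],
    qrs_offsets=[0.05, 0.85, 1.65, 2.55],
)
print(feats.round(3))
```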

A. Temporal signal morphology

The temporal signal morphology feature vector is just the preprocessed heart beat signal as described in Section II.

B. Discrete wavelet transform

The wavelet transform can be used to transform any function $f(t)$ to another domain by representing the function as a linear combination of a predefined set of basis functions $\psi_k(t)$:

$$ f(t) = \sum_{k} \alpha_k \, \psi_k(t) \tag{1} $$

Functionally, this is very much like the Fourier transform (FT), where a function is expressed as a linear combination of sinusoidal basis functions. Whereas the basis functions of the Fourier transform are sinusoids, the wavelet basis is generated from a function defined by a recursive difference equation:

$$ \phi(x) = \sum_{k=0}^{M-1} c_k \, \phi(2x - k) \tag{2} $$

where the range of the summation is determined by the specified number of nonzero coefficients M. The number of nonzero coefficients is arbitrary and is referred to as the order of the wavelet. The values of the coefficients, however, are not arbitrary: they are determined by constraints of orthogonality and normalization.
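The two constraints on the coefficients of (2) can be made concrete. Integrating both sides of (2) gives $\sum_k c_k = 2$ (normalization), and requiring the integer shifts of $\phi$ to be orthonormal gives $\sum_k c_k c_{k+2m} = 2\delta_{m0}$ (orthogonality). The sketch below checks both conditions for the Haar and Daubechies-4 coefficients, written in their standard closed forms under this normalization. The helper name is hypothetical.

```python
import numpy as np

SQRT3 = np.sqrt(3.0)

# Coefficients c_k of eq. (2), normalized so that sum(c_k) = 2.
HAAR = np.array([1.0, 1.0])
D4 = np.array([1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3]) / 4.0


def satisfies_constraints(c):
    """Check sum_k c_k = 2 and sum_k c_k c_{k+2m} = 2 if m == 0 else 0."""
    c = np.asarray(c, dtype=float)
    if not np.isclose(c.sum(), 2.0):
        return False
    for m in range(len(c) // 2):
        overlap = np.dot(c[: len(c) - 2 * m], c[2 * m:])
        if not np.isclose(overlap, 2.0 if m == 0 else 0.0):
            return False
    return True


print(satisfies_constraints(HAAR), satisfies_constraints(D4))  # True True
print(satisfies_constraints([1.0, 1.0, 1.0, -1.0]))            # False
```

The last example satisfies the normalization but not the orthogonality condition, which illustrates why the coefficient values cannot be chosen freely once M is fixed.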

Two examples of wavelet functions are shown in Fig. 2: the first is the Haar basis function (the simplest wavelet function), and the second is the Daubechies-4 wavelet, which is very commonly used in signal processing applications. Both are named after pioneers of wavelet theory [13],[14].

A wavelet is an orthogonal basis function that can be applied to a finite group of data. Because the wavelet basis is orthogonal, the inverse transform is simply the transpose of the forward transform, so a signal that is transformed and then inverse-transformed is recovered unchanged.

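The discrete wavelet transform (DWT), or "pyramid" algorithm [15], is a computationally efficient method of implementing the wavelet transform. It operates on a finite set of N input samples, where N is a power of two (if not, the data can be padded with zeros). The data are passed through two convolution functions, each of which creates an output stream that is half the length of the original input. These convolution functions are filters: one half of the output is produced by the "low-pass" filter function

$$ a_i = \frac{1}{\sqrt{2}} \sum_{k=0}^{M-1} c_k \, f_{(2i+k) \bmod N}, \qquad i = 0, \dots, N/2 - 1, $$

and the other half is produced by the "high-pass" filter function

$$ b_i = \frac{1}{\sqrt{2}} \sum_{k=0}^{M-1} (-1)^k \, c_{M-1-k} \, f_{(2i+k) \bmod N}, \qquad i = 0, \dots, N/2 - 1, $$

where N is the input size, $c_k$ are the filter coefficients of (2), $f$ is the input function (extended periodically at the borders), and $a$ and $b$ are the output functions. Applying the same step recursively to the low-pass output $a$ yields the successive scales of the decomposition.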
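The pyramid step described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation. It assumes periodic extension at the borders and the orthonormal $1/\sqrt{2}$ scaling, uses the Daubechies-4 coefficients in the normalization of (2), and recursively applies the step to the low-pass output up to the fourth scale (the depth used in Section V). The helper names are hypothetical.

```python
import numpy as np

SQRT3 = np.sqrt(3.0)
C_D4 = np.array([1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3]) / 4.0  # sum(c) = 2


def _filters(c):
    h = np.asarray(c, dtype=float) / np.sqrt(2.0)   # low-pass: c_k / sqrt(2)
    g = ((-1.0) ** np.arange(len(h))) * h[::-1]      # high-pass: (-1)^k c_{M-1-k} / sqrt(2)
    return h, g


def _windows(n, m):
    """Indices (2i + k) mod n for i = 0..n/2-1 and k = 0..m-1 (periodic extension)."""
    return (2 * np.arange(n // 2)[:, None] + np.arange(m)[None, :]) % n


def dwt_step(f, c=C_D4):
    """One pyramid step: returns the low-pass output a and the high-pass output b."""
    f = np.asarray(f, dtype=float)
    h, g = _filters(c)
    windows = f[_windows(len(f), len(h))]
    return windows @ h, windows @ g


def idwt_step(a, b, c=C_D4):
    """Inverse step: apply the transpose of the orthogonal analysis operator."""
    h, g = _filters(c)
    n = 2 * len(a)
    f = np.zeros(n)
    np.add.at(f, _windows(n, len(h)), np.outer(a, h) + np.outer(b, g))
    return f


def pad_pow2(f):
    """Zero-pad to the next power of two, as suggested in the text."""
    f = np.asarray(f, dtype=float)
    n = 1 << int(np.ceil(np.log2(len(f))))
    return np.pad(f, (0, n - len(f)))


def dwt(f, levels=4, c=C_D4):
    """Multi-level DWT: the final approximation and the detail output of each scale."""
    a, details = np.asarray(f, dtype=float), []
    for _ in range(levels):
        a, d = dwt_step(a, c)
        details.append(d)
    return a, details


beat = pad_pow2(np.random.default_rng(0).standard_normal(300))  # 300 -> 512 samples
approx, details = dwt(beat, levels=4)
print(len(approx), [len(d) for d in details])  # 32 [256, 128, 64, 32]
```

Naming conventions for this filter differ between sources: in PyWavelets, the 4-coefficient Daubechies filter is called `db2`, while `db4` has eight coefficients, so "Daubechies of order 4" can refer to either filter depending on the convention.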

Fig. 2. Sample wavelet basis functions: (a) Haar wavelet; (b) Daubechies-4 wavelet.

C. S-transform

Unlike the DWT, the ST has not yet been explored in the context of ECG signal classification. It performs a multi-resolution analysis of a time-varying signal, since its window width varies inversely with frequency. The output of the ST applied to a heartbeat is a matrix called the S-matrix, whose rows pertain to frequency and whose columns pertain to time. The S-transform of a signal $h(t)$ is given by

$$ S(\tau, f) = \int_{-\infty}^{\infty} h(t) \, w(\tau - t, f) \, e^{-i 2\pi f t} \, dt \tag{3} $$

where the window function is given by

$$ w(t, f) = \frac{|f|}{\sqrt{2\pi}} \, e^{-t^2 f^2 / 2} \tag{4} $$

We notice that the width of the window is the reciprocal of the absolute frequency $|f|$. In addition, the ST differs from the wavelet transform in that the phase of its kernel does not translate with time: the window is translated while the modulating sinusoid $e^{-i 2\pi f t}$ remains fixed with respect to the time axis.
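A common way to compute a discrete version of (3) and (4) is the frequency-domain algorithm of Stockwell et al. [12]. For each frequency index n > 0, the spectrum shifted by n is multiplied by the Fourier transform of the Gaussian window, $\exp(-2\pi^2 m^2 / n^2)$, and then inverse-transformed. The sketch below is our illustration. It assumes NumPy's FFT normalization, keeps only the non-negative frequencies (rows 0 to N/2), and uses hypothetical function names. The row-wise maximum reproduces the ST feature described in Section V.

```python
import numpy as np


def s_transform(h):
    """Discrete S-transform via the FFT (frequency-domain form of eqs. (3)-(4)).

    Returns an (N//2 + 1, N) complex S-matrix: rows are frequency indices 0..N//2,
    columns are time samples.
    """
    h = np.asarray(h, dtype=float)
    n_samples = len(h)
    H = np.fft.fft(h)
    m = np.fft.fftfreq(n_samples) * n_samples  # integer frequency offsets, wrapped around 0
    S = np.empty((n_samples // 2 + 1, n_samples), dtype=complex)
    S[0, :] = h.mean()  # the zero-frequency row is defined as the signal mean
    for n in range(1, n_samples // 2 + 1):
        gaussian = np.exp(-2.0 * np.pi**2 * m**2 / n**2)  # Fourier transform of window (4)
        S[n, :] = np.fft.ifft(np.roll(H, -n) * gaussian)   # np.roll(H, -n)[m] = H[m + n]
    return S


def st_features(beat):
    """Maximum amplitude of each row of the S-matrix."""
    return np.abs(s_transform(beat)).max(axis=1)


k = np.arange(64)
beat = np.cos(2 * np.pi * 8 * k / 64)  # a pure tone at frequency index 8
feats = st_features(beat)
print(feats.shape, int(np.argmax(feats)))  # (33,) 8
```

A convenient sanity check of the normalization is that summing each row of the S-matrix over time returns the corresponding Fourier coefficient of the signal.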

IV. GAUSSIAN PROCESS CLASSIFIERS

Let us consider a training set $D = (X, y)$ consisting of a matrix of PVC and Non-PVC training data $X = [x_1, x_2, \dots, x_N]^T$ accompanied by a target vector $y = [y_1, y_2, \dots, y_N]^T$, where N is the number of heartbeats. Each vector $x_i \in \mathbb{R}^d$ is associated with a target $y_i \in \{-1, +1\}$. Given the training set D, we aim to predict the label of a new sample $x_*$ (whose true label is unknown) by computing the class posterior probability $p(y_* \mid D, x_*)$.

In GPC, the probability that an input sample $x_i$ belongs to the class $y_i = +1$ is monotonically related to the value $f_i$ of some latent function $f$ at that sample. This monotonic relationship is defined by a squashing function, which can take several forms, such as the logistic and probit functions:

$$ p(y_i \mid f_i) = \begin{cases} \dfrac{1}{1 + \exp(-y_i f_i)} & \text{logistic function} \\[6pt] \Phi(y_i f_i) & \text{probit function} \end{cases} \tag{5} $$

where $\Phi$ is the Gaussian cumulative distribution function, i.e., $\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) dx$.
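Eq. (5) translates directly into code. The sketch below uses our own function names and the identity $\Phi(z) = \frac{1}{2}[1 + \mathrm{erf}(z/\sqrt{2})]$. Writing the likelihood in terms of $y_i f_i$ encodes the symmetry $p(y_i = -1 \mid f_i) = 1 - p(y_i = +1 \mid f_i)$ for both squashing functions.

```python
import math


def logistic(z):
    """Logistic squashing function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def probit(z):
    """Standard Gaussian CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def likelihood(y, f, squash=logistic):
    """p(y | f) of eq. (5) for a label y in {-1, +1} and a latent value f."""
    return squash(y * f)


print(likelihood(+1, 2.0), likelihood(-1, 2.0))  # the two values sum to 1
print(likelihood(+1, 2.0, probit))
```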

The output probability for the sample $x_*$ is predicted by integrating over the latent function value $f_*$:

$$ p(y_* = +1 \mid D, x_*) = \int p(y_* = +1 \mid f_*) \, p(f_* \mid D, x_*) \, df_* \tag{6} $$
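When $p(f_* \mid D, x_*)$ is approximated by a Gaussian $\mathcal{N}(\mu, \sigma^2)$ and the probit likelihood is used, (6) has the closed form $p(y_* = +1 \mid D, x_*) = \Phi(\mu / \sqrt{1 + \sigma^2})$ [6]. For the logistic likelihood there is no closed form, so the integral must be approximated. The sketch below compares the closed form with a direct numerical evaluation of (6). The values of $\mu$ and $\sigma^2$ are illustrative, not taken from the paper, and the function names are hypothetical.

```python
import math

import numpy as np


def probit(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predictive_probit(mu, var):
    """Closed form of eq. (6) for a probit likelihood and p(f*|D, x*) = N(mu, var)."""
    return probit(mu / math.sqrt(1.0 + var))


def predictive_numeric(mu, var, squash, n_grid=20001):
    """Evaluate eq. (6) by a Riemann sum over a grid covering mu +/- 10 standard deviations."""
    sd = math.sqrt(var)
    f = np.linspace(mu - 10.0 * sd, mu + 10.0 * sd, n_grid)
    density = np.exp(-0.5 * ((f - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
    values = np.array([squash(v) for v in f])
    return float(np.sum(values * density) * (f[1] - f[0]))


mu, var = 0.7, 2.0
print(predictive_probit(mu, var), predictive_numeric(mu, var, probit), probit(mu))
```

The predictive probability is pulled toward 0.5 relative to $\Phi(\mu)$, which reflects the uncertainty about $f_*$. The numerical version works for any squashing function, including the logistic one.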

The second term of the integral (6) represents the distribution of the latent variable corresponding to the sample $x_*$. It is obtained by marginalization over $f = [f_1, f_2, \dots, f_N]$:

$$ p(f_* \mid D, x_*) = \int p(f_* \mid X, x_*, f) \, p(f \mid D) \, df \tag{7} $$

where $p(f \mid D)$ is the posterior over the latent variables, which can be rewritten using Bayes' rule as

$$ p(f \mid D) = \frac{p(y \mid f) \, p(f \mid X)}{p(y \mid X)} = \frac{\prod_{i=1}^{N} p(y_i \mid f_i)}{p(y \mid X)} \, p(f \mid X) \tag{8} $$

Here, $p(y \mid f)$ is the likelihood function, which can be expressed using one of the forms of the squashing function given in (5); $p(y \mid X)$ is the marginal likelihood; and $p(f \mid X)$ is the GP prior over the latent variables. The GP prior is typically characterized by a zero mean and a covariance matrix $K$ embedding a set of hyperparameters, i.e.,

$$ p(f \mid X) = \frac{1}{(2\pi)^{N/2} \, |K|^{1/2}} \exp\left(-\frac{1}{2} f^T K^{-1} f\right) \tag{9} $$

where each entry of the covariance matrix $K$ is a function of $x_i$ and $x_j$, i.e., $K_{ij} = k(x_i, x_j)$. By analogy, $K$ can be seen as the Gram matrix in SVM. The covariance function encapsulates the prior knowledge about the smoothness of the latent function. In this paper, we consider two types of covariance functions. The first is the squared exponential (or Gaussian radial basis function, RBF), which takes the following form:

$$ k(x_p, x_q) = \theta_0 \exp\left(-\frac{\|x_p - x_q\|^2}{2 l^2}\right) \tag{10} $$

where $\theta_0$ is the process variance, which controls the overall vertical scale of variation of the latent function, and $l$ denotes the length scale, which governs the smoothness of the latent function. The second is the squared exponential function with automatic relevance determination (RBF-ARD):

$$ k(x_p, x_q) = \theta_0 \exp\left(-\sum_{m=1}^{d} \frac{(x_p^m - x_q^m)^2}{2 l_m^2}\right) \tag{11} $$

where $l_m$ denotes the length scale of the $m$th feature dimension and $x_p^m$ is the $m$th component of $x_p$. A large value of this parameter indicates that the corresponding feature is irrelevant for classification. The implicit form of feature selection embedded in the covariance function in this case is termed feature weighting. The hyperparameter vectors of the two covariance functions are $\theta = [l, \theta_0]$ and $\theta = [l_1, l_2, \dots, l_d, \theta_0]$, respectively.
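Eqs. (9)-(11) can be evaluated as sketched below (NumPy; the function names are ours). Setting all $l_m$ equal recovers (10) from (11), and a very large $l_m$ makes the kernel insensitive to feature $m$, which is the feature-weighting effect described above. The log-prior uses a Cholesky factor $K = LL^T$ and the identity $\log|K| = 2\sum_i \log L_{ii}$. The small jitter added to the diagonal is our assumption for numerical safety.

```python
import numpy as np


def rbf_kernel(Xp, Xq, length, theta0):
    """Eq. (10): theta0 * exp(-||x_p - x_q||^2 / (2 l^2)) for all pairs of rows."""
    sq_dist = np.sum((Xp[:, None, :] - Xq[None, :, :]) ** 2, axis=-1)
    return theta0 * np.exp(-sq_dist / (2.0 * length**2))


def ard_kernel(Xp, Xq, lengths, theta0):
    """Eq. (11): one length scale l_m per feature dimension m."""
    scaled = (Xp[:, None, :] - Xq[None, :, :]) / np.asarray(lengths, dtype=float)
    return theta0 * np.exp(-0.5 * np.sum(scaled**2, axis=-1))


def log_gp_prior(f, K, jitter=1e-10):
    """log of eq. (9), computed through a Cholesky factor K = L L^T."""
    n = len(f)
    L = np.linalg.cholesky(K + jitter * np.eye(n))
    alpha = np.linalg.solve(L, f)  # alpha^T alpha = f^T K^{-1} f
    return -0.5 * alpha @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2.0 * np.pi)


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
K = ard_kernel(X, X, lengths=[1.0, 1.0, 1e6], theta0=1.0)  # third feature effectively ignored
print(log_gp_prior(rng.standard_normal(5), K))
```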

Since the integrals in (6) and (7) are not analytically tractable because of the non-Gaussian nature of the likelihood terms, analytical approximation or Monte Carlo methods have to be adopted. Monte Carlo methods are nondeterministic methods that compute the integral numerically by making use of the law of large numbers. However, they are particularly time-consuming because of the typically huge number of samples they require to perform the numerical integration. In this work, we consider an attractive analytical alternative, the Laplace approximation. It substitutes the non-Gaussian posterior in the integral (7) with a Gaussian approximation $q(f \mid D)$ derived from a second-order Taylor expansion of $\log p(f \mid D)$ around the maximum of the posterior, i.e., $q(f \mid D) = \mathcal{N}(f \mid \hat{f}, (K^{-1} + W)^{-1})$, where $\hat{f} = \arg\max_f p(f \mid D)$ and $W = -\nabla\nabla \log p(y \mid \hat{f})$ is diagonal. The hyperparameters $\theta$ are then selected by maximizing the resulting approximation of the marginal likelihood $p(y \mid X)$.
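The Laplace approximation for binary GPC can be sketched with a Newton iteration for $\hat{f}$ followed by the Gaussian predictive equations, following the structure of Algorithms 3.1 and 3.2 in Rasmussen and Williams [6], here for the logistic likelihood. This is an illustration, not the authors' code. The hyperparameters are held fixed (the marginal-likelihood optimization is omitted), and the predictive integral (6) is approximated with MacKay's probit-style correction $\kappa = (1 + \pi\sigma^2/8)^{-1/2}$ rather than computed exactly. The toy data, kernel settings, and function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 0.5 * (1.0 + np.tanh(0.5 * z))  # logistic function, numerically stable


def rbf(A, B, length=1.0, theta0=1.0):
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return theta0 * np.exp(-d2 / (2.0 * length**2))


def laplace_mode(K, y, max_iter=100, tol=1e-9):
    """Newton iterations for the posterior mode f_hat (logistic likelihood, labels in {-1,+1})."""
    n = len(y)
    t = (y + 1) / 2.0  # map labels to {0, 1}
    f = np.zeros(n)
    for _ in range(max_iter):
        pi = sigmoid(f)
        W = pi * (1.0 - pi)                     # -d^2 log p(y|f) / df_i^2 (diagonal)
        sW = np.sqrt(W)
        L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
        b = W * f + (t - pi)                    # W f + gradient of log p(y|f)
        a = b - sW * np.linalg.solve(L.T, np.linalg.solve(L, sW * (K @ b)))
        f_new = K @ a                           # Newton update: f = (K^-1 + W)^-1 b
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return f


def laplace_predict(K, y, K_star, k_star_diag, f_hat):
    """Approximate p(y* = +1 | D, x*) for each test column of K_star (shape n x m)."""
    t = (y + 1) / 2.0
    pi = sigmoid(f_hat)
    sW = np.sqrt(pi * (1.0 - pi))
    L = np.linalg.cholesky(np.eye(len(y)) + sW[:, None] * K * sW[None, :])
    mean = K_star.T @ (t - pi)                      # latent predictive mean
    v = np.linalg.solve(L, sW[:, None] * K_star)
    var = k_star_diag - np.sum(v * v, axis=0)       # latent predictive variance
    kappa = 1.0 / np.sqrt(1.0 + np.pi * var / 8.0)  # approximation of the integral (6)
    return sigmoid(kappa * mean)


X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
K = rbf(X, X)
f_hat = laplace_mode(K, y)
X_test = np.array([[-2.0], [0.0], [2.0]])
p = laplace_predict(K, y, rbf(X, X_test), np.ones(len(X_test)), f_hat)
print(p.round(3))
```

Working with $B = I + W^{1/2} K W^{1/2}$ rather than inverting $K$ directly is the numerically stable formulation recommended in [6], since $B$ is well conditioned even when $K$ is nearly singular.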

V. EXPERIMENTAL RESULTS

A. Dataset Description and Experiment Settings

In the experiments, the 48 ECG recordings of the MIT-BIH arrhythmia database were used [9]. The heartbeats of these recordings were subdivided into two groups, one for training and the other for testing.

To extract the standard temporal morphological features, we first perform the QRS detection and ECG wave boundary recognition tasks by means of the Ecgpuwave software, available at http://www.physionet.org/physiotools/ecgpuwave/src/. After extracting the temporal features of interest, we normalize the segmented heartbeats to the same periodic length according to the procedure reported in [1]. To this end, the mean beat period was chosen as the normalized periodic length, represented by 300 uniformly distributed samples. From these morphological features, we compute other sets of feature vectors by applying each of the transforms described above separately. For the DWT, we used the Daubechies wavelet of order 4 and decomposed the ECG beats up to the fourth scale. For the S-transform, we computed the maximum amplitude of each row of the S-matrix.

To obtain reliable assessments of the classification accuracy, we carried out five different trials in all experiments, each with a new training set of 600 beats (300 PVC and 300 Non-PVC beats) randomly selected from the 48 records (see Table 1).
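The balanced random draw used in each trial can be sketched as below. The label encoding (+1 for PVC, -1 for Non-PVC), the seed, and the function name are assumptions. The paper does not detail how the training beats relate to the test beats, so the sketch only draws the training indices.

```python
import numpy as np


def balanced_training_indices(labels, n_per_class=300, rng=None):
    """Draw n_per_class beats of each class (-1: Non-PVC, +1: PVC) without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    return np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n_per_class, replace=False)
        for c in (-1, +1)
    ])


# Hypothetical label vector with the class sizes of the test set in Table 1.
labels = np.array([+1] * 7117 + [-1] * 102770)
rng = np.random.default_rng(42)
trials = [balanced_training_indices(labels, 300, rng) for _ in range(5)]
print([len(t) for t in trials])  # [600, 600, 600, 600, 600]
```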

Table 1. Number of training and test beats adopted for the PVC and Non-PVC (NPVC) classes.

             PVC     NPVC      Total
Training     300     300       600
Testing      7117    102770    109887

The results of these five trials obtained on the test sets were then averaged. The classification performances, obtained on a personal computer with a 1.86 GHz processor, are compared with those provided by the SVM classifier in terms of four measures: 1) the overall accuracy (OA), i.e., the percentage of correctly classified beats among all the beats considered, regardless of their class; 2) the sensitivity (Se), defined as the number of correctly classified PVC beats over the total number of PVC beats; 3) the specificity (Sp), defined as the number of Non-PVC beats classified as Non-PVC over the total number of Non-PVC beats; and 4) the classification time for one beat.
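The three accuracy measures can be computed from true and predicted labels as sketched below. The function name and the +1/-1 label encoding are assumptions.

```python
import numpy as np


def pvc_metrics(y_true, y_pred, pvc_label=1):
    """Return (OA, Se, Sp) in percent, treating PVC as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    is_pvc = y_true == pvc_label
    oa = 100.0 * np.mean(y_pred == y_true)
    se = 100.0 * np.mean(y_pred[is_pvc] == pvc_label)   # PVC beats detected as PVC
    sp = 100.0 * np.mean(y_pred[~is_pvc] != pvc_label)  # Non-PVC beats kept as Non-PVC
    return oa, se, sp


print(pvc_metrics([1, 1, -1, -1, -1], [1, -1, -1, -1, 1]))
```

Because PVC beats are only about 6.5% of the test beats (7117 of 109887 in Table 1), OA is a class-weighted average dominated by Sp. For a single test run, OA = (7117 · Se + 102770 · Sp) / 109887.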

In the experiments for GPC, the hyperparameter vectors of the RBF and RBF-ARD covariance functions, $\theta = [l, \theta_0]$ and $\theta = [l_1, l_2, \dots, l_d, \theta_0]$, were initially set to one. For SVM, we considered the popular Gaussian kernel. The related regularization and kernel width parameters were varied in the ranges $[10^{-3}, 200]$ and $[10^{-3}, 2]$, respectively, so as to cover both strong and weak regularization of the classification model and both wide and narrow kernels.


B. Results

We applied the classifiers in the original feature space corresponding to each ECG signal representation method. During the training phase, the SVM parameters were selected by a 5-fold cross-validation (CV) procedure. The 600 training beats were first randomly split into five mutually exclusive subsets (folds) of equal size. The classifier, configured with predefined values of the regularization and kernel parameters, was then trained five times, each time leaving one of the subsets out of the training and using it only to estimate the classification accuracy. The average of the five accuracies served as the predicted accuracy of the classifier for those parameter values, and we chose the parameter values that maximized this prediction. For the GP classifier, the parameters of its kernel were selected automatically by the Bayesian estimation procedure based on the Laplace approximation.
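The 5-fold CV model selection can be sketched generically as below. The `fit_predict` callable, the toy threshold classifier, and the grid values are hypothetical stand-ins. In the setting of this paper, `fit_predict` would train a Gaussian-kernel SVM with a given regularization and kernel width (for example, values spaced logarithmically over the ranges given in Section V-A) and return its predicted labels.

```python
from itertools import product

import numpy as np


def kfold_indices(n, k=5, rng=None):
    """Randomly split range(n) into k mutually exclusive folds of (nearly) equal size."""
    rng = np.random.default_rng() if rng is None else rng
    return np.array_split(rng.permutation(n), k)


def cv_accuracy(fit_predict, X, y, params, k=5, rng=None):
    """Average held-out accuracy over k folds for one parameter setting."""
    folds = kfold_indices(len(y), k, rng)
    accuracies = []
    for i, held_out in enumerate(folds):
        train = np.concatenate([fold for j, fold in enumerate(folds) if j != i])
        y_hat = fit_predict(X[train], y[train], X[held_out], **params)
        accuracies.append(np.mean(y_hat == y[held_out]))
    return float(np.mean(accuracies))


def grid_search(fit_predict, X, y, grid, k=5, seed=0):
    """Return (best CV accuracy, best parameters); every setting sees the same folds."""
    best = None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        acc = cv_accuracy(fit_predict, X, y, params, k, np.random.default_rng(seed))
        if best is None or acc > best[0]:
            best = (acc, params)
    return best


# Toy stand-in for an SVM: a threshold classifier on the first feature (hypothetical).
def threshold_classifier(X_train, y_train, X_test, threshold):
    return np.where(X_test[:, 0] > threshold, 1, -1)


X = np.array([[-3.0], [-2.5], [-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0], [2.5], [3.0]])
y = np.where(X[:, 0] > 0, 1, -1)
print(grid_search(threshold_classifier, X, y, {"threshold": [-5.0, 0.0, 5.0]}))
```

Reusing the same seed for every grid point means all parameter settings are compared on identical folds, which reduces the variance of the comparison.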

The classification results averaged over the five trials (i.e., five randomly selected training sets) are reported in Table 2.

Table 2. Sensitivity (Se), specificity (Sp), and overall accuracy (OA), in %, achieved by the GP (with RBF and RBF-ARD covariance functions) and SVM classifiers fed with different feature representations.

Method          Features   OA (%)   Se (%)   Sp (%)
SVM             Morpho     93.25    95.34    93.05
                WT2        93.29    95.29    93.10
                WT3        93.26    95.20    93.07
                WT4        93.16    95.19    92.96
                ST         96.15    97.29    96.04
GPC (RBF)       Morpho     94.25    94.48    94.13
                WT2        94.28    94.45    94.16
                WT3        94.30    95.47    94.19
                WT4        94.29    95.44    94.18
                ST         96.66    97.30    96.60
GPC (RBF-ARD)   Morpho     95.25    95.70    95.21
                WT2        95.13    95.72    95.07
                WT3        94.94    95.31    94.90
                WT4        95.07    95.49    95.02
                ST         96.72    97.32    96.66

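First, note that GPC provided a higher overall accuracy than SVM for all the feature representations considered in this study, although SVM gave a slightly higher Se than GPC-RBF for the Morpho and WT2 features. The smallest difference in OA was obtained for the ST features and was equal to 0.51 percentage points (96.66% for GPC-RBF versus 96.15% for SVM).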

Second, among the adopted feature extraction methods, the S-transform provided the best classification results for all types of classifiers, and the best results overall were obtained with the GPC-RBF-ARD classifier, whose OA, Se, and Sp were equal to 96.72%, 97.32%, and 96.66%, respectively.

The GPC-RBF-ARD classifier embeds the feature selection process implicitly in its covariance function, in our case the RBF-ARD covariance function. This interesting property represents another advantage of GPC with respect to SVM.

Fig. 3 shows the length-scale distribution associated with the different features at convergence. Large (small) values of this parameter indicate that the corresponding feature is irrelevant (relevant) for classification.

Figure 3. Length scale values (feature weights) obtained by GPC with the RBF-ARD covariance function: (a) morphology; (b) wavelet; (c) S-transform.

It is clear from Table 2 that GPC-RBF-ARD combined with ST features is the most suitable of the tested combinations for classifying ECG signals. It is worth mentioning that these results are competitive with state-of-the-art methods [10], in which the overall accuracy (OA) and the sensitivity (Se) obtained on 40 patient recordings from the MIT-BIH arrhythmia database were equal to 95.16% and 82.87%, respectively.

VI. CONCLUSIONS

This paper has presented an efficient classification system based on GPC for detecting PVC beats in ECG signals. The classification results obtained on 48 recordings from the MIT-BIH arrhythmia database allow us to draw the following conclusions: 1) the GP detector is able to provide high classification accuracies; 2) similar to SVM, it does not exhibit a particular sensitivity to the curse of dimensionality; and 3) unlike SVM, it can perform feature selection intrinsically through an appropriate covariance function. Finally, among the different kinds of features explored, those based on the S-transform appear to give the best detection results when combined with GPC.

Future work should investigate additional feature extraction methods and test the proposed method on other, possibly larger, databases and on unseen ECG signals.


ACKNOWLEDGMENT

This work is supported by the National Plan for Sciences and Technology (NPST), King Saud University. Project ID: 11-MED1832-02

REFERENCES

[1] F. Melgani and Y. Bazi, "Classification of electrocardiogram signals with support vector machines and particle swarm optimization," IEEE Trans. Information Technology in Biomedicine, vol. 12, pp. 667-677, 2008.

[2] H. Khorrami and M. Moavenian, "A comparative study of DWT, CWT and DCT transformations in ECG arrhythmias classification," Expert Systems with Applications, vol. 37, pp. 5751-5757, 2010.

[3] Y. Ozbay and G. Tezel, "A new method for classification of ECG arrhythmias using neural network with adaptive activation function," Digital Signal Processing, vol. 20, pp. 1040-1049, 2010.

[4] E. Pasolli and F. Melgani, "Active learning methods for electrocardiographic signal classification," IEEE Trans. Information Technology in Biomedicine, vol. 14, no. 6, pp. 1405-1416, Nov. 2010.

[5] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.

[6] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press, 2006.

[7] H.-C. Kim and Z. Ghahramani, "Bayesian Gaussian process classification with the EM-EP algorithm," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, pp. 1948-1959, 2006.

[8] Y. Bazi and F. Melgani, "Gaussian process approach to remote sensing image classification," IEEE Trans. Geoscience and Remote Sensing, vol. 48, pp. 186-197, 2010.

[9] R. Mark and G. Moody, MIT-BIH Arrhythmia Database, 1997. [Online]. Available: http://ecg.mit.edu/dbinfo.html

[10] S. G. Mallat, "Multifrequency channel decompositions of images and wavelet models," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 37, pp. 2091-2110, 1989.

[11] C. Nikias and A. Petropulu, Higher-Order Spectral Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1993.

[12] R. G. Stockwell, L. Mansinha, and R. P. Lowe, "Localization of the complex spectrum: The S-transform," IEEE Trans. Signal Processing, vol. 44, pp. 998-1001, 1996.

[13] A. Haar, "Zur Theorie der orthogonalen Funktionensysteme," Mathematische Annalen, vol. 69, pp. 331-371, 1910.

[14] I. Daubechies, "Orthonormal bases of compactly supported wavelets II: Variations on a theme," SIAM J. Mathematical Analysis, vol. 24, no. 2, pp. 499-519, Mar. 1993.

[15] S. G. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, Jul. 1989.

[16] G. D. Clifford, F. Azuaje, and P. E. McSharry, Advanced Methods and Tools for ECG Data Analysis. Artech House, 2006.

[17] S. Z. Fatemian and D. Hatzinakos, "A new ECG feature extractor for biometric recognition," in Proc. 16th Int. Conf. Digital Signal Processing, Santorini-Hellas, Jul. 5-7, 2009, pp. 1-6.

[18] S.-C. Fang and H.-L. Chan, "Human identification by quantifying similarity and dissimilarity in electrocardiogram phase space," Pattern Recognition, vol. 42, pp. 1824-1831, 2009.
