Biomedical Signal Processing and Control 21 (2015) 105–118

journal homepage: www.elsevier.com/locate/bspc

Two-level coarse-to-fine classification algorithm for asthma wheezing recognition in children's respiratory sounds

Igor Mazić a, Mirjana Bonković b,*, Barbara Džaja b

a University of Dubrovnik, Dubrovnik, Croatia
b University of Split, Split, Croatia

* Corresponding author: Tel.: +385 21305641. E-mail addresses: [email protected] (I. Mazić), [email protected] (M. Bonković), [email protected] (B. Džaja).

http://dx.doi.org/10.1016/j.bspc.2015.05.002
1746-8094/© 2015 Elsevier Ltd. All rights reserved.

Article history: Received 8 November 2014; Received in revised form 23 April 2015; Accepted 5 May 2015

Keywords: Asthma wheeze recognition; Support vector machines; Mel frequency cepstral coefficients

Abstract

The paper proposes a two-layer pattern recognition system architecture for asthma wheezing detection in recorded children's respiratory sounds. The first layer consists of two SVM classifiers specifically designed as a cascade stacked in parallel to emphasize the differences among signals with similar acoustic properties, such as wheezes and inspiratory stridors. The second layer is realized using a digital detection threshold, which further upgrades the proposed structure with the aim of improving the process of wheezing detection. The results were experimentally evaluated on the data acquired from the General Hospital of Dubrovnik, Croatia. Classification results obtained on the test data sets revealed that the central frequency of wheezes included in the training data is important for the success of classification.

1. Introduction

Bronchial asthma (asthma) is one of the typical modern diseases that constantly attracts the attention of the medical and research community because of the growing number of people who suffer from it or who care for affected family members. It is a chronic, inflammatory, reversible obstructive lung disease and is accompanied by the narrowing of the bronchi. Acoustic signals are formed in the lungs due to the oscillation of the bronchi walls caused by the turbulent air flow during breathing and are a source of information on the state of the respiratory system. In medicine, these sounds correlate with pulmonary pathology and have been studied since the invention of the stethoscope in 1816 [1]. Phonopneumograms (acoustic breathing records) are dependent on anatomical and physiological parameters such as sex, age, type and stage of disease. Additionally, they can vary within one person. Digital methods of collecting, processing and analyzing phonopneumograms have been used for more than 30 years. Measuring systems, among others, consisted of transducers that were placed on the chest or trachea and collected acoustic signals during breathing. A normal respiratory sound is the sound produced by the lungs of healthy people during inspiration and expiration. Abnormal respiratory sounds that occur at an earlier stage of asthmatic attacks and are found by stethoscope auscultation include prolonged expiration and the emergence of wheezing. Typical devices that monitor the degree of airway obstruction are the spirometer and peak flow meters, but these devices are unsuitable for children from zero to six years of age, who cannot work out how to use them. Additionally, to ensure sufficient acoustic power for the respiratory sound, the children were usually encouraged to perform forced breathing [28]. Their efforts often resulted in specific physiological artifacts such as inspiratory stridor or snoring, which sound similar to asthmatic wheezes. Stridor is a monophonic high-pitch wheezing sound produced by air flow passing through the narrowed larynx. Snoring is a sound arising from the passage of an air flow through the upper part of the respiratory tract that is obstructed by secretions. In addition, medical practitioners observed that children of the specified age (zero to six) are more prone to respiratory infections than adults due to well vascularized mucosa. Additionally, because of the anatomy of organs (small size), infection is rarely limited to a particular region of the respiratory system, and usually the larynx, trachea and bronchi are also affected. Hence, there is a problem with the exact classification of respiratory sounds in children, which should be considered separately from adults. In addition, there are non-physiological artifacts caused by the interaction of the transducer with the skin. These artifacts were more pronounced in restless children, which is a typical behavior for younger and/or asthmatic children.

Therefore, trained physicians are able to recognize the children's breathing sound quality and diagnose an asthmatic attack, but there
are only a few devices [2,34], to our knowledge, suitable for home use, which could help parents decide whether it is necessary to treat the children with medications. One type of device that can provide insight into the prognosis and the degree of obstruction is the wheezometer [34], a device that detects wheezes that usually appear in the early stage of an asthma attack. The most important component of these devices, apart from the transducer that captures the signal of breathing, is certainly an algorithm that recognizes the wheezes from the acquired respiratory sounds. Scientific literature devotes considerable attention to wheezing recognition problems, which became a typical test study to prove new advances in the field of signal processing [3,4]. Although the first attempts to automatically detect wheezing were made more than 20 years ago, significant success has been recorded with the emergence of machine learning algorithms.

With the aim of reducing false asthma alarms and improving the quality of the wheezometer's ability to distinguish asthmatic and non-asthmatic inspiratory stridors, in this paper we present a two-level coarse-to-fine classification algorithm for asthma wheezing recognition in children's respiratory sounds (phonopneumograms).

Due to the mentioned specificity, the first level of the classification algorithm has been assured with an SVM cascade consisting of two parallel classifiers that performs better than a single SVM classifier and helps to discriminate similar non-asthma sounds from asthma wheezes. Additionally, decisions on the second level, which are based on the newly introduced digital detection threshold, approve or reject the first-level results, making the process of wheezing recognition in respiratory sounds with artifacts highly reliable. Therefore, the novelty of the paper can be summarized with the following outlines:

• The proposed algorithm is more suitable for children;
• A new cascade classification algorithm is introduced that reduces false asthma alarms due to inspiratory stridors;
• The feature set drawn from the Mel-cepstrum could be optimized by the appropriate number of cepstral coefficients and their kurtosis and entropy;
• The algorithm is more robust to noise due to a newly introduced digital detection threshold.

To clearly present the purpose of this research, the paper is divided into six sections. In Section 2, a literature overview, which motivated this research, is presented. Section 3 describes the materials and methods used, particularly the measuring system and the components of the algorithm. Performed experiments and their results are presented in Section 4. The discussion is in Section 5. The paper presents the conclusions in Section 6.

2. The research overview

A typical pattern recognition system is composed of two blocks: feature extraction and classification. They function in two steps: training and testing. Training models the data, and then a discriminant is determined to delimitate the classes. For testing, new data are classified using the discriminant [5]. According to Bahoura [5], in the last two decades, the Fourier transform (FT) [6–9], linear predictive coding, also known as autoregressive modeling (AR) [10], wavelet transform (WT) [11,12,30], and Mel Frequency Cepstral Coefficients (MFCC) [23–25] have been used for wheezing feature extraction, whereas artificial neural networks (ANN) [6–9,12,30], k-nearest neighbor (k-NN) [31], vector quantization (VQ) [11] and Gaussian mixture models (GMM) [24,25,36] have been used for respiratory sound classification. Current researchers use new approaches for spectro-temporal feature extraction [13,14] and/or SVM [15,32,35] for detection and quantization of wheezes based on various wheezing models. Additionally, there are methods for noise removal [16], which may be useful in obtaining better classification results.

In this paper, wheezing is described by using the MFCC analysis of an acoustic breathing record. This analysis closely approximates the human auditory system's response. In this way, the feature describes exactly what a human (physician) can hear over the stethoscope. Researchers have shown that by using MFCC as features and GMM [24,25] or SVM [17] as classifiers, wheezing detection can achieve an accuracy higher than 95%. A similar accuracy has been obtained using advanced signal processing techniques based on the persistent homology of delay embedding [3]. Although the literature mentions the importance of ensemble classifiers in the decision making related to asthma patients [18,19], there is no paper, to our knowledge, that investigates more deeply the efficiency of ensemble methods in wheezing detection in comparison to individual classifiers. In addition, the majority of the experimental results are evaluated on breathing records acquired under controlled conditions from adult patients [29]. Considering the importance of reliability for wheezing detection in children's respiratory sounds, the goal of this paper is to present how much the standard machine learning methods could be improved by the classifiers' cascade, which can be perceived as a step toward ensembles. Under the specified conditions, the importance and the number of features used in the cascade were not tested randomly; instead, the experiment was set up to force the individual classifiers in the cascade to generate different errors, which brings the effectiveness of the cascade to the forefront. A typical situation is a breathing record with asthma wheezing and inspiratory stridor, whose spectral and psychoacoustic properties are similar. Therefore, we introduce a cascade with the ability to distinguish those types of wheezes and eliminate false positive detections. It is worth noting that wheezes indicate an asthmatic attack, during which taking medication makes sense. Otherwise, applied therapy has the opposite effect: the patient becomes accustomed to the medication. Hence, eliminating false positive detections is as important as detecting true positives. Additionally, using a two-layer architecture in which the first layer is represented by the classifiers' cascade and the second is realized using a digital threshold, it is possible to distinguish wheezes from non-wheezes more accurately. Therefore, the proposed architecture is further upgraded with the aim of improving the process of wheezing detection.

3. Materials and methods

In the following sections, the proposed algorithm is described in detail, including the method of selecting the parameters of the classifier, the role of the chosen features and the way in which features are used to ensure greater reliability of the classification. Prior to that, the measuring system specifications utilized for data collection are presented.

3.1. The measuring system

The measuring system consists of a transducer, a 4-m long microphone cable, a transistor preamplifier in a common-emitter circuit built with a BC107 transistor, a stable 5-V source, and a personal computer with an integrated audio card that has one microphone input, two analog inputs and a stereo output. The resolution of the AD/DA converter was 16 bits, the input impedance of the analog inputs was 47 kΩ, the input voltage range was 2 Vpp, the signal-to-noise ratio (SNR) was 90 dB, and the total harmonic distortion at 1 Vrms was 0.01%. The measurements used an analog card input. The sampling frequency was set to 8 kHz. Due to the low-level signal voltage at the transducer terminals, the signal was first amplified with a transistor preamplifier as shown in Fig. 1. Measurements in previous studies showed the absence of significant spectral components at frequencies higher than 650 Hz. Because the sampling frequency is 8 kHz (Nyquist frequency is 4 kHz), an analog low-pass antialiasing filter was not embedded. The most important component of the system is the transducer created with the accelerometer BU-3173 by Knowles Electronics.

Fig. 1. Transducer connection to the audio card via transistor preamplifier.

Using the computer program Sound Recorder (part of Windows XP) and a Sound Blaster PCI64 audio card, digital data (sampled signal values) were collected in .wav format. The corresponding computer program converted data from the .wav format to a .txt format. By using MATLAB software tools, several programs were developed (some with a graphical interface) for processing the data (signals) contained in these text files and were also used to display the measurement results.

3.2. The algorithm

The main goal of the presented research was to discriminate the wheeze signals from the non-wheeze signals. Therefore, the acquired respiratory sounds have been separated into two specific classes marked as W (wheeze) and NW (non-wheeze). In the following sections, all of the algorithm's components, which include the preprocessing stage, feature selection, SVM classifier cascade and digital detection threshold, are described in detail.

3.2.1. The preprocessing stage of machine learning algorithm: Extraction of breathing samples

Each signal in phonopneumograms lasting 10 s was first filtered with the Yule–Walker 50th-order high pass filter with the lower cut-off frequency of 100 Hz to reduce the impact of cardiovascular and muscular noise (Fig. 2a). Then, the STFT is calculated (FFT method with 50% overlapping Hamming window using 256 samples that correspond to 32 ms of time), and the result is graphically displayed as a 2D spectrogram (Fig. 2b). By the simultaneous playback and analysis of spectrograms, a physician visually allocates distinctive signal sequences and manually classifies them as breathing sounds, wheezing, inspiratory stridors or snoring. Depending on the type of signal, the corresponding allocation procedure is applied. For the case of wheezing, it is important to determine the beginning and the end of that distinctive sequence. Therefore, after the dominant frequency of the wheeze is determined, the whole signal is routed through the Yule–Walker 50th-order band pass filter again, where the central frequency overlaps with the central wheeze frequency (Fig. 2c). It is important to mention that this band pass filter is only used to precisely determine the beginning and the end of the wheeze and to extract the features of the wheeze. Samples used for training, validation and testing were not run through this filter.
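The STFT framing figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' MATLAB code: it assumes a symmetric Hamming window and no zero-padding at the record edges (neither is stated in the text), and the names `hamming` and `frame_count` are illustrative.

```python
import math

FS_HZ = 8000        # sampling frequency stated in the text
WIN_LEN = 256       # window length in samples
HOP = WIN_LEN // 2  # 50% overlap between consecutive windows


def hamming(n):
    """Symmetric Hamming window of length n (assumed window form)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_count(num_samples, win_len=WIN_LEN, hop=HOP):
    """Number of complete frames that fit in the signal without padding."""
    if num_samples < win_len:
        return 0
    return 1 + (num_samples - win_len) // hop


window_ms = 1000 * WIN_LEN / FS_HZ    # duration of one analysis window
bin_hz = FS_HZ / WIN_LEN              # spacing between FFT frequency bins
frames_10s = frame_count(10 * FS_HZ)  # frames in one 10 s phonopneumogram
```

Under these assumptions one window spans 32 ms, as the text states, and the FFT bins are 31.25 Hz apart, which bounds how finely the dominant wheeze frequency can be read from the spectrogram.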


Fig. 2d shows an enlarged wheeze marked as W1 in Fig. 2b after precisely determining the wheeze start sample (sample 1.03 × 10^4 from the beginning of the record) and the wheeze end sample (sample 1.32 × 10^4 from the beginning of the record). It has 2900 samples in total or, expressed in the time domain, it lasts 362 ms. The overall signal in a phonopneumogram is then routed through the Yule–Walker 50th-order high pass filter again with the lower cut-off frequency of 100 Hz, from which the signal sequence of the allocated wheeze W1 (Fig. 2e) was extracted, with the corresponding spectrum shown in Fig. 2f. The initial signal segment of 100 ms is allocated at the beginning of W1 and is then shifted every 10 ms, forming a series of wheeze signal segments. Therefore, from the signal sequence of 362 ms, 27 signal segments are obtained, and the residue of the last 2 ms is rejected. Signal segments lasting 100 ms represent the wheeze signal pattern from which feature extraction should be performed. Based on these feature values, the machine learning algorithm provides the classification. The same procedure was applied to other wheeze signal sequences. The total number of phonopneumograms was 45. For training and validation purposes, 21 of them were used. The remaining 24 phonopneumograms were used for testing. In 21 phonopneumograms, 5 of them contained wheezing, 9 contained other distinctive signals but not wheezing (inspiration, expiration, inspiratory stridor and snoring), and the remaining 12 phonopneumograms contained unclassified non-wheeze signals. For a total of 17 wheeze recordings in 5 phonopneumograms, 269 signal segments (patterns) were obtained. Other distinctive signal extractions (inspiration, expiration, inspiratory stridor and snoring) were determined in a similar way. In total, nine phonopneumograms were used to obtain 17 wheeze, 22 inspiratory, 19 expiratory, 5 inspiratory stridor and 9 snore signal sequences. Using the same procedure described above for wheeze segment extraction, the following segments were stored: 924 segments in the sub-class I (inspiration), 1650 segments in the sub-class E (expiration), 99 segments in the sub-class L (inspiratory stridor, which appears in the larynx) and 245 snore segments in the sub-class S. All allocated distinctive signal sequences were without non-physiological artifacts and had a clearly visible start and end time. Apart from the allocated distinctive signal sequences, the other 12 randomly selected phonopneumograms without wheezes were used for acquiring 2388 unclassified signal segments (forming sub-class U) lasting 100 ms, with 50% overlap between consecutive segments. Due to the small number of wheezes, the wheeze signal detections were tested by using the leave-one-out cross validation (LOOCV) method described in Section 4.2.
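The sliding-window segmentation described above (100 ms segments shifted by 10 ms) can be sketched as follows. This is an illustration under the stated sampling rate of 8 kHz, using the W1 start and end samples quoted in the text; the function name `segment_bounds` is hypothetical and not taken from the paper.

```python
FS_HZ = 8000  # sampling frequency stated in the text


def segment_bounds(start, end, fs=FS_HZ, win_ms=100, shift_ms=10):
    """Return (first, last_exclusive) sample indices of every full segment.

    A segment of win_ms is placed at `start` and shifted by shift_ms until
    it no longer fits before `end`; the leftover samples are discarded.
    """
    win = fs * win_ms // 1000      # 800 samples per 100 ms segment
    shift = fs * shift_ms // 1000  # 80 samples per 10 ms shift
    bounds = []
    pos = start
    while pos + win <= end:
        bounds.append((pos, pos + win))
        pos += shift
    return bounds


# Wheeze W1 from the text: samples 1.03e4 to 1.32e4 (2900 samples).
segments = segment_bounds(10300, 13200)
unused_samples = 13200 - segments[-1][1]
```

For W1 this yields the 27 segments reported in the text, with 20 leftover samples (about 2 ms) rejected at the end of the sequence.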

3.2.2. Two-level coarse-to-fine classifier cascade decision structure

The performance of ensemble learning methods applied to classification is based on the preparation of the model, which implies a dataset, an initial pool of descriptors, and a machine-learning approach. Variation of any of these items can be used to generate an ensemble of models [33]. Here, we consider a similar strategy, which assumes the descriptor variation approach combined with SVM classifiers stacked in parallel. The block scheme of the proposed classifier is shown in Fig. 3. Each classifier's functionality is described by the appropriate model formed during the training phase. Based on the results from literature [24,25], the MFCC are used as the basic features that constitute descriptors for both of the models. Considering the fact that data contain artifacts, additional features were added to adapt the discriminant between classes of wheezes and non-wheezes. These would take into account the consequences of noise presence, such as energy distributions in the Mel filter bank (kurtosis) or measures for disorder (entropy). The results from classifiers are assembled using the logical AND function, which means that the signal segment (pattern) would be classified as a wheeze only if each of the classifiers votes positively. Additionally, other combinations are possible, but these experiments go beyond the scope of this paper. Finally, as seen in the block scheme, the results are improved by introducing a digital detection threshold, which is described in Section 4.2. The following section gives a description of the SVM functionality.

Fig. 2. (a) Respiratory signal from phonopneumogram lasting 10 s; (b) 2D spectrogram with marked wheezes (W1, W2, W3); (c) filtered signal with the Yule–Walker 50th-order band pass filter in which the central frequency overlaps with the central wheeze frequency; (d) enlarged wheeze W1; (e) allocated and filtered wheeze W1 with the Yule–Walker 50th-order high pass filter with the lower cut-off frequency of 100 Hz; (f) Spectrum of the previously allocated and filtered wheeze signal sequence.

3.2.3. Support vector machine (SVM)

The SVM classifier is a kernel-based supervised learning algorithm [27]. It is specifically designed for binary classification, which matches our goals related to wheeze and non-wheeze classification. During the training phase, SVM uses a kernel function to map the input vectors into a higher dimensional feature space in which the support vectors (some of the training samples) define a decision boundary, i.e., the hyperplane that separates different classes (W: wheezes and NW: non-wheezes). In this research work, the radial basis kernel functions (RBF) were used because they perform well with nonlinear models. The performance of the SVM classifier with an embedded RBF kernel relies on two parameters: C and γ. The procedure for tuning the kernel parameters is useful and is a versatile tool for various tasks such as finding the right shape of the kernel, feature selection, and finding the right tradeoff between error and margin [26]. The value of parameter C for the soft margin was set to one because we found that the classification results were insensitive to its changes for a very wide interval [10^-2, 10^2]. The parameter γ for the RBF kernel was adopted to obtain an optimal number of MFCC features resulting from the maximum classification reliability, which is presented in Section 5 of the experimental results. Section 4 discusses the experimental results.

[Fig. 3 block diagram: Phonopneumogram loading; High-pass filter (f_low = 100 Hz); Segmentation and features extraction; two parallel branches, x1(i) = {Features1} into Classification1 using SVM model M1 (classes W1 and NW1, kernel = RBF, gamma1, training set = 20%) and x2(i) = {Features2} into Classification2 using SVM model M2 (classes W2 and NW2, kernel = RBF, gamma2, training set = 20%), each iterating i = i + 1; outputs X1 and X2 combined as X1·X2; Digital threshold detection; AWDR (automatic wheeze detection result).]

Fig. 3. Block scheme of the two-level SVM cascade classifier. The values for Features1, Features2, gamma1, gamma2, W1, W2, NW1 and NW2 are variable and are listed in Tables 3 and 4.
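The role of γ can be illustrated with the RBF kernel itself. The paper does not write the kernel out, so the sketch below assumes the common parametrization K(x, y) = exp(−γ‖x − y‖²); the function name `rbf_kernel` and the sample vectors are illustrative only.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * ||x - y||^2) for two feature vectors.

    Assumed parametrization; the source only names the parameter gamma.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical two-dimensional feature vectors at squared distance 2.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)
near = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
narrow = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=5.0)
```

Identical vectors give a kernel value of one, and a larger γ makes the similarity decay faster with distance, which is one reason the best γ would be expected to change with the number of MFCC features used.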

3.2.4. Digital detection threshold

To make the whole process more robust and more suitable for practical realization, a digital detection threshold was introduced. Signal segments classified as wheeze are assigned the value one, and signal segments classified as non-wheeze are assigned the value zero. So that a single error in wheeze detection (a false positive segment) would not be considered an alarm, an additional criterion was introduced for deciding whether a wheeze exists: the signal sequence is declared a wheeze only if, in n consecutive signal segments, at least k segments were classified as wheeze. The number k is called the digital detection threshold. Thus, the classification results have a higher reliability of detected wheezes than without using the threshold, which is presented in Section 4 with experimental results.
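The two decision levels can be sketched together: the first level combines the two classifier outputs with a logical AND (X1·X2 in Fig. 3), and the second applies the k-of-n rule. This is a minimal sketch under assumptions that the text leaves open: the window of n segments is assumed to slide by one segment, and the values n = 5, k = 4 and the label sequences are hypothetical.

```python
def cascade_and(x1, x2):
    """First level: a segment is a wheeze only if both classifiers say so."""
    return [a & b for a, b in zip(x1, x2)]


def digital_threshold(labels, n, k):
    """Second level: one decision per window of n consecutive segments.

    A window is declared a wheeze (1) when at least k of its n segment
    labels are 1. The window is assumed to slide by one segment.
    """
    decisions = []
    for start in range(len(labels) - n + 1):
        votes = sum(labels[start:start + n])
        decisions.append(1 if votes >= k else 0)
    return decisions


# Hypothetical per-segment outputs of the two SVM models (1 = wheeze).
x1 = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
x2 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

combined = cascade_and(x1, x2)
alarms = digital_threshold(combined, n=5, k=4)
```

In this example the isolated positive from the first classifier at the ninth segment is removed by the AND, and the remaining scattered positives after the initial run never reach four out of five, so only the first two windows raise a wheeze decision.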

3.2.5. Evaluation measures

The efficiency of wheezing recognition was evaluated with the appropriate validation measures such as the overall accuracy and the overall reliability. The overall accuracy is a standard validation measure defined in literature [33] and was calculated using Eq. (1), whereas the overall reliability was defined by Eq. (2):

$$\mathrm{ACC} = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i} \tag{1}$$

$$R = \mathrm{TPR}\cdot\mathrm{TNR} \tag{2}$$

where TPR and TNR (TPR: True Positive Rate, TNR: True Negative Rate) were calculated using the following equations, respectively:

$$\mathrm{TPR} = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TP_i + FN_i} \tag{3}$$

$$\mathrm{TNR} = \frac{1}{N}\sum_{i=1}^{N}\frac{TN_i}{TN_i + FP_i} \tag{4}$$

Parameter N represents the number of experiments performed using the random subsampling validation method, and i is the index of each iteration. The input data set was divided in two subsets: wheezes and non-wheezes. Typically, two-thirds of the data were used for training and one-third for validation, but it could be any other proportion [33]. TP, FP, TN and FN are the numbers of the signal segments (samples) classified as true positive, false positive, true negative and false negative, respectively. Reliability is an evaluation measure similar to the measure known in literature as performance [20–22] or g-mean, which is calculated as the square root of Eq. (2). We found that the proposed measure is better than performance because it ensures a higher sensitivity of the highly efficient classification results (results for which TPR·TNR is close to one). Thus, the evaluation of the classifier efficiency is even stricter than using performance as a measure.
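Eqs. (1)–(4) translate directly into code. The sketch below averages the per-iteration ratios over N runs and derives the reliability and the g-mean from them; the confusion counts are made-up illustration values, not results from the paper, and the function name `evaluate` is hypothetical.

```python
import math


def evaluate(runs):
    """Compute ACC, TPR, TNR, reliability R and g-mean over N runs.

    `runs` is a list of (TP, TN, FP, FN) tuples, one per random
    subsampling iteration, following Eqs. (1)-(4).
    """
    n = len(runs)
    acc = sum((tp + tn) / (tp + tn + fp + fn) for tp, tn, fp, fn in runs) / n
    tpr = sum(tp / (tp + fn) for tp, tn, fp, fn in runs) / n
    tnr = sum(tn / (tn + fp) for tp, tn, fp, fn in runs) / n
    reliability = tpr * tnr
    return {
        "ACC": acc,
        "TPR": tpr,
        "TNR": tnr,
        "R": reliability,
        "g_mean": math.sqrt(reliability),
    }


# Two hypothetical iterations: (TP, TN, FP, FN) segment counts.
metrics = evaluate([(90, 180, 20, 10), (80, 190, 10, 20)])
```

With these illustrative counts, R is about 0.786 while the g-mean is about 0.887, which shows the point made in the text: for the same TPR and TNR (both below one), the product gives a lower, stricter score than its square root.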

4. Experimental results

The phonopneumograms have been acquired using our own measuring device with the accelerometer as a central part of the transducer section. The measurements have been performed on 16 children from one to six years of age in the General Hospital of Dubrovnik, Croatia, during their regular visit to a physician due to:

• Difficulties in breathing caused by obstructive bronchitis, where asthma has not been diagnosed yet. In children younger than three years old, it is not possible to separate the symptoms of asthma and obstructive bronchitis, which occurs more frequently as a part of various infections caused by viruses.
• Regular medical check-ups for children with diagnosed asthma.
• Children with asthma who have difficulties in breathing.

The measurements have been taken under the supervision ofthe physician. The sensor has been placed on the back lower sideof the chest (medial side of the back axillary), which correspondsto the right lower lobe of the lungs.

Each of the 45 acquired phonopneumograms lasted 10 s. Twotypes of experiments were performed. The first experiment wasoriented toward the validation of ordinary SVM and the newlyintroduced cascade structure in which the primary goal is to elimi-nate false positive detections caused by the inspiratory stridor. Thesecond experiment was oriented toward the reliability of wheezeclassification. In the following sections, both of the experiments aredescribed in detail.

4.1. Validation of ordinary SVM model and cascade SVM structure

The process of pattern recognition includes optimal feature set selection during the validation phase. The experiments showed that the classification results were affected not only by the RBF kernel parameter gamma but also by the number of MFCC features used. Therefore, to find the optimal set of features, random subsamples from the class of wheezes (W) and 20% of samples from the class of non-wheezes (NW) were chosen for the model training, whereas the remaining 80% of samples were used for model

validation in which an appropriate number of MFCC and parameter γ were determined. First, the ordinary SVM classifier was used and the procedure was repeated N = 20 times, each time with different randomly chosen data, to average TPR and TNR. The overall values for TPR and TNR were calculated according to Eqs. (3) and (4). With this procedure, the goal was to determine the maximal overall reliability by changing the RBF kernel parameter γ and the number of MFCC. Typical results are shown in Fig. 4 and were obtained for signal segments of class W and class NW = {I, E}. This means that the non-wheeze class is composed of inspiratory (I) and expiratory (E) signal segments. The curves obtained with γ = 2 and γ = 20 framed the curve with the maximal overall reliability calculated with the appropriate number of MFCC. To obtain the optimal number of MFCC, the classification results were listed and sorted from the highest overall reliability at the top to the lowest at the bottom.

Fig. 4. Classifiers' overall reliability curves (used classes were W = {W} and NW = {I,E}) using RBF kernel with 1 ≤ γ ≤ 20 and 2 ≤ MFCC ≤ 20.

In Table 2, the first two columns characterize the type of signal segments and their number in the class. For example, in the first row of the first column, {W}, 269 means that the wheeze class has 269 samples. In the second column, {I, E}, 2574 means that the non-wheeze class, which consists of inspiratory and expiratory signal segments, has a total of 2574 samples.

It can be noted from Table 2 that the maximal classification reliability occurs for signals whose spectral characteristics have less resemblance to each other. This is the case with signals from classes W and NW = {I, E}. This statement can be justified by visual inspection of the spectrograms that belong to inspiration, expiration and wheeze presented in the section with experimental results (Fig. 6). Additionally, it can be noted from Table 2 that with the increasing number of samples in the class NW (1st, 4th and 5th row), the overall reliability of classification decreases.

Table 1
Top 10 best validation results expressed with the defined measures and achieved by using the appropriate number of MFCC (first column) and parameter γ.

| MFCC | TPR (%) | TNR (%) | R = TPR · TNR (%) | ACC (%) | γ |
|---|---|---|---|---|---|
| 10 | 99.6877 | 99.9347 | 99.6226 | 99.9105 | 2.0 |
| 12 | 99.6640 | 99.9418 | 99.6060 | 99.9146 | 3.4 |
| 10 | 99.6836 | 99.9106 | 99.5945 | 99.8886 | 2.6 |
| 12 | 99.6162 | 99.9542 | 99.5705 | 99.9215 | 3.0 |
| 9 | 99.5723 | 99.9927 | 99.5650 | 99.9519 | 2.8 |
| 12 | 99.5971 | 99.9638 | 99.5611 | 99.9280 | 2.8 |
| 12 | 99.5783 | 99.9590 | 99.5375 | 99.9216 | 2.4 |
| 10 | 99.5707 | 99.9395 | 99.5104 | 99.9037 | 2.2 |
| 10 | 99.5685 | 99.9396 | 99.5083 | 99.9039 | 2.4 |
| 8 | 99.5619 | 99.9371 | 99.4992 | 99.8997 | 2.0 |

The top 10 best results are shown in Table 1. The highest overall reliability was obtained for 10 MFCC and γ = 2.0.

The same procedure was repeated for other NW (non-wheeze) sub-classes, and the results are shown in Table 2.
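The selection procedure described above (a 20%/80% random split per class, N = 20 repetitions, and a search over the number of MFCC and γ) can be sketched as follows. This is only an illustration under stated assumptions: `random_subsample`, `grid_search` and the callback `evaluate` are hypothetical names, and `toy_evaluate` is a stand-in scorer rather than an SVM, so the code shows the control flow and not the reported results.

```python
import random


def random_subsample(labels, train_frac=0.2, rng=None):
    """Draw train_frac of each class for training; the rest is for validation."""
    rng = rng or random.Random()
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train = []
    for indices in by_class.values():
        train.extend(rng.sample(indices, round(train_frac * len(indices))))
    chosen = set(train)
    valid = [i for i in range(len(labels)) if i not in chosen]
    return sorted(train), valid


def grid_search(evaluate, mfcc_counts, gammas, n_iter=20, seed=0):
    """Return (R, n_mfcc, gamma) with the highest averaged R = TPR * TNR.

    `evaluate(n_mfcc, gamma, rng)` is a hypothetical callback that trains and
    validates one model on a fresh random split and returns (TPR, TNR).
    """
    rng = random.Random(seed)
    best = None
    for n_mfcc in mfcc_counts:
        for gamma in gammas:
            runs = [evaluate(n_mfcc, gamma, rng) for _ in range(n_iter)]
            tpr = sum(r[0] for r in runs) / n_iter
            tnr = sum(r[1] for r in runs) / n_iter
            candidate = (tpr * tnr, n_mfcc, gamma)
            if best is None or candidate[0] > best[0]:
                best = candidate
    return best


# Class sizes from the first row of Table 2: 269 wheeze, 2574 non-wheeze segments.
labels = ["W"] * 269 + ["NW"] * 2574
train_idx, valid_idx = random_subsample(labels, 0.2, random.Random(0))


def toy_evaluate(n_mfcc, gamma, rng):
    """Stand-in scorer (NOT an SVM): peaks at 10 MFCC and gamma = 2.0."""
    return 1.0 - 0.01 * abs(n_mfcc - 10), 1.0 - 0.01 * abs(gamma - 2.0)


best = grid_search(toy_evaluate, range(2, 21), [1.0, 2.0, 5.0, 20.0])
```

In a real experiment, `evaluate` would call `random_subsample`, extract the requested number of MFCC, train an RBF-kernel SVM with the given γ on the training indices, and score it on the validation indices.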

The statistical measures of classification efficiency, such as reliability and accuracy, clearly indicate whether the system is able to discriminate the signal segments from different classes well. Nevertheless, we wish to have a deeper insight into the correlation between the calculated reliability and the ability of the system to discriminate similar signals belonging to different classes. A typical example is the ability to discriminate inspiratory stridor (which appears in the larynx and is not a consequence of asthma) from asthma wheezes (which appear in the bronchi and have similar acoustic properties). More specifically, we ask whether the calculated overall reliability obtained on the training and validation set, R = 97.68% (the last row in Table 2), is high enough to correctly identify the time at which the distinctive signals (wheezes and non-wheezes) appear by using the basic SVM classifier. The classification scheme is shown in Fig. 5. The feature set marked as Features0 was composed of 12 MFCC, calculated from each signal segment x(i) lasting 100 ms. The model M0 was formed by using an RBF kernel with γ = 5.0 and with 20% of the randomly selected samples belonging to the class W0 = {W} and 20% of the randomly selected samples belonging to the class NW0 = {I,E,S,L,U}, using 12 MFCC as features (see the last row in Table 2). The SVM classification result is a binary value: a sample classified as wheeze is assigned the value one, and a sample classified as non-wheeze is assigned the value zero. The final decision depends on the digital detection threshold approval, obtained with experimentally determined values for n and k (n = 10 and k = 5). The described procedure, resulting in wheeze detection (AWDR, automatic wheeze detection results), was applied to 21 phonopneumograms (training and validation set), with the most interesting results shown in Fig. 6. It consists of six parts, each belonging to the corresponding phonopneumogram, with analysis results presented in three separate parts: (a), (b) and (c). The parts represent the spectrogram of the appropriate phonopneumogram, the SVM classification result and the digital detection threshold

Table 2
Type and number of the signal segments in the classes (n), parameter γ values, number of MFCC features and overall reliability.

| Class W, n | Class NW, n | γ | Features | R = TPR · TNR (%) |
|---|---|---|---|---|
| {W}, 269 | {I,E}, 2574 | 2.0 | {10MFCC} | 99.62 |
| {W}, 269 | {S}, 245 | 2.4 | {13MFCC} | 99.61 |
| {W}, 269 | {L}, 99 | 2.6 | {12MFCC} | 98.83 |
| {W}, 269 | {I,E,S,L}, 2918 | 4.2 | {14MFCC} | 98.71 |
| {W}, 269 | {I,E,S,L,U}, 5306 | 5.0 | {12MFCC} | 97.68 |

[Fig. 5 block diagram: phonopneumogram loading → high-pass filter (f_low = 100 Hz) → segmentation and feature extraction, x(i) = {Features0}, i = i + 1 → classification with SVM model M0 (class wheeze0 = W0, class non-wheeze0 = NW0, kernel = RBF, gamma0, Features0, training set = 20%) → digital threshold detection → AWDR (automatic wheeze detection result).]

Fig. 5. Block scheme of the AWDR process with the SVM classifier and binary detection threshold. The feature set Features0 is composed of 12 MFCC. The model M0 is formed using an RBF kernel with γ = 5.0 and with 20% of the randomly selected samples belonging to the class W0 = {W} and 20% of the randomly selected samples belonging to the class NW0 = {I,E,S,L,U} using 12 MFCC as features (see the last row in Table 2).

Table 3
Specifications for cascade SVM models, where n is the total number of samples in the class.

| Model | M1 | M2 |
|---|---|---|
| Class Wx, n | W1 = {W}, 269 | W2 = {W}, 269 |
| Class NWx, n | NW1 = {I,E,S,L,L′,U}, 5383 | NW2 = {L,L′}, 176 |
| Features | Features1 = {12MFCC} | Features2 = {12MFCC} |
| gamma | gamma1 = 5.0 | gamma2 = 2.0 |
| Training set | 20% | 20% |

Remark: Model M2 was formed using 20% of the randomly selected samples of class W2 and 20% of the randomly selected samples of class NW2. The best classification results were obtained for 12 MFCC and γ = 2.0. The overall reliability was R = 94.68%.

Table 4
The specifications of the new M1 model.

| Model | M1 | M2 |
|---|---|---|
| Class Wx, n | W1 = {W}, 269 | W2 = {W}, 269 |
| Class NWx, n | NW1 = {I,E,S,L,L′,U}, 5383 | NW2 = {L,L′}, 176 |
| Features | Features1 = {12MFCC,K,E} | Features2 = {12MFCC} |
| gamma | gamma1 = 5.0 | gamma2 = 2.0 |
| Training set | 20% | 20% |

approval, respectively. The label of the phonopneumogram is specified in the bottom left corner (P94, P77, P117, P82, P96, P105). In the spectrogram P94, three wheezes marked with W1, W2 and W3 can be clearly seen, and the signal segments belonging to these wheezes are correctly detected, which is indicated by a binary function equal to one at the times at which wheezes appear in figure parts (b) and (c). The same can be concluded for the wheezes labeled with W1, W2, W3 and W4 in the phonopneumogram P77. Additionally, in spectrogram P117, which contains six wheezes (W1–W6) and five inspiratory stridors (L1 through L5), all distinctive signals were correctly detected. Phonopneumogram P82 contains three snores and one wheeze (between the 2nd and 3rd snore). A comparison of the spectrogram P82 (figure part a) and the SVM classification results (figure part b) reveals that the wheeze was not detected along the entire duration.

The first two thirds of the wheeze duration were not correctly detected (false negative signal segments) and only the last third was correctly detected (true positive signal segments). Additionally, incorrectly classified non-wheeze signal segments in P82 belonging to snores (Fig. 6b) were eliminated (Fig. 6c) using a suitable digital detection threshold value (n = 10, k = 5). It is worth observing that all signal sequences in between marked sequences in P94, P97, P117 and P82 do not belong to the training set or validation set.

The phonopneumogram labeled with P96 (in which all signal sequences belong to sub-classes I and E) represents the breathing noise without artifacts and with visually different areas belonging to inspiration and expiration. The classification reliability is 100% for this breathing record. In P105 (all signal sequences belong to sub-class U), there are artifacts in the respiratory sound caused by the movement of the transducer, and an inspiratory stridor was also present (marked with L1 and L2). All signal sequences in P105 were correctly detected. In 19 of 21 phonopneumograms that were in the training set, all of the signal sequences were correctly classified after the digital detection threshold's final decision. However, the algorithm results for phonopneumograms labeled as P99 and P126 were not correct. The algorithm detected four false positive signal sequences that the physician had marked as inspiratory stridors (Fig. 7). In addition, it is interesting that an attempt to overfit the training data did not improve the classification result. For this purpose, additional training was included with 77 new inspiratory stridor segments (L3, L4 in P99, and L1, L2 in P126). These additional signal segments were acquired by using a 90% overlapping window instead of a 50% overlapping window (P99 and P126 belonging to sub-class U). They formed a new sub-class labeled with L′. After rebuilding the model M0 with 20% of the randomly selected samples from class W0 = {W} and 20% of the randomly selected samples from class NW0 = {I,E,S,L,L′,U}, these inspiratory stridor sequences were misclassified again, that is, the inspiratory stridors were again classified as wheezes. Therefore, two SVM cascade models were introduced, the block scheme of which is shown in Fig. 3.

Table 5
The comparison of the overall reliability values obtained with the old descriptor set and the newly introduced set.

| Class W, n | Class NW, n | γ | Features | R = TPR · TNR (%) |
|---|---|---|---|---|
| {W}, 269 | {I,E,S,L,L′,U}, 5383 | 5.0 | {12MFCC} | 96.71 |
| {W}, 269 | {I,E,S,L,L′,U}, 5383 | 5.0 | {12MFCC,K,E} | 97.36 |

The specified cascade structure classified all signal sequences from phonopneumograms P99 and P126 correctly, along with the other 18 of 21 phonopneumograms. However, the wheeze signal sequence W1 in P82 was not correctly classified. The results are shown in Fig. 8, in which part (e) represents the automatic wheeze detection result (AWDR). It can be observed that no wheezes were detected, even though one wheeze exists. The reason is related to unsuitable M1 model specifications (the SVM classification result for model M1 is shown in Fig. 8b) (Table 3).
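The two decision stages discussed in this section, the cascade of M1 and M2 followed by the digital detection threshold with n = 10 and k = 5, can be sketched as below. Both rules are assumptions made for illustration only: the combination of M1 and M2 is defined by Fig. 3 and is taken here to be a logical AND, and the threshold is taken to keep a positive segment only when some window of n consecutive segments containing it holds at least k positives. The function names and the bit sequences are hypothetical.

```python
def cascade_decision(m1_bits, m2_bits):
    """Combine per-segment outputs of models M1 and M2.

    ASSUMPTION: a segment counts as wheeze only if both models say so
    (logical AND); the actual rule is the one defined by Fig. 3.
    """
    return [a & b for a, b in zip(m1_bits, m2_bits)]


def digital_threshold(bits, n=10, k=5):
    """Keep a positive segment only if some window of n consecutive
    segments containing it holds at least k positives (ASSUMED rule)."""
    out = [0] * len(bits)
    last_start = max(0, len(bits) - n)
    for start in range(last_start + 1):
        window = bits[start:start + n]
        if sum(window) >= k:
            for j, bit in enumerate(window, start):
                if bit:
                    out[j] = 1
    return out


# Hypothetical per-segment outputs for a 20-segment recording.
m1 = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
m2 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0]
awdr = digital_threshold(cascade_decision(m1, m2), n=10, k=5)
```

In this toy input the isolated positive at segment 1 is suppressed, while the run of six positives at segments 12 to 17 survives, which mirrors how isolated false positives from snores were removed in P82.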

Therefore, additional features were added to the descriptor set. These had to take into account the consequences of the presence of respiratory noise, such as the energy distribution in the Mel filter bank (K, kurtosis) or measures of disorder (E, Rényi entropy, α = 2). The new M1 model specifications are presented in Table 4, whereas model M2 is unchanged. The algorithm consisting of the described models classified all wheezes correctly. The comparison of the overall reliability values obtained with the old descriptor set and the newly introduced set is shown in Table 5. The overall reliability increased to 97.36%, which was sufficient to discriminate all 17 wheezes from non-wheeze sequences.

The final classification results for critical phonopneumograms from the validation set (P82, P99 and P126) are presented in Fig. 9.

Fig. 6. The results of automatic wheeze detection for different phonopneumograms. (a) 2D spectrogram; (b) SVM classification result; (c) digital detection threshold's final decision. The binary function is assigned to one if the signal segment is recognized as wheeze.

Additionally, it is important to emphasize that using the two-level coarse-to-fine classification algorithm presented in Fig. 3 and models M1 and M2 presented in Table 4, all signals in 24 phonopneumograms from the testing set (all of which are non-wheeze signals with artifacts) were correctly detected.
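The two descriptors added to Features1 in Table 4, kurtosis (K) and Rényi entropy (E), can be computed as in the following sketch. It assumes that both are evaluated over the mel filter-bank energies of one signal segment, that kurtosis is the plain (non-excess) fourth standardized moment, and that the entropy order is 2; the function names and the sample energy vectors are hypothetical.

```python
from math import log2


def kurtosis(values):
    """Fourth standardized moment m4 / m2**2 (non-excess; ASSUMED variant).

    Undefined (division by zero) when all values are equal.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


def renyi_entropy(energies, alpha=2.0):
    """Renyi entropy (in bits) of the normalised energy distribution; alpha != 1."""
    total = sum(energies)
    probs = [e / total for e in energies]
    return log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


# Hypothetical filter-bank energies: flat (noise-like) versus peaked (tonal).
flat = [1.0] * 8
peaked = [0.0] * 7 + [1.0]
entropy_flat = renyi_entropy(flat)      # energy spread over all 8 bands
entropy_peaked = renyi_entropy(peaked)  # energy concentrated in one band
```

A tonal segment such as a wheeze concentrates energy in a few bands and therefore yields a low entropy, whereas broadband respiratory noise yields a value close to the maximum of log2 of the number of bands.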

Fig. 6. (Continued).

4.2. Reliability of wheezing recognition and an observation related to (un)desirable characteristics of wheezing features

Based on the results presented in the previous section, the question related to the training and testing set quality emerged because there was no wheezing in the testing set. To verify the quality of the acquired data, a set of experiments was performed, which resulted in an interesting observation related to the distribution of the central frequencies that characterize the wheezes from the training set. It is worth mentioning that MFCC were calculated for
the frequency range between 100 Hz and 1000 Hz, which is a typical frequency range in which wheezing occurs. Additionally, each single wheeze is characterized by its own central frequency. These central frequencies were calculated for monophonic wheezes (wheezes with a basic harmonic or with multiples of the basic harmonic) and polyphonic wheezes (wheezes without harmonic connections between dominant spectral components) as well. The central frequency was obtained from the dominant spectral component as the arithmetic mean of the upper and lower cut-off frequencies. Central frequencies of all wheezes and the number of wheezes inside the frequency bandwidth of 20 Hz are shown in Table 6. The wheezes were recorded in five phonopneumograms (P76, P77, P82, P94, and P117) that belonged to four children (C5, C6, C8, and C10).

Fig. 7. The results of automatic wheeze detection for P99 and P126. (a) 2D spectrogram; (b) SVM classification result; (c) digital detection threshold's final decision. The binary function is assigned to one if the signal segment is recognized as wheeze.

Fig. 8. (a) 2D spectrogram of phonopneumogram P82; (b) SVM classification result for model M1; (c) SVM classification result for model M2; (d) cascade SVM classification result; (e) final AWDR after the digital detection threshold's approval.

Fig. 9. The final results for critical phonopneumograms P82, P99 and P126.

Table 6
Central frequencies of wheezes inside the frequency bandwidth of 20 Hz; f (Hz): central wheeze frequency ±10 Hz; Δf (Hz): the difference between neighboring central wheeze frequencies; W: wheeze; P: phonopneumogram; C: children.

The leave-one-out cross validation (LOOCV) method was used for testing the wheeze detection algorithm. Therefore, 17 different SVM models M0i were formed, where i = 1, 2, 3, ..., 17 is the wheeze index number. The data set W = {W1, W2, W3, ..., W17} represents all 17 wheezes. Each model M0i was validated using the random subsampling method (20% for training and 80% for validation with N = 20 iterations) from the classes W0i = {W\Wi} and NW0 = {I,E,S,L,L′,U} to determine the optimal number of MFCCi and the optimal γi. Thereafter, model M0i was created using 20% of the randomly selected signal segments from class W0i = {W\Wi} and 20% of the randomly selected signal segments from class NW0 = {I,E,S,L,L′,U} with the appropriate MFCCi and γi (Features0i = {MFCCi,Ki,Ei}). The algorithm whose block scheme is presented in Fig. 5 tested the possibility of detecting each wheeze separately using each of the models. Additionally, the digital detection threshold was defined with n = 10, k = 5.
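The leave-one-out scheme over wheezes described above can be sketched as follows. Note that a whole wheeze (all of its segments) is held out in each fold, not a single segment. The generator name `leave_one_wheeze_out` and the placeholder segment identifiers are hypothetical; in the actual procedure each fold would additionally draw 20% of the remaining wheeze segments and 20% of the non-wheeze segments to build model M0i.

```python
def leave_one_wheeze_out(segments_by_wheeze):
    """Yield (held_out_id, training_segments, test_segments) for each wheeze."""
    for held_out in segments_by_wheeze:
        training = [seg
                    for wid, segs in segments_by_wheeze.items() if wid != held_out
                    for seg in segs]
        yield held_out, training, list(segments_by_wheeze[held_out])


# Hypothetical data: 17 wheezes, each with three placeholder segments.
wheezes = {f"W{i}": [f"W{i}_seg{j}" for j in range(3)] for i in range(1, 18)}
folds = list(leave_one_wheeze_out(wheezes))
```

Each of the 17 folds corresponds to one model M0i: its training pool is W\Wi, and the held-out segments of Wi are the ones on which detection is then tested.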

Experiments showed that wheeze W1, with a central frequency of 260 Hz, was not detected by model M01, whereas wheeze W17, with a central frequency of 600 Hz, was not detected by model M017. It can be observed from Table 6 that the difference between the central frequencies of the tested wheeze W1 and its closest neighbor from the training set, W2, was 100 Hz, whereas the difference between the central frequencies of the tested wheeze W17 and its closest neighbor from the training set, W16, was 140 Hz. Using all other possible combinations of wheeze classes for LOOCV training and testing, all wheezes were detected correctly (models M02, ..., M016). In these cases, as Table 6 shows, the maximal difference between the central frequencies of the tested wheeze and its closest neighbor from the training set was obtained by model M016 and was f(W16) − f(W15) = 40 Hz. This observation led to a new experiment with the aim of testing the influence of the difference between the central frequencies of the tested wheeze and the closest neighboring wheeze in the training set on classifier performance. The experiment was designed to eliminate wheeze segments from sequence W15, with a central frequency of 420 Hz, and wheeze segments from sequence W16, with a central frequency of 460 Hz, from the training set, thus making it possible to determine more information regarding the maximal admissible difference between the central frequencies of the tested wheeze and its closest neighbor from the training set. Specifically, the experiment was performed with 20% of the randomly selected signal segments from class W0T = {W\W15\W16} and 20% of the randomly selected signal segments from class NW0 = {I,E,S,L,L′,U}, which formed the model M0T (leave-two-out). The experimental results showed that all wheezes were detected correctly except wheeze W16, thus leading to the conclusion that the maximal difference between the central frequencies of the tested wheeze, f(W16) = 460 Hz, and its closest neighbor from the training set, f(W14) = 400 Hz, should be less than 60 Hz. This supports the conclusion from the first experiments, in which all wheezes whose central frequency differed from that of their closest neighbor in the training set by no more than 40 Hz were detected correctly. Nevertheless, the number of wheezes within the specified frequency range is not critical at all. It is worth mentioning that this result was obtained using 12 MFCC in the frequency range between 100 Hz and 1000 Hz, which corresponds to the appropriate density of mel filters (the width of each is ≈130 mel). For different mel filter densities, the 40 Hz limit could be different.
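The nearest-neighbor argument and the quoted filter width can be checked numerically with the sketch below. It uses only the central frequencies quoted in the text; W2 = 360 Hz is inferred from the stated 100 Hz gap to W1 under the assumption that W2 lies above W1. The mel formula and the layout of 12 half-overlapping triangular filters between 100 Hz and 1000 Hz are assumptions as well, since the filter-bank design is not specified here, and all function names are hypothetical.

```python
from math import log10


def nearest_gap(test_freq_hz, train_freqs_hz):
    """Distance (Hz) from a tested wheeze to its closest training wheeze."""
    return min(abs(test_freq_hz - f) for f in train_freqs_hz)


def hz_to_mel(f_hz):
    """Common mel-scale formula (ASSUMED; the paper's variant is not stated)."""
    return 2595.0 * log10(1.0 + f_hz / 700.0)


def mel_filter_width(f_low_hz, f_high_hz, n_filters):
    """Base width of one triangular filter when n_filters overlap by half."""
    step = (hz_to_mel(f_high_hz) - hz_to_mel(f_low_hz)) / (n_filters + 1)
    return 2.0 * step


# Central frequencies quoted in the text (Hz); W2 is inferred (see lead-in).
freq = {"W1": 260, "W2": 360, "W14": 400, "W15": 420, "W16": 460, "W17": 600}


def gap_without(test_id, removed):
    """Gap for `test_id` when the wheezes in `removed` are not in training."""
    train = [f for w, f in freq.items() if w not in removed]
    return nearest_gap(freq[test_id], train)


gaps = {
    "LOOCV W1": gap_without("W1", {"W1"}),                     # missed
    "LOOCV W16": gap_without("W16", {"W16"}),                  # detected
    "LOOCV W17": gap_without("W17", {"W17"}),                  # missed
    "leave-two-out W16": gap_without("W16", {"W15", "W16"}),   # missed
}
width_mel = mel_filter_width(100.0, 1000.0, 12)
```

Under these assumptions the gaps reproduce the figures in the text (100, 40, 140 and 60 Hz), and the filter width comes out close to the quoted ≈130 mel, which suggests that the 40 Hz limit is tied to this particular filter density.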

5. Discussion

This paper shows that the algorithm based on the SVM classifier with RBF kernel, with appropriate values chosen for the parameter γ and the number of MFCC, can achieve a high overall reliability (R = 97.68%) of wheezing recognition in children's respiratory sounds that contain artifacts. The classification errors (both false positive and false negative signal segments) are not evenly distributed but are grouped around signal segments that belong to different classes and have similar spectral characteristics. Therefore, there are some phonopneumograms for which the applied wheezing detection procedure produces errors after the digital detection threshold makes the final decision, causing wrong classification of wheeze and non-wheeze signal parts. Namely, false positive distinctive signal sequences were detected in phonopneumograms P99 and P126, whereas in P82, only the last third of the wheeze was detected correctly.

Retraining the model with additional typical signal segments (i.e., overfitting) decreased the overall reliability (1st row in Table 5 versus 5th row in Table 2). This means that the new discriminant classified the new segments incorrectly. Changing the descriptor set by adding the new features (kurtosis of the MFCC and entropy) caused the overall reliability to increase but introduced errors different from those obtained with the initial descriptor set. Now the first half of the wheeze segments in P82 was declared as false negative, but the second half
Table 7
List of signal features.

| Symbol | Meaning |
|---|---|
| MFCC | Mel frequency cepstral coefficients |
| W | Wheeze |
| NW | Non-wheeze |
| I | Inspiration |
| E | Expiration |
| L | Inspiratory stridor |
| L′ | Inspiratory stridor |
| S | Snores |
| U | Unclassified non-wheeze |
| RBF | Radial basis kernel functions |
| ACC | Overall accuracy |
| R | Overall reliability |
| TPR | True Positive Rate |
| TNR | True Negative Rate |
| TP | True Positive |
| FP | False Positive |
| TN | True Negative |
| FN | False Negative |
| γ | Gamma |
| n | Number of signal segments in the classes |
| K | Kurtosis (energy distributions in mel filter bank) |
| E | Rényi entropy (measure for disorder) |
| P | Phonopneumogram |
| C | Children |
| AWDR | Automatic wheeze detection results |

was detected correctly (compare AWDR of P82 in Figs. 6 and 9). Additionally, false positive distinctive signal sequences from P99 and P126 were not removed. Using the ordinary SVM and RBF kernel with optimal gamma and number of MFCC thus gave no discriminant function that would correctly classify all signal sequences belonging to different classes (W and NW). In contrast, introducing the sub-class discriminant, achieved with an SVM cascade consisting of two SVMs with different gamma values (gamma1 = 5 and gamma2 = 2), resulted in a better discriminant between different classes. The final improvement with the digital detection threshold resulted in 100% reliability for all acquired respiratory sounds with significant artifacts. Certainly, a higher number of signal segments belonging to the inspiratory stridor (non-asthma wheezes with spectral characteristics similar to asthma wheezes) would ruin the obtained results, and additional attention would be needed on the cascade structure (Table 7).

6. Conclusions and further research direction

The proposed two-level algorithm for wheezing detection in children's respiratory sounds with artifacts can discriminate wheeze signals from non-wheeze signals more effectively than an ordinary SVM classifier with optimized parameters. SVM models from the cascade structure are not generated by randomly chosen parameter values. The properties of the cascade are deliberately and systematically changed to improve the classification results and are tested on the new data from 24 phonopneumograms with negative samples. Additionally, it has been experimentally verified that a necessary condition for successful wheeze detection based on MFCC features is that the difference between the central frequencies of the tested wheeze and its closest neighbor from the training set should be limited to 40 Hz. For a difference of 60 Hz or more, the wheeze was misclassified in our experiments. This observation is valuable for research in which recognition is based on MFCC features and where the quality of the training set is questionable, and it is a subject of our future research.

Therefore, the subject of future research is to compare the efficiency of the proposed cascade structure with a "classic" ensemble structure such as random forest and, by acquiring more respiratory sounds, to confirm the emerged property related to the central frequency distribution of wheezing in the training set.

References

[1] R.T.H. Laennec, in: J.-A. Brosson, J.-S. Chaudé (Eds.), De l'auscultation médiate, ou, Traité du diagnostic des, 1819, Paris.

[2] Aerocrine [Online], Niox Vero, 4 06, 2015–04 06, 2015, 2015, 〈http://www.niox.com/en/about-niox-mino/about-niox-vero/〉.

[3] S. Le Cam, et al., Wheezing sounds detection using multivariate generalized gaussian, in: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2009, Taipei, Taiwan, 2009.

[4] S. Emrani, T. Gentimis, H. Krim, Persistent homology of delay embeddings and its application to wheeze detection, IEEE Signal Process. Lett. 21 (4) (2014) 459–463.

[5] M. Bahoura, Pattern recognition methods applied to respiratory sounds, Comput. Biol. Med. 39 (9) (2009) 824–843.

[6] K.E. Forkheim, D. Scuse, H. Pasterkamp, A comparison of neural network, in: IEEE WESCANEX 95 Proceedings, New York, NY, 1995.

[7] S. Rietveld, M. Oud, E.H. Dooijes, Classification of asthmatic breath sounds: preliminary results of the classifying capacity of human examiners versus artificial neural networks, Comput. Biomed. Eng. 32 (4) (1999) 440–448.

[8] L.R. Waitman, K.P. Clarkson, J.A. Barwise, P.H. King, Representation and classification of breath sounds recorded in an intensive care setting using neural networks, J. Clin. Monit. Comput. 16 (2000) 95–105.

[9] I. Güler, H. Polat, U. Ergün, Combining neural network and genetic algorithm for prediction of lung sounds, J. Med. Syst. 28 (3) (2005) 217–231.

[10] B. Sankur, et al., Comparison of AR-based algorithms for respiratory sound classification, Comput. Biol. Med. 24 (1994) 67–76.

[11] L. Pesu, et al., Classification of respiratory sounds based on wavelet packet decomposition and learning vector quantization, Technol. Health Care 6 (1) (2004) 65–74.

[12] A. Kandaswamy, et al., Neural classification of lung sounds using wavelet coefficients, Comput. Biol. Med. 34 (2004) 523–537.

[13] F. Jin, F. Sattar, D.Y.T. Goh, New approaches for spectro-temporal feature extraction with applications to respiratory sound classification, Neurocomputing 123 (2014) 362–371, ISSN: 0925-2312.

[14] Baiying Lei, Shah Atiqur Rahman, Insu Song, Content-based classification of breath sound with enhanced features, Neurocomputing 141 (2) (2014) 139–147, ISSN: 0925-2312.

[15] M. Elhilali, et al., A multiresolution analysis for detection of abnormal lung sounds, in: 34th Annual International Conference of the IEEE EMBS, San Diego, 2012.

[16] M. Molaie, et al., A chaotic viewpoint on noise reduction from respiratory sounds, Biomed. Signal Process. Control 10 (2014), ISSN: 1746-8094.

[17] R. Palaniappan, K. Sundaraj, S. Sundaraj, A comparative study of the svm and k-nn machine learning algorithms for the diagnosis of respiratory pathologies using pulmonary acoustic signals, BMC Bioinf. 15 (2014) 223.

[18] M.C. Prosperi, et al., Predicting phenotypes of asthma and eczema with machine learning, BMC Med. Genomics 7 (57) (2014).

[19] N. Emanet, et al., A comparative analysis of machine learning methods for classification type decision problems in healthcare, Decis. Anal. 6 (2014) 1 (SpringerOpen Journal).

[20] R. Riella, et al., Automatic wheezing recognition in recorded lung sounds, in: Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2003, pp. 2535–2538.

[21] Jen-Chien Chien, et al., Wheeze detection using cepstral analysis in Gaussian mixture models, in: 29th Annual International Conference Engineering in Medicine and Biology Society, 2007, pp. 3168–3171.

[22] L. Bor-Shing, et al., Wheeze recognition based on 2D bilateral filtering of spectrogram, Biomed. Eng. Appl. Basis Commun. 18 (2006) 29–38.

[23] M. Bahoura, C. Pelletier, New parameters for respiratory sound classification, in: Canadian Conference on Electrical and Computer Engineering, 2003, IEEE CCECE 2003, Montreal, 2003.

[24] M. Bahoura, C. Pelletier, Respiratory sounds classification using cepstral analysis and Gaussian mixture models, in: 26th Annual Conference of the IEEE EMBS, San Francisco, 2004.

[25] M. Bahoura, C. Pelletier, Respiratory sounds classification using Gaussian mixture models, in: Canadian Conference on Electrical and Computer Engineering, IEEE CCECE 2004, Niagara Falls, 2004.

[26] Olivier Chapelle, Vladimir Vapnik, Olivier Bousquet, Sayan Mukherjee, Choosing multiple parameters for support vector machines, Mach. Learn. 46 (2002).

[27] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press, New York, NY, 2000.

[28] F. Dalmay, et al., Acoustic properties of the normal chest, Eur. Respir. J. 8 (1995) 1761–1769 (Series 'Chest Physical Examination').

[29] W. Diane, Heart & Lung Sounds Reference Library, PESI Health Care, Product Code: RNA007940 ISBN: 0984525483.

[30] Z. Dokur, T. Olmez, Classification of respiratory sounds by using an artificial neural network, Int. J. Pattern Recognit. Artif. Intell. 17 (4) (2003) 567–580.

[31] A. Gurung, C.G. Scrafford, J.M. Tielsch, O.S. Levine, W. Checkley, Computerized lung sound analysis as diagnostic aid for the detection of abnormal lung sounds: A systematic review and meta-analysis, Respir. Med. 105 (9) (2011) 1396–1403, http://dx.doi.org/10.1016/j.rmed.2011.05.007

[32] B.-S. Lin, T.-S. Yen, An FPGA-based rapid wheezing detection system, Int. J. Environ. Res. Public Health 11 (2) (2014) 1573–1593.

[33] Murty M. Narasimha, Devi V. Susheela, Pattern Recognition An Algorithmic Approach, Springer-Verlag, London, 2011.

[34] Sonea Air, iSone Ltd., 2014, 〈http://isonea.com/〉 [Online] 9 25, 2014. 9 24, 2014. http://isonea.com/.

[35] M. Wisniewski, T.P. Zielinski, Application of tonal index to pulmonary wheezes detection in asthma monitoring, in: 19th European Signal Processing Conference (EUSIPCO 2011), Barcelona, Spain, 2011.

[36] X. Lu, M. Bahoura, An integrated automated system for crackles extraction and classification, Biomed. Signal Process. Control 3 (3) (2008), ISSN: 1746-8094.