feature extraction from heart sound 31-12-2010
TRANSCRIPT
PSZ 19 :16 (Pind. 1/97)
UNIVERSITI TEKNOLOGI MALAYSIA
BORANG PENGESAHAN STATUS TESIS***
JUDUL: FEATURES EXTRACTION OF HEART SOUNDS USING TIME-FREQUENCY
DISTRIBUTION AND MEL-FREQUENCY CEPSTRUM COEFFICIENT
SESI PENGAJIAN : 2005/2006
Saya MASNANI BT MOHAMED_________ (HURUF BESAR)
mengaku membenarkan tesis (PSM/Sarjana/Doktor Falsafah)* ini disimpan di Perpustakaan Universiti
Teknologi Malaysia dengan syarat-syarat kegunaannya seperti berikut:
1. Tesis adalah hak milik Universiti Teknologi Malaysia.
2. Perpustakaan Universiti Teknologi Malaysia dibenarkan membuat salinan untuk tujuan pengajian
sahaja.
3. Perpustakaan dibenarkan membuat salinan tesis ini sebagai bahan pertukaran antara institusi
pengajian tinggi.
4. * * Sila tandakan ( ���� )
Disahkan oleh
_____________________________ ______________________________
(TANDATANGAN PENULIS) (TANDATANGAN PENYELIA)
Alamat Tetap:
LOT 530, KAMPUNG TENDONG, PROF. IR. DR. SHEIKH HUSSAIN
17030 PASIR MAS, BIN SHAIKH SALLEH__________
KELANTAN Nama Penyelia
Tarikh : 2 MAY 2006 __ Tarikh : 2 MAY 2006__________
CATATAN : * Potong yang tidak berkenaan
** Jika tesis ini SULIT atau TERHAD, sila lampirkan surat daripada pihak
berkuasa/organisasi berkenaan dengan menyatakan sekali sebab dan tempoh
tesis ini perlu dikelaskan sebagai SULIT atau TERHAD.
*** Tesis dimaksudkan sebagai tesis bagi Ijazah Doktor Falsafah dan Sarjana secara
penyelidikan, atau disertai bagi pengajian secara kerja kursus atau
penyelidikan, atau Laporan Projek Sarjana Muda (PSM).
����
SULIT
TERHAD
TIDAK TERHAD
(Mengandungi maklumat yang berdarjah keselamatan
atau kepentingan Malaysia seperti yang termaktub di
dalam AKTA RAHSIA RASMI 1972)
(Mengandungi maklumat terhad yang telah ditentukan
oleh organisasi / badan di mana penyelidikan dijalankan)
“I/We* hereby declare that I/we* have read this thesis and in my/our* opinion this
thesis is sufficient in terms of scope and quality for the award of the degree of
Master of Engineering (Electrical-Electronic & Telecommunication)”
Signature : ....................................................................
Name of Supervisor : Prof. Ir. Dr. Sheikh Hussain Shaikh Salleh
Date : 2 May 2006
FEATURES EXTRACTION OF HEART SOUNDS USING TIME-
FREQUENCY DISTRIBUTION AND MEL-FREQUENCY CEPSTRUM
COEFFICIENT
MASNANI BT MOHAMED
A thesis submitted in fulfillment of the
requirements for the award of the degree of
Master of Engineering (Electrical – Electronic & Telecommunication)
Faculty of Electrical Engineering
Universiti Teknologi Malaysia
MAY 2006
ii
I declare that this thesis entitled “Features Extraction of Heart Sounds using Time-
Frequency Distribution and Mel-Frequency Cepstrum Coefficient ” is the result of
my own research except as cited in the references. The thesis has not been accepted
for any degree and is not concurrently submitted in candidature of any other degree.
Signature : ………………………..
Name : Masnani Bt. Mohamed
Date : 2 May 2006
iii
To my beloved parents, Mohamed B. Shafie and Siti Zaharah Bt. Othman, my lovely
brothers and sisters
iv
ACKNOWLEDGEMENTS
I would like to thank my supervisor, Prof. Ir. Dr. Sheikh Hussain B. Shaikh
Salleh for giving me the opportunity to work on this project and always giving me
lots of good advices. I would also like to thank En. Kamarulafizam B. Ismail for
giving me the opportunity to cooperate with him. He has been a great source of
information and his invaluable guidance and timely input has helped me throughout
my Master’s project. His good guidance has given me an exposure to learn a lot of
new things.
A great thank to Mr. Asasul Islam for providing me the heart sounds data for
my analysis. I would also like to thank all my colleagues in UTM & KUITTHO for
their supports and helping me to complete this project successfully. Last but not
least, a special thanks to my family especially my beloved parents, brothers and
sisters who have supported me all through their life and provided constant
encouragement.
v
ABSTRACT
Heart sounds analysis can provide lots of information about heart condition
whether it is normal or abnormal. Heart sounds signals are time-varying signals
where they exhibit some degree of non-stationary. Due to these characteristics,
therefore, two techniques have been proposed to analyze them. The first technique is
the Time-Frequency Distribution using B-Distribution, used to resolve signal’s
components in the time-frequency domain and specifies the frequency components
of the signal that changing over time. Another proposed technique is the Mel-
Frequency Cepstrum Coefficient, used to obtain the cepstrums coefficients by
resolving signal’s components in the frequency domain. An experiment is presented
to extract features of heart sounds using both mentioned techniques and compare
their performances. Both techniques are discussed in details and tested against ideal
simulations of 50 heart sound signals including normal and abnormal signals. All
simulations are done using Matlab software except for MFCC where it has used the
Microsoft Visual C++ software. A brief description of SVD is included to the
technique using time-frequency distribution. Also, a brief description of Neural
Network is used to verify and to compare the performances results of the two
techniques with regard to the values of hidden node, learning rate and momentum
coefficient. The results showed that performance of the TFD can be achieved up to
90% whereas MFCC is only 80%. Therefore, the TFD technique is chosen as the
best technique to analyze and to extract features of the non-stationary signals such as
the heart sounds signals.
vi
ABSTRAK
Analysis degupan jantung dapat memberikan banyak maklumat tentang
keadaan jantung sama ada ia normal atau tidak. Isyarat degupan jantung sentiasa
berubah-ubah, menunjukkan bahawa ia adalah isyarat yang tidak pegun. Disebabkan
oleh ciri-ciri tersebut, maka dua teknik khas telah disarankan untuk menganalisanya.
Teknik yang pertama adalah menggunakan taburan masa-frekuensi (TFD) dengan
jenis taburan-B (B-Distribution) untuk merungkaikan komponen-komponen isyarat
dalam domain masa-frekuensi. Satu lagi teknik yang disarankan adalah mengunakan
pekali Mel-Frekuensi Sepstrum (MFCC) bagi mendapatkan pekali sepstrum dengan
merungkaikan komponen-komponen isyarat dalam domain frekuensi. Satu
eksperimen telah dilakukan bagi mengekstrak ciri-ciri yang ada pada bunyi degupan
jantung menggunakan dua teknik tersebut dan membandingkan tahap pencapaian
yang diperolehi. Kedua-dua teknik telah dibincangkan dengan terperinci dan telah
diuji dengan mensimulasi sebanyak 50 isyarat degupan jantung yang terdiri daripada
isyarat normal dan abnormal. Kesemua teknik simulasi tersebut telah dilakukan
menggunakan perisian Matlab kecuali MFCC menggunakan perisian Microsoft
Visual C++. Terdapat penerangan ringkas tentang penguraian nilai tunggal (SVD)
yang digunakan bersama teknik TFD. Juga disertakan huraian mengenai rangkaian
saraf tiruan (ANN) untuk menentukan pencapaian kedua-dua teknik tersebut
berdasarkan kepada jumlah lapisan tersembunyi, kadar latihan dan kadar momentum.
Keputusan telah menunjukkan bahawa pencapaian teknik TFD telah mencecah 90%
manakala teknik MFCC pula hanya 80%. Jadi, teknik TFD merupakan teknik yang
terbaik untuk menganalisa dan mengekstrak ciri-ciri yang ada pada isyarat yang tidak
pegun seperti isyarat degupan jantung.
vii
TABLE OF CONTENTS
CHAPTER TITLE PAGE
DECLARATION ii
DEDICATION iii
ACKNOWLEDGEMENTS iv
ABSTRACT v
ABSTRAK vi
TABLE OF CONTENTS vii
LIST OF FIGURES xi
LIST OF TABLE xiii
LIST OF ABBREVIATIONS xiv
LIST OF SYMBOLS xvi
1 INTRODUCTION 1
1.1 Project Background 1
1.2 Project Objectives 2
1.3 Scope of Work 2
1.4 Heart Sounds 3
1.5 Time-Frequency Distributions 5
1.6 Mel-Frequency Cepstrum Coefficient 6
1.7 Thesis Outline 8
viii
2 LITERATURE REVIEW 8
2.1 Time-Frequency Representations/Distributions 8
2.2 Cross-terms Elimination (Signal-to-Interference Ratio) 11
2.3 B-Distribution 12
2.4 Singular Value Decomposition 14
2.5 Mel-Frequency Cepstrum Coefficient 14
2.6 Neural Network 16
3 TIME-FREQUENCY DISTRIBUTION TECHNIQUE 18
3.1 Introduction 18
3.1.1 Time-Frequency Transforms 19
3.2 General Signal Representations 20
3.3 General Form of Time-Frequency Distributions 21
3.3.1 Properties of the Cohen Class of Distributions 22
3.3.2 Reduced Interference Distributions 24
3.3.2.1 The B-Distribution 25
3.3.2.2 The B-Distribution Kernel 26
3.3.2.3 The B-Distribution Properties 27
3.4 Singular Value Decomposition 28
3.4.1 Theories 29
3.4.2 Define Items 30
3.4.3 Formula and Criteria 30
3.4.4 Benefits of using SVD 31
4 MEL-FREQUENCY CEPSTRUM COEFFICIENT
TECHNIQUE 33
4.1 Introduction 33
4.1.1 Discrete Fourier Transform 34
4.1.2 Convolution 36
4.2 Short-Term Analysis 37
4.3 Cepstrum 38
4.4 Mel-Frequency Cepstrum Coefficient 40
ix
5 NEURAL NETWORK 43
5.1 Introduction 43
5.2 The Multilayer Perceptron 44
5.3 The Logistic Activation (Sigmoid) Function 45
5.4 The Backpropagation (BP) Algorithm 46
5.4.1 Learning Mode 47
5.4.2 The Generalized Delta Rule (G.D.R) 47
5.5 Summary of the BP Algorithm 49
5.6 Parameters for Neural Network 50
6 METHODOLOGY 52
6.1 Introduction 52
6.2 Methodology 53
6.2.1 Time-Frequency Distribution Technique 54
6.2.2 SVD Implementation 57
6.2.3 Mel-Frequency Cepstrum Coefficient 60
6.2.4 Verification Technique 63
7 RESULTS AND DISCUSSION 65
7.1 Introduction 65
7.2 Time-Frequency Distribution 66
7.2.1 B-Distribution of Normal Heart Sounds 66
7.2.2 B-Distribution of Abnormal Heart Sounds 68
7.2.3 SVD of Normal Heart Sounds 70
7.2.4 SVD of Abnormal Hart Sounds 71
7.3 Mel-Frequency Cepstrum Coefficient 73
7.3.1 MFCC of Normal Heart Sounds 73
7.3.2 MFCC of Abnormal Heart Sounds 75
7.4 Neural Network 77
7.4.1 Verification of TFD 78
7.4.2 Verification of MFCC 80
7.5 Discussion 82
x
8 CONCLUSIONS AND RECOMMENDATION 84
8.1 Conclusions 84
8.2 Recommendation 86
8.2.1 Modified MFCC 87
REFERENCES 89
xi
LIST OF FIGURES
FIGURE NO. TITLE PAGE
1.1 Heart sounds components 3
4.1 Discrete Fourier Transform 35
4.2 Short-term analysis 38
4.3 Example of signal magnitude spectrum 39
4.4 Cepstrum 39
4.5 Computing of Mel Cepstrum 40
4.6 Triangular filter used to compute Mel Cepstrum 41
5.1 A two-layer network 44
5.2 The architecture of a typical MLP network 44
5.3 Sigmoid activation function with different values of c 46
5.4 Configuration for training ANN using BP algorithm 47
5.5 Typical ANN model 48
6.1 Feature extraction techniques and verification 53
6.2 Technique using TFD/SVD 55
6.3 Technique using MFCC 60
6.4 Steps of computing MFCC 61
6.5 Steps of backpropagation algorithm 64
7.1 Normal heart sound; e.g. 1 66
7.2 Normal heart sound; e.g. 2 67
7.3 Abnormal heart sound; e.g. 1 68
7.4 Abnormal heart sound; e.g. 2 68
7.5 Abnormal heart sound; e.g. 3 69
7.6 Abnormal heart sound; e.g. 4 69
xii
7.7 SVD of normal heart sound; e.g.1 70
7.8 SVD of normal heart sound; e.g.2 71
7.9 SVD of abnormal heart sound; e.g.1 71
7.10 SVD of abnormal heart sound; e.g.2 72
7.11 MFCC of normal heart sound; e.g.1 73
7.12 MFCC of normal heart sound; e.g.2 74
7.13 MFCC of normal heart sound; e.g.3 74
7.14 MFCC of abnormal heart sound; e.g.1 75
7.15 MFCC of abnormal heart sound; e.g.2 76
7.16 MFCC of abnormal heart sound; e.g.3 76
7.17 Verification of TFD (number of hidden neurons) 78
7.18 Verification of TFD ( learning rate ) 78
7.19 Verification of TFD ( momentum coefficient ) 79
7.20 Verification of MFCC ( number of hidden neurons ) 80
7.21 Verification of MFCC ( learning rate ) 80
7.22 Verification of MFCC ( momentum coefficient ) 81
8.1 Basic structure of modified MFCC 87
xiii
LIST OF TABLE
TABLE NO. TITLE PAGE
7.1 Performance comparison between the TFD and MFCC 82
xiv
LIST OF ABBREVIATIONS
AF - Ambiguity Function
ANN - Artificial Neural Network
BoF - Bank of filters
BP - Backpropagation
BPN - Backpropagation Neural Network
DCT - Discrete Cosine Transform
DFT - Discrete Fourier Transform
DSP - Digital Signal Processing
FFT - Fast Fourier Transform
G.D.R - Generalized Delta Rule
GMM - Gaussian Mixture Model
IDFT - Inverse Discrete Fourier Transform
IF - Instantaneous Frequency
Im - Imaginary
MFCC - Mel Frequency Cepstrum Coefficient
MLP - Multilayer Perceptron
MLSA - Mel Log Spectrum Approximation
PC - Principal Component
PCA - Pricipal component analysis
PCG - Phono-cardiographic
PE - Processing element
PWVD - Pseudo-Wigner-Ville distribution
Re - Real
RMS - Root Mean Square
SBC - Subband Based Cepstral
xv
SCG - Scaled Conjugate Gradient
SNR - Signal-to-noise ratio
STFT - Short-Time Fourier Transform
SVD - Singular Value Decomposition
SWWVD - Smooth windowed Wigner-Ville distribution
TFA - Time Frequency Analysis
TFD - Time-Frequency Distribution
TFR - Time-Frequency Representation
VQ - Vector Quantization
WVD - Wigner-Ville distribution
WWVD - Windowed Wigner-Ville distribution
xvi
LIST OF SYMBOLS
Az (v,τ) - symmetrical ambiguity function
c - firing angle control
Ck - cosine function
Ei - instantaneous energy
Emin - minimum error
f - frequency
fi - instantaneous frequency
f(netj) - sigmoid function
G(n,m) - discrete-time expression of time-lag kernel
G(t,τ) - time-lag kernel
g(v,τ) - Kernel function
h[i] - number of samples of long system’s impulse response flipped
left-or-right
Hj[k] - transfer function of filter j
Hz - hertz
i,D - number of cepstrum coefficient
j,p - number of filter outputs
J(w) - lower threshold on the sum squared error
k - samples
kHz - kilo Hertz
l - latent variables dimension
m - number of rows of matrix X
Mag[k] - magnitudes notation
Mel(f) - Mel frequency
mj - log band-pass filter output amplitudes
xvii
ms - millisecond
n - number of columns of matrix X
N - number of filters
Ns - number of samples
Nw - window size
Oj - output of neuron j
Ok - output of neuron k
P(f ) - magnitude spectrum
Phase[k] - phase notation
P(M) - Mel spectrum
r - rank of matrix X
Rz (t,τ) - instantaneous autocorrelation function
s - second
S, σj - singular values
S(f) - frequency domain representation of a signal
Sk - sine function
s(t) - time domain representation of a signal
T - iteration number
t, τ - time
Tz - total signal duration
U - left singular vectors
V - right singular vectors
v - frequency variable
Wji (t+1) - adaptation weight between input (i) and hidden (j) layers
Wkj (t+1) - adaptation weight between output (k) and hidden (j) layers
w(n) - Hamming window function
X - dataset in matrix mxn
x[i] - input discrete signal
XT - transpose of matrix X
y[i] - output discrete signal
z(n) - discrete-time expression of analytic signal
z(t) - analytic signal (associated with real)
∆w - adaptation weights
θj - bias weights of neuron j
xviii
θk - bias weights of neuron k
ρ(t,f) - Cohen’s class of distributions
φ, fc, α - constant argument
λj - eigenvalues
δj - error signal through layer j
δk - error signal between the output and hidden layers
Γ(.) - gamma function
τg(f) - group delay
θ(f) - instantaneous phase in frequency domain
ϕ - instantaneous phase in time domain
η - learning rate
θ(M) - log mel spectrum
αm - momentum rate
ℜ{β} - real value of β
ℜ{γ} - real value of γ
β - smoothing parameter
CHAPTER 1
INTRODUCTION
1.1 Project Background
This project is focused on the problem of heart sounds analysis using an
integration of signal processing techniques and artificial neural networks. This
includes feature extraction technique, verification technique and estimation of
performance with related parameters. It has proposed two techniques for feature
extraction analysis. The first technique is emphasizing on Time-Frequency
Distributions (TFD). It used to choose a distribution from a group of bilinear time-
frequency distributions that satisfies the TFD properties. In that case, the B-
distribution was chosen because it satisfied the properties of TFD and it performed
well in reducing the cross-terms. Another technique is using Mel Frequency
Cepstrum Coefficient where the outputs are in terms of cepstrum coefficients. For
verification analysis, both of the techniques mentioned above are further simulating
using neural network and after that the performances were compared between the
both proposed techniques.
2
1.2 Project Objectives
The main objective of this work is to choose the best technique to extract
features of heart sounds signals. This can be achieved by comparing two proposed
techniques; Time-Frequency Distribution (B-distribution) and Mel Frequency
Cepstrum Coefficient. The best technique will be chosen according to the
performance accuracy.
1.3 Scope of Work
Different heart sounds were produced when the cardiac system is not in a
proper manner of working, which will produce the heart irregularities or heart
diseases. A good technique needs to be used to extract the features of heart sounds in
order to detect the diseases. Different features will represent different heart diseases.
This project has proposed two techniques that can be used for feature
extraction of heart sound signals. Both of them are outperformed their own classes
compared to others. The first technique is using Time-Frequency Distribution with
Singular Value Decomposition. The second technique is focusing on the Mel
Frequency Cepstrum Coefficient. The data used to implement both techniques are
taken from Centre of Biomedical in UTM, Skudai. They are actually the heart sound
signals including normal and abnormal signals. The normal heart sounds are taken
from healthy persons and the abnormal heart sounds are taken from patients that are
suffering from various kinds of diseases.
For the first technique, the heart sound signals are transformed into time-
frequency domain using bilinear time-frequency distribution. The transformation is
3
done using B-distribution with some parameters setting and the outputs from that
particular distribtuion are then dimensionality reduced using Singular Value
Decomposition. The results after that simulated further using neural network for
verification and performance analysis. All simulations are done using Matlab. The
second technique is different from the first one because the analysis is done based on
frequency analysis using Mel Frequency Cepstrum Coefficient. The heart sound
signals are extracted using MFCC with mel-frequency scaled. The simulation is
done using Microsoft Visual C++. The outputs of MFCC are actually the cepstrum
coefficients that were going to be simulated further using neural network for
performance analysis. Lastly, the performances accuracies from both techniques are
then compared to each other.
1.4 Heart Sounds
Figure1.1 Heart sound components
The heart sounds are generated by mechanical vibration of heart and
cardiovascular where they provide abundant information about them while the
measurement is noninvasive and low cost. Heart sounds and murmurs are the
important parameter used in diagnosing the heart condition and it can be captured by
using phonocardiogram or heart auscultation. Classically the sounds made by a
healthy heart are conceived as being a nearly periodic signal consisting of four
components. These four parts are referred to as the first, second, third and fourth
4
heart sounds. The first two heart sounds give rise to the familiar ‘lub-dup’ beating
sound of the heart and tend to dominate the Phono-CardioGraphic (PCG) signals.
The first heart sound is caused by the closure of the mitral and tricuspid valves. The
second heart sound is due to the closure of the aortic and pulmonary valves. The
four components of heart sounds are stated below:
1. First Heart Sound;
First heart sound is the effect of closing the tricuspid and mitral valve at the
beginning of ventricle systolic. There are four component of the first heart
sounds [22]:
(i) The first component is the effect of ventricular contraction and blood
movement towards atrio-ventricular valve. This occur at beginning of
ventricle systolic.
(ii) The second component is the effect of atrioventricle clossure.
(iii) The third component is reflected the opening of semilunar valve and
the beginning of blood ejection.
(iv) The fourth component represent the maximum blood ejection from
ventricle to aorta.
2. Second Heart Sound;
Second heart sound represents the vibrations as a result from closure of
semilunar valve at the end of ventricle systolic. Since there are two
component of semilunar valve, the second heart sound is a combination of
two components. The aortic valve closed earlier than the closing of
pulmonary valve.
3. Third Heart Sound;
The third heart sound result from vibration setting by early filling of ventricle
during ventricle diastole.
4. Fourth Heart Sound;
The fourth heart sound is caused by rapid filling of ventricle with blood
during atrium systole. It also marked the end of ventricle diastole.
5
Each beat is separated by an interval of the order of 1s, with each heart sound
having duration of roughly 50ms. The interval between beats varies even in a patient
at rest because of respiration. Similarly the exact nature of each beat varies from
beat to beat. The result is a signal which is non-periodic, even though it has a
repetitive character.
The heart murmurs occur as the additional components in the PCG signal,
most often arising in the interval between the first and second heart sound. Heart
murmurs are the result of turbulent blood flow, which produces a series of many
vibrations. The murmur signal is often of much smaller amplitude than either of the
heart sounds. Many murmurs are described as “whooshing” sounds and are believed
to be derived from flow noise. The heart murmurs will produce the abnormal heart
sounds. There are four main factor of producing murmurs [17]:
(i) High rates of flow through normal and abnormal valves
(ii) Forward flow through a constricted or irregular valve or into dilated vessels.
(iii) Backward flow through an incompetent valve, septal defect, or patient ductus
arteriosus.
(iv) Decreased viscosity, which causes increased turbulent and contributes to the
production and intensity of murmurs.
1.5 Time-Frequency Distributions
Many signals encountered in real-world situations are exhibit some degree of
non-stationarity where the frequency content changes over time. One of the most
common applications is heart sound signals processing. Classical signal analysis
tools, however, do not take this into account, assuming that the signal characteristics
are stationary. A solution to the problem of representing non-stationary signals is
found in their joint time and frequency representations which characterized the exact
6
behavior of the time-varying frequency content of the signal. Time-frequency
analysis methods are capable of detecting heart murmurs and vital information to the
classification of heart sounds and murmurs. Therefore, Time-Frequency Analysis is
used to represent the heart sounds in time-frequency domain by mapping the one-
dimensional time-domain signal into a two-dimensional function of time and
frequency.
The introduction of time-frequency analysis (TFA) has led to define new
tools to represent and characterize the time-varying contents of non-stationary
signals using time-frequency distributions (TFDs) [2, 7, 15], also for removing noise
and interference from a signal. Among the most studied time-frequency distributions
are the quadratic distributions. In this paper, a member of the quadratic class of
TFDs is proposed, referred to as the B-distribution, which can resolve close signals
in the time-frequency domain that other members fail to do so. In addition to that,
the B-distribution is shown to outperform existing reduced interference distributions
in suppressing the cross-terms of a multicomponent signal, while keeping a high
time-frequency resolution. The performance of this technique is depending on the
value of smoothing parameter applied to the signal analysis. This condition is
evaluated using the simulation on the heart sound signals using Matlab.
1.6 Mel-Frequency Cepstrum Coefficient
A representation of heart sounds using Mel-Frequency Cepstrum Coefficient
(MFCC) would be provided by a set of cepstrum coefficients. These coefficients are
the results of a cosine transform of the real logarithm of the short-term energy
spectrum expressed on a mel-frequency scale [32]. The MFCC are also an efficient
method to extract any kind of features [8]. The number of resulting mel-frequency
cepstrum coefficients is practically chosen relatively low, in the order of 12 to 20
coefficients. However, in many cases of MFCC analysis, the 0th coefficient of the
7
MFCC cepstrum is ignored because of its unreliability [24]. In fact, the 0th
coefficient can be regarded as a collection of average energies of each frequency
bands in the signal that is being analyzed. The energy of heart sound signal is also a
very important feature for pattern recognition. Many experiments have shown that
the performance can be improved when the energy information is added as another
model feature in addition to cepstrums.
Mel Frequency Cepstrum Coefficients (MFCC) is also used as a method that
analyzes how the Fourier transform extracts frequency components of a signal in the
time-domain. In addition, it is a representation defined as the real cepstrum of a
windowed short-time signal derived from the Discrete Fourier Transform (DFT) of
that signal. The difference from the real cepstrum is that a non-linear frequency, a
mel-scale is used. The mapping from linear frequency to mel frequency is done
using an equation as follows:
Mel(f) = 2595 log10 (1 + f/700) (1.1)
Basically, the analysis of the signal is done using Frequency Domain Analysis where
it converts a temporal signal to a frequency domain representation. The keywords
involve in this analysis as below:
• Cepstrum: a homomorphic signal processing technique that converts the
signal into a domain in which short-term and long-term variations in the
signal can be separated.
• FourierTransform: implements a variety of techniques for performing Fourier
Transforms, including the most effective fast transforms
• Spectrum: an umbrella class that encapsulates most of the frequency domain
techniques, and provides a uniform interface. This capability is used
extensively in many of our front end implementations.
8
1.7 Thesis Outline
This report has been organized into eight chapters. Chapter 1 outlines the
entire project giving a brief introduction to the time-frequency distribution technique
and Mel Frequency Cepstrum Coefficient technique. Chapter 2 provides the
literature reviews where the common references with some information that related
to the project are collected. Chapter 3 describes the time-frequency technique used
in this project by specifically elaborate the B-Distribution and its kernel. In addition,
a brief description about SVD is also included in this chapter. Chapter 4 is an
explanation about the MFCC principle and the steps involve in getting the MFCC.
Chapter 5 is having a detail explanation about neural network. The important
parameters involve in this chapter is explained further in order to get some ideas of
verification technique used in this project. Chapter 7 presents and explains the
results of signal processing experiments conducted on heart sound data including
normal and abnormal based on time-frequency distribution technique and MFCC
technique. The verification results are also attached to this chapter for performances
comparison. Chapter 8 is the last chapter of this thesis where it concludes this
project and provides suggestions for future recommendations and improvement.
CHAPTER 2
LITERATURE REVIEW
2.1 Time-Frequency Representations/Distributions
Hlawatsch and Boudreaux-Bartels (1992) have studied that Time-frequency
representations (TFRs) are powerful tools for the analysis and processing of “non-
stationary” signals for which separate time-domain and frequency-domain analyses
are not adequate. They have combined time-domain and frequency-domain analyses
to yield a potentially more revealing picture of temporal localization of a signal’s
spectral components. The TFRs are including both linear and quadratic
representations. The found that numerous TFRs which have been proposed may be
interpreted as smoothed versions of the WD, with the type of smoothing determining
the amount of attenuation of interference terms, loss of time-frequency concentration
and mathematical properties. Hence, the choice of the “best” TFR depends on the
nature of the signals to be analyzed. Once a specific TFR has been selected, the user
often has to select certain TFR parameters. Finally, the analysis result will also
depend upon the graphical representation of the TFR surface (e.g. 3D plots versus
contour-line plots).
10
White, Collis and Salmon (1996) have studied the use of time-frequency
methods in the detection and analysis of heart murmurs in Phono-CardioGraphic
(PCG) signals. These heart sounds can yield important diagnostic information. An
abnormality between the heart sounds is generically termed a murmur. Heart sounds
are clearly non-stationary signals and hence the natural analysis methods are those of
time-frequency and time-scale. In this application there is no evidence to suggest
that the analysis technique would benefit from a multi-resolution type analysis, so
they concentrated on time-frequency, rather than time-scale, methods. The method
exploits averaged versions of the Pseudo-Wigner-Ville distribution (PWVD). The
algorithms were shown to detect two types of heart murmurs and to be able to
distinguish between them. The general processing strategy they adopted consists of
initially segmenting the signal into individual beats. This segmentation process can
be performed in a variety of ways; the success of each approach depends critically on
the signal to noise ratio (SNR) of the recording under consideration. They have
presented the results illustrating that time-frequency methods are capable of
detecting heart murmurs and also of yielding information vital to the classification of
such murmurs. The methods used involved averaging of time-frequency plots, which
has intuitive value and can be interpreted in a theoretical fashion exploiting the
concepts of cyclo-stationarity.
Haghighi-Mood, and Torry (1997) have addressed the characteristics of heart
murmurs from signal theory point of view and suggests an appropriate signal
analysis method which is capable of describing the dynamics of heart murmurs. The
result of a pilot study using the proposed method indicates a distinctive pattern for
time-frequency distribution of heart murmurs which is expected to provide
information of diagnostic importance. The choice of time-frequency method is
mainly dictated by the time-bandwidth product of the signal under investigation.
While traditional Time-Frequency methods such as Short Time Fourier Transform
(STFT) and bank of filters (BoF) are known to be robust for signals of large time-
bandwidth product, their performance degrades when used with signals of fast-
varying spectra and short duration. An alternative approach to study such signals is
joint Time-Frequency Distribution (TFDs) methods which, unlike traditional
methods, make no assumption of stationarity at any time interval. For their study, a
11
number of TFD methods belonging to the Cohen class of distributions were applied
to heart murmurs. They found that the Choi-Williams distribution (CWD) with an
exponential kernel also can provide the best compromise between spectro-temporal
resolution and cross-term suppression and therefore the detection of time-frequency
dynamics of heart murmurs. Considering the generation mechanism and the
hydrodynamic models proposed for various types of murmurs, it is logical to predict
that the time-frequency pattern of murmurs contains valuable information which may
lead to non-invasive diagnosis of certain cardiac diseases such as valvular stenosis
and ventricular or atrial septal defect.
Cohen (1989) has presented a review and tutorial of the fundamental ideas
and methods of joint time-frequency distributions. He has introduced the types of
distributions and the method to obtain them. The objective of the field is to describe
how the spectral content of a signal is changing in time, and to develop the physical
and mathematical ideas needed to understand what a time-varying spectrum is. The
basic goal is to devise a distribution that represents the energy or intensity of a signal
simultaneously in time and frequency. The basic idea is to devise a joint function of
time and frequency, a distribution that will describe the energy density or intensity of
a signal simultaneously in time and frequency. This review is presented to be
understandable to the non-specialist with emphasis on the diversity of concepts and
motivations that have gone into the formation of the field.
2.2 Cross-terms Elimination (Signal-to-Interference Ratio)
Daliman and Sha’ameri (2003) have done the analysis of the heart sound and
murmurs using time frequency distribution method. The reason why they were using
time-frequency distribution because the heart sound signals are actually non-
stationary signals and time-varying signals that would be best analyzed in time-
frequency domain. The Windowed Wigner-Ville distribution (WWVD) and smooth
12
windowed Wigner-Ville distribution (SWWVD) have been used to obtain the time-
frequency representation of the signal. Determination of parameter setting of
WWVD and SWWVD will eliminate the cross-terms and improve time-frequency
representation. The accuracy of time-frequency representation is determined based
on the mainlobe width and signal-to-interference ratio. By comparing the two types
of the distributions, they found that the most accurate time-frequency representation
can be achieved using the SWWVD.
Sha’ameri and Salleh (2000) have performed an analysis of heart sounds and
murmur using time-frequency signal analysis. They used the technique using
Wigner-Ville distribution (WVD) and windowed Wigner-Ville distribution
(WWVD) that belonged to the bilinear class of time-frequency distribution. They
developed these techniques to provide high-resolution time-frequency representation
for time-varying signals. Due to the nonlinear operation involved, interference terms
were introduced in the time-frequency representation. The signals of' interest are
modeled as multicomponent signals and the characteristics of the signal in time-lag
plane are observed. From the time-lag plane, the interference components are
identified, and the appropriate window width is selected in the WWVD to remove
the interference. Results analyses showed that WWVD produced more accurate
time-frequency representation compared to the WVD and the signal-to-interference
is used to quantify the improvement. This happened because the WWVD has used
the window function to control the amount of interference present in the time-
frequency representation by removing the cross bilinear product terms.
2.3 B-Distribution
Sucic, Barkat and Boashash (1999) proved that B-distribution can resolve
close signals in the time-frequency domain that other members fail to do so. They
showed that the B-distribution is real, time and frequency shift invariant and its first
13
moment with respect to frequency yields the instantaneous frequency of the signal.
Using synthetic and real-life multicomponent signals, it has been shown that the B-
distribution achieves a better time-frequency resolution and energy concentration
around the instantaneous frequency of a signal, while still significantly suppressing
the cross-terms, than other commonly-chosen distributions for multicomponent
signals analysis. They have reviewed the fundamental concepts of the cross-terms
elimination using the ambiguity domain filtering, based on the B-distribution kernel.
The kernel has been defined in both the time-lag and the Doppler-lag domain and
they have proved that B-Distribution has satisfied most of the desirable properties
sought for a time-frequency distribution.
Boashash and Sucic (2000) have presented two novel results which are
significant for the application of time-frequency signal analysis techniques to real life
signals. First, they introduced a measure for comparing the resolution performance
of TFDs in separating closely spaced components in the time-frequency domain.
The measure takes into account key attributes of TFDs such as main-lobes, side-
lobes, and cross-terms. The introduction of this measure is an improvement of some
techniques that rely on visual inspection of plots. The performance comparison of
quadratic TFDs using the proposed resolution measure shows that the B-distribution
outperforms existing quadratic TFDs in resolving closely spaced components in the
time-frequency domain. The second part consists in proposing a methodology for
designing high resolution quadratic TFDs for the time-frequency analysis of
multicomponent signals when components are close to each other. By removing
limitations in the way desirable properties of quadratic TFDs were previously
chosen, a new set of design criteria has been defined. The combination of these two
results is an important break-through for the field of time-frequency signal analysis.
They have concluded that the B-Distribution outperforms other existing distributions
in terms of time-frequency resolution, as well as cross-terms suppression, when used
to represent signals with closely-spaced components in the time-frequency domain.
14
2.4 Singular Value Decomposition
Wall, Rechtsteiner and Rocha (2003) have described SVD methods for
visualization of gene expression data, representation of the data using a smaller
number of variables, and detection of patterns in noisy gene expression data. In
addition, they also described the precise relation between SVD analysis and Principal
Component Analysis (PCA). Their aimed is actually to provide definitions,
interpretations, examples, and references that will serve as resources for
understanding and extending the application of SVD and PCA to gene expression
analysis. An important capability distinguishing SVD and related methods from
other analysis methods is the ability to detect weak signals in the data. Even when
the structure of the data does not allow separation of data points, causing clustering
algorithms to fail, it may be possible to detect biologically meaningful patterns.
SVD allows obtaining the true dimensionality of the data, which is the rank r of
matrix X and a representation is using a reduced number of variables. This property
of the SVD is commonly referred to as dimensionality reduction.
2.5 Mel-Frequency Cepstrum Coefficient
Garcia and Garcia (2003) have done the experiments to present the
development of an automatic recognition system of infant cry, with the objective to
classify two types of cry: normal and pathological cry from dear babies. In this
study, they used acoustic characteristics obtained by the Mel-Frequency Cepstrum
technique and a feed-forward neural network as a classifier that was trained with
several learning methods, resulting better the Scaled Conjugate Gradient (SCG)
algorithm. The SCG method avoids a time consuming line-search per learning
iteration, which makes the algorithm faster than other second order Conjugate
Gradient algorithms. This work has shown good results, with the MFCC technique,
using neural network architecture.
15
Bahoura and Pelletier, (2004) have proposed a new tool based on the cepstral
analysis as feature extractor and the Gaussian Mixture Model for the classification
process. The Cepstral analysis is proposed with Gaussian Mixture Models (GMM)
method to classify respiratory sounds in two categories: normal and wheezing. The
sound signal is divided in overlapped segments, which are characterized by a reduced
dimension feature vectors using Mel-Frequency Cepstral Coefficients (MFCC) or
Subband based Cepstral parameters (SBC). The proposed schema is compared with
other classifiers: Vector Quantization (VQ) and Multi-Layer Perceptron (MLP)
neural networks. In order to improve the classification results, they also proposed the
postprocessing technique. There are also some explanation about two feature
extractors based on cepstral analysis (MFCC and SBC) are briefly presented in this
work. The best result is obtained by the MFCCGMM combination and note that the
postprocessing improves the classification result for all combinations.
Imai (1983) has presented a new technique of cepstral analysis synthesis on
the mel frequency scale. The log spectrum on the mel frequency scale (the mel log
spectrum) is considered to be an effective representation of the spectral envelope of
speech. This analysis synthesis system uses the mel log spectrum approximation
(MLSA) filter which was devised for the cepstral synthesis on the mel frequency
scale. The filter coefficients are easily obtained through a simple linear transform
from the mel cepstrum that defined as the Fourier cosine coefficients of the mel log
spectral envelope of speech. The MLSA filter has low coefficient sensitivity and
good coefficient quantization characteristics. The spectral distortion caused by
interpolation of the filter parameters of two successive frames is small. Accordingly,
the data rate of this system is very low. The log spectrum on a mel frequency scale
(the mel log spectrum) is considered to be a more effective representation of the
spectral envelope of speech than that on the linear frequency scale. The mel
cepstrum which is defined as the Fourier transform of a spectral envelope of the mel
log spectrum has a comparatively low order, hence it is an efficient parameter. The
Mel cepstrum also has the same good features as those of the conventional cepstrum.
By using the MLSA filter, a low bit rate mel cepstral analysis synthesis system was
16
obtained. In this system, the spectral envelope information is transmitted by the filter
parameter of the MLSA filter. The filter parameter is obtained by a simple linear
transform from the mel cepstrum which is defined as the Fourier cosine coefficients
of the mel log spectral envelope. The filter parameter has almost the same good
statistical properties as those of the mel cepstrum. Since the filter parameter
sensitivity of the mel log spectrum is very small, the filter parameter can be roughly
quantized. The system has fairly small spectral distortions.
2.6 Neural Network
Upadhyaya and Yan (1993) have explained about artificial neural networks
that are developed to simulate the most elementary functions of neurons in the
human brain, based on the present understanding of biological nervous systems.
These network models attempt to achieve good human-like performance such as:
learning from experiments and generalization from previous samples. A processing
element (PE) is analogous to a neuron in that it has many inputs from input signals or
from other PEs and combines (sum up) the values of the inputs, adjusted by their
weights. This sum is then subjected to a nonlinear transformation, often called a
transfer function that controls the output in accordance with the prescribed nonlinear
relationship. Back-propagation neural networks (BPN) were used to develop the
neural network models for artifact classification and defect parameters estimation.
There are several issues that need to be considered when utilizing the back-
propagation algorithm to train a neural network such as the selection of hidden layers
and nodes, and the learning options. The selection of number of hidden layers and
hidden nodes is one of the most important issues in back-propagation network
applications. The selection of hidden nodes for a fully-connected, feedforward
networks with one hidden layer is based on the two types of Rule-of-thumb.The
important algorithms for backpropagation network training are actually the learning
coefficient, momentum term that used to smooth the learning, the nonlinear transfer
function, the learning rule that specifies how connection weights are changed during
17
the learning process and Gaussian noise and RMS threshold value that adds a random
number within a special range to each node summation value in the layer.
CHAPTER 3
TIME-FREQUENCY DISTRIBUTION TECHNIQUE
3.1 Introduction
Time-frequency representations are used to analyze or characterize signals
whose energy distribution varies in time and frequency. They map the one-
dimensional time-domain signal into a two-dimensional function of time and
frequency. A time-frequency representation describes the variation of spectral
energy over time.
Time-frequency analysis provides time-localized spectral information for a
non-stationary signal as a distribution function in terms of time and frequency. Non-
stationary signal in this paper implies a signal which has time-varying frequency
components, not a statistically stationary signal. Due to the uncertainty principle,
which restricts the resolution in time and frequency, the time-frequency distribution
functions have been developed for the purpose of analysis. The basic idea is to
devise a joint function of time and frequency, a distribution, that will describe the
energy density or intensity of a signal simultaneously in time and frequency. In
addition, the method of relating a joint time-frequency distribution to a signal will be
a powerful tool for the construction of signals with desirable properties.
19
3.1.1 Time-Frequency Transforms
The Fourier Transform has become one of the most widely used signal-
analysis tools across many disciplines of science and engineering. The basic idea of
the Fourier Transform is that any arbitrary signal (of time, for instance) can always
be decomposed into a set of sinusoids of different frequencies. The Fourier
transform is generated by the process of projecting the signal onto a set of basis
functions, each of which is a sinusoid with a unique frequency. The resulting
projection values form the Fourier transform (or the frequency spectrum) of the
original signal. Its value at a particular frequency is a measure of the similarity of
the signal to the sinusoidal basis at that frequency. Therefore, the frequency
attributes of the signal can be revealed via the Fourier transform. In many
engineering applications, this has proven to be extremely useful in the
characterization, interpretation, and identification of signals.
While the Fourier transform is a very useful concept for stationary signals,
many signals encountered in real-world situations have frequency contents that
change over time. In this case, it is not always best to use simple sinusoids as basis
functions and characterize a signal by its frequency spectrum. Joint time-frequency
transforms were developed for the purpose of characterizing the time-varying
frequency content of a signal. The best-known time-frequency representation of a
time signal is known as the Short-Time Fourier Transforms (STFT). It is basically a
moving window Fourier transforms. By examining the frequency content of the
signal as the time window is moved, a two-dimensional time-frequency distribution
called the spectrogram is generated. The spectrogram contains information on the
frequency content of the signal at different time instances. One well-known
drawback of the STFT is the resolution limit imposed by the window function. A
shorter time window results in better time resolution, but leads to worse frequency
resolution, and vice versa. To overcome the resolution limit of the STFT, a wealth of
alternative time-frequency representations have been proposed.
20
There are various time-frequency transforms developed by researchers in the
signal processing community. They are broadly divided into two classes: linear
time-frequency transforms and quadratic (or bilinear) transforms. In this project, the
discussion is only concentrated on the quadratic time-frequency transform, that is B-
Distribution.
3.2 General Signal Representations
Heart sounds are actually generated by mechanical vibration of heart and
cardiovascular system. Commonly, vibration signals are represented in either the
time domain or the frequency domain. In general, the time domain representation of
a signal,
( ) ( ) ( ) ( )( )
+ ∫
==
t
i dfjtj etaetats 0
0 2 ττπφϕ (3.1)
allows a simple characterization of a signal in terms of its (instantaneous) energy,
( ) ( ) ( ) ( ) ( )tstststatEi *|| 22=== (3.2)
and instantaneous frequency,
( )( )dt
tdfi
ϕ
π2
1= (3.3)
that is, at time t the signal has an energy density of Ei(t) at a frequency of fi (t). For
multicomponent signals (i.e., signals which have more than one frequency at a given
instance of time) the ‘instantaneous frequency’ is the average frequency of the signal
at that time. The frequency domain representation of a signal,
21
( ) ( ) ( ) ( ) dtetsefAfS ftjfj πθ 2−∫== (3.4)
gives a perfect representation of a signal which consist of multiple harmonic
oscillators (i.e., with no amplitude or frequency modulation). However, it is not an
adequate representation of non-stationary signals (i.e., signals whose content change
with time).
A multicomponent non-stationary signal can be described as the
superposition of a number of monocomponent non-stationary signals, giving
( ) ( ) ( ) ( ) ( )( )
+ ∫
=== ∑∑∑
t
cc
c
dfj
c
ctj
c
c
c
c etaetatsts 0
2 ττπφϕ
(3.5)
In order to decompose (and understand) a signal of the form given, a joint time-
frequency domain representation of the signal is required.
3.3 General Form of Time-Frequency Distributions
In 1966, Cohen developed a generalized form of ‘phase-space’ distributions
from which all other time-frequency energy distributions could be derived. The
general form, which has since become known as Cohen’s class of distributions is
( ) ( ) ( ) τττ
τρ τπdvdudeuzuzvgft
fvtvuj −−
−
+= ∫∫∫
2
2*
2,, (3.6)
where g(ν, τ) is referred to as the kernel function that is used to define the properties
of the distribution and z(t) is the analytic signal associated with the real one. For
example, by setting the kernel g(ν, τ) = 1, equation (3.6) becomes
22
( ) ( ) τττ
ρ τπdvdudeuzuzft
fvtvuj −−
−
+= ∫∫∫
2
2*
2,
τττ τπ
detztzfj2
2*
2
−
−
+= ∫ (3.7)
which is the Wigner-Ville distribution and setting g(ν, τ) = |τ|βcosh
-2β(t) will give the
B-Distribution as shown below:
( )( )
( ) ττττ
ρ τπβ
β
dvdudeuzuzt
ftfvtvuj −−
−
+= ∫∫∫
2
2 2*
2cosh,
( )
ττττ τπ
β
β
dudeuzuzut
fj2
2 2*
2cosh
−
−
+
−= ∫∫ (3.8)
Rz (t,τ) is used to be the instantaneous autocorrelation function of z(t) and it is defined
as:
( )
−
+=
2*
2,
τττ tztztRz (3.9)
The symmetrical (“Sussman’s”) ambiguity function (AF), on the other hand, is
defined as the Fourier transform of Rz (t,τ) with respect to time t:
( ) dtetztzvAvtj
zπττ
τ 2
2*
2, −
−
+= ∫ (3.10)
3.3.1 Properties of the Cohen Class of Distributions
The number of distributions which can be generated from equation (3.6) is
infinite. Cohen proposed a number of restrictions on the properties of the
23
distributions to be studied and showed that certain constraints could be placed on the
kernel function, g(ν,τ) in equation (3.6) to ensure that a distribution meets the
restrictions. The constraints on the kernel function required to give certain desirable
properties have been studied and the range of desirable properties and the kernels
required to meet then is being continuously expended. An extensive range of
properties is given by [5] and [7]. Some of these are:
a) The signal energy is preserved
The signal energy is preserved if g(0,0)=1
b) The marginal condition in time
for integration over frequency=energy density in time, g(ν,0)=1
c) The marginal condition in frequency
for integration over time=energy density in frequency, g(0,τ)=1
d) Real valued distributions
the distribution will be real valued if g(ν,τ)=g* (ν,τ)
e) Invariance to time and frequency shifts
if two signals are identical except for a shift in time or frequency then the
distributions of the signals should also be identical except for a similar shift
in time or frequency. Cohen [7] showed this is true as long as the kernel
function is independent of time and frequency.
f) The first moment of the distribution equal the instantaneous frequency and
group delay
Boashash [5] showed that the first moment of a distribution in frequency is
equal to the instantaneous frequency and the first moment in time equals the
group-delay,
( ) ( )( )df
fdfg
θ
πτ
2
1−= (3.11)
24
( )
( )( )tf
dfft
dfftf
i=
∫∫
,
,
ρ
ρ and
( )
( )( )f
dtft
dtftt
gτρ
ρ=
∫∫
,
, (3.12)
if
( ) ( ),0|
,|
,00 =
∂
∂=
∂
∂== vt
v
tvg
t
tvg (3.13)
g(ν,0) = constant for all ν and g(0,τ) = constant for all τ
g) Recovery of signal
Cohen [7] showed that a signal could be recovered up to a constant phase
factor if kernel function g(ν, τ) is well defined at every point or has isolated
zeros but not regions where it is zero.
h) Finite time support
a distribution has finite time support if it is zero before the signal starts and
zero after the signal ends. For a signal to have finite time support the kernel
must met the condition:
( ) 0, 2 =−∫ dvevg vtj πτ for |τ|<2|t| (3.14)
3.3.2 Reduced Interference Distributions
Some of time-frequency distributions posses a number of mathematically
satisfying properties such as the Wigner-Ville distribution, however, it actually
produces large cross-terms when applied to multicomponent signals. This can make
interpretation difficult. Therefore, other type of distribution is used to overcome this
25
problem such as B-distribution that satisfying the properties as well as reducing
cross-terms.
3.3.2.1 The B-Distribution
The B-distribution outperforms the other distributions by achieving a better
time-frequency resolution than the spectrogram does, and by significantly
suppressing the interference that affects the Wigner-Ville distribution. Using real-
life multicomponent signals, it has been shown that the B-distribution achieves a
better time-frequency resolution and energy concentration around the instantaneous
frequency of a signal, while still significantly suppressing the cross-terms, than other
commonly-chosen distributions for multicomponent signals analysis.
The kernel function of the B-Distribution is chosen in the ambiguity domain
as a 2-dimensional function centered around the origin with sharp cut-off edges. In
this way, the kernel would allow to retain as many auto-terms energy as possible
while filtering out as much cross-terms “energy” as possible. The numbers of auto
terms and cross terms kept and filtered out are functions of the volume underneath
the two-dimensional function g(v,τ). This volume can be changed by varying a
single smoothing parameter, β. β is a real parameter that controls the sharpness of
cutoff of the two-dimensional filter in the ambiguity domain. The choice of its value
is application dependent; however, its range should be between zero and unity (0 < β
≤ 1). In general, it is found that small value of β gives the best results in terms of
cross term suppression and time-frequency resolution, but this value is given only as
an indication and is not optimal for all situations.
26
3.3.2.2 The B-Distribution Kernel
The property of the ambiguity function (AF) to have the auto-terms around
the origin in the ambiguity plane, and the cross-terms away from it, has inspired
searches for a kernel g(v, τ) such that the generalized ambiguity function, g(v, τ) Az(v,
τ), is enhanced in the vicinity of the origin and suppressed everywhere else. Barkat
and Boashash [4] recently proposed a kernel for a new quadratic TFD, known as the
B-distribution. They applied the technique of cross-terms filtering in the ambiguity
domain to the design of a low-pass filter-type kernel with a sharp cut-off, essential to
ensure the clear separation of the auto-terms from the cross-terms. The B-
distribution has its time-lag kernel defined in the following manner [4]:
( )( )
βτ
τ
=
ttG
2cosh, (3.15)
Application dependent parameter β, ( 0 ≤ β ≤1 ), controls the sharpness of the cut-off
of the filter and hence the trade-off between the cross-terms elimination and the time-
frequency resolution. The Doppler-lag kernel of B-distribution is defined as the
Fourier transform of the time-lag kernel with respect to time t:
( )( )
dtet
tvgvtj π
βτ 2
2cosh, −
∫
=
( )
dtt
evtj
∫−
=β
βτ
2
2
cosh (3.16)
Since the kernel is an even and a real-valued function, equation (3.16) can be further
simplified to:
( ) ( )( )
dtt
vttvg ∫
∞
=0
2cosh
2cos2,
β
β πτ (3.17)
Using [12] (3.985.1), the following formula is a possible solution to the integral
which must be calculated in (3.17):
27
( )( ) ( )
−Γ
+Γ
Γ=
−∞
∫ β
αγ
β
αγ
γββ
α γ
γ 2222
2
cosh
cos 2
0
jjdt
t
t (3.18)
for ℜ{β}>0, ℜ{γ}>0 and α>0.
Hence, the equation for the B-distribution kernel in the Doppler-lag domain may be
written in the following form:
( )( )
( ) ( )vjvjvg πβπββ
ττβ
β−Γ+Γ
Γ=
−
2
2,
12
(3.19)
3.3.2.3 The B-Distribution Properties
Ideally, as mentioned before, there are some desirable properties that a TFD
should satisfy [2, 14]. B-distribution has satisfied the following properties:
i) Realness
To show this property we must have [2]:
( ) ( )ττ −−= ,*, vgvg
In this case:
( )( )
( ) ( ) *}2
2{,*
12
vjvjvg πβπββ
ττβ
β+Γ−Γ
Γ−=−−
−
( )( )
( ) ( )vjvjvg πβπββ
ττβ
β−Γ+Γ
Γ=−−
−
2
2,*
12
(3.20)
Hence, we have ( ) ( )ττ −−= ,*, vgvg
ii) Time Shift Invariance
To prove that a distribution is time shift invariant, it is sufficient to note that
g(v,τ) does not depend on t [2]. From equation (3.19), the B-distribution
Doppler-lag kernel is not a function of time t.
28
iii) Frequency Shift Invariance
To prove this property, it is sufficient to note that g(v,τ) does not depend on f
[2]. Similarly, from equation (3.19), we see that the B-distribution Doppler-
lag kernel does not depend on frequency f.
iv) The Instantaneous Frequency
The first moment of the B-distribution yields the IF of the signal. That is,
( )
( )dfft
dfftff
z
z
i,
,
ρ
ρ∫=
This property is satisfied since [14]:
( )( )
( )( ) 0|
,|
,0,00,0 =
∂
∂=
∂
∂
τ
ττ vg
v
vg
and
g(v,τ) = constant ∀τ
3.4 Singular Value Decomposition
Depending on the nature of a given classification problem (the raw data, the
chosen features and the chosen classifier) dimension reduction may be fundamental
to successful classification performance. Dimensionality reduction is almost always
essential when time-frequency representations (TFRs) are used as a feature basis; the
descriptive ability of most TFRs comes at the expense of high dimensionality.
Dimensionality reduction is an essential step involved in many research areas
such as signal processing. It is difficult for researchers to manipulate and process
29
data with very high dimensions, and it is also difficult to visualize the data with more
than three dimensions. To overcome these shortcomings, many dimensionality
reduction methods have been explored, for instances, Principal Component Analysis
(PCA) and Singular Value Decomposition (SVD). To make these methods more
robust and general, the Kernel functions are adopted.
Singular Value Decomposition is an excellent unsupervised learning tool to
process source data in order to find clusters, reduce dimensionality, and find latent
variables. This means that field or experiment data can be converted into a form
which is free of noise or redundancy and that it can be better organized to reveal
hidden information.
3.4.1 Theories
Singular Value Decomposition (SVD) is the mathematical algorithm to
identify the pattern relationship in data. In addition, it is widely adopted to compress
data by reducing the number of dimensions without much loss of information.
The main procedure of SVD is to work out the eigenvalues and eigenvectors
of the covariance matrix of the entire dataset. The singular values in SVD provide
the variabilities of its respective eigenvectors. The larger the singular value is the
higher variability the respective eigenvector contains. Usually the first few
eigenvectors with higher variabilities are selected as significant Principal
Components (PCs) to compress the data into less number of dimensions [31]. Due to
the discarding of the rest feature elements, some information will lose when
recovering the original data.
30
3.4.2 Define Items
Let X represent the dataset, i.e. a m x n matrix. Let r indicates the rank of
matrix X. The equation for singular value decomposition of X is the following:
X = USVT (3.21)
where, U contains the left singular vectors. Each column vector uk in U is a left
singular vector and orthonormal to each other. V contains the right singular vectors.
Each column vector vk in V is a right singular vector and orthonormal to each other.
S is a diagonal matrix, and the non-zero elements on the diagonal are called singular
values. These elements are sorted in descending order, with the highest singular
value in the upper left index of S.
3.4.3 Formula and Criteria
To calculate U, S and V, we define the following functions:
eigValue(X) to get the eigenvalues of matrix X
eigVector(X) to get the eigenvectors of matrix X
We may calculate U, V and the singular value matrix S as followings:
)( TU XXeigValueS = , (3.22)
)( TXXeigVectorU = , (3.23)
( )XXeigValueST
V = (3.24)
( )XXeigVectorV T= (3.25)
31
It will only keep the real and positive singular values Sk in the results of equation
(3.23) and (3.25), and the corresponding uk and vk are kept. We ignore the rest
vectors in U and V. After removing the meaningless singular values in SU and SV, SU
or SV can be reduced to an r x r diagonal matrix. At last, we need to sort the singular
values Sk in descending order for both SU and SV to form SU new
and SV new
, and the
positions of the uk and vk are also changed accordingly. Now, from SUnew
and SV new
,
we may find the following:
S = SU new
= SV new
(3.26)
The singular values Sk indicate the importance of their corresponding uk and vk.
3.4.4 Benefits of using SVD
The principal motivation for dimensionality reduction is that it can help to
alleviate the worst effects of the curse of dimensionality. The main goal is to ensure
that as much of relevant information as possible is preserved in as few dimension as
possible. Other benefits of using Singular Value Decomposition are briefly
described as below:
i) Redundancy Reduction:
The sorted singular values in a Singular Value Decomposition help represent
the original matrix in less rows, by pushing the less significant rows to the
bottom of the orthogonal matrices. The matrix may thus be smoothed by
eliminating those vectors whose respective singular value is equal to, or
approaches zero. These are the vectors which least impact on the original
matrix.
32
ii) Exposing Components:
Singular Value Decomposition results in sorted eigenvectors (components).
These are very useful in analyzing patient encounters and revealing the most
predominant components. These are representative of the procedures which
most significantly describe the typical patient’s treatment for the chosen
patient profile. It is believed that the clusters formed when the source data is
projected in the dimensions of the first two components is indicative of their
constituents.
iii) Understanding Latent Variables:
Latent variables are hidden sources in the original data; they explain the
common traits in patient encounters, but cannot be measured directly. For the
observed data represented by matrix X, in n variables, the latent variables
may be expressed in l dimensions where l << n. If assume that the initial n
variables capture the whole domain, the product of SVD will be a set of
variables which includes the latent ones.
CHAPTER 4
MEL-FREQUENCY CEPSTRUM COEFFICIENT TECHNIQUE
4.1 Introduction
In this paper, the second technique approached is the Mel-Frequency
Cepstrum Coefficient (MFCC) feature. Mel Frequency Cepstrum Coefficients is a
representation defined as the real cepstrum of a windowed short-time signal derived
from the DFT of that signal. The difference from the real cepstrum is that a non-
linear frequency scale called mel-scale, need to be used for approximation. A
compact representation would be provided by a set of cepstrum coefficients that are
the results of a cosine transform of the real logarithm of the short-term energy
spectrum expressed on a mel-frequency scale [32]. The number of resulting mel-
frequency cepstrum coefficients is practically chosen relatively low, in the order of
12 to 20 coefficients. In many applications, the 0th coefficient of the MFCC
cepstrum is ignored because of its unreliability.
34
4.1.1 Discrete Fourier Transform
Fourier transform belongs to the family of linear transforms widely used in
DSP based on decomposing signal into sinusoids (sine and cosine waves). Usually
in DSP we use the Discrete Fourier Transform (DFT), a special kind of Fourier
transform used to deal with aperiodic discrete signals [26]. Actually there are an
infinite number of ways how signal can be decomposed but sinusoids are selected
because of their sinusoidal fidelity that means that sinusoidal input to the linear
system will produce sinusoidal output, only the amplitude and phase may change,
frequency and shape remain the same [26]. Discrete Fourier Transform changes an
Ns point input signal into two Ns /2 + 1 point output signals. The output signals
represent the amplitudes of the sine and cosine components scaled in a special way
that is represented by the equations:
[ ] ( )[ ] ( )sk
sk
NkiS
NkiC
/2sin
/2cos
π
π
⋅⋅=
⋅⋅= (4.1)
where Ck are Ns /2+1 cosine functions and Sk are Ns /2+1 sine functions, index k runs
from zero to Ns /2. These functions are called basis functions. Actually zero samples
in resulting signals are amplitudes for zero frequency waves, first samples for waves
which make one complete cycle in Ns points, second for waves which make two
cycles and so on. Signal represented in such a way is called to be in frequency
domain and obtained coefficients are called spectral coefficients or spectrum.
Frequency domain contains exactly the same information as the time domain and
every discrete signal can be moved back to the time domain, using operation called
Inverse Discrete Fourier Transform (IDFT). Because of this fact, the DFT is also
called Forward DFT [26]. Schematically DFT is represented in Figure 4.1.
35
Figure 4.1 Discrete Fourier Transform
The amplitudes for cosine waves are also called real part (denoted as Re[k])
and for sine waves are called imaginary part (denoted as Im[k]). This representation
of frequency domain is called rectangular notation. Alternatively, the frequency
domain can be expressed in the polar notation. In this form, real and imaginary parts
are replaced by magnitudes (denoted as Mag[k]) and phases (denoted as Phase[k])
respectively [26]. The equations for conversion from rectangular notation to the
polar notation are as follows:
[ ] [ ] [ ]( )
[ ] [ ][ ]
=
+=
k
kkPhase
xkkMag
Re
Imarctan
ImRe22
(4.2)
There are two main reasons why DFT became so popular in DSP. First is Fast
Fourier Transform (FFT) algorithm [26], developed by Cooley and Tukey in 1965,
which opened a new era in DSP because of the efficiency of the FFT algorithm. The
second reason is the convolution theorem [26], which states that convolution in time
domain is a multiplication in frequency domain and vice versa. This makes possible
to use high-speed convolution algorithm, which convolves two signals by passing
them through the Fast Fourier Transform, multiplying and using Inverse Fourier
Transform computing convolved signal.
N Samples
Input Signal
Inverse DFT
Time Domain
N s /2 +1 Samples
Sine Wave Amplitudes
N s /2 +1 Samples
Cosine Wave Amplitudes Forward DFT
Frequency Domain
Ns Samples
Input Signal
Inverse DFT
Time Domain
36
4.1.2 Convolution
An impulse is a signal composed of all zeros except one non-zero point.
Every signal can be decomposed into a group of impulses, each of them then passed
through a linear system and the resulting output components are synthesized or
added together [26]. The resulting signal is exactly the same as obtained by passing
the original signal through the system. Every impulse can be represented as a shifted
and scaled delta function, which is a normalized impulse, that is, sample number zero
has a value of one and all other samples have a value of zero. When the delta
function is passed through a linear system, its output is called impulse response. If
two systems are different they will have different impulse responses. According to
the properties of linear systems every impulse passed through it will result in the
scaled and shifted impulse response and scaling and shifting of the input are identical
to the scaling and shifting of the output [23, 26]. It means that knowing systems
impulse response will know everything about the system [9, 23, 26].
Convolution is a formal mathematical operation, which is used to describe
relationship between three signals of interest: input and output signals, and the
impulse response of the system. It is usually said that the output signal is the input
signal convolved with the system’s impulse response. Mathematical equation of
convolution for discrete signals is represented in the following (convolution is
denoted as a star):
[ ] [ ] [ ] [ ] [ ]jixjhihixiyM
j
−=∗= ∑−
=
1
0
(4.3)
where y[i] is the output discrete signal, x[i] is the input discrete signal and h[i] is M
samples long system’s impulse response flipped left-for-right. Index i goes through
the size of the output signal. Mathematics behind the convolution does not restrict
how long the impulse response is. It only says that the size of the output signal is the
size of the input signal plus the size of the impulse response minus one. Convolution
37
is very important concept in DSP. Based on the properties of linear systems it
provides the way of combining two signals to form a third signal [9, 26, 23].
4.2 Short-Term Analysis
Because of its nature, the heart sound signal is a time-varying signal or non-
stationary. It means that when it is examined over a sufficiently short period of time
(20-30 milliseconds) it has quite stable characteristics. It leads to the useful concept
of using an analysis called “short-term analysis”, where only a portion of the signal is
used to extract signal features at one time [9]. It works in the following way:
predefined length window (usually 20-30 milliseconds) is moved along the signal
with an overlapping (usually 30-50% of the window length) between the adjacent
frames. Overlapping is needed to avoid losing of information. Parts of the signal
formed in such a way are called frames. In order to prevent an abrupt change at the
end points of the frame, it is usually multiplied by a window function. The operation
of dividing signal into short intervals is called windowing and such segments are
called windowed frames (or sometime just frames). There are several window
functions can be used, but the most popular is Hamming window function, which is
described by the following equation:
( )
−−=
1
2cos46.054.0
wN
nnw
π (4.4)
where Nw is the size of the window or frame. A set of features extracted from one
frame is called feature vector. Overall overview of the short-term analysis approach
is represented in Figure 4.2.
38
Figure 4.2 Short-term analysis
4.3 Cepstrum
The cepstrum is representation of the signal normally having two components
that are resolved into two additive parts [9]. It is computed by taking the inverse
DFT of the logarithm of the magnitude spectrum of the frame. This is represented in
the following equation:
( ) ( )( )( )frameDFTIDFTframecepstrum log= (4.5)
Some explanation of the algorithm is therefore needed. Process of moving to the
frequency domain provides the operation changing from the convolution to the
multiplication. Then by taking logarithm, the operation is moving from the
39
multiplication to the addition. That is desired division into additive components.
Then it can be applied linear operator inverse DFT, knowing that the transform will
operate individually on these two parts and knowing what Fourier transform will do
with each part. The example of the process is as below:
Figure 4.3 Example of signal magnitude spectrum
From Figure 4.3, note that the signal magnitude spectrum is combined from slow and
quickly varying parts. But there is still one problem: multiplication is not a linear
operation. This can be solved by taking logarithm from the multiplication as
described earlier. Finally, the result of the inverse DFT is shown in Figure 4.4 [9]
where the two components are clearly distinctive now as the results of cepstrum
process.
Figure 4.4 Cepstrum
40
4.4 Mel-Frequency Cepstrum Coefficient
Mel-frequency cepstrum coefficient (MFCC) is also an example of feature
used to describe the non-stationary signal. It is based on the known evidence that the
information carried by low-frequency components of a signal is giving more
information rather than carried by high-frequency components [9]. The technique
used to compute MFCC is based on the short-term analysis where the signal is
divided into short time interval by windowing that produces the windowed frames,
and thus from each frame a MFCC vector is computed. MFCC extraction is similar
to the cepstrum calculation except that one special step is inserted, namely the
frequency axis is warped according to the mel-scale. Summing up, the process of
extracting MFCC is illustrated in Figure 4.5.
Figure 4.5 Computing of Mel Cepstrum
As described above, to place more emphasize on the low frequencies, one
special step before inverse DFT in calculation of cepstrum is inserted, namely mel-
scaling or mel-frequency warping. A “mel” is a unit of special measure or scale of
perceived pitch of a tone [9] and it is used to transform the power spectrum into Mel-
Frequency spectrum. It does not correspond linearly to the normal frequency, indeed
it is approximately linear below 1 kHz and logarithmic above [9]. The mapping from
linear frequency to Mel frequency is defined as:
Windowing DFT
Mel-
Frequency
Warping
Log Inverse
DFT
Input
Signals Windowed Frames
Magnitude
Spectrum Mel
Spectrum Log Mel
Spectrum MFCC
Input
Signals
41
( ) ( )700/1log2595 10 ffMel += (4.6)
One useful way to create mel-spectrum is to use a filter bank, one filter for each
desired mel-frequency component. The filter banks are linearly spaced in mel
frequencies. Every filter in this bank has triangular bandpass frequency response.
Such filters compute the average spectrum around each center frequency with
increasing bandwidths, as displayed in Figure 4.6. The triangular filters are actually
multiplied with the magnitude of the frame and log energy is computed (the resulted
warped power spectrum is then convolved with the triangular bandpass filter).
Figure 4.6 Triangular filter used to compute Mel Cepstrum
This filter bank is applied in frequency domain and therefore, it simply
amounts to taking these triangular filters on the spectrum. In practice the last step of
taking inverse DFT is replaced by taking discrete cosine transform (DCT) for
computational efficiency. The expression for computing the MFCC using the discrete
cosine transform is given as follows.
( ) ( )( )NjimNMFCCN
j
ji /5.0cos/21
−= ∑=
π (4.7)
42
where N is the number of filters, mj is the log band-pass filter output amplitudes and i
is number of cepstrum coefficients. The number of resulting mel-frequency
cepstrum coefficients is practically chosen relatively low, in the order of 12 to 20
coefficients. The zeroth coefficient is usually dropped out because it represents the
average log-energy of the frame and carries only a little information.
CHAPTER 5
NEURAL NETWORK
5.1 Introduction
Artificial Neural Networks (ANN) is composed of basic units somewhat
analogous to neurons. These units are linked to each other by connections whose
strength is modifiable as a result of a learning process algorithm. The most
commonly used of ANN architectures is the multilayer perceptron (MLP). In turn,
the learning algorithm which is almost always used to train the MLP is the
backpropagation algorithm, which is a stochastic approximation of the steepest
descent algorithm. When perceptrons are combined together in layers, it is more
common to use the logistic sigmoid function. A multilayer neural network with one
layer of hidden neurons is shown in Figure 5.1 (also called a two-layer network).
44
Figure 5.1 A two-layer network
5.2 The Multilayer Perceptron
The capabilities of single perceptrons however, are limited to linear decision
boundaries and are suitable only for problems requiring a simple linear
dichotomization of the pattern space. However, many problems require a nonlinear
partitioning of the pattern space. This can be achieved using a multilayer perceptron
network, which cascades two or more layers of perceptrons together, making it
possible to partition the pattern space with arbitrarily complex decision boundaries.
The individual perceptrons in the network are called neurons or nodes and usually
employ a logistic sigmoid function. A typical MLP network architecture is depicted
in below:
Figure 5.2 The architecture of a typical MLP network
45
The input vector feeds in to the first layer nodes; the outputs of this layer feed
into each of the second layer nodes and so on. Often, the nodes are fully connected
between layers. The terminology used here will not include the input vector as a
layer, so that the example network in the figure above is referred to as a three-layer
network. The multiple nodes in the output layer correspond to multiple classes in
pattern recognition problem. Many algorithms have been developed which adapt the
network weights so as to provide a suitable map between the set of input vectors and
the set of desired responses. In general, an algorithm is either supervised, in which
the desired response is available during the learning phase, or unsupervised, in which
clusters are formed from the input patterns. Normally, the supervised learning and
the backpropagation (BP) algorithm will be used to train the MLP.
5.3 The Logistic Activation (Sigmoid) Function
In many ANN applications, the activation functions play an important role as
in the early years, they are used to fire or unfire the neuron. However, in new neural
network paradigms, the activation functions are more sophisticatedly used.
Nowadays, many activation functions have been used to produce a continuous value
rather than discrete. One of the most popular activation functions used is the logistic
activation function or more popularly referred to as the sigmoid function. This
function is semilinear in characteristic, differentiable and produces a value between 0
and 1. The mathematical expression of this sigmoid function is:
( )jcnetj
enetf
−+
=1
1 (5.1)
where c controls the firing angle of the sigmoid. When c is large, the sigmoid
becomes like a threshold function and when c is small, the sigmoid becomes more
like a straight line (linear). When c is large learning is much faster but a lot of
information is lost, however when c is small, learning is very slow but information is
46
retained. Because this function is differentiable, it enables the BP algorithm to adapt
the lower layers of weights in a multilayer neural network. An example below shows
that the sigmoid activation function with different values of c.
Figure 5.3 Sigmoid activation function with different values of c
5.4 The Backpropagation (BP) Algorithm
The BP algorithm is perhaps the most popular and widely used neural
paradigm. The BP algorithm is based on the generalized delta rule. The BP
algorithm overcomes the limitations of the perceptron algorithm.
47
5.4.1 Learning Mode
Before the BP can be used, it requires target patterns or signals as it a
supervised learning algorithm. Training patterns are obtained from the samples of
the types of inputs to be given to the multilayer neural network and their answers are
identified by the researcher. The configuration for training a neural network using
the BP algorithm is shown in the figure below in which the training is done offline.
The objective is to minimize the error between the target and actual output and to
find ∆w. The error is calculated at every iteration and is backpropagated through the
layers of the ANN to adapt the weights. The weights are adapted such that the error
is minimized. Once the error has reached a justified minimum value, the training is
stopped, and the neural network is reconfigured in the recall mode to solve the task.
Figure 5.4 Configuration for training ANN using BP algorithm
5.4.2 The Generalized Delta Rule (G.D.R.)
The objective of the BP algorithm, like in other learning algorithms, is to find
the next value of the adaptation weights (∆w) which is also known as the
Generalized Delta Rule (G.D.R). Consider the following ANN model:
48
Figure 5.5 Typical ANN model
What is need is to obtain the following algorithm to adapt the weights between the
output (k) and hidden (j) layers:
( ) ( )TWOTW kjjkkj ∆+=+∆ αηδ1 G.D.R (5.2)
where the weights are adapted as follows:
( ) ( ) ( )11 +∆+=+ TWTWTW kjkjkj (5.3)
where T is the iteration number and δk is the error signal between the output and
hidden layers:
( )( )kkkkk OTOO −−= 1δ (5.4)
The adaptation between input (i) and hidden (j) layers as below:
( ) ( )TWOTW jijjji ∆+=+∆ αηδ1 G.D.R (5.5)
Then the new weight is thus become:
( ) ( ) ( )11 +∆+=+ TWTWTW jijiji (5.6)
49
where the error signal through layer j is:
( ) kj
k
kjjj WOO ∑−= δδ 1 (5.7)
Note that
( )
ji
j
jij
netjj
OWnet
enetfO
j
θ+=
+==
∑
−1
1
(5.8)
5.5 Summary of the BP Algorithm
The adaptation of the weights between output (k) and hidden (j) layers:
( ) ( ) ( )11 +∆+=+ TWTWTW kjkjkj
where ( ) ( )TWOTW kjjkkj ∆++∆ αηδ1
( )( )kkkkk OTOO −−= 1δ
And between the hidden (j) and input (i) layers:
( ) ( ) ( )11 +∆+=+ TWjiTWTW jiji
where ( ) ( )TWOtW jijjji ∆+=+∆ αηδ1
( )∑−=k
kjkjjj WOO δδ 1
50
Note that:
( )
( )k
j
netkk
kj
k
kjk
netjj
ji
j
jij
enetfO
OWnet
enetfO
OWnet
−
−
+=
+=
+=
+=
∑
∑
1
1
1
1
θ
θ
5.6 Parameters for Neural Network
Learning Rate: The learning rates can be uniform throughout the network,
or different for each layer or node. In general, it is difficult to determine the best
learning rate, but a useful rule of thumb is to make the learning rate for each node
inversely proportional to the average magnitude of vector feeding node. Many
schemes which adapt the learning rate as a function of the local curvature of the error
surface have been proposed [18]. The simplest approach and one that works quite
well in practice, is to add a momentum term of the form αm(wℓ,j,i(k) - wℓ,j,i(k-1)) to
each weight update, where 0< αm <1. This term makes the current search direction
an exponentially weighted average of past directions and helps keep the weights
moving across flat portions of the error surface after they have descended from steep
portion.
Hidden Layer Nodes: The optimum number of hidden layer nodes is
difficult to establish and is strongly dependent on the nature of the data. It is not
likely that this issue will be resolved for the general case, since each problem
demands different capabilities of the network. Having said this, choosing the proper
51
network size is important. If the network is too small, it will not capable of forming
a good model of the system. Conversely, if the network is too large, then it may be
too capable. That is, the network may be able to implement numerous solutions that
are consistent with the training data, but most of which are likely to be poor
approximations to the actual problem.
With little or no knowledge of the underlying details of the data, however,
one must determine the network size by trial and error. A methodical approach is
recommended. One may start with a very small network and gradually increase the
size until performance tends to level off, training each network independently.
Insofar as determining an upper bound on the number of hidden layer nodes, this
number should be much less than the number of training samples or the network will
simply “memorize” the training samples, resulting in poor generalization.
Stopping Criteria: The iterative process of computing the gradient and
adjusting the weights is continued until a minimum is found in the error surface (or a
point determined to be sufficiently close). Several measures are candidates for
stopping criteria. If the magnitude of the gradient falls below a chosen level, the
algorithm may be terminated, as this may indicate that the minimum is being
approached. Perhaps a more common stopping criterion is a lower threshold on the
sum squared error, J(w). This requires knowledge (or a decent estimate) of the
minimum value of J(w), which is not always available. One might consider stopping
when a chosen number of iterations have been performed.
Finally, the method of cross-validation can be used to monitor the
generalization performance during learning. This method entails splitting the data
into a training set and a test set. During learning, the performance of the network
upon the training set can only improve but the performance on the test data will only
improve to some point, beyond which it will start to degrade. It is at this point that
the network starts to overfit the data, and the algorithm should be terminated.
CHAPTER 6
METHODOLOGY
6.1 Introduction
This chapter provides an investigation of time-frequency technique together
with dimensionality reduction strategies and also Mel-frequency cepstrum coefficient
technique that are used to generate features for classification of the heart sound
signals. The performance of each case; feature extraction/dimensionality reduction
combination using TFD/SVD and feature extraction using MFCC will be examined
with respect to their statistical properties of the Neural Network parameters or
backpropagation algorithms. The feature sets under TFD are based upon time-
frequency domain features and the bilinear time-frequency transform that is the B-
distribution. While, the feature sets under MFCC are based on frequency domain
features and discrete cosine transform. The relative performance of each technique
will be compared, yielding a prescription of the best overall technique to extract
features of heart sound signals.
53
6.2 Methodology
Figure 6.1 Feature extraction techniques and verification
The intention of this work is to specify the most effective means of features
extraction technique of the heart sound signals. Figure 6.1 shows the related
procedure to achieve that objective proposed. This chapter will examine the
performance of the techniques approached based upon time-frequency domain; B-
Distribution and frequency domain; Mel-Frequency Cepstrum Coefficient. At the
OUTPUT:
TIME-FREQUENCY DOMAIN
B-DISTRIBUTION
VERIFICATION:
NEURAL NETWORK
OUTPUT: CEPSTRUM
COEFFICIENT
DATA:
HEART SOUNDS
SVD
MFCC
OUTPUT:
MFCC
PERFORMANCE
OUTPUT:
TFD
PERFORMANCE
54
final stage of this task, the neural network with multilayer perceptron network will be
employed. To complete all that tasks, it would require a number of significant data
that can demonstrate the extracted features techniques and relative performance
measure. There are two types of data used in this study:
i) Normal Heart Sounds:
These data were collected from the normal person where their heart
conduction system is in a well condition.
ii) Abnormal Heart Sounds:
Data are comprised of various irregularities of heart sound signals. They
were taken from the persons/patients that were suffering from various kinds
of diseases.
6.2.1 Time-Frequency Distribution Technique
Time-frequency representations (TFRs) combine time-domain and frequency
domain analyses to yield a potentially more revealing picture of the temporal
localization of a signal’s spectral characteristics. Time-frequency techniques are
broadly classified into two categories: Linear transforms and Quadratic (or bilinear)
transforms. As heart sound being a non-stationary signal of a relatively small time-
bandwidth product, classical methods or linear transforms methods for time-
frequency analysis, such as Short Time Fourier Transform is not capable of revealing
the time-frequency dynamics of heart sounds. Several bilinear transforms techniques
have been investigated to look at their suitability at identifying features in heart
sound signals. However, B-distribution has been chosen for this analysis technique
because it outperforms other distributions techniques. The steps involve in TFD
technique as below:
55
Figure 6.2 Technique using TFD/SVD
The general form of bilinear class of time-frequency distribution can be
defined as in equation below:
( ) ( ) ( ) τττ
τρ τπdvdudeuzuzvgft
fvtvuj −−
−
+= ∫∫∫
2
2*
2,, (6.1)
where g(ν, τ) is referred to as the kernel function and is used to define the properties
of the distribution and z(t) is the analytic signal associated with the real one. The
type of time-frequency distribution is actually determined by the choice of this kernel
function. The B-distribution can be derived using the types of time-lag kernel as in
equation (6.2).
Time-Frequency Domain
TFSA:
-B-Distribution
SVD
DATA:
Heart Sounds
-Normal
-Abnormal
VERIFICATION:
-Neural network
56
( )( )
βτ
τ
=
ttG
2cosh, (6.2)
Application dependent parameter β, ( 0 ≤ β ≤1 ), controls the sharpness of the cut-off
of the filter and hence the trade-off between the cross-terms elimination and the time-
frequency resolution. The Doppler-lag kernel is defined as the Fourier transform of
the time-lag kernel with respect to time t:
( ) ( ) dtetGvg vtj πττ 2,, −∫= (6.3)
( )
( ) ( )vjvj πβπββ
τβ
β−Γ+Γ
Γ=
−
2
2 12
(6.4)
Since in real-life analysis the practitioner is often required to deal with
discrete-time signals, therefore it has considered a discrete-time formulation of the
proposed TFD. Using (6.3) in the general formula (6.1), the proposed TFD can be
written as
( ) ( ) τττ
τρ τπdudeuzuztuGft
fj2
2*
2,, −
−
+−= ∫∫ (6.5)
The discrete-time version of the above expression is given by
( ) ( ) mfjM
Mm
M
Mp
em
pnzm
pnzmpGfnπρ 2
2*
2,, −
−= −=
−+
++= ∑ ∑ (6.6)
Here, the discrete-time expressions G(n,m) and z(n) are obtained by sampling G(t,τ)
and z(t) at a frequency fs = 1/T such that t = n⋅T and τ = m⋅T. The total signal
duration is Tz, with N being the total number of samples. For simplicity and without
loss of generality, assumed that T = 1. To derive the previous expression, the
commutative property of the convolution operation is needed to be used. An
interpolation of the analytic signal z(n) in (6.6) is necessary in order to obtain non-
57
integer values of the signal arguments. However, this interpolation can be avoided by
using a similar formulation as the one in [34], namely
( ) ( ) ( ) ( ) mfjM
Mm
M
Mp
empnzmpnzmpGfnπρ 2*,, −
−= −=
−+++= ∑ ∑ (6.7)
The resulting TFD in (6.7) is alias-free and periodic in f with a period equal to unity
[34]. Note that (2M+1) is the length of the rectangular window considered in the
implementation. From observation of the previous expression, the implementation
procedure of the proposed TFD requires three steps:
1) Formation of the quadratic quantity Kz(n,m) = z(n+m)⋅z*(n-m);
2) Discrete convolution in time n of Kz(n,m) with G(n,m);
3) Discrete Fourier transform (DFT), with respect to m, of the previous result.
6.2.2 SVD Implementation
In addition to an appropriate process, it is often essential that dimensionality
reduction be performed on the TFR, so that the information is sufficiently compact
for presentation to a classifier such as ANN. Although some dimensionality
reduction may occur as a result of transforming a signal into TFR, in general it does
not. Whereas, the goal of the TFR is to localize and concentrate the signal’s
information, it is the role of dimensionality reduction to optimally select or combine
the TFR coefficients. The main goal of SVD technique is to ensure that as much of
the relevant information as possible is preserved in as few dimensions as possible.
SVD techniques deal with singular matrices, or ones which are very close to
being singular. They are an extension of eigen decomposition to suit non-square
58
matrices. Any matrix may be decomposed into a set of characteristic eigenvector
pairs called the component factors, and their associated eigenvalues called the
singular values. The SVD equation for an (m x n) singular matrix X is:
X = USVT
(6.8)
where U is an (m x n) orthogonal matrix, V is an (n x n) orthogonal matrix and S is
an (m x n) diagonal matrix containing the singular values of X arranged in decreasing
order of magnitude. A vector in an orthogonal matrix can be expressed as a linear
combination of the other vectors. The equation for Singular Value Decomposition
needs to be further explained by showing the significance of the eigenvectors and the
singular values. Each of the orthogonal matrices is achieved by multiplying the
original (m x n) data matrix X by its transpose, once as XXT to get U, and once as X
TX
to get VT. The data is then available in square matrices, on which eigen
decomposition may be applied. This involves the repeated multiplication of a matrix
by itself, forcing its strongest feature to predominate. In SVD, this is done on both
the orthogonal matrices. The pairs of eigenvectors are the rows in U and the
columns in V. These are sorted in order of their strength; the respective singular
value acts as the required numerical indicator as further explained below.
The relationship between the singular values and the co-efficient matrix’s
eigenvalues:
XTX = (USV
T) T
(USVT) (6.9)
XTX = VS
TU
TUSV
T (6.10)
XTX = VS
TSV
T (6.11)
XTX = VS
2V
T (6.12)
S2 contains the eigenvalues of X
TX. Similarly, it may be shown that they are also the
eigenvalues of XXT. They may be arranged in decreasing order of magnitude:
λ1 ≥ λ2 ≥ λ3 ≥…….. ≥ λj ≥ 0 (6.13)
59
Singular values of a matrix X are defined as the square root of the corresponding
eigenvalues of the matrix XTX:
jj λσ = (6.14)
As with the eigenvalues, they are sorted according to their magnitude, and the
corresponding eigenvectors in the orthogonal matrices U and V follow the same
order. The SVD equation (6.8) takes the form:
X =
Some singular values are equal or very close to zero. A singular value σj describes
the importance of uj and vj , and when its value approaches zero it indicates that these
associated vectors are less significant.
To implement SVD using Matlab software, the SVD function was used to
process the dataset where:
[U,S,V] = svd(dataset);
This function produces a diagonal matrix S, of the same dimension as X and with
nonnegative diagonal elements in decreasing order, and unitary matrices U and V.
The first 2 or 3 columns matrix V were selected because these eigenvectors really
contain the most of the information among the samples.
60
6.2.3 Mel-Frequency Cepstrum Coefficient
Figure 6.3 Technique using MFCC
Figure 6.3 shows the method to extract features and to find the performance
accuracy with respect to the MFCC technique. The output to the MFCC is a signal
representation provided by a set of mel-frequency cepstrum coefficients. These
cepstrum coefficients are the result of a cosine transform of the real logarithm of the
short-time energy spectrum expressed on a Mel-frequency scale. In MFCC, the main
advantage is that it uses Mel-frequency scaling or Mel-frequency warping where the
high frequencies of the input signals are compressed since it is believed that the low
frequencies contain more useful information. The steps to compute MFCC are:
CHANNELS:
Heart Sounds
-Normal
-Abnormal
Cepstrum coefficients
EXTRACT
FEATURES:
MFCC
VERIFICATION:
-Neural network
61
Figure 6.4 Steps of computing MFCC
Preprocessing mainly includes framing, windowing and pre-emphasis. These
procedures are also known as the short-term analysis where the signals are divided
into short interval and they are examined over a short period of time, normally 20-30
milliseconds. Framing is used to cut the long-time signal to the short-time signal in
order to get relative stable frequency characteristics. Windowing is mainly to reduce
the aliasing effect, when cut the long signal to a short-time signal in frequency
domain. The windowing function that normally used is hamming window. Discrete
Fourier Transform (DFT) is used to convert the signal from time-domain to
frequency-domain. Therefore the analysis is based on the frequency-domain analysis
because some of the heart sound information is mainly depend on the frequency
characteristics of the signal, although the connection between them is complex.
The skill of frequency scaling is used to map linear frequency into Mel-
frequency scale which is linear below 1 kHz, and logarithmic above. Mel-scale
frequency analysis can be approximated by equation:
( ) ( )700/1log2595 10 ffMel += (6.15)
MFCC
Preprocessing:
Windowing
DFT
Mel-frequency
scaling/warping
Log Cepstrum:
Inverse DFT or Discrete
Cosine Transform
Windowed Frames
Mel
Spectrum Log Mel
Spectrum
Input
Signals
Magnitude
Spectrum
62
where f is the frequency based in hertz. Based on this assumption, the mel-frequency
cepstrum coefficient is proposed. The MFCC can be computed by the following
steps:
(1) The discrete Fourier transform (DFT) transforms the windowed signal
segment into the frequency domain. The real and imaginary components of the
short-term signal spectrum are squared and added to get the short-term power
spectrum P(f ) or magnitude spectrum.
[ ] [ ] NkjN
n
a enxkX/2
1
0
π−−
=∑= , 0 ≤ k ≤ N (6.16)
(2) The spectrum P(f ) is warped along its frequency axis f into the Mel-
frequency axis as P(M) where M is the Mel-frequency.
(3) Each resulted warped power spectrum is then convolved with the triangular
band-pass filter, P(M) into θ(M) and the results accumulated.
[ ] [ ]kHkXm j
N
k
aj
21
0
∑−
=
= 0 ≤ j ≤ p (6.17)
where Hj[k] is the transfer function of filter j.
(4) The MFCC is then the discrete cosine transform of the p filter outputs
computed as in Equation (6.18). Because the MFCC calculation compresses the
signal components into the lower dimensions, we often choose D<<N.
( ) ( )( )NjimNMFCCN
j
ji /5.0cos/21
−= ∑=
π , i=1…D (6.18)
63
6.2.4 Verification Technique
The focus of this work is not only upon the features extraction stage but also
upon the verification technique, where there are some important observations to be
made with respect to the neural algorithms. The representative network chosen is the
multilayer perceptron network because it is the simplest, most studied and most
widely-used of all neural network classifiers. The design of the signal verification
should be based on the training set. The separate test set can be employed to obtain
an unbiased estimate of the performance of the system thus designed. In order to
select design parameters with the intent of maximizing performance, the training set
must held back to form a validation set.
The learning algorithm which is almost always used to train the MLP is the
backpropagation algorithm, which is a stochastic approximation of the steepest
descent algorithm. The activation function is used to fire or unfire the neurons. One
of the most popular activation functions used is the logistic sigmoid function. For
better results, some of the neural network parameters are need to be taken into
consideration such as the learning rate, momentum term, hidden layer nodes and last
but not least the stopping criteria.
The learning rate and momentum coefficient must be set using the values
range between zeros to unity. The optimum number of hidden layer nodes is difficult
to establish and it is strongly dependent on the nature of the data. This can be test
out using trial and error method and the number should be much less than the number
of training sample. However, it should not be too small because it will not capable
of forming a good model system. Furthermore, it must not too large because it will
become too capable; numerous solutions that are consistent with the training data but
most likely to be poor approximations to the actual problem. The stopping criteria is
applied for computing the gradient and adjusting the weights until a minimum is
found in the error surface and it should be very minimum to achieve better
64
performance. The steps of backpropagation (BP) algorithm in the neural network
verification technique are as in figure 6.5:
Figure 6.5 Steps of backpropagation algorithm
NO
YES
Obtain a set of training patterns
Set up neural network model:
e.g. No. of Input neurons, Hidden neurons,
and Output Neurons.
Set learning rate η and momentum rate αm
Initialize all connection Wji, Wkj and bias
weights θj θk to random values
Set minimum error, Emin
Start training by applying input patterns
one at a time and propagate through
the layers then calculate total error.
Backpropagate error through hidden and
input layer and adapt weights
Backpropagate error through output and
hidden layer and adapt weights
Check if Error < Emin
Stop training
CHAPTER 7
RESULTS AND DISCUSSION
7.1 Introduction
The simulation of heart sound signals are done using some normal signals
from normal people and abnormal signals from some patients that suffer from
various kinds of diseases. The simulation is first tested using a technique known as
time-frequency distribution that involved B-Distribution technique to transform the
signals from time domain into time-frequency domain. After that, it is continued by
feature extraction procedure using Singular Value Decomposition. The second part
is using another technique which is known as the Mel-Frequency Cepstrum
Coefficient. It also used transformation technique but the signals are only
transformed from time domain into frequency domain. After features extraction, the
results are further simulating using neural network for verification. The most
important results from neural network simulations are the performances efficiency
contributed by both techniques. All simulations results from each of steps involved
are shown in Figure 7.1 until Figure 7.32.
66
7.2 Time-Frequency Distribution
Figure 7.1 until figure 7.12 show the results of the normal and abnormal heart
sounds in time-frequency domain. The raw data signals are transformed according to
bilinear transformation using B-distribution with smoothing parameter, β=0.01.
Simulations have shown that β = 0.01 gives visually most appealing results for
various multicomponent signals. It was found that such defined B-distribution
outperforms other existing distributions in terms of time-frequency resolution, as
well as cross-terms suppression, when used to represent signals with closely-spaced
components in the time-frequency domain.
7.2.1 B-Distribution of Normal Heart Sounds
Figure 7.1 Normal heart sound; e.g.1
67
Figure 7.2 Normal heart sound; e.g.2
In these examples (Figure 7.1 and Figure 7.2), the heart sound signals of
different length, N but same sampling frequency, fs (4000 Hz) are analyzed using B-
distribution with smoothing parameter, β = 0.01. These examples are resulted from
analysis of normal heart sounds taken from different healthy people. The rationale
behind the choice of this particular distribution is that it constitutes the two extremes
of the quadratic class in the sense of time-frequency resolution and cross-term
suppression that is, B-distribution is chosen because of its very high resolution in the
time-frequency domain in addition to maximum cross term reduction or suppression
between the closely-spaced components.
68
7.2.2 B-Distribution of Abnormal Heart Sounds
Figure 7.3 Abnormal heart sound; e.g.1
Figure 7.4 Abnormal heart sound; e.g.2
69
Figure 7.5 Abnormal heart sound; e.g.3
Figure 7.6 Abnormal heart sound; e.g.4
70
In these examples (Figure 7.3 until Figure 7.6), the heart sound signals of
different length, N with the same sampling frequency, fs (4000 Hz) are analyzed
using B-distribution with smoothing parameter, β = 0.01. These examples are
resulted from analysis of abnormal heart sounds taken from unhealthy people that
suffered from various kinds of diseases. Figure 7.3 is an example of the abnormality
due to the mitral regurgitation, Figure 7.4 is happened due to the pericardiatis, Figure
7.5 is caused by aortic regurgitation and Figure 7.6 is caused by hypertention. Their
patterns are slightly different from the normal heart sound in terms of their
components where they have some additional components in the signal. These extra
components are actually the abnormalities of the heart sounds or sometimes called
the murmurs.
7.2.3 SVD of Normal Heart Sounds
Figure 7.7 SVD of normal heart sound; e.g.1
71
Figure 7.8 SVD of normal heart sound; e.g.2
7.2.4 SVD of Abnormal Heart Sounds
Figure 7.9 SVD of abnormal heart sound; e.g.1
72
Figure 7.10 SVD of abnormal heart sound; e.g.2
Figure 7.7 until 7.10 illustrate plots that are derived from applying SVD to
normal and abnormal heart sounds data set. To perform the SVD, the data is pre-
processed by transforming into time-frequency domain using B-distribution. SVD is
always essential when time-frequency representations are used as a feature basis
because most of TFRs comes at the expense of high dimensionality. The plots in all
figures are the results from the first two columns of matrix V. These are because the
first two eigenvectors really contain the most of the information among the samples
as well as capture the similarity among the samples from the same class. The plots
reveal interesting patterns in the data. A pattern in the first two eigenvectors
normally shows the pattern with cyclic structure. Neither eigenvector is exactly like
the underlying sine or exponential pattern; each such pattern, however, is closely
approximated by a linear combination of the eigenvectors to form biologically
meaningful signals. The figures show that the plots of the abnormal heart sounds do
not give the smooth curve lines as compared to the normal plots. This eigenvectors
are dominated by high-frequency components that can only come from the abnormal
sounds or murmurs produced by the heart of unhealthy people.
73
7.3 Mel-Frequency Cepstrum Coefficient
Figure 7.11 until figure 7.16 show the Mel-Frequency Cepstrum Coefficient
of normal and abnormal heart sounds. The mel-frequency scale is understood as
linear frequency spacing below 1,000 Hz and a logarithmic spacing above 1,000 Hz,
so filters spaced linearly at low frequencies and logarithmically at high frequencies
have been used to capture the important characteristics of heart sound signals. Here
is the common used formula to approximately reflex the relation between the mel-
frequency and the physical frequency:
(7.1)
7.3.1 MFCC of Normal Heart Sounds
Figure 7.11 MFCC of normal heart sound; e.g.1
( )
+=
7001log2595 10
ffMel
74
Figure 7.12 MFCC of normal heart sound; e.g.2
Figure 7.13 MFCC of normal heart sound; e.g.3
Figure 7.11 until Figure 7.13 show the results of the Mel-Frequency cepstrum
coefficients technique that applied to the normal heart sound. In this experiment, it is
used feature vectors composed from 12 lowest Mel-frequency cepstral coefficients
75
(MFCC) computed using about 27 mel-spaced filters. However, these figures only
display the cepstrum coefficient 1 for viewing. The 0th coefficient was excluded
because it carries a little of signal specific information. Analysis frame was
windowed by Hamming window with some overlapping.
7.3.2 MFCC of Abnormal Heart Sounds
Figure 7.14 MFCC of abnormal heart sound; e.g.1
76
Figure 7.15 MFCC of abnormal heart sound; e.g.2
Figure 7.16 MFCC of abnormal heart sound; e.g.3
Figure 7.14 until Figure 7.16 are the results analysis of the abnormal heart
sound using MFCC. These figures only stated the cepstrum coefficient 1 for
viewing. In this analysis, the hearts sound signals are analyzed to extract the more
77
important features including in time domain. The heart sound wave is also filtered to
eliminate irrelevant or undesired information. Before transforming the heart sound
signals into its power spectrum, it must be divided into overlapping blocks. In this
work, the blocks consist in Hamming windows. Figure 7.14 is an example of the
abnormality due to mitral regurgitation, Figure 7.15 is happened due to aortic
regurgitation and Figure 7.16 is caused by ventricular septal defect and mitral
regurgitation.
7.4 Neural Network
Figure 7.17 until Figure 7.22 show the verification results of normal and
abnormal heart sounds after extracted features using TFD and MFCC. The minimum
error, Emin must be as small as possible. Therefore, this neural network simulation
technique has used the Emin=0.01 for better performances.
78
7.4.1 Verification of TFD
0
10
20
30
40
50
60
70
80
90
100
0 5 10 15 20 25 30 35 40 45 50 55
Hidden Neurons
Perf
orm
an
ce (%
)
Figure 7.17 Verification of TFD ( number of hidden neurons )
0
10
20
30
40
50
60
70
80
90
100
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Learning Rate(lr)
Pe
rfo
rma
nc
e (
%)
Figure 7.18 Verification of TFD ( learning rate )
79
0
10
20
30
40
50
60
70
80
90
100
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Momentum Coefficient (mc)
Perf
orm
ance (%
)
Figure 7.19 Verification of TFD ( momentum coefficient )
Figure 7.17 until Figure 7.19 shows the results of verification
technique using TFD. They showed that the highest performance they can achieved
is up to 90% at hidden nodes = 20, learning rate = 0.05 and momentum coefficient =
0.1. Using too large or too small value of hidden nodes will give less performance
accuracy. The learning rate and momentum coefficient values should lies in between
zero and unity but the values cannot be too large because they will give a poor
performance. If the learning rate is set too high, the algorithm may oscillate and
become unstable. If the learning rate is too small, the algorithm will take too long to
converge. At each epoch, new weights and biases are calculated using the current
learning rate and momentum coefficient.
80
7.4.2 Verification of MFCC
0
10
20
30
40
50
60
70
80
90
0 5 10 15 20 25 30 35 40 45 50 55
Hidden Neurons
Pe
rfo
rman
ce (
%)
Figure 7.20 Verification of MFCC ( number of hidden neurons )
0
10
20
30
40
50
60
70
80
90
0 0.05 0.1 0.15 0.2 0.25
Learning Rate (lr)
Perf
orm
an
ce (%
)
Figure 7.21 Verification of MFCC ( learning rate )
81
0
10
20
30
40
50
60
70
80
90
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Momentum Coefficient (mc)
Perf
orm
ance (%
)
Figure 7.22 Verification of MFCC ( momentum coefficient )
Figure 7.20 until Figure 7.22 shows the results of verification technique using
MFCC. They showed that the highest performance they can achieved is only up to
80% at hidden nodes = 20, learning rate = 0.05 and momentum coefficient = 0.1.
Using too large or too small value of hidden nodes will also give less performance
accuracy. The learning rate and momentum coefficient values should lies in between
zero and unity but the values also cannot be too large because they will affect the
performance.
82
7.5 Discussion
Table 7.1: Performance comparison between the TFD and MFCC
TFD
(B-DISTRIBUTION)
MFCC
INPUT TRAINING
295x50
345x50
INPUT TESTING
295x50
345x50
HIDDEN LAYER
(NEURONS)
20
20
LEARNING RATE (LR)
0.05
0.05
MOMENTUM
COEFFICIENT (MC)
0.1
0.1
PERFORMANCE
90.0%
80.0%
Performances of both techniques are depending on the specification of hidden
layer, learning rate and also momentum coefficient. Hidden layer will give the
number of hidden neurons for the network that gathers its weighted inputs and bias to
form its own scalar output. Note that it is common for the number of inputs to a
layer to be different from the number of neurons. A layer is not constrained to have
the number of its inputs equal to the number of its neurons. However, the number of
hidden neurons should not be exceeded the number of input. Furthermore, for a
better performance, hidden neurons should not be too high or too low. If such cases
happened, then the performances will slowly decreasing.
83
Learning rate is another factor that affected the efficiency of the techniques
used. The performances of both techniques are very sensitive to the proper setting of
the learning rate. Learning rate values normally in the range between 0 and 1.
However, if the learning rate is set too high, the algorithm of the extracted features
may oscillate and become unstable. If the learning rate is too small, the algorithm
will take too long to converge but still can achieve the minimum error, Emin. It is not
practical to determine the optimal setting for the learning rate before training, and, in
fact, the optimal learning rate changes during the training process, as the algorithm
moves across the performance surface. The performance of the algorithm can be
improved by allow the learning rate to change during the training process. The
learning rate chosen in these experiments are the optimum values for faster
convergence and for achieving the target.
Momentum coefficient is used to overcome the network stuck at the local
minimum of the error. Momentum allows a network to respond not only to the local
gradient, but also to recent trends in the error surface. Acting like a low-pass filter,
momentum allows the network to ignore small features in the error surface. Without
momentum a network may get stuck in a shallow local minimum. With momentum a
network can slide through such a minimum. The momentum coefficient, mc, can be
any number between 0 and 1. Both techniques proposed in this project had the
optimum mc values that satisfy the overall performance efficiency. The most
important thing is, never set the mc too high to get a better performance.
CHAPTER 8
CONCLUSIONS AND RECOMMENDATION
8.1 Conclusions
Heart sounds and murmurs are time-varying signals where they exhibit some
degree of non-stationary. Both techniques approached in this project are suitable for
analyzing and extracting the features of these non-stationary signals. This project is
actually proposed B-Distribution to perform the Time-Frequency Distribution
technique in order to resolve signal’s components in the time-frequency domain.
Another proposed technique is the Mel-Frequency Cepstrum Coefficient to obtain
the cepstrums coefficients that were resolving signal’s components in the frequency
domain. The important reason of choosing both mentioned techniques in this project
is because they are outperformed their own classes compared to others for example,
B-distribution is outperformed the other quadratic TFDs for signals with components
closely-spaced in the time-frequency plane. Other important points of using B-
Distribution are because it achieves a better time-frequency resolution and a better
energy concentration around the instantaneous frequency of the signal and still
significantly suppresses the cross-terms. This distribution is also satisfies most of the
desirable properties sought for a time-frequency distribution. The reasons of using
MFCC as the feature extraction technique are because it is good in error reduction
during the analyzing of the signals, it also helps to produce more robust features
85
when the input is affected by the noises or interferences and another advantage is it
performs more precise characterizing that can improve the performance.
The time-frequency algorithms can be divided into two categories which are
linear and bilinear (quadratic). Even though the linear time-frequency distributions
(TFD) map a signal between the time domain and the time-frequency domain, but the
bilinear transforms are even more powerful and useful for real-time applications
rather than the linear TFD. Therefore, in this paper, the revision about fundamental
concepts of the bilinear time-frequency distribution with the reducing interferences
distribution based on the B-distribution kernel was proposed. The kernel has been
defined in both the time-lag and the Doppler-lag domain, and some of important
properties that it satisfies have been presented. In addition, the singular value
decomposition (SVD) was introduced as a method to identify the pattern relationship
in data and to compress the data by reducing the number of dimensions without
much loss of information. The SVD was also introduced as a method which directly
decomposes a non-square matrix. This enables decomposition of a TFD without the
need for covariance representation. This has the advantage of increasing the
dynamic range of the decomposition. In addition to the advantage of rectangular
decomposition, the SVD also provides a characterization of the vector spaces
outlined above. This enables us to identify features for the row-space and column
space of a TFD as well as enabling us to ignore the null-space components.
Furthermore, the revision about fundamental principles of the second technique
(Mel-Frequency Cepstrum Coefficient) is presented in this paper with briefly
explained the related processing steps which leaded to this technique. It starts from
the basic definitions used in DSP which related to the processing part and then
followed by the explanations of feature extraction steps with a type of feature named,
MFCC. MFCC vector is a representation provided by a set of cepstrum coefficients
that are the results of a cosine transform of the real logarithm of the short-term
energy spectrum expressed on a mel-frequency scale. It actually has the ability to
capture the important characteristics of the signals. The difference from the real
cepstrum is that a non-linear frequency scale, called mel-scale, is used for
86
approximation. The cepstrum is one homomorphic transformation that allows the
separation of the source from the filter. The cepstrum of a signal segment can be
computed by windowing the signal.
In conclusion, this project has developed a work for investigating the
performances of the above two techniques by considering artificial neural network
analysis. It compares the performance accuracy of TFD method and MFCC method
with regard to the features extraction of heart sound signals. The performance result
obtained using MFCC technique was slightly lower than the TFD. This may be
attributed to the features used. Whereby, TFD is an approach that widely applied in
heart sounds analysis and resulted in high performance accuracy. Therefore, the
TFD technique is still chosen as the best technique to analyze and to extract features
of the non-stationary signals such as the heart sounds signals.
8.2 Recommendation
Basically, the MFCC technique gives slightly lower performance rather than
the TFD technique. Noticed that if the MFCC is being modified, its may be able to
perform well and increase the performance accuracy. Therefore, this modification is
recommended for future work. The steps involve in modified MFCC technique will
be further define in the next section.
87
Input signal
8.2.1 Modified MFCC
The method described here is based on both of MFCC and the masking effect
to modify MFCC in order to improve robustness as well as performance accuracy of
signals recognition system. The basic structure includes Preprocessing and DFT,
Masking Threshold, Decimation, Mel-frequency scale, Cepstrum. It’s shown in
figure 8.1.
Figure 8.1 Basic structure of modified MFCC
Modified MFCC
Preprocessing:
Windowing
DFT
Masking threshold
Decimation
Mel-frequency scaling
Log
Cepstrum
88
The differences between Modified-MFCC and original MFCC are the part of
Masking-Threshold and Decimation. The basic idea of these two parts is:
1) After the step DFT, the peak value of spectrum in each critical band is
calculated. According to the empirical formula of noise masking tone (Here is
assume that the peak value is tone), the masking threshold of the peak value is
determined. The final masking threshold would consider both of the masking
threshold and absolute threshold of hearing. The larger one between them is selected
for modifying spectrum.
2) After getting the threshold, the spectrum of signal is compared with the value
of threshold. The spectrum whose value is less than threshold is deleted. And the
new spectrum is obtained after deleting.
Finally, the same method used in MFCC is used to get the feature of the
signal. It throws away the useless spectrums. The effect of noise that is under the
threshold is ignored and then the robustness of system is improved.
89
REFERENCES
1. Abdi, H. (1994). A Neural Network Primer. Journal of Biological Systems.
2(3): 247-283.
2. Boashash, B. (1992). Time Frequency Signal Analysis: Methods and
Applications. Melbourne, Australia: Longman Cheshire.
3. Boashash, B. and Sucic, V. (2000). A Resolution Performance Measure for
Quadratic Time-Frequency Distributions. Proceedings of the Tenth IEEE
Workshop on Statistical Signal and Array Processing. August 14 -16. 584 -588.
4. Barkat B. and Boashash B. (2001). A High-Resolution Quadratic Time–
Frequency Distribution for Multicomponent Signals Analysis. IEEE
Transactions on Signal Processing. 49(10). October 2001. 2232-2239.
5. Boashash, B. (2003). Time Frequency Signal Analysis and Processing: A
Comprehensive Reference. Oxford, UK: Elsevier.
6. Bahoura, M. and Pelletier, C. (2004). Respiratory Sounds Classification using
Cepstral Analysis and Gaussian Mixture Models. Proceedings of the 26th
Annual International Conference of the IEEE EMBS. September 1-5. San
Francisco:IEEE. 9-12.
7. Cohen L. (1989). Time-Frequency Distributions: A Review. Proceedings of the
IEEE. 77(7). July 1989. 941-981.
8. Davis, S. B. and Mermelstein, P. (1980) Comparison of Parametric
Representations for Monosyllabic Word Recognition in Continuously Spoken
Sentences. IEEE Transactions on Acoustics, Speech and Signal Processing.
28(4). August 1980. 357-366.
9. Deller, J. R., Hansen, J. H. L. and Proakis, J. G. (2000). Discrete-Time
Processing of Speech Signals. Wiley-IEEE Press.
90
10. Daliman, S. and Sha’ameri, A. Z. (2003). Time-Frequency Analysis of Heart
Sounds using Windowed and Smooth Windowed Wigner-Ville Distribution.
Proceedings Seventh International Symposium on Signal Processing and its
Application. 2. July 1-4. 625-626.
11. Daliman, S. and Sha’ameri, A. Z. (2003). Time Frequency Analysis of Heart
Sounds and Murmurs. Proceedings of the 2003 Joint Conference of the Fourth
International Conference on Information, Communications and Signal
Processing 2003 and The Fourth Pasific Rim Conference on Multimedia.
December 15-18. Singapore:IEEE. 840-843.
12. Gradshteyn, I. S. and Ryzhik, I. M. (1980). Tables of Integrals, Series and
Products. Academic Press.
13. Garcia, J. O and Garcia, C. A. R. (2003). Mel-Frequency Cepstrum
Coefficients Extraction from Infant Cry for Classification of Normal and
Pathological Cry with Feed-Forward Neural Networks. Proceedings of the
International Joint Conference on Neural Network. July 20-24. 3140-3145.
14. Haykin, S. (1991). Advances in Spectrum Analysis and Array Processing.
Prentice Hall. 418-517.
15. Hlawatsch, F. and Boudreaux-Bartels, G. F. (1992). Linear and Quadratic
Time-Frequency Signal Representations. IEEE Signal Processing Magazine.
9(2). April 1992. 21-67.
16. Imai S. (1983). Cepstral Analysis Synthesis on the Mel Frequency Scale. IEEE
International Conference on ICASSP ’83, Acoustics, Speech and Signal
Conference. April 1983. 93-96.
17. Jozef Wartak, M.D. (1972). Phonocardiology: Integrated Study of Heart Sound
and Murmurs, Harper and Row Publisher.
18. Jacobs, R. A. (1988). Increased Rates of Convergence Through Learning Rate
Adaptation, Neural Network. 1(4). 295-308.
19. Lin, Z. (1997). An Introduction to Time-Frequency Signal Analysis.
EmeraldFulltext. 17 (1): 46-53.
20. Molau, S., Pitz, M., Schluter, R. and Ney, H. (2001). Computing Mel-
Frequency Cepstral Coefficients on the Power Spectrum. Proceedings (ICASSP
’01) 2001 IEEE International Conference on Acoustics, Speech and Signal
Processing. May 7-11. 73-76.
91
21. Mak, B. (2002). A Mathematical Relationship between Full-Band and
Multiband Mel-Frequency Cepstral Coefficients. IEEE Signal Processing
Letters. 9(8). August 2002. 241- 244.
22. Malarvili, MB. (2003). Heart Sound Segmentation Based on Instantaneous
Energy of Electrocardiogram. Computers in Cardiology. 327-330.
23. Proakis, J. G. and Manolakis, D. G. (1992). Digital Signal Processing,
Principles, Algorithms, and Applications. New York. Macmillan Publishing
Company.
24. Picone, J. W. (1993). Signal Modeling Techniques in Speech Recognition.
Proceedings of the IEEE. 81(9). September 1993. 1215-1247.
25. Sucic, V., Barkat B. and Boashash, B. (1999). Performance Evaluation of the
B-Distribution. Fifth International Symposium on Signal Processing and its
Applications. August 22-25. Brisbane, Australia: IEEE, 267-270.
26. Smith, S. W. (1999). The scientist and Engineer’s Guide to Digital
SignalProcessing. California Technical Publishing. http://www.dspguide.com
27. Sha’ameri, A. Z. and Salleh, S. H. S. (2000). Window Width Estimation and
the Application of the Windowed Wigner-Ville Distribution in the Analysis of
Heart Sounds and Murmurs. Proceedings of the TENCON 2000. September 24
-27. 114 -119.
28. Upadhyaya, B. R. and Yan, W. Hybrid Digital Signal Processing and Neural
Networks for Automated Diagnostics using NDE Methods. Master Thesis.
University of Tennessee. 1993.
29. Wood, J. C., Buda, A. J. and Barry D. T. (1992). Time-Frequency Transforms:
A New Approach to First Heart Sound Frequency Dynamics. IEEE
Transactions on Biomedical Engineering. 39(7). July 1992. 730-740.
30. White, P. R., Collis, W. B. and Salmon, A. P. (1996). Analysing Heart
Murmurs using Time-frequency Methods. Proceedings of the IEEE-SP
International Symposium on Time-Frequency and Time-Scale Analysis. June
18-21. 385-388.
31. Wall, M. E., Rechtsteiner, A. and Rocha, L. M. (2003). Singular Value
Decomposition and Principal Component Analysis. in D. P. Berrar, W.
Dubitzky, and M. Granzow, A Practical Approach to Microarray Data
Analysis, Kluwer: Norwell. 91-109.
92
32. Zheng, F. and Zhang G. (2000). Integrating the Energy Information into
MFCC. International Conference on Spoken Language Processing. October
16-20. Beijing, China: IEEE . I-389-292.
33. Zhang, L., Marron, J. S., Shen, H. and Zhu, Z. (2005). Singular Value
Decomposition and its Visualization. 1-19.
34. Jeong, J. and Williams, W. J. (1992). Alias-free generalized discrete-time time-
frequency distributions. IEEE Transactions on Signal Processing, 40(11).
November 1992. 2757–2765.