icassp 20041 a voice activity detector using the chi-square test beena ahmed1 and w. harvey holmes2...
TRANSCRIPT
![Page 1: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/1.jpg)
ICASSP 2004 1
A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST
Beena Ahmed1 and W. Harvey Holmes2
1RMIT University, Melbourne, VIC 3001, Australia
2University of New South Wales, Sydney, NSW 2052, Australia
Reporter : Chen, Hung-Bin
![Page 2: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/2.jpg)
ICASSP 2004 2
Outline
• Introduction– This paper proposes
• Testing data– speech plus noise
• The Chi-Square test
• Noise estimation using the Chi-Square test
• Block diagram of the proposed VAD
• Experiments and results
• Summary and Conclusions
![Page 3: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/3.jpg)
ICASSP 2004 3
Introduction - VAD
• significance– Speech recognition systems designed for real world conditions, a
robust discrimination of speech from other sounds is a crucial step.
• advantage– Speech discrimination can also be used for coding or
telecommunication applications.
• proposed system– a feature set inspired by investigations of various stages of the auditory
system
![Page 4: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/4.jpg)
ICASSP 2004 4
This paper proposes
• This paper proposes a voice activity detector (VAD) that makes the speech/noise classification by applying the statistical chi-square test to each frame.
• It also uses a continuous update of the background noise estimate.
• If the chi-square test determines that they are close, the frame is declared to be noise, otherwise speech.
( but not mean that is a real speech data )
![Page 5: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/5.jpg)
ICASSP 2004 5
Testing data
• There are two ways of interpreting noise model:– 1. The ‘additive noise’ point of view;
i.e. the speech signal is corrupted by additive noise– 2. The ‘additive signal’ point of view;
i.e. the residual noise signal has speech added to it.
• Noise model in this paper, the speech signal s(t) is corrupted by uncorrelated additive noise w(t), giving the degraded composite signal y(t):– y(t) = s (t) + w(t) .
![Page 6: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/6.jpg)
ICASSP 2004 6
The Chi-Square test
• In this paper, the chi-square test has been applied to the two problems of noise estimation and voice activity detection.
• The chi-square test seeks to determine if there is a good fit between the frequencies of the noise observed data and the frequencies of the expected or theoretical data.
hypothesis is proposed for each frame:
1,,0 ˆ: pkpk wxH
pkpkpk swxH ,1,,1 ˆ:
:noise-only frame
:noise-pluse-speech frame
![Page 7: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/7.jpg)
ICASSP 2004 7
Noise estimation using the Chi-Square test
• To estimate the noise with improved spectral sensitivity, the input signal is first passed through a bandpass filterbank H1, …, HM, as shown in Figure.
• Each of these sub-banded signals is then divided into time frames of size L, where M<<L.
![Page 8: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/8.jpg)
ICASSP 2004 8
Noise estimation using the Chi-Square test
• Let , be the signal in sub-band k and frame p. • The samples in the noise vector , will be given by
• The noise estimates , from the previous frame, p-1, are first grouped into N bins.
• The vector o of observation is obtained in the same way from the current signal , , using the same N bins.
pkx ,
)(),...,1( ,,, Lwwx pkpkpk pkx ,
1, pkx
Ni eeee ,...,,...,1
Ni oooo ,...,,...,1
pkx ,
1i
K(i) (L) Wk,p
ei
![Page 9: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/9.jpg)
ICASSP 2004 9
Noise estimation using the Chi-Square test
• The chi-square test is then applied to these bins, where the chi-square statistic is given by
• The hypothesis test can thus be written as
• effectively– Background noise is usually assumed to have a Gaussian distribution.
– Speech is (relatively) non-stationary and is non-Gaussian.
– The chi-square test can be used to effectively identify the noise-only segments of the signal.
N
i i
ii
e
eo
1
22
1
02
H
Hthresthold
![Page 10: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/10.jpg)
ICASSP 2004 10
simple one-pole smoothing filter λ
• if the current frame is determined to be a noise-only frame similar to the current noise estimate, the pdf of the noise is updated using a simple one-pole smoothing filter with smoothing coefficientλ.
• Testing found that a smoothing coefficient of 0.95 gave the best results.
![Page 11: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/11.jpg)
ICASSP 2004 11
Block diagram of the proposed VAD
• The proposed VAD consists of three main components:– 1. Chi-square based noise estimator (as in Section 3),– 2. Noise reduction system, and– 3. Chi-square based decision module.– A block diagram of the complete system is shown in Figure
![Page 12: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/12.jpg)
ICASSP 2004 12
Implementation Details
• An analysis frame size of 256 samples was used, with a step size of 64 samples.
• The optimum band gains were estimated from the spectrally decomposed noisy signal using the Ephraim and Malah gain function with a smoothing parameter of 0.98.
• Noise estimates were obtained from the chi-square noise estimator described in Section 3.
![Page 13: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/13.jpg)
ICASSP 2004 13
Implementation Details
• The VAD decision was made using another chisquare detector on the synthesised noise-reduced signal.
• In it a frame size of 15 ms, i.e. 125 samples, was used with an overlap of 25 samples at a sampling frequency of 8192 Hz.
• The signal was divided into 8 equal sub-bands using a digital IIR filterbank of elliptic bandpass filters of order 10, with bandwidths of 487.5 Hz.
• The histogram of the current frame was calculated and divided into 7 classes (or bins)(ei).
• The first three frames were recursively averaged to provide the starting value of the noise vector.
![Page 14: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/14.jpg)
ICASSP 2004 14
Implementation Details
• The current input frame was combined with seven previous frames, giving frame sizes of 122 ms for noise estimation.
• The observed and expected distributions in each sub-band were divided into 7 classes (ei) and tested using the chi-square test.
• Only those frames in which the null hypothesis was accepted for all 8 sub-bands were declared to be noise.
• The noise estimate was then updated using a single-sided one-pole recursive filter.
• Testing found that a smoothing coefficient of 0.95 gave the best results.
![Page 15: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/15.jpg)
ICASSP 2004 15
Testing data
• The VAD was applied to 12 sentences with added babble, car, pink, and white noises at SNRs of 0, 5, and 10 dB.
• The chi-square detector manages to pick up short bursts of speech that are only a few hundred samples (20 ms) long.
![Page 16: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/16.jpg)
ICASSP 2004 16
Experimental Results
![Page 17: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/17.jpg)
ICASSP 2004 17
Experimental Results
![Page 18: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/18.jpg)
ICASSP 2004 18
ReferenceVAD proposed by Sohn et al. [5]
• The developed VAD employ the decision directed parameter estimation method for the likelihood ratio test.
![Page 19: ICASSP 20041 A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST Beena Ahmed1 and W. Harvey Holmes2 1RMIT University, Melbourne, VIC 3001, Australia 2University](https://reader035.vdocuments.site/reader035/viewer/2022072011/56649e1a5503460f94b082b5/html5/thumbnails/19.jpg)
ICASSP 2004 19
Summary and Conclusions
• The proposed VAD shows better performances in various environmental conditions, while requires only a few parameters to optimize when compared with the others.
• Applications such as – automatic classification
– segmentation of animal sounds
– an efficient encoding of speech and music