SIBILANT SPEECH DETECTION IN NOISE
BY: HOSEIN BITARAF
SUPERVISOR: DR. NASERSHARIF
INTRODUCTION
Sibilant speech is aperiodic. It comprises the fricatives /s/, /ʃ/, /z/ and /ʒ/ and the
affricates /tʃ/ and /dʒ/.
We present a sibilant detection algorithm that is robust to high levels of noise.
Gaussian model for the noisy speech signal
Xk,i = power, k = frequency, i = time frame, µk,i = mean power
PSD for /ʃ/
Log-likelihood
µk,N1 = µk,N2 = ak
µk,S = ak + bk
Maximizing the log-likelihood
74% of sibilants last between 60 and 130 ms.
|t| < 30 ms: high probability of being inside the sibilant.
|t| > 65 ms: high probability of being outside the sibilant.
The weighting reduces the contribution of the transition region 30 ms < |t| < 65 ms.
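The three regions above can be sketched as a pair of per-frame weights. This is a hard-edged simplification for illustration only: the transition region gets zero weight, whereas the actual weighting is smoother. The function name and the `span_ms` limit on the outside region are assumptions, not values from the slides.

```python
def region_weights(t_ms, inner_ms=30.0, outer_ms=65.0, span_ms=130.0):
    """Return (w_sibilant, w_outside) for a frame at offset t_ms
    from the hypothesised sibilant centre.

    inner_ms and outer_ms come from the slide; span_ms is a
    hypothetical limit on how far the outside region extends.
    """
    d = abs(t_ms)
    if d < inner_ms:
        return 1.0, 0.0   # very likely inside the sibilant
    if outer_ms < d <= span_ms:
        return 0.0, 1.0   # very likely outside the sibilant
    return 0.0, 0.0       # transition region (or too far away): ignored


print(region_weights(0))     # centre of the sibilant
print(region_weights(-50))   # transition region
print(region_weights(100))   # outside the sibilant
```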
Estimate noise and sibilant
Estimated sibilant mean power
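A minimal sketch of how the two mean powers might be estimated in one frequency bin, given the model µk,N1 = µk,N2 = ak and µk,S = ak + bk. It assumes that the weighted maximum-likelihood estimate of each region's mean power is the weighted average of the observed power in that region, and it clips bk at zero. The function name and the clipping are assumptions; the exact estimator on the original slides is not reproduced here.

```python
def estimate_noise_and_sibilant(power, w_sib, w_out):
    """Estimate (a, b) for one frequency bin.

    power : per-frame power values X[k, i] for a fixed k
    w_sib : per-frame weights for the sibilant region S
    w_out : per-frame weights for the outside regions N1 and N2
    """
    sum_sib = sum(w_sib)
    sum_out = sum(w_out)
    if sum_sib == 0 or sum_out == 0:
        raise ValueError("both regions need non-zero total weight")
    # Weighted mean power outside the sibilant: noise estimate a.
    a = sum(w * x for w, x in zip(w_out, power)) / sum_out
    # Weighted mean power inside the sibilant: estimate of a + b.
    inside = sum(w * x for w, x in zip(w_sib, power)) / sum_sib
    # Sibilant power cannot be negative (assumed clipping).
    b = max(0.0, inside - a)
    return a, b


power = [1.0, 1.0, 5.0, 5.0, 1.0, 1.0]
w_sib = [0, 0, 1, 1, 0, 0]
w_out = [1, 1, 0, 0, 1, 1]
print(estimate_noise_and_sibilant(power, w_sib, w_out))
```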
Maximum filter
W = 30
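A sliding maximum filter can be sketched as below. The slide does not give the units of W or the window alignment, so this sketch assumes W counts frames and that the window is roughly centred on the current frame; both are assumptions.

```python
def maximum_filter(values, width):
    """Replace each sample by the maximum over a window of `width`
    samples around it (truncated at the edges)."""
    if width < 1:
        raise ValueError("width must be at least 1")
    before = width // 2          # samples taken before the current one
    after = (width - 1) // 2     # samples taken after the current one
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - before)
        hi = min(n, i + after + 1)
        out.append(max(values[lo:hi]))
    return out


# Small window so the effect is easy to follow; the slide uses W = 30.
print(maximum_filter([1, 3, 2, 0, 0, 0, 5], 3))
```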
Normalization
To make the estimate independent of the overall speech level
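One simple way to obtain level independence is to divide the estimated sibilant power by a measure of the overall speech level. The sketch below uses the mean frame power of the utterance as that level, which is an assumption; the slides do not say which level measure is used.

```python
import math


def mean_level(frame_powers):
    """Hypothetical speech-level measure: mean power over all frames."""
    return sum(frame_powers) / len(frame_powers)


def normalise(sibilant_power, speech_level):
    """Scale the sibilant power estimate by the overall speech level."""
    if speech_level <= 0:
        raise ValueError("speech level must be positive")
    return [p / speech_level for p in sibilant_power]


quiet = [2.0, 4.0]
loud = [20.0, 40.0]   # same utterance, 10 dB louder
n_quiet = normalise(quiet, mean_level(quiet))
n_loud = normalise(loud, mean_level(loud))
# After normalisation the two estimates agree.
print(all(math.isclose(q, l) for q, l in zip(n_quiet, n_loud)))
```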
Gaussian Mixture Model
Each frame is scored by two Gaussian mixture models (GMMs):
one trained on non-sibilant speech and the other on sibilant speech.
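The two-model decision can be sketched as a log-likelihood ratio test. The sketch assumes diagonal-covariance GMMs stored as lists of (weight, mean, variance) tuples and a threshold of zero; the covariance structure, the threshold and all names are assumptions made for illustration.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """gmm is a list of (weight, mean, var) components."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def is_sibilant(x, gmm_sib, gmm_non, threshold=0.0):
    """Classify a frame by the log-likelihood ratio of the two GMMs."""
    llr = gmm_log_likelihood(x, gmm_sib) - gmm_log_likelihood(x, gmm_non)
    return llr > threshold


# Toy one-dimensional models; real models would be trained on labelled speech.
gmm_sib = [(1.0, [5.0], [1.0])]
gmm_non = [(1.0, [0.0], [1.0])]
print(is_sibilant([4.5], gmm_sib, gmm_non), is_sibilant([0.5], gmm_sib, gmm_non))
```

Raising or lowering `threshold` trades missed sibilants against false alarms.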
EXPERIMENTS
Filter from 1.5 kHz to 8 kHz. The weighting function uses three
Hamming windows
GMMs
The input for the GMMs was a 14-component vector
containing the estimated sibilant power spectrum from
1.5 kHz to 8 kHz every 500 Hz
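Sampling 1.5 kHz to 8 kHz in 500 Hz steps gives exactly 14 frequencies, which matches the 14-component vector. The sketch below builds that frequency grid and picks the nearest spectrum bin for each one; the `bin_hz` parameter and the nearest-bin lookup are assumptions about how the spectrum is stored.

```python
def feature_frequencies_hz(start_hz=1500, stop_hz=8000, step_hz=500):
    """Frequencies at which the sibilant power spectrum is sampled."""
    return list(range(start_hz, stop_hz + 1, step_hz))


def feature_vector(spectrum, bin_hz):
    """Pick the estimated sibilant power at each feature frequency.

    spectrum : power values, one per frequency bin starting at 0 Hz
    bin_hz   : hypothetical spacing between adjacent bins in Hz
    """
    return [spectrum[round(f / bin_hz)] for f in feature_frequencies_hz()]


freqs = feature_frequencies_hz()
print(len(freqs), freqs[0], freqs[-1])
```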
Result
White Gaussian noise was added to the speech files.
It is more difficult to detect sibilants in white noise than in other typical stationary noise.
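Adding white Gaussian noise at a chosen SNR can be sketched as follows. The function name, the fixed seed and the use of the whole-file average power to define the SNR are assumptions; the slides do not state how the SNR was measured.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return signal plus white Gaussian noise at the requested SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


# Check the achieved SNR on a synthetic tone.
clean = [math.sin(0.01 * n) for n in range(20000)]
noisy = add_white_noise(clean, snr_db=10.0)
noise = [y - s for y, s in zip(noisy, clean)]
snr = 10 * math.log10(sum(s * s for s in clean) / sum(e * e for e in noise))
print(abs(snr - 10.0) < 0.5)
```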
Result
Pmiss = miss probability
Pfa = false alarm probability
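Both probabilities can be computed from frame-level ground truth and detector decisions. The sketch uses the usual definitions: Pmiss is the fraction of true sibilant frames the detector missed, and Pfa is the fraction of non-sibilant frames it flagged. Frame-level scoring is an assumption.

```python
def miss_and_false_alarm(truth, decided):
    """Return (p_miss, p_fa) from per-frame labels and decisions.

    truth   : True where the frame really is sibilant
    decided : True where the detector reported a sibilant
    """
    n_sib = sum(1 for t in truth if t)
    n_non = len(truth) - n_sib
    misses = sum(1 for t, d in zip(truth, decided) if t and not d)
    false_alarms = sum(1 for t, d in zip(truth, decided) if d and not t)
    p_miss = misses / n_sib if n_sib else 0.0
    p_fa = false_alarms / n_non if n_non else 0.0
    return p_miss, p_fa


truth = [1, 1, 1, 1, 0, 0, 0, 0]
decided = [1, 1, 1, 0, 1, 0, 0, 0]
print(miss_and_false_alarm(truth, decided))
```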
CONCLUSIONS
We have presented a sibilant detection algorithm for noisy speech.
It combines a sibilant mean power estimation stage with the likelihood ratio of two GMMs.
Tested on TIMIT: 80% classification accuracy for positive SNRs.
Future Work
The classification accuracy could possibly be improved further by applying temporal constraints to the classification decisions.
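As a purely hypothetical illustration of a temporal constraint, the sketch below discards detected sibilant runs shorter than a minimum number of frames. This is not part of the presented algorithm, and the function name and `min_frames` parameter are assumptions.

```python
def drop_short_runs(decisions, min_frames):
    """Clear any run of positive decisions shorter than min_frames."""
    out = [bool(d) for d in decisions]
    n = len(out)
    i = 0
    while i < n:
        if not out[i]:
            i += 1
            continue
        j = i
        while j < n and out[j]:
            j += 1               # j is one past the end of the run
        if j - i < min_frames:
            for k in range(i, j):
                out[k] = False   # run too short to be a sibilant
        i = j
    return out


print(drop_short_runs([0, 1, 0, 1, 1, 1, 0], min_frames=3))
```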
Thank you