8/10/2019 Auditory Prosthesis
http://slidepdf.com/reader/full/auditory-prosthesis 1/3
Auditory Prosthesis
An auditory prosthesis is a device that substitutes for or enhances the ability to hear. It is more
commonly called a hearing aid.
The goal is to significantly improve speech-in-noise intelligibility.
Figure 1: Hearing Prosthetic System
Current speech enhancement algorithms improve speech quality, but not necessarily intelligibility.
While hearing-impaired listeners do benefit from improved speech quality, communication problems
still exist if intelligibility is not improved. The ideal binary mask is one algorithm specifically shown to
improve speech intelligibility. In [9], the speech intelligibility scores reported by normal hearing
listeners increased from 12% to 100% after speech embedded in four-talker babble was processed
by the ideal binary mask. Similarly, the ideal binary mask improved speech intelligibility from nearly
0% to 100% in the study described in [10].
BINARY MASK ALGORITHM
Speech is sparse in the time-frequency domain. If we assume that noise is also sparse in this
domain, then it very likely does not overlap with the speech. So, we can remove the noisy regions of
the time-frequency plane (by applying the appropriate “binary mask”), which will leave us with intact,
noise-free speech [5]. The algorithm is effective even if the noise is not sparse in the time-frequency
domain; the overall signal-to-noise ratio (SNR) of the speech can be greatly improved by discarding
those regions of the time-frequency plane whose SNR fails to exceed a specified threshold.
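The thresholding rule described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function name `ideal_binary_mask`, the 0 dB default threshold, and the toy power values are all hypothetical, and the sketch assumes the speech and noise powers of every time-frequency unit are known separately, which is only true in a controlled experiment.

```python
import numpy as np

def ideal_binary_mask(speech_power, noise_power, threshold_db=0.0):
    """Return 1 where the local SNR (in dB) exceeds threshold_db, else 0."""
    eps = 1e-12  # guards against division by zero and log of zero
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (local_snr_db > threshold_db).astype(np.uint8)

# Toy time-frequency grid of power values (rows: frequency, columns: time).
# The numbers are made up for illustration.
speech_power = np.array([[4.0, 0.1, 9.0],
                         [0.0, 2.0, 0.5]])
noise_power = np.ones_like(speech_power)

mask = ideal_binary_mask(speech_power, noise_power)

# Overall SNR before masking, and after discarding the units the mask rejects.
snr_before_db = 10 * np.log10(speech_power.sum() / noise_power.sum())
snr_after_db = 10 * np.log10(np.sum(speech_power * mask) / np.sum(noise_power * mask))
print(mask)
print(snr_before_db, snr_after_db)
```

In this toy grid the mask keeps the three units where speech dominates. Half of the noise power is discarded along with only a small part of the speech power, so the overall SNR rises.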
Figure 2: Binary Mask Algorithm
A practical implementation of the algorithm generally has three stages: spectral analysis,
classification, and synthesis, as shown in Fig. 2. The spectral analysis stage uses the fast Fourier
transform (FFT) or a filter bank to map the original, noisy signal from the time domain to the
time-frequency (TF) domain. In the classification stage, each TF unit is identified as belonging to
either class '1' (clean speech, a.k.a. "target") or class '0' (noise). This classification creates a binary mask.
In the synthesis stage, the TF-domain version of the original, noisy signal is multiplied by the binary
mask, effectively removing all of the noise-containing portions of the signal. After the binary mask is
applied, the TF units are then recombined to form a speech signal that is clean (or at least of higher
SNR than before).
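The three stages can be sketched end to end with NumPy. The following is a minimal illustration, not the implementation behind Fig. 2. It assumes an FFT-based analysis with a periodic Hann window and 50% overlap, a 440 Hz tone standing in for speech, white noise as the interferer, and an "oracle" classifier that sees the clean speech and the noise separately. A real system would have to estimate the mask from the noisy signal alone. All function names and parameter values are hypothetical.

```python
import numpy as np

FRAME = 256                          # frame length in samples (assumed value)
HOP = FRAME // 2                     # 50% overlap between frames
WINDOW = np.hanning(FRAME + 1)[:-1]  # periodic Hann; copies shifted by HOP sum to 1

def analyze(x):
    """Stage 1, spectral analysis: windowed FFT of overlapping frames."""
    n_frames = 1 + (len(x) - FRAME) // HOP
    frames = np.stack([x[i * HOP:i * HOP + FRAME] * WINDOW
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (frames, frequency bins)

def classify(speech_tf, noise_tf, threshold_db=0.0):
    """Stage 2, classification: 1 where local SNR exceeds the threshold, else 0."""
    eps = 1e-12  # guards against division by zero and log of zero
    snr = 10 * np.log10((np.abs(speech_tf) ** 2 + eps)
                        / (np.abs(noise_tf) ** 2 + eps))
    return (snr > threshold_db).astype(float)

def synthesize(tf, length):
    """Stage 3, synthesis: inverse FFT per frame, then overlap-add."""
    frames = np.fft.irfft(tf, n=FRAME, axis=1)
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + FRAME] += frame
    return out

def snr_db(reference, estimate):
    """SNR of an estimate, counting every deviation from the reference as noise."""
    error = estimate - reference
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

# Synthetic signals: a 440 Hz tone stands in for speech (assumption).
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
speech = np.sin(2 * np.pi * 440 * t)
noise = 0.5 * rng.standard_normal(len(t))
noisy = speech + noise

noisy_tf = analyze(noisy)
mask = classify(analyze(speech), analyze(noise))   # oracle binary mask
enhanced = synthesize(noisy_tf * mask, len(noisy))

# Compare only samples covered by two overlapping frames, to skip edge effects.
core = slice(HOP, HOP * noisy_tf.shape[0])
snr_before = snr_db(speech[core], noisy[core])
snr_after = snr_db(speech[core], enhanced[core])
print(f"SNR before: {snr_before:.1f} dB, after: {snr_after:.1f} dB")
```

The window and hop are chosen so that overlap-add with an all-ones mask returns the input unchanged away from the signal edges. Any change in the output is therefore due to the mask alone.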
Generalization of supervised learning for binary mask estimation
May, T. ; Centre for Appl. Hearing Res., Tech. Univ. of Denmark, Lyngby, Denmark ; Gerkmann, T.
This paper addresses the problem of speech segregation by estimating the
ideal binary mask (IBM) from noisy speech. Two methods are compared. The first is a supervised
learning approach that incorporates a priori knowledge about the feature distribution
observed during training. The second relies solely on a frame-based speech
presence probability (SPP) estimation and therefore does not depend on the acoustic
conditions seen during training. We investigate the influence of mismatches between the
acoustic conditions used for training and testing on the IBM estimation performance and
discuss the advantages of both approaches.
A new mask-based objective measure for predicting the intelligibility
of binary masked speech
Chengzhu Yu ; Wojcicki, K.K. ; Loizou, P.C. ; Hansen, J.H.L.
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE
Mask-based objective speech-intelligibility measures have been successfully proposed for
evaluating the performance of binary masking algorithms. These objective measures were computed
directly by comparing the estimated binary mask against the ground truth ideal binary mask (IdBM).
Most of these objective measures, however, assign equal weight to all time-frequency (T-F) units. In
this study, we propose to improve the existing mask-based objective measures by weighting each
T-F unit according to its target or masker loudness. The proposed objective measure shows
significantly better performance than two other existing mask-based objective measures.
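To make the idea of per-unit weighting concrete, the sketch below compares an estimated mask with an ideal mask twice: once with equal weights and once with a weight per T-F unit. It is not the measure proposed in the paper, whose exact definition is not given here. The function name `weighted_mask_agreement`, the use of simple agreement as the score, and the made-up "loudness" values are all assumptions for illustration.

```python
import numpy as np

def weighted_mask_agreement(estimated_mask, ideal_mask, weights=None):
    """Weighted fraction of T-F units where the estimated mask matches the ideal mask."""
    estimated_mask = np.asarray(estimated_mask)
    ideal_mask = np.asarray(ideal_mask)
    if weights is None:
        weights = np.ones(ideal_mask.shape)  # equal weight for every T-F unit
    weights = np.asarray(weights, dtype=float)
    matches = (estimated_mask == ideal_mask)
    return float(np.sum(weights * matches) / np.sum(weights))

ideal = np.array([[1, 0, 1],
                  [0, 1, 0]])
estimate = np.array([[1, 0, 0],
                     [0, 1, 1]])
# Hypothetical per-unit loudness values; larger means the unit matters more.
loudness = np.array([[5.0, 1.0, 1.0],
                     [1.0, 3.0, 1.0]])

unweighted = weighted_mask_agreement(estimate, ideal)
weighted = weighted_mask_agreement(estimate, ideal, loudness)
print(unweighted, weighted)
```

Here the two misclassified units carry low weight, so the weighted score is higher than the unweighted one. Errors in loud units would pull it down instead.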
An algorithm combined with spectral subtraction and binary masking for monaural speech segregation
Monaural speech segregation from complex concurrent noise is an extremely challenging
problem. Binary masking is one method for solving it; however, its performance is
limited by the noise that remains in the result. In this paper, an algorithm that integrates spectral
subtraction and binary masking for speech separation and enhancement is proposed. It follows
the framework of computational auditory scene analysis (CASA). The energy of each time-frequency (T-F)
unit is used as the cue to generate the binary mask. The spectral subtraction algorithm is then
used to eliminate noise energy from the original speech, which yields an interim speech signal. Applying
the binary mask to the interim speech yields the target speech. Systematic evaluation
shows that the combined algorithm can stably improve the SNR and voice quality of noisy speech. It
performs better than existing binary masking systems in most situations, especially when the noise
and the speech have similar power spectra.
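The order of operations in the abstract (energy-based mask, spectral subtraction, then masking of the interim signal) can be sketched as follows. This is a rough illustration rather than the paper's algorithm. It assumes the input is an array of T-F unit energies, that the first few frames contain noise only and can serve as the noise estimate, and that a unit is kept when its energy exceeds that estimate by a fixed margin. The function name and all parameter values are hypothetical.

```python
import numpy as np

def subtract_then_mask(noisy_power, noise_frames=5, mask_threshold_db=3.0):
    """noisy_power: array of shape (frames, bins) holding T-F unit energies."""
    # Noise estimate per frequency bin, from frames assumed to be noise-only.
    noise_power = noisy_power[:noise_frames].mean(axis=0)
    # Binary mask from T-F unit energy: keep units clearly above the noise estimate.
    threshold = noise_power * 10 ** (mask_threshold_db / 10)
    mask = (noisy_power > threshold).astype(float)
    # Spectral subtraction: remove the noise estimate and clamp negatives to zero.
    interim = np.maximum(noisy_power - noise_power, 0.0)
    # Apply the mask to the interim result.
    return interim * mask, mask

# Toy input: five noise-only frames followed by one frame with a strong unit.
noisy_power = np.array([[1.0, 1.0]] * 5 + [[10.0, 1.5]])
target_power, mask = subtract_then_mask(noisy_power)
print(target_power[-1], mask[-1])
```

In the last frame, the first unit survives with the noise estimate subtracted. The second unit is only slightly above the noise estimate, so the mask removes the residue that spectral subtraction alone would leave behind.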