♥♥♥♥ 1. intro. 2.spec.sub. 3.est. noise 4.intro.j& s 5.results 6 concl. ♠♠ ◄◄...
Post on 03-Jan-2016
229 Views
Preview:
TRANSCRIPT
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 1/19
IIT
Bo
mb
ayICA 2010 : 20th Int. Congress on Acoustics, 23-27 August 2010, Sydney, Australia[Wed, 25th Aug, R. 201 Speech processing & communication systems 2, 15.40]
Enhancement of Electrolaryngeal Speech by Spectral Subtraction, Spectral Compensation,
and Introduction of Jitter and Shimmer
Prem C. PandeyS. Khadar Basha
{pcpandey, basha}ee.iitb.ac.inhttp://www.ee.iitb.ac.in/~spilab
IIT Bombay, India
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 2/19
IIT
Bo
mb
ay OVERVIEW1. Introduction2. Spectral subtraction3. Estimation of noise spectrum4. Jitter, shimmer, & spectral compensation5. Results 6. Conclusion
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 3/19
IIT
Bo
mb
ay
1 INTRODUCTION
Glottal excitation to vocal tract
Intro. 1/4
Natural speech
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 4/19
IIT
Bo
mb
ay
Excitation to vocal tract from external vibrator
Electrolaryngeal speech
Intro. 2/4
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 5/19
IIT
Bo
mb
ay Problems with electrolarynx
• Dynamic control of level, voicing, & pitch not feasible
• Background noise due to leakage of acoustic energy, affecting the intelligibility
• Unnatural quality due to
▪ Low frequency spectral deficit
▪ Constant pitch & level
Intro. 3/4
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 6/19
IIT
Bo
mb
ay
Methods of noise reduction• Acoustic shielding of vibrator (Epsy-Wilson et al 1996)
• 2-input noise cancellation based on LMS algorithm ( Epsy-Wilson et al 1996)
• Single input noise cancellation using spectral subtraction▪ Averaging based noise est. & pitch-synch. generalized spectral
subtraction (Pandey et al 2002)
▪ Quantile based noise estimation (Pandey et al 2004)
▪ Parameter adaptation using freq.domain auditory masking (Liu et al 2006)
▪ Min.statistics based noise estimation (Mitra & Pandey 2006, Kabir et al 2008)
Intro. 4/4
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 7/19
IIT
Bo
mb
ay
2 SPECTRAL SUBTRACION
Noise generation (Pandey et al 2002)
• Leakage of vibrations produced by vibrator membrane
• Improper coupling of vibrations to the neck tissue
Spec. sub 1/5
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 8/19
IIT
Bo
mb
ay
s(n) = e(n)*hv(n), l(n) = e(n)*hl(n), x(n) = s(n) + l(n)
Xn(ej) = En(ej) [Hvn(ej) + Hln
(ej)]
• Assuming hv(n) & hl (n) to be uncorrelated
Xn(ej)2 = En(ej)2[Hvn(ej)2 + Hln
(ej)2]
• For short-time spectra calculated using pitch-synchronous window, En(ej)2 may be considered as constant E(ej)2
• During non-speech intervals, s(n) will be negligible,
Xn(ej)2 = |Ln(ej)|2
= |E(ej)|2 |Hln(ej)|2
Spec. sub 2/5
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 9/19
IIT
Bo
mb
ay Generalized spectral subtraction (Berouti et al 1979) using
FFT
E(k) = | Xn(k)|γ - α|Ln(k)|γ
Clean mag. spectrum
|Y n(k)| = [E(k)](1/ γ), if E(k) > [β|Ln(k)|]γ
β|Ln(k)|, otherwise
( : subtraction, : spectral floor, : subtraction power)
yn(m) = IDFT [ Yn(k) ejθn(k)]
Spec. sub 3/5
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 10/19
IIT
Bo
mb
ay
Phase estimation
▪ Noisy phase : θn(k) = Xn(k)
▪ Zero Phase : θn(k) = 0
▪ Random phase: θn(k) = r (uniformly distr over [0, 2π]
▪ Min. phase calculationiterative tech. (Quatieri and Oppenheim 1981),
cepstrum based non-iterative calculation (Oppenheim & Schafer 1975, Rabiner & Schafer 1978, Yegnanarayana & Dhayalan 1981)
▪ Phase set for continuity across the frames
θn(k) = θn-1(k) + (2πndk)/N
where nd = window shift, N = FFT size
▪ Noisy phase resulted in better quality than others
Spec. sub 4/5
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 11/19
IIT
Bo
mb
ay
Block diagram of spectral subtraction
Spec. sub 5/5
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 12/19
IIT
Bo
mb
ay
3 ESTIMATION OF NOISE SPECTRUM
• Variation in noise due to change in the electrolarynx orientation
• Voice activity detection difficult in electrolaryngeal speech
• Averaging based noise est. (Pandey et al 2002) unsuitable for long term use
• Quantile based noise est. (Stahl et al 2000) used for electrol. speech (Pandey et al 2004) difficult to implement for real-time processing
• Minimum statistics based method (Martin 1994) used for elec. lary. speech (Mitra & Pandey 2006, Kabir et al 2008) not effective with fixed subtraction parameters.
Est. noise 1/1
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 13/19
IIT
Bo
mb
ay
• Introduction of jitter and shimmer, using LPC based analysis synthesis, after spectral subtraction for reducing unnaturalness• Spectral compensation for low-frequency spectral deficit
4 INTRO. OF JITTER & SHIMMER & SPECTRAL COMPENSATION
Intro. J & S 1/4
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 14/19
IIT
Bo
mb
ay
Implementation of shimmer
Impulse amplitude = a(1+sr1)
a = mean amplitude
s = peak-to-peak shimmer
r1= random number uniformly distributed over +0.5
Implementation of jitter
Impulse repetition period = N(1+jr2)
N = mean pitch period in number of samples
j = peak-to-peak jitter
r2 = random number uniformly distributed over +0.5
Intro. J & S 3/4
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 15/19
IIT
Bo
mb
ay
• Low frequency spectral deficit in electrolaryngeal speech
• High frequency spectral emphasis in resynthesized speech due to impulse train excitation in LPC analysis-synthesis,
• Spectral compensation filter designed by comparing LPC smoothened spectra of natural and resynthesized /a/, /i/, /u/. Inserted in the excitation path for spectral compensation.
Spectral compensation
Intro. J & S 4/4
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 16/19
IIT
Bo
mb
ay
5 RESULTS Results 1/2
γ = 1Averaging: α=10, β=0.001, Min.: α = 25, β=0.005, Median: α = 1.5, β = 0.001
Materal: “….Where were you a year ago? 1 2 3 4 5 6 7 8 9 10”Electrolarynx Solatone.
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 17/19
IIT
Bo
mb
ay
Jitter →---------------------
Shimmer ↓ 0% 6% 12% 20% 40%
0%
6%
12%
20%
40%
Electrolaryngeal speech Enhan. electrolar. speech after spec. sub. with MBNE
Material: “…Where were you a year ago? “, Electrolarynx: Solatone
Results 2/2
(α = 1.2, β = 0.001, γ = 1),
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 18/19
IIT
Bo
mb
ay
6 CONCLUSION
▪ Median based noise estimation could be used for noise suppression without varying the oversubtraction factor.
▪ Phase estimation based on minimum phase and phase continuity did not imrove the quality above that of the noisy speech.
▪ Introduction of shimmer did not improve speech quality.
▪ Introduction of peak-to-peak jitter of up to 6 % and spectral compensation helped in improving the quality.
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 19/19
IIT
Bo
mb
ay
Thank You
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 20/19
IIT
Bo
mb
ay
P. C. Pandey, S. K. Basha, “Enhancement of electrolaryngeal speech by spectral subtraction, spectral compensation, and introduction of jitter and shimmer”, Proc. 20th International Congress on Acoustics ( ICA 2010), 23-27 August 2010, Sydney, Australia.
Abstract -- An electrolarynx, a verbal communication aid used by laryngectomy patients, is a vibrator held against the neck tissue to provide excitation to the vocal tract, as a substitute to that provided by the glottal vibrations. Although the user can set the vibration level and pitch, a dynamic control of level, voicing, and pitch during speech production is not feasible. In addition to this basic limitation, the electrolaryngeal speech suffers from (i) presence of background noise caused by leakage of acoustic energy from the vibrator and vibrator-tissue interface, (ii) low-frequency spectral deficiency, and (iii) unnatural quality due to constant pitch and level. Background noise decreases the intelligibility, while the other two factors affect the speech quality. Present study involved investigations for improving the intelligibility and quality of electrolaryngeal speech. Pitch-synchronous application of generalized spectral subtraction was used for reducing the background noise. In order to track the variation in the spectrum of the leakage noise due to changes in vibrator orientation and pressure during speech production, a dynamic estimation of noise was carried out from a set of past frames. The estimated noise spectrum was subtracted from that of the noisy speech and the resulting magnitude spectrum was combined with the original phase spectrum. The speech signal was resynthesized using overlap-add method, with two-pitch period analysis frames and one period overlap. Estimation of phase spectrum by minimum-phase assumption and the assumption of phase continuity did not improve the speech quality. An introduction of jitter and shimmer in the speech signal, using LPC based analysis-synthesis, was investigated for improving its naturalness. The excitation for synthesis was an impulse train with the frequency equal to that of the vibrator, with random frequency and amplitude modulations for providing the jitter and the shimmer, respectively. An FIR filtering of the excitation was used to match the long-term average spectral envelope of the processed electrolaryngeal speech to that of the normal speech. A peak-to-peak jitter of up to 6 % increased the naturalness, while introduction of shimmer decreased the quality.
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 21/19
IIT
Bo
mb
ayREFERENCES
1 M. Weiss, G. Y. Komshian, and J. Heinz, “Acoustic and perceptual characteristics of speech produced with an electronic artificial larynx,” J. Acoust. Soc. Am., 65, 1298-1308 (1979).
2 H. L. Barney, F. E. Haworth, and H. K. Dunn, “An experimental transistorized artificial larynx,” Bell Systems Tech. J., 38, 1337-1356 (1959).
3 Q. Yingyong and B. Weinberg, “Low frequency energy deficit in electrolaryngeal speech,” J. Speech Hearing Res., 34, 1250-1256 (1991).
4 C. Y. Espy-Wilson, V. R. Chari, and C. B. Haung, “Enhance ment of alaryngeal speech by adaptive filtering,” Proc. ICSLP, 764-771 (1996).
5 P. C. Pandey, S. M. Bhandarkar, G. K. Baccher, and P. K. Lehena, “Enhancement of alaryngeal speech using spectral subtraction,” Proc. 14th Int. Conf. Digital Signal Prcessing (DSP 2002), Santorini, Greece, 591-594 (2002).
6 P. C. Pandey, S. S. Pratapwar, and P. K. Lehana, “Enhancement of electrolaryngeal speech by reducing leakage noise using spectral subtraction with quantile based dynamic estimation of noise,” Proc. 18th Int. Congress Acoustics (ICA 2004), Kyoto, Japan, 3029-3032 (2004).
7 H. Liu, Q. Zhao, M. Wan, and S. Wang, “Application of spectral subtraction method on enhancement of electrolaryngeal speech,” J. Acoust. Soc. Am., 120, 398-406 (2006).
8 H. Liu, Q. Zhao, M. Wan and S. Wang, “Enhancement of electrolarynx speech based on auditory masking,” IEEE Trans. Biomed. Eng., 53, 865-874 (2006).
9 P. Mitra and P.C. Pandey, “Enhancement of electro laryngeal speech by spectral subtraction with minimum statistics-based noise estimation,” J. Acoust. Soc. Amer., 120, 3039 (2006).
10 R. Kabir, A. Greenblatt, K. Panetta, and S. Agaian, “Enhance ment of alaryngeal speech utilizing spectral subtract ion and minimum statistics,” Proc. 7th International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July (2008).
11 S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust., Speech, Signal Process, 27, 113-120 (1979).
12 M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” Proc.IEEE ICASSP’79, 208-211 (1979).
13 V. Stahl, A. Fisher, and R. Bipus, “Quantile based noise estimation for spectral subtraction and wiener filtering,” Proc. IEEE ICASSP’00, 3, 1875-1878 (2000).
♥♥ 1. Intro. 2.Spec.sub. 3.Est. noise 4.Intro.J& S 5.Results 6 Concl. ♠♠ ◄◄ ►► 22/19
IIT
Bo
mb
ay
14 R. Martin, “Spectral subtraction based on minimum statisic” Proc. 7th European Signal Processing Conf. (EUSIPCO–94), Edinburgh, Scoltland, 1182-1185 (1994).
15 T. F. Quatieri and A. V. Oppenheim,“Iterative techniques for minimum phase signal reconstruction from phase or magnitude,” IEEE Trans. Acoust., Speech, Signal Process., 29, 1187-1193 (1981).
16 B. Yegnanarayana and A. Dhayalan, “Noniterative techniques for minimum phase signal reconstruction from phase or magnitude,” Proc. IEEE ICASSP, 639-642, (1983).
17 A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. (Prentice-Hall, Englewood Cliffs, New Jersey, 1975).
18 L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, (Prentice Hall, Englewood Cliffs, New Jersey, 1978).
top related