![Page 1: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/1.jpg)
Mingzi Li
Department of Electrical Engineering
Supervised by: Prof. Israel Cohen
November 2013
Multisensory speech enhancement
in noisy environments using
bone-conducted and air-conducted microphones
![Page 2: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/2.jpg)
Outline
Introduction
Review of existing methods
Probabilistic approach
Geometric extension approach
Summary
![Page 3: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/3.jpg)
Introduction
• Speech enhancement
• Multi-sensory speech enhancement
• Bone-conducted microphone
• Research objectives
![Page 4: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/4.jpg)
Speech enhancement
Goal: improve speech quality.
Applications: mobile phones, VoIP, teleconferencing, hearing aids, speech recognition.
![Page 5: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/5.jpg)
Multi-sensory speech enhancement
Audio-visual speech processing (G. Potamianos et al,2004)
Air and throat microphones (M. Graciarena et al,2003)
Ear plug (O. M. Strand et al, 2003)
Stethoscope device (P. Heracleous et al, 2003)
Aliph’s Jawbone headsets
Electromagnetic motion sensor (GEMS) (G. C. Burnett,1999)
Physiological microphone (P-Mic) (M. V. Scanlon,1998)
Electroglottograph (EGG) (M. Rothenberg,1992)
Bone-Conducted Microphone (T. Yanagisawa et al, 1975)
![Page 6: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/6.jpg)
Bone-conducted microphone
![Page 7: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/7.jpg)
Conducting path (K. Kondo et al,2006)
Conducting path of AC and BC microphones
• Bone conducted: less noise, but limited to low frequencies
• Air conducted: more noise, but covers the full frequency range
Model
$$Y_t = X_t + V_t + U_t \quad \text{(AC)}$$
$$B_t = H_t X_t + G_t V_t + W_t \quad \text{(BC)}$$
$X_t$: clean speech
$H_t$, $G_t$: transfer functions
$V_t$: noise
$U_t$: AC sensor noise
$W_t$: BC sensor noise
![Page 8: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/8.jpg)
Signal and spectrogram (A. Subramanya et al,2008)
Waveforms and spectrograms of the signals captured by the ABC microphone. The first row
shows the signal captured by the air microphone and the second row shows the signal
captured by the bone microphone.
![Page 9: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/9.jpg)
Research objectives
BC microphone as a dominant sensor:
• Geometric harmonics method
• Laplacian pyramid method
Compare the proposed methods with an existing method
![Page 10: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/10.jpg)
Review of existing methods
• BC microphone as a supplementary sensor
• BC microphone as a dominant sensor
![Page 11: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/11.jpg)
Methods

| BC microphone as a supplementary sensor | BC microphone as a dominant sensor |
| --- | --- |
| Voice activity detection | Equalization: IDFT; DFT; LMS |
| Pitch detection | Analysis and synthesis: LP; LSF (neural network) |
| Low-frequency enhancement | Probabilistic: ML; MMSE (DBN) |
![Page 12: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/12.jpg)
Voice activity detection (M. Zhu et al, 2007)
Speech detection using bone sensors: the top figure shows the speech signal captured
by the bone sensor when two people are talking at the same time, the middle figure shows the signal
captured by the regular microphone, and the bottom figure shows the detection result.
![Page 13: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/13.jpg)
Pitch detection (M. S. Rahman et al, 2010)
Left: pitch tracking of air-conducted speech in noiseless conditions. Center: speech
spectrogram. Right: pitch tracking of bone-conducted speech in noiseless conditions. The
experiments were conducted on four speech samples.
![Page 14: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/14.jpg)
Pitch detection (Cont.)
Pitch contours estimated from speech when corrupted by noise. a) pitch contours estimated
from air-conducted speech, b) pitch contours estimated from bone-conducted speech.
![Page 15: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/15.jpg)
BC microphone for low-frequency enhancement (M. S. Rahman, 2011)
Block diagram of the system when BC speech is used for low-frequency enhancement.
![Page 16: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/16.jpg)
Equalization
IDFT (T. Shimamura et al,2005)
DFT (K. Kondo,2006)
Least Mean Square (LMS) filter (T. Shimamura,2006)
![Page 17: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/17.jpg)
Analysis and synthesis
Linear prediction (LP) filter (T. T. Vu,2006)
Line spectral frequency (LSF) filter (T. T. Vu,2008)
![Page 18: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/18.jpg)
Probabilistic
Maximum likelihood estimation (MLE) (Z. Liu et al, 2004)
$$R = \sum_{l} \left( \frac{\left|Y(l,k) - X(l,k)\right|^2}{2\sigma_V^2} + \frac{\left|B(l,k) - H(l,k)\,X(l,k)\right|^2}{2\sigma_W^2} \right)$$
MMSE estimator (A. Subramanya et al, 2005)
$$\hat X(l,k) = \sum_{t=0}^{1} P\big(T(l)=t \mid Y(l,k), B(l,k)\big)\; E\big[X(l,k) \mid Y(l,k), B(l,k), T(l)=t\big]$$
Dynamic Bayesian Network (DBN) (A. Subramanya et al, 2008)
$$p(X_t \mid Y_t, B_t) = \sum_{s,m} p(X_t, S_t = s, M_t = m \mid Y_t, B_t)$$
$$= \sum_{s,m} p(X_t \mid Y_t, B_t, S_t = s, M_t = m)\; p(M_t = m \mid Y_t, B_t, S_t = s)\; p(S_t = s \mid Y_t, B_t)$$
![Page 19: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/19.jpg)
Probabilistic approach
• Model
• Network description
• Transfer function & leakage factor
• MMSE estimator
• Result
![Page 20: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/20.jpg)
Model
Air conducted (AC): $Y_t = X_t + V_t + U_t$
Bone conducted (BC): $B_t = H_t X_t + G_t V_t + W_t$
Background noise: $V_t \sim N(0, \sigma_v^2)$
AC sensor noise: $U_t \sim N(0, \sigma_u^2)$
BC sensor noise: $W_t \sim N(0, \sigma_w^2)$
Optimal linear mapping: $H_t$
Leakage of noise: $G_t$
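The observation model on this slide can be simulated directly, which is handy for checking an estimator against known ground truth. The sketch below is only illustrative: the function name `simulate_frame`, the use of circular complex Gaussian noise, and the constant values chosen for $H_t$ and $G_t$ are assumptions, not part of the presentation.

```python
import numpy as np

def simulate_frame(X, H, G, sigma_v, sigma_u, sigma_w, rng):
    """Draw one (Y, B) pair for a clean spectrum X under the slide's model.

    Y = X + V + U         (air-conducted microphone)
    B = H*X + G*V + W     (bone-conducted microphone)
    """
    def complex_noise(sigma):
        # Circular complex Gaussian noise with variance sigma**2 per bin (assumed).
        real = rng.standard_normal(X.shape)
        imag = rng.standard_normal(X.shape)
        return sigma * (real + 1j * imag) / np.sqrt(2.0)

    V = complex_noise(sigma_v)   # background noise
    U = complex_noise(sigma_u)   # AC sensor noise
    W = complex_noise(sigma_w)   # BC sensor noise
    Y = X + V + U
    B = H * X + G * V + W
    return Y, B

# Hypothetical example values: 256 frequency bins, flat H and G.
rng = np.random.default_rng(0)
X = rng.standard_normal(256) + 1j * rng.standard_normal(256)
H = np.full(256, 0.5)
G = np.full(256, 0.1)
Y, B = simulate_frame(X, H, G, sigma_v=0.3, sigma_u=0.01, sigma_w=0.01, rng=rng)
```

With all three noise levels set to zero the function returns `Y = X` and `B = H * X`, which is a quick sanity check on the model.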
![Page 21: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/21.jpg)
Assumption
Feature: magnitude-normalized complex spectra
$$\tilde X_t = \frac{X_t}{\|X_t\|}$$
Training: speech model (k-means)
Method: dynamic Bayesian network (DBN)
$$p(Y_t, B_t, X_t, \tilde X_t, V_t, S_t, M_t, U_t, W_t) = p(Y_t \mid X_t, V_t, U_t)\, p(B_t \mid X_t, V_t, W_t)\, p(X_t \mid \tilde X_t)\, p(\tilde X_t \mid M_t, S_t)\, p(M_t \mid S_t)\, p(S_t)\, p(V_t)\, p(U_t)\, p(W_t)$$
Distance between frames:
$$d(X_i, X_j) = \left[ d(X_i^1, X_j^1), \ldots, d(X_i^f, X_j^f), \ldots, d(X_i^N, X_j^N) \right]^T, \quad 1 \le i, j \le T$$
$$d(X_i^f, X_j^f) = \log X_i^f - \log X_j^f, \quad 1 \le f \le N$$
![Page 22: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/22.jpg)
Network description
Dynamic Bayesian network
Speech/non-speech
Mixture index
Match the clean speech
Normalized speech
Optimal linear mapping
Leakage noise
![Page 23: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/23.jpg)
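Transfer function & leakage factor
Leakage factor, estimated over the non-speech frames $N_v$:
$$G = \frac{\sum_{t \in N_v} \left( \sigma_v^2 |B_t|^2 - \sigma_w^2 |Y_t|^2 \right) + \sqrt{ \left( \sum_{t \in N_v} \left( \sigma_v^2 |B_t|^2 - \sigma_w^2 |Y_t|^2 \right) \right)^2 + 4 \sigma_v^2 \sigma_w^2 \left| \sum_{t \in N_v} B_t^* Y_t \right|^2 }}{2 \sigma_v^2 \sum_{t \in N_v} B_t^* Y_t}$$
Transfer function, estimated over the speech frames $N_s$:
$$H = G + \frac{\sum_{t \in N_s} \left( \sigma_v^2 |B'_t|^2 - \sigma_w^2 |Y_t|^2 \right) + \sqrt{ \left( \sum_{t \in N_s} \left( \sigma_v^2 |B'_t|^2 - \sigma_w^2 |Y_t|^2 \right) \right)^2 + 4 \sigma_v^2 \sigma_w^2 \left| \sum_{t \in N_s} (B'_t)^* Y_t \right|^2 }}{2 \sigma_v^2 \sum_{t \in N_s} (B'_t)^* Y_t}$$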
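The closed-form estimate on this slide has the same shape for both factors: a ratio built from the per-frame energies of the two channels and their cross term. A minimal sketch for one frequency bin follows. The function name `estimate_gain` is hypothetical, and the positive root and the placement of the conjugate on $B_t$ are my reading of the flattened slide equation, so treat both as assumptions.

```python
import numpy as np

def estimate_gain(B, Y, sigma_v2, sigma_w2):
    """Closed-form gain estimate for one frequency bin.

    B, Y     : 1-D complex arrays, one entry per frame in the chosen frame set
    sigma_v2 : background-noise variance (sigma_v squared)
    sigma_w2 : BC sensor-noise variance (sigma_w squared)
    """
    # a = sum_t (sigma_v^2 |B_t|^2 - sigma_w^2 |Y_t|^2)
    a = np.sum(sigma_v2 * np.abs(B) ** 2 - sigma_w2 * np.abs(Y) ** 2)
    # c = sum_t conj(B_t) * Y_t   (conjugate placement is an assumption)
    c = np.sum(np.conj(B) * Y)
    root = np.sqrt(a ** 2 + 4.0 * sigma_v2 * sigma_w2 * np.abs(c) ** 2)
    return (a + root) / (2.0 * sigma_v2 * c)

# Synthetic check: if B is an exact scaled copy of Y, the scale is recovered.
rng = np.random.default_rng(0)
Y = rng.standard_normal(200) + 1j * rng.standard_normal(200)
G_true = 0.3 - 0.2j
B = G_true * Y
G_hat = estimate_gain(B, Y, sigma_v2=1.0, sigma_w2=0.01)
```

Applied to non-speech frames this gives the leakage factor; applying the same function to $B'_t$ over speech frames gives the term that the slide adds to $G$ to obtain $H$.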
![Page 24: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/24.jpg)
MMSE estimator & result
Estimator:
$$\hat X_t = E(X_t \mid Y_t, B_t) = p(S_t = 0 \mid Y_t, B_t)\, E(X_t \mid Y_t, B_t, S_t = 0, M_t = 0) + p(S_t = 1 \mid Y_t, B_t) \sum_{m} p(M_t = m \mid Y_t, B_t, S_t = 1)\, E(X_t \mid Y_t, B_t, S_t = 1, M_t = m)$$
Result: spectrograms of clean, BC, noisy AC and reconstructed speech. Left: Gaussian noise. Right: interfering speaker.
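The MMSE estimate is a posterior-weighted average of conditional means: one term for the non-speech state and a mixture-weighted term for the speech state. The sketch below shows only that final combination step. How the posteriors and conditional means are computed is not covered here, and the function and argument names are hypothetical.

```python
import numpy as np

def mmse_combine(p_speech, mix_post, mean_silence, mean_mix):
    """Combine conditional means into the MMSE estimate for one frame.

    p_speech     : scalar, p(S_t = 1 | Y_t, B_t)
    mix_post     : (M,) array, p(M_t = m | Y_t, B_t, S_t = 1)
    mean_silence : (K,) array, E[X_t | Y_t, B_t, S_t = 0, M_t = 0]
    mean_mix     : (M, K) array, E[X_t | Y_t, B_t, S_t = 1, M_t = m]
    """
    mix_post = np.asarray(mix_post)
    mean_mix = np.asarray(mean_mix)
    # Mixture-weighted mean for the speech state.
    speech_mean = mix_post @ mean_mix
    # p(S_t = 0 | .) is the complement of p(S_t = 1 | .).
    return (1.0 - p_speech) * np.asarray(mean_silence) + p_speech * speech_mean

# Hypothetical numbers: 2 mixture components, 3 frequency bins.
x_hat = mmse_combine(
    p_speech=0.8,
    mix_post=[0.25, 0.75],
    mean_silence=np.zeros(3),
    mean_mix=np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]),
)
```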
![Page 25: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/25.jpg)
Geometric extension approach
• Model
• Nyström extension method
• Geometric harmonics
• Laplacian pyramid estimation
• Result
![Page 26: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/26.jpg)
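Model
Train (mapping from the concatenation of noisy AC and BC speech to clean speech):
$$f: \mathbb{R}^{512} \to \mathbb{R}^{256}, \qquad YB_t = \begin{bmatrix} Y_t \\ B_t \end{bmatrix} \mapsto X_t$$
Test (extension of the mapping from the concatenation of noisy AC and BC speech to clean speech):
$$f: \mathbb{R}^{512} \to \mathbb{R}^{256}, \qquad YB_t^* = \begin{bmatrix} Y_t^* \\ B_t^* \end{bmatrix} \mapsto X_t^*$$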
![Page 27: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/27.jpg)
Nyström extension method (C.T.H. Baker, 1977)
Goal: extend relevant “information” about a large dataset in a
high dimensional space.
Method: find a low-rank approximation to a symmetric,
positive semi-definite kernel.
In essence: only use partial information about the kernel to
solve a simpler eigenvalue problem, and then to extend the
solution using complete knowledge of the kernel.
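As a rough illustration of the idea, the sketch below solves the eigenproblem on the training kernel only and then extends the leading eigenvectors to new points using kernel values between the new and training points. The Gaussian kernel, its scale `eps`, and all function names are assumptions made for the example, not details taken from the slides.

```python
import numpy as np

def gaussian_kernel(A, B, eps):
    """Gaussian kernel matrix exp(-||a - b||^2 / eps) between rows of A and B."""
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / eps)

def nystrom_extend(X_train, X_new, eps, n_components):
    """Extend the leading kernel eigenvectors from X_train to X_new."""
    K = gaussian_kernel(X_train, X_train, eps)
    lam, phi = np.linalg.eigh(K)              # eigenvalues in ascending order
    lam = lam[::-1][:n_components]            # keep the largest ones
    phi = phi[:, ::-1][:, :n_components]
    K_new = gaussian_kernel(X_new, X_train, eps)
    # Nystrom formula: phi_j(x) = (1 / lam_j) * sum_i k(x, x_i) * phi_j(x_i)
    phi_new = (K_new @ phi) / lam
    return lam, phi, phi_new

rng = np.random.default_rng(0)
X_train = rng.standard_normal((40, 3))
X_new = rng.standard_normal((5, 3))
lam, phi, phi_new = nystrom_extend(X_train, X_new, eps=4.0, n_components=5)
```

Evaluating the extension at the training points themselves returns the original eigenvectors, because there the formula reduces to the eigenvalue equation.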
![Page 28: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/28.jpg)
Nyström extension method (C.T.H. Baker, 1977)
Eigenfunction approximation
Nyström extension
Scheme of learning functions (N. Rabin, 2012)
![Page 29: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/29.jpg)
Geometric harmonics (GH) (R.R. Coifman, 2006)
Definition
Example
Gaussian extension
Harmonic extension
Wavelet extension
![Page 30: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/30.jpg)
Geometric harmonics (GH) (R.R. Coifman, 2006)
Eigenvector approximation
Extension
![Page 31: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/31.jpg)
Comments on GH
The parameters need to be tuned.
The extension is not the original function itself but its projection.
The range over which the function can be extended depends on the complexity of the
function.
, .l
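The projection remark can be made concrete with a small sketch of a geometric-harmonics style extension: the function is projected onto the kernel eigenvectors whose eigenvalues exceed a cutoff, and each retained eigenvector is extended with the Nyström formula. The Gaussian kernel, the parameter names `eps` (kernel scale) and `delta` (relative eigenvalue cutoff), and the function name are assumptions for illustration.

```python
import numpy as np

def gh_extend(X_train, f_train, X_new, eps, delta):
    """Extend f from X_train to X_new using eigenvectors with lam >= delta * lam_max."""
    def sqdist(A, B):
        d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
        return np.maximum(d2, 0.0)

    K = np.exp(-sqdist(X_train, X_train) / eps)
    lam, phi = np.linalg.eigh(K)
    keep = lam >= delta * lam.max()           # discard ill-conditioned directions
    lam, phi = lam[keep], phi[:, keep]
    coeffs = phi.T @ f_train                  # projection coefficients <f, phi_j>
    K_new = np.exp(-sqdist(X_new, X_train) / eps)
    psi_new = (K_new @ phi) / lam             # extended eigenvectors at X_new
    return psi_new @ coeffs, int(keep.sum())

rng = np.random.default_rng(1)
X_train = rng.uniform(-1.0, 1.0, size=(60, 2))
f_train = np.sin(2.0 * X_train[:, 0]) + X_train[:, 1] ** 2
f_on_train, n_kept = gh_extend(X_train, f_train, X_train, eps=1.0, delta=1e-3)
```

On the training set the result is the projection of `f_train` onto the retained eigenvectors, so applying the extension a second time to `f_on_train` changes nothing. A larger `delta` keeps fewer eigenvectors and gives a smoother, less faithful extension.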
![Page 32: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/32.jpg)
Laplacian pyramid (LP) (Burt and Adelson,1983)
![Page 33: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/33.jpg)
Laplacian pyramid (LP) (N. Rabin, 2012)
Algorithm:
Kernel (start with a coarse kernel):
$$W_0 = \exp\!\left( -\|x_i - x_j\|^2 / \sigma_0 \right), \qquad K_0 = q_0^{-1}(x_i)\, w_0(x_i, x_j)$$
$$s_0(x_k) = \sum_{i=1}^{n} k_0(x_i, x_k)\, f(x_i)$$
Iteration (residual: previous minus current; approximation with the current kernel and the residual):
$$d_1(x_k) = f(x_k) - s_0(x_k), \qquad d_l(x_k) = f(x_k) - \sum_{i=0}^{l-1} s_i(x_k)$$
$$W_l = \exp\!\left( -\|x_i - x_j\|^2 / (\sigma_0 / 2^l) \right), \qquad K_l = q_l^{-1}(x_i)\, w_l(x_i, x_j)$$
$$s_l(x_k) = \sum_{i=1}^{n} k_l(x_i, x_k)\, d_l(x_i)$$
Estimation (stop after multiple iterations):
$$f(x_k) \approx \sum_{l} s_l(x_k)$$
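The iteration can be sketched in a few lines: at each level the current residual is smoothed with a kernel whose scale is halved, and the smoothed residual is added to the running approximation. In this sketch the kernel is normalized so that the weights at each evaluation point sum to one, which is an assumption about how the normalization $q_l^{-1}$ is applied; the function names are hypothetical as well.

```python
import numpy as np

def _kernel(A, B, sigma):
    """Gaussian kernel between rows of A (evaluation points) and B (training points),
    normalized so each row sums to one (assumed normalization)."""
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    W = np.exp(-np.maximum(d2, 0.0) / sigma)
    return W / W.sum(axis=1, keepdims=True)

def lp_fit(X, f, sigma0, n_levels):
    """Return the residuals d_0, ..., d_{L-1} (d_0 = f) needed to extend f."""
    residuals = []
    approx = np.zeros_like(f, dtype=float)
    for level in range(n_levels):
        d = f - approx                                  # residual of the previous levels
        K = _kernel(X, X, sigma0 / 2 ** level)          # finer kernel at each level
        approx = approx + K @ d                         # add s_level
        residuals.append(d)
    return residuals

def lp_extend(X, residuals, X_new, sigma0):
    """Evaluate sum_l s_l at X_new (points far from X may underflow the kernel)."""
    out = 0.0
    for level, d in enumerate(residuals):
        out = out + _kernel(X_new, X, sigma0 / 2 ** level) @ d
    return out

X = np.linspace(0.0, 1.0, 50)[:, None]
f = np.sin(2.0 * np.pi * X[:, 0])
residuals = lp_fit(X, f, sigma0=1.0, n_levels=8)
X_new = np.array([[0.255], [0.505]])
f_new = lp_extend(X, residuals, X_new, sigma0=1.0)
```

The number of levels acts as the stopping rule: more levels fit the training values more closely, which also means more of any noise in `f` is reproduced.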
![Page 34: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/34.jpg)
Comments on LP
Kernel method:
$$\hat f(x_k) = \sum_{i=1}^{n} k_0(x_i, x_k)\, f(x_i)$$
Improve (iterate):
Diffusion:
$$\hat f_{l+1}(x_k) = \hat f_l(x_k) + (K_0 - I)\, \hat f_l(x_k)$$
Residual:
$$\hat f_{l+1}(x_k) = \hat f_l(x_k) + K_0 \left( f(x_k) - \hat f_l(x_k) \right)$$
LP:
$$\hat f_{l+1}(x_k) = \hat f_l(x_k) + K_l \left( f(x_k) - \hat f_l(x_k) \right)$$
• Statistical analysis
Model: $y(x_k) = f(x_k) + n_k$
$$E[s_l(x_k)] = E\left[ \sum_{i=1}^{n} k_l(x_i, x_k) \left( f(x_i) + n_i - s_0(x_i) - \cdots - s_{l-1}(x_i) \right) \right] = s_l(x_k)$$
$$\text{Bias} = E[\hat f(x_k)] - f(x_k) = E\left[ \sum_{l} s_l(x_k) \right] - f(x_k) = \sum_{l} s_l(x_k) - f(x_k) = e_s$$
$$\text{MSE} = E\left[ \left( \hat f(x_k) - f(x_k) \right)^2 \right] = E\left[ \left( \sum_{i=1}^{n} k_0(x_i, x_k)\, n_i + \cdots + \sum_{i=1}^{n} k_l(x_i, x_k)\, n_i + e_s \right)^2 \right]$$
$$= E\left[ \sum_{i=1}^{n} k_0^2(x_i, x_k)\, n_i^2 + \cdots + \sum_{i=1}^{n} k_l^2(x_i, x_k)\, n_i^2 \right] + e_s^2$$
![Page 35: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/35.jpg)
Result (GH)
Spectrogram of clean, BC, noisy AC and reconstructed speech: Left: Gaussian noise, Right: interfering
speaker
![Page 36: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/36.jpg)
Result (LP)
Spectrogram of clean, BC, noisy AC and reconstructed speech: Left: Gaussian noise, Right: interfering
speaker
![Page 37: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/37.jpg)
Comparison of Log Spectral Distortion
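The slide does not state which definition of log spectral distortion (LSD) was used for the comparison. One common form, given here purely as an assumed sketch, is the root-mean-square difference between the log-magnitude spectra in dB, computed per frame and averaged over frames. The function name and the magnitude floor are hypothetical.

```python
import numpy as np

def log_spectral_distortion(S_ref, S_est, floor=1e-10):
    """Mean over frames of the per-frame RMS log-spectral difference, in dB.

    S_ref, S_est : (frames, bins) arrays of spectra (complex or magnitude)
    floor        : lower bound on magnitudes to avoid log(0) (assumed value)
    """
    L_ref = 20.0 * np.log10(np.maximum(np.abs(S_ref), floor))
    L_est = 20.0 * np.log10(np.maximum(np.abs(S_est), floor))
    per_frame = np.sqrt(np.mean((L_ref - L_est) ** 2, axis=1))
    return float(np.mean(per_frame))

rng = np.random.default_rng(0)
S_clean = np.abs(rng.standard_normal((10, 256))) + 0.1
lsd_same = log_spectral_distortion(S_clean, S_clean)
lsd_scaled = log_spectral_distortion(S_clean, 10.0 * S_clean)
```

A uniform gain error of a factor of 10 in magnitude shows up as 20 dB under this definition, so signals are usually level-matched before the comparison.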
![Page 38: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/38.jpg)
Conclusion
The probabilistic approach improves the quality of the
reconstructed speech.
Geometric harmonics cannot describe the mapping very well.
The Laplacian pyramid method enables further noise reduction,
but at the cost of distortion in the reconstructed speech.
![Page 39: Multisensory speech enhancement in noisy …...Voice activity detection (M.Zhu et al,2007) Review of existing methods Speech detection using bone sensors: The top figure illustrates](https://reader036.vdocuments.site/reader036/viewer/2022070904/5f6fba9af3847d061a5e54fe/html5/thumbnails/39.jpg)
Future research
Apply geometric harmonics in a multi-scale manner.
Find the relation between the number of iterations and the noise level
for Laplacian pyramids.
Add further processing to reduce distortion of the
reconstructed speech in the geometric methods.