A Model of Binaural Processing Based on Tree-Structure Filter-Bank
길이만, 김영익, 김화길, 구임회 (Applied Mathematics, Korea Advanced Institute of Science and Technology)
Motivation
• Design of auditory preprocessors motivated by the characteristics of biological auditory systems:
- robustness to noise
- capturing minute differences between signals (2 Hz difference)
- wide dynamic range (140 dB)
- selective attention
- source localization using two ears
Design of Basilar Membrane (BM)
Types of BM Models
• Lyon and Mead - R. F. Lyon and C. Mead, An Analog Electronic Cochlea, IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(7), 1988.
• Liu - W. Liu, A. G. Andreou, and M. H. Goldstein, Jr., Voiced-Speech Representation by an Analog Silicon Model of the Auditory Periphery, IEEE Transactions on Neural Networks, 3(3), 1992.
• Kates - J. M. Kates, A Time-Domain Digital Cochlear Model, IEEE Transactions on Signal Processing, 39(12), 1991.
• Hamming BPF - O. Ghitza, Robustness against Noise: The Role of Timing-Synchrony Measurement, IEEE International Conference on Acoustics, Speech, and Signal Processing, 6.8, 1987.
[Figure: filter-bank block diagrams built from high-pass (H) and low-pass (L) filter stages]
Design of Filter Bank
(1) Lyon & Mead
• Cascaded LPFs
• Number of filters: $O(N)$
(2) Fully Cascaded BPF
• Cascaded LPFs & HPFs
• Higher bandpass capability
• Equal delay time
• Number of filters: $O(N^2)$
(3) TSFB
• Tree structure
• Cascaded LPFs & HPFs
• Higher bandpass capability
• Equal delay time
• Versatile Q control
• Number of filters: $O(N \log N)$
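The tree idea can be illustrated with a toy sketch. This is not the authors' TSFB design: the one-pole low-pass filter, the complementary high-pass (input minus low-pass), and the per-level coefficients `alphas` are all assumptions chosen for brevity. It only shows that in a binary tree of LPF/HPF splits every channel passes through the same number of stages.

```python
def one_pole_lowpass(x, alpha):
    """First-order low-pass: y[n] = alpha * x[n] + (1 - alpha) * y[n-1]."""
    y, prev = [], 0.0
    for sample in x:
        prev = alpha * sample + (1.0 - alpha) * prev
        y.append(prev)
    return y


def split(x, alpha):
    """Split x into complementary low and high bands (low + high == x)."""
    low = one_pole_lowpass(x, alpha)
    high = [s - l for s, l in zip(x, low)]
    return low, high


def tree_filter_bank(x, alphas):
    """Binary tree of LPF/HPF splits, one level per entry of `alphas`.

    Returns 2 ** len(alphas) channel signals; every channel has passed
    through exactly len(alphas) filter stages.
    """
    bands = [list(x)]
    for alpha in alphas:  # hypothetical per-level smoothing coefficients
        next_bands = []
        for band in bands:
            low, high = split(band, alpha)
            next_bands.extend([low, high])
        bands = next_bands
    return bands


if __name__ == "__main__":
    signal = [1.0, 0.0, -1.0, 0.5, 0.25, -0.75, 0.0, 1.0]
    channels = tree_filter_bank(signal, alphas=[0.5, 0.3, 0.1])
    print(len(channels), "channels, each", len(channels[0]), "samples long")
```

Because each split here is complementary, the channel signals sum back to the input, which is a convenient sanity check for the toy structure.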
Binaural Processing Models
• EE (Excitation-Excitation) cells in medial superior olive (MSO)
- interaural cross-correlation models
• EI (Excitation-Inhibition) cells in lateral superior olive (LSO)
- equalization-cancellation (EC) theory
Interaural Cross-correlation Model (EE-type cells)
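• Running interaural cross-correlation (Jeffress, 1948)
$\Psi(t, \tau, f) = \int_{-\infty}^{t} l(t', f)\, r(t' - \tau, f)\, e^{-(t - t')/T_{\mathrm{int}}}\, dt'$
• Delay weighting (Colburn, 1977), with $\tau$ in ms
$p(\tau) = \begin{cases} C, & |\tau| \le 0.15 \\ C\, e^{-(|\tau| - 0.15)/0.6}, & 0.15 < |\tau| \le 2.2 \\ 0.033\, C\, e^{-(|\tau| - 2.2)/2.3}, & |\tau| > 2.2 \end{cases}$
• Frequency weighting (Stern and Shear, 1996)
[Equation: frequency-dependent delay weighting $p(\tau \mid f_c)$, expressed through $p(\tau)$ and a constant $C_{f_c}$, for $300 \le f_c \le 1200$ Hz]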
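A discrete-time sketch of the two ideas on this slide may help: an exponentially windowed (leaky-integrated) interaural cross-correlation for one frequency band, and the piecewise delay weighting with breakpoints at 0.15 and 2.2 and time constants 0.6 and 2.3. The function names, the sample-based lag, and the expression of the integration time in samples are assumptions made for illustration; the frequency weighting is not modeled.

```python
import math


def delay_weight(tau_ms, c=1.0):
    """Piecewise delay weighting p(tau); tau is assumed to be in ms."""
    a = abs(tau_ms)
    if a <= 0.15:
        return c
    if a <= 2.2:
        return c * math.exp(-(a - 0.15) / 0.6)
    return 0.033 * c * math.exp(-(a - 2.2) / 2.3)


def running_cross_correlation(left, right, lag, t_int_samples):
    """Leaky-integrated product left[n] * right[n - lag] for one band.

    out[n] = sum over k <= n of left[k] * right[k - lag] * d ** (n - k),
    with d = exp(-1 / t_int_samples), i.e. an exponential window that
    forgets old samples with time constant t_int_samples.
    """
    decay = math.exp(-1.0 / t_int_samples)
    acc, out = 0.0, []
    for n in range(len(left)):
        m = n - lag
        product = left[n] * right[m] if 0 <= m < len(right) else 0.0
        acc = decay * acc + product
        out.append(acc)
    return out


if __name__ == "__main__":
    tone = [1.0, 0.0, -1.0, 0.0] * 8  # period of 4 samples
    for lag in (0, 2):
        value = running_cross_correlation(tone, tone, lag, t_int_samples=10.0)[-1]
        print("lag", lag, "final correlation", round(value, 3))
```

For identical left and right inputs the correlation is positive at lag 0 and negative at a lag of half a period, which is the behavior a delay-line coincidence model relies on.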
Lindemann’s Model (EI-type cells)
• Contralateral inhibition mechanism
$r_i(m,n) = r_{i,s}(m,n)\, r_{i,d}(m,n)$
• Stationary-inhibition component
$r_{i,s}(m,n) = 1 - c_s\, l(m,n)$, with $0 \le l(m,n) \le 1$
• Dynamic-inhibition component
$r_{i,d}(m,n) = 1 - c_d\, \{k(m,n)\}$, $k(m,n) = r(m,n)\, l(m,n)$, with $0 \le k(m,n) \le 1$, $0 \le c_d \le 1$
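The inhibition factors can be sketched numerically under stated assumptions. The slide text does not preserve the operators, so the following assumes that the stationary factor is 1 - c_s * l, that the dynamic factor is 1 - c_d * k with k = r * l, and that the two factors combine by multiplication. Any temporal smoothing of k is ignored. The function names are hypothetical.

```python
def stationary_factor(l, c_s):
    """Stationary inhibition of the right signal by the left signal l in [0, 1]."""
    return 1.0 - c_s * l


def dynamic_factor(l, r, c_d):
    """Dynamic inhibition driven by the coincidence k = r * l (assumed form)."""
    k = r * l
    return 1.0 - c_d * k


def inhibition_factor(l, r, c_s, c_d):
    """Combined factor, assuming the two components multiply."""
    return stationary_factor(l, c_s) * dynamic_factor(l, r, c_d)


if __name__ == "__main__":
    # No contralateral input: the factor is 1 and nothing is inhibited.
    print(inhibition_factor(l=0.0, r=0.8, c_s=0.3, c_d=0.5))
    # Strong input on both sides: both components reduce the factor.
    print(inhibition_factor(l=1.0, r=1.0, c_s=0.3, c_d=0.5))
```

With inputs and coefficients confined to [0, 1], each factor stays in [0, 1], so the combined factor can only attenuate.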
Breebaart Model (EI-type cells)
• EI-type cell
• Combined EI-type cell
• Temporal windowing
• Nonlinear saturation
Shamma’s Model
The Stereausis Network
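[Figure: stereausis processor. The channel outputs $x_i(t)$ from one ear and $y_j(t)$ from the other ear feed a two-dimensional grid of nodes; node $(i, j)$ outputs $c_{ij}(n)$, the square of a combination of $x_i(n)$ and $y_j(n)$.]
[Figure: network output for a time-shifted 600 Hz tone: a) zero shift; b), c), d) three nonzero shifts]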
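A minimal sketch of the grid computation follows. The slide text does not preserve the operator that combines the two inputs inside the square, so the squared difference used as the default here is an assumption, and `compare` is left pluggable for that reason. The names `stereausis_output` and `compare` are hypothetical.

```python
def stereausis_output(x, y, compare=lambda a, b: (a - b) ** 2):
    """Compare every channel of one ear with every channel of the other.

    x, y: lists of channel outputs at a single time step n.
    Returns the matrix c with c[i][j] = compare(x[i], y[j]).
    """
    return [[compare(xi, yj) for yj in y] for xi in x]


if __name__ == "__main__":
    left = [0.1, 0.5, 0.9]
    right = [0.1, 0.5, 0.9]  # identical ears: no interaural disparity
    for row in stereausis_output(left, right):
        print([round(v, 2) for v in row])
```

With the assumed squared difference, identical inputs at the two ears give zeros along the main diagonal, and an interaural disparity moves the pattern off the diagonal.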
Binaural Processing with TSFB
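[Figure: binaural processing pipeline. Speech and noise sources pass through HRTFs and are summed into left and right ear signals; each ear signal is processed by a TSFB, followed by feature extraction (ZCPA, F0).]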
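The ZCPA (zero-crossings with peak amplitudes) feature can be sketched for a single filter-bank channel as follows. This is a simplified illustration, not the authors' implementation: the bin edges, the log(1 + peak) weighting, and the function names are assumptions, and a full front end would sum the histograms over all channels and frames.

```python
import math


def upward_zero_crossings(x):
    """Indices n where the signal goes from negative to non-negative."""
    return [n for n in range(1, len(x)) if x[n - 1] < 0.0 <= x[n]]


def zcpa_histogram(x, fs, bin_edges):
    """Frequency histogram built from zero-crossing intervals.

    Each interval between successive upward crossings gives a frequency
    estimate fs / interval; the matching bin is incremented by
    log(1 + peak), where peak is the maximum sample in that interval.
    """
    hist = [0.0] * (len(bin_edges) - 1)
    crossings = upward_zero_crossings(x)
    for start, end in zip(crossings, crossings[1:]):
        freq = fs / (end - start)
        peak = max(x[start:end])
        for b in range(len(hist)):
            if bin_edges[b] <= freq < bin_edges[b + 1]:
                hist[b] += math.log(1.0 + peak)
                break
    return hist


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 100 * n / fs + 0.3) for n in range(800)]
    print(zcpa_histogram(tone, fs, bin_edges=[0, 50, 150, 400]))
```

For a 100 Hz tone, all the weight lands in the bin that contains 100 Hz, which is the property that makes interval statistics usable as a spectral feature.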
Simulation for Binaural Processing
[Figure: source layout with one signal source and noise sources at 0, 45, and 90 degrees]
- Signal: TI46 male speech samples (digits 'zero' through 'nine')
- Noise: Noisex samples
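The slides do not say how speech and noise were combined. A common way to build such test material, assuming the row labels 10, 5, and 0 in the results tables are signal-to-noise ratios in dB, is to scale the noise so that the power ratio matches the target. The sketch below shows only that scaling step, with hypothetical function names, and leaves out HRTF filtering.

```python
import math


def power(x):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    target_ratio = 10.0 ** (snr_db / 10.0)
    gain = math.sqrt(power(speech) / (power(noise) * target_ratio))
    return [s + gain * v for s, v in zip(speech, noise)]


if __name__ == "__main__":
    speech = [1.0, -1.0] * 50
    noise = [0.5, 0.5, -0.5, -0.5] * 25
    for snr_db in (10, 5, 0):
        mixed = mix_at_snr(speech, noise, snr_db)
        residual = [m - s for m, s in zip(mixed, speech)]
        measured = 10.0 * math.log10(power(speech) / power(residual))
        print("target", snr_db, "dB, measured", round(measured, 2), "dB")
```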
Simulation for Binaural Processing
Recognition rate (%) by feature and noise direction (degrees)

Feature                         ZCPA                  F0
Noise                  SNR     0     45    90       0     45    90
White Gaussian Noise    10    94.3  95.0  95.3     94.1  94.5  94.3
                         5    85.9  93.0  94.2     92.2  92.5  93.0
                         0    49.7  63.7  74.3     63.9  68.2  75.9
Op Room Noise           10    94.8  94.9  95.6     94.5  94.5  94.5
                         5    93.3  94.2  95.0     93.7  94.4  94.1
                         0    74.2  87.5  89.6     88.7  91.3  88.8
F16 Noise               10    94.7  95.2  95.2     94.5  94.4  94.2
                         5    90.7  93.5  93.6     92.1  92.2  92.1
                         0    51.0  67.9  71.2     66.4  64.7  64.7
Simulation for Monaural Processing without HRTF
Recognition rate (%) by feature

Noise                  SNR    ZCPA    F0
White Gaussian Noise    10    97.4   95.8
                         5    96.8   95.1
                         0    82.4   86.2
Op Room Noise           10    97.4   95.7
                         5    96.0   94.9
                         0    76.8   91.2
F16 Noise               10    97.3   95.1
                         5    96.9   95.1
                         0    83.5   87.5
Conclusion
• A model of binaural processing with TSFB has been proposed.
• Simulation results showed that binaural processing can be advantageous in noisy environments.
• The HRTF could degrade the performance of speech recognition.
• A new feature combining binaural data will be investigated with respect to noise robustness.