Biomimicry-based Spatial Unmasking Design in Beamforming
Andrew Dittberner, Ph.D., V.P. Research Group
Rob de Vries, Ph.D., Senior Research Scientist
Tobias Piechowiak, Ph.D., Research Scientist
Chang, Ma; Principal Scientist
Overview
• Introduction
• Spatial unmasking and a model of the acoustic mechanism
• Quantifying the acoustic aspect of spatial unmasking (BESA Metric)
• Perceptual relevance of the BESA metric
• Biomimicry binaural beamformer design
Introduction
1. It is accepted knowledge that two ears are better than one when listening to a signal of interest in the presence of spatially separated noise sources (e.g. Blauert, 1997; Bregman, 1994; Zurek, 1993).
2. Models have been proposed that attribute this benefit to the head shadow effect, binaural interaction, and cognitive factors, and that explain how a listener can better understand sound with linguistic or other contextual meaning in the presence of spatially separated noise sources (e.g. Levitt and Rabiner, 1967).
3. Zurek (1993) proposed and discussed at length the directivity effects of binaural listening (e.g. the better-ear strategy).
Focus of this study: model the acoustic aspects of spatial unmasking, design a beamformer with spatial unmasking constraints (biomimicry design), and evaluate it with objective and subjective tests.
Spatial Unmasking and Beamforming Technology
[Figure: polar plots at 2000 Hz and 4000 Hz; radial scale -20 to 5, angles 0° to 345° in 15° steps]
Beamforming Technology
[Figure: two polar plots; angles 0° to 350° in 10° steps]
Directivity Index: 6 to 12 dB (e.g. Ricketts & Dittberner, 2002; Dittberner & Bentler, 2008)
Spatial Unmasking
• Better-ear strategy: ~7-8 dB SRT improvement
• Target/masker interaural phase differences: ~2-3 dB
(e.g. Hirsh, 1950; Carhart, 1965; Dirks and Wilson, 1969; Plomp and Mimpen, 1981; Bronkhorst and Plomp, 1988; Yost et al., 1996; Bronkhorst, 2000)
[Figure labels: Target; Masker]
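The slide quotes a directivity index (DI) range but the transcript does not define DI. As a minimal sketch, assuming the common textbook definition (on-axis power relative to the power averaged over all directions of a diffuse field), the DI of a simple first-order pattern can be estimated numerically. The pattern family D(theta) = a + (1 - a) * cos(theta) and the function name `directivity_index_db` are illustrative assumptions, not taken from the slides.

```python
import numpy as np


def directivity_index_db(a, n=20001):
    """DI in dB of the axisymmetric pattern D(theta) = a + (1 - a) * cos(theta).

    Assumes the textbook definition: on-axis power divided by the power
    averaged over the whole sphere (diffuse field).
    """
    theta = np.linspace(0.0, np.pi, n)
    power = (a + (1.0 - a) * np.cos(theta)) ** 2
    # Spherical average of an axisymmetric pattern:
    # 0.5 * integral over [0, pi] of power(theta) * sin(theta) d(theta)
    integrand = power * np.sin(theta)
    dtheta = theta[1] - theta[0]
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1])) * dtheta  # trapezoid rule
    mean_power = 0.5 * integral
    on_axis_power = 1.0  # D(0) = a + (1 - a) = 1
    return 10.0 * np.log10(on_axis_power / mean_power)


for name, a in [("omni", 1.0), ("cardioid", 0.5), ("hypercardioid", 0.25)]:
    print(f"{name}: {directivity_index_db(a):.2f} dB")
```

Under this definition a cardioid comes out at about 4.8 dB and a hypercardioid at about 6.0 dB, which is the upper limit for a first-order pattern and sits at the low end of the 6 to 12 dB range quoted above.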
Spatial Unmasking Model - Better-ear strategy (re: Intelligibility)
Speech from the front, one rotating noise source: focus on the ear with the best SNR
Speech from the front, diffuse noise: the brain adds both ears and divides by 2
Spatial Unmasking Model - Situational Awareness (re: Audibility)
Independent of the number of sources: focus on the ear with the best SNR
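The two combination rules in this model can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the SNR values are made-up toy numbers, and the quantity that is "added and divided by 2" is assumed here to be a per-ear power.

```python
import numpy as np


def better_ear_snr_db(snr_left_db, snr_right_db):
    """Localized masker: for each masker position, attend to the ear with the higher SNR."""
    return np.maximum(snr_left_db, snr_right_db)


def diffuse_combination(p_left, p_right):
    """Diffuse noise: add both ears and divide by 2 (assumed to act on power)."""
    return 0.5 * (np.asarray(p_left, dtype=float) + np.asarray(p_right, dtype=float))


# Toy SNRs (dB) at each ear for three masker positions (illustrative values only)
snr_left = [0.0, -6.0, 3.0]
snr_right = [0.0, 4.0, -5.0]
better = better_ear_snr_db(snr_left, snr_right)
print(better)  # [0. 4. 3.]

averaged = diffuse_combination([1.0, 0.2], [0.2, 1.0])
print(averaged)  # [0.6 0.6]
```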
Quantifying spatial unmasking: BESA METRIC
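1. The Better Ear Index (BEI) is calculated as follows:
\[ P_k^{comb} = \min(P_k^{l}, P_k^{r}) \]
\[ \mathrm{BEI} = 10 \cdot \log_{10} \left( \frac{P_1^{comb}}{\mathrm{MEAN}(P_k^{comb})} \right) \]
2. The Situational Awareness Index (SAI) can be defined as:
\[ P_k^{comb} = \max(P_k^{l}, P_k^{r}) \]
\[ \mathrm{SAI} = 10 \cdot \log_{10} \left( \frac{\mathrm{STD}(P_k^{comb})}{\mathrm{MEAN}(P_k^{comb})} \right) \]
3. BESA is then defined as:
\[ \mathrm{BESA} = \mathrm{BEI} - \mathrm{SAI} \]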
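The three definitions on this slide can be sketched as a short function. Several points are assumptions, because the transcript does not spell them out: P_k is taken to be the power response of each ear's beam pattern at direction k, index 1 (element 0 here) is taken to be the frontal target direction, the min rule is paired with the BEI and the max rule with the SAI, and STD is computed as the population standard deviation. The function name and the toy patterns are hypothetical.

```python
import numpy as np


def besa_metric(p_left, p_right, front_index=0):
    """Return (BEI, SAI, BESA) in dB for two per-direction power patterns."""
    p_left = np.asarray(p_left, dtype=float)
    p_right = np.asarray(p_right, dtype=float)

    # Better Ear Index: per direction keep the ear that picks up less power,
    # then compare the frontal power with the mean over all directions.
    p_min = np.minimum(p_left, p_right)
    bei = 10.0 * np.log10(p_min[front_index] / np.mean(p_min))

    # Situational Awareness Index: per direction keep the ear that picks up
    # more power; a flatter combined pattern gives a smaller STD/MEAN ratio
    # and therefore a more negative SAI.
    p_max = np.maximum(p_left, p_right)
    sai = 10.0 * np.log10(np.std(p_max) / np.mean(p_max))

    return bei, sai, bei - sai


# Toy patterns sampled every 15 degrees (illustrative, not the study's patterns)
angles = np.deg2rad(np.arange(0, 360, 15))
focus = (0.5 + 0.5 * np.cos(angles)) ** 2   # cardioid-like power pattern
monitor = np.full_like(angles, 0.5)         # flat power pattern
bei, sai, besa = besa_metric(focus, monitor)
print(f"BEI = {bei:.1f} dB, SAI = {sai:.1f} dB, BESA = {besa:.1f} dB")
```

Note that a perfectly flat combined pattern has zero standard deviation, so the SAI tends to minus infinity; a practical implementation would need to guard against that case.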
Behavioral results (setup)
[Figure labels: Desired speech; Distractor speech; Situational awareness; Speech intelligibility]
• Testing is done with the help of a virtual sound environment (MCRoomSim)
• Arbitrary beam patterns (BESA) can be constructed
• Subjective sound perception is evaluated:
  • Source separation (scale 0-5)
  • Listening effort (scale 0-5)
MCRoomSim
Room Size: The room size can be changed as desired, but only shoebox (rectangular) rooms are supported.
Absorption: Frequency-dependent absorption and scattering coefficients can be specified (brick, concrete, etc.). This makes it possible to simulate different reverberation levels, e.g. a living room or bathroom.
Sources: An unlimited number of sources with different radiation patterns can be placed in the room, e.g. omni or unidirectional. Sources could be speakers, music, noises, etc.
Receiver: The HRTF for different microphones can also be specified, allowing for arbitrary HRTFs.
Applications:
- Subjective listening tests
- Localization tests
- DSP algorithm evaluation
- Beamformer design
- Parameter optimization
- Robustness measurements
Behavioral results
[Figure panels: Subjective perception (n=12); SRT (n=12)]
• BESA #1: 6.6 dB
• BESA #2: 11.6 dB
• BESA #3: 21.9 dB
• BESA #4: 10.9 dB
• BESA #5: 18.1 dB
Biomimicry binaural beamformer design (using monaural patterns)
[Figure labels: Monitor ear; Focus ear]
Maximize DI
Optimal in diffuse noise
Trade-off between BEI and SAI:
• Up to 2.5 kHz, emphasize SAI
• Above 2.5 kHz, weight BEI and SAI equally
Biomimicry binaural beamformer design
Situational awareness from merging both ears
Better-ear effect from merging both ears
BEI: 5.8 dB
SAI: -4.8 dB
BESA: 10.6 dB
[Figure labels: Focus ear; Monitor ear]
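As a quick consistency check on the values reported for this design, the BESA figure follows directly from the definition BESA = BEI - SAI:

```latex
\mathrm{BESA} = \mathrm{BEI} - \mathrm{SAI}
             = 5.8\,\mathrm{dB} - (-4.8\,\mathrm{dB})
             = 10.6\,\mathrm{dB}
```

The negative SAI means the standard deviation of the combined pattern is smaller than its mean, so it adds to the BESA score rather than subtracting from it.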
Conclusions
• It seems possible to design a binaural beamformer that provides high frontal speech intelligibility while preserving the possibility for spatial unmasking
• SAI and BEI correlate well with subjective perception of benefit (source separation and listening effort)
• Results indicate that BESA is a reasonable correlate for situational awareness and better-ear listening (but it could be dominated by either the BEI or the SAI)