
Biomimicry-based Spatial Unmasking Design in Beamforming

Andrew Dittberner, Ph.D., V.P. Research Group

Rob de Vries, Ph.D., Senior Research Scientist

Tobias Piechowiak, Ph.D., Research Scientist

Chang, Ma; Principal Scientist


Overview

• Introduction

• Spatial unmasking and a model of the acoustic mechanism

• Quantifying the acoustic aspect of spatial unmasking (BESA Metric)

• Perceptual relevance of the BESA metric

• Biomimicry binaural beamformer design

Introduction

1. It is accepted knowledge that having two ears is better than one when trying to listen to a signal of interest in the presence of spatially separated noise sources (e.g. Blauert, 1997; Bregman, 1994; Zurek, 1993).

2. Models have been proposed that attribute this benefit to the head shadow effect, binaural interactions, and cognitive factors, and that explain how one can better understand sound with linguistic or other contextual meaning in the presence of spatially separated noise sources (e.g. Levitt and Rabiner, 1967).

3. Zurek (1993) discussed at length the directivity effects of binaural listening (e.g. the better-ear strategy).

Focus of this study: model the acoustic aspects of spatial unmasking, design a beamformer with spatial unmasking constraints (biomimicry design), and evaluate it with objective and subjective tests.

Spatial Unmasking and Beamforming Technology

[Figure: polar plots at 2000 Hz and 4000 Hz (azimuth in degrees, radial axis -20 to 5 dB); two further polar plots (azimuth 0 to 350 degrees); labels: Target, Masker]

Beamforming Technology

• Directivity Index: 6 to 12 dB (e.g. Ricketts & Dittberner, 2002; Dittberner & Bentler, 2008)

Spatial Unmasking

• Better-ear strategy: ~7-8 dB SRT improvement

• Target/masker interaural phase differences: ~2-3 dB

(e.g. Hirsh, 1950; Carhart, 1965; Dirks and Wilson, 1969; Plomp and Mimpen, 1981; Bronkhorst and Plomp, 1988; Yost et al., 1996; Bronkhorst, 2000)

Spatial Unmasking Model - Better ear strategy (re: Intelligibility)

Speech from front, one rotating noise source: focus on ear with best SNR

Speech from front, diffuse noise: the brain adds both ears and divides by 2

Spatial Unmasking Model - Situational Awareness (re: Audibility)

Independent of number of sources: focus on ear with best SNR
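The two listening strategies named in the model slides can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the inputs are per-ear SNRs already expressed in dB, and "adds both ears and divides by 2" is interpreted here as averaging the two dB values, which the slides do not specify.

```python
import numpy as np

def ear_snr_db(target_power, noise_power):
    """SNR in dB at one ear from linear target and noise powers."""
    return 10.0 * np.log10(np.asarray(target_power, dtype=float)
                           / np.asarray(noise_power, dtype=float))

def better_ear_snr_db(snr_left_db, snr_right_db):
    """Better-ear strategy: attend to the ear with the higher SNR."""
    return np.maximum(snr_left_db, snr_right_db)

def averaged_ear_snr_db(snr_left_db, snr_right_db):
    """Diffuse-noise case (assumption): average the two ears' dB SNRs."""
    return 0.5 * (np.asarray(snr_left_db, dtype=float)
                  + np.asarray(snr_right_db, dtype=float))

# Hypothetical example: frontal target (equal power at both ears),
# a single masker on the right that is louder at the right ear.
snr_left = ear_snr_db(1.0, 0.25)   # masker attenuated by head shadow
snr_right = ear_snr_db(1.0, 2.0)   # masker on the near side
print(better_ear_snr_db(snr_left, snr_right))
print(averaged_ear_snr_db(snr_left, snr_right))
```

With a single lateral masker the better-ear value exceeds the average, which is the motivation for keeping one ear with a favorable SNR available to the listener.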

Quantifying spatial unmasking: BESA METRIC

1. The Better Ear Index (BEI) is calculated as follows:

\[ P_k^{comb} = \min(P_k^{l}, P_k^{r}), \qquad BEI = 10 \cdot \log_{10} \frac{P_1^{comb}}{\mathrm{MEAN}(P_k^{comb})} \]

2. The Situational Awareness Index (SAI) can be defined as:

\[ P_k^{comb} = \max(P_k^{l}, P_k^{r}), \qquad SAI = 10 \cdot \log_{10} \frac{\mathrm{STD}(P_k^{comb})}{\mathrm{MEAN}(P_k^{comb})} \]

3. BESA is then defined as:

\[ BESA = BEI - SAI \]
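The three definitions can be evaluated numerically from a pair of sampled beam patterns. The sketch below is illustrative only and rests on several assumptions not stated on the slide: the patterns are linear power responses sampled at uniformly spaced azimuths, index 0 corresponds to the frontal direction (k = 1), BEI uses the per-direction minimum of the two ears, and SAI uses the per-direction maximum. The function name `besa_metric` and the example patterns are hypothetical.

```python
import numpy as np

def besa_metric(p_left, p_right):
    """Return (BEI, SAI, BESA) in dB for left/right power patterns.

    p_left, p_right: linear power per direction, index 0 = front (assumed).
    """
    p_left = np.asarray(p_left, dtype=float)
    p_right = np.asarray(p_right, dtype=float)

    # Better Ear Index: combine with min, then front response over the mean.
    p_min = np.minimum(p_left, p_right)
    bei = 10.0 * np.log10(p_min[0] / p_min.mean())

    # Situational Awareness Index: combine with max, then spread over mean.
    # A perfectly uniform combined pattern has zero spread (SAI -> -inf).
    p_max = np.maximum(p_left, p_right)
    sai = 10.0 * np.log10(p_max.std() / p_max.mean())

    return bei, sai, bei - sai

# Hypothetical patterns on a 15-degree azimuth grid: a cardioid-like
# pattern at one ear and a wider, nearly omnidirectional one at the other.
theta = np.deg2rad(np.arange(0, 360, 15))
narrow = (0.5 + 0.5 * np.cos(theta)) ** 2
wide = (0.75 + 0.25 * np.cos(theta)) ** 2

bei, sai, besa = besa_metric(narrow, wide)
print(f"BEI = {bei:.2f} dB, SAI = {sai:.2f} dB, BESA = {besa:.2f} dB")
```

A more directional minimum pattern raises BEI, while a more uniform maximum pattern lowers SAI; both changes increase BESA.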

Behavioral results (setup)

[Figure: test setup with desired speech and distractor speech sources; panels: Situational Awareness, Speech intelligibility]

• Testing is done with the help of a virtual sound environment (MCRoomSim)

• Arbitrary beam patterns (BESA) can be constructed

• Subjective sound perception is evaluated:

  • Source separation (scale 0-5)

  • Listening effort (scale 0-5)

MCRoomSim

Room Size: The room size can be changed as desired, but only shoebox (rectangular) rooms are supported.

Absorption: Frequency-dependent absorption and scattering coefficients can be specified (brick, concrete, etc.). This makes it possible to simulate different reverberation levels, e.g. a living room or bathroom.

Sources: An unlimited number of sources with different radiation patterns can be placed in the room, e.g. omni or unidirectional. Sources could be speakers, music, noises, etc.

Receiver: The HRTF for different microphones can also be specified, allowing for arbitrary HRTFs.

Applications:

- Subjective listening tests

- Localization tests

- DSP algorithm evaluation

- Beamformer design

- Parameter optimization

- Robustness measurements

Behavioral results

[Figure: subjective perception (n=12) and SRT (n=12) results for five beam patterns]

BESA #1: 6.6 dB

BESA #2: 11.6 dB

BESA #3: 21.9 dB

BESA #4: 10.9 dB

BESA #5: 18.1 dB

Biomimicry binaural beamformer design (using monaural patterns)

[Figure: monaural beam patterns for the focus ear and the monitor ear]

• Maximize DI: optimal in diffuse noise

• Trade-off between BEI and SAI:

  • Up to 2.5 kHz, emphasize SAI

  • Above 2.5 kHz, weigh BEI and SAI equally

Biomimicry binaural beamformer design

[Figure: combined patterns of the focus ear and the monitor ear; panels: "Situational awareness from merging both ears" and "Better ear effect from merging both ears"]

BEI: 5.8 dB

SAI: -4.8 dB

BESA: 10.6 dB
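As a quick consistency check, the values reported for this design agree with the definition of BESA as the difference between BEI and SAI:

\[ BESA = BEI - SAI = 5.8\,\mathrm{dB} - (-4.8\,\mathrm{dB}) = 10.6\,\mathrm{dB} \]

The negative SAI therefore adds to, rather than subtracts from, the overall BESA value.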

Conclusions

• It seems possible to design a binaural beamformer that provides high frontal speech intelligibility while preserving the possibility for spatial unmasking

• SAI and BEI correlate well with subjective perception of benefit (source separation and listening effort)

• Results indicate that BESA is a reasonable correlate for situational awareness and better-ear listening (but it can be dominated by either the BEI or the SAI)
