precedence-based speech segregation in a virtual auditory environment

1

Precedence-based speech segregation in a virtual auditory environment

Brungart, Simpson & Freyman (2005)

2

The Precedence Effect

• Sounds produced in areas with multiple surfaces give rise to reflections. Many copies of a sound reach a listener’s ears. The direct sound arrives first.

• With complex sounds like speech, early reflections tend to perceptually “fuse” with the direct sound (the “Haas” effect).

• The direct sound dominates localisation – the precedence effect.

Delay (ms) and perceived direction:

• Within ±0.5 ms: “summing localisation”

• > 1 ms: “precedence effect”

• > 20 ms: “echo threshold” – two sources perceived
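The delay regimes on this slide can be written as a small lookup. This is only a sketch: the function name `delay_regime` is hypothetical, the boundaries are the approximate values quoted on the slide (real thresholds depend on the stimulus), and the slide does not label the range between 0.5 and 1 ms, so that range is reported as unlabelled here.

```python
def delay_regime(delay_ms: float) -> str:
    """Map a lead-lag delay in ms to the perceptual regime named on the slide.

    Assumption: only the magnitude of the delay matters for the regime;
    the sign just says which copy leads.
    """
    d = abs(delay_ms)
    if d <= 0.5:
        return "summing localisation"
    if d <= 1.0:
        # The slide gives no label for this range.
        return "unlabelled (between summing localisation and precedence)"
    if d <= 20.0:
        return "precedence effect"
    return "echo threshold exceeded (two sources perceived)"


if __name__ == "__main__":
    for d in (0.25, 4.0, 64.0):
        print(d, "ms:", delay_regime(d))
```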

3

Masking

• “…the amount of interference one stimulus can cause in the perception of another stimulus.” (Yost and Nielsen, 1977)

• The elevation in threshold of a target signal due to the presence of a masker.

• Energetic masking

– “…masking that results from competition between target and masker at the periphery of the auditory system, i.e., overlapping excitation patterns in the cochlea or auditory nerve (AN).” (Durlach et al., 2003)

• Informational masking

– Non-energetic masking

– Central masking

– “difficulty segregating the audible acoustic components of the target speech signal from the audible acoustic components of a perceptually similar speech masker.” (p. 3241).

4

Some Assumptions

• Speech target

• Random noise masker = purely energetic masking?

• Speech masker = energetic and informational masking?

• So if an experimental manipulation affects the amount of masking produced by the speech masker but not the noise masker – this is due to a reduction in informational masking?

• Seems reasonable

5

The Basic Experiment

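F-F – Baseline masking (target and masker both from the front)

F-R – Release from masking regardless of type of masker (masker moved to the right)

F-RF – Release from masking with speech but NOT with noise masker (masker from the right, plus a delayed copy from the front)

Freyman et al. (1999) – free field. Brungart et al. (2005) – virtual auditory space over headphones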

6

Experiment 1

Adding a delayed copy of the noise masker at the front drops performance to baseline

Adding a delayed copy of the speech masker at the front makes hardly any difference

Note: using a speech recognition task which is resistant to energetic masking

– Therefore large informational masking component?

7

Interpretation

The precedence effect causes the listener to localise the RF masker off to the right, which helps auditory selective attention attend to the target speech, hence reducing informational masking.

This doesn’t affect the noise masker because it has no informational masking effect – adding it to the front just increases its energetic masking effect.

BUT – The effect is also observed when the delay is negative, so that the first copy of the masker comes from the front (i.e. F-FR). (Freyman et al. 1999)

Precedence should localise the masker to the front in this condition – so why the release from masking with a speech masker?

8

Experiment 2

• What is the effect of varying the delay between the two masker presentations between +/– 64 ms?

• For a noise masker?

– Very little.

– Some release from masking at delays which cause “notches” in the spectrum of the masker far enough apart to be resolved by the ear

• For a single-speaker speech masker?

– Little effect of delay, positive or negative, until the “echo threshold” is exceeded

• For a two-speaker speech masker?

– Much more variation, but still substantial release from masking. Possibly some release from energetic masking effects

• Note that as speakers are added, multi-speaker babble approaches speech-shaped noise.
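The “notches” mentioned for the noise masker follow from comb filtering: summing a signal with an equal-amplitude copy delayed by τ gives a magnitude response of 2|cos(πfτ)|, with nulls at odd multiples of 1/(2τ). The sketch below lists those frequencies. It assumes an idealised sum of two equal copies at a single ear, ignoring head-related filtering and the fact that the two copies come from different directions; the function names and the 8 kHz upper limit are hypothetical choices for illustration.

```python
def notch_frequencies(delay_ms: float, f_max_hz: float = 8000.0) -> list:
    """Notch frequencies (Hz) of signal + equal-amplitude copy delayed by delay_ms.

    Nulls of |1 + exp(-j*2*pi*f*tau)| fall at f = (2k + 1) / (2 * tau).
    """
    d = abs(delay_ms)
    if d == 0:
        return []  # no delay, no comb filtering
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) * 1000.0 / (2.0 * d)
        if f > f_max_hz:
            break
        notches.append(f)
        k += 1
    return notches


def notch_spacing_hz(delay_ms: float) -> float:
    """Spacing between adjacent notches, 1 / tau, in Hz."""
    return 1000.0 / abs(delay_ms)


if __name__ == "__main__":
    for d in (0.5, 1.0, 4.0, 64.0):
        print(f"{d} ms: spacing {notch_spacing_hz(d):.1f} Hz, "
              f"first notches {notch_frequencies(d)[:3]}")
```

With a 1 ms delay the notches fall at 500, 1500, 2500 Hz and so on, 1000 Hz apart, whereas at 64 ms they are only about 15.6 Hz apart, which is much finer than the ear's frequency resolution. This is consistent with release from masking appearing only at the shorter delays.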

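[Figure: performance vs. masker delay for the three maskers, each with its baseline marked. Negative delays = F-FR, positive delays = F-RF. SNR labels: –8 dB and 0 dB.]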

9

A Puzzle

• There is virtually no difference between positive and negative delays with the single-speaker masker and not much of an advantage with the two-speaker masker

• What is going on here?

• Two possibilities (actually 3, but I’ll come back to this):

– 1) The effect is not based on perceived location, but on timbre or “source width”

– 2) Even when the copy of the masker added to the front leads the one from the right, the one to the right “pulls” the perceived location off a little so that it is perceived somewhere between front and right

• If (2) is the case, then shifting the apparent location of the target to match that of the masker should abolish the release from masking

10

Experiment 3

Position of target varied from 0° to 60° in 5° steps, at 7 different delay values from +4 to –4 ms.

• U-shaped performance curves for all 3 maskers at delay = 0 ms. Masker heard midway between front and right.

• For the two-speaker masker, when there is a lag (+ve delay > 0.5 ms), subjects do best when the target is located near the front (0°). As expected.

• When there is a lead (–ve delay, magnitude > 0.5 ms), subjects do best when the target is located to the right.

• BUT – the minimum performance is found around 10° – NOT at 0°

11

Conclusions

• This would appear to support the hypothesis mentioned earlier

– BUT – why is there not a similar minimum around 50° when there is a positive delay?

– Also – energetic and informational masking do not seem to have been completely separated by this paradigm as was first thought

• AND – no mention is made of the phenomenon of the BMLD (binaural masking level difference):

– Whenever the phase or level differences of the target signal at the 2 ears are not the same as those of the masker, ability to detect or identify the target improves

– Inversion of the signal at one ear gives better performance than delaying it – so not just segregation by spatial separation

– Large BMLDs occur when target and masker are not subjectively well separated

– Hearing is sensitive to the profile of interaural decorrelation across frequency

– This could potentially explain why negative delays are as useful as positive delays – adding a delayed copy of the masker at the right changes the interaural correlation of the masker relative to the target

– But this still wouldn’t explain the difference between speech and noise…
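The interaural-correlation argument above can be illustrated numerically. The sketch below is a toy model, not the stimulus generation used in the paper. It assumes a white Gaussian noise masker, a front copy that reaches both ears identically, and a right copy that reaches the left ear a fixed number of samples after the right ear (a pure interaural time difference with no level difference or head-related filtering). The function name, the default `itd_samples` and the sample count are hypothetical.

```python
import random


def interaural_correlation(delay_samples: int, itd_samples: int = 13,
                           right_gain: float = 1.0, n: int = 20000,
                           seed: int = 0) -> float:
    """Zero-lag normalised correlation between the left- and right-ear masker.

    delay_samples: delay of the right copy relative to the front copy
                   (negative means the right copy leads).
    right_gain:    0.0 gives the front-only (F-F) baseline.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

    def delayed(k: int) -> list:
        # Circular delay by k samples keeps the example short.
        return [noise[(i - k) % n] for i in range(n)]

    at_right_ear = delayed(delay_samples)
    at_left_ear = delayed(delay_samples + itd_samples)  # arrives later on the left
    left = [f + right_gain * c for f, c in zip(noise, at_left_ear)]
    right = [f + right_gain * c for f, c in zip(noise, at_right_ear)]

    num = sum(l * r for l, r in zip(left, right))
    den = (sum(l * l for l in left) * sum(r * r for r in right)) ** 0.5
    return num / den


if __name__ == "__main__":
    print("F-F baseline:  ", interaural_correlation(0, right_gain=0.0))
    print("positive delay:", interaural_correlation(88))
    print("negative delay:", interaural_correlation(-88))
```

Under these assumptions the correlation is 1 for the front-only baseline and falls to roughly 0.5 for both the positive and the negative delay, which matches the point that the sign of the delay need not matter for this cue.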
