
© Fraunhofer IIS, 08.09.2017

Methods for Low Bitrate Coding Enhancement Part II: Spatial Enhancement

Christian Uhle¹,², Patrick Gampp¹, Oliver Hellmuth¹, Peter Prokein¹, Jürgen Herre²,¹, Sascha Disch¹,², Julia Havenstein¹, Antonios Karampourniotis¹

¹ Fraunhofer IIS, Erlangen, Germany
² International Audio Laboratories Erlangen, Germany



1. Introduction

2. System Overview

3. Ambient Sound Enhancement

4. Stereo Width Enhancement

5. Evaluation

6. Conclusion


1. Introduction: Motivation

Perceptual Audio Coding (PAC) is applied for storage and transmission of audio signals.

Perceptual transparency is achieved when the bitrate is high enough: original and coded/decoded signals are indistinguishable when listening in an optimal listening environment.

At low bitrates, artifacts can be introduced and the sound quality is reduced.

The width of the stereo image is reduced, e.g. due to:
• a decreased difference signal (M/S coding),
• increased correlation between the channel signals (Intensity Stereo coding).



The aim is to apply post-processing that improves the sound quality.

The processing is single-ended, i.e. it has no information about the coding (codec, bitrate).

The criterion is pleasantness, not transparency.


2. System Overview: Integration into the Automotive Sound System

[Figure: two block diagrams of the signal chain. Signals: (Compressed) Audio Bitstream, (Degraded) PCM Audio Signal, (Enhanced) PCM Audio Signal, PCM Loudspeaker Signal. Units: Car Head Unit, Car Amplifier. Blocks: Audio Decoder, Audio Source Manager, Low Bitrate Coding Enhancement Suite (Spectral Restoration, Spatial Enhancement, ...), Car Sound Processing. Panel titles: Operation in the Head Unit; Operation in the Amplifier.]

3. Ambient Sound Enhancement: Overview

Improve the perceived stereo image by applying artificial decorrelation to the background signal components.
• Background sounds: ambient sounds, background music (radio broadcast) and musical accompaniment.
• Foreground sounds: singers, talkers, soloists, loud instruments (drums).

Maintains the timbral qualities without introducing coloration and artifacts.
• Decorrelation can impair the sound quality when applied to foreground sounds (e.g. speech, drums).
• Decorrelation is not required for foreground sounds (directional sounds are locatable).

The intensity of the decorrelation is controlled using a model of reverberance (a perceptual attribute that relates to the intensity of reverberation).


3. Ambient Sound Enhancement: Block Diagram

Background sounds are separated by attenuating transient and tonal signals.


3. Ambient Sound Enhancement: Separation of the Background Sounds

• STFT
• Spectral weighting, i.e. scaling of the spectral coefficients:
  • spectral weights (for each time-frequency bin) for attenuating transient signal components,
  • spectral weights for attenuating tonal signal components,
  • combination of these spectral weights (by taking the minimum of both).
• Inverse STFT
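The processing chain of this slide (STFT, two sets of spectral weights combined by their minimum, inverse STFT) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame length, hop size, Hann window and the function names `stft`, `istft` and `separate_background` are assumptions, and the two weight estimators are passed in as placeholder callables.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Hann-windowed STFT; returns complex array of shape (frames, n_fft//2 + 1)."""
    win = np.hanning(n_fft + 1)[:-1]          # periodic Hann window (assumed)
    x = np.pad(x, (n_fft, n_fft))             # zero-pad so the edges are fully covered
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[k * hop:k * hop + n_fft] * win for k in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, length, n_fft=1024, hop=256):
    """Weighted overlap-add inverse of stft(); returns `length` samples."""
    win = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    y = np.zeros((X.shape[0] - 1) * hop + n_fft)
    norm = np.zeros_like(y)
    for k in range(X.shape[0]):
        y[k * hop:k * hop + n_fft] += frames[k] * win
        norm[k * hop:k * hop + n_fft] += win ** 2
    y /= np.maximum(norm, 1e-12)              # normalize by the summed squared windows
    return y[n_fft:n_fft + length]            # remove the padding added in stft()

def separate_background(x, transient_weights, tonal_weights, n_fft=1024, hop=256):
    """Scale each time-frequency bin by the minimum of the two weight sets.

    transient_weights / tonal_weights: hypothetical callables that map the
    complex STFT to real-valued weights in [0, 1] of the same shape.
    """
    X = stft(x, n_fft, hop)
    G = np.minimum(transient_weights(X), tonal_weights(X))
    return istft(G * X, len(x), n_fft, hop)
```

Taking the minimum means a bin is attenuated as soon as either estimator flags it as foreground, so the output keeps only components that both estimators regard as background.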


3. Ambient Sound Enhancement: Attenuation of Transient Signals

Signal model: the input signal is an additive mixture of a transient signal component and a sustained signal component (in the STFT domain, with time frame index k and frequency bin index m):

$$|X(k,m)| = |X_t(k,m)| + |X_s(k,m)|$$

The transient signal is attenuated by spectral weighting:

$$|Y_{trns}(k,m)| = G_{trns}(k,m)\,|X(k,m)|$$

The spectral weights are computed from estimates of the sustained signal and the transient signal:

$$G_{trns} = \frac{|\hat{X}_s| + |\hat{X}_t|}{|X|}$$

The sustained signal magnitude is estimated by means of low-pass filtering of the sub-band magnitudes along time and limiting the sustained signal by the input.
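A minimal sketch of the transient attenuation, under several assumptions that the slide does not state: the low-pass filter is a one-pole recursive smoother with coefficient `smooth`, the transient estimate is taken as the remainder |X| − |X̂s|, and a hypothetical residual gain `g_t` scales the transient estimate in the numerator of the weights (with `g_t = 1` the weights equal 1, i.e. the signal passes unchanged).

```python
import numpy as np

def transient_gains(mag, smooth=0.9, g_t=0.1, eps=1e-12):
    """Spectral weights that attenuate transients.

    mag: magnitudes |X(k, m)| with shape (frames, bins).
    smooth, g_t: illustrative tuning values, not taken from the slides.
    """
    mag = np.asarray(mag, dtype=float)
    sustained = np.empty_like(mag)
    state = mag[0].copy()
    for k in range(mag.shape[0]):
        # one-pole low-pass of each sub-band magnitude along time
        state = smooth * state + (1.0 - smooth) * mag[k]
        # the sustained estimate may not exceed the input magnitude
        sustained[k] = np.minimum(state, mag[k])
    transient = mag - sustained                 # assumed transient estimate
    return (sustained + g_t * transient) / np.maximum(mag, eps)
```

For a stationary input the smoothed magnitude equals the input and the weights stay at 1; at an onset the smoothed magnitude lags behind, the remainder is treated as transient, and the weight drops towards `g_t`.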


3. Ambient Sound Enhancement: Attenuation of Transient Signals (2)

Sound example: input signal (black) overlaid by output signal (red).

[Figure: waveform plot, amplitude (−0.3 to 0.4) over time (0 to 5 s); legend: Input, Output.]


3. Ambient Sound Enhancement: Attenuation of Tonal Signals

Attenuate spectral components that exceed an estimate of the noise floor, i.e. a locally flat magnitude spectrum.
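One way to illustrate this step: estimate the noise floor as a running median of the magnitude spectrum along frequency and limit every bin to that floor. The median estimator, the window width and the function name `tonal_gains` are assumptions for this sketch; the slides do not specify how the noise floor is estimated.

```python
import numpy as np

def tonal_gains(mag, width=31, eps=1e-12):
    """Spectral weights that pull tonal peaks down to a noise-floor estimate.

    mag: magnitudes |X(k, m)| with shape (frames, bins).
    width: odd length of the median window along frequency (assumed value).
    """
    mag = np.asarray(mag, dtype=float)
    pad = width // 2
    padded = np.pad(mag, ((0, 0), (pad, pad)), mode="edge")
    # windows has shape (frames, bins, width): one frequency window per bin
    windows = np.lib.stride_tricks.sliding_window_view(padded, width, axis=1)
    floor = np.median(windows, axis=-1)       # locally flat spectrum estimate
    # bins at or below the floor pass unchanged, bins above it are scaled down to it
    return np.minimum(1.0, floor / np.maximum(mag, eps))
```

A narrow spectral peak barely affects the median of its neighbourhood, so the peak is scaled down to the level of the surrounding bins while those bins keep a weight of 1.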


3. Ambient Sound Enhancement: Decorrelation of Background Sounds

Linear time-invariant processing in the time domain with a dense and short impulse response.

Decorrelation filter structure is a trade-off between sound quality and complexity (computational load, memory requirements and tuning effort).

Here: 3 nested all-pass filters in parallel per output channel.

The tuning of the parameters (delays and gains of the all-pass filters) is of crucial importance.
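The following sketch shows one possible reading of this structure: each output channel averages three parallel sections, and each section is a Schroeder all-pass whose delay line is followed by an inner all-pass (a two-level nesting). The nesting depth, the class name `NestedAllpass`, and all delay and gain values in `LEFT` and `RIGHT` are placeholders chosen for illustration, not the tuned parameters of the presented system.

```python
import numpy as np

class NestedAllpass:
    """Schroeder all-pass; an optional inner all-pass sits after the delay line."""

    def __init__(self, delay, gain, inner=None):
        self.buf = np.zeros(delay)   # circular delay line holding u[n - delay]
        self.pos = 0
        self.gain = gain
        self.inner = inner

    def tick(self, x):
        delayed = self.buf[self.pos]
        # q[n]: delayed internal signal, optionally passed through the inner all-pass
        q = self.inner.tick(delayed) if self.inner is not None else delayed
        u = x + self.gain * q        # feedback path
        self.buf[self.pos] = u
        self.pos = (self.pos + 1) % len(self.buf)
        return -self.gain * u + q    # feed-forward path

def decorrelate(x, sections):
    """Average of parallel nested all-pass sections.

    sections: list of (outer_delay, outer_gain, inner_delay, inner_gain).
    """
    filters = [NestedAllpass(d1, g1, NestedAllpass(d2, g2))
               for d1, g1, d2, g2 in sections]
    y = np.zeros(len(x))
    for n, sample in enumerate(x):
        y[n] = sum(f.tick(sample) for f in filters) / len(filters)
    return y

# Placeholder parameter sets (hypothetical); different sets per output channel
# give different impulse responses and therefore decorrelated outputs.
LEFT = [(113, 0.5, 37, 0.4), (151, -0.5, 43, 0.4), (197, 0.5, 59, -0.4)]
RIGHT = [(127, -0.5, 41, 0.4), (163, 0.5, 47, -0.4), (211, 0.5, 61, 0.4)]
```

A single nested section has a flat magnitude response, but the sum of several parallel sections does not, which is one reason why the choice of delays and gains matters for avoiding coloration.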


3. Ambient Sound Enhancement: Decorrelator Gain Control

The perceived level of decorrelation (and reverberation) depends on both the processing (impulse response) and the input signal.

The effect intensity is lower for stationary input signals than for transient or frequency-modulated signals (e.g. speech).

The level of decorrelation is controlled using a model for the perceived intensity of decorrelation:
• modified version of a model of reverberance (Uhle et al., 2011),
• based on a model for partial loudness (Moore et al., 1997).

Partial loudness difference = partial loudness of the decorrelated signal (masked by the dry input) − partial loudness of the dry input (masked by the decorrelated signal).


4. Stereo Width Enhancement: Overview

Extending the width of the stereo image by enhancing inter-channel level differences of direct sound components:
1. Stereo Mid/Side decomposition,
2. Boost of the stereo side signal.

[Block diagram: x(t) → STFT → X(k, m) → Stereo M/S Decomposition → M(k, m) and S(k, m); S(k, m) is scaled by the weight w and combined with M(k, m) to give Y(k, m) → iSTFT → y(t).]


4. Stereo Width Enhancement: Stereo Mid/Side Decomposition

Stereo side signal:

$$S_1 = G_1 X_1, \qquad S_2 = G_2 X_2$$

with spectral weights

$$G_i = \max\left(0,\ \frac{|X_i|^{\alpha} - \kappa\,|D|^{\alpha}}{|X_i|^{\alpha}}\right)^{\beta}$$

• D: Downmix of the input signal
• Tuning parameters for controlling the attenuation (α, β, κ)
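The side-signal weights and the subsequent boost can be sketched per STFT bin as follows. Several details are assumptions made for this illustration and are not stated on the slides: the downmix is taken as D = (X1 + X2) / 2, the mid part of each channel is taken as the remainder M_i = X_i − S_i, and the output is formed as Y_i = M_i + w·S_i. The function names and default parameter values are hypothetical.

```python
import numpy as np

def side_gain(X_i, D, alpha=2.0, kappa=1.0, beta=1.0, eps=1e-12):
    """G_i = max(0, (|X_i|^alpha - kappa |D|^alpha) / |X_i|^alpha) ^ beta."""
    num = np.abs(X_i) ** alpha - kappa * np.abs(D) ** alpha
    ratio = num / np.maximum(np.abs(X_i) ** alpha, eps)   # eps guards empty bins
    return np.maximum(0.0, ratio) ** beta

def widen(X1, X2, w=1.5, **params):
    """Boost the side part of each channel by the weight w.

    X1, X2: complex STFT coefficients of the two input channels.
    params: optional alpha, kappa, beta passed on to side_gain().
    """
    D = 0.5 * (X1 + X2)                      # assumed downmix
    out = []
    for X in (X1, X2):
        S = side_gain(X, D, **params) * X    # side signal S_i = G_i X_i
        M = X - S                            # assumed mid part (remainder)
        out.append(M + w * S)
    return out[0], out[1]
```

With these assumptions, identical channels yield zero side signal and pass unchanged, w = 1 is the identity, and a hard-panned component (one channel silent) is raised in level in its own channel, which increases the inter-channel level difference.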


5. Evaluation: Listening Test

Listening test with multiple stimuli using loudspeakers.

Conditions:
• Coded signal without any post-processing, as known and hidden “reference”,
• Stereo Width Enhancement (SWE),
• Ambient Sound Enhancement (ASE),
• SWE + ASE.

5 test signals of length between 8 s and 30 s each, loudness normalized (ITU-R BS.1770).

Codecs: mp3 at 64 kbps, AAC at 48 kbps.

12 listeners.


5. Evaluation: Listening Test (2)

1. “How well the spatial image has been improved?”
2. “Sound quality?”


6. Conclusion

In perceptual audio coding, audible artifacts can be introduced when the bitrate is too low.

We have proposed a suite of algorithms, each designed for mitigating a common type of artifact.

Listening test:
• Both methods achieved a significant improvement.
• The combination of both methods is rated higher than the methods in isolation (“slightly better”).

These tools can be used to implement a Low Bitrate Coding Enhancement system.

Future work: assessment of the performance obtained with a combination of all proposed enhancement tools (presented in Part 1 and Part 2).


Thank you for your attention!


Sonamic Enhancement Sound Demo in Regency Ballroom

Listen also to: Symphoria 3D, Sonamic Loudness
