© Fraunhofer IIS 1 08.09.2017
Methods for Low Bitrate Coding Enhancement Part II: Spatial Enhancement
Christian Uhle (1,2), Patrick Gampp (1), Oliver Hellmuth (1), Peter Prokein (1),
Jürgen Herre (2,1), Sascha Disch (1,2), Julia Havenstein (1), Antonios Karampourniotis (1)
(1) Fraunhofer IIS, Erlangen, Germany
(2) International Audio Laboratories Erlangen, Germany
1. Introduction
2. System Overview
3. Ambient Sound Enhancement
4. Stereo Width Enhancement
5. Evaluation
6. Conclusion
1. Introduction: Motivation
Perceptual Audio Coding (PAC) is applied for storage and transmission of audio signals.
Perceptual transparency is achieved when the bitrate is high enough: original and coded/decoded signals are indistinguishable when listening in an optimal listening environment.
At low bitrates, artifacts can be introduced and the sound quality is reduced.
The width of the stereo image is reduced, e.g. due to
• a decreased difference signal (M/S coding),
• increased correlation between the channel signals (Intensity Stereo coding).
The aim is to apply post-processing for improving the sound quality.
The processing is single-ended, i.e. it has no information about the coding (codec, bitrate).
The criterion is pleasantness, not transparency.
2. System Overview: Integration into the Automotive Sound System
[Figure: two block diagrams of the automotive signal chain, titled "Operation in the Head Unit" and "Operation in the Amplifier". Blocks and signals shown: (compressed) audio bitstream, Audio Decoder, (degraded) PCM audio signal, Audio Source Manager, Low Bitrate Coding Enhancement Suite (Spectral Restoration, Spatial Enhancement, ...), (enhanced) PCM audio signal, Car Sound Processing, PCM loudspeaker signal, Car Head Unit, Car Amplifier.]
3. Ambient Sound Enhancement: Overview
Improve the perceived stereo image by applying artificial decorrelation to the background signal components.
• Background sounds: ambient sounds, background music (radio broadcast) and musical accompaniment.
• Foreground sounds: singers, talkers, soloists, loud instruments (drums).
The processing maintains the timbral qualities without introducing coloration and artifacts.
Decorrelation can impair the sound quality when applied to foreground sounds (e.g. speech, drums).
Decorrelation is not required for foreground sounds (directional sounds are locatable).
The intensity of the decorrelation is controlled using a model of reverberance (a perceptual attribute that relates to the intensity of reverberation).
3. Ambient Sound Enhancement: Block Diagram
Background sounds are separated by attenuating transient and tonal signals.
3. Ambient Sound Enhancement: Separation of the Background Sounds
1. STFT,
2. Spectral weighting, i.e. scaling of the spectral coefficients:
• Spectral weights (for each time-frequency bin) to attenuate transient signal components,
• Spectral weights for attenuating tonal signal components,
• Combination of these spectral weights (by taking the minimum of both),
3. Inverse STFT.
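The weighting stage listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the STFT coefficients and the two weight grids are assumed to be given as nested lists indexed [frame k][bin m], and the function names `combine_weights` and `apply_weights` are hypothetical.

```python
def combine_weights(g_transient, g_tonal):
    """Element-wise minimum of two weight grids indexed [frame k][bin m]."""
    return [
        [min(gt, gn) for gt, gn in zip(row_t, row_n)]
        for row_t, row_n in zip(g_transient, g_tonal)
    ]


def apply_weights(stft, weights):
    """Scale each complex STFT coefficient by its real-valued weight."""
    return [
        [g * x for g, x in zip(row_g, row_x)]
        for row_g, row_x in zip(weights, stft)
    ]


# Toy example (made-up values): 2 frames x 3 bins.
stft = [[1 + 1j, 2 + 0j, 0.5j], [1 + 0j, 1 + 0j, 1 + 0j]]
g_transient = [[1.0, 0.2, 1.0], [0.5, 1.0, 1.0]]
g_tonal = [[0.3, 1.0, 1.0], [1.0, 0.1, 1.0]]

g_background = combine_weights(g_transient, g_tonal)
background = apply_weights(stft, g_background)
```

Taking the minimum means a bin is attenuated as soon as either detector flags it, so only bins that are neither transient nor tonal pass at full level.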
3. Ambient Sound Enhancement: Attenuation of Transient Signals
Signal model: the input signal is an additive mixture of a transient signal component and a sustained signal component (in the STFT domain, with time frame index k and frequency bin index m):
$$|X(k,m)| = |X_t(k,m)| + |X_s(k,m)|$$
The transient signal is attenuated by spectral weighting:
$$|Y_{trns}(k,m)| = G_{trns}(k,m)\,|X(k,m)|$$
The spectral weights are computed from estimates of the sustained signal and the transient signal:
$$G_{trns} = \frac{|\hat{X}_s| + |\hat{X}_t|}{|X|}$$
The sustained signal magnitude is estimated by means of low-pass filtering of the sub-band magnitudes along time and limiting the sustained signal by the input.
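A rough sketch of the sustained-signal estimate for a single frequency bin, in plain Python. The one-pole low-pass filter, the smoothing constant, taking the transient estimate as the residual of the additive model, and the residual gain `g_t` applied to the transient estimate are all assumptions made for illustration; they are not taken from the slides. With `g_t = 1` the weight reduces to 1 and the input passes unchanged.

```python
def sustained_estimate(mags, smoothing=0.9):
    """Low-pass filter the magnitudes of one bin along time (one-pole
    filter, an assumed choice) and limit the estimate by the input."""
    if not mags:
        return []
    estimate = []
    state = mags[0]
    for x in mags:
        state = smoothing * state + (1.0 - smoothing) * x
        state = min(state, x)  # the sustained part cannot exceed the input
        estimate.append(state)
    return estimate


def transient_weights(mags, smoothing=0.9, g_t=0.1, eps=1e-12):
    """Weights (s + g_t * t) / |X|, where t = |X| - s is the residual
    transient estimate and g_t is a hypothetical residual gain."""
    sustained = sustained_estimate(mags, smoothing)
    weights = []
    for x, s in zip(mags, sustained):
        t = x - s
        weights.append((s + g_t * t) / x if x > eps else 1.0)
    return weights


# Steady magnitude with a single onset in frame 3 (made-up values).
mags = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0]
sustained = sustained_estimate(mags)
weights = transient_weights(mags)
```

In this toy sequence only the onset frame receives a weight clearly below 1; the steady frames pass essentially unchanged.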
3. Ambient Sound Enhancement: Attenuation of Transient Signals (2)
[Figure: sound example, waveform of the input signal (black) overlaid by the output signal (red). Axes: time in s, amplitude.]
3. Ambient Sound Enhancement: Attenuation of Tonal Signals
Attenuate spectral components that exceed an estimate of the noise floor, i.e. a locally flat magnitude spectrum.
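One possible way to realize this for a single STFT frame is sketched below. The noise-floor estimator (a running median over neighboring bins) and the window size are assumptions chosen for illustration; the slides do not state how the noise floor is estimated. The function name `tonal_weights` is hypothetical.

```python
from statistics import median


def tonal_weights(mags, half_width=2, eps=1e-12):
    """Per-bin weights that pull spectral peaks down to a local noise-floor
    estimate. The floor is taken as the median magnitude of the neighboring
    bins (an assumed estimator)."""
    n = len(mags)
    weights = []
    for m in range(n):
        lo = max(0, m - half_width)
        hi = min(n, m + half_width + 1)
        floor = median(mags[lo:hi])
        weights.append(min(1.0, floor / max(mags[m], eps)))
    return weights


# Flat spectrum with one tonal peak in bin 4 (made-up values).
mags = [1.0, 1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0, 1.0]
weights = tonal_weights(mags)
flattened = [g * x for g, x in zip(weights, mags)]
```

After weighting, the peak sits at the level of its neighbors, which matches the idea of a locally flat magnitude spectrum.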
3. Ambient Sound Enhancement: Decorrelation of Background Sounds
Linear time-invariant processing in the time domain with a dense and short impulse response.
Decorrelation filter structure is a trade-off between sound quality and complexity (computational load, memory requirements and tuning effort).
Here: 3 nested all-pass filters in parallel per output channel.
The tuning of the parameters (delays and gains of the all-pass filters) is of crucial importance.
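The structure can be illustrated with a small plain-Python sketch of a nested Schroeder all-pass filter, where the delay line of the outer all-pass is followed by an inner all-pass. The class name, the averaging of the three parallel branches, and all delay and gain values are hypothetical placeholders; the slides stress that the actual tuning is crucial and do not give the values.

```python
class NestedAllpass:
    """Schroeder all-pass H = (A - g) / (1 - g * A), where A is a delay of
    `delay` samples, optionally followed by an inner all-pass."""

    def __init__(self, delay, gain, inner=None):
        self.buf = [0.0] * delay
        self.pos = 0
        self.gain = gain
        self.inner = inner

    def process(self, x):
        d = self.buf[self.pos]           # internal state delayed by `delay`
        if self.inner is not None:
            d = self.inner.process(d)    # nesting: all-pass inside the loop
        w = x + self.gain * d            # feedback path
        self.buf[self.pos] = w
        self.pos = (self.pos + 1) % len(self.buf)
        return d - self.gain * w         # feed-forward path


def make_branches(params):
    """params: (outer_delay, outer_gain, inner_delay, inner_gain) per branch."""
    return [NestedAllpass(do, go, NestedAllpass(di, gi))
            for do, go, di, gi in params]


def decorrelate(signal, branches):
    """Run the branches in parallel and average their outputs (assumed mix)."""
    return [sum(b.process(x) for b in branches) / len(branches) for x in signal]


# Placeholder tunings, different for each output channel.
left_params = [(7, 0.5, 3, 0.4), (11, 0.45, 5, 0.35), (13, 0.4, 2, 0.3)]
right_params = [(8, 0.5, 3, 0.4), (10, 0.45, 4, 0.35), (14, 0.4, 5, 0.3)]

impulse = [1.0] + [0.0] * 1999
left = decorrelate(impulse, make_branches(left_params))
right = decorrelate(impulse, make_branches(right_params))
single = decorrelate(impulse, [NestedAllpass(7, 0.5, NestedAllpass(3, 0.4))])
```

Each nested branch on its own has a flat magnitude response, so the impulse-response energy of `single` is 1; the parallel sum of three branches is in general no longer strictly all-pass, which is one reason the parameters have to be tuned by ear.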
3. Ambient Sound Enhancement: Decorrelator Gain Control
The perceived level of decorrelation (and reverberation) depends on both the processing (impulse response) and the input signal.
The effect intensity is lower for stationary input signals than for transient signals or frequency-modulated signals (e.g. speech).
The level of decorrelation is controlled using a model for the perceived intensity of decorrelation:
• Modified version of a model of reverberance (Uhle et al., 2011),
• Based on a model for partial loudness (Moore et al., 1997).
Partial loudness difference = partial loudness of the decorrelated signal (masked by the dry input) - partial loudness of the dry input (masked by the decorrelated signal).
4. Stereo Width Enhancement: Overview
Extending the width of the stereo image by enhancing inter-channel level differences of direct sound components:
1. Stereo Mid/Side Decomposition,
2. Boost the stereo side signal.
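[Figure: block diagram. x(t) → STFT → X(k, m) → Stereo M/S Decomposition → M(k, m) and S(k, m); the side signal S(k, m) is scaled by the weight w and combined with M(k, m) to give Y(k, m) → iSTFT → y(t).]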
4. Stereo Width Enhancement: Stereo Mid/Side Decomposition
Stereo side signal:
$$S_1 = G_1 X_1, \qquad S_2 = G_2 X_2$$
with spectral weights
$$G_i = \max\left(0, \frac{|X_i|^{\alpha} - \kappa\,|D|^{\alpha}}{|X_i|^{\alpha}}\right)^{\beta}$$
• D: Downmix of the input signal
• α, β, κ: Tuning parameters for controlling the attenuation
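The side-signal weights can be tried out per time-frequency bin with the plain-Python sketch below. Several details are assumptions made for illustration and are not stated on the slides: the downmix is taken as D = (X1 + X2) / 2, the mid signal of each channel as the remainder X_i - S_i, the output as mid plus w times side, and the default values of alpha, beta, kappa and w are arbitrary. The function names are hypothetical.

```python
def side_gain(x_i, d, alpha=2.0, beta=0.5, kappa=1.0, eps=1e-12):
    """G_i = max(0, (|X_i|^alpha - kappa * |D|^alpha) / |X_i|^alpha) ** beta."""
    x_pow = abs(x_i) ** alpha
    if x_pow < eps:
        return 0.0  # silent bin: nothing to extract
    return max(0.0, (x_pow - kappa * abs(d) ** alpha) / x_pow) ** beta


def widen_bin(x1, x2, w=2.0, **params):
    """Boost the side part of one stereo bin by the factor w."""
    d = 0.5 * (x1 + x2)                  # assumed downmix
    s1 = side_gain(x1, d, **params) * x1
    s2 = side_gain(x2, d, **params) * x2
    y1 = (x1 - s1) + w * s1              # assumed mid = input minus side
    y2 = (x2 - s2) + w * s2
    return y1, y2


center = widen_bin(1 + 0j, 1 + 0j)       # identical channels
panned = widen_bin(1 + 0j, 0.5 + 0j)     # source panned towards channel 1
```

Under these assumptions a center-panned bin is left untouched, while for the panned bin the level ratio between the channels grows from 2 to about 3.3, i.e. the inter-channel level difference is enhanced.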
5. Evaluation: Listening Test
Listening test with multiple stimuli using loudspeakers.
Conditions:
• Coded signal without any postprocessing, as known and hidden “reference”,
• Stereo Width Enhancement (SWE),
• Ambient Sound Enhancement (ASE),
• SWE + ASE.
5 test signals of length between 8 s and 30 s each, loudness normalized (ITU-R BS.1770).
Codecs: mp3 at 64 kbps, AAC at 48 kbps.
12 listeners.
5. Evaluation: Listening Test (2)
Questions:
1. “How well the spatial image has been improved?”
2. “Sound quality?”
6. Conclusion
In perceptual audio coding, audible artifacts can be introduced when the bitrate is too low.
We have proposed a suite of algorithms each designed for mitigating common types of artifact.
Listening test:
• Both methods achieved a significant improvement,
• The combination of both methods is rated higher than the methods in isolation (“slightly better”).
These tools can be used to implement a Low Bitrate Coding Enhancement system.
Future work: Assessment of the performance obtained with a combination of all proposed enhancement tools (presented in Part 1 and Part 2).
Thank you for your attention!
Sonamic Enhancement Sound Demo
In Regency Ballroom
Listen also to
Symphoria 3D
Sonamic Loudness