7/27/2019 Kolesnik Audio Compression
Audio Compression Techniques
MUMT 611, January 2005
Assignment 2
Paul Kolesnik
-
Introduction
Digital Audio Compression: removal of redundant or otherwise irrelevant information from an audio signal
Audio compression algorithms are often referred to as audio encoders
Applications:
Reduces required storage space
Reduces required transmission bandwidth
-
Audio Compression
Audio signal overview:
Sampling rate (number of samples per second)
Bit rate (number of bits per second). Typically, an uncompressed stereo 16-bit 44.1 kHz signal has a bit rate of about 1.4 Mbps (1,411,200 bits per second)
Number of channels (mono / stereo / multichannel)
Reduction by lowering those values or by data compression / encoding
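The bit rate of uncompressed audio is the product of the three values listed above. A minimal sketch of that arithmetic follows; the function name `pcm_bit_rate` is hypothetical and chosen only for illustration.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)              # 1411200 bits per second
print(cd_rate / 1_000_000)  # 1.4112 megabits per second
```

Halving any one of the three inputs halves the bit rate, which is the "reduction by lowering those values" option; data compression aims to reduce the rate without touching them.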
-
Audio Data Compression
Redundant information
Implicit in the remaining information
Ex. oversampled audio signal
Irrelevant information
Perceptually insignificant
Cannot be recovered from the remaining information
-
Audio Data Compression
Lossless Audio Compression
Removes redundant data
Resulting signal is the same as the original (perfect reconstruction)
Lossy Audio Encoding
Removes irrelevant data
Resulting signal is similar to the original
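The difference between the two families can be sketched in a few lines. This is only an illustration under stated assumptions: `zlib` stands in for a generic lossless coder (it is not an audio codec), the lossy step is a crude drop of the 8 least significant bits rather than a perceptual method, and the test signal is made up.

```python
import math
import struct
import zlib

# Hypothetical test signal: 1000 samples of a 16-bit, 440 Hz sine at 44.1 kHz.
samples = [round(20000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(1000)]
raw = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: decoding returns exactly the original bytes.
packed = zlib.compress(raw)
lossless_ok = zlib.decompress(packed) == raw

# Lossy: discard the 8 least significant bits, then scale back up.
lossy = [(s >> 8) << 8 for s in samples]
max_error = max(abs(a - b) for a, b in zip(samples, lossy))

print(lossless_ok)  # True
print(max_error)    # small but non-zero: the original cannot be recovered
```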
-
Audio Data Compression
Audio vs. Speech Compression Techniques
Speech compression uses a human vocal tract model to compress signals
Audio compression does not use this technique because of the larger variety of possible signal variations
-
Generic Audio Encoder
[Figure: generic audio encoder (image not available)]
-
Generic Audio Encoder
Psychoacoustic Model
Psychoacoustics: the study of how sounds are perceived by humans
Uses perceptual coding: eliminates information from the audio signal that is inaudible to the ear
Detects conditions under which different audio signal components mask each other
-
Psychoacoustic Model
Signal Masking
Threshold cut-off
Spectral (Frequency / Simultaneous) Masking
Temporal Masking
Threshold cut-off and spectral masking occur in the frequency domain; temporal masking occurs in the time domain
-
Signal Masking
Threshold cut-off
Hearing threshold level is a function of frequency
Any frequency components below the threshold will not be perceived by the human ear
[Figure: threshold cut-off (image not available)]
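A threshold cut-off can be sketched by comparing each frequency component against a threshold-in-quiet curve. The slides do not give a formula; the sketch below assumes Terhardt's widely used approximation of the absolute hearing threshold, and the component list is invented for illustration.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate absolute hearing threshold in dB SPL (Terhardt's formula, an assumption here)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def audible(components):
    """Keep only (freq_hz, level_db) pairs at or above the threshold."""
    return [(f, db) for f, db in components if db >= threshold_in_quiet_db(f)]


# Hypothetical components: (frequency in Hz, level in dB SPL).
components = [(100.0, 10.0), (1000.0, 10.0), (3300.0, 0.0), (16000.0, 20.0)]
print(audible(components))  # [(1000.0, 10.0), (3300.0, 0.0)]
```

The curve is lowest around 3 to 4 kHz, so quiet components survive there while equally quiet components at very low or very high frequencies are dropped.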
-
Signal Masking
Spectral Masking
A frequency component can be partly or fully masked by another component that is close to it in frequency
This shifts the hearing threshold
[Figure: spectral masking (image not available)]
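The threshold shift caused by a single masker can be sketched with a triangular spreading function on the Bark (critical-band) scale. Everything numeric here is an assumption made for illustration: Zwicker's Bark approximation, the 10 dB offset, and the slopes of 27 and 10 dB per Bark are not taken from the slides and do not describe any particular codec.

```python
import math


def hz_to_bark(freq_hz: float) -> float:
    """Zwicker's approximation of the Bark (critical-band) scale."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def masked_threshold_db(freq_hz, masker_hz, masker_db,
                        offset_db=10.0, lower_slope=27.0, upper_slope=10.0):
    """Threshold raised by one masker; offset and slopes (dB per Bark) are illustrative."""
    dz = hz_to_bark(freq_hz) - hz_to_bark(masker_hz)
    slope = upper_slope if dz >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(dz)


def is_masked(freq_hz, level_db, masker_hz, masker_db):
    """True when the component falls below the shifted threshold."""
    return level_db < masked_threshold_db(freq_hz, masker_hz, masker_db)


print(is_masked(1100.0, 40.0, 1000.0, 80.0))  # True: close to the masker
print(is_masked(5000.0, 40.0, 1000.0, 80.0))  # False: far from the masker
```

The shallower upper slope reflects the common observation that a masker affects components above it in frequency more than those below it.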
-
Signal Masking
Temporal Masking
A quieter sound can be masked by a louder sound if they are temporally close
Sounds that occur both (shortly) before and after a volume increase can be masked
[Figure: temporal masking (image not available)]
-
Spectral Analysis
Tasks of Spectral Analysis
To derive masking thresholds to determine which signal components can be eliminated
To generate a representation of the signal to which masking thresholds can be applied
Spectral analysis is done through transforms or filter banks
-
Spectral Analysis
Transforms
Fast Fourier Transform (FFT)
Discrete Cosine Transform (DCT): similar to FFT but uses cosine values only
Modified Discrete Cosine Transform (MDCT): overlapped and windowed version of DCT [used by MPEG-1 Layer III, MPEG-2 AAC, Dolby AC-3]
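The "overlapped and windowed" property of the MDCT can be sketched directly. This is a slow O(N^2) illustration, not how any codec implements it; the sine window, the block size of 16 samples, and all function names are assumptions. Each block of 2N samples yields only N coefficients, yet overlap-adding consecutive blocks restores the signal.

```python
import math


def sine_window(two_n):
    """Sine window; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def mdct(block):
    """Map 2N windowed samples to N coefficients (direct form)."""
    n_half = len(block) // 2
    return [
        sum(x * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples."""
    n_half = len(coeffs)
    return [
        (2.0 / n_half) * sum(c * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                             for k, c in enumerate(coeffs))
        for n in range(2 * n_half)
    ]


def analyze_synthesize(signal, n_half=8):
    """Window, transform, invert, and overlap-add with hop N.

    len(signal) must be a multiple of n_half.
    """
    window = sine_window(2 * n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - n_half, n_half):
        block = [padded[start + n] * window[n] for n in range(2 * n_half)]
        rebuilt = imdct(mdct(block))
        for n in range(2 * n_half):
            output[start + n] += rebuilt[n] * window[n]
    return output[n_half:-n_half]


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
restored = analyze_synthesize(signal)
print(max(abs(a - b) for a, b in zip(signal, restored)))  # rounding error only
```

A single inverse-transformed block contains time-domain aliasing; it cancels only when neighbouring blocks are added together, which is why the overlap is essential.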
-
Spectral Analysis
Filter Banks
Time sample blocks are passed through a set of bandpass filters
Masking thresholds are applied to the resulting frequency subband signals
Polyphase and wavelet banks are the most popular filter structures
-
Filter Bank Structures
Polyphase Filter Bank [used in all of the MPEG-1 encoders]
Signal is separated into subbands, the widths of which are equal over the entire frequency range
The resulting subband signals are downsampled to create shorter signals (which are later reconstructed during the decoding process)
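The equal-width layout and the effect of downsampling can be worked out with simple arithmetic. The sketch below only computes band edges; it does not implement the filters themselves. The count of 32 subbands is the figure used by MPEG-1 audio, and the 44.1 kHz sampling rate and function name are assumptions for the example.

```python
def uniform_subbands(sample_rate_hz: float, num_bands: int):
    """Return (low_hz, high_hz) edges of equal-width subbands up to the Nyquist frequency."""
    width = (sample_rate_hz / 2) / num_bands
    return [(k * width, (k + 1) * width) for k in range(num_bands)]


bands = uniform_subbands(44_100, 32)
print(bands[0])      # (0.0, 689.0625)
print(44_100 / 32)   # 1378.125 samples per second in each downsampled subband
```

Because each of the 32 subband signals is downsampled by a factor of 32, the total number of samples stays the same as in the input.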
-
Filter Bank Structures
Wavelet Filter Bank [used by Enhanced Perceptual Audio Coder (EPAC) by Lucent]
Unlike the polyphase filter bank, the subbands are not of equal width (narrower for lower frequencies, wider for higher frequencies)
This allows for better time resolution (ex. short attacks), but at the expense of frequency resolution
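Unequal subband widths arise naturally when only the low band is split again at each stage. The sketch below uses the Haar wavelet purely because it is the simplest case; it is an assumption for illustration and not the filter EPAC uses.

```python
import math


def haar_split(signal):
    """One Haar analysis step: low-pass and high-pass halves, each downsampled by 2."""
    s = 1 / math.sqrt(2)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def haar_tree(signal, levels):
    """Repeatedly split the low band; returns [high_1, high_2, ..., low_final]."""
    bands = []
    low = list(signal)
    for _ in range(levels):
        low, high = haar_split(low)
        bands.append(high)
    bands.append(low)
    return bands


bands = haar_tree([float(n % 5) for n in range(16)], levels=3)
print([len(b) for b in bands])  # [8, 4, 2, 2]
```

The first band covers the upper half of the spectrum and keeps 8 of the 16 samples, so it tracks changes in time most finely; each later band covers half the width of the one before it and keeps half as many samples.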
-
Noise Allocation
System task: derive and apply the shifted hearing threshold to the input signal
Anything below the threshold doesn't need to be transmitted
Any noise below the threshold is irrelevant
Frequency component quantization: tradeoff between space and noise
Encoder saves on space by using just enough bits for each frequency component to keep noise under the threshold; this is known as noise allocation
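Choosing "just enough bits" can be sketched with the rule of thumb that each bit of a uniform quantizer lowers the quantization noise by roughly 6 dB. That rule, the constant name, and the example levels are assumptions for illustration; real encoders use more elaborate allocation loops.

```python
import math

DB_PER_BIT = 6.02  # rule of thumb for a uniform quantizer (assumption)


def bits_needed(component_db: float, threshold_db: float) -> int:
    """Fewest bits that keep quantization noise at or below the masking threshold."""
    if component_db <= threshold_db:
        return 0  # inaudible component: nothing to transmit
    return math.ceil((component_db - threshold_db) / DB_PER_BIT)


# Hypothetical (component level, shifted threshold) pairs in dB.
components = [(70.0, 40.0), (55.0, 50.0), (30.0, 45.0)]
print([bits_needed(c, t) for c, t in components])  # [5, 1, 0]
```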
-
Noise Allocation
Pre-echo
In case a single audio block contains silence followed by a loud attack, a pre-echo error occurs: there will be audible noise in the silent part of the block after decoding
This is avoided by pre-monitoring audio data at the encoding stage and separating audio into shorter blocks in a potential pre-echo case
This does not completely eliminate pre-echo, but can make it short enough to be masked by the attack (temporal masking)
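The pre-monitoring step can be sketched as a simple energy comparison between neighbouring parts of a block. The number of parts, the energy ratio of 10, and the function name are hypothetical; actual encoders use their own transient detectors.

```python
def needs_short_blocks(block, num_parts=8, ratio_threshold=10.0, floor=1e-9):
    """Flag a block whose energy jumps sharply from one sub-block to the next."""
    part_len = len(block) // num_parts
    energies = [
        sum(x * x for x in block[i * part_len:(i + 1) * part_len])
        for i in range(num_parts)
    ]
    return any(
        later > ratio_threshold * max(earlier, floor)
        for earlier, later in zip(energies, energies[1:])
    )


silence_then_attack = [0.0] * 512 + [1.0] * 512
steady = [0.5] * 1024
print(needs_short_blocks(silence_then_attack))  # True: switch to shorter blocks
print(needs_short_blocks(steady))               # False: keep the long block
```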
-
Pre-echo Effect
[Figure: pre-echo effect (image not available)]
-
Additional Encoding Techniques
Other encoding techniques are available (alternative or in combination)
Predictive Coding
Coupling / Delta Encoding
Huffman Encoding
-
Additional Encoding Techniques
Predictive Coding
Often used in speech and image compression
Estimates the expected value for each sample based on previous sample values
Transmits/stores the difference between the expected and received value
The decoder generates an estimate for each sample and then adjusts it by the difference stored for that sample
Used for additional compression in MPEG-2 AAC
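The encode and decode steps can be sketched with the simplest possible predictor, one that expects each sample to equal the previous one. This first-order predictor is an assumption for illustration and is not the predictor used in MPEG-2 AAC.

```python
def predict(history):
    """Hypothetical first-order predictor: the next sample equals the previous one."""
    return history[-1] if history else 0


def encode(samples):
    """Store the difference between each sample and its predicted value."""
    residuals, history = [], []
    for s in samples:
        residuals.append(s - predict(history))
        history.append(s)
    return residuals


def decode(residuals):
    """Rebuild each sample as prediction plus stored difference."""
    samples = []
    for r in residuals:
        samples.append(predict(samples) + r)
    return samples


samples = [100, 102, 105, 107, 106, 104]
print(encode(samples))                     # [100, 2, 3, 2, -1, -2]
print(decode(encode(samples)) == samples)  # True
```

After the first value, the residuals are much smaller than the samples themselves and so need fewer bits.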
-
Additional Encoding Techniques
Coupling / Delta encoding
Used in cases where the audio signal consists of two or more channels (stereo or surround sound)
Similarities between channels are used for compression
A sum and difference between two channels are derived; the difference is usually some value close to zero and therefore requires less space to encode
This is a case of a lossless encoding process
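A minimal sketch of sum and difference coding for integer stereo samples follows. The function names and sample values are made up for illustration.

```python
def couple(left, right):
    """Replace left/right with their sum and difference."""
    sums = [l + r for l, r in zip(left, right)]
    diffs = [l - r for l, r in zip(left, right)]
    return sums, diffs


def decouple(sums, diffs):
    """Recover left/right exactly from sum and difference."""
    # l + r and l - r always share parity, so the integer divisions are exact.
    left = [(s + d) // 2 for s, d in zip(sums, diffs)]
    right = [(s - d) // 2 for s, d in zip(sums, diffs)]
    return left, right


left = [1000, 1010, 1023, -500]
right = [998, 1011, 1020, -503]
sums, diffs = couple(left, right)
print(diffs)                                    # [2, -1, 3, 3]
print(decouple(sums, diffs) == (left, right))   # True
```

When the channels are similar, the difference values stay close to zero, and the exact round trip shows why the step is lossless.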
-
Additional Encoding Techniques
Huffman Coding
Information-theory-based technique
An element of a signal that often reoccurs in the signal is represented by a simpler (shorter) symbol, and its value is stored in a look-up table
Implemented using look-up tables in the encoder and in the decoder
Provides substantial lossless compression, but requires high computational power and therefore is not very popular
Used by MPEG-1 Layer III and MPEG-2 AAC
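The look-up table idea can be sketched with a small Huffman coder. In this sketch the table is built from the data itself, which is an assumption made for illustration; the function names are hypothetical, and the sketch does not show how any particular codec obtains or stores its tables.

```python
import heapq
from collections import Counter


def huffman_table(symbols):
    """Build a symbol -> bit-string look-up table from symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, table):
    """Concatenate the code of every symbol."""
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    """Read bits until they match a code, emit the symbol, repeat."""
    reverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return out


data = [0, 0, 0, 0, 0, 0, 1, 1, 1, -1, -1, 2]
table = huffman_table(data)
bits = encode(data, table)
print(len(bits))                    # 21 bits, versus 24 with a fixed 2-bit code
print(decode(bits, table) == data)  # True
```

The most frequent value gets a one-bit code, while the rarest values get three-bit codes.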
-
Encoding - Final Stages
Audio data packed into frames
Frames stored or transmitted
-
Conclusion
HTML Bibliography
http://www.music.mcgill.ca/~pkoles
Questions