1
Audio/Video Compression
Lecture 3: Multimedia Networks
Lecture 4: Audio/Video Compression
Image & Video Compression Standards
Speech & Audio Compression Standards
Wavelet Transform & its Application in Compression
2
Introduction to Audio/Video Compression
With today’s technology, only compression makes storage/transmission of digital audio/video streams possible
Compression exploits redundancy in the signal, guided by the features of human perception
3
Introduction to Audio/Video Compression
Spatial redundancy: Values of neighboring pixels are strongly correlated in natural images
Temporal redundancy: Adjacent frames in a video sequence often show very little change; in audio, a strong signal in a given time segment can mask certain lower-level distortion in future & past segments
4
Introduction to Audio/Video Compression
Spectral redundancy: In multispectral images, spectral values of the same pixel across spectral bands are correlated; in audio, a signal can completely mask a sufficiently weaker signal in its frequency vicinity
Redundancy across scale: Distinct image features invariant under scaling
Redundancy in stereo: Correlations between stereo images/audio channels
5
Introduction to Audio/Video Compression
Spatial/spectral redundancies: Transform coding
Temporal redundancy: DPCM (differential pulse code modulation), motion estimation/motion compensation
First compression methods: lossless
Huffman coding
Ziv-Lempel coding
Arithmetic coding
Lossless methods alone are inadequate for transmission media of low bandwidth (e.g., ISDN) or for devices of low data throughput (e.g., CD-ROM)
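A minimal sketch of lossless DPCM, assuming the simplest possible predictor (the previous sample). The function names and the sample row are illustrative only; practical coders use better predictors and quantize the residual.

```python
import numpy as np

def dpcm_encode(samples):
    """Previous-sample predictor: keep the first sample, then send differences."""
    samples = np.asarray(samples, dtype=np.int64)
    residual = np.empty_like(samples)
    residual[0] = samples[0]
    residual[1:] = samples[1:] - samples[:-1]
    return residual

def dpcm_decode(residual):
    """Undo the differencing with a running sum."""
    return np.cumsum(residual)

# Hypothetical row of strongly correlated pixel values
row = np.array([100, 102, 103, 103, 105, 104, 104, 106])
residual = dpcm_encode(row)
restored = dpcm_decode(residual)
```

After the first sample the residual values are small, so an entropy coder can represent them with fewer bits than the raw samples.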
6
Introduction to Audio/Video Compression
Lossless vs. lossy compression
Intraframe vs. interframe compression
Symmetrical vs. asymmetrical compression
Real-time: Encoding-decoding delay <= 50 ms
Scalable: Frames coded at different resolutions or quality levels
Recent advanced compression methods reduce bandwidths enormously without reduction of perceived quality
7
Introduction to Audio/Video Compression
Entropy coding: Arithmetic coding, Huffman coding, Run-length coding
Source coding: DPCM, DCT, DWT, motion-estimation/motion compensation
Hybrid Coding: H.261, H.263, H.263+, JPEG, MPEG1, MPEG2, MPEG4, Perceptual Audio Coder
[Figure: Uncompressed data -> Preprocessing -> Source coding -> Entropy coding -> Compressed data]
Hybrid coding = source coding + entropy coding
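Run-length coding, the simplest of the coders named on this slide, can be sketched as follows. This is an illustrative toy with hypothetical function names; the standards listed above combine run lengths with Huffman or arithmetic codes rather than storing raw pairs.

```python
def run_length_encode(symbols):
    """Collapse each run of equal symbols into a (symbol, count) pair."""
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return [(s, n) for s, n in runs]

def run_length_decode(runs):
    """Expand (symbol, count) pairs back into the original sequence."""
    out = []
    for s, n in runs:
        out.extend([s] * n)
    return out

# Hypothetical source-coder output: mostly zeros, as after DPCM or a quantized transform
coefficients = [0, 0, 0, 5, 5, 0, 0, 0, 0, -1]
runs = run_length_encode(coefficients)
```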
8
Wavelet Theory
A unified framework for analysis of non-stationary signals
Wavelet transform (WT): Alternative to the classical Short-Time Fourier Transform (STFT) or Gabor Transform
In contrast to the STFT, the WT performs “constant-Q” or relative-bandwidth frequency analysis: short windows at high frequencies and long windows at low frequencies
9
Short-Time Fourier Transform
Fourier Transform (FT):
$$X(f) = \int_{-\infty}^{\infty} x(t)\,\exp(-j2\pi ft)\,dt = \langle x(t), \exp(j2\pi ft) \rangle$$
X(f): Projection of signal x(t) along $\exp(j2\pi ft)$
Shows how the signal energy is distributed over frequencies
10
Short-Time Fourier Transform
To know the local energy distribution, the STFT is introduced:
$$STFT(\tau, f) = \int x(t)\, g^*(t-\tau)\, \exp(-j2\pi ft)\,dt$$
g(t): A window of finite support
Shows how the signal energy is distributed over frequencies around local time $\tau$
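A discrete sketch of the STFT above, assuming a real window, a fixed hop between window positions, and NumPy's FFT in place of the integral. The test signal, sampling rate and window length are arbitrary choices for illustration.

```python
import numpy as np

def stft(x, window, hop):
    """Slide a real window along x and take the FFT of each windowed frame.

    Rows index the window position (tau), columns index the frequency bin (f).
    """
    n = len(window)
    starts = range(0, len(x) - n + 1, hop)
    frames = np.array([x[s:s + n] * window for s in starts])
    return np.fft.rfft(frames, axis=1)

fs = 1000                                   # sampling rate in Hz (assumed)
t = np.arange(fs) / fs
# Non-stationary test signal: 50 Hz in the first half, 200 Hz in the second half
x = np.where(t < 0.5, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 200 * t))

S = stft(x, np.hanning(100), hop=50)        # 100-sample window -> 10 Hz per bin
early_bin = np.argmax(np.abs(S[0]))         # strongest bin in the first frame
late_bin = np.argmax(np.abs(S[-1]))         # strongest bin in the last frame
```

The window length fixes both the time resolution (100 ms here) and the frequency resolution (10 Hz per bin) for every frame.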
11
Short-Time Fourier Transform
Given f, $STFT(\tau, f)$: Output of a bandpass filter having the window function (modulated to f) as its impulse response
Resolution in time/frequency by window g(t):
$$\Delta f^2 = \frac{\int f^2\,|G(f)|^2\,df}{\int |G(f)|^2\,df}, \qquad \Delta t^2 = \frac{\int t^2\,|g(t)|^2\,dt}{\int |g(t)|^2\,dt}$$
12
Short-Time Fourier Transform
Uncertainty Principle (Heisenberg):
$$\Delta t\,\Delta f \ge \frac{1}{4\pi}$$
Once window g(t) is chosen, the resolution in time/frequency is fixed
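As a numerical illustration, a Gaussian window is known to meet the bound with equality. The sketch below estimates $\Delta t$ and $\Delta f$ from the second-moment definitions by Riemann sums; the window width and the grids are arbitrary assumptions.

```python
import numpy as np

def rms_width(axis, density):
    """Square root of the normalized second moment of a density centred at zero."""
    return np.sqrt(np.sum(axis**2 * density) / np.sum(density))

sigma = 0.05                                   # assumed window width
t = np.linspace(-1.0, 1.0, 4001)
g = np.exp(-t**2 / (2 * sigma**2))             # Gaussian window g(t)
delta_t = rms_width(t, np.abs(g)**2)

# Riemann-sum approximation of G(f) = integral g(t) exp(-j 2 pi f t) dt
f = np.linspace(-40.0, 40.0, 401)
dt = t[1] - t[0]
G = (g[None, :] * np.exp(-2j * np.pi * f[:, None] * t[None, :])).sum(axis=1) * dt
delta_f = rms_width(f, np.abs(G)**2)

product = delta_t * delta_f                    # compare with 1 / (4 pi)
```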
13
Continuous Wavelet Transform (CWT)
If $\Delta f / f$ can be kept constant, resolution in frequency becomes arbitrarily good at low frequencies while resolution in time becomes arbitrarily good at high frequencies
CWT follows the above idea, but all impulse responses of the filter bank are defined as scaled versions of the same prototype, or basic wavelet, h(t)
14
Continuous Wavelet Transform (CWT)
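Let
h(t): Any bandpass function
$a > 0$: scale (somehow related to frequency)
$$h_a(t) = \frac{1}{\sqrt{a}}\, h\!\left(\frac{t}{a}\right)$$
$$CWT(\tau, a) = \int x(t)\, h_a^*(t-\tau)\,dt = \frac{1}{\sqrt{a}} \int x(t)\, h^*\!\left(\frac{t-\tau}{a}\right) dt$$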
15
Continuous Wavelet Transform (CWT)
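FT of $h_a(t)$:
$$H_a(f) = \int h_a(t)\exp(-j2\pi ft)\,dt = \frac{1}{\sqrt{a}}\int h\!\left(\frac{t}{a}\right)\exp(-j2\pi ft)\,dt$$
$$= \sqrt{a}\int h(\theta)\exp(-j2\pi af\theta)\,d\theta = \sqrt{a}\,H(af)$$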
16
Continuous Wavelet Transform (CWT)
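Resolution in frequency of $h_a(t)$:
$$\Delta f_a^2 = \frac{\int f^2\,|H_a(f)|^2\,df}{\int |H_a(f)|^2\,df} = \frac{\int f^2\,|H(af)|^2\,df}{\int |H(af)|^2\,df} = \frac{1}{a^2}\,\frac{\int f^2\,|H(f)|^2\,df}{\int |H(f)|^2\,df} = \frac{c^2}{a^2}$$
where $c = \Delta f$ is the frequency resolution of the prototype h(t)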
17
Continuous Wavelet Transform (CWT)
Given a fixed frequency $f_0$, if scale a is chosen as
$$a = \frac{f_0}{f}$$
Then
$$\frac{\Delta f_a}{f} = \frac{c}{af} = \frac{c}{f_0} = \text{const}$$
18
Continuous Wavelet Transform (CWT)
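By definition of CWT:
$$CWT(\tau, a) = \int x(t)\,h_a^*(t-\tau)\,dt = \frac{1}{\sqrt{a}}\int x(t)\,h^*\!\left(\frac{t-\tau}{a}\right)dt = \sqrt{a}\int x(at)\,h^*\!\left(t-\frac{\tau}{a}\right)dt$$
Scale a is not linked to frequency modulation but related to time-scaling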
19
Continuous Wavelet Transform (CWT)
Signal x(at) seen through a constant-length filter centered at $\tau/a$
The larger the scale a, the more contracted the signal x(t) becomes
The smaller the scale a, the more dilated the signal x(t) becomes
Larger scales: $CWT(\tau, a)$ provides a more global view of signal x(t)
Smaller scales: $CWT(\tau, a)$ provides a more detailed view of signal x(t)
20
Continuous Wavelet Transform (CWT)
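Define wavelet $h_{a,\tau}$:
$$h_{a,\tau}(t) = \frac{1}{\sqrt{a}}\,h\!\left(\frac{t-\tau}{a}\right)$$
$$CWT(\tau, a) = \langle x, h_{a,\tau} \rangle$$
$CWT(\tau, a)$: Inner product or correlation between x(t) and $h_{a,\tau}$
$CWT(\tau, a)$ is called the analysis stage (of signal x(t)) at scale a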
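A minimal numerical sketch of this inner-product view, assuming a real Mexican-hat prototype as h(t) and a Riemann sum in place of the integral. The prototype, the test signal and the scales are illustrative assumptions, not choices made in the lecture.

```python
import numpy as np

def mexican_hat(t):
    """A real, zero-mean bandpass prototype h(t)."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def cwt(x, t, scales, h=mexican_hat):
    """Riemann-sum approximation of CWT(tau, a) = <x, h_{a,tau}> on the grid tau = t."""
    dt = t[1] - t[0]
    rows = []
    for a in scales:
        # kernel[k, i] = h((t_i - tau_k) / a) / sqrt(a); h is real, so no conjugate
        kernel = h((t[None, :] - t[:, None]) / a) / np.sqrt(a)
        rows.append((kernel @ x) * dt)
    return np.array(rows)

t = np.linspace(0.0, 1.0, 501)
x = np.exp(-((t - 0.3) ** 2) / (2 * 0.01 ** 2))   # narrow bump centred at t = 0.3
scales = [0.02, 0.05, 0.1]
C = cwt(x, t, scales)
peak_times = t[np.argmax(C, axis=1)]               # where each scale responds most
```

Each row of `C` is the correlation of x(t) with the same prototype stretched to one scale, so the bump is located at the same time position at every scale, with a broader response at larger scales.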
21
Continuous Wavelet Transform (CWT)
x(t) can be recovered from multi-scale analysis if h(t) satisfies
$$\int h(t)\,dt = 0$$
and
$$x(t) = c \int_{a>0}\!\int CWT(\tau, a)\,h_{a,\tau}(t)\,\frac{da\,d\tau}{a^2}$$
where c is a constant that depends only on h(t)
22
Continuous Wavelet Transform (CWT)
Energy conservation:
$$\|x\|^2 = \int_{a>0}\!\int |CWT(\tau, a)|^2\,\frac{da\,d\tau}{a^2}$$
Signal energy distributed at scale a by:
$$\int |CWT(\tau, a)|^2\,\frac{d\tau}{a^2}$$
$|CWT(\tau, a)|^2$: wavelet spectrogram, or scalogram, distribution of signal energy in the time-scale plane (associated with area measure $da\,d\tau / a^2$)
23
Continuous Wavelet Transform (CWT)
Larger scales -> more global view -> coarser resolutions
Smaller scales -> more detailed view -> finer resolutions
CWT -> decomposition of signal over scales -> signal energy distribution with various resolutions
24
Discrete Wavelet Transform (DWT)
Two methods developed independently in the late 70’s and early 80’s:
Subband Coding
Pyramid Coding or multiresolution signal analysis
25
Multiresolution Pyramid
Given an original sequence x(n), $n \in Z$, define a lower resolution signal:
$$y(n) = (g * x)(2n) = \sum_k g(k)\,x(2n-k)$$
where g(n): a halfband lowpass filter
26
Multiresolution Pyramid
An approximation of x(n) from y(n):
$$a(n) = (g' * y')(n) = \sum_k g'(k)\,y'(n-k)$$
where $y'(2n) = y(n)$, $y'(2n+1) = 0$
g'(n): an interpolative filter
27
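Multiresolution Pyramid
If g(n) and g'(n) are perfect halfband filters, i.e.,
$$G(e^{j\omega}) = G'(e^{j\omega}) = \begin{cases} 1, & |\omega| < \pi/2 \\ 0, & -\pi \le \omega \le -\pi/2 \ \text{or}\ \pi/2 \le \omega \le \pi \end{cases}$$
then a(n) provides a perfect halfband lowpass approximation to x(n)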
28
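Multiresolution Pyramid
It can be proved:
$$Y(z) = \tfrac{1}{2}\left[G(z^{1/2})X(z^{1/2}) + G(-z^{1/2})X(-z^{1/2})\right]$$
$$A(z) = G'(z)\,Y(z^2)$$
$$A(e^{j\omega}) = \begin{cases} X(e^{j\omega}), & |\omega| < \pi/2 \\ 0, & \omega \le -\pi/2 \ \text{or}\ \omega \ge \pi/2 \end{cases}$$
(Zero insertion halves the passband amplitude, so the last line assumes g'(n) supplies a passband gain of 2; with a strictly unit-gain g'(n) the passband result is $X(e^{j\omega})/2$.)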
29
Multiresolution Pyramid
Let d(n) = x(n) - a(n)
Then x(n) = a(n) + d(n)
But there is redundancy between a(n) and d(n): If x(n) uses sampling rate $f_s$, d(n) and y(n) use sampling rates $f_s$ and $f_s/2$, respectively
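One pyramid stage can be sketched as below. The 3-tap filter is only a stand-in for a halfband lowpass filter, and the interpolation filter is taken as twice that filter; both are assumptions for illustration. Reconstruction is exact for any filter pair because d(n) stores whatever a(n) misses.

```python
import numpy as np

g = np.array([0.25, 0.5, 0.25])        # simple lowpass, standing in for a halfband filter
g_interp = 2.0 * g                     # interpolation filter g'(n); gain 2 offsets zero insertion

def pyramid_approx(y, length):
    """a(n) = (g' * y')(n), with y'(2n) = y(n) and y'(2n+1) = 0."""
    y_up = np.zeros(length)
    y_up[::2] = y
    return np.convolve(y_up, g_interp, mode="same")

def pyramid_analysis(x):
    """Return the half-rate signal y(n) and the full-rate difference d(n)."""
    y = np.convolve(x, g, mode="same")[::2]      # y(n) = (g * x)(2n)
    d = x - pyramid_approx(y, len(x))            # d(n) = x(n) - a(n)
    return y, d

def pyramid_synthesis(y, d):
    """x(n) = a(n) + d(n)."""
    return pyramid_approx(y, len(d)) + d

x = np.array([3.0, 5.0, 4.0, 8.0, 9.0, 7.0, 6.0, 2.0])
y, d = pyramid_analysis(x)
x_rec = pyramid_synthesis(y, d)
stored_samples = len(y) + len(d)       # 4 + 8 samples for an 8-sample input
```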
30
Multiresolution Pyramid
Pyramid decomposition: a redundant representation
But redundancy is upper bounded by 1 + 1/2 + 1/4 + … < 2 in a one-dimensional system
[Figure: x(n) -> y(n) -> $y_1(n)$, with difference signals d(n) and $d_1(n)$]
31
Multiresolution Pyramid
For perfect halfband lowpass filters g(n) and g'(n), it is clear that d(n) contains the frequencies of x(n) above $\pi/2$, and thus can also be subsampled by two without loss of information.
In a pyramid, it is possible to take very good lowpass filters and derive visually pleasing coarse versions
In a subband scheme, critical sampling is accomplished at the price of a constrained filter design and a relatively poor lowpass version as a coarse approximation: undesirable if the coarse version is used for viewing in a compatible subchannel
32
Subband Coding
One stage of a pyramid decomposition -> a half rate low resolution signal + a full rate difference signal
# (samples) increased by 50%
If filters g(n) and g'(n) meet certain conditions, oversampling can be avoided
Subband coding, first popularized in speech compression, does not produce such redundancy
33
Subband Coding
A full-band one dimensional signal is decomposed into two subbands using an analysis filter bank
Ideally, the analysis filter bank consists of a lowpass filter and a highpass filter with nonoverlapping frequency responses and unit gain over their respective bandwidth
After filtering, lowpass and highpass signals each have only a half of original bandwidth or “frequency content”, and thus can be downsampled in half
But ideal filters are unrealizable
34
Subband Coding
By using overlapping responses, frequency gaps in subband signals can be prevented
Aliasing will be introduced when lowpass and highpass signals are downsampled in half
The aliasing effect can be eliminated to produce perfect reconstruction at synthesis stage
Lowpass and highpass signals will each have a bandwidth more than a half of original bandwidth
Quadrature Mirror Filters (QMF) for analysis/synthesis filtering
35
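Subband Coding
Output signals from analysis bank after downsampling:
$$y_1(n) = (h_1 * x)(2n), \qquad y_2(n) = (h_2 * x)(2n)$$
$$Y_1(z) = \tfrac{1}{2}\left[H_1(z^{1/2})X(z^{1/2}) + H_1(-z^{1/2})X(-z^{1/2})\right]$$
$$Y_2(z) = \tfrac{1}{2}\left[H_2(z^{1/2})X(z^{1/2}) + H_2(-z^{1/2})X(-z^{1/2})\right]$$
After quantization, $y_1(n)$ and $y_2(n)$ become $\hat{y}_1(n)$ and $\hat{y}_2(n)$
After upsampling, $\hat{y}_1(n)$ and $\hat{y}_2(n)$ become:
$$\tilde{y}_1(2n) = \hat{y}_1(n), \qquad \tilde{y}_1(2n+1) = 0$$
$$\tilde{y}_2(2n) = \hat{y}_2(n), \qquad \tilde{y}_2(2n+1) = 0$$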
36
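Subband Coding
Output signals from synthesis bank:
$$(\tilde{y}_1 * g_1)(n), \qquad (\tilde{y}_2 * g_2)(n)$$
Reconstructed signal:
$$\hat{x}(n) = (\tilde{y}_1 * g_1)(n) + (\tilde{y}_2 * g_2)(n)$$
$$\hat{X}(z) = \tilde{Y}_1(z)G_1(z) + \tilde{Y}_2(z)G_2(z) = \hat{Y}_1(z^2)G_1(z) + \hat{Y}_2(z^2)G_2(z)$$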
37
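Subband Coding
Ignoring quantization or coding effect,
$$\hat{Y}_1(z) = Y_1(z), \qquad \hat{Y}_2(z) = Y_2(z)$$
$$\hat{X}(z) = Y_1(z^2)G_1(z) + Y_2(z^2)G_2(z)$$
$$= \tfrac{1}{2}\left[H_1(z)G_1(z) + H_2(z)G_2(z)\right]X(z) + \tfrac{1}{2}\left[H_1(-z)G_1(z) + H_2(-z)G_2(z)\right]X(-z)$$
If $H_1(z)$, $G_1(z)$ are ideal lowpass filters and $H_2(z)$, $G_2(z)$ are ideal highpass filters,
$$H_1(e^{j\omega})G_1(e^{j\omega}) = 1 \quad (|\omega| < \pi/2)$$
$$H_1(e^{j\omega})G_1(e^{j\omega}) = 0 \quad (|\omega| > \pi/2)$$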
38
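Subband Coding
$$H_2(e^{j\omega})G_2(e^{j\omega}) = 0 \quad (|\omega| < \pi/2)$$
$$H_2(e^{j\omega})G_2(e^{j\omega}) = 1 \quad (|\omega| > \pi/2)$$
Then
$$H_1(e^{j\omega})G_1(e^{j\omega}) + H_2(e^{j\omega})G_2(e^{j\omega}) = 1 \quad (\forall\omega)$$
$$H_1(e^{j(\omega+\pi)})G_1(e^{j\omega}) + H_2(e^{j(\omega+\pi)})G_2(e^{j\omega}) = 0 \quad (\forall\omega)$$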
39
Subband Coding
Implying
$$H_1(z)G_1(z) + H_2(z)G_2(z) = 1$$
$$H_1(-z)G_1(z) + H_2(-z)G_2(z) = 0$$
$$\hat{X}(z) = \tfrac{1}{2}X(z)$$
Indicating that $\tfrac{1}{2}\left[H_1(-z)G_1(z) + H_2(-z)G_2(z)\right]X(-z)$ is the aliasing component when the filters are not ideal, which is desired to be zero
40
Subband Coding
To have perfect reconstruction in the non-ideal filtering case, the iff conditions are:
$$H_1(z)G_1(z) + H_2(z)G_2(z) = 2$$
$$H_1(-z)G_1(z) + H_2(-z)G_2(z) = 0$$
If $H_2(z) = H_1(-z)$, $G_1(z) = 2H_1(z)$, $G_2(z) = -2H_1(-z)$, the aliased term becomes zero and the reconstructed $\hat{X}(z)$ is given by:
$$\hat{X}(z) = \left[H_1^2(z) - H_2^2(z)\right]X(z) = \left[H_1^2(z) - H_1^2(-z)\right]X(z)$$
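The alias-cancelling choice above can be checked numerically. The sketch below assumes the shortest possible lowpass filter, the two-tap average; with it, $H_1^2(z) - H_2^2(z) = z^{-1}$, so the output is the input delayed by one sample, which is what perfect reconstruction means for causal FIR filters.

```python
import numpy as np

h1 = np.array([0.5, 0.5])                 # lowpass H1(z), an illustrative choice
h2 = h1 * np.array([1.0, -1.0])           # H2(z) = H1(-z)
g1 = 2.0 * h1                             # G1(z) = 2 H1(z)
g2 = -2.0 * h2                            # G2(z) = -2 H1(-z)

def keep_even(v):
    """Downsample by two, then upsample by two: odd-indexed samples become zero."""
    u = np.zeros_like(v)
    u[::2] = v[::2]
    return u

def qmf_roundtrip(x):
    """Analysis filtering, critical sampling, and synthesis filtering (no quantization)."""
    u1 = keep_even(np.convolve(x, h1))
    u2 = keep_even(np.convolve(x, h2))
    return np.convolve(u1, g1) + np.convolve(u2, g2)

x = np.array([1.0, 4.0, -2.0, 3.0, 0.5, 7.0])
x_hat = qmf_roundtrip(x)
# Coefficients of H1^2(z) - H2^2(z) in powers of z^-1
transfer = np.convolve(h1, h1) - np.convolve(h2, h2)
```

Longer linear-phase filters cancel aliasing in the same way but, in general, only approximate a flat overall response.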
41
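Subband Coding
For perfect reconstruction, we need
$$H_1^2(z) - H_1^2(-z) = 1$$
or
$$T(e^{j\omega}) = H_1^2(e^{j\omega}) - H_1^2(e^{j(\omega+\pi)}) = 1$$
Using a symmetric linear phase FIR filter of length N for $H_1$ results in
$$T(e^{j\omega}) = \left[\,|H_1(e^{j\omega})|^2 - (-1)^{N-1}\,|H_1(e^{j(\omega+\pi)})|^2\,\right] e^{-j\omega(N-1)}$$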
42
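Subband Coding
As N = even,
$$T(e^{j\omega}) = \left[\,|H_1(e^{j\omega})|^2 + |H_1(e^{j(\omega+\pi)})|^2\,\right] e^{-j\omega(N-1)}$$
$$|T(e^{j\omega})| = 1 \quad \text{for QMF}$$
QMF filters
[Figure: magnitude responses $|H_1(e^{j\omega})|$ and $|H_2(e^{j\omega})|$ over $0 \le \omega \le \pi$, mirrored about $\pi/2$, peak value 1]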
43
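Subband Coding
If subband filters $H_i(z)$, $G_i(z)$ satisfy three conditions
$$H_i(z) = G_i(z^{-1}), \quad i = 1, 2$$
$$H_2(z) = z^{-1}H_1(-z^{-1})$$
$$H_1(z)G_1(z) + H_2(z)G_2(z) = 2$$
perfect reconstruction results, too
Aliased term:
$$H_1(-z)G_1(z) + H_2(-z)G_2(z) = H_1(-z)H_1(z^{-1}) + H_2(-z)H_2(z^{-1})$$
$$= H_1(-z)H_1(z^{-1}) - z^{-1}H_1(z^{-1})\,z\,H_1(-z) = 0$$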
44
Multiresolution Wavelet Representation and Approximation
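Embedded linear spaces in $L^2(R)$:
$$\forall j \in Z, \quad V_j \subset V_{j+1} \subset L^2(R)$$
Let $A_j$ be an orthogonal projection on $V_j$:
$$\forall g(x) \in V_j,\ \forall f(x) \in L^2(R): \quad \|g - f\| \ge \|A_j f - f\|$$
$$A_j \circ A_j = A_j$$
Let $O_j$ be the orthogonal complement of $V_j$ in $V_{j+1}$:
$$V_{j+1} = O_j \oplus V_j$$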
45
Multiresolution Wavelet Representation and Approximation
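Let $D_j$ be an orthogonal projection on $O_j$:
$$\forall g(x) \in O_j,\ \forall f(x) \in L^2(R): \quad \|g - f\| \ge \|D_j f - f\|$$
$$D_j \circ D_j = D_j$$
Then an original signal $A_0 f$ can be decomposed as:
$$A_0 f = A_{-1}f + D_{-1}f$$
$$= A_{-2}f + D_{-2}f + D_{-1}f$$
$$= A_{-J}f + D_{-J}f + \cdots + D_{-1}f$$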
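A discrete illustration of this decomposition, assuming the orthonormal Haar filters (two-tap sum and difference scaled by $1/\sqrt{2}$). The function names and the test sequence are hypothetical; `details[0]` plays the role of $D_{-1}f$ and the final `coarse` that of $A_{-J}f$.

```python
import numpy as np

def haar_analysis(a):
    """One level: split a into a half-length approximation and a half-length detail."""
    a = np.asarray(a, dtype=float)
    approx = (a[0::2] + a[1::2]) / np.sqrt(2)
    detail = (a[1::2] - a[0::2]) / np.sqrt(2)
    return approx, detail

def haar_synthesis(approx, detail):
    """Invert one level of haar_analysis."""
    a = np.empty(2 * len(approx))
    a[0::2] = (approx - detail) / np.sqrt(2)
    a[1::2] = (approx + detail) / np.sqrt(2)
    return a

def decompose(a0, levels):
    """Repeat the split on the approximation; details are ordered finest first."""
    approx, details = np.asarray(a0, dtype=float), []
    for _ in range(levels):
        approx, d = haar_analysis(approx)
        details.append(d)
    return approx, details

def reconstruct(approx, details):
    """Add the details back, coarsest first."""
    for d in reversed(details):
        approx = haar_synthesis(approx, d)
    return approx

a0 = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
coarse, details = decompose(a0, levels=3)
energy_parts = np.sum(coarse**2) + sum(np.sum(d**2) for d in details)
```

Because the components are mutually orthogonal, the energy of the input equals the sum of the energies of the coarse part and the details.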
46
Multiresolution Wavelet Representation and Approximation
$A_{-J}f$ = the orthogonal projection of $A_0 f$ on $V_{-J} \subset V_0$
$D_{-j}f$ = the orthogonal projection of $A_0 f$ on $O_{-j}$
$D_{-j}f$ and $D_{-k}f$ $(j \ne k)$: orthogonal to each other or uncorrelated to each other
$D_{-j}f$ $(0 < j \le J)$: orthogonal to $A_{-J}f$, or uncorrelated to $A_{-J}f$
$A_{-J}f$: a coarse version of $A_0 f$
$D_{-J}f, \ldots, D_{-1}f$: details of $A_0 f$ arranged from coarser to finer
47
Multiresolution Wavelet Representation and Approximation
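Let $\{\phi_{j,n}(x): n \in Z\}$ be an orthonormal basis of $V_j$:
$A_j f$ can be characterized by the coefficients of the orthonormal expansion:
$$\forall f \in L^2(R), \quad A_j f = \sum_n \langle f, \phi_{j,n} \rangle\, \phi_{j,n}$$
The sequence $\{\langle f, \phi_{j,n} \rangle : n \in Z\}$ is denoted by $A_j^d f$ and called a discrete approximation of f in $V_j$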
48
Multiresolution Wavelet Representation and Approximation
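Let $\{\psi_{j,n}(x): n \in Z\}$ be an orthonormal basis of $O_j$
$D_j f$ is characterized by the coefficients
$$D_j f = \sum_n \langle f, \psi_{j,n} \rangle\, \psi_{j,n}$$
The sequence $\{\langle f, \psi_{j,n} \rangle : n \in Z\}$ is denoted by $D_j^d f$ and called a discrete approximation of f in $O_j$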
49
Multiresolution Wavelet Representation and Approximation
Thus, $A_0 f$ can be characterized by $A_0^d f$
$A_0^d f$ can be further characterized by
$$\{A_{-J}^d f,\ D_{-J}^d f,\ \ldots,\ D_{-1}^d f\}$$
This set of discrete signals is called the orthogonal “wavelet” representation
$A_0^d f$ is organized as a coarse version plus increasingly fine details
The orthogonal representation: decorrelated representation
50
Multiresolution Wavelet Representation and Approximation
If we require:
$$f(x) \in V_j \ \text{iff}\ f(2x) \in V_{j+1}$$
$A_j f$ is band-limited such that it can be sampled at a rate of $2^j$, i.e., $2^j$ samples per time or length unit
[Figure: f(x) and f(2x); axis ticks 0.5, 1]
51
Multiresolution Wavelet Representation and Approximation
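Translation invariant with $A_0$:
$$\forall k \in Z,\ \forall f \in L^2(R),\ \text{let } g(x) = f(x-k): \quad (A_0 g)(x) = (A_0 f)(x-k)$$
Translation invariant with $A_0^d$ produced by $\{\phi_{0,n}(x): n \in Z\}$:
$$\forall k \in Z,\ \forall f \in L^2(R),\ \text{let } g(x) = f(x-k): \quad \langle g, \phi_{0,n} \rangle = \langle f, \phi_{0,n-k} \rangle$$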
52
Multiresolution Wavelet Representation and Approximation
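Then the $\phi_{j,n}$’s can be constructed by a scaling function $\phi(x) = \phi_{0,0}(x)$:
$$\phi_{0,n}(x) = \phi(x-n), \qquad \phi_{j,n}(x) = \sqrt{2^j}\,\phi(2^j x - n)$$
Furthermore, let
$$h(n) = \langle \phi_{-1,0}(x), \phi(x-n) \rangle, \qquad \tilde{h}(n) = h(-n)$$
then
$$\langle f, \phi_{j,n} \rangle = \sum_k \tilde{h}(2n-k)\,\langle f, \phi_{j+1,k} \rangle$$
$$(A_j^d f)(n) = (\tilde{h} * A_{j+1}^d f)(2n)$$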
53
Multiresolution Wavelet Representation and Approximation
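$A_j^d f$ = $A_{j+1}^d f$ filtered by $\tilde{h}$ and downsampled by two
Let
$$H(\omega) = \sum_n h(n)\,e^{-jn\omega}$$
then
$$|H(0)| = 1$$
$$|H(\omega)|^2 + |H(\omega+\pi)|^2 = 1$$
(These values hold for h(n) normalized so that $\sum_n h(n) = 1$; for h(n) taken as inner products of orthonormal basis functions, $\sum_n h(n) = \sqrt{2}$ and the right-hand sides become $\sqrt{2}$ and 2.)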
54
Multiresolution Wavelet Representation and Approximation
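Let
$$\hat{\psi}(\omega) = e^{-j\omega/2}\,\overline{H(\omega/2 + \pi)}\,\hat{\phi}(\omega/2)$$
(hats denote Fourier transforms, H normalized so that $|H(0)| = 1$), then the $\psi_{j,n}$’s can be constructed by
$$\psi_{0,n}(x) = \psi(x-n), \qquad \psi_{j,n}(x) = \sqrt{2^j}\,\psi(2^j x - n)$$
Let
$$g(n) = \langle \psi_{-1,0}(x), \phi(x-n) \rangle, \qquad \tilde{g}(n) = g(-n)$$
then
$$(D_j^d f)(n) = (\tilde{g} * A_{j+1}^d f)(2n)$$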
55
Multiresolution Wavelet Representation and Approximation
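$D_j^d f$ = $A_{j+1}^d f$ filtered by $\tilde{g}$ and downsampled by two
From
$$\hat{\psi}(\omega) = e^{-j\omega/2}\,\overline{H(\omega/2 + \pi)}\,\hat{\phi}(\omega/2),$$
$$g(n) = (-1)^{1-n}\,h(1-n)$$
H, G: Quadrature Mirror Filters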
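The mirror relation $g(n) = (-1)^{1-n}h(1-n)$ can be checked in the frequency domain. The sketch assumes the 4-tap Daubechies lowpass filter in its orthonormal normalization ($\sum_n h(n) = \sqrt{2}$), which is an illustrative choice and not a filter named in the lecture.

```python
import numpy as np

s3 = np.sqrt(3.0)
h_idx = np.arange(4)                                   # h(n) is supported on n = 0..3
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

g_idx = np.arange(-2, 2)                               # then g(n) lives on n = -2..1
g = (-1.0) ** (1 - g_idx) * h[1 - g_idx]               # g(n) = (-1)^(1-n) h(1-n)

def freq_response(coeffs, idx, w):
    """F(w) = sum_n coeffs(n) exp(-j n w), evaluated on an array of frequencies w."""
    return np.exp(-1j * np.outer(w, idx)) @ coeffs

w = np.linspace(0.0, np.pi, 65)
H = freq_response(h, h_idx, w)
H_mirror = freq_response(h, h_idx, w + np.pi)
G = freq_response(g, g_idx, w)
power_sum = np.abs(H)**2 + np.abs(H_mirror)**2         # 2 in this normalization
```

The highpass magnitude $|G(\omega)|$ equals $|H(\omega+\pi)|$, i.e. the lowpass response mirrored about $\pi/2$, which is where the name quadrature mirror filter comes from.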
56
Multiresolution Wavelet Representation and Approximation
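$$\langle \phi_{j,n}, \phi_{j+1,k} \rangle = \langle \phi_{-1,0}, \phi_{0,k-2n} \rangle = h(k-2n)$$
$$\langle \psi_{j,n}, \phi_{j+1,k} \rangle = \langle \psi_{-1,0}, \phi_{0,k-2n} \rangle = g(k-2n)$$
$$\phi_{j+1,n} = \sum_k h(n-2k)\,\phi_{j,k} + \sum_k g(n-2k)\,\psi_{j,k}$$
$$(A_{j+1}^d f)(n) = \langle f, \phi_{j+1,n} \rangle = \sum_k h(n-2k)\,\langle f, \phi_{j,k} \rangle + \sum_k g(n-2k)\,\langle f, \psi_{j,k} \rangle$$
$$= \sum_k h(n-2k)\,(A_j^d f)(k) + \sum_k g(n-2k)\,(D_j^d f)(k)$$
$$= \sum_k \tilde{h}(2k-n)\,(A_j^d f)(k) + \sum_k \tilde{g}(2k-n)\,(D_j^d f)(k)$$
That is, $A_j^d f$ and $D_j^d f$ are upsampled by two (a zero inserted between consecutive samples) and filtered by h and g, the mirror images of $\tilde{h}$ and $\tilde{g}$, before summation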
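A numerical sketch of one analysis/synthesis level with these index conventions, assuming the 4-tap Daubechies filter in orthonormal normalization and periodic extension of the signal. Both are illustrative assumptions; the slides fix neither a filter nor a boundary rule.

```python
import numpy as np

s3 = np.sqrt(3.0)
# Lowpass h(n), n = 0..3 (Daubechies 4-tap, an illustrative choice)
h = dict(enumerate(np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))))
# Highpass from g(n) = (-1)^(1-n) h(1-n), n = -2..1
g = {n: (-1) ** (1 - n) * h[1 - n] for n in range(-2, 2)}

def analysis(a, filt):
    """sum_k filt(k - 2n) a(k): filter by the mirrored filter, keep every second sample."""
    N = len(a)
    return np.array([sum(c * a[(2 * n + m) % N] for m, c in filt.items())
                     for n in range(N // 2)])

def synthesis(coeffs, filt, N):
    """sum_k filt(n - 2k) coeffs(k): insert zeros, then filter (periodic extension)."""
    out = np.zeros(N)
    for k, v in enumerate(coeffs):
        for m, c in filt.items():
            out[(2 * k + m) % N] += c * v
    return out

a_fine = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
a_coarse = analysis(a_fine, h)      # discrete approximation at the next coarser level
d_coarse = analysis(a_fine, g)      # discrete detail at the next coarser level
a_rebuilt = synthesis(a_coarse, h, len(a_fine)) + synthesis(d_coarse, g, len(a_fine))
```

The synthesis loop writes each coefficient at position 2k and spreads it with the filter taps, which is exactly zero insertion followed by filtering.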
57
Multiresolution Wavelet Representation and Approximation
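$$(A_j^d f)(n) = (\tilde{h} * A_{j+1}^d f)(2n)$$
$$(D_j^d f)(n) = (\tilde{g} * A_{j+1}^d f)(2n)$$
$$(A_{j+1}^d f)(n) = \sum_k \tilde{h}(2k-n)\,(A_j^d f)(k) + \sum_k \tilde{g}(2k-n)\,(D_j^d f)(k)$$
$$\tilde{h}(n) = h(-n), \qquad \tilde{g}(n) = g(-n)$$
$$h(n) = \langle \phi_{-1,0}(x), \phi(x-n) \rangle_x, \qquad \phi_{-1,0}(x) = \tfrac{1}{\sqrt{2}}\,\phi(x/2)$$
$$g(n) = \langle \psi_{-1,0}(x), \phi(x-n) \rangle_x, \qquad \psi_{-1,0}(x) = \tfrac{1}{\sqrt{2}}\,\psi(x/2)$$
$$g(n) = (-1)^{1-n}\,h(1-n)$$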
58
Multiresolution Wavelet Representation and Approximation
$$\tilde{H}(z) = H(z^{-1}), \qquad \tilde{G}(z) = G(z^{-1}), \qquad G(z) = z^{-1}H(-z^{-1})$$
Think of: $h_1 = \tilde{h}$, $h_2 = \tilde{g}$
Then, the analysis stage for subband or wavelet decomposition is the same
Higher resolution signal -> Two low resolution signals, through filtering by $h_1, h_2$ (or $\tilde{h}, \tilde{g}$) and downsampling by two
59
Multiresolution Wavelet Representation and Approximation
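Synthesis stage for subband or wavelet decomposition has the same structure
For subband: low resolution signals upsampled by two, followed by filtering by $g_1(n) = h_1(-n)$, $g_2(n) = h_2(-n)$, followed by summation to reconstruct the higher resolution signal
For wavelet: low resolution signals upsampled by two, followed by filtering by $h(n) = \tilde{h}(-n)$, $g(n) = \tilde{g}(-n)$, followed by summation to reconstruct the higher resolution signal; the same $\tilde{h}$, $\tilde{g}$ appear, but indexed as $\tilde{h}(2k-n)$, $\tilde{g}(2k-n)$, which is zero insertion followed by filtering rather than a second downsampling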
60
Multiresolution Wavelet Representation and Approximation
After filtering at the analysis stage, each of the two produced signals has only half the resolution of the original signal
Downsampling by two is justifiable
Before filtering at the synthesis stage, upsampling by two on the two low resolution signals in subband decomposition seems not well justifiable
61
Multiresolution Wavelet Representation and Approximation
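Let $B_0 = \{\phi(x-n): n \in Z\}$ be orthonormal:
$$\int \phi(x-n)\,\phi(x-m)\,dx = \delta(n-m)$$
Let $V_0 = \{\sum_n c_n\,\phi(x-n)\}$ with $\sum_n |c_n|^2 < \infty$
Then, $B_0$ is an orthonormal basis for $V_0$
Let $\phi_{0,0}(x) = \phi(x)$, $\phi_{j,n}(x) = \sqrt{2^j}\,\phi(2^j x - n)$
Let $h(n) = \langle \phi_{-1,0}(x), \phi(x-n) \rangle_x$
Assume $\sum_n |h(n)|^2 = 1$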
62
Multiresolution Wavelet Representation and Approximation
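Then $B_j = \{\phi_{j,n}(x): n \in Z\}$ is orthonormal:
$$\int \phi_{j,n}(x)\,\phi_{j,m}(x)\,dx = \int 2^j\,\phi(2^j x - n)\,\phi(2^j x - m)\,dx = \int \phi(u-n)\,\phi(u-m)\,du = \delta(n-m)$$
Let $V_j = \{\sum_n c_n\,\phi_{j,n}(x)\}$ with $\sum_n |c_n|^2 < \infty$
Then $B_j$ is an orthonormal basis for $V_j$
$V_j \subset V_{j+1}$:
$$\langle \phi_{j,n}, \phi_{j+1,m} \rangle = \int \sqrt{2^j}\,\phi(2^j x - n)\,\sqrt{2^{j+1}}\,\phi(2^{j+1}x - m)\,dx$$
$$= \sqrt{2}\int \phi(u-n)\,\phi(2u-m)\,du = \frac{1}{\sqrt{2}}\int \phi(w/2)\,\phi(w-(m-2n))\,dw$$
$$= \langle \phi_{-1,n}(u), \phi_{0,m}(u) \rangle = h(m-2n)$$
But $\sum_m |h(m-2n)|^2 = 1$
Thus $\phi_{j,n}(x) \in V_{j+1}$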
63
Multiresolution Wavelet Representation and Approximation
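$$A_j f = \sum_n \langle f, \phi_{j,n} \rangle\,\phi_{j,n}$$
$$A_j^d f = \{\langle f, \phi_{j,n} \rangle : n \in Z\}$$
$$\langle f, \phi_{j,n} \rangle = \Big\langle f, \sum_k \langle \phi_{j,n}, \phi_{j+1,k} \rangle\,\phi_{j+1,k} \Big\rangle = \sum_k \langle \phi_{j,n}, \phi_{j+1,k} \rangle\,\langle f, \phi_{j+1,k} \rangle$$
$$= \sum_k h(k-2n)\,\langle f, \phi_{j+1,k} \rangle$$
Let $\tilde{h}(n) = h(-n)$
Then $(A_j^d f)(n) = (\tilde{h} * A_{j+1}^d f)(2n)$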