ISPASS 2004
© 2004 Marilyn Wolf
Multimedia Algorithms
Marilyn Wolf
Dept. of Electrical Engr.
Princeton University
Outline
Compact disc player.
Video compression.
The multimedia processing funnel
[Figure: funnel with axes Data volume and Data abstraction; stages from pixel processing through edge extraction to principal component analysis and hidden Markov models.]
CD/MP3 player
[Figure: CD/MP3 player block diagram. Blocks: head, drive, FE/TE amp, analog in, servo CPU (focus, tracking, sled, motor), error corrector, jog memory, audio CPU, memory, display, I2S, DAC, amp, analog out.]
CD medium
Linear velocity: 1.2-1.4 m/s (constant linear velocity, CLV).
Track pitch: 1.6 microns.
Diameter: 120 mm.
Pit length: 0.8-3 microns.
Pit depth: 0.11 microns.
Pit width: 0.5 microns.
Laser wavelength: 780 nm.
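Because the disc runs at constant linear velocity, the spindle must slow down as the pickup moves outward. The sketch below computes the spindle speed needed at a given radius and the approximate length of the spiral track. The program-area radii (about 25 mm inner, 58 mm outer) are assumptions that do not appear on the slide.

```python
import math

# Assumed program-area radii (NOT on the slide): about 25 mm inner, 58 mm outer.
R_INNER_M = 0.025
R_OUTER_M = 0.058
TRACK_PITCH_M = 1.6e-6  # track pitch from the slide


def spindle_rpm(linear_velocity_mps: float, radius_m: float) -> float:
    """Spindle speed (rev/min) that gives the requested linear velocity at a radius."""
    return linear_velocity_mps / (2 * math.pi * radius_m) * 60.0


def spiral_length_m(r_in: float, r_out: float, pitch: float) -> float:
    """Approximate spiral track length: area of the annulus divided by the track pitch."""
    return math.pi * (r_out ** 2 - r_in ** 2) / pitch


if __name__ == "__main__":
    v = 1.2  # m/s, low end of the CLV range on the slide
    print(f"inner radius: {spindle_rpm(v, R_INNER_M):.0f} rpm")
    print(f"outer radius: {spindle_rpm(v, R_OUTER_M):.0f} rpm")
    length = spiral_length_m(R_INNER_M, R_OUTER_M, TRACK_PITCH_M)
    print(f"track length: {length / 1000:.1f} km, "
          f"playing time: {length / v / 60:.0f} min")
```

Under these assumptions the spindle slows from roughly 460 rpm at the inner edge to about 200 rpm at the outer edge, and the track is on the order of 5 km long, consistent with a playing time of roughly 75 minutes at 1.2 m/s.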
CD mechanism
Laser, lens, sled:
[Figure: laser, diffraction grating, and detectors on a sled beneath the CD; labeled motions: track and focus.]
Laser focus
Focus is controlled by the vertical position of the lens.
An unfocused beam produces an irregular spot:
[Figure: detector spot in focus and in two out-of-focus conditions.]
Laser pickup
[Figure: main-spot detector quadrants A, B, C, D and side-spot detectors E and F.]
Level: A+B+C+D
Focus error: (A+C)-(B+D)
Tracking error: E-F
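The detector arithmetic can be stated directly in code. This minimal sketch assumes the quadrant labels A through D and side spots E and F from the figure, and it takes the sign conventions from the slide. The function and field names are illustrative.

```python
from typing import NamedTuple


class PickupSignals(NamedTuple):
    level: float           # total reflected light: A+B+C+D
    focus_error: float     # (A+C)-(B+D)
    tracking_error: float  # E-F


def pickup_signals(a: float, b: float, c: float, d: float,
                   e: float, f: float) -> PickupSignals:
    """Combine the four main-spot quadrants (A-D) and the two side spots (E, F)."""
    return PickupSignals(
        level=a + b + c + d,
        focus_error=(a + c) - (b + d),
        tracking_error=e - f,
    )


if __name__ == "__main__":
    # Symmetric spot and balanced side spots: both error signals are zero.
    print(pickup_signals(1.0, 1.0, 1.0, 1.0, 0.5, 0.5))
    # Unequal quadrants and side spots give nonzero error signals.
    print(pickup_signals(2.0, 1.0, 2.0, 1.0, 0.7, 0.2))
```

The servo loops use the sign and size of each error signal to decide which way to move the lens or sled.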
Servo control
Four main servo signals:
focus (laser), 245 kHz;
tracking (laser), 245 kHz;
sled (motor), 800 Hz;
disc motor.
[Figure: optical pickup.]
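Each servo loop drives an error signal, such as focus error, toward zero. The slide does not state the control law. This sketch assumes a textbook proportional-integral (PI) loop running at the 245 kHz focus rate against a toy lens model. The plant model, the gains, and the constant disturbance are all hypothetical.

```python
class PIController:
    """Discrete-time proportional-integral controller (gains are hypothetical)."""

    def __init__(self, kp: float, ki: float, sample_rate_hz: float) -> None:
        self.kp = kp
        self.ki = ki
        self.dt = 1.0 / sample_rate_hz
        self.integral = 0.0

    def update(self, error: float) -> float:
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral


def simulate_focus(steps: int = 2000, disturbance: float = 50.0) -> float:
    """Toy focus loop (hypothetical plant): lens velocity = drive + constant disturbance.

    Returns the final lens position; the target position is 1.0 (arbitrary units).
    """
    ctrl = PIController(kp=5000.0, ki=6.0e6, sample_rate_hz=245e3)  # 245 kHz from the slide
    target, position = 1.0, 0.0
    for _ in range(steps):
        error = target - position      # stands in for the focus-error signal
        drive = ctrl.update(error)
        position += (drive + disturbance) * ctrl.dt
    return position


if __name__ == "__main__":
    print(f"final position: {simulate_focus():.6f}")
```

The integral term is what removes the steady offset a constant disturbance would otherwise leave. A proportional-only loop settles short of the target.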
EFM
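Eight-to-fourteen modulation: each 8-bit byte is mapped to a 14-bit code word that guarantees both a minimum and a maximum distance between transitions (at least 2 and at most 10 zeros between successive ones).
Example: 00000011 maps to 00100100000000.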
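The run-length property can be checked mechanically. This minimal sketch assumes the EFM limits of 2 to 10 zeros between ones. It validates one channel word and shows the NRZI mapping from channel bits to pit/land transitions. It does not include the 256-entry EFM lookup table or the merging bits that join consecutive code words.

```python
def satisfies_efm_run_length(word: str, d: int = 2, k: int = 10) -> bool:
    """Check a run-length-limited (d, k) constraint within one channel word.

    Every run of 0s between two 1s must be at least d and at most k long;
    runs at the ends of the word are only checked against k.
    """
    ones = [i for i, bit in enumerate(word) if bit == "1"]
    if not ones:
        return len(word) <= k
    for prev, nxt in zip(ones, ones[1:]):
        gap = nxt - prev - 1
        if gap < d or gap > k:
            return False
    leading = ones[0]
    trailing = len(word) - 1 - ones[-1]
    return leading <= k and trailing <= k


def nrzi(bits: str, level: int = 0) -> list[int]:
    """NRZI: a 1 toggles the written level (a pit/land edge); a 0 keeps it."""
    out = []
    for bit in bits:
        if bit == "1":
            level ^= 1
        out.append(level)
    return out


if __name__ == "__main__":
    code = "00100100000000"  # 14-bit example from the slide
    print(satisfies_efm_run_length(code))  # True
    print(nrzi(code))
```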
Error correction
CD capacity: 6.99 GB raw, 700 MB formatted.
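Reed-Solomon code: $g(x) = (x-\alpha)(x-\alpha^{2})\cdots(x-\alpha^{n-k-1})(x-\alpha^{n-k})$
The decoder produces data and erasure bits. Decoding time varies greatly depending on the noise level.
The CD interleaves Reed-Solomon blocks (cross-interleaved Reed-Solomon coding, CIRC) to reduce the effects of long gaps in the data.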
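The generator polynomial can be built directly from its definition. This minimal sketch works over GF(2^8) and assumes the generator element α = 2 and the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1, a common choice for byte-oriented Reed-Solomon codes. The roots are α^1 through α^(n-k), as on the slide. The sketch builds g(x) only. It does not encode, decode, or interleave.

```python
PRIM_POLY = 0x11D  # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1


def build_tables(prim: int = PRIM_POLY) -> tuple[list[int], list[int]]:
    """Antilog (exp) and log tables for GF(2^8) with generator alpha = 2."""
    exp = [0] * 512
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & 0x100:          # reduce modulo the field polynomial
            x ^= prim
    for i in range(255, 512):  # duplicate so exp[a + b] needs no modulo
        exp[i] = exp[i - 255]
    return exp, log


EXP, LOG = build_tables()


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_mul(p: list[int], q: list[int]) -> list[int]:
    """Multiply polynomials over GF(2^8); coefficients are listed highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)  # addition in GF(2^m) is XOR
    return out


def poly_eval(p: list[int], x: int) -> int:
    """Horner evaluation over GF(2^8)."""
    y = 0
    for c in p:
        y = gf_mul(y, x) ^ c
    return y


def rs_generator(nsym: int) -> list[int]:
    """g(x) = (x - alpha)(x - alpha^2)...(x - alpha^nsym); in GF(2^m), -alpha^i == alpha^i."""
    g = [1]
    for i in range(1, nsym + 1):
        g = poly_mul(g, [1, EXP[i]])
    return g


if __name__ == "__main__":
    g = rs_generator(4)  # n - k = 4 parity symbols
    print(g)
    print([poly_eval(g, EXP[i]) for i in range(1, 5)])  # all zero: roots alpha^1..alpha^4
```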
Control and error correction
Skips are caused by physical disturbance: wait for the disturbance to subside, then retry.
Read errors are caused by disc or servo problems: detect the error, choose a location for the retry, and retry; if retries fail, interpolate.
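The last resort, "fail and interpolate," conceals an uncorrectable sample by estimating it from its neighbors. This minimal sketch assumes the error corrector supplies a per-sample erasure flag. It uses simple linear interpolation, with edge holding and muting as fallbacks. Real players may use other concealment rules.

```python
import bisect


def conceal(samples: list[int], bad: list[bool]) -> list[float]:
    """Replace flagged samples by linear interpolation between the nearest good
    neighbors; hold the nearest good value at the edges; mute if nothing is good."""
    good = [i for i, flagged in enumerate(bad) if not flagged]
    if not good:
        return [0.0] * len(samples)
    out = [float(s) for s in samples]
    for i, flagged in enumerate(bad):
        if not flagged:
            continue
        pos = bisect.bisect_left(good, i)  # index of the first good sample after i
        left = good[pos - 1] if pos > 0 else None
        right = good[pos] if pos < len(good) else None
        if left is None:
            out[i] = float(samples[right])
        elif right is None:
            out[i] = float(samples[left])
        else:
            t = (i - left) / (right - left)
            out[i] = samples[left] + t * (samples[right] - samples[left])
    return out


if __name__ == "__main__":
    pcm = [0, 100, -32768, 300, 400, 12345, 12345, 700]
    flags = [False, False, True, False, False, True, True, False]
    print(conceal(pcm, flags))
```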
Audio output
Audio CD output is straightforward; D/A filtering may be performed in software.
MP3 decoding is relatively straightforward: about 10% of an ARM7.
File system support for data CDs is complex: discs may come from PCs or Macs and can have an arbitrary file structure.
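One common form of software D/A filtering is digital oversampling. Raising the sample rate before the DAC allows a simpler analog reconstruction filter. The slide does not specify a filter. This sketch assumes a 2x windowed-sinc interpolator with a Hann window and a hypothetical length of 31 taps, normalized so that a constant input passes through unchanged.

```python
import math


def interpolation_taps(half_len: int = 8) -> list[float]:
    """Windowed-sinc taps for 2x interpolation, indexed k = -m..m with m = 2*half_len - 1.

    The centre tap is 1 and all other even taps are 0, so original samples pass
    through unchanged; odd taps are normalized to sum to 1 (unity DC gain).
    """
    m = 2 * half_len - 1
    ks = range(-m, m + 1)
    taps = []
    for k in ks:
        if k == 0:
            taps.append(1.0)
        elif k % 2 == 0:
            taps.append(0.0)
        else:
            hann = 0.5 + 0.5 * math.cos(math.pi * k / (m + 1))
            taps.append(math.sin(math.pi * k / 2) / (math.pi * k / 2) * hann)
    odd_sum = sum(t for k, t in zip(ks, taps) if k % 2 != 0)
    return [t / odd_sum if k % 2 != 0 else t for k, t in zip(ks, taps)]


def upsample2(x: list[float], taps: list[float]) -> list[float]:
    """Insert a zero after every sample, then apply the centred FIR filter.

    Samples outside the input are treated as zero.
    """
    m = len(taps) // 2
    stuffed = []
    for s in x:
        stuffed.extend((s, 0.0))
    y = []
    for i in range(len(stuffed)):
        acc = 0.0
        for k in range(-m, m + 1):
            j = i - k
            if 0 <= j < len(stuffed):
                acc += taps[k + m] * stuffed[j]
        y.append(acc)
    return y


if __name__ == "__main__":
    taps = interpolation_taps()
    x = [math.sin(2 * math.pi * 1000 * n / 44_100) for n in range(64)]
    y = upsample2(x, taps)  # 64 samples at 44.1 kHz -> 128 samples at 88.2 kHz
    print(len(y), max(abs(y[2 * n] - x[n]) for n in range(len(x))))
```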
MPEG audio standards
Layer 1: subband coding (quantized subband samples) plus an optional simple masking model.
Layer 2: more advanced masking model.
Layer 3: additional processing for lower bit rates.
MPEG audio rates
Input sampling rates: 32, 44.1, 48 kHz.
Output bit rates: 32, 48, 64, 96, 112, 128, 192, 256, 384 kbits/sec.
Output can be mono, dual-channel (bilingual, etc.), or stereo.
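To put the output rates in perspective, this sketch compares them with uncompressed PCM. It assumes CD-format input with 16-bit samples and two channels, which the slide does not state.

```python
def pcm_bit_rate_kbps(sample_rate_hz: float, bits_per_sample: int = 16,
                      channels: int = 2) -> float:
    """Uncompressed PCM bit rate in kbit/s (defaults assume CD-format audio)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(sample_rate_hz: float, coded_kbps: float,
                      bits_per_sample: int = 16, channels: int = 2) -> float:
    return pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels) / coded_kbps


if __name__ == "__main__":
    for kbps in (128, 192, 256, 384):
        print(f"{kbps:>3} kbit/s: {compression_ratio(44_100, kbps):.1f}:1")
```

For example, 44.1 kHz stereo PCM runs at 1411.2 kbit/s, so a 128 kbit/s stream represents roughly 11:1 compression.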
Other standards
Dolby Digital (AC-3): uses the modified discrete cosine transform (MDCT).
ATRAC (MiniDisc): uses subband coding plus the modified DCT.
MPEG-2 AAC.
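The key property of the modified DCT is time-domain aliasing cancellation. Each frame turns 2N samples into only N coefficients, yet overlapping frames still reconstruct the signal exactly. This minimal sketch assumes a sine window and a direct O(N^2) evaluation. Real codecs use fast algorithms, their own window shapes, and coefficient quantization, none of which is shown here.

```python
import math


def mdct(x: list[float]) -> list[float]:
    """Direct O(N^2) MDCT: 2N input samples -> N coefficients."""
    n = len(x) // 2
    n0 = 0.5 + n / 2
    return [
        sum(x[i] * math.cos(math.pi / n * (i + n0) * (k + 0.5)) for i in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs: list[float]) -> list[float]:
    """Inverse MDCT: N coefficients -> 2N time-aliased samples.

    The 2/N scale pairs with a window applied at both analysis and synthesis.
    """
    n = len(coeffs)
    n0 = 0.5 + n / 2
    return [
        2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + n0) * (k + 0.5))
                      for k in range(n))
        for i in range(2 * n)
    ]


def sine_window(length: int) -> list[float]:
    """Sine window; satisfies w[i]^2 + w[i + length/2]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def analysis_synthesis(signal: list[float], n: int) -> list[float]:
    """Window, MDCT, IMDCT, window again, then overlap-add with hop n.

    Samples covered by two frames are reconstructed exactly.
    """
    w = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        frame = [signal[start + i] * w[i] for i in range(2 * n)]
        y = imdct(mdct(frame))
        for i in range(2 * n):
            out[start + i] += y[i] * w[i]
    return out


if __name__ == "__main__":
    sig = [math.sin(0.3 * i) + 0.05 * i for i in range(64)]
    rec = analysis_synthesis(sig, 8)
    err = max(abs(rec[i] - sig[i]) for i in range(8, 56))
    print(f"max interior error: {err:.2e}")
```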
Software implementations
Many standards, each with complex code: about 1 million lines of code are required to implement all major standards.
Techniques are similar, but details vary from codec to codec.
Parameters, such as window size, change at run time.
MPEG Layer 1
384 samples per block in every subband; this equals 8 ms at 48 kHz.
Optional masking model, driven by a separate FFT for better accuracy.
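The block size fixes both the frame duration and, for a given bit rate, the frame size. This sketch assumes the commonly cited Layer 1 frame-length rule, in which a frame holds 12 times bit rate over sample rate slots of 4 bytes each, plus an optional padding slot. It also uses the fact that 384 samples are 32 subbands of 12 samples each.

```python
SAMPLES_PER_FRAME = 384  # 32 subbands x 12 samples per subband


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Time covered by one Layer 1 frame."""
    return SAMPLES_PER_FRAME / sample_rate_hz * 1000.0


def layer1_frame_bytes(bit_rate_bps: int, sample_rate_hz: int, padding: int = 0) -> int:
    """Layer 1 frame length in bytes: 4-byte slots plus an optional padding slot."""
    return (12 * bit_rate_bps // sample_rate_hz + padding) * 4


if __name__ == "__main__":
    for fs in (32_000, 44_100, 48_000):
        print(f"{fs} Hz: {frame_duration_ms(fs):.2f} ms per frame, "
              f"{layer1_frame_bytes(384_000, fs)} bytes at 384 kbit/s")
```

At 48 kHz and 384 kbit/s, an 8 ms frame carries 3072 bits, which is 384 bytes. This agrees with the formula.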
MPEG Layer 1 data frame
Bit allocation codes specify word length in each subband.
Scale factors give gain for each band.
Frame layout: header | CRC | bit allocation | scale factors | subband samples | aux data
MPEG Layer 1 encoder
[Figure: Layer 1 encoder. Blocks: filterbank, FFT feeding the masking model, choose scale factor, requantize (multiplier), mux to the output bit stream 0101...]
MPEG Layer 1 decoder
[Figure: Layer 1 decoder. Blocks: input bit stream 0101..., demux, inverse quantize, scale factor and step size multipliers, expand, inverse filterbank.]
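The encoder's requantize step and the decoder's inverse quantization mirror each other. The bit allocation sets the number of quantizer levels, and therefore the step size. The scale factor sets the gain. This sketch is deliberately simplified. It assumes a plain uniform mid-tread quantizer and uses the block's peak magnitude as its scale factor, whereas the standard defines its own scale-factor table and quantizer. `choose_scale_factor` is a hypothetical helper.

```python
def choose_scale_factor(block: list[float]) -> float:
    """Hypothetical rule: use the block's peak magnitude (1.0 for a silent block)."""
    peak = max((abs(s) for s in block), default=0.0)
    return peak if peak > 0 else 1.0


def requantize(block: list[float], scale: float, nbits: int) -> list[int]:
    """Encoder: uniform mid-tread quantizer after normalizing by the scale factor.

    nbits (>= 2) comes from the bit allocation; codes span
    -(2^(nbits-1) - 1) .. +(2^(nbits-1) - 1).
    """
    levels = (1 << (nbits - 1)) - 1
    return [round(s / scale * levels) for s in block]


def inverse_quantize(codes: list[int], scale: float, nbits: int) -> list[float]:
    """Decoder: step size from nbits, gain from the scale factor."""
    levels = (1 << (nbits - 1)) - 1
    return [c / levels * scale for c in codes]


if __name__ == "__main__":
    # 12 samples: one subband's share of a 384-sample Layer 1 frame.
    subband_block = [0.5, -0.25, 0.1, -0.9, 0.33, 0.0, 0.72, -0.6, 0.05, 0.2, -0.4, 0.8]
    sf = choose_scale_factor(subband_block)
    codes = requantize(subband_block, sf, nbits=4)
    print(sf, codes)
    print(inverse_quantize(codes, sf, nbits=4))
```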
MP3
Decoding is easier than encoding but still requires decompression and filtering.
Data discs follow the basic CD standard, but there is no standard for MP3 disc file structure: the player must understand Windows, Mac, and Unix discs.
Basic principles of MPEG-style compression
Discrete cosine transform (DCT) used to select perceptually significant information from blocks.
Motion estimation identifies temporal redundancy in frames.
Lossless (channel) coding reduces bit rate.
MPEG-style compression engine
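[Figure: MPEG-style encoder. Forward path: motion estimator, subtractor (+), DCT, quantizer Q, variable-length coder, buffer. Feedback path: inverse quantizer Q^-1, inverse DCT DCT^-1, adder (+), picture store/predictor.]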
Spatial frequency in 1D
[Figure: one-dimensional signals of high and low spatial frequency.]
DCT
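Discrete cosine transform: $v(k) = \alpha(k) \sum_{t=0}^{N-1} u(t) \cos\left[\frac{\pi(2t+1)k}{2N}\right]$, $0 \le k \le N-1$, with $\alpha(0) = \sqrt{1/N}$ and $\alpha(k) = \sqrt{2/N}$ for $k \ge 1$.
A 2-D DCT can be computed from two 1-D DCTs (rows, then columns).
A 1-D DCT can be computed in O(N log N) time.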
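The transform can be written directly from the formula and applied separably. This minimal O(N^2) sketch assumes the orthonormal scaling α(0) = sqrt(1/N) and α(k) = sqrt(2/N). Fast algorithms such as Lee's flowgraph compute the same result with fewer multiplications.

```python
import math


def dct_1d(u: list[float]) -> list[float]:
    """v(k) = alpha(k) * sum_t u(t) cos(pi (2t + 1) k / 2N), orthonormal scaling."""
    n = len(u)
    v = []
    for k in range(n):
        alpha = math.sqrt((1.0 if k == 0 else 2.0) / n)
        v.append(alpha * sum(u[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                             for t in range(n)))
    return v


def dct_2d(block: list[list[float]]) -> list[list[float]]:
    """Separable 2-D DCT: 1-D DCT of every row, then of every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order


if __name__ == "__main__":
    flat = [[10.0] * 8 for _ in range(8)]
    coeffs = dct_2d(flat)
    print(round(coeffs[0][0], 6))  # a flat block puts all its energy in the DC term
```

Because the scaling is orthonormal, the transform preserves energy: the sum of squared coefficients equals the sum of squared samples.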
8-point DCT flowgraph (Lee)
[Figure: Lee's fast 8-point DCT flowgraph: inputs x0 through x7, even outputs y0, y2, y4, y6 and odd outputs y1, y3, y5, y7, with cosine multipliers C1 through C7.]
DCT and quantization
[Figure: DCT followed by quantizer Q.]
DCT coefficient quantization
Quantizing the DCT coefficients throws out high spatial frequencies in an 8 x 8 block:
33 5 3 1 0 0 0 0
8 6 4 2 0 0 0 0
6 2 1 0 0 0 0 0
4 1 1 0 0 0 0 0
2 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
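Quantization leaves long runs of zeros, which makes the following lossless coding step effective. As an illustration not taken from the slides, this sketch scans the example block in the conventional zigzag order and emits (zero-run, level) pairs for the AC coefficients. MPEG-style coders feed pairs of this form to their variable-length tables. The matrix below is the example block arranged row by row.

```python
def zigzag_order(n: int = 8) -> list[tuple[int, int]]:
    """(row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:          # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def run_level_pairs(block: list[list[int]]) -> list[tuple[int, int]]:
    """(zero-run, level) pairs for the AC coefficients; trailing zeros are left
    to an end-of-block code."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block))[1:]:  # skip the DC term at (0, 0)
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs


BLOCK = [
    [33, 5, 3, 1, 0, 0, 0, 0],
    [8, 6, 4, 2, 0, 0, 0, 0],
    [6, 2, 1, 0, 0, 0, 0, 0],
    [4, 1, 1, 0, 0, 0, 0, 0],
    [2, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print(run_level_pairs(BLOCK))
```

The 63 AC coefficients reduce to 15 pairs plus an end-of-block marker.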
Channel coding
Lossless encoding is applied to final bit stream to reduce bit rate.
Huffman-style encoding is used: variable-length code for common
symbols; escape code + fixed-length code for
less-common symbols.
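The split between common symbols and escaped rare symbols can be sketched with a Huffman table built from symbol counts. This is an illustration only. MPEG standards use fixed, predefined code tables rather than building them per stream. The `ESC` symbol, the number of common symbols, and the 8-bit escape payload are hypothetical choices.

```python
import heapq
from collections import Counter
from itertools import count

ESC = "ESC"  # hypothetical escape symbol


def huffman_code(freqs: dict) -> dict:
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    tie = count()  # tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in freqs}
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)
        f2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def build_table(symbols: list[int], n_common: int) -> dict:
    """Huffman-code the n_common most frequent symbols plus ESC."""
    counts = Counter(symbols)
    common = dict(counts.most_common(n_common))
    common[ESC] = max(1, sum(c for s, c in counts.items() if s not in common))
    return huffman_code(common)


def encode(symbols: list[int], table: dict, fixed_bits: int = 8) -> str:
    out = []
    for s in symbols:
        if s in table:
            out.append(table[s])
        else:  # rare symbol: escape code followed by a fixed-length field
            out.append(table[ESC] + format(s, f"0{fixed_bits}b"))
    return "".join(out)


def decode(bits: str, table: dict, fixed_bits: int = 8) -> list[int]:
    inverse = {code: sym for sym, code in table.items()}
    out, current, i = [], "", 0
    while i < len(bits):
        current += bits[i]
        i += 1
        if current in inverse:
            sym, current = inverse[current], ""
            if sym == ESC:
                out.append(int(bits[i:i + fixed_bits], 2))
                i += fixed_bits
            else:
                out.append(sym)
    return out


if __name__ == "__main__":
    data = [0] * 50 + [1] * 20 + [2] * 10 + [37, 99, 5]
    table = build_table(data, n_common=3)
    bits = encode(data, table)
    print(table, len(bits), "bits")
```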
Block motion estimation
[Figure: macroblocks numbered 1 through 6, shown in two frames.]
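Motion estimation, cont’d.
Two-dimensional correlation of a 16 x 16 macroblock within the search range.
Best fit: minimum sum of absolute differences, $\sum |p_b - p_s|$, between macroblock pixels $p_b$ and search-area pixels $p_s$.
The result is a motion vector that gives the displacement of the macroblock within the search area.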
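The best-fit criterion can be sketched as an exhaustive (full) search. This minimal pure-Python sketch assumes that frames are lists of rows of luminance values, that the search range is ±7 (a hypothetical choice), and that the sum of absolute differences is the matching cost.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, bx, by, n=16, p=7):
    """Exhaustive block matching over displacements in [-p, p] x [-p, p].

    Returns (dx, dy, best_sad); candidates that leave the frame are skipped.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    size = 48
    ref = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
    # Current frame = reference content moved so that the true vector is (-3, +2).
    cur = [[ref[(y + 2) % size][(x - 3) % size] for x in range(size)] for y in range(size)]
    print(full_search(cur, ref, bx=16, by=16))  # (-3, 2, 0)
```

Full search evaluates (2p + 1)^2 = 225 candidates per macroblock here. The fast algorithms that follow try to reach a similar answer with far fewer.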
Search process
ODFS and PLS algorithms
CBAS and FE2SS
CBAS: center-biased adaptive search.
FE2SS: fast and efficient two-step search.
3SS related algorithms
E3SS (efficient three-step search) differs from N3SS (new three-step search) in that:
1. A small diamond pattern is used instead of a square in the central area.
2. The small diamond is searched with unrestricted steps, rather than the single movement allowed for the small square.
Comparison on the test sequences Coastguard, Football, Salesman, and Suzie, for FS (full search), 3SS, 4SS, N3SS, DS (diamond search), and E3SS:
(1) Large search window (31 x 31): E3SS performs better in terms of MSE and number of search points than any other non-full-search algorithm.
(2) Small search window (15 x 15): E3SS performs similarly to DS and N3SS.
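N3SS and E3SS both refine the classic three-step search (3SS). This sketch implements textbook 3SS only, not N3SS or E3SS. It assumes SAD matching, 16 x 16 blocks, and an initial step of 4, which covers a ±7 range. Like the other fast searches, it can settle in a local minimum of the SAD surface, which full search never does.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def three_step_search(cur, ref, bx, by, n=16, step=4):
    """Classic three-step search: 9 points at spacing 4, then 2, then 1,
    re-centring on the best point each time. Returns (dx, dy, sad, points)."""
    h, w = len(ref), len(ref[0])

    def inside(dx, dy):
        return 0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w

    cx, cy = 0, 0
    best = sad(cur, ref, bx, by, 0, 0, n)
    points = 1
    while step >= 1:
        nx, ny = cx, cy
        for sy in (-step, 0, step):
            for sx in (-step, 0, step):
                if (sx, sy) == (0, 0) or not inside(cx + sx, cy + sy):
                    continue
                cost = sad(cur, ref, bx, by, cx + sx, cy + sy, n)
                points += 1
                if cost < best:
                    best, nx, ny = cost, cx + sx, cy + sy
        cx, cy = nx, ny
        step //= 2
    return cx, cy, best, points


if __name__ == "__main__":
    rng = random.Random(2)
    ref = [[rng.randrange(256) for _ in range(48)] for _ in range(48)]
    cur = [[ref[(y - 4) % 48][(x + 4) % 48] for x in range(48)] for y in range(48)]
    print(three_step_search(cur, ref, 16, 16))
```

With all candidates inside the frame, 3SS evaluates 1 + 8 + 8 + 8 = 25 points, compared with 225 for a ±7 full search.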
4SS related algorithms
4SS (four-step search):
1. Three 5 x 5 search windows and a final 3 x 3 window. The first step uses 9 points, the second and third steps use three or five new points, and the final step uses 8 points.
2. Smaller search window in the first step: 5 x 5 for 4SS vs. 9 x 9 for 3SS-related algorithms.
3. More regular search pattern than N3SS.
4. Similar or worse image quality than N3SS, but fewer search points.
Other 4SS-related algorithms: E4SS.
Average search points: E4SS < 4SS < N3SS < 3SS.
MSE performance is similar to that of N3SS.
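The four-step search outlined above can be sketched under a few assumptions: SAD matching, a ±7 search range, and first-found tie-breaking. Points that have already been evaluated are cached. This is why the second and third steps cost only three or five new points. The step structure follows the outline; the remaining details are assumptions.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def four_step_search(cur, ref, bx, by, n=16, p=7):
    """Four-step search sketch: up to three 5x5 (spacing 2) steps, then a 3x3 step.

    Returns (dx, dy, sad, points_evaluated).
    """
    h, w = len(ref), len(ref[0])
    cache = {}

    def cost(dx, dy):
        if abs(dx) > p or abs(dy) > p:
            return None
        if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
            return None
        if (dx, dy) not in cache:  # revisited points are not re-evaluated
            cache[(dx, dy)] = sad(cur, ref, bx, by, dx, dy, n)
        return cache[(dx, dy)]

    def best_around(cx, cy, spacing):
        best = (cost(cx, cy), cx, cy)
        for sy in (-spacing, 0, spacing):
            for sx in (-spacing, 0, spacing):
                c = cost(cx + sx, cy + sy)
                if c is not None and c < best[0]:
                    best = (c, cx + sx, cy + sy)
        return best

    cx, cy = 0, 0
    for _ in range(3):  # steps 1-3: 5x5 window
        _, nx, ny = best_around(cx, cy, 2)
        if (nx, ny) == (cx, cy):  # minimum at the centre: go to the final step
            break
        cx, cy = nx, ny
    best, cx, cy = best_around(cx, cy, 1)  # final step: 3x3 window
    return cx, cy, best, len(cache)


if __name__ == "__main__":
    rng = random.Random(3)
    ref = [[rng.randrange(256) for _ in range(48)] for _ in range(48)]
    cur = [[ref[(y - 2) % 48][(x + 2) % 48] for x in range(48)] for y in range(48)]
    print(four_step_search(cur, ref, 16, 16))
```

With no motion, the search stops after 9 + 8 = 17 points. A move to a corner of the first window adds 5 new points before the final step.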
MPEG-1/2 frame types
[Figure: I frame at time t, B frames at t+1 and t+2, P frame at t+3.]
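A B frame is predicted from both the preceding and the following anchor (I or P) frame, so the encoder must send that later anchor first. This minimal sketch shows the resulting reordering from display order to coding order. It assumes a simple string of frame types. Appending trailing B frames that have no later anchor is a hypothetical choice.

```python
def coding_order(frame_types: str) -> list[int]:
    """Map display-order frame types (e.g. "IBBPBBP") to display indices in
    the order the frames are coded and transmitted."""
    order: list[int] = []
    pending_b: list[int] = []
    for index, kind in enumerate(frame_types):
        if kind == "B":
            pending_b.append(index)   # wait for the next anchor frame
        elif kind in ("I", "P"):
            order.append(index)       # send the anchor first...
            order.extend(pending_b)   # ...then the B frames displayed before it
            pending_b.clear()
        else:
            raise ValueError(f"unknown frame type {kind!r}")
    order.extend(pending_b)  # hypothetical: trailing B frames with no later anchor
    return order


if __name__ == "__main__":
    print(coding_order("IBBPBBP"))  # [0, 3, 1, 2, 6, 4, 5]
```

For the figure's pattern (I at t, B at t+1 and t+2, P at t+3), the frames are coded in the order t, t+3, t+1, t+2.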