Image Processing Architecture, © 2001-2004 Oleh Tretiak Page 1Lecture 11
ECEC 453
Image Processing Architecture
Lecture 11, 2/19/2004
MPEG and Friends
Oleh Tretiak
Drexel University
Lecture Outline
- Basic Video Coding
- Features of MPEG-1
- Features of H.261
- MPEG-2
- Introduction to MPEG-4
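Picture of Layers
[Figure: the layers of the coded stream. Sequence layer (GOP-1, GOP-2, ... GOP-N), GOP layer (pictures I B B P B B ... P), picture layer (Slice-1, Slice-2, ... Slice-N), slice layer (mb-1, mb-2, ... mb-n), macroblock layer (Y blocks 0 to 3, plus Cr and Cb), block layer]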
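The nesting of layers can be sketched as a set of container types. This is only an illustration of the hierarchy named on the slide, not the bitstream syntax; all class and function names are hypothetical, and the split of a macroblock into four Y blocks plus one Cb and one Cr block is taken from the figure labels.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    component: str  # "Y", "Cb", or "Cr"
    # 64 coefficients of one 8x8 block (all zero in this sketch)
    coefficients: List[int] = field(default_factory=lambda: [0] * 64)


@dataclass
class Macroblock:
    blocks: List[Block]


@dataclass
class Slice:
    macroblocks: List[Macroblock]


@dataclass
class Picture:
    picture_type: str  # "I", "P", or "B"
    slices: List[Slice]


@dataclass
class GroupOfPictures:
    pictures: List[Picture]


@dataclass
class Sequence:
    gops: List[GroupOfPictures]


def make_macroblock() -> Macroblock:
    """Four luminance blocks plus one block per chrominance component."""
    return Macroblock([Block("Y") for _ in range(4)] + [Block("Cb"), Block("Cr")])


def count_blocks(seq: Sequence) -> int:
    """Walk every layer from the sequence down to the blocks."""
    return sum(
        len(mb.blocks)
        for gop in seq.gops
        for pic in gop.pictures
        for sl in pic.slices
        for mb in sl.macroblocks
    )


# Toy sequence: one GOP "IBBP", two slices per picture, three macroblocks per slice.
toy = Sequence([
    GroupOfPictures([
        Picture(t, [Slice([make_macroblock() for _ in range(3)]) for _ in range(2)])
        for t in "IBBP"
    ])
])
print(count_blocks(toy))
```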
Video Compression: Picture Types
Group of Pictures: three types
- I: intraframe coding only
- P: predictive coding
- B: bi-directional coding
[Figure: frames 1 to 8 labeled with I, P, and B picture types]
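Typical MPEG coding parameters
Typical sequence: IPBBPBBPBBPBBPBB (16 frames)

Picture   Average size (bits)   Compression
I         156000                6.5
P         62000                 16.4
B         15000                 67.6

\[
\text{Compression(GOP)} = \frac{\text{BitsPerFrameU} \times N_{\text{FramesPerGOP}}}{\text{BitsPerCodedGOP}}
\]
\[
\text{BitsPerCodedGOP} = N_{\text{Iframes}} \times (\text{Bits/Iframe}) + N_{\text{Pframes}} \times (\text{Bits/Pframe}) + N_{\text{Bframes}} \times (\text{Bits/Bframe})
\]
\[
\text{Bits/Iframe} = \frac{\text{BitsPerFrameU}}{C_I}, \quad
\text{Bits/Pframe} = \frac{\text{BitsPerFrameU}}{C_P}, \quad
\text{Bits/Bframe} = \frac{\text{BitsPerFrameU}}{C_B}
\]
\[
\text{Compression(GOP)} = \frac{N_{\text{FramesPerGOP}}}{N_{\text{Iframes}}/C_I + N_{\text{Pframes}}/C_P + N_{\text{Bframes}}/C_B}
= \frac{16}{1/6.5 + 5/16.4 + 10/67.6} = 26.4
\]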
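The GOP compression figure can be checked numerically. The sketch below counts the picture types in the typical sequence and applies the formula above; the function and variable names are illustrative, and the per-type compression ratios are the ones from the table.

```python
from collections import Counter


def gop_compression(pattern: str, ratios: dict) -> float:
    """Frames per GOP divided by the sum of N_type / C_type over the picture types."""
    counts = Counter(pattern)
    return len(pattern) / sum(n / ratios[t] for t, n in counts.items())


PATTERN = "IPBBPBBPBBPBBPBB"                 # 1 I, 5 P, 10 B
RATIOS = {"I": 6.5, "P": 16.4, "B": 67.6}    # C_I, C_P, C_B

gop_ratio = gop_compression(PATTERN, RATIOS)
print(f"{gop_ratio:.1f}")  # 26.4
```

The B frames are the most numerous, yet the single I frame and the five P frames account for about three quarters of the coded bits, so the GOP ratio sits much closer to the P-frame ratio than to the B-frame ratio.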
MPEG2 features
- Schemes for ‘frame’ and field coding
  - There are two fields in a frame, T (top) and B (bottom)
  - Either can be first
- Frame prediction for frame pictures
  - What’s there to say?
- Field prediction for field pictures
  - Target macroblock is in one field
  - Prediction pixels come from one field
  - Can be the same or different parity as target field
- Field prediction for frame pictures
- Dual prime for P-pictures
- 16x8 macroblock for field pictures
- Motion vectors coded at half-pel resolution
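Two of these ideas are easy to show on a toy frame: splitting an interlaced frame into its top and bottom fields, and reading a prediction sample at a half-pel position. This is a minimal sketch with hypothetical function names. It assumes the top field holds the even-numbered rows, and that half-pel samples are the rounded average of the neighboring integer-position pixels; the exact rounding rule is an assumption here, not quoted from the standard.

```python
def split_fields(frame):
    """Return (top, bottom) fields: even-numbered rows and odd-numbered rows."""
    return frame[0::2], frame[1::2]


def half_pel_sample(ref, y2, x2):
    """Sample `ref` at (y2 / 2, x2 / 2), with coordinates given in half-pel units.

    Averages the 1, 2, or 4 surrounding integer-position pixels, rounding half up.
    The caller must keep the position inside the reference picture.
    """
    y0, x0 = y2 // 2, x2 // 2
    y1, x1 = y0 + (y2 % 2), x0 + (x2 % 2)
    total = ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]
    return (total + 2) // 4


frame = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
top, bottom = split_fields(frame)

print(half_pel_sample(top, 0, 1))  # halfway between 10 and 20 in the top field
print(half_pel_sample(top, 1, 1))  # center of 10, 20, 90, 100
```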
MPEG2 - Alternate Scan
[Figure: two coefficient scan patterns, labeled "Zig-zag scan" and "Alternate scan"]
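The zig-zag order can be generated by walking the anti-diagonals of the block and alternating direction. The sketch below does only that; the alternate scan is a fixed table defined in the MPEG-2 standard and is not reproduced here. The function name is illustrative.

```python
def zigzag_order(n: int = 8):
    """Return the (row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diagonal = [(r, s - r) for r in rows]  # listed from top-right to bottom-left
        # Even diagonals are traversed bottom-left to top-right, odd ones the other way.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


scan = zigzag_order()
print(scan[:6])
```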
MPEG-4 Multimedia Standard
Thumbnail Description
What Is Left for MPEG-4?
- Initial goals
  - Coding standards for lower-than-MPEG-1 rates
- Hidden agenda: incorporate new coding methods
  - Wavelet, fractal
- Revised agenda: object-based coding
- MPEG-4 architecture
  - Input to coder consists of audio, video, and stored objects
  - Decoder combines encoded objects with local objects
  - Example: send text by sending character codes; the receiver uses a character generator.
Schematic Overview of MPEG-4
[Figure: block diagram with audio-video objects, an encoder with stored objects, mux and demux blocks, and a decoder with stored objects and a compositor]
MPEG-4 Ideas
- Video Object Plane (VOP)
  - A VOP can be a natural image from a video camera or from a graphics database
  - A VOP can consist of several visual objects.
  - Visual objects do not have to have a rectangular outline (arbitrary shape)
  - A scene consists of several VOs and VOPs with appropriate compositing
  - Different VOPs can have their own motion
- In principle, a visual scene can be decomposed into video objects by segmentation.
- Color and texture can be attributes of visual objects
- A viewer can manipulate VOs.
Animation Objects
- Facial animation
- Body animation
- 2-D animation meshes
2-D Animation Mesh
Sprite coding
[Figure: a background plane and two sprites combined into a composite image]
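Compositing a sprite over a background plane can be sketched with a binary shape mask, which is what lets the object have a non-rectangular outline. This is an illustration of the idea only, not the MPEG-4 compositing process; the function name, the list-of-lists image representation, and the 0/1 mask are assumptions made for the example.

```python
def composite(background, sprite, mask, top, left):
    """Paste `sprite` onto a copy of `background` with its corner at (top, left).

    `mask` is a binary shape mask of the same size as the sprite:
    1 marks a sprite pixel, 0 lets the background show through.
    Sprite pixels that fall outside the background are dropped.
    """
    out = [row[:] for row in background]
    for r, (sprite_row, mask_row) in enumerate(zip(sprite, mask)):
        for c, (pixel, opaque) in enumerate(zip(sprite_row, mask_row)):
            y, x = top + r, left + c
            if opaque and 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = pixel
    return out


background = [[0] * 5 for _ in range(4)]
sprite = [[7, 7], [7, 7]]
mask = [[1, 0], [1, 1]]  # L-shaped object inside a 2x2 bounding box
scene = composite(background, sprite, mask, top=1, left=2)
for row in scene:
    print(row)
```

Because the background is sent once and each sprite carries its own position, a moving object only requires updating `top` and `left` rather than re-coding the whole picture.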
Teleconferencing Standards
- Digital video areas
  - Broadcast television
  - Recorded programs
  - Two-way communications
Review: Action in the Video Arena
- The sponsors: ITU-T SG 15 and ISO/IEC MPEG
- The players: H.x standards and MPEG-x standards
- Standards, ITU-T (Telecom Guys)
  - H.261 (1990)
  - H.263 (draft March 1995)
  - New standards in the works
- Standards, ISO/IEC (Entertainment Video)
  - MPEG family
Review: Video Telephone System
[Figure: H.320 video telephone system, with blocks labeled H.200/AV.250-Series, H.221, and H.261]
Review: H.261 Features
- Common Intermediate Format (CIF)
  - Interoperability between 25 fps and 30 fps countries
  - 352 pix/line, 288 lines, 30 fps noninterlaced
  - Terminal equipment converts frame and line numbers
  - Y Cb Cr components, color sub-sampled by a factor of 2 in both directions
- Coding
  - DCT, 8x8, 4 Y and 2 chrominance blocks per macroblock
  - I and P frames only, P blocks can be skipped
  - Motion compensation optional, only integer compensation
  - (Optional) forward error correction coding
H.324/H.263
- H.324: Like H.320
[Figure: H.324 system blocks: H.261/H.263, G.723.1, H.245 signaling, H.253 and H.234 encryption, H.223]
Parts of H.324
- H.263: Video coding for low rate communications
- G.723.1: Audio and speech for multimedia, 5.3 and 6.3 kbps
- H.223: Multiplexing protocol
- H.245: Control protocol. Can be used to specify standard, LAN, and ATM networks
Features of H.263
- Intended for lower rates than H.261, including 28.8 kbit/sec modem
- Includes QCIF (176 x 144) and sub-QCIF format (128 x 96 in Y channel)
- Optional error correction for mobile channels
- Half-pixel accuracy motion compensation
- Differential encoding of motion vectors
- Improved coding of DCT coefficients
- Optional advanced coding options
  - better SNR at the same rate, lower rate at the same SNR
  - 50% more complex than basic H.261
Picture Formats for H.263
            Image Size
Format      Y             Cb, Cr
sub-QCIF    128 x 96      64 x 48
QCIF        176 x 144     88 x 72
CIF         352 x 288     176 x 144
4CIF        704 x 576     352 x 288
16CIF       1408 x 1152   704 x 576
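The chrominance sizes in the table follow from subsampling by 2 in both directions, and the luminance sizes fix the number of 16 x 16 macroblocks per picture. The sketch below recomputes both, along with the uncompressed size of one frame. It assumes 8 bits per sample, which the slides do not state, and uses "4CIF" for the 704 x 576 format; the names are illustrative.

```python
FORMATS = {
    "sub-QCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}


def describe(width, height, bits_per_sample=8):
    """Return (chroma size, macroblocks per picture, raw bits per frame)."""
    chroma = (width // 2, height // 2)            # subsampled by 2 in both directions
    macroblocks = (width // 16) * (height // 16)  # one macroblock covers 16 x 16 Y pixels
    samples = width * height + 2 * chroma[0] * chroma[1]  # Y + Cb + Cr
    return chroma, macroblocks, samples * bits_per_sample


for name, (w, h) in FORMATS.items():
    chroma, mbs, bits = describe(w, h)
    print(f"{name:9s} Y {w}x{h}  Cb,Cr {chroma[0]}x{chroma[1]}  {mbs} macroblocks  {bits} bits/frame")
```

Under these assumptions a raw QCIF frame is about 304 kbit, so even a few frames per second over a 28.8 kbit/sec modem calls for compression ratios in the tens to hundreds.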
All JPEG, ~ 12 Kbytes
[Figure: four JPEG-coded images at 551x369, 389x261, 231x155, and 327x219 pixels]
Experimental Procedure
- Original image subsampled (using Photoshop®) to various resolutions (pixel number from max to max/8)
- Each subsampled image JPEG coded to various quality levels with Matlab®
- A group of images with ~ 12 Kbytes per image is compared
- Result: Subsampling + JPEG coding is better, at given total bits, than just JPEG coding
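One way to see why subsampling helps at a fixed file size is to look at how many bits each pixel receives. The sketch below uses the four image sizes shown with the ~ 12 Kbyte JPEG comparison; taking 12 Kbytes as 12 x 1024 bytes is an assumption, and the names are illustrative.

```python
BUDGET_BYTES = 12 * 1024  # "~ 12 Kbytes"; reading K as 1024 is an assumption
SIZES = [(551, 369), (389, 261), (327, 219), (231, 155)]


def bits_per_pixel(width, height, budget_bytes=BUDGET_BYTES):
    """Average number of coded bits available per pixel at a fixed file size."""
    return 8 * budget_bytes / (width * height)


for w, h in SIZES:
    print(f"{w}x{h}: {w * h} pixels, {bits_per_pixel(w, h):.2f} bits/pixel")
```

The largest image has under half a bit per pixel, where JPEG blocking artifacts are typically severe, while the smallest has several times as many bits per pixel, so the quantization can be much finer before the image is scaled back up for display.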
Future of Low-Rate Video
- Solution looking for a user
- ‘Picturephone’ - not popular
  - Liked by inventors, surveys of the public less than enthusiastic
- Videoconferencing: some success, but limited acceptance
- What is needed to make it successful?
Video Coding Trials
- MPEG-1 encoder
  - http://bmrc.berkeley.edu/frame/research/mpeg/mpeg_encode.html
- Set encoder parameters
  - Picture sequence
  - Motion compensation search range
  - Motion compensation algorithm
  - Quantizer parameters for I, P, B
- Three trials

  Sequence     Quantizer (I, P, B)   Bits/sec
  ibbpbbpbbp   8, 10, 25             795096
  ibbpbbpbbp   31, 31, 31            311856
  ippppppppp   31, 31, 31            209952
‘High quality’
PATTERN: ibbpbbpbbp
RANGE: +/-10, HALF
PSEARCH: LOGARITHMIC, BSEARCH: CROSS2
QSCALE: I=8, P=10, B=25
I FRAME SUMMARY
  Blocks: 330 (94083 bits) (285 bpb)
  Compression: 21:1 (1.1150 bpp)
P FRAME SUMMARY
  I Blocks: 89 (19554 bits) (219 bpb)
  P Blocks: 890 (111443 bits) (125 bpb)
  Skipped: 11
  Compression: 46:1 (0.5182 bpp)
B FRAME SUMMARY
  I Blocks: 1 (148 bits) (148 bpb)
  B Blocks: 1883 (38486 bits) (20 bpb)
  B types: 173 (14 bpb) forw, 291 (15 bpb) back, 1419 (22 bpb) bi
  Skipped: 96
  Compression: 309:1 (0.0775 bpp)
Total Compression: 76:1 (0.3137 bpp)
795096 bits/sec @ 30 fps
Show MPEG
‘Low Quality’
QSCALE: 31 31 31
Compression 195:1 (0.1230 bpp)
Total Frames Per Second: 0.714286 (235 mi per frame)
CPU Time: 1.388889 fps (458 mips)
Total Output Bit Rate (30 fps): 311856 bits/sec
Show movie
All P frames
Sequence: ippppppppp
QSCALE: 31 31 31
P FRAME SUMMARY
  I Blocks: 75 (6641 bits) (88 bpb)
  P Blocks: 1559 (39500 bits) (25 bpb)
  Skipped: 1336
Total Compression: 289:1 (0.0828 bpp)
Total Frames Per Second: 1.428571 (471 mi/frame)
CPU Time: 2.702703 fps (891 mips)
Total Output Bit Rate (30 fps): 209952 bits/sec
Show video
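The totals reported for the three trials can be cross-checked from the bits-per-pixel figures. The sketch below assumes a 352 x 240 picture, inferred from the 330 macroblocks per I frame (22 x 15), and assumes the encoder quotes compression relative to 24 bits per pixel; neither is stated on the slides. The names are illustrative.

```python
WIDTH, HEIGHT = 352, 240  # assumption: inferred from 330 macroblocks per frame
FPS = 30
SOURCE_BPP = 24           # assumption: compression is quoted against 24 bits/pixel

# trial name -> (reported bpp, reported bits/sec)
TRIALS = {
    "high quality, ibbpbbpbbp, Q = 8/10/25": (0.3137, 795096),
    "low quality, ibbpbbpbbp, Q = 31/31/31": (0.1230, 311856),
    "all P, ippppppppp, Q = 31/31/31": (0.0828, 209952),
}


def compression_ratio(bpp):
    """Compression relative to the assumed source depth."""
    return SOURCE_BPP / bpp


def bit_rate(bpp):
    """Bits per second implied by the average bits per pixel."""
    return bpp * WIDTH * HEIGHT * FPS


for name, (bpp, reported) in TRIALS.items():
    print(f"{name}: {compression_ratio(bpp):.1f}:1, "
          f"{bit_rate(bpp):.0f} bits/sec (reported {reported})")
```

Under these assumptions the computed ratios truncate to the reported 76:1, 195:1, and 289:1, and the computed bit rates agree with the reported ones to within rounding of the bpp figures.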
Digital Versatile Disk
Digital Video (Versatile?) Disc (DVD) is a medium for the distribution of 4.7 to 17 billion bytes of digital data on a 120-mm (4.75 inch) disc. This huge volume of data (today's CD can store 680 million bytes of data) can be used to store up to nine hours of studio quality video and multi-channel surround-sound audio, highly interactive multimedia computer programs, 30 hours of CD-quality audio, or anything else that can be represented as digital data.
Same size as CD (compact disc)
Physical parameters
CD: 1.6 µm track spacing, 0.83 µm bit spacing
DVD: 0.74 µm track spacing, 0.5 µm bit spacing
DVD: Thickness profile
Comparison: DVD vs. CD
                            DVD        CD
Diameter                    120 mm     120 mm
Thickness                   0.6 mm     1.2 mm
Track Pitch                 0.74 µm    1.6 µm
Minimum Pit Length          0.40 µm    0.834 µm
Laser Wavelength            640 nm     780 nm
Data Capacity (per layer)   4.7 GB     0.68 GB
Layers                      1, 2, 4    1
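The table allows a rough check of where the capacity gain comes from. The sketch below compares the area taken by one minimum-length pit on one track (track pitch times minimum pit length) with the ratio of per-layer capacities. Treating that product as a stand-in for areal density is a simplification, and the names are illustrative.

```python
cd = {"track_pitch_um": 1.6, "min_pit_um": 0.834, "capacity_gb": 0.68}
dvd = {"track_pitch_um": 0.74, "min_pit_um": 0.40, "capacity_gb": 4.7}


def geometric_gain(old, new):
    """Ratio of (track pitch x minimum pit length) between two formats."""
    old_cell = old["track_pitch_um"] * old["min_pit_um"]
    new_cell = new["track_pitch_um"] * new["min_pit_um"]
    return old_cell / new_cell


geometry = geometric_gain(cd, dvd)
capacity = dvd["capacity_gb"] / cd["capacity_gb"]

print(f"geometric gain: {geometry:.1f}x")            # about 4.5x
print(f"capacity gain:  {capacity:.1f}x")            # about 6.9x
print(f"unexplained:    {capacity / geometry:.2f}x")
```

Tighter pits and tracks account for roughly 4.5x of the 6.9x capacity gain. The remaining factor of about 1.5 must come from something the table does not list, such as lower coding and error-correction overhead or a larger recorded area.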
DVD production
DVD Player to Replace VHS
Estimated production cost: $3.50 VHS, $1.00 DVD
Next Generation ‘DVD’
Consortium sets new DVD standard (Blu-Ray)
20 February 2002
By using a 405 nm semiconductor laser, the new video-recording format enables 27 (23?) Gbyte - equivalent to thirteen hours of TV broadcasting - to be contained on a single-sided, single-layer 12 cm DVD.
Increased recording density is achieved using a 0.85 numerical aperture lens in combination with the 405 nm laser. A 0.1 mm optical transmittance protection layer is also used to minimize aberration caused by disc-tilt and give a better readout.
The companies involved are: Hitachi, LG Electronics, Matsushita, Pioneer, Philips, Samsung, Sharp, Sony and Thomson Multimedia. Notably absent from the consortium are Toshiba, one of the first companies to commercialize DVDs, and JVC which has a vested interest in the conventional video format.
News
Nine Blu-ray Disc Founder Companies Begin Licensing of Disc
February 14, 2003 (9:32 a.m. EST) /PRNewswire-FirstCall/ -- Hitachi, Ltd., LG Electronics Inc., Matsushita Electric Industrial Co., Ltd., Pioneer Corporation, Royal Philips Electronics, Samsung Electronics Co. Ltd., Sharp Corporation, Sony Corporation, and Thomson today announced the start of licensing of the rewritable format of "Blu-ray Disc", the large capacity optical disc utilizing blue-violet laser. Licensing will commence as of February 17, 2003. The introduction of products based on "Blu-ray Disc", the first optical disc format capable of recording High Definition broadcasts, will enable the enjoyment of even greater picture quality within the home.
http://www.blu-ray.com/
DVD(?) Format War
HD DVD
HD-DVD, also known as AOD (Advanced Optical Disc), is the name of a competing next-generation optical disc format developed by Toshiba and NEC. The format is similar to Blu-ray and also utilizes blue-laser technology to achieve a higher storage capacity. The rewritable versions of the discs will be able to hold 20GB on a single-layer disc and 32GB on a dual-layer disc, while the read-only discs will only be able to hold 15GB on a single-layer disc and 30GB on a dual-layer disc. The read-only version of the format has been approved by the DVD Forum as the successor to the current DVD technology.
Comparison
Parameters                BD             DVD            HD-DVD
Recording capacity        27GB           4.7GB          20GB
Number of layers          single-layer   single-layer   single-layer
Laser wavelength          405nm          650nm          405nm
Numerical aperture (NA)   0.85           0.60           0.65
Protection layer          0.1mm          0.6mm          0.6mm
Data transfer rate        36Mbps         11Mbps         36Mbps
Video compression         MPEG-2         MPEG-2         MPEG-4 AVC
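The wavelength and numerical aperture rows largely explain the capacity row. The focused spot diameter scales roughly as wavelength / NA, so the achievable areal density scales as (NA / wavelength) squared. The sketch below applies that first-order optics rule to the table values; it ignores coding and format differences, and the names are illustrative.

```python
# name -> (laser wavelength in nm, numerical aperture, capacity in GB)
DISCS = {
    "BD": (405, 0.85, 27),
    "DVD": (650, 0.60, 4.7),
    "HD-DVD": (405, 0.65, 20),
}


def density_gain(disc, reference="DVD"):
    """Areal density gain predicted by spot size alone: (NA / wavelength) squared."""
    wavelength, na, _ = DISCS[disc]
    ref_wavelength, ref_na, _ = DISCS[reference]
    return ((na / wavelength) / (ref_na / ref_wavelength)) ** 2


for name, (_, _, capacity) in DISCS.items():
    actual = capacity / DISCS["DVD"][2]
    print(f"{name:7s} predicted {density_gain(name):.2f}x, capacity ratio {actual:.2f}x")
```

For BD the spot-size estimate (about 5.2x) is close to the 27GB / 4.7GB capacity ratio (about 5.7x). For HD-DVD the estimate (about 3.0x) falls short of the 20GB figure in the table, which suggests that format relies on more than optics to reach its capacity.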
Red Laser
- Pixonics Inc.
- Backward-compatible technology (new disc plays on standard DVD player).
- Pixonics boasts that 3.5 hours of high-definition programming can be stored on a DVD-9 disc with a 9 gigabyte capacity.
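The playing-time claims on the disc slides translate directly into average bit rates. The sketch below takes the capacity and duration quoted for each format and computes the implied rate for video and audio together, assuming 1 GB means 10^9 bytes. The names are illustrative.

```python
def average_mbps(gigabytes, hours):
    """Average bit rate in Mbit/s if the disc is filled by a program of this length."""
    bits = gigabytes * 1e9 * 8  # assumption: 1 GB = 10**9 bytes
    return bits / (hours * 3600) / 1e6


CLAIMS = {
    "DVD, 17 GB, 9 hours": (17, 9),
    "Blu-ray, 27 GB, 13 hours": (27, 13),
    "Pixonics DVD-9, 9 GB, 3.5 hours": (9, 3.5),
}

for name, (gb, hours) in CLAIMS.items():
    print(f"{name}: {average_mbps(gb, hours):.1f} Mbit/s")
```

The Pixonics figure works out to under 6 Mbit/s for high-definition material, only modestly above the rates implied by the DVD and Blu-ray claims, which indicates how much the red-laser approach depends on a more efficient video codec rather than on disc capacity.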