Multi media chapter 1_2_3

Page 1: Multi media chapter 1_2_3

4/2/2003 Nguyen Chan Hung – Hanoi University of Technology 1

Multimedia Technology
- Overview
  - Introduction
  - Chapter 1: Background of compression techniques
  - Chapter 2: Multimedia technologies
    - JPEG
    - MPEG-1/MPEG-2 Audio & Video
    - MPEG-4
    - MPEG-7 (brief introduction)
    - HDTV (brief introduction)
    - H.261/H.263 (brief introduction)
    - Model-based coding (MBC) (brief introduction)
  - Chapter 3: Some real-world systems
    - CATV systems
    - DVB systems
  - Chapter 4: Multimedia Network

Introduction
- The importance of multimedia technologies: multimedia is everywhere!
  - On PCs:
    - RealPlayer, QuickTime, Windows Media.
    - Music and video are freely available on the Internet (mp2, mp3, mp4, asf, mpeg, mov, ra, ram, mid, DivX, etc.)
    - Video/audio conferencing.
    - Webcast / streaming applications.
    - Distance learning (tele-education).
    - Tele-medicine.
    - Tele-anything (let's imagine!).
  - On TVs and other home electronic devices:
    - DVB-T/DVB-C/DVB-S (Digital Video Broadcasting - Terrestrial/Cable/Satellite) shows the superior quality of MPEG-2 over traditional analog TV.
    - Interactive TV: Internet applications (mail, web, e-commerce) on a TV, with no need to wait for a PC to start up and shut down.
    - CD/VCD/DVD/MP3 players.
  - Also appearing in handheld devices (3G mobile phones, wireless PDAs).

Introduction (2)
- Multimedia networks
  - The Internet was designed in the 1960s for low-speed inter-networks carrying plain textual applications → high delay, high jitter.
  - Multimedia applications therefore require drastic modifications of the Internet infrastructure.
  - Many frameworks have been investigated and deployed to support the next-generation multimedia Internet (e.g. IntServ, DiffServ).
  - In the future, all TVs (and PCs) will be connected to the Internet and freely tuned to any of millions of broadcast stations all over the world.
  - At present, multimedia networks run over ATM (almost obsolete) and IPv4; in the future, IPv6 should guarantee QoS (Quality of Service).

Chapter 1: Background of compression techniques
- Why compression?
  - For communication: reduce bandwidth in multimedia network applications such as streaming media, Video-on-Demand (VOD), and Internet telephony.
  - For digital storage (VCD, DVD, tape, etc.): reduce size and cost, increase media capacity and quality.
- Compression factor (or compression ratio)
  - The ratio between the size of the source data and the size of the compressed data (e.g. 10:1).
- Two types of compression:
  - Lossless compression
  - Lossy compression

Page 2: Multi media chapter 1_2_3

Information content and redundancy
- Information rate
  - Entropy is the measure of information content, expressed in bits per source output unit (such as bits/pixel).
  - The more information in the signal, the higher the entropy.
  - Lossy compression reduces entropy, while lossless compression does not.
- Redundancy
  - The difference between the information rate and the bit rate.
  - Usually the information rate is much less than the bit rate.
  - Compression works by eliminating this redundancy.
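The entropy measure described above can be computed directly from symbol statistics. A minimal Python sketch (the helper name `entropy_bits_per_symbol` is ours, not from the slides):

```python
from collections import Counter
from math import log2

def entropy_bits_per_symbol(data):
    """Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A flat 8-bit image region (one repeated value) carries no information,
flat = [128] * 64
# while a region of 64 equally likely distinct values needs log2(64) = 6 bits/pixel.
busy = list(range(64))
print(entropy_bits_per_symbol(flat))  # 0.0
print(entropy_bits_per_symbol(busy))  # 6.0
```

The gap between this figure and the actual bits spent per pixel is the redundancy that compression removes.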

Lossless Compression
- The data from the decoder is identical to the source data.
  - Example: archives produced by utilities such as pkzip or gzip.
  - The compression factor is around 2:1.
- Lossless coding cannot guarantee a fixed compression ratio → the output data rate is variable → problems for recording mechanisms or communication channels.
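Both properties above (exact reconstruction, data-dependent ratio) can be seen with the standard-library `zlib` codec:

```python
import zlib
import os

# Lossless round trip: decoder output is bit-identical to the source.
source = b"AAAAABBBCC" * 100
packed = zlib.compress(source)
assert zlib.decompress(packed) == source  # identical, by definition of lossless

# The ratio depends entirely on the data: repetitive input shrinks a lot,
# while already-random input may not shrink at all.
random_data = os.urandom(1000)
print(len(source) / len(packed))                            # large ratio
print(len(random_data) / len(zlib.compress(random_data)))   # roughly 1:1
```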

Lossy Compression
- The data from the expander is not identical to the source data, but the difference cannot be distinguished aurally or visually.
  - Suitable for audio and video compression.
  - The compression factor is much higher than that of lossless coding (up to 100:1).
- Based on an understanding of psychoacoustic and psychovisual perception.
- Can be forced to operate at a fixed compression factor.

Process of Compression
- Communication (reduces the cost of the data link):
  - Data → compressor (coder) → transmission channel → expander (decoder) → Data'
- Recording (extends playing time in proportion to the compression factor):
  - Data → compressor (coder) → storage device (tape, disk, RAM, etc.) → expander (decoder) → Data'

Page 3: Multi media chapter 1_2_3

Sampling and quantization
- Why sampling?
  - Computers cannot process analog signals directly.
- PCM
  - Sample the analog signal at a constant rate and use a fixed number of bits (usually 8 or 16) to represent each sample.
  - bit rate = sampling rate × number of bits per sample
- Quantization
  - Maps the sampled analog signal (generally of infinite precision) to discrete levels (finite precision).
  - Each discrete level is represented with a number.

Predictive coding
- Prediction
  - Use previous sample(s) to estimate the current sample.
  - For most signals, the difference between the predicted and actual values is small → we can use a smaller number of bits to code the difference while maintaining the same accuracy!
  - Noise is completely unpredictable.
    - Most codecs require the data to be preprocessed; otherwise they may perform badly when the data contains noise.
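The simplest predictor uses the previous sample directly (delta coding). A sketch showing that a slowly varying signal produces small residuals even when the sample values themselves are large:

```python
def delta_encode(samples):
    """Code each sample as its difference from the previous sample (predictor = previous value)."""
    prev, out = 0, []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out

def delta_decode(residuals):
    prev, out = 0, []
    for d in residuals:
        prev += d
        out.append(prev)
    return out

signal = [1000, 1002, 1003, 1005, 1004, 1006]
residuals = delta_encode(signal)
print(residuals)  # [1000, 2, 1, 2, -1, 2]
assert delta_decode(residuals) == signal  # prediction is exactly invertible
```

After the first sample, every residual fits in a few bits, while the raw samples need 10 or more; noisy input would break this, which is why preprocessing matters.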

Statistical coding: the Huffman code
- Assigns short codes to the most probable data patterns and long codes to the less frequent data patterns.
- Bit assignment is based on the statistics of the source data.
- The statistics of the data must therefore be known prior to the bit assignment.
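The bit-assignment step can be sketched with the classic two-smallest-weights merge (a minimal Python Huffman builder; `huffman_codes` is our illustrative name):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman code table from the symbol statistics of `data`."""
    freq = Counter(data)
    # Heap entries: (weight, tiebreak, tree); a tree is a symbol or a (left, right) pair.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    n = len(heap)
    if n == 1:                      # degenerate single-symbol source
        return {heap[0][2]: "0"}
    while len(heap) > 1:            # repeatedly merge the two least probable trees
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, n, (t1, t2)))
        n += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("aaaaaaabbbccd")
print(codes["a"])  # the most frequent symbol gets the shortest code word
```

Note that the whole input must be scanned (or its statistics known) before any code word is assigned, which is exactly the limitation stated above.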

Drawbacks of compression
- Sensitivity to data errors
  - Compression eliminates the redundancy that is essential to making data resistant to errors.
- Concealment required for real-time applications
  - An error correction code is required, which adds redundancy back to the compressed data.
- Artifacts
  - Artifacts appear when the coder eliminates part of the entropy.
  - The higher the compression factor, the more visible the artifacts.

Page 4: Multi media chapter 1_2_3

A coding example: clustering color pixels
- In an image, pixel values are clustered in several peaks.
- Each cluster represents the color range of one object in the image (e.g. blue sky).
- Coding process:
  1. Separate the pixel values into a limited number of data clusters (e.g. clustered pixels of sky blue or grass green).
  2. Send the average color of each cluster and an identifying number for each cluster as side information.
  3. Transmit, for each pixel:
     - The number of the average cluster color that it is close to.
     - Its difference from that average cluster color (which can itself be coded to reduce redundancy, since the differences are often similar!) → prediction.

Frame-Differential Coding
- Frame-differential coding (FDC) = prediction from a previous video frame.
- A video frame is stored in the encoder for comparison with the present frame → causes an encoding latency of one frame time.
- For still images:
  - Data needs to be sent only for the first instance of a frame.
  - All subsequent prediction error values are zero.
  - The frame is retransmitted occasionally to give receivers that have just been turned on a starting point.
- FDC reduces the information for still images, but leaves significant data for moving images (e.g. a movement of the camera).
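The prediction error for frame-differential coding is just a per-pixel subtraction; a tiny sketch over 1-D "frames" of pixel values:

```python
def frame_difference(prev, cur):
    """Prediction error when each pixel is predicted from the previous frame."""
    return [c - p for c, p in zip(cur, prev)]

still = [10, 20, 30, 40]
print(frame_difference(still, still))  # [0, 0, 0, 0] -- still image: nothing to send
moved = [20, 30, 40, 50]
print(frame_difference(still, moved))  # nonzero everywhere the image changed
```

The second case is why camera motion defeats plain FDC: every pixel changes, so the residual carries almost as much data as the frame itself, motivating motion compensation below.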

Motion Compensated Prediction
- More data can be eliminated from frame-differential coding by comparing the present pixel to the location of the same object in the previous frame (not to the same spatial location in the previous frame).
- The encoder estimates the motion in the image to find the corresponding area in a previous frame.
- The encoder searches for a portion of a previous frame which is similar to the part of the new frame to be transmitted.
- It then sends (as side information) a motion vector telling the decoder what portion of the previous frame it will use to predict the new frame.
- It also sends the prediction error so that the exact new frame may be reconstituted.
- The top figure shows prediction without motion compensation; the bottom figure shows prediction with motion compensation.

Unpredictable Information
- Information that cannot be predicted from the previous frame:
  1. Scene changes (e.g. the background landscape changes).
  2. Newly uncovered information, due to object motion across a background or at the edges of a panned scene (e.g. a soccer player's face uncovered by a flying ball).

Page 5: Multi media chapter 1_2_3

Dealing with unpredictable information
- Scene changes
  - An intra-coded picture (MPEG I picture) must be sent as a starting point → requires more data than a predicted picture (P picture).
  - I pictures are sent about twice per second → their timing and sending frequency may be adjusted to accommodate scene changes.
- Uncovered information
  - Handled by the bi-directionally coded type of picture, the B picture.
  - There must be enough frame storage in the system to wait for the later picture that has the desired information.
  - To limit the amount of decoder memory, the encoder stores pictures and sends the required reference pictures before sending the B picture.

Transform Coding
- Converts spatial image pixel values to transform coefficient values.
- The number of coefficients produced is equal to the number of pixels transformed.
- A few coefficients contain most of the energy in a picture → the coefficients may be further coded by lossless entropy coding.
- The transform process concentrates the energy into particular coefficients (generally the "low frequency" coefficients).

Types of picture transform coding
- Types of picture transforms:
  - Discrete Fourier (DFT)
  - Karhunen-Loeve
  - Walsh-Hadamard
  - Lapped orthogonal
  - Discrete Cosine (DCT) → used in MPEG-2!
  - Wavelets → new!
- The differences between transform coding methods:
  - The degree of concentration of energy in a few coefficients.
  - The region of influence of each coefficient in the reconstructed picture.
  - The appearance and visibility of coding noise due to coarse quantization of the coefficients.

DCT Lossy Coding
- Lossless coding cannot achieve a high compression ratio (4:1 or less).
- Lossy coding = discarding selected information so that the reproduction is visually or aurally indistinguishable from the source, or has the fewest artifacts.
- Lossy coding can be achieved by:
  - Eliminating some DCT coefficients, or
  - Adjusting the quantizing coarseness of the coefficients → better!

Page 6: Multi media chapter 1_2_3

Masking
- Masking makes certain types of coding noise invisible or inaudible due to psychovisual/psychoacoustic effects.
  - In audio, a pure tone masks energy at higher frequencies and also at lower frequencies (with a weaker effect).
  - In video, high-contrast edges mask random noise.
- Coders are designed so that the noise introduced at low bit rates falls in the frequency, spatial, or temporal regions where it is masked.

Variable quantization
- Variable quantization is the main technique of lossy coding → greatly reduces the bit rate.
- Coarsely quantizes the less significant coefficients in a transform (those that are less noticeable: low energy, less visible or audible).
- Can be applied to a complete signal or to individual frequency components of a transformed signal.
- Variable quantization also controls the instantaneous bit rate in order to:
  - Match the average bit rate to a constant channel bit rate.
  - Prevent buffer overflow or underflow.

Run-Level coding
- "Run-level" coding = coding a run length of zeros followed by a nonzero level.
  - Instead of sending all the zero values individually, the length of the run is sent.
  - Useful for any data with long runs of zeros.
  - Run lengths are easily encoded with a Huffman code.
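The (run, level) pairing can be sketched in a few lines; the `"EOB"` marker here plays the role of MPEG's End Of Block symbol (a simplified stand-in, not the actual bitstream syntax):

```python
def run_level_encode(values):
    """Emit (run of zeros, nonzero level) pairs, with an EOB marker at the end."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")  # any trailing zeros are implied by End Of Block
    return pairs

def run_level_decode(pairs, length):
    out = []
    for p in pairs:
        if p == "EOB":
            break
        run, level = p
        out.extend([0] * run)
        out.append(level)
    out.extend([0] * (length - len(out)))
    return out

seq = [78, -1, 1, -4, 0, 0, 0, 5] + [0] * 56
enc = run_level_encode(seq)
print(enc)  # [(0, 78), (0, -1), (0, 1), (0, -4), (3, 5), 'EOB']
assert run_level_decode(enc, 64) == seq
```

The 56 trailing zeros cost nothing beyond the EOB marker, which is where the big savings on quantized DCT blocks come from.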

Key points:
- Compression process
- Quantization and sampling
- Coding:
  - Lossless and lossy coding
  - Frame-differential coding
  - Motion compensated prediction
  - Variable quantization
  - Run-level coding
- Masking

Page 7: Multi media chapter 1_2_3

Chapter 2: Multimedia technologies
- Roadmap:
  - JPEG
  - MPEG-1/MPEG-2 Video
  - MPEG-1 Layer 3 Audio (mp3)
  - MPEG-4
  - MPEG-7 (brief introduction)
  - HDTV (brief introduction)
  - H.261/H.263 (brief introduction)
  - Model-based coding (MBC) (brief introduction)

JPEG (Joint Photographic Experts Group)
- JPEG encoder:
  - Partitions the image into blocks of 8 × 8 pixels.
  - Calculates the Discrete Cosine Transform (DCT) of each block.
  - A quantizer rounds off the DCT coefficients according to the quantization matrix → lossy, but allows for large compression ratios.
  - Produces a series of DCT coefficients using zig-zag scanning.
  - Applies a variable length code (VLC) to these DCT coefficients.
  - Writes the compressed data stream to an output file (*.jpg or *.jpeg).
- JPEG decoder:
  - File → input data stream → variable length decoder → IDCT (inverse DCT) → image.

JPEG – Zig-zag scanning
- (Figure: the zig-zag order in which the 64 DCT coefficients of an 8 × 8 block are scanned.)
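The zig-zag order from the figure can be generated programmatically: walk the anti-diagonals of the block, alternating direction on each one. A compact Python sketch:

```python
def zigzag_order(n=8):
    """Zig-zag visiting order for an n x n coefficient block: anti-diagonal by
    anti-diagonal, alternating direction, so low-frequency coefficients come
    first and the high-frequency zeros cluster at the end of the scan."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Primary key: which anti-diagonal (r + c). Secondary key: column,
        # ascending on even diagonals and descending on odd ones.
        key=lambda rc: (rc[0] + rc[1],
                        rc[1] if (rc[0] + rc[1]) % 2 == 0 else -rc[1]),
    )

order = zigzag_order(8)
print(order[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Reading a quantized coefficient block in this order is what turns its lower-right region of zeros into one long run for run-level coding.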

JPEG - DCT
- The DCT is similar to the Discrete Fourier Transform → it transforms a signal or image from the spatial domain to the frequency domain.
- The DCT requires fewer multiplications than the DFT.
- Input image A:
  - The input image A is N2 pixels wide by N1 pixels high.
  - A(i,j) is the intensity of the pixel in row i and column j.
- Output image B:
  - B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix.

Page 8: Multi media chapter 1_2_3

JPEG - Quantization Matrix
- The quantization matrix is the 8 × 8 matrix of step sizes (sometimes called quantums), one element for each DCT coefficient.
- It is usually symmetric.
- Step sizes are:
  - Small in the upper left (low frequencies),
  - Large toward the lower right (high frequencies).
  - A step size of 1 is the most precise.
- The quantizer divides each DCT coefficient by its corresponding quantum, then rounds to the nearest integer.
- Large quantums drive small coefficients down to zero.
- The result:
  - Many high-frequency coefficients become zero → easily removed.
  - The low-frequency coefficients undergo only minor adjustment.
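The divide-and-round step is simple enough to sketch directly. A toy 2 × 2 example (real JPEG uses 8 × 8 matrices; the values below are illustrative, not from any standard table):

```python
def quantize(block, qmatrix):
    """Divide each DCT coefficient by its step size and round to the nearest integer."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(block, qmatrix)]

# A small step size in the upper left preserves the low-frequency coefficient;
# large step sizes elsewhere drive the small high-frequency coefficients to zero.
coeffs = [[312, 4],
          [ -9, 3]]
steps  = [[  2, 40],
          [ 40, 60]]
print(quantize(coeffs, steps))  # [[156, 0], [0, 0]]
```

The decoder multiplies back by the same matrix, so the low-frequency term is recovered almost exactly while the discarded ones stay zero: that asymmetry is the whole point of the matrix.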

JPEG Coding process illustrated
- DCT coefficients:

      1255   -15    43    58   -12     1    -4    -6
        11   -65    80   -73   -27    -1    -5     1
       -49    37   -87     8    12     6    10     8
        27   -50    29    13     3    13    -6     5
       -16    21   -11   -10    10   -21     9    -6
         3   -14     0    14   -14    16    -8     4
        -4    -1     8   -13    12    -9     5    -1
        -4     2    -2     6    -7     6    -1     3

- Quantization result (after dividing by the quantization matrix Q and rounding):

        78    -1     4     4    -1     0     0     0
         1    -5     6    -4    -1     0     0     0
        -4     3    -5     0     0     0     0     0
         2    -3     1     0     0     0     0     0
        -1     1     0     0     0     0     0     0
         0     0     0     0     0     0     0     0
         0     0     0     0     0     0     0     0
         0     0     0     0     0     0     0     0

- Zig-zag scan result: 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1, followed by 44 zeros, then EOB.
- → Easily coded by run-length Huffman coding.

MPEG (Moving Picture Experts Group)
- MPEG is at the heart of:
  - Digital television set-top boxes
  - HDTV decoders
  - DVD players
  - Video conferencing
  - Internet video, etc.
- MPEG standards:
  - MPEG-1, MPEG-2, MPEG-4, MPEG-7
  - (The MPEG-3 standard was abandoned and became an extension of MPEG-2.)

MPEG standards
- MPEG-1 (obsolete)
  - A standard for storage and retrieval of moving pictures and audio on storage media.
  - Application: VCD (Video Compact Disc).
- MPEG-2 (widely implemented)
  - A standard for digital television.
  - Applications: DVD (Digital Versatile Disc), HDTV (high-definition TV), DVB (European Digital Video Broadcasting group), etc.
- MPEG-4 (newly implemented, still being researched)
  - A standard for multimedia applications.
  - Applications: Internet, cable TV, virtual studio, etc.
- MPEG-7 (future work, ongoing research)
  - A content representation standard for information search ("Multimedia Content Description Interface").
  - Applications: Internet, video search engines, digital libraries.

Page 9: Multi media chapter 1_2_3

MPEG-2 formal standards
- The international standard ISO/IEC 13818-2, "Generic Coding of Moving Pictures and Associated Audio Information".
- ATSC (Advanced Television Systems Committee) document A/54, "Guide to the Use of the ATSC Digital Television Standard".

MPEG video data structure
- The MPEG-2 video data stream is constructed in layers, from lowest to highest:
  - PIXEL: the fundamental unit.
  - BLOCK: an 8 × 8 array of pixels.
  - MACROBLOCK: 4 luma blocks and 2 chroma blocks (with field DCT coding or frame DCT coding).
  - SLICE: a variable number of macroblocks.
  - PICTURE: a frame (or field) of slices.
  - GROUP OF PICTURES (GOP): a variable number of pictures.
  - SEQUENCE: a variable number of GOPs.
  - PACKETIZED ELEMENTARY STREAM (optional).

Pixel & Block
- Pixel = "picture element".
  - A discrete spatial point sample of an image.
  - A color pixel may be represented digitally as a number of bits for each of three primary color values.
- Block
  - An 8 × 8 array of pixels.
  - The block is the fundamental unit for DCT coding (the discrete cosine transform).

Macroblock
- A macroblock = a 16 × 16 array of luma (Y) pixels (= 4 blocks, in a 2 × 2 block array).
- The number of chroma pixels (Cr, Cb) varies depending on the chroma pixel structure indicated in the sequence header (e.g. 4:2:0).
- The macroblock is the fundamental unit for motion compensation, and has motion vector(s) associated with it if it is predictively coded.
- A macroblock is classified as either:
  - field coded (an interlaced frame consists of 2 fields), or
  - frame coded,
  depending on how the four blocks are extracted from the macroblock.

Page 10: Multi media chapter 1_2_3

Slice
- Pictures are divided into slices.
- A slice consists of an arbitrary number of successive macroblocks (going left to right), but is typically an entire row of macroblocks. A slice does not extend beyond one row.
- The slice header carries address information that allows the Huffman decoder to resynchronize at slice boundaries.

Picture
- A source picture is a contiguous rectangular array of pixels.
- A picture may be a complete frame of video (a "frame picture") or one of the interlaced fields from an interlaced source (a "field picture").
- A field picture does not have any blank lines between its active lines of pixels.
- A coded picture (also called a video access unit) begins with a start code and a header. The header contains:
  - the picture type (I, P, B),
  - temporal reference information,
  - the motion vector search range,
  - optional user data.
- A frame picture consists of:
  - a frame of a progressive source, or
  - a frame (2 spatially interlaced fields) of an interlaced source.

I, P, B Pictures
Encoded pictures are classified into 3 types: I, P, and B.
- I pictures = intra-coded pictures
  - All macroblocks are coded without prediction.
  - Needed to give the receiver a "starting point" for prediction after a channel change, and to recover from errors.
- P pictures = predicted pictures
  - Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.
- B pictures = bi-directionally predicted pictures
  - Macroblocks may be coded with forward prediction from a previous I or P reference.
  - Macroblocks may be coded with backward prediction from the next I or P reference.
  - Macroblocks may be coded with interpolated prediction from past and future I or P references.
  - Macroblocks may be intra coded (no prediction).

Group of pictures (GOP)
- The group of pictures layer is optional in MPEG-2.
- A GOP begins with a start code and a header.
- The header carries:
  - time code information,
  - editing information,
  - optional user data.
- The first encoded picture in a GOP is always an I picture.
- The typical length is 15 pictures, with the following structure (in display order):
  - I B B P B B P B B P B B P B B → provides an I picture with sufficient frequency to allow a decoder to decode correctly.
- (Figure: a GOP timeline showing forward motion compensation from I/P anchors and bidirectional motion compensation for B pictures.)

Page 11: Multi media chapter 1_2_3

Sequence
- A sequence begins with a unique 32-bit start code followed by a header.
- The header carries:
  - the picture size,
  - the aspect ratio,
  - the frame rate and bit rate,
  - optional quantizer matrices,
  - the required decoder buffer size,
  - the chroma pixel structure,
  - optional user data.
- The sequence information is needed for channel changing.
- The sequence length depends on the acceptable channel-change delay.

Packetized Elementary Stream (PES)
- A video elementary stream (video ES) consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence.
- An ES carries only one type of data (video or audio) from a single video or audio encoder.
- A PES consists of a single ES which has been split into packets, each starting with an added packet header.
- A PES stream contains only one type of data from one source, e.g. from one video or audio encoder.
- PES packets have variable length, not corresponding to the fixed length of transport packets, and may be much longer than a transport packet.

Transport stream
- Transport packets (of fixed length) are formed from a PES stream as follows:
  - The PES header is placed after the first transport packet header.
  - Successive transport packets' payloads are filled with the remaining PES packet content until the PES packet is all used.
  - The final transport packet is filled to a fixed length by stuffing with 0xFF bytes (all ones).
- Each PES packet header includes:
  - An 8-bit stream ID identifying the source of the payload.
  - Timing references:
    - PTS (presentation time stamp): the time at which a decoded audio or video access unit is to be presented by the decoder.
    - DTS (decoding time stamp): the time at which an access unit is decoded by the decoder.
    - ESCR (elementary stream clock reference).
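The segmentation step can be sketched in Python. This is a deliberately simplified model, not a compliant ISO/IEC 13818-1 multiplexer: the 4-byte header below carries only the sync byte, the payload_unit_start flag, and the PID (real transport headers also carry continuity counters, scrambling bits, and adaptation fields, and real stuffing normally uses the adaptation field), and the trailing 0xFF fill follows the slide's description:

```python
TS_PACKET_SIZE = 188  # bytes, fixed
SYNC_BYTE = 0x47

def packetize(pes: bytes, pid: int):
    """Split one PES packet into fixed-length 188-byte transport packets
    (simplified sketch; see the caveats in the text above)."""
    payload_size = TS_PACKET_SIZE - 4          # 4 header bytes per packet
    packets = []
    for i in range(0, len(pes), payload_size):
        chunk = pes[i:i + payload_size]
        start = 1 if i == 0 else 0             # payload_unit_start_indicator
        header = bytes([SYNC_BYTE,
                        (start << 6) | ((pid >> 8) & 0x1F),
                        pid & 0xFF,
                        0x10])                 # "payload only" flag byte
        chunk += b"\xff" * (payload_size - len(chunk))  # 0xFF stuffing
        packets.append(header + chunk)
    return packets

pkts = packetize(b"\x00" * 400, pid=0x100)
print(len(pkts), [len(p) for p in pkts])  # 3 packets of 188 bytes each
```

A 400-byte PES payload needs ceil(400 / 184) = 3 transport packets, and only the last one carries stuffing, matching the scheme described above.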

Intra Frame Coding
- Intra coding is concerned only with information within the current frame, not relative to any other frame in the video sequence.
- The MPEG intra-frame coding block diagram (see bottom figure) is similar to JPEG (→ let's review the JPEG coding mechanism!).
- Basic blocks of the intra-frame coder:
  - Video filter
  - Discrete cosine transform (DCT)
  - DCT coefficient quantizer
  - Run-length amplitude / variable length coder (VLC)

Page 12: Multi media chapter 1_2_3

Video Filter
- The Human Visual System (HVS) is:
  - Most sensitive to changes in luminance,
  - Less sensitive to variations in chrominance.
- MPEG uses the YCbCr color space to represent the data values instead of RGB, where:
  - Y is the luminance signal,
  - Cb is the blue color difference signal,
  - Cr is the red color difference signal.
- What do the "4:4:4", "4:2:0", etc. video formats mean?
  - 4:4:4 is full-bandwidth YCbCr video → each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks → a waste of bandwidth!
  - 4:2:0 is most commonly used in MPEG-2.
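The RGB → YCbCr conversion itself is a fixed linear transform. The sketch below uses the BT.601 full-range coefficients (as in JPEG/JFIF; broadcast MPEG typically uses a studio-range variant with offsets and headroom, so treat this as illustrative):

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> YCbCr: one luminance and two color-difference signals."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

# Neutral gray carries no color information: both difference signals
# sit at the midpoint (128), and all the signal energy is in Y.
y, cb, cr = rgb_to_ycbcr(128, 128, 128)
print(round(y), round(cb), round(cr))  # 128 128 128
```

Because Cb and Cr carry so little perceptually important detail, they can be subsampled (4:2:2, 4:2:0) with little visible loss, which is exactly what the chroma formats above exploit.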

Applications of chroma formats

    chroma_format       Multiplex order (time) within macroblock    Application
    4:4:4 (12 blocks)   Y Y Y Y Cb Cr Cb Cr Cb Cr Cb Cr             Computer graphics
    4:2:2 (8 blocks)    Y Y Y Y Cb Cr Cb Cr                         Studio production environments; professional editing equipment
    4:2:0 (6 blocks)    Y Y Y Y Cb Cr                               Mainstream television; consumer entertainment

MPEG Profiles & levels
- MPEG-2 is classified into several profiles.
- Main Profile features:
  - 4:2:0 chroma sampling format
  - I, P, and B pictures
  - Non-scalable
- The Main Profile is subdivided into levels:
  - MP@ML (Main Profile @ Main Level):
    - Designed around the CCIR 601 standard for interlaced standard digital video.
    - 720 × 576 (PAL) or 720 × 483 (NTSC).
    - 30 Hz progressive, 60 Hz interlaced.
    - Maximum bit rate is 15 Mbit/s.
  - MP@HL (Main Profile @ High Level), upper bounds:
    - 1152 × 1920, 60 Hz progressive.
    - 80 Mbit/s.

MPEG encoder / decoder
- (Figure: block diagrams of the MPEG encoder and decoder.)

Page 13: Multi media chapter 1_2_3

Prediction
- Backward prediction is done by storing pictures until the desired anchor picture is available, before encoding the currently stored frames.
- The encoder can decide to use:
  - forward prediction from a previous picture,
  - backward prediction from a following picture,
  - or interpolated prediction,
  in order to minimize the prediction error.
- The encoder must transmit pictures in an order different from that of the source pictures, so that the decoder has the anchor pictures before decoding predicted pictures (see next slide).
- The decoder must have two frames stored.

I P B Picture Reordering
- Pictures are coded and decoded in a different order than they are displayed, due to the bidirectional prediction used for B pictures.
- For example, with a 12-picture GOP:
- Source order and encoder input order:
  - I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
- Encoding order and order in the coded bitstream:
  - I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)
- Decoder output order and display order (same as the input):
  - I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
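The reordering rule is mechanical: hold each run of B pictures until the anchor (I or P) that follows them has been sent. A small Python sketch reproducing the example above:

```python
def coding_order(display_order):
    """Reorder display-order pictures into coding order: each B picture
    must follow both of its anchor (I/P) pictures in the bitstream."""
    out, pending_b = [], []
    for pic in display_order:
        if pic[0] == "B":
            pending_b.append(pic)   # hold B until its second anchor is sent
        else:                       # I or P anchor
            out.append(pic)         # send the anchor first...
            out.extend(pending_b)   # ...then the Bs displayed before it
            pending_b = []
    return out + pending_b

display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7",
           "B8", "B9", "P10", "B11", "B12", "I13"]
print(coding_order(display))
# ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6', 'P10', 'B8', 'B9', 'I13', 'B11', 'B12']
```

This matches the bitstream order listed above, and also shows why the decoder needs two anchor frames in memory before it can emit any B picture.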

DCT and IDCT formulas
- DCT (normal form):
  - F(u,v) = (2/N) C(u) C(v) Σ_{x=0..N-1} Σ_{y=0..N-1} f(x,y) cos[(2x+1)uπ / 2N] cos[(2y+1)vπ / 2N]
- IDCT (normal form):
  - f(x,y) = (2/N) Σ_{u=0..N-1} Σ_{v=0..N-1} C(u) C(v) F(u,v) cos[(2x+1)uπ / 2N] cos[(2y+1)vπ / 2N]
- (Both can also be written in matrix form.)
- Where:
  - F(u,v) is the two-dimensional N × N DCT.
  - u, v, x, y = 0, 1, 2, ..., N-1.
  - x, y are spatial coordinates in the sample domain.
  - u, v are frequency coordinates in the transform domain.
  - C(u), C(v) = 1/√2 for u, v = 0; C(u), C(v) = 1 otherwise.
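The normal-form DCT above can be coded directly from the formula. A deliberately straightforward (O(N^4)) Python sketch; real codecs use fast factorized forms:

```python
from math import cos, pi, sqrt

def dct2(block):
    """2-D DCT of an N x N block, computed straight from the normal-form formula."""
    n = len(block)
    c = lambda k: 1 / sqrt(2) if k == 0 else 1.0   # C(u), C(v)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * cos((2 * x + 1) * u * pi / (2 * n))
                    * cos((2 * y + 1) * v * pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = (2.0 / n) * c(u) * c(v) * s
    return out

# Energy compaction in the extreme case: a flat block puts all of its
# energy into the single DC coefficient F(0,0); every AC term is ~0.
flat = [[100] * 8 for _ in range(8)]
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))  # 800.0 (= 8 * 100 with this scaling)
```

Real picture blocks are not flat, but they are smooth, so most of their energy still lands in the few low-frequency coefficients, which is the property transform coding relies on.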

DCT versus DFT
- The DCT is conceptually similar to the DFT, except:
  - The DCT concentrates energy into lower-order coefficients better than the DFT.
  - The DCT is purely real; the DFT is complex (magnitude and phase).
  - A DCT operation on a block of pixels produces coefficients that are similar to the frequency-domain coefficients produced by a DFT operation:
    - An N-point DCT has the same frequency resolution as a 2N-point DFT.
    - The N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex frequency plane.
  - Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (the phase of the input does not matter). This is not true for the DCT.

Page 14: Multi media chapter 1_2_3

Quantization matrix
- Note that the quantization step sizes are:
  - Small in the upper left (low frequencies),
  - Large toward the lower right (high frequencies)
  → recall the JPEG mechanism!
- Why?
  - The HVS is less sensitive to errors in high-frequency coefficients than in lower-frequency ones.
  - → Higher frequencies should be more coarsely quantized!

Resulting DCT matrix (example)
- After adaptive quantization, the result is a matrix containing many zeros.
- (Figure: example quantized DCT coefficient matrix.)

MPEG scanning
- Left: zig-zag scanning (as in JPEG).
- Right: alternate scanning → better for interlaced frames!

4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 56

Huffman / Run-Level Coding
n Huffman coding in combination with Run-Level coding and zig-zag scanning is applied to quantized DCT coefficients.
n "Run-Level" = a run-length of zeros followed by a non-zero level.
n Huffman coding is also applied to various types of side information.
n A Huffman code is an entropy code which achieves the shortest possible average code word length for a source.
n → This average code word length is >= the entropy of the source.
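The zig-zag scan and Run-Level pairing described above can be sketched as follows. This is an illustrative implementation assuming the classic JPEG-style zigzag order (not MPEG's alternate scan), with the trailing zero run replaced by an "EOB" marker:

```python
def zigzag(block):
    """Read an N x N block in zig-zag order: traverse anti-diagonals,
    alternating direction, so low-frequency coefficients come out first."""
    N = len(block)
    out = []
    for d in range(2 * N - 1):
        idx = [(i, d - i) for i in range(N) if 0 <= d - i < N]
        if d % 2 == 0:
            idx.reverse()  # even diagonals run bottom-left to top-right
        out.extend(block[i][j] for i, j in idx)
    return out

def run_level(seq):
    """'Run-Level' pairs: a run-length of zeros followed by a non-zero
    level; the final run of zeros collapses to a single End-Of-Block."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")
    return pairs
```

For example, `run_level([4, 4, 2, 0, 0, 1, 0, 0, 0])` yields `[(0, 4), (0, 4), (0, 2), (2, 1), "EOB"]`; each (run, level) pair would then be looked up in the fixed VLC table.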


Huffman / Run-Level coding illustrated
n Using the DCT output matrix in the previous slide, after being zigzag scanned → the output will be a sequence of numbers: 4, 4, 2, 2, 2, 1, 1, 1, 1, 0 (12 zeros), 1, 0 (41 zeros)
n These values are looked up in a fixed table of variable length codes
q → The most probable occurrence is given a relatively short code,
q → The least probable occurrence is given a relatively long code.

Zero Run-Length | Amplitude    | MPEG Code Value
N/A             | 8 (DC Value) | 110 1000
0               | 4            | 0000 1100
0               | 4            | 0000 1100
0               | 2            | 0100 0
0               | 2            | 0100 0
0               | 2            | 0100 0
0               | 1            | 110
0               | 1            | 110
0               | 1            | 110
0               | 1            | 110
12              | 1            | 0010 0010 0
EOB             | EOB          | 10


Huffman / Run-Level coding illustrated (2)
n → The first run of 12 zeroes has been efficiently coded by only 9 bits
n → The last run of 41 zeroes has been entirely eliminated, represented only with a 2-bit End Of Block (EOB) indicator.
n → The quantized DCT coefficients are now represented by a sequence of 61 binary bits (see the table).
n Considering that the original 8x8 block of 8-bit pixels required 512 bits for full representation, → the compression ratio is approx. 8.4:1.
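The ratio arithmetic can be checked in one line:

```python
# 8x8 block of 8-bit pixels vs. the 61-bit run-level coded representation.
raw_bits = 8 * 8 * 8            # 512 bits for the uncompressed block
coded_bits = 61                 # total from the code table above
ratio = raw_bits / coded_bits   # ~8.39, i.e. roughly 8.4:1
```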


MPEG Data Transport
n MPEG packages all data into fixed-size 188-byte packets for transport.
n Video or audio payload data, previously placed in PES packets, is broken up into fixed length transport packet payloads.
n A PES packet may be much longer than a transport packet → requires segmentation:
q The PES header is placed immediately following a transport header
q Successive portions of the PES packet are then placed in the payloads of transport packets.
q Remaining space in the final transport packet payload is filled with stuffing bytes = 0xFF (all ones).
q Each transport packet starts with a sync byte = 0x47.
q In the ATSC US terrestrial DTV VSB transmission system, the sync byte is not processed, but is replaced by a different sync symbol especially suited to RF transmission.
q The transport packet header contains a 13-bit PID (packet ID), which corresponds to a particular elementary stream of video, audio, or other program element.
q PID 0x0000 is reserved for transport packets carrying a program association table (PAT).
q The PAT points to a Program Map Table (PMT) → points to particular elements of a program
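The segmentation steps above can be sketched as follows. Note this is a simplified illustration: the real 4-byte transport header carries several more fields (transport_error_indicator, payload_unit_start_indicator, scrambling and adaptation-field control bits); here only the 0x47 sync byte, the 13-bit PID and a continuity counter are packed:

```python
SYNC, PKT_LEN, HDR_LEN = 0x47, 188, 4  # fixed 188-byte packets, 4-byte header

def packetize(pes, pid):
    """Split one PES packet into fixed-size transport packets.
    The last packet's spare payload is filled with 0xFF stuffing bytes."""
    payload_len = PKT_LEN - HDR_LEN
    packets = []
    for cc, off in enumerate(range(0, len(pes), payload_len)):
        chunk = pes[off:off + payload_len]
        chunk += b"\xff" * (payload_len - len(chunk))      # stuffing = 0xFF
        header = bytes([SYNC,                              # sync byte 0x47
                        (pid >> 8) & 0x1F, pid & 0xFF,     # 13-bit PID
                        cc & 0x0F])                        # continuity counter
        packets.append(header + chunk)
    return packets

# A 303-byte PES payload needs two 188-byte transport packets.
pkts = packetize(b"\x00\x00\x01" + b"A" * 300, pid=0x101)
```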


MPEG Transport packet
n Adaptation Field:
q 8 bits specifying the length of the adaptation field.
q The first group of flags consists of eight 1-bit flags:
q discontinuity_indicator
q random_access_indicator
q elementary_stream_priority_indicator
q PCR_flag
q OPCR_flag
q splicing_point_flag
q transport_private_data_flag
q adaptation_field_extension_flag
q The optional fields are present if indicated by one of the preceding flags.
q The remainder of the adaptation field is filled with stuffing bytes (0xFF, all ones).
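A small sketch of expanding that flags byte, assuming the eight flags occupy bits 7 down to 0 in the order listed above:

```python
FLAG_NAMES = [  # bit 7 .. bit 0 of the flags byte, in the order listed above
    "discontinuity_indicator", "random_access_indicator",
    "elementary_stream_priority_indicator", "PCR_flag", "OPCR_flag",
    "splicing_point_flag", "transport_private_data_flag",
    "adaptation_field_extension_flag"]

def parse_af_flags(flags_byte):
    """Expand the eight 1-bit adaptation-field flags into a dict,
    so a demultiplexer can tell which optional fields follow."""
    return {name: bool((flags_byte >> (7 - i)) & 1)
            for i, name in enumerate(FLAG_NAMES)}

flags = parse_af_flags(0x10)  # 0001 0000: only the PCR_flag bit set
```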


Demultiplexing a Transport Stream (TS)
n Demultiplexing a transport stream involves:
1. Finding the PAT by selecting packets with PID = 0x0000
2. Reading the PIDs for the PMTs
3. Reading the PIDs for the elements of a desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video)
4. Detecting packets with the desired PIDs and routing them to the decoders
q An MPEG-2 transport stream can carry:
§ Video stream
§ Audio stream
§ Any type of data
→ MPEG-2 TS is the packet format for CATV downstream data communication.
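Step 4 of the procedure above can be sketched as a PID filter. This is an illustrative fragment: it assumes the desired PIDs have already been read from the PAT/PMT (steps 1-3) and, for simplicity, a bare 4-byte header with no adaptation field:

```python
def route_by_pid(packets, wanted_pids):
    """Select transport packets whose 13-bit PID belongs to a desired
    program element and append their payloads to per-stream buffers."""
    streams = {pid: bytearray() for pid in wanted_pids}
    for pkt in packets:
        assert pkt[0] == 0x47, "lost sync"           # every packet: sync 0x47
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]        # extract the 13-bit PID
        if pid in streams:
            streams[pid].extend(pkt[4:])             # payload to its decoder
    return streams

# One hand-built 188-byte packet carrying PID 0x101.
pkt = bytes([0x47, 0x01, 0x01, 0x00]) + b"V" * 184
streams = route_by_pid([pkt], {0x101})
```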


Timing & buffer control
n Point A: Encoder input → Constant/specified rate
n Point B: Encoder output → Variable rate
n Point C: Encoder buffer output → Constant rate
n Point D: Communication channel + decoder buffer → Constant rate
n Point E: Decoder input → Variable rate
n Point F: Decoder output → Constant/specified rate


Timing - Synchronization
n The decoder is synchronized with the encoder by time stamps
n The encoder contains a master oscillator and counter, called the System Time Clock (STC). (See previous block diagram.)
q → The STC belongs to a particular program and is the master clock of the video and audio encoders for that program.
q → Multiple programs, each with its own STC, can also be multiplexed into a single stream.
n A program component can even have no time stamps → but it cannot be synchronized with other components.
n At encoder input (Point A), the time of occurrence of an input video picture or audio block is noted by sampling the STC.
n A total delay of encoder and decoder buffer (constant) is added to the STC, creating a Presentation Time Stamp (PTS),
q → PTS is then inserted in the first of the packet(s) representing that picture or audio block, at Point B.


Timing – Synchronization (2)
n A Decode Time Stamp (DTS) can optionally be combined into the bit stream → represents the time at which the data should be taken instantaneously from the decoder buffer and decoded.
q DTS and PTS are identical except in the case of picture reordering for B pictures.
q The DTS is only used where it is needed because of reordering. Whenever DTS is used, PTS is also coded.
q PTS (or DTS) inserted interval = 700 ms.
q In ATSC → PTS (or DTS) must be inserted at the beginning of each coded picture (access unit).
n In addition, the output of the encoder buffer (Point C) is time stamped with System Time Clock (STC) values, called:
q System Clock Reference (SCR) in a Program Stream.
q Program Clock Reference (PCR) in a Transport Stream.
n PCR time stamp interval = 100 ms.
n SCR time stamp interval = 700 ms.
n PCR and/or the SCR are used to synchronize the decoder STC with the encoder STC.
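The PTS construction described on the previous slide (STC sampled at encoder input, plus the constant total buffer delay) can be sketched numerically. One assumption beyond the slides: MPEG-2 PTS/DTS values count a 90 kHz clock, so a delay in seconds is converted at that rate:

```python
STC_HZ = 90_000  # PTS/DTS count 90 kHz ticks in MPEG-2 systems

def make_pts(stc_at_input, end_to_end_delay_s):
    """PTS = STC sampled when the picture/audio block enters the encoder,
    plus the constant encoder-buffer + decoder-buffer delay (Point A -> F)."""
    return stc_at_input + round(end_to_end_delay_s * STC_HZ)

# A picture captured at STC tick 123_456 with a 0.5 s end-to-end delay:
pts = make_pts(stc_at_input=123_456, end_to_end_delay_s=0.5)
```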


Timing – Synchronization (3)
n All video and audio streams included in a program must get their time stamps from a common STC so that synchronization of the video and audio decoders with each other may be accomplished.
n The data rate and packet rate on the channel (at the multiplexer output) can be completely asynchronous with the System Time Clock (STC)
n PCR time stamps allow synchronization of different multiplexed programs having different STCs, while allowing STC recovery for each program.
n If there is no buffer underflow or overflow → delays in the buffers and transmission channel for both video and audio are constant.
n The encoder input and decoder output run at equal and constant rates.
n Fixed end-to-end delay from encoder input to decoder output
n If exact synchronization is not required, the decoder clock can be free running → video frames can be repeated / skipped as necessary to prevent buffer underflow / overflow, respectively.


HDTV (High definition television)
n High definition television (HDTV) first came to public attention in 1981, when NHK, the Japanese broadcasting authority, first demonstrated it in the United States.
n HDTV is defined by the ITU-R as:
q 'A system designed to allow viewing at about three times the picture height, such that the system is virtually, or nearly, transparent to the quality or portrayal that would have been perceived in the original scene ... by a discerning viewer with normal visual acuity.'


HDTV (2)
n HDTV proposals are for a screen which is wider than the conventional TV image by about 33%. It is generally agreed that the HDTV aspect ratio will be 16:9, as opposed to the 4:3 ratio of conventional TV systems. This ratio has been chosen because psychological tests have shown that it best matches the human visual field.
n It also enables use of existing cinema film formats as additional source material, since this is the same aspect ratio used in normal 35 mm film. Figure 16.6(a) shows how the aspect ratio of HDTV compares with that of conventional television, using the same resolution, or the same surface area, as the comparison metric.
n To achieve the improved resolution the video image used in HDTV must contain over 1000 lines, as opposed to the 525 and 625 provided by the existing NTSC and PAL systems. This gives a much improved vertical resolution. The exact value is chosen to be a simple multiple of one or both of the vertical resolutions used in conventional TV.
n However, due to the higher scan rates the bandwidth requirement for analogue HDTV is approximately 12 MHz, compared to the nominal 6 MHz of conventional TV


HDTV (3)
n The introduction of a non-compatible TV transmission format for HDTV would require the viewer either to buy a new receiver, or to buy a converter to receive the picture on their old set.
n The initial thrust in Japan was towards an HDTV format which is compatible with conventional TV standards, and which can be received by conventional receivers, with conventional quality. However, to get the full benefit of HDTV, a new wide screen, high resolution receiver has to be purchased.
n One of the principal reasons that HDTV is not already common is that a general standard has not yet been agreed. The 26th CCIR plenary assembly recommended the adoption of a single, worldwide standard for high definition television.
n Unfortunately, Japan, Europe and North America are all investing significant time and money in their own systems based on their own, current, conventional TV standards and other national considerations.


H261 - H263
n The H.261 algorithm was developed for the purpose of image transmission rather than image storage.
n It is designed to produce a constant output of p x 64 kbit/s, where p is an integer in the range 1 to 30.
q This allows transmission over a digital network or data link of varying capacity.
q It also allows transmission over a single 64 kbit/s digital telephone channel for low quality video-telephony, or at higher bit rates for improved picture quality.
n The basic coding algorithm is similar to that of MPEG in that it is a hybrid of motion compensation, DCT and straightforward DPCM (intra-frame coding mode), without the MPEG I, P, B frames.
n The DCT operation is performed at a low level on 8 x 8 blocks of error samples from the predicted luminance pixel values, with sub-sampled blocks of chrominance data.


H261-H263 (2)


H261-H263 (3)
n H.261 is widely used on 176 x 144 pixel images.
n The ability to select a range of output rates for the algorithm allows it to be used in different applications.
n Low output rates (p = 1 or 2) are only suitable for face-to-face (videophone) communication. H.261 is thus the standard used in many commercial videophone systems such as the UK BT/Marconi Relate 2000 and the US ATT 2500 products.
n Video-conferencing would require a greater output data rate (p > 6) and might go as high as 2 Mbit/s for high quality transmission with larger image sizes.
n A further development of H.261 is H.263 for lower fixed transmission rates.
n This deploys arithmetic coding in place of the variable length coding (see H261 diagram); with other modifications, the data rate is reduced to only 20 kbit/s.


Model Based Coding (MBC)
n At the very low bit rates (20 kbit/s or less) associated with video telephony, the requirements for image transmission stretch the compression techniques described earlier to their limits.
n In order to achieve the necessary degree of compression they often require reduction in spatial resolution or even the elimination of frames from the sequence.
n Model based coding (MBC) attempts to exploit a greater degree of redundancy in images than current techniques, in order to achieve significant image compression but without adversely degrading the image content information.
n It relies upon the fact that image quality is largely subjective. Providing that the appearance of scenes within an observed image is kept at a visually acceptable level, it may not matter that the observed image is not a precise reproduction of reality.


Model Based Coding (2)
n One MBC method for producing an artificial image of a head sequence utilizes a feature codebook where a range of facial expressions, sufficient to create an animation, are generated from sub-images or templates which are joined together to form a complete face.
n The most important areas of a face, for conveying an expression, are the eyes and mouth, hence the objective is to create an image in which the movement of the eyes and mouth is a convincing approximation to the movements of the original subject.
n When forming the synthetic image, the feature template vectors which form the closest match to those of the original moving sequence are selected from the codebook and then transmitted as low bit rate coded addresses.
n By using only 10 eye and 10 mouth templates, for instance, a total of 100 combinations exists, implying that only a 7-bit codebook address need be transmitted.
n It has been found that there are only 13 visually distinct mouth shapes for vowel and consonant formation during speech.
n However, the number of mouth sub-images is usually increased, to include intermediate expressions and hence avoid step changes in the image.


Model Based Coding (3)
n Another common way of representing objects in three-dimensional computer graphics is by a net of interconnecting polygons.
n A model is stored as a set of linked arrays which specify the coordinates of each polygon vertex, with the lines connecting the vertices together forming each side of a polygon.
n To make realistic models, the polygon net can be shaded to reflect the presence of light sources.
n The wire-frame model [Welch 1991] can be modified to fit the shape of a person's head and shoulders. The wire-frame, composed of over 100 interconnecting triangles, can produce subjectively acceptable synthetic images, providing that the frame is not rotated by more than 30° from the full-face position.
n The model (see the Figure) uses smaller triangles in areas associated with high degrees of curvature where significant movement is required.
n Large flat areas, such as the forehead, contain fewer triangles.
n A second wire-frame is used to model the mouth interior.


Model based coding (4)
n A synthetic image is created by texture mapping detail from an initial full-face source image over the wire-frame. Facial movement can be achieved by manipulation of the vertices of the wire-frame.
n Head rotation requires the use of simple matrix operations upon the coordinate array. Facial expression requires the manipulation of the features controlling the vertices.
n This model based feature codebook approach suffers from the drawback of codebook formation.
n This has to be done off-line and, consequently, the image is required to be prerecorded, with a consequent delay.
n However, the actual image sequence can be sent at a very low data rate. For a codebook with 128 entries where 7 bits are required to code each mouth, a 25 frame/s sequence requires less than 200 bit/s to code the mouth movements.
n When it is finally implemented, rates as low as 1 kbit/s are confidently expected from MBC systems, but they can only transmit image sequences which match the stored model, e.g. head and shoulders displays.
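The bit-rate figure above can be checked directly: a 128-entry codebook needs a 7-bit address, and at 25 frames per second the mouth stream is 175 bit/s, comfortably under 200 bit/s.

```python
codebook_entries = 128
# Address width for a power-of-two codebook: log2(128) = 7 bits.
bits_per_mouth = codebook_entries.bit_length() - 1
frame_rate = 25                           # frames per second
mouth_rate = frame_rate * bits_per_mouth  # 175 bit/s, below the 200 bit/s quoted
```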


Key points:
n JPEG coding mechanism → DCT / Zigzag Scanning / Adaptive Quantization / VLC
n MPEG layered structure:
q Pixel, Block, Macroblock, Field DCT Coding / Frame DCT Coding, Slice, Picture, Group of Pictures (GOP), Sequence, Packetized Elementary Stream (PES)
n MPEG compression mechanism:
q Prediction
q Motion compensation
q Scanning
q YCbCr formats (4:4:4, 4:2:0, etc)
q Profile @ Level
q I, P, B pictures & reordering
q Encoder / Decoder process & block diagram
n MPEG Data transport
n MPEG Timing & Buffer control
q STC/SCR/DTS
q PCR/PTS


Technical terms
n Macroblocks
n HVS = Human Visual System
n GOP = Group of Pictures
n VLC = Variable Length Coding/Coder
n IDCT/DCT = (Inverse) Discrete Cosine Transform
n PES = Packetized Elementary Stream
n MP@ML = Main Profile @ Main Level
n PCR = Program Clock Reference
n SCR = System Clock Reference
n STC = System Time Clock
n PTS = Presentation Time Stamp
n DTS = Decode Time Stamp
n PAT = Program Association Table
n PMT = Program Map Table


Chapter 3. CATV systems
n Overview:
q A brief history
q Modern CATV networks
q CATV systems and equipment


A Brief History:
q CATV appeared in the 60s in the US, where high buildings are great obstacles to the propagation of TV signals.
q Old CATV networks →
n Coaxial only
n Tree-and-Branch only
n TV only
n No return path (→ high-pass filters are installed in customers' houses to block return low frequency noise)


Modern CATV networks
n Key elements:
q CO or Master Headend
q Headends / Hub
q Server complex
q CMTS
q TV content provider
q Optical Nodes
q Taps
q Amplifiers (GNA/TNA/LE)


Modern CATV networks (2)
n Based on Hybrid Fiber-Coaxial architecture → also referred to as "HFC networks"
n The optical section is based on modern optical communication technologies →
q Star/ring/mesh, etc. topologies
q SDH/SONET for digital fibers
q Various architectures → digital, analog or mixed fiber cabling systems.
n Part of the forward path spectrum is used for high-speed Internet access
n The return path is exploited for digital data communication → the root of new problems !!
q 5-60 MHz band for upstream
q 88-860 MHz band for downstream
n 88-450 MHz for analog/digital TV channels
n 450-860 MHz for Internet access
q FDM


Spectrum allocation of CATV networks


CATV systems and equipment


Vocabulary (English – Vietnamese)
n Perception = Su nhan thuc
n Overlap (lap) = Phu len (to cover over)