Copyright © 1997, Tektronix, Inc. All rights reserved. A Guide to MPEG Fundamentals and Protocol Analysis (Including DVB and ATSC)



A Guide to MPEG Fundamentals and Protocol Analysis
(Including DVB and ATSC)


Contents

Section 1 Introduction to MPEG
1.1 Convergence
1.2 Why compression is needed
1.3 Applications of compression
1.4 Introduction to video compression
1.5 Introduction to audio compression
1.6 MPEG signals
1.7 Need for monitoring and analysis
1.8 Pitfalls of compression

Section 2 Compression in Video
2.1 Spatial or temporal coding?
2.2 Spatial coding
2.3 Weighting
2.4 Scanning
2.5 Entropy coding
2.6 A spatial coder
2.7 Temporal coding
2.8 Motion compensation
2.9 Bidirectional coding
2.10 I, P, and B pictures
2.11 An MPEG compressor
2.12 Preprocessing
2.13 Profiles and levels
2.14 Wavelets

Section 3 Audio Compression
3.1 The hearing mechanism
3.2 Subband coding
3.3 MPEG Layer 1
3.4 MPEG Layer 2
3.5 Transform coding
3.6 MPEG Layer 3
3.7 AC-3

Section 4 Elementary Streams
4.1 Video elementary stream syntax
4.2 Audio elementary streams

Section 5 Packetized Elementary Streams (PES)
5.1 PES packets
5.2 Time stamps
5.3 PTS/DTS

Section 6 Program Streams
6.1 Recording vs. transmission
6.2 Introduction to program streams

Section 7 Transport Streams
7.1 The job of a transport stream
7.2 Packets
7.3 Program Clock Reference (PCR)
7.4 Packet Identification (PID)
7.5 Program Specific Information (PSI)

Section 8 Introduction to DVB/ATSC
8.1 An overall view
8.2 Remultiplexing
8.3 Service Information (SI)
8.4 Error correction
8.5 Channel coding
8.6 Inner coding
8.7 Transmitting digits

Section 9 MPEG Testing
9.1 Testing requirements
9.2 Analyzing a Transport Stream
9.3 Hierarchic view
9.4 Interpreted view
9.5 Syntax and CRC analysis
9.6 Filtering
9.7 Timing Analysis
9.8 Elementary stream testing
9.9 Sarnoff compliant bit streams
9.10 Elementary stream analysis
9.11 Creating a transport stream
9.12 Jitter generation
9.13 DVB tests

Glossary


SECTION 1 INTRODUCTION TO MPEG

MPEG is one of the most popular audio/video compression techniques because it is not just a single standard. Instead, it is a range of standards suitable for different applications but based on similar principles. MPEG is an acronym for the Moving Picture Experts Group, which was set up by the ISO (International Standards Organization) to work on compression.

MPEG can be described as the interaction of acronyms. As ETSI stated, "The CAT is a pointer to enable the IRD to find the EMMs associated with the CA system(s) that it uses." If you can understand that sentence, you don't need this book.

1.1 Convergence

Digital techniques have made rapid progress in audio and video for a number of reasons. Digital information is more robust and can be coded to substantially eliminate error. This means that generation loss in recording and losses in transmission are eliminated. The Compact Disc was the first consumer product to demonstrate this.

While the CD has improved sound quality with respect to its vinyl predecessor, comparison of quality alone misses the point. The real point is that digital recording and transmission techniques allow content manipulation to a degree that is impossible with analog. Once audio or video is digitized, it becomes data.

Such data cannot be distinguished from any other kind of data; therefore, digital video and audio become the province of computer technology.

The convergence of computers and audio/video is an inevitable consequence of the key inventions of computing and Pulse Code Modulation. Digital media can store any type of information, so it is easy to utilize a computer storage device for digital video. The nonlinear workstation was the first example of an application of convergent technology that did not have an analog forerunner. Another example, multimedia, mixed the storage of audio, video, graphics, text and data on the same medium. Multimedia is impossible in the analog domain.

1.2 Why compression is needed

The initial success of digital video was in post-production applications, where the high cost of digital video was offset by its limitless layering and effects capability. However, production-standard digital video generates over 200 megabits per second of data, and this bit rate requires extensive capacity for storage and wide bandwidth for transmission. Digital video could only be used in wider applications if the storage and bandwidth requirements could be eased; easing these requirements is the purpose of compression.

Compression is a way of expressing digital audio and video by using less data. Compression has the following advantages:

A smaller amount of storage is needed for a given amount of source material. With high-density recording, such as with tape, compression allows highly miniaturized equipment for consumer and Electronic News Gathering (ENG) use. The access time of tape improves with compression because less tape needs to be shuttled to skip over a given amount of program. With expensive storage media such as RAM, compression makes new applications affordable.

When working in real time, compression reduces the bandwidth needed. Additionally, compression allows faster-than-real-time transfer between media, for example, between tape and disk.

A compressed recording format can afford a lower recording density, and this can make the recorder less sensitive to environmental factors and maintenance.

1.3 Applications of compression

Compression has a long association with television. Interlace is a simple form of compression giving a 2:1 reduction in bandwidth. The use of color-difference signals instead of GBR is another form of compression. Because the eye is less sensitive to color detail, the color-difference signals need less bandwidth. When color broadcasting was introduced, the channel structure of monochrome had to be retained, and composite video was developed. Composite video systems, such as PAL, NTSC and SECAM, are forms of compression because they use the same bandwidth for color as was used for monochrome.


Figure 1.1a shows that in traditional television systems, the GBR camera signal is converted to Y, Pr, Pb components for production and encoded into analog composite for transmission. Figure 1.1b shows the modern equivalent. The Y, Pr, Pb signals are digitized and carried as Y, Cr, Cb signals in SDI form through the production process prior to being encoded with MPEG for transmission. Clearly, MPEG can be considered by the broadcaster as a more efficient replacement for composite video. In addition, MPEG has greater flexibility because the bit rate required can be adjusted to suit the application. At lower bit rates and resolutions, MPEG can be used for video conferencing and video telephones.
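The GBR-to-component conversion above is a simple matrix operation. The sketch below uses the ITU-R BT.601 luma weights and Pb/Pr scaling typical of standard-definition systems; the guide does not name the exact matrix, so treat these coefficients as an illustrative assumption.

```python
def rgb_to_ypbpr(r, g, b):
    """Matrix a gamma-corrected GBR camera signal (components 0..1) into
    Y, Pb, Pr. Coefficients are the ITU-R BT.601 SDTV weights (assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: weighted sum of R, G, B
    pb = (b - y) / 1.772                   # scaled B-Y color difference
    pr = (r - y) / 1.402                   # scaled R-Y color difference
    return y, pb, pr

# White carries full luma and no color difference: Y ~ 1.0, Pb ~ Pr ~ 0.
print(rgb_to_ypbpr(1.0, 1.0, 1.0))
```

Because the eye is less sensitive to color detail, the Pb and Pr outputs can then be given less bandwidth than Y, which is the compression step the text describes.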

DVB and ATSC (the European- and American-originated digital-television broadcasting standards) would not be viable without compression because the bandwidth required would be too great. Compression extends the playing time of DVD (digital video/versatile disc), allowing full-length movies on a standard-size compact disc. Compression also reduces the cost of Electronic News Gathering and other contributions to television production.

In tape recording, mild compression eases tolerances and adds reliability in Digital Betacam and Digital-S, whereas in SX, DVC, DVCPRO and DVCAM, the goal is miniaturization. In magnetic disk drives, such as the Tektronix Profile® storage system, that are used in file servers and networks (especially for news purposes), compression lowers storage cost. Compression also lowers bandwidth, which allows more users to access a given server. This characteristic is also important for VOD (Video On Demand) applications.

1.4 Introduction to video compression

In all real program material, there are two types of components of the signal: those which are novel and unpredictable and those which can be anticipated. The novel component is called entropy and is the true information in the signal. The remainder is called redundancy because it is not essential. Redundancy may be spatial, as it is in large plain areas of picture where adjacent pixels have almost the same value. Redundancy can also be temporal, as it is where similarities between successive pictures are used. All compression systems work by separating the entropy from the redundancy in the encoder. Only the entropy is recorded or transmitted, and the decoder computes the redundancy from the transmitted signal. Figure 1.2a shows this concept.
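The encoder/decoder split described above can be sketched with a toy temporal scheme (a deliberate simplification, not MPEG syntax): send the first picture whole, then only the differences, and let the decoder rebuild each picture from the previous one.

```python
def encode(pictures):
    """Send the first picture whole, then only the picture differences."""
    prev = pictures[0]
    stream = [list(prev)]
    for cur in pictures[1:]:
        stream.append([c - p for c, p in zip(cur, prev)])  # mostly zeros
        prev = cur
    return stream

def decode(stream):
    """Rebuild every picture by adding each difference to the previous one."""
    prev = list(stream[0])
    pictures = [list(prev)]
    for diff in stream[1:]:
        prev = [p + d for p, d in zip(prev, diff)]
        pictures.append(list(prev))
    return pictures

# Three 4-pixel "pictures" in which little changes from one to the next.
frames = [[10, 10, 10, 10], [10, 12, 10, 10], [10, 12, 13, 10]]
assert decode(encode(frames)) == frames
print(encode(frames)[1])  # -> [0, 2, 0, 0]: only the change is sent
```

The difference pictures are mostly zeros, which is exactly the redundancy a real coder would then discard or compress further.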

An ideal encoder would extract all the entropy, and only this will be transmitted to the decoder. An ideal decoder would then reproduce the original signal. In practice, this ideal cannot be reached. An ideal coder would be complex and cause a very long delay in order to use temporal redundancy. In certain applications, such as recording or broadcasting, some delay is acceptable, but in videoconferencing it is not. In some cases, a very complex coder would be too expensive. It follows that there is no one ideal compression system.

In practice, a range of coders is needed which have a range of processing delays and complexities. The power of MPEG is that it is not a single compression format, but a range of standardized coding tools that can be combined flexibly to suit a range of applications. The way in which coding has been performed is included in the compressed data so that the decoder can automatically handle whatever the coder decided to do.

MPEG coding is divided into several profiles that have different complexity, and each profile can be implemented at a different level depending on the resolution of the input picture. Section 2 considers profiles and levels in detail.

There are many different digital video formats, and each has a different bit rate. For example, a high-definition system might have six times the bit rate of a standard-definition system. Consequently, just knowing the bit rate out of the coder is not very useful. What matters is the compression factor, which is the ratio of the input bit rate to the compressed bit rate, for example 2:1, 5:1, and so on.
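The ratio defined above is a one-line calculation. The figures below are hypothetical: a 216 Mbit/s source (consistent with the "over 200 megabits per second" cited earlier for production-standard video) coded down to 8 Mbit/s.

```python
def compression_factor(input_bit_rate, compressed_bit_rate):
    """Compression factor = input bit rate / compressed bit rate."""
    return input_bit_rate / compressed_bit_rate

# Hypothetical example: 216 Mbit/s production video coded at 8 Mbit/s.
factor = compression_factor(216e6, 8e6)
print(f"{factor:.0f}:1")  # -> 27:1
```

The same coded bit rate therefore represents a far milder compression of a standard-definition source than of a high-definition one, which is why the compression factor, not the output bit rate, is the meaningful figure.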

Unfortunately, the number of variables involved makes it very difficult to determine a suitable compression factor. Figure 1.2a shows that for an ideal coder, if all of the entropy is sent, the quality is good. However, if the compression factor is increased in order to reduce the bit rate, not all of the entropy is sent and the quality falls. Note that in a compressed system, when quality loss occurs, it occurs steeply (Figure 1.2b). If the available bit rate is inadequate, it is better to avoid this area by reducing the entropy of the input picture. This can be done by filtering. The loss of resolution caused by the filtering is subjectively more acceptable than the compression artifacts.


[Figure 1.1: (a) Camera GBR is matrixed to Y, Pr, Pb for the production process, then composite-encoded to analog composite out (PAL, NTSC or SECAM). (b) Camera GBR is matrixed to Y, Pr, Pb, digitized (ADC), carried as Y, Cr, Cb over SDI through the production process, and MPEG-coded to a digital compressed output.]


To identify the entropy perfectly, an ideal compressor would have to be extremely complex. A practical compressor may be less complex for economic reasons and must send more data to be sure of carrying all of the entropy. Figure 1.2b shows the relationship between coder complexity and performance. The higher the compression factor required, the more complex the encoder has to be.

The entropy in video signals varies. A recording of an announcer delivering the news has much redundancy and is easy to compress. In contrast, it is more difficult to compress a recording with leaves blowing in the wind or one of a football crowd that is constantly moving and therefore has less redundancy (more information or entropy). In either case, if all the entropy is not sent, there will be quality loss. Thus, we may choose between a constant bit-rate channel with variable quality or a constant-quality channel with variable bit rate. Telecommunications network operators tend to prefer a constant bit rate for practical purposes, but a buffer memory can be used to average out entropy variations if the resulting increase in delay is acceptable. In recording, a variable bit rate may be easier to handle, and DVD uses variable bit rate, speeding up the disc where difficult material exists.

Intra-coding (intra = within) is a technique that exploits spatial redundancy, or redundancy within the picture; inter-coding (inter = between) is a technique that exploits temporal redundancy. Intra-coding may be used alone, as in the JPEG standard for still pictures, or combined with inter-coding as in MPEG.

Intra-coding relies on two characteristics of typical images. First, not all spatial frequencies are simultaneously present, and second, the higher the spatial frequency, the lower the amplitude is likely to be. Intra-coding requires analysis of the spatial frequencies in an image. This analysis is the purpose of transforms such as wavelets and the DCT (discrete cosine transform). Transforms produce coefficients which describe the magnitude of each spatial frequency.

Typically, many coefficients will be zero, or nearly zero, and these coefficients can be omitted, resulting in a reduction in bit rate.

Inter-coding relies on finding similarities between successive pictures. If a given picture is available at the decoder, the next picture can be created by sending only the picture differences. The picture differences will be increased when objects move, but this increase can be offset by using motion compensation, since a moving object does not generally change its appearance very much from one picture to the next. If the motion can be measured, a closer approximation to the current picture can be created by shifting part of the previous picture to a new location. The shifting process is controlled by a vector that is transmitted to the decoder. The vector transmission requires less data than sending the picture-difference data.
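Measuring the motion amounts to finding where a block of the current picture best matches the previous picture. The sketch below uses an exhaustive block-matching search over a small window; this is a simplified illustration, not the motion-estimation algorithm of any particular MPEG encoder, and the block and search sizes are arbitrary.

```python
def sad(block_a, block_b):
    """Sum of absolute differences: the matching cost between two blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
                          for a, b in zip(row_a, row_b))

def get_block(frame, y, x, size):
    return [row[x:x + size] for row in frame[y:y + size]]

def best_vector(prev, cur, y, x, size=4, search=2):
    """Exhaustive block-matching motion search over a small window."""
    target = get_block(cur, y, x, size)
    best_cost, best_vec = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            py, px = y + dy, x + dx
            if 0 <= py <= len(prev) - size and 0 <= px <= len(prev[0]) - size:
                cost = sad(get_block(prev, py, px, size), target)
                if best_cost is None or cost < best_cost:
                    best_cost, best_vec = cost, (dy, dx)
    return best_vec

# A 4x4 bright square moves one pixel to the right between pictures.
prev = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        prev[y][x] = 200
        cur[y][x + 1] = 200

print(best_vector(prev, cur, 2, 3))  # -> (0, -1): fetch from one pixel left
```

The single vector (0, -1) replaces a whole block of picture-difference data, which is the saving the text describes.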

MPEG can handle both interlaced and non-interlaced images. An image at some point on the time axis is called a "picture," whether it is a field or a frame. Interlace is not ideal as a source for digital compression because it is in itself a compression technique. Temporal coding is made more complex because pixels in one field are in a different position to those in the next.

Motion compensation minimizes but does not eliminate the differences between successive pictures. The picture difference is itself a spatial image and can be compressed using transform-based intra-coding as previously described. Motion compensation simply reduces the amount of data in the difference image.

The efficiency of a temporal coder rises with the time span over which it can act. Figure 1.2c shows that if a high compression factor is required, a longer time span in the input must be considered and thus a longer coding delay will be experienced. Clearly, temporally coded signals are difficult to edit because the content of a given output picture may be based on image data which was transmitted some time earlier. Production systems will have to limit the degree of temporal coding to allow editing, and this limitation will in turn limit the available compression factor.


[Figure 1.2: (a) For PCM video input, an ideal coder sends only the entropy; a non-ideal coder has to send more, and a short-delay coder has to send even more. (b) Quality versus coder complexity for a given compression factor. (c) Compression factor versus latency.]


1.5 Introduction to audio compression

The bit rate of a PCM digital audio channel is only about one megabit per second, which is about 0.5% of 4:2:2 digital video. With mild video compression schemes, such as Digital Betacam, audio compression is unnecessary. But, as the video compression factor is raised, it becomes necessary to compress the audio as well.

Audio compression takes advantage of two facts. First, in typical audio signals, not all frequencies are simultaneously present. Second, because of the phenomenon of masking, human hearing cannot discern every detail of an audio signal. Audio compression splits the audio spectrum into bands by filtering or transforms, and includes less data when describing bands in which the level is low. Where masking prevents or reduces audibility of a particular band, even less data needs to be sent.

Audio compression is not as easy to achieve as video compression because of the acuity of hearing. Masking only works properly when the masking and the masked sounds coincide spatially. Spatial coincidence is always the case in mono recordings but not in stereo recordings, where low-level signals can still be heard if they are in a different part of the soundstage. Consequently, in stereo and surround-sound systems, a lower compression factor is allowable for a given quality. Another factor complicating audio compression is that delayed resonances in poor loudspeakers actually mask compression artifacts. Testing a compressor with poor speakers gives a false result, and signals which are apparently satisfactory may be disappointing when heard on good equipment.

1.6 MPEG signals

The output of a single MPEG audio or video coder is called an Elementary Stream. An Elementary Stream is an endless near real-time signal. For convenience, it can be broken into convenient-sized data blocks in a Packetized Elementary Stream (PES). These data blocks need header information to identify the start of the packets and must include time stamps because packetizing disrupts the time axis.

Figure 1.3 shows that one video PES and a number of audio PES can be combined to form a Program Stream, provided that all of the coders are locked to a common clock. Time stamps in each PES ensure lip-sync between the video and audio. Program Streams have variable-length packets with headers. They find use in data transfers to and from optical and hard disks, which are error-free and in which files of arbitrary sizes are expected. DVD uses Program Streams.

For transmission and digital broadcasting, several programs and their associated PES can be multiplexed into a single Transport Stream. A Transport Stream differs from a Program Stream in that the PES packets are further subdivided into short fixed-size packets and in that multiple programs encoded with different clocks can be carried. This is possible because a transport stream has a program clock reference (PCR) mechanism which allows transmission of multiple clocks, one of which is selected and regenerated at the decoder. A Single Program Transport Stream (SPTS) is also possible, and this may be found between a coder and a multiplexer. Since a Transport Stream can genlock the decoder clock to the encoder clock, the SPTS is more common than the Program Stream.

A Transport Stream is more than just a multiplex of audio and video PES. In addition to the compressed audio, video and data, a Transport Stream includes a great deal of metadata describing the bit stream. This includes the Program Association Table (PAT) that lists every program in the transport stream. Each entry in the PAT points to a Program Map Table (PMT) that lists the elementary streams making up each program. Some programs will be open, but some programs may be subject to conditional access (encryption), and this information is also carried in the metadata.

The Transport Stream consists of fixed-size data packets, each containing 188 bytes. Each packet carries a packet identifier code (PID). Packets in the same elementary stream all have the same PID, so that the decoder (or a demultiplexer) can select the elementary stream(s) it wants and reject the remainder. Packet-continuity counts ensure that every packet that is needed to decode a stream is received. An effective synchronization system is needed so that decoders can correctly identify the beginning of each packet and deserialize the bit stream into words.
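The 188-byte transport packet structure described in Section 1.6 can be sketched by decoding the 4-byte header defined by MPEG-2 systems: a sync byte (0x47), flag bits, the 13-bit PID, and the 4-bit continuity counter.

```python
SYNC_BYTE = 0x47
PACKET_SIZE = 188

def parse_ts_header(packet):
    """Decode the 4-byte header of an MPEG-2 transport-stream packet."""
    if len(packet) != PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not an aligned 188-byte transport packet")
    return {
        "transport_error": bool(packet[1] & 0x80),    # set by channel decoders
        "payload_unit_start": bool(packet[1] & 0x40), # PES/section starts here
        "pid": ((packet[1] & 0x1F) << 8) | packet[2], # 13-bit packet identifier
        "continuity_counter": packet[3] & 0x0F,       # detects lost packets
    }

# A minimal hand-made packet: PID 0 is reserved for the PAT.
pkt = bytes([0x47, 0x40, 0x00, 0x10]) + bytes(184)
header = parse_ts_header(pkt)
print(header["pid"], header["continuity_counter"])  # -> 0 0
```

A demultiplexer applies exactly this selection: keep packets whose PID matches the wanted elementary stream, and check that the continuity counter increments so no packet was lost.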


[Figure 1.3: Video data and audio data pass through video and audio encoders to form elementary streams; packetizers produce a video PES and an audio PES. Together with data, a program stream MUX combines them into a Program Stream (DVD), while a transport stream MUX forms a Single Program Transport Stream.]


1.7 Need for monitoring and analysis

The MPEG transport stream is an extremely complex structure using interlinked tables and coded identifiers to separate the programs and the elementary streams within the programs. Within each elementary stream, there is a complex structure, allowing a decoder to distinguish between, for example, vectors, coefficients and quantization tables.

Failures can be divided into two broad categories. In the first category, the transport system correctly multiplexes and delivers information from an encoder to a decoder with no bit errors or added jitter, but the encoder or the decoder has a fault. In the second category, the encoder and decoder are fine, but the transport of data from one to the other is defective. It is very important to know whether the fault lies in the encoder, the transport, or the decoder if a prompt solution is to be found.

Synchronizing problems, such as loss or corruption of sync patterns, may prevent reception of the entire transport stream. Transport-stream protocol defects may prevent the decoder from finding all of the data for a program, perhaps delivering picture but not sound. Correct delivery of the data but with excessive jitter can cause decoder timing problems.

If a system using an MPEG transport stream fails, the fault could be in the encoder, the multiplexer, or the decoder. How can this fault be isolated? First, verify that the transport stream is compliant with the MPEG coding standards. If the stream is not compliant, a decoder can hardly be blamed for having difficulty. If it is compliant, the decoder may need attention.

Traditional video testing tools, the signal generator, the waveform monitor and the vectorscope, are not appropriate for analyzing MPEG systems, except to ensure that the video signals entering and leaving an MPEG system are of suitable quality. Instead, a reliable source of valid MPEG test signals is essential for testing receiving equipment and decoders. With a suitable analyzer, the performance of encoders, transmission systems, multiplexers and remultiplexers can be assessed with a high degree of confidence. As a long-standing supplier of high-grade test equipment to the video industry, Tektronix continues to provide test and measurement solutions as the technology evolves, giving the MPEG user the confidence that complex compressed systems are functioning correctly and allowing rapid diagnosis when they are not.

1.8 Pitfalls of compression

MPEG compression is lossy in that what is decoded is not identical to the original. The entropy of the source varies, and when entropy is high, the compression system may leave visible artifacts when decoded. In temporal compression, redundancy between successive pictures is assumed. When this is not the case, the system fails. An example is video from a press conference where flashguns are firing. Individual pictures containing the flash are totally different from their neighbors, and coding artifacts become obvious.

Irregular motion or several independently moving objects on screen require a lot of vector bandwidth, and this requirement may only be met by reducing the picture-data bandwidth. Again, visible artifacts may occur whose level varies and depends on the motion. This problem often occurs in sports-coverage video.

Coarse quantizing results in luminance contouring and posterized color. These can be seen as blotchy shadows and blocking on large areas of plain color. Subjectively, compression artifacts are more annoying than the relatively constant impairments resulting from analog television transmission systems.

The only solution to these problems is to reduce the compression factor. Consequently, the compression user has to make a value judgment between the economy of a high compression factor and the level of artifacts.

In addition to extending the encoding and decoding delay, temporal coding also causes difficulty in editing. In fact, an MPEG bit stream cannot be arbitrarily edited at all. This restriction occurs because, in temporal coding, the decoding of one picture may require the contents of an earlier picture, and the contents may not be available following an edit. The fact that pictures may be sent out of sequence also complicates editing.

If suitable coding has been used, edits can take place only at splice points, which are relatively widely spaced. If arbitrary editing is required, the MPEG stream must undergo a read-modify-write process, which will result in generation loss.

The viewer is not interested in editing, but the production user will have to make another value judgment about the edit flexibility required. If greater flexibility is required, the temporal compression has to be reduced and a higher bit rate will be needed.


sufficient accuracy, the outputof the inverse transform is iden-tical to the original waveform.

The most well known transformis the Fourier transform. Thistransform finds each frequencyin the input signal. It finds eachfrequency by multiplying theinput waveform by a sample ofa target frequency, called a basisfunction, and integrating theproduct. Figure 2.1 shows thatwhen the input waveform doesnot contain the target frequency,the integral will be zero, butwhen it does, the integral willbe a coefficient describing theamplitude of that componentfrequency.

SECTION 2 COMPRESSION IN VIDEO

This section shows how video compression is based on the perception of the eye. Important enabling techniques, such as transforms and motion compensation, are considered as an introduction to the structure of an MPEG coder.

2.1 Spatial or temporal coding?

As was seen in Section 1, video compression can take advantage of both spatial and temporal redundancy. In MPEG, temporal redundancy is reduced first by using similarities between successive pictures. As much as possible of the current picture is created or "predicted" by using information from pictures already sent. When this technique is used, it is only necessary to send a difference picture, which eliminates the differences between the actual picture and the prediction. The difference picture is then subject to spatial compression. As a practical matter, it is easier to explain spatial compression prior to explaining temporal compression.

Spatial compression relies on similarities between adjacent pixels in plain areas of picture and on dominant spatial frequencies in areas of patterning. The JPEG system uses spatial compression only, since it is designed to transmit individual still pictures. However, JPEG may be used to code a succession of individual pictures for video. In the so-called "Motion JPEG" application, the compression factor will not be as good as if temporal coding were used, but the bit stream will be freely editable on a picture-by-picture basis.

2.2 Spatial coding

The first step in spatial coding is to perform an analysis of spatial frequency using a transform. A transform is simply a way of expressing a waveform in a different domain, in this case, the frequency domain. The output of a transform is a set of coefficients that describe how much of a given frequency is present. An inverse transform reproduces the original waveform. If the coefficients are handled with sufficient accuracy, the inverse transform gives the original waveform. The transform finds each frequency by multiplying the input waveform by a basis function at the target frequency and integrating the product. As Figure 2.1 shows, when the input contains the target frequency, the correlation is high and the integral is a coefficient describing the amplitude of that component; when the frequencies differ, the integral is zero.

These results will be as described if the frequency component is in phase with the basis function. However, if the frequency component is in quadrature with the basis function, the integral will still be zero. Therefore, it is necessary to perform two searches for each frequency, with the basis functions in quadrature with one another, so that every phase of the input will be detected.

The Fourier transform has the disadvantage of requiring coefficients for both the sine and cosine components of each frequency. In the cosine transform, the input waveform is time-mirrored with itself prior to multiplication by the basis functions. Figure 2.2 shows that this mirroring cancels out all sine components and doubles all of the cosine components. The sine basis function is unnecessary, and only one coefficient is needed for each frequency.

The discrete cosine transform (DCT) is the sampled version of the cosine transform and is used extensively in two-dimensional form in MPEG. A block of 8 x 8 pixels is transformed to become a block of 8 x 8 coefficients. Since the transform requires multiplication by fractions, there is wordlength extension, resulting in coefficients that have longer wordlength than the pixel values. Typically, an 8-bit

Figure 2.1. [Input and basis function: high correlation if the frequency is the same, no correlation if the frequency is different.]

Figure 2.2. [Mirroring the input: the cosine component is coherent through the mirror; the sine component inverts at the mirror and cancels.]

Page 9: A Guide to MPEG Fundamentals and Protocol Analysis to mpeg.pdf · Figure 1.1a shows that in tradi-tional television systems, the GBR camera signal is converted to Y, Pr, Pb components


Figure 2.3.

pixel block results in an 11-bit coefficient block. Thus, a DCT does not result in any compression; in fact, it results in the opposite. However, the DCT converts the source pixels into a form in which compression is easier.

Figure 2.3 shows the results of an inverse transform of each of the individual coefficients of an 8 x 8 DCT. In the case of the luminance signal, the top-left coefficient is the average brightness or DC component of the whole block. Moving across the top row, horizontal spatial frequency increases. Moving down the left column, vertical spatial frequency increases. In real pictures, different vertical and horizontal spatial frequencies may occur simultaneously, and a coefficient at some point within the block will represent all possible horizontal and vertical combinations.

Figure 2.3 also shows the coefficients as a one-dimensional horizontal waveform. Combining these waveforms with various amplitudes and either polarity can reproduce any combination of 8 pixels. Thus, combining the 64 coefficients of the 2-D DCT will result in the original 8 x 8 pixel block. Clearly, for color pictures, the color difference samples will also need to be handled. Y, Cr, and Cb data are assembled into separate 8 x 8 arrays and are transformed individually.
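The 2-D DCT described above can be sketched in a few lines. This is a plain, unoptimized implementation of the orthonormal DCT-II, not the fast algorithm a real coder would use:

```python
import math

def dct_1d(x):
    """Orthonormal 8-point DCT-II of one row of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

# A perfectly plain 8x8 block: only the DC term (8 x 128 = 1024) is non-zero.
flat = [[128] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
```

Because the transform is separable, the 2-D case is just the 1-D transform applied to rows and then columns. Note how a flat block concentrates all of its energy in a single coefficient, which is exactly why plain areas compress well.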

In much real program material, many of the coefficients will have zero or near-zero values and, therefore, will not be transmitted. This fact results in significant compression that is virtually lossless. If a higher compression factor is needed, then the wordlength of the non-zero coefficients must be reduced. This reduction will reduce the accuracy of these coefficients and will introduce losses into the process. With care, the losses can be introduced in a way that is least visible to the viewer.

2.3 Weighting

Figure 2.4 shows that the human perception of noise in pictures is not uniform but is a function of the spatial frequency. More noise can be tolerated at high spatial frequencies. Also, video noise is effectively masked by fine detail in the picture, whereas in plain areas it is highly visible. The reader will be aware that traditional noise measurements are always weighted so that the technical measurement relates to the subjective result.

Compression reduces the accuracy of coefficients and has a similar effect to using shorter wordlength samples in PCM; that is, the noise level rises. In PCM, the result of shortening the wordlength is that the noise level rises equally at all frequencies. As the DCT splits the signal into different frequencies, it becomes possible to control the spectrum of the noise. Effectively, low-frequency coefficients are rendered more accurately than high-frequency coefficients.

Figure 2.4. [Human vision sensitivity falls as spatial frequency rises.]


Figure 2.5 shows that, in the weighting process, the coefficients from the DCT are divided by constants that are a function of two-dimensional frequency. Low-frequency coefficients will be divided by small numbers, and high-frequency coefficients will be divided by large numbers. Following the division, the least-significant bit is discarded or truncated. This truncation is a form of requantizing. In the absence of weighting, this requantizing would have the effect of doubling the size of the quantizing step, but with weighting, it increases the step size according to the division factor.

As a result, coefficients representing low spatial frequencies are requantized with relatively small steps and suffer little increased noise. Coefficients representing higher spatial frequencies are requantized with large steps and suffer more noise. However, fewer steps means that fewer bits are needed to identify the step, and a compression is obtained.

In the decoder, a low-order zero will be added to return the weighted coefficients to their correct magnitude. They will then be multiplied by inverse weighting factors. Clearly, at high frequencies the multiplication factors will be larger, so the requantizing noise will be greater. Following inverse weighting, the coefficients will have their original DCT output values, plus requantizing error, which will be greater at high frequency than at low frequency.

As an alternative to truncation, weighted coefficients may be nonlinearly requantized so that the quantizing step size increases with the magnitude of the coefficient. This technique allows higher compression factors but worse levels of artifacts.

Clearly, the degree of compression obtained and, in turn, the output bit rate obtained, is a function of the severity of the requantizing process. Different bit rates will require different weighting tables. In MPEG, it is possible to use various different weighting tables, and the table in use can be transmitted to the decoder, so that correct decoding automatically occurs.

Figure 2.5. [Worked requantizing example: an input block of DCT coefficients is divided by the quant matrix values (the value used corresponds to the coefficient location) and then by a single quant scale value used for the complete 8 x 8 block; the output coefficient values shown are for display only, not actual results. The quant scale code may follow a linear or a non-linear scale table.]
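The divide-by-matrix, divide-by-scale arithmetic of Figure 2.5 can be sketched as follows. The weights here (8 for a low-frequency coefficient, 56 for a high-frequency one) and the quant scale of 2 are illustrative values, not the actual MPEG-2 tables, and simple truncation stands in for the full rounding rules:

```python
def requantize(coeff, weight, quant_scale):
    """Forward: divide by the weighting value and the quant scale, then truncate."""
    return int(coeff / (weight * quant_scale))

def dequantize(level, weight, quant_scale):
    """Decoder: multiply back by the inverse weighting factors."""
    return level * weight * quant_scale

scale = 2
low = requantize(362, 8, scale)    # low-frequency coefficient, small weight
high = requantize(31, 56, scale)   # high-frequency coefficient, large weight
low_err = abs(362 - dequantize(low, 8, scale))
high_err = abs(31 - dequantize(high, 56, scale))
```

The low-frequency coefficient survives with a small error, while the high-frequency coefficient is truncated to zero, reflecting the noise-shaping behavior described in the text.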


Figure 2.6.

2.4 Scanning

In typical program material, the significant DCT coefficients are generally found in the top-left corner of the matrix. After weighting, low-value coefficients might be truncated to zero. More efficient transmission can be obtained if all of the non-zero coefficients are sent first, followed by a code indicating that the remainder are all zero. Scanning is a technique that increases the probability of achieving this result, because it sends coefficients in descending order of magnitude probability. Figure 2.6a shows that in a non-interlaced system, the probability of a coefficient having a high value is highest in the top-left corner and lowest in the bottom-right corner. A 45-degree diagonal zig-zag scan is the best sequence to use here.

In Figure 2.6b, the scan for an interlaced source is shown. In an interlaced picture, an 8 x 8 DCT block from one field extends over twice the vertical screen area, so that for a given picture detail, vertical frequencies will appear to be twice as great as horizontal frequencies. Thus, the ideal scan for an interlaced picture will be on a diagonal that is twice as steep. Figure 2.6b shows that a given vertical spatial frequency is scanned before scanning the same horizontal spatial frequency.

2.5 Entropy coding

In real video, not all spatial frequencies are simultaneously present; therefore, the DCT coefficient matrix will have zero terms in it. Despite the use of scanning, zero coefficients will still appear between the significant values. Run length coding (RLC) allows these coefficients to be handled more efficiently. Where repeating values, such as a string of 0s, are present, run length coding simply transmits the number of zeros rather than each individual bit.

The probability of occurrence of particular coefficient values in real video can be studied. In practice, some values occur very often; others occur less often. This statistical information can be used to achieve further compression using variable length coding (VLC). Frequently occurring values are converted to short code words, and infrequent values are converted to long code words. To aid deserialization, no code word can be the prefix of another.

2.6 A spatial coder

Figure 2.7 ties together all of the preceding spatial coding concepts. The input signal is assumed to be 4:2:2 SDI (Serial Digital Interface), which may have 8- or 10-bit wordlength. MPEG uses only 8-bit resolution; therefore, a rounding stage will be needed when the SDI signal contains 10-bit words. Most MPEG profiles operate with 4:2:0 sampling; therefore, a vertical low-pass filter/interpolation stage will be needed. Rounding and color subsampling introduce a small irreversible loss of information and a proportional reduction in bit rate. The raster-scanned input format will need to be stored so that it can be converted to 8 x 8 pixel blocks.

[Figure 2.6 panels: a) zig-zag or classic scan, nominally for frames; b) alternate scan, nominally for fields.]

Figure 2.7. [Spatial coder pipeline: full-bit-rate 10-bit data is converted from 4:2:2 to 8-bit 4:2:0 (information lost), transformed by the DCT (no loss), quantized under rate control (information lost, data reduced), and entropy coded into a buffer (data reduced, no loss). Entropy coding reduces the number of bits for each coefficient and can give preference to certain coefficients; variable length coding uses short words for the most frequent values (like Morse code); run length coding sends a unique code word instead of strings of zeros.]
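The scan and run-length steps can be sketched like this. The zig-zag order is generated by walking the block's diagonals, and the run-length code is a simplified (run, level) scheme with an end-of-block marker, not the actual MPEG VLC tables:

```python
def zigzag_order(n=8):
    """Visit an n x n block along its 45-degree diagonals, alternating direction."""
    order = []
    for d in range(2 * n - 1):
        cells = [(y, d - y) for y in range(n) if 0 <= d - y < n]
        if d % 2 == 0:
            cells.reverse()  # even diagonals run bottom-left to top-right
        order.extend(cells)
    return order

def run_level(scanned):
    """Encode a scanned coefficient list as (zero-run, value) pairs plus EOB."""
    last = max((i for i, c in enumerate(scanned) if c), default=-1)
    pairs, run = [], 0
    for c in scanned[:last + 1]:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")  # everything after the last non-zero value is implied
    return pairs

# A sparse quantized block: significant values cluster in the top-left corner.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 22, 3, 1
scanned = [block[y][x] for y, x in zigzag_order()]
codes = run_level(scanned)
```

Sixty-one of the sixty-four coefficients collapse into the single EOB symbol, which is where most of the gain from scanning comes from.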


Figure 2.8.

The DCT stage transforms the picture information to the frequency domain. The DCT itself does not achieve any compression. Following the DCT, the coefficients are weighted and truncated, providing the first significant compression. The coefficients are then zig-zag scanned to increase the probability that the significant coefficients occur early in the scan. After the last non-zero coefficient, an EOB (end of block) code is generated.

Coefficient data are further compressed by run length and variable length coding. In a variable bit-rate system, the quantizing is fixed, but in a fixed bit-rate system, a buffer memory is used to absorb variations in coding difficulty. Highly detailed pictures will tend to fill the buffer, whereas plain pictures will allow it to empty. If the buffer is in danger of overflowing, the requantizing steps will have to be made larger, so that the compression factor is effectively raised.

In the decoder, the bit stream is deserialized and the entropy coding is reversed to reproduce the weighted coefficients. The inverse weighting is applied, and coefficients are placed in the matrix according to the zig-zag scan to recreate the DCT matrix.

Following an inverse transform, the 8 x 8 pixel block is recreated. To obtain a raster-scanned output, the blocks are stored in RAM, which is read a line at a time. To obtain a 4:2:2 output from 4:2:0 data, a vertical interpolation process will be needed, as shown in Figure 2.8.

The chroma samples in 4:2:0 are positioned halfway between luminance samples in the vertical axis so that they are evenly spaced when an interlaced source is used.

2.7 Temporal coding

Temporal redundancy can be exploited by inter-coding or transmitting only the differences between pictures. Figure 2.9 shows that a one-picture delay combined with a subtractor can compute the picture differences. The picture difference is an image in its own right and can be further compressed by the spatial coder as previously described. The decoder reverses the spatial coding and adds the difference picture to the previous picture to obtain the next picture.

There are some disadvantages to this simple system. First, as only differences are sent, it is impossible to begin decoding after the start of the transmission. This limitation makes it difficult for a decoder to provide pictures following a switch from one bit stream to another (as occurs when the viewer changes channels). Second, if any part of the difference data is incorrect, the error in the picture will propagate indefinitely.

The solution to these problems is to use a system that is not completely differential. Figure 2.10 shows that periodically complete pictures are sent. These are called intra-coded pictures (or I-pictures), and they are obtained by spatial compression only. If an error or a channel switch occurs, it will be possible to resume correct decoding at the next I-picture.

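The picture-difference loop of Figure 2.9 reduces to a subtraction at the coder and an addition at the decoder. The four-pixel "pictures" here are toy data:

```python
def encode_difference(current, previous):
    """Coder: subtract the previous picture, leaving only what changed."""
    return [c - p for c, p in zip(current, previous)]

def decode_difference(diff, previous):
    """Decoder: add the difference back to its copy of the previous picture."""
    return [d + p for d, p in zip(diff, previous)]

picture_n = [10, 10, 10, 50]
picture_n1 = [10, 10, 12, 50]   # only one pixel changed between pictures
diff = encode_difference(picture_n1, picture_n)
```

The difference picture is mostly zeros, which is precisely what makes it cheap to compress with the spatial coder.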

[Figure 2.8: chroma sample positions of 4:2:2 (Rec 601), 4:1:1, and 4:2:0, showing luminance samples Y and chrominance samples Cb, Cr.]

Figure 2.9. [A picture delay and a subtractor compute the difference between this picture and the previous picture.]

Figure 2.10. [Periodic intra pictures restart the chain of temporal difference pictures.]


Figure 2.11.

2.8 Motion compensation

Motion reduces the similarities between pictures and increases the data needed to create the difference picture. Motion compensation is used to increase the similarity. Figure 2.11 shows the principle. When an object moves across the TV screen, it may appear in a different place in each picture, but it does not change in appearance very much. The picture difference can be reduced by measuring the motion at the encoder. This is sent to the decoder as a vector. The decoder uses the vector to shift part of the previous picture to a more appropriate place in the new picture.

One vector controls the shifting of an entire area of the picture known as a macroblock. The size of the macroblock is determined by the DCT coding and the color subsampling structure. Figure 2.12a shows that, with a 4:2:0 system, the vertical and horizontal spacing of color samples is exactly twice the spacing of luminance. A single 8 x 8 DCT block of color samples extends over the same area as four 8 x 8 luminance blocks; therefore, this is the minimum picture area that can be shifted by a vector. One 4:2:0 macroblock contains four luminance blocks, one Cr block, and one Cb block.

In the 4:2:2 profile, color is only subsampled in the horizontal axis. Figure 2.12b shows that in 4:2:2, a single 8 x 8 DCT block of color samples extends over two luminance blocks. A 4:2:2 macroblock contains four luminance blocks, two Cr blocks, and two Cb blocks.

The motion estimator works by comparing the luminance data from two successive pictures. A macroblock in the first picture is used as a reference. When the input is interlaced, pixels will be at different vertical locations in the two fields, and it will, therefore, be necessary to interpolate one field before it can be compared with the other. The correlation between the reference and the next picture is measured at all possible displacements with a resolution of half a pixel over the entire search range. When the greatest correlation is found, this correlation is assumed to represent the correct motion.

The motion vector has a vertical and a horizontal component. In typical program material, motion continues over a number of pictures. A greater compression factor is obtained if the vectors are transmitted differentially. Consequently, if an object moves at constant speed, the vectors do not change and the vector difference is zero.

Motion vectors are associated with macroblocks, not with real objects in the image, and there will be occasions where part of the macroblock moves and part of it does not. In this case, it is impossible to compensate properly. If the motion of the moving part is compensated by transmitting a vector, the stationary part will be incorrectly shifted, and it will need difference data to be corrected. If no vector is sent, the stationary part will be correct, but difference data will be needed to correct the moving part. A practical compressor might attempt both strategies and select the one that required the least difference data.

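A motion estimator's search can be sketched as an exhaustive sum-of-absolute-differences match. For brevity this works on a one-dimensional row of pixels at integer displacements, whereas a real estimator searches a two-dimensional range at half-pixel resolution:

```python
def sad(a, b):
    """Sum of absolute differences: the usual block-matching cost."""
    return sum(abs(x - y) for x, y in zip(a, b))

def estimate_motion(reference, target, block_start, block_size, search_range):
    """Find where a target block best matches the reference, by brute force."""
    block = target[block_start:block_start + block_size]
    best_d, best_cost = 0, None
    for d in range(-search_range, search_range + 1):
        s = block_start + d
        if s < 0 or s + block_size > len(reference):
            continue  # candidate window falls outside the reference picture
        cost = sad(reference[s:s + block_size], block)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

reference = [0, 0, 9, 9, 9, 0, 0, 0]
target    = [0, 0, 0, 0, 9, 9, 9, 0]   # the bright object moved two pixels right
vector = estimate_motion(reference, target, block_start=3, block_size=3, search_range=3)
```

A vector of -2 here means the block's content lay two pixels to the left in the previous picture, i.e., the object moved two pixels to the right.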

[Figure 2.11 actions: 1. Compute the motion vector between picture N and picture N+1. 2. Shift data from picture N using the vector to make a predicted picture N+1. 3. Compare the actual picture with the predicted picture. 4. Send the vector and the prediction error.]

Figure 2.12. [a) 4:2:0 has one quarter as many chroma sampling points as Y: a 16 x 16 macroblock contains four 8 x 8 Y blocks, one Cr block, and one Cb block. b) 4:2:2 has twice as much chroma data as 4:2:0: a macroblock contains four Y blocks, two Cr blocks, and two Cb blocks.]


Figure 2.13.

2.9 Bidirectional coding

When an object moves, it conceals the background at its leading edge and reveals the background at its trailing edge. The revealed background requires new data to be transmitted because the area of background was previously concealed and no information can be obtained from a previous picture. A similar problem occurs if the camera pans: new areas come into view and nothing is known about them. MPEG helps to minimize this problem by using bidirectional coding, which allows information to be taken from pictures before and after the current picture. If a background is being revealed, it will be present in a later picture, and the information can be moved backwards in time to create part of an earlier picture.

Figure 2.13 shows the concept of bidirectional coding. On an individual macroblock basis, a bidirectionally coded picture can obtain motion-compensated data from an earlier or later picture, or even use an average of earlier and later data. Bidirectional coding significantly reduces the amount of difference data needed by improving the degree of prediction possible. MPEG does not specify how an encoder should be built, only what constitutes a compliant bit stream. However, an intelligent compressor could try all three coding strategies and select the one that results in the least data to be transmitted.

2.10 I, P, and B pictures

In MPEG, three different types of pictures are needed to support differential and bidirectional coding while minimizing error propagation:

I pictures are intra-coded pictures that need no additional information for decoding. They require a lot of data compared to other picture types, and therefore they are not transmitted any more frequently than necessary. They consist primarily of transform coefficients and have no vectors. I pictures allow the viewer to switch channels, and they arrest error propagation.

P pictures are forward predicted from an earlier picture, which could be an I picture or a P picture. P-picture data consist of vectors describing where, in the previous picture, each macroblock should be taken from, and of transform coefficients that describe the correction or difference data that must be added to that macroblock. P pictures require roughly half the data of an I picture.

B pictures are bidirectionally predicted from earlier or later I or P pictures. B-picture data consist of vectors describing where in earlier or later pictures data should be taken from. They also contain the transform coefficients that provide the correction. Because bidirectional prediction is so effective, the correction data are minimal, and this helps the B picture to typically require one quarter the data of an I picture.

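The per-macroblock decision described above — forward, backward, or interpolated prediction, whichever leaves the least difference data — can be sketched like this. The four-sample macroblocks are toy data:

```python
def best_prediction(actual, forward, backward):
    """Try forward, backward, and averaged prediction; keep the cheapest."""
    average = [(f + b) / 2 for f, b in zip(forward, backward)]
    candidates = {"forward": forward, "backward": backward, "interpolated": average}

    def cost(pred):
        # amount of difference data this choice would leave to transmit
        return sum(abs(a - p) for a, p in zip(actual, pred))

    return min(candidates, key=lambda name: cost(candidates[name]))

# A revealed background is absent from the earlier picture but present later,
# so backward prediction wins for this (hypothetical) macroblock.
actual   = [5, 5, 5, 5]
forward  = [0, 0, 0, 0]   # earlier picture: area still concealed
backward = [5, 5, 5, 5]   # later picture: area already revealed
choice = best_prediction(actual, forward, backward)
```

This mirrors the revealed-background case of Figure 2.13: only the later picture contains the uncovered area, so the backward candidate leaves no residual at all.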

[Figure 2.13: as time runs from picture N to picture N+2, the area revealed behind a moving object is not in picture N but is present in picture N+2.]


Figure 2.14.

[Rec 601 video frames in display order are carried in the elementary stream with each anchor picture ahead of the B pictures that depend on it; the temporal_reference field (0 3 1 2 6 4 5 ...) restores the display order.]

Figure 2.14 introduces the concept of the GOP or Group Of Pictures. The GOP begins with an I picture and then has P pictures spaced throughout. The remaining pictures are B pictures. The GOP is defined as ending at the last picture before the next I picture. The GOP length is flexible, but 12 or 15 pictures is a common value. Clearly, if data for B pictures are to be taken from a future picture, that data must already be available at the decoder. Consequently, bidirectional coding requires that picture data be sent out of sequence and temporarily stored. Figure 2.14 also shows that the P-picture data are sent before the B-picture data. Note that the last B pictures in the GOP cannot be transmitted until after the I picture of the next GOP, since this data will be needed to bidirectionally decode them. In order to return pictures to their correct sequence, a temporal reference is included with each picture. As the picture rate is also embedded periodically in headers in the bit stream, an MPEG file may be displayed by, for example, a personal computer, in the correct order and time scale.

Sending picture data out of sequence requires additional memory at the encoder and decoder and also causes delay. The number of bidirectionally coded pictures between intra- or forward-predicted pictures must be restricted to reduce cost and, if delay is an issue, to minimize delay.

Figure 2.15. [Bit rate in Mbit/s along a constant-quality curve: an I-only sequence needs the most bits, IB fewer, and IBBP the fewest.]

Figure 2.15 shows the tradeoff that must be made between compression factor and coding delay. For a given quality, sending only I pictures requires more than twice the bit rate of an IBBP sequence. Where the ability to edit is important, an IB sequence is a useful compromise.
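The reordering of Figure 2.14 can be sketched as follows: each I or P anchor is transmitted ahead of the B pictures that depend on it. The picture names are illustrative, with the digit giving the temporal reference:

```python
def transmission_order(display):
    """Send each I or P anchor ahead of the B pictures that depend on it."""
    out, pending_b = [], []
    for pic in display:
        if pic[0] in "IP":
            out.append(pic)        # the anchor goes out first...
            out.extend(pending_b)  # ...then the B pictures it closes off
            pending_b = []
        else:
            pending_b.append(pic)
    out.extend(pending_b)          # trailing Bs would wait for the next GOP's I picture
    return out

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
sent = transmission_order(display)
```

The resulting transmission order (0, 3, 1, 2, 6, 4, 5) matches the temporal_reference sequence shown in Figure 2.14: the decoder receives both anchors before any B picture that needs them.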


Figure 2.16a.

2.11 An MPEG compressor

Figures 2.16a, b, and c show a typical bidirectional motion compensator structure. Preprocessed input video enters a series of frame stores that can be bypassed to change the picture order. The data then enter the subtractor and the motion estimator. To create an I picture (see Figure 2.16a), the end of the input delay is selected and the subtractor is turned off, so that the data pass straight through to be spatially coded. Subtractor output data also pass to a frame store that can hold several pictures. The I picture is held in the store.


[Figure 2.16a diagram: reordering frame delays feed the subtractor, spatial coder, and motion estimator under GOP control; for I pictures the shaded prediction areas are unused.]


Figure 2.16b.

[Figure 2.16b diagram: the same structure as used for P pictures, with the forward predictor and motion estimator active; shaded areas are unused.]

To encode a P picture (see Figure 2.16b), the B pictures in the input buffer are bypassed, so that the future picture is selected. The motion estimator compares the I picture in the output store with the P picture in the input store to create forward motion vectors. The I picture is shifted by these vectors to make a predicted P picture. The predicted P picture is subtracted from the actual P picture to produce the prediction error, which is spatially coded and sent along with the vectors. The prediction error is also added to the predicted P picture to create a locally decoded P picture that also enters the output store.


Figure 2.16c.

[Figure 2.16c diagram: the structure as used for B pictures, with both forward and backward predictors active; the shaded area is unused.]

The output store then contains an I picture and a P picture. A B picture from the input buffer can now be selected. The motion compensator (see Figure 2.16c) will compare the B picture with the I picture that precedes it and the P picture that follows it to obtain bidirectional vectors. Forward and backward motion compensation is performed to produce two predicted B pictures. These are subtracted from the current B picture. On a macroblock-by-macroblock basis, the forward or backward data are selected according to which represent the smallest differences. The picture differences are then spatially coded and sent with the vectors.

When all of the intermediate B pictures are coded, the input memory will once more be bypassed to create a new P picture based on the previous P picture.

Figure 2.17 shows an MPEG coder. The motion compensator output is spatially coded, and the vectors are added in a multiplexer. Syntactical data is also added, which identifies the type of picture (I, P, or B) and provides other information to help a decoder (see Section 4). The output data are buffered to allow temporary variations in bit rate. If the bit rate shows a long-term increase, the buffer will tend to fill up, and to prevent overflow the quantization process will have to be made more severe. Equally, should the buffer show signs of underflow, the quantization will be relaxed to maintain the average bit rate. This means that the store contains exactly what the store in the decoder will contain, so that the results of all previous coding errors are present. These will automatically be reduced when the predicted picture is subtracted from the actual picture because the difference data will be more accurate.

Figure 2.17.
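The buffer feedback described above can be sketched as a crude rate-control rule. The occupancy thresholds and the doubling/halving of the quantizer scale here are illustrative choices, not taken from any standard:

```python
def update_quant_scale(quant_scale, buffer_fill, capacity):
    """Crude rate control: coarser quantizing as the buffer nears overflow."""
    occupancy = buffer_fill / capacity
    if occupancy > 0.75:
        return min(quant_scale * 2, 62)   # danger of overflow: be more severe
    if occupancy < 0.25:
        return max(quant_scale // 2, 1)   # near underflow: relax the quantizing
    return quant_scale                    # comfortable: leave the scale alone
```

Highly detailed pictures push the buffer toward the upper threshold and raise the scale; plain pictures drain it and let the scale fall, holding the average bit rate constant.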

[Figure 2.17 diagram: the bidirectional coder's spatial data, motion vectors, and syntactical data pass through differential coding and entropy/run-length coding into a multiplexer and buffer; the buffer's demand clock and rate control feed back to the quantizing tables.]


this behavior would result in aserious increase in vector data.In 60 Hz video, 3:2 pulldown isused to obtain 60 Hz from 24 Hzfilm. One frame is made intotwo fields, the next is made intothree fields, and so on.

Consequently, one field in fiveis completely redundant. MPEGhandles film material best bydiscarding the third field in 3:2systems. A 24 Hz code in thetransmission alerts the decoderto recreate the 3:2 sequence byre-reading a field store. In 50and 60 Hz telecine, pairs offields are deinterlaced to createframes, and then motion is measured between frames. Thedecoder can recreate interlaceby reading alternate lines in theframe store.

A cut is a difficult event for acompressor to handle because itresults in an almost completeprediction failure, requiring alarge amount of correction data.If a coding delay can be tolerated,a coder may detect cuts inadvance and modify the GOPstructure dynamically, so thatthe cut is made to coincide withthe generation of an I picture. Inthis case, the cut is handledwith very little extra data. Thelast B pictures before the I framewill almost certainly need touse forward prediction. In someapplications that are not real-time, such as DVD mastering, acoder could take two passes atthe input video: one pass toidentify the difficult or highentropy areas and create a codingstrategy, and a second pass toactually compress the input video.

2.12 Preprocessing

A compressor attempts to eliminate redundancy within the picture and between pictures. Anything which reduces that redundancy is undesirable. Noise and film grain are particularly problematic because they generally occur over the entire picture. After the DCT process, noise results in more non-zero coefficients, which the coder cannot distinguish from genuine picture data. Heavier quantizing will be required to encode all of the coefficients, reducing picture quality. Noise also reduces similarities between successive pictures, increasing the difference data needed.

Residual subcarrier in video decoded from composite video is a serious problem because it results in high spatial frequencies that are normally at a low level in component programs. Subcarrier also alternates from picture to picture, causing an increase in difference data. Naturally, any composite decoding artifact that is visible in the input to the MPEG coder is likely to be reproduced at the decoder.

If a high compression factor is required, the level of artifacts can increase, especially if input quality is poor. In this case, it may be better to reduce the entropy entering the coder using prefiltering. The video signal is subject to two-dimensional, low-pass filtering, which reduces the number of coefficients needed and reduces the level of artifacts. The picture will be less sharp, but less sharpness is preferable to a high level of artifacts.

In most MPEG-2 applications, 4:2:0 sampling is used, which requires a chroma downsampling process if the source is 4:2:2. In MPEG-1, the luminance and chroma are further downsampled to produce an input picture, or SIF (Source Input Format), that is only 352 pixels wide. This technique reduces the entropy by a further factor. For very high compression, the QSIF (Quarter Source Input Format) picture, which is 176 pixels wide, is used. Downsampling is a process that combines a spatial low-pass filter with an interpolator. Downsampling interlaced signals is problematic because vertical detail is spread over two fields, which may decorrelate due to motion.

When the source material is telecine, the video signal has different characteristics than normal video. In 50 Hz video, pairs of fields represent the same film frame, and there is no motion between them. Thus, the motion between fields alternates between zero and the motion between frames. Since motion vectors are sent differentially, this alternation increases the vector data that must be transmitted.

Any practice that causes unwanted motion is to be avoided. Unstable camera mountings, in addition to giving a shaky picture, increase picture differences and vector transmission requirements. This will also happen with telecine material if film weave or hop due to sprocket-hole damage is present. In general, video that is to be compressed must be of the highest quality possible. If high quality cannot be achieved, then noise reduction and other stabilization techniques will be desirable.
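The 4:2:2-to-4:2:0 chroma downsampling idea can be sketched in Python. This is only an illustration: a real coder uses a much longer low-pass filter than the two-tap vertical average below, and the function name is our own, not from any standard.

```python
def downsample_chroma_420(chroma):
    """Vertically decimate a 4:2:2 chroma plane (list of rows) 2:1.

    Each output row is the average of two adjacent input rows; this
    two-tap filter is the simplest possible low-pass + decimate.
    """
    return [
        [(a + b) // 2 for a, b in zip(row0, row1)]
        for row0, row1 in zip(chroma[::2], chroma[1::2])
    ]
```

For example, a four-row chroma plane becomes two rows, halving the vertical chroma resolution while the luminance is untouched.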


Figure 2.18.

2.13 Profiles and levels

MPEG is applicable to a wide range of applications requiring different performance and complexity. Using all of the encoding tools defined in MPEG, there are millions of combinations possible. For practical purposes, the MPEG-2 standard is divided into profiles, and each profile is subdivided into levels (see Figure 2.18). A profile is basically a subset of the entire coding repertoire requiring a certain complexity. A level is a parameter such as the size of the picture or the bit rate used with that profile. In principle, there are 24 combinations, but not all of these have been defined. An MPEG decoder having a given profile and level must also be able to decode lower profiles and levels.

The simple profile does not support bidirectional coding, and so only I and P pictures will be output. This reduces the coding and decoding delay and allows simpler hardware. The simple profile has only been defined at main level (SP@ML).

The main profile is designed for a large proportion of uses. The low level uses a low-resolution input having only 352 pixels per line. The majority of broadcast applications will require the MP@ML (Main Profile at Main Level) subset of MPEG, which supports SDTV (Standard Definition TV).

The high-1440 level is a high-definition scheme that doubles the definition compared to the main level. The high level not only doubles the resolution but maintains that resolution with 16:9 format by increasing the number of horizontal samples from 1440 to 1920.

In compression systems using spatial transforms and requantizing, it is possible to produce scaleable signals. A scaleable process is one in which the input results in a main signal and a "helper" signal. The main signal can be decoded alone to give a picture of a certain quality, but, if the information from the helper signal is added, some aspect of the quality can be improved.

For example, a conventional MPEG coder, by heavily requantizing coefficients, encodes a picture with moderate signal-to-noise ratio. If, however, that picture is locally decoded and subtracted pixel-by-pixel from the original, a quantizing noise picture results. This picture can be compressed and transmitted as the helper signal. A simple decoder only decodes the main, noisy bit stream, but a more complex decoder can decode both bit streams and combine them to produce a low-noise picture. This is the principle of SNR scaleability.

As an alternative, coding only the lower spatial frequencies in an HDTV picture can produce a main bit stream that an SDTV receiver can decode. If the lower-definition picture is locally decoded and subtracted from the original picture, a definition-enhancing picture results. This picture can be coded into a helper signal. A suitable decoder can combine the main and helper signals to recreate the HDTV picture. This is the principle of spatial scaleability.

The high profile supports both SNR and spatial scaleability as well as allowing the option of 4:2:2 sampling.

The 4:2:2 profile has been developed for improved compatibility with digital production equipment. This profile allows 4:2:2 operation without requiring the additional complexity of using the high profile. For example, an HP@ML decoder must support SNR scaleability, which is not a requirement for production. The 4:2:2 profile has the same freedom of GOP structure as other profiles, but in practice it is commonly used with short GOPs, making editing easier. 4:2:2 operation requires a higher bit rate than 4:2:0, and the use of short GOPs requires an even higher bit rate for a given quality.

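The profile/level limits shown in Figure 2.18 lend themselves to a simple lookup. The following sketch uses our own dictionary keys and function name (not MPEG syntax), with the numbers transcribed from the figure:

```python
# Maximum picture size (width x height) and bit rate (Mb/s) for each
# defined MPEG-2 profile@level subset, per Figure 2.18.
LIMITS = {
    ("Simple", "Main"): (720, 576, 15),
    ("Main", "Low"): (352, 288, 4),
    ("Main", "Main"): (720, 576, 15),
    ("Main", "High-1440"): (1440, 1152, 60),
    ("Main", "High"): (1920, 1152, 80),
    ("4:2:2", "Main"): (720, 608, 50),
    ("SNR", "Low"): (352, 288, 4),
    ("SNR", "Main"): (720, 576, 15),
    ("Spatial", "High-1440"): (1440, 1152, 60),
    ("High", "Main"): (720, 576, 20),
    ("High", "High-1440"): (1440, 1152, 80),
    ("High", "High"): (1920, 1152, 100),
}

def fits(profile, level, width, height, mbps):
    """True if a stream's size and bit rate fit the given subset.

    Undefined profile/level combinations simply return False.
    """
    if (profile, level) not in LIMITS:
        return False
    max_w, max_h, max_rate = LIMITS[(profile, level)]
    return width <= max_w and height <= max_h and mbps <= max_rate
```

For example, a 720x576 stream at 15 Mb/s fits MP@ML, the subset most broadcast applications require.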

Profiles and levels defined in MPEG-2 (Figure 2.18); each cell gives the chroma format, maximum picture size, maximum bit rate, and permitted picture types:

Level       Simple profile   Main profile      4:2:2 profile    SNR profile      Spatial profile   High profile
High        -                4:2:0 1920x1152   -                -                -                 4:2:0/4:2:2 1920x1152
                             80 Mb/s I,P,B                                                         100 Mb/s I,P,B
High-1440   -                4:2:0 1440x1152   -                -                4:2:0 1440x1152   4:2:0/4:2:2 1440x1152
                             60 Mb/s I,P,B                                       60 Mb/s I,P,B     80 Mb/s I,P,B
Main        4:2:0 720x576    4:2:0 720x576     4:2:2 720x608    4:2:0 720x576    -                 4:2:0/4:2:2 720x576
            15 Mb/s I,P      15 Mb/s I,P,B     50 Mb/s I,P,B    15 Mb/s I,P,B                      20 Mb/s I,P,B
Low         -                4:2:0 352x288     -                4:2:0 352x288    -                 -
                             4 Mb/s I,P,B                       4 Mb/s I,P,B


Figure 2.19.

2.14 Wavelets

All transforms suffer from uncertainty because the more accurately the frequency domain is known, the less accurately the time domain is known (and vice versa). In most transforms, such as the DFT and DCT, the block length is fixed, so the time and frequency resolution is fixed. The frequency coefficients represent evenly spaced values on a linear scale. Unfortunately, because human senses are logarithmic, the even scale of the DFT and DCT gives inadequate frequency resolution at one end and excess resolution at the other.

The wavelet transform is not affected by this problem because its frequency resolution is a fixed fraction of an octave and therefore has a logarithmic characteristic. This is done by changing the block length as a function of frequency. As frequency goes down, the block becomes longer. Thus, a characteristic of the wavelet transform is that the basis functions all contain the same number of cycles, and these cycles are simply scaled along the time axis to search for different frequencies.

Figure 2.19 contrasts the fixed block size of the DFT/DCT with the variable size of the wavelet.

Wavelets are especially useful for audio coding because they automatically adapt to the conflicting requirements of the accurate location of transients in time and the accurate assessment of pitch in steady tones.

For video coding, wavelets have the advantage of producing resolution-scaleable signals with almost no extra effort. In moving video, the advantages of wavelets are offset by the difficulty of assigning motion vectors to a variable-size block, but in still-picture or I-picture coding this difficulty does not arise.
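The variable window length can be illustrated with the Haar wavelet, the simplest member of the family. This is a minimal sketch with our own function names; practical wavelet coders use longer basis functions.

```python
def haar_step(x):
    """One level of the (unnormalized) Haar transform: pairwise
    averages carry the low frequencies, differences the high ones."""
    avg = [(a + b) / 2 for a, b in zip(x[::2], x[1::2])]
    det = [(a - b) / 2 for a, b in zip(x[::2], x[1::2])]
    return avg, det

def haar(x):
    """Recurse on the averages: each lower octave is analyzed with a
    window twice as long, which is what gives the wavelet transform
    its logarithmic frequency characteristic."""
    details = []
    while len(x) > 1:
        x, det = haar_step(x)
        details.append(det)
    return x[0], details
```

Running haar([4, 2, 6, 8]) returns the overall mean 5.0 plus detail bands [[1.0, -1.0], [-2.0]]: two short-window coefficients for the top octave, one long-window coefficient for the octave below.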


(Figure 2.19 compares the FFT, with its constant-size windows, against the wavelet transform, whose basis functions contain a constant number of cycles.)


Figure 3.1.

SECTION 3 AUDIO COMPRESSION

Lossy audio compression is based entirely on the characteristics of human hearing, which must be considered before any description of compression is possible. Surprisingly, human hearing, particularly in stereo, is actually more critically discriminating than human vision, and consequently audio compression should be undertaken with care. As with video compression, audio compression requires a number of different levels of complexity according to the required compression factor.

3.1 The hearing mechanism

Hearing comprises physical processes in the ear and nervous/mental processes that combine to give us an impression of sound. The impression we receive is not identical to the actual acoustic waveform present in the ear canal because some entropy is lost. Audio compression systems that lose only that part of the entropy that will be lost in the hearing mechanism will produce good results.

The physical hearing mechanism consists of the outer, middle, and inner ears. The outer ear comprises the ear canal and the eardrum. The eardrum converts the incident sound into a vibration in much the same way as does a microphone diaphragm. The inner ear works by sensing vibrations transmitted through a fluid. The impedance of fluid is much higher than that of air, and the middle ear acts as an impedance-matching transformer that improves power transfer.

Figure 3.1 shows that vibrations are transferred to the inner ear by the stirrup bone, which acts on the oval window. Vibrations in the fluid in the ear travel up the cochlea, a spiral cavity in the skull (shown unrolled in Figure 3.1 for clarity). The basilar membrane is stretched across the cochlea. This membrane varies in mass and stiffness along its length. At the end near the oval window, the membrane is stiff and light, so its resonant frequency is high. At the distant end, the membrane is heavy and soft and resonates at low frequency. The range of resonant frequencies available determines the frequency range of human hearing, which in most people is from 20 Hz to about 15 kHz.

Different frequencies in the input sound cause different areas of the membrane to vibrate. Each area has different nerve endings to allow pitch discrimination. The basilar membrane also has tiny muscles controlled by the nerves that together act as a kind of positive feedback system that improves the Q factor of the resonance.

The resonant behavior of the basilar membrane is an exact parallel with the behavior of a transform analyzer. According to the uncertainty theory of transforms, the more accurately the frequency domain of a signal is known, the less accurately the time domain is known. Consequently, the better a transform can discriminate between two frequencies, the less able it is to discriminate between the times of two events. Human hearing has evolved with a certain compromise that balances time discrimination and frequency discrimination; in the balance, neither ability is perfect.

The imperfect frequency discrimination results in the inability to separate closely spaced frequencies. This inability is known as auditory masking, defined as the reduced sensitivity to one sound in the presence of another.


(Figure 3.1 shows the outer ear and eardrum, the middle ear with the stirrup bone, and the inner ear, with the basilar membrane stretched along the cochlea, drawn unrolled.)


Figure 3.2a.

3.2 Subband coding

Figure 3.2a shows that the threshold of hearing is a function of frequency. The greatest sensitivity is, not surprisingly, in the speech range. In the presence of a single tone, the threshold is modified as in Figure 3.2b. Note that the threshold is raised for tones at higher frequency and, to some extent, at lower frequency. In the presence of a complex input spectrum, such as music, the threshold is raised at nearly all frequencies. One consequence of this behavior is that the hiss from an analog audio cassette is only audible during quiet passages in music. Companding makes use of this principle by amplifying low-level audio signals prior to recording or transmission and returning them to their correct level afterwards.

The imperfect time discrimination of the ear is due to its resonant response. The Q factor is such that a given sound has to be present for at least about 1 millisecond before it becomes audible. Because of this slow response, masking can still take place even when the two signals involved are not simultaneous. Forward and backward masking occur when the masking sound continues to mask sounds at lower levels before and after the masking sound's actual duration. Figure 3.3 shows this concept.

Masking raises the threshold of hearing, and compressors take advantage of this effect by raising the noise floor, which allows the audio waveform to be expressed with fewer bits. The noise floor can only be raised at frequencies at which there is effective masking. To maximize effective masking, it is necessary to split the audio spectrum into different frequency bands to allow introduction of different amounts of companding and noise in each band.

Figure 3.4 shows a band-splitting compandor. The band-splitting filter is a set of narrow-band, linear-phase filters that overlap and all have the same bandwidth. The output in each band consists of samples representing a waveform. In each frequency band, the audio input is amplified up to maximum level prior to transmission. Afterwards, each level is returned to its correct value. Noise picked up in the transmission is reduced in each band. If the noise reduction is compared with the threshold of hearing, it can be seen that greater noise can be tolerated in some bands because of masking. Consequently, in each band after companding, it is possible to reduce the wordlength of samples. This technique achieves a compression because the noise introduced by the loss of resolution is masked.
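The per-band companding and requantizing idea can be sketched as follows. This is a toy model with invented function names: a real coder derives the wordlength for each band from a masking model rather than taking it as a parameter.

```python
def compand_band(samples, bits):
    """Amplify one band to full scale, then requantize to 'bits' bits.

    Returns the gain (whose inverse is the scale factor to transmit)
    and the integer codes for the band.
    """
    peak = max(abs(s) for s in samples) or 1.0
    gain = 1.0 / peak
    q = (1 << (bits - 1)) - 1      # largest code at this wordlength
    return gain, [round(s * gain * q) for s in samples]

def expand_band(gain, codes, bits):
    """Decoder side: undo the quantizing and restore the level."""
    q = (1 << (bits - 1)) - 1
    return [c / (q * gain) for c in codes]
```

The error introduced by the shortened wordlength is the quantizing noise that masking is relied upon to hide.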


Figure 3.2b.

(Figures 3.2a and 3.2b plot sound pressure against frequency from 20 Hz to 20 kHz, showing the threshold in quiet and the masking threshold raised around a 1 kHz sine wave.)

Figure 3.3.

(Figure 3.3 plots sound pressure against time, showing pre-masking before and post-masking after the masking sound.)

Figure 3.4.

(One band of the compandor: the input passes through a sub-band filter; a level detector derives a gain factor that multiplies the band up to full level, producing the companded audio.)


Figure 3.5.

Figure 3.5 shows a simple band-splitting coder as is used in MPEG Layer 1. The digital audio input is fed to a band-splitting filter that divides the spectrum of the signal into a number of bands. In MPEG, this number is 32. The time axis is divided into blocks of equal length. In MPEG Layer 1, this is 384 input samples, so at the output of the filter there are 12 samples in each of the 32 bands. Within each band, the level is amplified by multiplication to bring the level up to maximum. The gain required is constant for the duration of a block, and a single scale factor is transmitted with each block for each band in order to allow the process to be reversed at the decoder.

The filter bank output is also analyzed to determine the spectrum of the input signal. This analysis drives a masking model that determines the degree of masking that can be expected in each band. The more masking available, the less accurate the samples in each band can be. The sample accuracy is reduced by requantizing to reduce wordlength. This reduction is also constant for every word in a band, but different bands can use different wordlengths. The wordlength needs to be transmitted as a bit allocation code for each band to allow the decoder to deserialize the bit stream properly.

3.3 MPEG Layer 1

Figure 3.6 shows an MPEG Layer 1 audio bit stream. Following the synchronizing pattern and the header, there are 32 bit-allocation codes of four bits each. These codes describe the wordlength of samples in each subband. Next come the 32 scale factors used in the companding of each band. These scale factors determine the gain needed in the decoder to return the audio to the correct level. The scale factors are followed, in turn, by the audio data in each band.

Figure 3.7 shows the Layer 1 decoder. The synchronization pattern is detected by the timing generator, which deserializes the bit allocation and scale factor data. The bit allocation data then allows deserialization of the variable-length samples. The requantizing is reversed, and the companding is reversed by the scale factor data to put each band back to the correct level. These 32 separate bands are then combined in a combiner filter, which produces the audio output.
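The fixed part of the Layer 1 frame described by Figure 3.6 can be budgeted directly. This sketch ignores the optional CRC and the ancillary data, and the function name is our own:

```python
def layer1_side_info_bits(active_subbands=32):
    """Bits spent before any audio samples in a Layer 1 frame:
    a 12-bit sync pattern and 20-bit system header, a 4-bit
    allocation code for each of the 32 subbands, and a 6-bit scale
    factor for each subband that actually carries audio."""
    sync_and_header = 12 + 20
    allocation = 32 * 4
    scale_factors = 6 * active_subbands
    return sync_and_header + allocation + scale_factors
```

With all 32 subbands active this comes to 352 bits of side information against 384 input samples per frame (8 ms at 48 kHz).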


(Figure 3.5 shows the Layer 1 coder: the input feeds a 32-band band-splitting filter; the subband samples pass through a scaler and quantizer controlled, via the masking thresholds, by a dynamic bit and scale-factor allocater and coder; a multiplexer assembles the output.)

Figure 3.6.

(Figure 3.6 shows the MPEG Layer 1 frame, covering 384 PCM input samples, a duration of 8 ms at 48 kHz: a 12-bit sync pattern and 20-bit system header, an optional CRC, 4-bit linear bit-allocation codes for subbands 0 to 31, 6-bit linear scale factors, twelve sample granules GR0 to GR11, and ancillary data of unspecified length.)

Figure 3.7.

(Figure 3.7 shows the Layer 1 decoder: a sync detector and demultiplexer separate the bit allocation, scale factors, and samples; the 32 bands are requantized and rescaled and then recombined in a 32-input filter.)


Figure 3.8.

3.4 MPEG Layer 2

Figure 3.8 shows that when the band-splitting filter is used to drive the masking model, the spectral analysis is not very accurate, since there are only 32 bands and the energy could be anywhere in the band. The noise floor cannot be raised very much because, in the worst case shown, the masking may not operate. A more accurate spectral analysis would allow a higher compression factor. In MPEG Layer 2, the spectral analysis is performed by a separate process. A 512-point FFT working directly from the input is used to drive the masking model instead. To resolve frequencies more accurately, the time span of the transform has to be increased, which is done by raising the block size to 1152 samples.

While the block-companding scheme is the same as in Layer 1, not all of the scale factors are transmitted, since they contain a degree of redundancy on real program material. The scale factor of successive blocks in the same band exceeds 2 dB less than 10% of the time, and advantage is taken of this characteristic by analyzing sets of three successive scale factors. On stationary programs, only one scale factor out of three is sent. As transient content increases in a given subband, two or three scale factors will be sent. A scale factor select code is also sent to allow the decoder to determine what has been sent in each subband. This technique effectively halves the scale factor bit rate.

3.5 Transform coding

Layers 1 and 2 are based on band-splitting filters in which the signal is still represented as a waveform. However, Layer 3 adopts transform coding similar to that used in video coding. As was mentioned above, the ear performs a kind of frequency transform on the incident sound, and, because of the Q factor of the basilar membrane, the response cannot increase or decrease rapidly. Consequently, if an audio waveform is transformed into the frequency domain, the coefficients do not need to be sent very often. This principle is the basis of transform coding. For higher compression factors, the coefficients can be requantized, making them less accurate. This process produces noise which will be placed at frequencies where the masking is the greatest. A by-product of a transform coder is that the input spectrum is accurately known, so a precise masking model can be created.

3.6 MPEG Layer 3

This complex level of coding is really only required when the highest compression factor is needed. It has a degree of commonality with Layer 2. A discrete cosine transform is used having 384 output coefficients per block. This output can be obtained by direct processing of the input samples, but, in a multi-level coder, it is possible to use a hybrid transform incorporating the 32-band filtering of Layers 1 and 2 as a basis. If this is done, the 32 subbands from the QMF (Quadrature Mirror Filter) are each further processed by a 12-band MDCT (Modified Discrete Cosine Transform) to obtain the 384 output coefficients.

Two window sizes are used to avoid pre-echo on transients. The window switching is performed by the psychoacoustic model. It has been found that pre-echo is associated with the entropy in the audio rising above the average value. To obtain the highest compression factor, nonuniform quantizing of the coefficients is used along with Huffman coding. This technique allocates the shortest wordlengths to the most common code values.

3.7 AC-3

The AC-3 audio coding technique is used with the ATSC system instead of one of the MPEG audio coding schemes. AC-3 is a transform-based system that obtains coding gain by requantizing frequency coefficients.

The PCM input to an AC-3 coder is divided into overlapping windowed blocks as shown in Figure 3.9. These blocks contain 512 samples each, but, because of the complete overlap, there is 100% redundancy. After the transform, there are 512 coefficients in each block, but, because of the redundancy, these coefficients can be decimated to 256 coefficients using a technique called Time Domain Aliasing Cancellation (TDAC).
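The Layer 2 scale-factor economy described above can be sketched as follows. This is a simplification with invented names: the real standard transmits coded select patterns covering several sharing cases, not a single boolean test.

```python
def scale_factors_to_send(triplet, threshold_db=2.0):
    """Given three successive scale factors (in dB) for one subband,
    send only one when the program is stationary, otherwise all
    three.  Successive factors differ by more than 2 dB less than
    10% of the time, so the single-factor case dominates."""
    stationary = all(abs(a - b) <= threshold_db
                     for a, b in zip(triplet, triplet[1:]))
    # Reusing the largest factor is the safe choice for the decoder.
    return [max(triplet)] if stationary else list(triplet)
```

On stationary material this sends one factor instead of three, which, together with the select code, is why the scheme roughly halves the scale factor bit rate.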


(Figure 3.8 shows a wide subband and asks whether the masking threshold follows the signal energy or sits elsewhere in the band; with only 32 bands, the coder cannot tell.)

Figure 3.9. (Overlapping windowed blocks of 512 samples each.)


Figure 4.1.

The input waveform is analyzed, and, if there is a significant transient in the second half of the block, the waveform will be split into two to prevent pre-echo. In this case, the number of coefficients remains the same, but the frequency resolution will be halved and the temporal resolution will be doubled. A flag is set in the bit stream to indicate to the decoder that this has been done.

The coefficients are output in floating-point notation as a mantissa and an exponent. The representation is the binary equivalent of scientific notation. Exponents are effectively scale factors. The set of exponents in a block produces a spectral analysis of the input to a finite accuracy on a logarithmic scale called the spectral envelope. This spectral analysis is the input to the masking model that determines the degree to which noise can be raised at each frequency.

The masking model drives the requantizing process, which reduces the accuracy of each coefficient by rounding the mantissae. A significant proportion of the transmitted data consists of these mantissae.

The exponents are also transmitted, but not directly, as there is further redundancy within them that can be exploited. Within a block, only the first (lowest-frequency) exponent is transmitted in absolute form. The remainder are transmitted differentially, and the decoder adds the difference to the previous exponent. Where the input audio has a smooth spectrum, the exponents in several frequency bands may be the same. Exponents can be grouped into sets of two or four with flags that describe what has been done.

Sets of six blocks are assembled into an AC-3 sync frame. The first block of the frame always has full exponent data, but, in cases of stationary signals, later blocks in the frame can use the same exponents.
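The differential exponent scheme is easy to sketch. The function names are ours, and AC-3 itself additionally limits the size of each delta and groups deltas in sets, which is omitted here:

```python
def encode_exponents(exponents):
    """First exponent absolute, the rest as differences."""
    return [exponents[0]] + [b - a for a, b in zip(exponents, exponents[1:])]

def decode_exponents(coded):
    """Decoder adds each difference to the previous exponent."""
    out = [coded[0]]
    for delta in coded[1:]:
        out.append(out[-1] + delta)
    return out
```

A smooth spectrum yields long runs of zero differences, which is exactly what makes the grouping flags effective.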


SECTION 4 ELEMENTARY STREAMS

An elementary stream is basically the raw output of an encoder and contains no more than is necessary for a decoder to approximate the original picture or audio. The syntax of the compressed signal is rigidly defined in MPEG so that decoders can be guaranteed to operate on it. The encoder is not defined except that it must somehow produce the right syntax.

The advantage of this approach is that it suits the real world, in which there are likely to be many more decoders than encoders. By standardizing the decoder, decoders can be made at low cost. In contrast, the encoder can be more complex and more expensive without a great cost penalty, but with the potential for better picture quality as complexity increases. When the encoder and the decoder are different in complexity, the coding system is said to be asymmetrical.

The MPEG approach also allows for the possibility that quality will improve as coding algorithms are refined while still producing bit streams that can be understood by earlier decoders. The approach also allows the use of proprietary coding algorithms, which need not enter the public domain.

4.1 Video elementary stream syntax

Figure 4.1 shows the construction of the elementary video stream. The fundamental unit of picture information is the DCT block, which represents an 8 x 8 array of pixels that can be Y, Cr, or Cb. The DC coefficient is sent first and is represented more accurately than the other coefficients. Following the remaining coefficients, an end of block (EOB) code is sent.

(Figure 4.1 shows the elementary-stream hierarchy: a video sequence contains a number of groups of pictures; each group contains pictures, which contain slices, which contain macroblocks, which contain blocks of coefficients. The sequence layer carries the sync pattern, picture size, aspect ratio, picture rate, progressive/interlace flag, chroma type, profile, level, and quantizing matrices; the group-of-pictures layer carries the open/closed flag; the picture layer carries the I/P/B type, temporal reference, and global vectors; the macroblock layer carries the motion vector.)


Blocks are assembled into macroblocks, which are the fundamental units of a picture and which can be motion compensated. Each macroblock has a two-dimensional motion vector in the header. In B pictures, the vectors can be backward as well as forward. The motion compensation can be field or frame based, and this is indicated. The scale used for coefficient requantizing is also indicated. Using the vectors, the decoder obtains information from earlier and later pictures to produce a predicted picture. The blocks are inverse transformed to produce a correction picture that is added to the predicted picture to produce the decoded output. In 4:2:0 coding, each macroblock will have four Y blocks and two color-difference blocks. To make it possible to identify which block describes which component, the blocks are sent in a specified order.

Macroblocks are assembled into slices that must always represent horizontal strips of picture from left to right. In MPEG, slices can start anywhere and be of arbitrary size, but in ATSC (Advanced Television Systems Committee) they must start at the left-hand edge of the picture. Several slices can exist across the screen width. The slice is the fundamental unit of synchronization for variable-length and differential coding. The first vectors in a slice are sent absolutely, whereas the remaining vectors are transmitted differentially. In I pictures, the first DC coefficients in the slice are sent absolutely, and the remaining DC coefficients are transmitted differentially. In difference pictures, this technique is not worthwhile.

In the case of a bit error in the elementary stream, either the deserialization of the variable-length symbols will break down, or subsequent differentially coded coefficients or vectors will be incorrect. The slice structure allows recovery by providing a resynchronizing point in the bit stream.

A number of slices are combined to make a picture, which is the active part of a field or a frame. The picture header defines whether the picture was I, P, or B coded and includes a temporal reference so that the picture can be presented at the correct time. In the case of pans and tilts, the vectors in every macroblock will be the same. A global vector can be sent for the whole picture, and the individual vectors then become differences from this global value.

Pictures may be combined to produce a GOP (group of pictures) that must begin with an I picture. The GOP is the fundamental unit of temporal coding. In the MPEG standard, the use of a GOP is optional, but it is a practical necessity. Between I pictures, a variable number of P and/or B pictures may be placed, as was described in Section 2. A GOP may be open or closed. In a closed GOP, the last B pictures do not require the I picture in the next GOP for decoding, and the bit stream could be cut at the end of the GOP.

If GOPs are used, several GOPs may be combined to produce a video sequence. The sequence begins with a sequence start code, followed by a sequence header, and ends with a sequence end code. Additional sequence headers can be placed throughout the sequence. This approach allows decoding to begin part way through the sequence, as might happen in playback of digital video discs and tape cassettes. The sequence header specifies the vertical and horizontal size of the picture, the aspect ratio, the chroma subsampling format, the picture rate, the use of progressive scan or interlace, the profile, level, and bit rate, and the quantizing matrices used in intra- and inter-coded pictures.

Without the sequence header data, a decoder cannot understand the bit stream, and therefore sequence headers become entry points at which decoders can begin correct operation. The spacing of entry points influences the delay in correct decoding that occurs when the viewer switches from one television channel to another.

4.2 Audio elementary streams

Various types of audio can be embedded in an MPEG-2 multiplex. These types include audio coded according to MPEG Layers 1, 2, or 3, or AC-3. The type of audio encoding used must be included in a descriptor that a decoder will read in order to invoke the appropriate type of decoding.

The audio compression process is quite different from the video process. There is no equivalent to the different I, P, and B frame types, and audio frames always contain the same amount of audio data. There is no equivalent of bidirectional coding, and audio frames are not transmitted out of sequence.

In MPEG-2 audio, the descriptor in the sequence header contains the layer that has been used to compress the audio and the type of compression used (for example, joint stereo), along with the original sampling rate. The audio sequence is assembled from a number of access units (AUs), which will be coded audio frames.

If AC-3 coding is used, as in ATSC, this usage will be reflected in the sequence header. The audio access unit (AU) is an AC-3 sync frame as described in Section 3.7. The AC-3 sync frame represents a time span equivalent to 1536 audio samples and will be 32 ms for 48 kHz sampling and 48 ms for 32 kHz.
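The AC-3 sync-frame durations quoted for 48 kHz and 32 kHz follow directly from the sample count. A trivial sketch, with our own function name:

```python
def ac3_sync_frame_ms(sample_rate_hz, samples_per_frame=1536):
    """Duration of an AC-3 sync frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz
```

This gives 32.0 ms at 48 kHz and 48.0 ms at 32 kHz, matching the figures in the text.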


Figure 5.1.

SECTION 5 PACKETIZED ELEMENTARY STREAMS (PES)

For practical purposes, the continuous elementary streams carrying audio or video from compressors need to be broken into packets. These packets are identified by headers that contain time stamps for synchronizing. PES packets can be used to create Program Streams or Transport Streams.

5.1 PES packets

In the Packetized Elementary Stream (PES), an endless elementary stream is divided into packets of a convenient size for the application. This size might be a few hundred kilobytes, although this would vary with the application.

Each packet is preceded by a PES packet header. Figure 5.1 shows the contents of a header. The packet begins with a start code prefix of 24 bits and a stream ID that identifies the contents of the packet as video or audio and further specifies the type of audio coding. These two parameters (start code prefix and stream ID) comprise the packet start code that identifies the beginning of a packet. It is important not to confuse the packet in a PES with the much smaller packet used in transport streams that, unfortunately, shares the same name.

Because MPEG only defines the transport stream, not the encoder, a designer might choose to build a multiplexer that converts from elementary streams to a transport stream in one step. In this case, the PES packets may never exist in an identifiable form, but instead, they are logically present in the transport stream payload.

5.2 Time stamps

After compression, pictures are sent out of sequence because of bidirectional coding. They require a variable amount of data and are subject to variable delay due to multiplexing and transmission. In order to keep the audio and video locked together, time stamps are periodically incorporated in each picture.

A time stamp is a 33-bit number that is a sample of a counter driven by a 90-kHz clock. This clock is obtained by dividing the 27-MHz program clock by 300. Since presentation times are evenly spaced, it is not essential to include a time stamp in every presentation unit. Instead, time stamps can be interpolated by the decoder, but they must not be more than 700 ms apart in either program streams or transport streams.

Time stamps indicate where a particular access unit belongs in time. Lip sync is obtained by incorporating time stamps into the headers in both video and audio PES packets. When a decoder receives a selected PES packet, it decodes each access unit and buffers it into RAM. When the time line count reaches the value of the time stamp, the RAM is read out. This operation has two desirable results. First, effective timebase correction is obtained in each elementary stream. Second, the video and audio elementary streams can be synchronized together to make a program.

5.3 PTS/DTS

When bidirectional coding is used, a picture may have to be decoded some time before it is presented, so that it can act as the source of data for a B picture. Although, for example, pictures can be presented in the order IBBP, they will be transmitted in the order IPBB. Consequently, two types of time stamp exist. The decode time stamp (DTS) indicates the time when a picture must be decoded, whereas a presentation time stamp (PTS) indicates when it must be presented to the decoder output.

B pictures are decoded and presented simultaneously, so they only contain a PTS. When an IPBB sequence is received, both I and P must be decoded before the first B picture. A decoder can only decode one picture at a time; therefore, the I picture is decoded first and stored. While the P picture is being decoded, the decoded I picture is output so that it can be followed by the B pictures.

28

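The start code prefix and stream ID described above can be checked with a short routine. The following is an illustrative sketch, not a full PES parser; the stream-ID ranges shown (0xE0-0xEF for video, 0xC0-0xDF for MPEG audio) follow the MPEG-2 systems convention and are the only assumptions beyond the text.

```python
def parse_pes_start(data: bytes):
    """Parse the start of a PES packet: a 24-bit start code prefix
    (0x000001) followed by an 8-bit stream ID, as described above.
    Returns the stream ID and a coarse video/audio classification."""
    if len(data) < 4 or data[0:3] != b"\x00\x00\x01":
        raise ValueError("not a PES packet start code")
    stream_id = data[3]
    if 0xE0 <= stream_id <= 0xEF:
        kind = "video"       # MPEG video stream IDs
    elif 0xC0 <= stream_id <= 0xDF:
        kind = "audio"       # MPEG audio stream IDs
    else:
        kind = "other"       # e.g., private or padding streams
    return stream_id, kind
```

For example, `parse_pes_start(b"\x00\x00\x01\xE0")` identifies a video PES packet.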

Figure 5.2 shows that when an access unit containing an I picture is received, it will have both DTS and PTS in the header, and these time stamps will be separated by one picture period. If bidirectional coding is being used, a P picture must follow, and this picture also has a DTS and a PTS, but the separation between the two time stamps is three picture periods to allow for the intervening B pictures. Thus, if IPBB is received, I is delayed one picture period, P is delayed three picture periods, the two Bs are not delayed at all, and the presentation sequence becomes IBBP. Clearly, if the GOP structure is changed so that there are more B pictures between I and P, the difference between DTS and PTS in the P pictures will be greater.

The PTS/DTS flags in the packet header are set to indicate the presence of PTS alone or of both PTS and DTS time stamps.

Audio packets may contain several access units, and the packet header contains a PTS. Because audio packets are never transmitted out of sequence, there is no DTS in an audio packet.

Figure 5.2. PTS and DTS. (Diagram: an I, P, B picture sequence shown against a time line measured in picture periods, giving the decode time (DTS) and presentation time (PTS) of each picture and the resulting decode and presentation orders.)

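The reordering delays described for the IPBB example can be expressed as a small routine. This is a simplified sketch: time is counted in picture periods rather than 90-kHz clock ticks, and the fixed delay table applies only to the IPBB/IBBP pattern discussed in the text.

```python
def stamp_ipbb(decode_order):
    """Assign (picture, DTS, PTS) to each picture of an IPBB-style
    sequence, using the delays from the text: I is presented one
    picture period after decoding, P three periods after, and B
    pictures are presented as soon as they are decoded."""
    delay = {'I': 1, 'P': 3, 'B': 0}
    return [(pic, n, n + delay[pic]) for n, pic in enumerate(decode_order)]
```

Sorting the result of `stamp_ipbb(['I', 'P', 'B', 'B'])` by PTS recovers the presentation order IBBP, as the text describes.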
SECTION 6 PROGRAM STREAMS

Program Streams are one way of combining several PES packet streams and are advantageous for recording applications such as DVD.

6.1 Recording vs. transmission

For a given picture quality, the data rate of compressed video will vary with picture content. A variable bit-rate channel will give the best results. In transmission, most practical channels are fixed, and the overall bit rate is kept constant by the use of stuffing (meaningless data).

In a DVD, the use of stuffing is a waste of storage capacity. However, a storage medium can be slowed down or speeded up, either physically or, in the case of a disk drive, by changing the rate of data-transfer requests. This approach allows a variable-rate channel to be obtained without a capacity penalty. When a medium is replayed, the speed can be adjusted to keep a data buffer approximately half full, irrespective of the actual bit rate, which can change dynamically. If the decoder reads from the buffer at an increased rate, it will tend to empty the buffer, and the drive system will simply increase the access rate to restore the balance. This technique only works if the audio and video were encoded from the same clock; otherwise, they will slip over the length of the recording.

To satisfy these conflicting requirements, Program Streams and Transport Streams have been devised as alternatives. A Program Stream works well for a single program with variable bit rate in a recording environment; a Transport Stream works well for multiple programs in a fixed bit-rate transmission environment.

The problem of genlocking to the source does not occur in a DVD player. The player determines the time base of the video with a local SPG (internal or external) and simply obtains data from the disk in order to supply pictures on that time base. In transmission, the decoder has to recreate the time base of the encoder or it will suffer overflow or underflow. Thus, a Transport Stream uses a Program Clock Reference (PCR), whereas a Program Stream has no need for a program clock.

6.2 Introduction to program streams

A Program Stream is a PES packet multiplex that carries several elementary streams that were encoded using the same master clock or system time clock (STC). This stream might be a video stream and its associated audio streams, or a multichannel audio-only program. The elementary video stream is divided into Access Units (AUs), each of which contains compressed data describing one picture. These pictures are identified as I, P, or B, and each carries an Access Unit number that indicates the correct display sequence. One video Access Unit becomes one program-stream packet. In video, these packets vary in size. For example, an I-picture packet will be much larger than a B-picture packet. Digital audio Access Units are generally of the same size, and several are assembled into one program-stream packet. These packets should not be confused with transport-stream packets, which are smaller and of fixed size. Video and audio Access Unit boundaries rarely coincide on the time axis, but this lack of coincidence is not a problem because each boundary has its own time-stamp structure.

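The half-full buffer regulation described above amounts to a simple feedback loop. The sketch below is a toy illustration only, not drawn from any real drive firmware; the gain and the buffer capacity are arbitrary assumptions.

```python
def regulate(buffer_level, capacity, gain=0.5):
    """Return a transfer-rate correction that steers the replay buffer
    toward half full: positive (request data faster) when the buffer is
    under-full, negative (slow down) when it is over-full."""
    error = capacity / 2 - buffer_level   # distance from the half-full target
    return gain * error
```

Applied repeatedly, corrections of this kind let the drive track a dynamically changing bit rate, as the text describes.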

SECTION 7 TRANSPORT STREAMS

A transport stream is more than a multiplex of many PES packets. In Program Streams, time stamps are sufficient to recreate the time axis because the audio and video are locked to a common clock. For transmission down a data network over distance, there is an additional requirement to recreate the clock for each program at the decoder. This requires an additional layer of syntax to provide program clock reference (PCR) signals.

7.1 The job of a transport stream

The transport stream carries many different programs, and each may use a different compression factor and a bit rate that can change dynamically even though the overall bit rate stays constant. This behavior is called statistical multiplexing, and it allows a program that is handling difficult material to borrow bandwidth from a program handling easy material. Each video PES can have a different number of audio and data PESs associated with it. Despite this flexibility, a decoder must be able to change from one program to the next and correctly select the appropriate audio and data channels. Some of the programs can be protected so that they can only be viewed by those who have paid a subscription or fee. The transport stream must contain conditional access (CA) information to administer this protection. The transport stream contains program specific information (PSI) to handle these tasks.

The transport layer converts the PES data into small packets of constant size that are self-contained. When these packets arrive at the decoder, there may be jitter in the timing. The use of time-division multiplexing also causes delay, and this factor is not fixed because the proportion of the bit stream allocated to each program is not fixed. Time stamps are part of the solution, but they only work if a stable clock is available. The transport stream must therefore contain further data allowing the recreation of a stable clock.

The operation of digital video production equipment is heavily dependent on the distribution of a stable system clock for synchronization. In video production, genlocking is used, but over long distances, the distribution of a separate clock is not practical. In a transport stream, the different programs may have originated in different places that are not necessarily synchronized. As a result, the transport stream has to provide a separate means of synchronizing each program.

This additional synchronization method is called the Program Clock Reference (PCR), and it recreates a stable reference clock that can be divided down to create a time line at the decoder, so that the time stamps for the elementary streams in each program become useful. Consequently, one definition of a program is a set of elementary streams sharing the same timing reference.

In a Single Program Transport Stream (SPTS), there will be one PCR channel that recreates one program clock for both audio and video. The SPTS is often used as the communication between an audio/video coder and a multiplexer.

7.2 Packets

Figure 7.1 shows the structure of a transport stream packet. The size is a constant 188 bytes, and it is always divided into a header and a payload. Figure 7.1a shows the minimum header of 4 bytes. In this header, the most important information is:

The sync byte. This byte is recognized by the decoder so that the header and the payload can be deserialized.

The transport error indicator. This indicator is set if the error correction layer above the transport layer is experiencing a raw bit error rate (BER) that is too high to be correctable. It indicates that the packet may contain errors. See Section 8 for details of the error correction layer.

The Packet Identification (PID). This thirteen-bit code is used to distinguish between different types of packets. More will be said about PID later.

The continuity counter. This four-bit value is incremented by the encoder as each new packet having the same PID is sent. It is used to determine if any packets are lost, repeated, or out of sequence.

In some cases, more header information is needed, and if this is the case, the adaptation field control bits are set to indicate that the header is larger than normal. Figure 7.1b shows that when this happens, the extra header length is described by the adaptation field length code. Where the header is extended, the payload becomes smaller to maintain constant packet length.

Figure 7.1. (Diagram: a) a 188-byte transport packet consists of a header and a payload; the minimum 4-byte header contains the sync byte (8 bits), transport error indicator (1), start indicator (1), transport priority (1), PID (13), scrambling control (2), adaptation field control (2), and continuity counter (4 bits). b) the optional adaptation field carries its length, the discontinuity indicator, random access indicator, elementary stream priority indicator, five flags, and optional fields such as the PCR (48 bits), OPCR (48 bits), splice countdown (8 bits), transport private data, adaptation field extension, and stuffing bytes.)

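The fixed 4-byte header of Figure 7.1a can be unpacked with straightforward bit masking. This is a hedged sketch rather than a complete demultiplexer; the field widths follow the text, and the sync byte value 0x47 is taken from the MPEG-2 systems standard (it is not given in this guide).

```python
def parse_ts_header(pkt: bytes):
    """Parse the minimum 4-byte transport packet header of Figure 7.1a."""
    if len(pkt) != 188 or pkt[0] != 0x47:      # 0x47 is the MPEG-2 sync byte
        raise ValueError("not a 188-byte transport packet")
    return {
        "transport_error": bool(pkt[1] & 0x80),       # 1 bit
        "payload_unit_start": bool(pkt[1] & 0x40),    # 1 bit
        "transport_priority": bool(pkt[1] & 0x20),    # 1 bit
        "pid": ((pkt[1] & 0x1F) << 8) | pkt[2],       # 13 bits
        "scrambling_control": (pkt[3] >> 6) & 0x03,   # 2 bits
        "adaptation_field_control": (pkt[3] >> 4) & 0x03,  # 2 bits
        "continuity_counter": pkt[3] & 0x0F,          # 4 bits
    }
```

As a check, a null packet header (PID 8191, all thirteen PID bits set) parses to `pid == 8191`.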

7.3 Program Clock Reference (PCR)

The encoder used for a particular program will have a 27-MHz program clock. In the case of an SDI (Serial Digital Interface) input, the bit clock can be divided by 10 to produce the encoder program clock. Where several programs originate in the same production facility, it is possible that they will all have the same clock. In the case of an analog video input, the H-sync period will need to be multiplied by a constant in a phase-locked loop to produce 27 MHz.

The adaptation field in the packet header is periodically used to include the PCR (program clock reference) code that allows generation of a locked clock at the decoder. If the encoder or a remultiplexer has to switch sources, the PCR may have a discontinuity. The continuity count can also be disturbed. This event is handled by the discontinuity indicator, which tells the decoder to expect a disturbance. Otherwise, a discontinuity is an error condition.

Figure 7.2 shows how the PCR is used by the decoder to recreate a remote version of the 27-MHz clock for each program. The encoder clock drives a constantly running binary counter, and the value of this counter is sampled periodically and placed in the header adaptation fields as the PCR. (The PCR, like the PTS, is a 33-bit number that is a sample of a counter driven by a 90-kHz clock.) Each encoder produces packets having a different PID. The decoder recognizes the packets with the correct PID for the selected program and ignores others. At the decoder, a VCO generates a nominal 27-MHz clock, and this drives a local PCR counter. The local PCR is compared with the PCR from the packet header, and the difference is the PCR phase error. This error is filtered to control the VCO that eventually will bring the local PCR count into step with the header PCRs. Heavy VCO filtering ensures that jitter in PCR transmission does not modulate the clock. The discontinuity indicator will reset the local PCR count and, optionally, may be used to reduce the filtering to help the system lock quickly to the new timing.

MPEG requires that PCRs be sent at a rate of at least 10 PCRs per second, whereas DVB specifies a minimum of 25 PCRs per second.

7.4 Packet Identification (PID)

A 13-bit field in the transport packet header contains the Packet Identification Code (PID). The PID is used by the demultiplexer to distinguish between packets containing different types of information. The transport-stream bit rate must be constant, even though the sum of the rates of all of the different streams it contains can vary. This requirement is handled by the use of null packets that contain all zeros in the payload. If the real payload rate falls, more null packets are inserted. Null packets always have the same PID, which is 8191 (thirteen 1's).

In a given transport stream, all packets belonging to a given elementary stream will have the same PID. Packets in another elementary stream will have another PID. The demultiplexer can easily select all data for a given elementary stream simply by accepting only packets with the right PID. Data for an entire program can be selected using the PIDs for video, audio, and teletext data. The demultiplexer can correctly select packets only if it can correctly associate them with the elementary stream to which they belong. The demultiplexer can do this task only if it knows what the right PIDs are. This is the function of the Program Specific Information (PSI).

Figure 7.2. (Diagram: the video encoder's 27-MHz clock drives a counter that is sampled into the 188-byte packets as the PCR, so that successive PCRs differ by exactly the time of n bits. In the transport stream decoder, a 27-MHz crystal VCO drives a local PCR counter; the received PCR is loaded, the local count is compared with subsequent header PCRs, and the low-pass-filtered difference controls the VCO to recreate the receiver's 27-MHz clock.)

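The loop of Figure 7.2 can be sketched numerically. This is a minimal illustration under stated assumptions: PCR values are treated as counts of a 27-MHz clock, the PCR spacing and loop gain are arbitrary, and the single proportional correction stands in for the heavy VCO filtering the text describes.

```python
def pcr_loop(received_pcrs, interval_s=0.1, gain=0.05):
    """Trim a model VCO toward lock using successive PCR phase errors.

    The local counter advances at the current VCO frequency; the
    difference between each received PCR and the local count (the PCR
    phase error) is scaled and fed back to the VCO frequency.
    Returns the VCO frequency after each PCR."""
    freq = 27_000_000.0           # nominal VCO frequency, Hz
    local = received_pcrs[0]      # local counter loaded from the first PCR
    freqs = []
    for pcr in received_pcrs[1:]:
        local += freq * interval_s            # local PCR count advances
        error = pcr - local                   # PCR phase error
        freq += gain * error / interval_s     # filtered error trims the VCO
        freqs.append(freq)
    return freqs
```

With PCRs arriving exactly on time, the VCO stays at 27 MHz; an offset in the received values pulls the frequency until the counts step together.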

7.5 Program Specific Information (PSI)

PSI is carried in packets having unique PIDs, some of which are standardized and some of which are specified by the Program Association Table (PAT) and the Conditional Access Table (CAT). These packets must be included periodically in every transport stream. The PAT always has a PID of 0, and the CAT always has a PID of 1. These values and the null-packet PID of 8191 are the only fixed PIDs in the whole MPEG system. The demultiplexer must determine all of the remaining PIDs by accessing the appropriate tables. However, in ATSC and DVB, PMTs may require specific PIDs. In this respect (and in some others), MPEG and DVB/ATSC are not fully interchangeable.

The program streams that exist in the transport stream are listed in the Program Association Table (PAT) packets (PID = 0), which specify the PIDs of all Program Map Table (PMT) packets. The first entry in the PAT, program 0, is always reserved for network data and contains the PID of Network Information Table (NIT) packets.

The PIDs for Entitlement Control Messages (ECM) and Entitlement Management Messages (EMM) are listed in the Conditional Access Table (CAT) packets (PID = 1).

As Figure 7.3 shows, the PIDs of the video, audio, and data elementary streams that belong in the same program are listed in the Program Map Table (PMT) packets. Each PMT packet has its own PID.

A given Network Information Table contains details of more than just the transport stream carrying it. Also included are details of other transport streams that may be available to the same decoder, for example, by tuning to a different RF channel or steering a dish to a different satellite. The NIT may list a number of other transport streams, and each one may have a descriptor that specifies the radio frequency, orbital position, and so on. In MPEG, only the NIT is mandatory for this purpose. In DVB, additional metadata, known as DVB-SI, is included, and the NIT is considered to be part of DVB-SI. This operation is discussed in Section 8. When discussing the subject in general, the term PSI/SI is used.

Upon first receiving a transport stream, the demultiplexer must look for PIDs 0 and 1 in the packet headers. All PID 0 packets contain the Program Association Table (PAT). All PID 1 packets contain Conditional Access Table (CAT) data.

By reading the PAT, the demux can find the PIDs of the Network Information Table (NIT) and of each Program Map Table (PMT). By finding the PMTs, the demux can find the PIDs of each elementary stream.

Consequently, if the decoding of a particular program is required, reference to the PAT and then the PMT is all that is needed to find the PIDs of all of the elementary streams in the program. If the program is encrypted, access to the CAT will also be necessary. As demultiplexing is impossible without a PAT, the lockup speed is a function of how often the PAT packets are sent. MPEG specifies a maximum of 0.5 seconds between the PAT packets and between the PMT packets that are referred to in those PAT packets. In DVB and ATSC, the NIT may reside in packets that have a specific PID.

Figure 7.3. (Diagram: in the transport stream, the Program Association Table (PID 0) lists the PMT PID of each program — for example, program 0: PID 16 (the Network Information Table), program 1: PID 22, program 3: PID 33 — and the Conditional Access Table (PID 1) points to conditional access data such as EMMs. Each Program Map Table then lists the elementary-stream PIDs of one program: for example, video on PID 54 and audio on PIDs 48 and 49 for program 1, and video on PID 19 and audio on PIDs 81 and 82 for program 3.)

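The two-step PAT-then-PMT lookup described above can be sketched with the example PID values from Figure 7.3. The table contents here are just those illustrative numbers; a real demultiplexer would of course build these dictionaries by parsing PID 0 and PMT packets from the stream.

```python
# PAT: program number -> PMT PID (program 0 points to the NIT).
pat = {0: 16, 1: 22, 3: 33}

# PMTs, keyed by PMT PID: elementary-stream PIDs for each program.
pmts = {
    22: {"video": [54], "audio": [48, 49]},
    33: {"video": [19], "audio": [81, 82]},
}

def pids_for_program(program):
    """Resolve one program to its elementary-stream PIDs: read the PAT
    to find the PMT PID, then read that PMT."""
    pmt_pid = pat[program]
    return pmts[pmt_pid]
```

For example, `pids_for_program(3)` yields video PID 19 and audio PIDs 81 and 82, matching Figure 7.3.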

SECTION 8 INTRODUCTION TO DVB/ATSC

MPEG compression is already being used in broadcasting and will become increasingly important in the future. A discussion of the additional requirements of broadcasting data follows.

8.1 An overall view

ATSC stands for the Advanced Television Systems Committee, a U.S. organization that defines standards for terrestrial digital broadcasting and cable distribution. DVB refers to the Digital Video Broadcasting Project and to the standards and practices established by that project. Originally a European project, it produces standards and guides accepted in many areas of the world. These standards and guides encompass all transmission media, including satellite, cable, and terrestrial broadcasting.

Digital broadcasting has different distribution and transmission requirements, as is shown in Figure 8.1. Broadcasters will produce transport streams that contain several television programs. Transport streams have no protection against errors, and in compressed data, the effect of errors is serious. Transport streams need to be delivered error-free to transmitters, satellite uplinks, and cable head ends. In this context, error-free means a BER of 1 x 10^-11. This task is normally entrusted to Telecommunications Network Operators, who will use an additional layer of error correction as needed. This layer should be transparent to the destination.

A particular transmitter or cable operator may not want all of the programs in a transport stream. Several transport streams may be received, and a selection of channels may be made and encoded into a single output transport stream using a remultiplexer. The configuration may change dynamically.

Broadcasting in the digital domain consists of conveying the entire transport stream to the viewer. Whether the channel is cable, satellite, or terrestrial, the problems are much the same. Metadata describing the transmission must be encoded into the transport stream in a standardized way. In DVB, this metadata is called Service Information (DVB-SI) and includes details of programs carried on other multiplexes and services such as teletext.

In broadcasting, there is much less control of the signal quality, and noise or interference is a possibility. This requires some form of forward error correction (FEC) layer. Unlike the FEC used by the Telecommunications Network Operators, which can be proprietary (or standardized as per ETSI, which defines DVB transmission over SDH and PDH networks), the FEC used in broadcasting must be standardized so that receivers will be able to handle it.

The addition of error correction obviously increases the bit rate as far as the transmitter or cable is concerned. Furthermore, reliable, economical radio and cable transmission of data requires more than serializing the data. Practical systems require channel coding.

8.2 Remultiplexing

Remultiplexing is a complex task because it has to output a compliant bit stream that is assembled from parts of others. The required data from a given input transport stream can be selected with reference to the Program Association Table and the Program Map Tables, which disclose the PIDs of the programs required. It is possible that the same PIDs have been used in two input transport streams; therefore, the PIDs of one or more elementary streams may have to be changed. The packet headers must pass on the program clock reference (PCR) that will allow the final decoder to recreate a 27-MHz clock. As the position of packets containing a PCR may be different in the new multiplex, the remultiplexer may need to edit the PCR values to reflect their new position on the time axis.

The Program Map Tables and Program Association Tables will need to be edited to reflect the new transport stream structure, as will the Conditional Access Tables (CAT).

Figure 8.1. (Diagram: contributors A and B each multiplex programs 1-3 into a transport stream and add FEC for carriage over a telecom network. At a presentation suite, the streams are error-corrected and remultiplexed — for example, selecting programs A1, A3, B2, and B3 — and DVB-SI is added. The resulting transport stream is carried, again with added FEC and error correction, to the TV transmitter, where energy dispersal, outer FEC, interleaving, inner FEC, and modulation are applied.)

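One of the remultiplexing chores described in Section 8.2 — renaming clashing PIDs — is easy to sketch. Packets are modeled here as plain dictionaries for illustration; a real remultiplexer would operate on 188-byte packets and would also rewrite the PAT, PMTs, CAT, and PCR values as the text explains.

```python
def remap_pids(packets, pid_map):
    """Give selected packets new PIDs so that two input transport
    streams that happened to use the same PID can be merged.
    pid_map maps old PID -> new PID; unmapped PIDs pass through."""
    return [{**p, "pid": pid_map.get(p["pid"], p["pid"])} for p in packets]
```

For example, if both inputs used PID 54 for video, the packets of one stream could be moved to a free PID before the outputs are interleaved.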

If the sum of the selected program bit rates is less than the output bit rate, the remultiplexer will create stuffing packets with suitable PIDs. However, if the transport streams have come from statistical multiplexers, it is possible that the instantaneous bit rate of the new transport stream will exceed the channel capacity. This condition might occur if several selected programs in different transport streams simultaneously contain high entropy. In this case, the only solution is to recompress and create new, shorter coefficients in one or more bit streams to reduce the bit rate.

8.3 Service Information (SI)

In the future, digital delivery will mean that a large number of programs, teletext, and services are available to the viewer, and these may be spread across a number of different transport streams. Both the viewer and the Integrated Receiver Decoder (IRD) will need help to display what is available and to output the selected service. This capability requires metadata beyond the capabilities of MPEG-PSI (Program Specific Information) and is referred to as DVB-SI (Service Information). DVB-SI is considered to include the NIT, which must be present in all MPEG transport streams.

DVB-SI is embedded in the transport stream as additional transport packets with unique PIDs and carries technical information for IRDs. DVB-SI also contains Electronic Program Guide (EPG) information, such as the nature of a program, the timing and the channel on which it can be located, and the countries in which it is available. Programs can also be rated so that parental judgment can be exercised.

DVB-SI can include the following options over and above MPEG-PSI:

Service Description Table (SDT). Each service in a DVB transport stream can have a service descriptor, and these descriptors are assembled into the Service Description Table. A service may be television, radio, or teletext. The service descriptor includes the name of the service provider.

Event Information Table (EIT). The EIT is an optional table for DVB that contains program names, start times, durations, and so on.

Bouquet Association Table (BAT). The BAT is an optional table for DVB that provides details of bouquets, which are collections of services marketed as a single product.

Time and Date Table (TDT). The TDT is an option that embeds a UTC time and date stamp in the transport stream.

8.4 Error correction

Error correction is necessary because conditions on long transmission paths cannot be controlled. In some systems, error detection is sufficient because it can be used to request a retransmission. Clearly, this approach will not work with real-time signals such as television. Instead, a system called Forward Error Correction (FEC) is used, in which sufficient extra bits, known as redundancy, are added to the data to allow the decoder to perform corrections in real time.

The FEC used in modern systems is usually based on Reed-Solomon (R-S) codes. A full discussion of these is outside the scope of this book. Briefly, R-S codes add redundancy to the data to make a code word such that when each symbol is used as a term in a minimum of two simultaneous equations, the sum (or syndrome) is always zero if there is no error. This zero condition is obtained irrespective of the data and makes checking easy. In transport streams, the packets are always 188 bytes long. The addition of 16 bytes of R-S redundancy produces a standard FEC code word of 204 bytes.

In the event that the syndrome is non-zero, solving the simultaneous equations will yield the two values needed for error correction: the location of the error and the nature of the error. However, if the size of the error exceeds half the amount of redundancy added, the error cannot be corrected. Unfortunately, in typical transmission channels, the signal quality is statistical. This means that while single bits may be in error due to noise, on occasion a large number of bits, known as a burst, can be corrupted together. This corruption might be due to lightning or interference from electrical equipment.

It is not economic to protect every code word against such bursts because they do not occur often enough. The solution is to use a technique known as interleaving. Figure 8.2 shows that, when interleaving is used, the source data are FEC coded, but prior to transmission, they are fed into a RAM buffer. Figure 8.3 shows one possible technique, in which data enter the RAM in rows and are then read out in columns. The reordered data are then transmitted. On reception, the data are put back into their original order, or deinterleaved, by using a second RAM. The result of the interleaving process is that a burst of errors in the channel becomes, after deinterleaving, a large number of single-symbol errors, which are more readily correctable.

Figure 8.2. (Diagram: interleaving; data are written into a RAM and read out non-sequentially under the control of an address-reorder circuit.)

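The row/column technique of Figures 8.2 and 8.3 can be sketched directly. This is an illustrative block interleaver only; the array dimensions are arbitrary, not those of any DVB or ATSC profile.

```python
def interleave(symbols, rows, cols):
    """Write symbols into a rows x cols array row by row, then read
    them out column by column, as described in the text."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(symbols, rows, cols):
    """Inverse operation: write column by column, read row by row,
    restoring the original order at the receiver."""
    assert len(symbols) == rows * cols
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]
```

A burst of adjacent channel errors in the interleaved sequence lands in different rows after deinterleaving, so each code word sees only isolated single-symbol errors.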

When a burst error reaches the maximum correctable size, the system is vulnerable to random bit errors that make code words uncorrectable. The use of an inner code, applied after interleave and corrected before deinterleave, can prevent random errors from entering the deinterleave memory.

As Figure 8.3 shows, when this approach is used with a block interleave structure, the result is a product code. Figure 8.4 shows that interleave can also be convolutional, in which the data array is sheared by applying a different delay to each row. Convolutional, or cross, interleave has the advantage that less memory is needed to interleave and deinterleave.

8.5 Channel coding

Raw serial binary data is unsuitable for transmission for several reasons. Runs of identical bits cause DC offsets and lack a bit clock. There is no control of the spectrum, and the bandwidth required is too great. In practical radio and cable systems, a modulation scheme called a channel code is necessary.

In terrestrial broadcasting, channel bandwidth is at a premium, and channel codes that minimize bandwidth are required. In multi-level signaling, the transmitter power is switched between, for example, eight different levels. With this approach, each symbol represents three data bits, requiring one third the bandwidth of binary. Whereas an analog transmitter requires about 8-bit resolution, transmitting at three-bit resolution can be achieved with a much lower SNR and with lower transmitter power. In the ATSC system, it is proposed to use a system called 8-VSB. Eight-level modulation is used, and the transmitter output has a vestigial lower sideband like that used in analog television transmissions.

In satellite broadcasting, there is less restriction on bandwidth, but the received signal has a poor SNR, which precludes using multi-level signaling. In this case, it is possible to transmit data by modulating the phase of a signal. In Phase Shift Keying (PSK), the carrier is transmitted in different phases according to the bit pattern. In Quadrature Phase Shift Keying (QPSK), there are four possible phases. The phases are 90 degrees apart; therefore each symbol carries two bits.

For cable operation, multi-level signaling and phase shift keying can be combined in Quadrature Amplitude Modulation (QAM), which is a discrete version of the modulation method used with the composite video subcarrier. Figure 8.5 shows that if two quadrature carriers, known as I and Q, are individually modulated into eight levels each, the result is a carrier that can have 64 possible combinations of amplitude and phase. Each symbol carries six bits; therefore the bandwidth required is one third that of QPSK.

Figure 8.3. (Diagram: block interleave; horizontal and vertical codewords intersect so that every symbol belongs to one codeword of each kind, forming a product code.)

Figure 8.4. (Diagram: convolutional interleave; the data array is sheared, and vertical codewords cross the sloping array.)

Figure 8.5. (Diagram: 64-QAM modulator; six bits per symbol are split into two 3-bit groups, each driving a DAC that produces one of eight levels. The two eight-level signals modulate the carrier and its 90-degree-shifted version in a four-quadrant modulator, and their sum is the 64-QAM output.)

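The 64-QAM mapping of Figure 8.5 can be sketched as a simple lookup. This is illustrative only: the split of the six bits into I and Q fields and the equally spaced level values are assumptions for the sketch, not the constellation of any particular cable standard.

```python
def qam64_map(six_bits: int):
    """Map a 6-bit value to an (I, Q) level pair: three bits select one
    of eight I levels, three bits one of eight Q levels."""
    assert 0 <= six_bits < 64
    i_bits, q_bits = six_bits >> 3, six_bits & 0x07
    levels = [-7, -5, -3, -1, 1, 3, 5, 7]   # eight equally spaced levels
    return levels[i_bits], levels[q_bits]
```

All 64 six-bit values land on distinct amplitude/phase combinations, which is why each symbol carries six bits.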

Figure 8.6.

In the schemes described above, the transmitted signal spectrum is signal dependent. Some parts of the spectrum may contain high energy and cause interference to other services, whereas other parts of the spectrum may contain little energy and be susceptible to interference. In practice, randomizing is necessary to decorrelate the transmitted spectrum from the data content. Figure 8.6 shows that when randomizing or energy dispersal is used, a pseudo-random sequence is added to the serial data before it is input to the modulator. The result is that the transmitted spectrum is noise-like, with relatively stationary statistics. Clearly, an identical and synchronous sequence must be subtracted at the receiver as shown. Randomizing cannot be applied to sync patterns, or they could not be detected.

In the above systems, a baseband signal is supplied to a modulator that operates on a single carrier to produce the transmitted sideband(s). In digital systems with fixed bit rates, the frequency of the sidebands is known. An alternative to a wideband system is one that produces many narrowband carriers at carefully regulated spacing. Figure 8.7a shows that a digitally modulated carrier has a spectral null at each side. Another identical carrier can be placed here without interference because the two are mutually orthogonal, as Figure 8.7b shows. This is the principle of COFDM (Coded Orthogonal Frequency Division Multiplexing), which is proposed for use with DVB terrestrial broadcasts (DVB-T).

8.6 Inner coding

The inner code of a FEC system is designed to prevent random errors from reducing the power of the interleave scheme. A suitable inner code can prevent such errors by giving an apparent increase to the SNR of the transmission. In trellis coding, which can be used with multi-level signaling, several multi-level symbols are associated into a group. The waveform that results from a particular group of symbols is called a trellis. If each symbol can have eight levels, then in three symbols there can be 512 possible trellises.

In trellis coding, the data are coded such that only certain trellis waveforms represent valid data. If only 64 of the trellises represent error-free data, then two data bits per symbol can be sent instead of three. The remaining bit is a form of redundancy, because trellises other than the correct 64 must be due to errors. If a trellis is received in which the level of one of the symbols is ambiguous due to noise, the ambiguity can be resolved because the correct level must be the one which gives a valid trellis. This technique is known as maximum-likelihood decoding.
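The maximum-likelihood idea behind trellis decoding can be illustrated in a few lines: only a few level sequences ("trellises") are valid, and a noisy reception is resolved to the nearest valid one. The valid set below is invented purely for illustration.

```python
# Toy maximum-likelihood decoder: the valid trellis set is arbitrary.
VALID = [(0, 0, 0), (0, 4, 2), (4, 2, 6), (6, 6, 4)]

def ml_decode(received):
    # Choose the valid trellis at the smallest squared distance.
    return min(VALID, key=lambda t: sum((a - b) ** 2 for a, b in zip(t, received)))

# A symbol ambiguous on its own (1.6 lies between levels 1 and 2) is
# resolved because only one valid trellis fits the whole group.
assert ml_decode((4.2, 1.6, 5.9)) == (4, 2, 6)
```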


(Figure 8.6 diagram: Data In is added to a pseudo-random sequence to give randomized data for transmission; an identical, synchronous pseudo-random sequence generator at the receiver recovers Data Out.)

(Figure 8.7 diagram: a) a digitally modulated carrier has spectral nulls at each side; b) identical carriers placed at the nulls are mutually orthogonal.)
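The randomizing and de-randomizing of Figure 8.6 can be sketched with a simple linear-feedback shift register; the register length, taps, and seed below are assumptions for illustration, not the DVB generator polynomial.

```python
def prbs(n, seed=0b101001101011100):
    # Illustrative 15-bit LFSR; taps and seed are assumed, not DVB's.
    state = seed
    for _ in range(n):
        bit = state & 1
        fb = (state ^ (state >> 1)) & 1       # feedback from two taps
        state = (state >> 1) | (fb << 14)
        yield bit

data       = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]  # runs of identical bits
randomized = [d ^ p for d, p in zip(data, prbs(10))]
# An identical, synchronous sequence XORed in again at the receiver
# recovers the original data exactly.
recovered  = [r ^ p for r, p in zip(randomized, prbs(10))]
assert recovered == data
```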


Figure 8.8.

The 64 valid trellises should be made as different as possible to make the system continue to work with a poorer signal-to-noise ratio. If the trellis coder makes an error, the outer code will correct it.

In DVB, Viterbi convolutional coding may be used. Figure 8.8 shows that following interleave, the data are fed to a shift register. The contents of the shift register produce two outputs that represent different parity checks on the input data so that bit errors can be corrected. Clearly, there will be two output bits for every input bit; therefore, the coder shown is described as a 1/2-rate coder. Any rate between 1/1 and 1/2 would still allow the original data to be transmitted, but the amount of redundancy would vary. Failing to transmit the entire 1/2 output is called puncturing, and it allows any required balance to be obtained between bit rate and correcting power.

8.7 Transmitting digits

Figure 8.9 shows the elements of an ATSC digital transmitter. Service information describing the transmission is added to the transport stream. This stream is then randomized prior to routing to an outer R-S error correction coder that adds redundancy to the data. A convolutional interleave process then reorders the data so that adjacent data in the transport stream are not adjacent in the transmission. An inner trellis coder is then used to produce a multi-level signal for the VSB modulator.

Figure 8.10 shows a DVB-T transmitter. Service information is added as before, followed by the randomizing stage for energy dispersal. Outer R-S check symbols are added prior to interleaving. After the interleaver, the inner coding process takes place, and the coded data is fed to a COFDM modulator. The modulator output is then upconverted to produce the RF output.

At the receiver, the bit clock is extracted and used to control the timing of the whole system. The channel coding is reversed to obtain the raw data plus the transmission errors. The inner code corrects random errors and may identify larger errors to help the outer coder after deinterleaving. The randomizing is removed, and the result is the original transport stream. The receiver must identify the Program Association Table (PAT) and the Service Information (SI) and Program Map Tables (PMT) that the PAT points to so that the viewer can be told what is available in the transport stream.
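The 1/2-rate convolutional coding idea described above (two parity outputs per input bit, with puncturing to raise the rate) can be sketched as follows; the shift-register taps are invented, not the DVB generator polynomials.

```python
# Illustrative 1/2-rate convolutional coder; taps G1 and G2 are assumed.
G1 = [0, 1, 2]   # register positions feeding parity output 1
G2 = [0, 2]      # register positions feeding parity output 2

def encode(bits):
    reg = [0, 0, 0]
    out = []
    for b in bits:
        reg = [b] + reg[:-1]                 # shift the new bit in
        p1 = sum(reg[t] for t in G1) % 2     # first parity check
        p2 = sum(reg[t] for t in G2) % 2     # second parity check
        out += [p1, p2]                      # two output bits per input bit
    return out

coded = encode([1, 0, 1, 1])
assert len(coded) == 2 * 4                   # rate 1/2: twice the bits
# Puncturing: dropping, say, every fourth output bit raises the rate.
punctured = [b for i, b in enumerate(coded) if i % 4 != 3]
assert len(punctured) == 6
```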


(Figure 8.8 diagram: data input feeds a seven-stage shift register; two tapped parity outputs, 1 and 2, give the coded data output.)

(Figure 8.9 diagram: ATSC transmitter; TS and SI inputs pass through energy dispersal, outer FEC, and convolutional interleave to a trellis coder, VSB modulator, and RF stage.)

(Figure 8.10 diagram: DVB-T transmitter; TS and DVB-SI inputs pass through energy dispersal, outer FEC, and convolutional interleave to a Viterbi encoder, COFDM modulator, and RF stage.)


Figure 9.1.

SECTION 9 MPEG TESTING

The ability to analyze existing transport streams for compliance is essential, but this ability must be complemented by an ability to create known-compliant transport streams.

9.1 Testing requirements

Although the technology of MPEG differs dramatically from the technology that preceded it, the testing requirements are basically the same. On an operational basis, the user wants to have a simple, regular confidence check that ensures all is well. In the event of a failure, the location of the fault needs to be established rapidly. For the purpose of equipment design, the nature of problems needs to be explored in some detail. As with all signal testing, the approach is to combine the generation of known valid signals for insertion into a system with the ability to measure signals at various points.

One of the characteristics of MPEG that distances it most from traditional broadcast video equipment is the existence of multiple information layers, in which each layer is intended to be transparent to the one below. It is very important to be able to establish in which layer any fault resides to avoid a fruitless search.

For example, if the picture monitor on an MPEG decoder is showing visible defects, these defects could be due to a number of possibilities. Perhaps the encoder is not allowing a high enough bit rate, or perhaps the encoder is faulty, causing the transport stream to deliver the faulty information. On the other hand, the encoder might operate correctly, but the transport layer corrupts the data. In DVB, there are even more layers, such as energy dispersal, error correction, and interleaving. Such complexity requires a structured approach to fault finding, using the right tools. The discussion of protocol analysis of the compressed data in this primer may help the user derive such an approach. Reading the discussion of another important aspect of testing for compressed television, picture-quality assessment, may also be helpful. This later discussion is found in the Tektronix publication, A Guide to Video Measurements for Compressed Television Systems.

9.2 Analyzing a Transport Stream

An MPEG transport stream has an extremely complex structure, but an analyzer such as the MTS 100 can break down the structure in a logical fashion such that the user can observe any required details. Many general types of analysis can take place in real time on a live transport stream. These include displays of the hierarchy of programs in the transport stream and of the proportion of the stream bit rate allocated to each stream.

More detailed analysis is only possible if part of a transport stream is recorded so that it can be picked apart later. This technique is known as deferred-time testing and could be used, for example, to examine the contents of a time stamp.

When used for deferred-time testing, the MPEG transport-stream analyzer is acting like a logic analyzer that provides data-interpretation tools specific to MPEG. Like all logic analyzers, a real-time triggering mechanism is required to determine the time or conditions under which capture will take place. Figure 9.1 shows that an analyzer contains a real-time section, a storage section, and a deferred section. In real-time analysis, only that section operates, and a signal source needs to be connected. For capture, the real-time section is used to determine when to trigger the capture. The analyzer includes tools known as filters that allow selective analysis to be applied before or after capture.

Once the capture is completed, the deferred section can operate on the captured data, and the input signal is no longer necessary. There is a good parallel in the storage oscilloscope, which can display the real-time input directly or save it for later study.


(Figure 9.1 diagram: the input feeds a real-time analysis section with triggering; captured data passes to a storage system on which the deferred-time analysis section operates.)


Figure 9.2.

9.3 Hierarchic view

When analyzing an unfamiliar transport stream, the hierarchic view is an excellent starting point because it enables a graphic view of every component in the stream. Figure 9.2 shows an example of a hierarchic display such as that provided by the Tektronix MTS 200. Beginning at top left with the entire transport stream, the stream splits, and an icon is presented for every stream component. Figure 9.3 shows the different icons that the hierarchic view uses and their meaning. The user can very easily see how many program streams are present and the video and audio content of each. Each icon represents the top layer of a number of lower analysis and information layers.

The analyzer creates the hierarchic view by using the PAT and PMT in the Program Specific Information (PSI) data in the transport stream. The PIDs from these tables are displayed beneath each icon. PAT and PMT data are fundamental to the operation of any demultiplexer or decoder; if the analyzer cannot display a hierarchic view or displays a view which is obviously wrong, the transport stream under test has a PAT/PMT error. It is unlikely that equipment further up the line will be able to interpret the stream at all.
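How an analyzer might assemble the hierarchic view from PAT and PMT data can be sketched with simplified stand-in tables; the dictionaries below are not real PSI parsing, just the walk from PAT to PMTs to elementary-stream PIDs.

```python
# Simplified stand-ins for parsed PSI tables (values are illustrative).
pat = {1: 0x100, 2: 0x200}            # program number -> PMT PID
pmts = {
    0x100: {"video": 0x101, "audio": 0x102},
    0x200: {"video": 0x201, "audio": 0x202, "data": 0x203},
}

def hierarchy(pat, pmts):
    # Walk the PAT, then each PMT, to build the program tree.
    tree = {}
    for prog, pmt_pid in pat.items():
        tree[prog] = {"PMT": pmt_pid, "streams": pmts[pmt_pid]}
    return tree

tree = hierarchy(pat, pmts)
assert len(tree) == 2                        # two programs found via the PAT
assert tree[2]["streams"]["data"] == 0x203   # program 2 carries a data stream
```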


Figure 9.3.

Icon / Element Type

Multiplex transport packets. This icon represents all (188- and 204-byte) transport packets that make up the stream. If you visualize the transport stream as a train, this icon represents every car in the train, regardless of its configuration (flat car, boxcar, or hopper, for example) and what it contains.

Transport packets of a particular PID (Program ID). Other elements (tables, clocks, PES packets) are the "payload" contained within transport packets or are constructed from the payload of several transport packets that have the same PID. The PID number appears under the icon. In the hierarchic view, the icon to the right of this icon represents the payload of packets with this PID.

Transport packets that contain independent PCR clocks. The PID appears under the icon.

PAT (Program Association Table) sections. Always contained in PID 0 transport packets.

PMT (Program Map Table) sections.

NIT (Network Information Table) sections. Provides access to SI tables through the PSI/SI command from the Selection menu. Also used for Private sections. When the DVB option (in the Options menu) is selected, this icon can also represent SDT, BAT, EIT, and TDT sections.

Packetized Elementary Stream (PES). This icon represents all packets that, together, contain a given elementary stream. Individual PES packets are assembled from the payloads of several transport packets.

Video elementary stream.

Audio elementary stream.

Data elementary stream.

ECM (Entitlement Control Message) sections.

EMM (Entitlement Management Message) sections.


Figure 9.4.

The ability of a demux or decoder to lock to a transport stream depends on the frequency with which the PSI data are sent. The PSI/SI rate option shown in Figure 9.4 displays the frequency of insertion of system information. PSI/SI information should also be consistent with the actual content in the bit stream. For example, if a given PID is referenced in a PMT, it should be possible to find PIDs of this value in the bit stream. The consistency check function makes such a comparison. Figure 9.5 shows a consistency-error readout.

A MUX allocation chart may graphically display the proportions of the transport stream allocated to each PID or program. Figure 9.6 shows an example of a MUX-allocation pie-chart display. The hierarchic view and the MUX allocation chart show the number of elements in the transport stream and the proportion of bandwidth allocated, but they do not show how the different elements are distributed in time in the multiplex. The PID map function displays, in order of reception, a list of the PID of every packet in a section of the transport stream. Each PID type is color coded differently, so it is possible to assess the uniformity of the multiplexing. Bursts of SI data in a transport stream, seen as contiguous packets, may cause buffer overflows in the decoder.

9.4 Interpreted view

As an alternative to checking for specific data in unspecified places, it is possible to analyze unspecified data in specific places, including in the individual transport stream packets, the tables, or the PES packets. This analysis is known as the interpreted view because the analyzer automatically parses and decodes the data and then displays its meaning. Figure 9.7 shows an example of an interpreted view. As the selected item is changed, the elapsed time and byte count relative to the start of the stream can be displayed.
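The consistency check described above amounts to a set comparison: every PID referenced in a PMT should actually be observed in the bit stream. The PID values below are arbitrary examples.

```python
# Sketch of a PSI consistency check; PID values are illustrative.
referenced = {0x101, 0x102, 0x201}            # PIDs listed in PMTs
observed   = {0x000, 0x101, 0x102, 0x1FFF}    # PIDs seen in the packets

missing = referenced - observed
assert missing == {0x201}   # a referenced but absent PID: consistency error
```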


Figure 9.5.

Figure 9.6.

Figure 9.7.


The interpreted view can also be expanded to give a full explanation of the meaning of any field in the interpreted item. Figure 9.8 shows an example of a field explanation. Because MPEG has so many different parameters, field explanations can help recall their meanings.

9.5 Syntax and CRC analysis

To ship program material, the transport stream relies completely on the accurate use of syntax by encoders. Without correct settings of fixed flag bits, sync patterns, packet-start codes, and packet counts, a decoder may misinterpret the bit stream. The syntax check function considers all bits that are not program material and displays any discrepancies. Spurious discrepancies could be due to transmission errors; consistent discrepancies point to a faulty encoder or multiplexer. Figure 9.9 shows a syntax-error display.

Many MPEG tables have checksums or CRCs attached for error detection. The analyzer can recalculate the checksums and compare them with the actual checksum. Again, spurious CRC mismatches could be due to stream-bit errors, but consistent CRC errors point to a hardware fault.

9.6 Filtering

A transport stream contains a great amount of data, and in real fault conditions, it is probable that, unless a serious problem exists, much of the data is valid and that perhaps only one elementary stream or one program is affected. In this case, it is more effective to test selectively, which is the function of filtering.

Essentially, filtering allows the user of an analyzer to be more selective when examining a transport stream. Instead of accepting every bit, the user can analyze only those parts of the data that meet certain conditions.

One condition results from filtering packet headers so that only packets with a given PID are analyzed. This approach makes it very easy to check the PAT by selecting PID 0, and, from there, all other PIDs can be read out. If the PIDs of a suspect stream are known, perhaps from viewing a hierarchic display, it is easy to select a single PID for analysis.

Alternatively, the filtering can be used to search for unknown PIDs based on some condition. For example, if the audio-compression levels being used are not known, audio-sequence headers containing, say, Layer 2 coding can be searched for.

It may be useful to filter so that only certain elementary-stream access units are selected. At a very low level, it may be useful to filter on a specific four-byte bit pattern. In the case of false detection of sync patterns, this filter would allow the source of the false syncs to be located.

In practice, it is useful to be able to concatenate filters. In other words, it is possible to filter not just on packets having a PID of a certain value, but only on packets of that PID value which also contain a PES header. By combining filters in this way, it is possible to be very specific about the data to be displayed. As fault finding proceeds, this progressive filtering allows optimization of the search process.
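Concatenating filters, as section 9.6 describes, amounts to combining predicates so that only packets matching every condition pass. The packet fields below are simplified stand-ins, not a real packet structure.

```python
# Simplified packet records; real analyzers work on parsed headers.
packets = [
    {"pid": 0x31, "pes_header": True},
    {"pid": 0x31, "pes_header": False},
    {"pid": 0x42, "pes_header": True},
]

def concat(*filters):
    # A packet passes only if every filter in the chain accepts it.
    return lambda pkt: all(f(pkt) for f in filters)

pid_is_31  = lambda p: p["pid"] == 0x31
has_header = lambda p: p["pes_header"]

selected = [p for p in packets if concat(pid_is_31, has_header)(p)]
assert selected == [{"pid": 0x31, "pes_header": True}]
```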


Figure 9.8.

Figure 9.9.
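The CRC recalculation of section 9.5 can be sketched with the MPEG-2 section CRC-32 (polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no bit reflection); the section bytes below are an illustrative stand-in, not a real PAT.

```python
def crc32_mpeg2(data: bytes) -> int:
    # Bitwise MSB-first CRC-32 as used on MPEG-2 PSI sections.
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            crc = ((crc << 1) ^ 0x04C11DB7 if crc & 0x80000000
                   else crc << 1) & 0xFFFFFFFF
    return crc

section = bytes.fromhex("00b00d0000c100000000e010")  # illustrative body
good = crc32_mpeg2(section)
# An analyzer recomputes the CRC and compares with the transmitted value:
assert crc32_mpeg2(section) == good                  # intact section matches
assert crc32_mpeg2(section[:-1] + b"\x00") != good   # corruption is detected
```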


9.7 Timing Analysis

The tests described above check for the presence of the correct elements and syntax in the transport stream. However, to display real-time audio and video correctly, the transport stream must also deliver accurate timing to the decoders. This task can be confirmed by analyzing the PCR and time-stamp data.

The correct transfer of program clock data is vital because this data controls the entire timing of the decoding process. PCR analysis can show that, in each program, PCR data is sent at a sufficient rate and with sufficient accuracy to be compliant.

The PCR data from a multiplexer may be precise, but remultiplexing may put the packets of a given program at a different place on the time axis, requiring that the PCR data be edited by the remultiplexer. Consequently, it is important to test for PCR jitter after the data is remultiplexed.

Figure 9.10 shows a PCR display that indicates the times at which PCRs were received. At the next display level, each PCR can be opened to display the PCR data, as is shown in Figure 9.11. To measure jitter, the analyzer predicts the PCR value by using the previous PCR and the bit rate to produce what is called the interpolated PCR, or PCRI. The actual PCR value is subtracted from PCRI to give an estimate of the jitter. The figure also shows the time since the previous PCR arrived.

An alternate approach, shown in Figure 9.12, provides a graphical display of PCR jitter and PCR repetition rate which is updated in real time.

Once PCR data is known to be correct, the time stamps can be analyzed. Figure 9.13 shows a time-stamp display for a selected elementary stream. The time of arrival, the presentation time, and, where appropriate, the decode times are all shown.

In MPEG, the reordering and use of different picture types cause delay and require buffering at both encoder and decoder. A given elementary stream must be encoded within the constraints of the availability of buffering at the decoder. MPEG defines a model decoder called the T-STD (Transport Stream Target Decoder); an encoder or multiplexer must not distort the data flow beyond the buffering ability of the T-STD. The transport stream contains parameters called VBV (Video Buffering Verifier) specifying the amount of buffering needed by a given elementary stream.

The T-STD analysis displays the buffer occupancy graphically so that overflows or underflows can be easily seen. Figure 9.14 shows a buffering display.

The output of a normal compressor/multiplexer is of limited use because it is not deterministic. If a decoder defect is seen, there is no guarantee that the same defect will be seen on a repeat of the test, because the same video signal will not result in the same transport stream. In this case, an absolutely repeatable transport stream is essential so that the defect can be made to occur at will for study or rectification.

Transport stream jitter should be within certain limits, but a well-designed decoder should be able to recover programs beyond these limits in order to guarantee reliable operation. There is no way to test for this capability using existing transport streams because, if they are compliant, the decoder is not being tested. If there is a failure, it will not be reproducible, and it may not be clear whether the failure was due to jitter or some other noncompliance. The solution is to generate a transport stream that is compliant in every respect and then add a controlled amount of jitter to it, so that jitter is then known to be the only source of noncompliance. The multiplexer feature of the MTS 200 is designed to create such signals.
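The interpolated-PCR jitter estimate can be sketched directly from the description: predict the next PCR from the previous one and the multiplex bit rate (the PCR counts a 27-MHz clock), then compare with the received value. The numbers below are illustrative.

```python
PCR_CLOCK = 27_000_000                    # Hz

def pcr_interpolated(prev_pcr, bits_since_prev, bit_rate):
    # Linear interpolation: elapsed bit periods converted to clock counts.
    return prev_pcr + bits_since_prev * PCR_CLOCK // bit_rate

prev_pcr = 1_000_000
pcri     = pcr_interpolated(prev_pcr, 1000, 27_000_000)  # 1000 bits later
assert pcri == 1_001_000                  # 1000 bit periods = 1000 counts
received = 1_000_950                      # illustrative received PCR value
jitter   = pcri - received                # nonzero difference -> PCR jitter
assert jitter == 50
```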


Figure 9.10.

Figure 9.11.

Figure 9.12.

Figure 9.13.


9.8 Elementary stream testing

Because of the flexible nature of the MPEG bit stream, the number of possibilities and combinations it can contain is almost incalculable. As the encoder is not defined, encoder manufacturers are not compelled to use every possibility; indeed, for economic reasons, this is unlikely. This fact makes testing quite difficult, because the fact that a decoder works with a particular encoder does not prove compliance. That encoder may simply not be using the modes that cause the decoder to fail.

A further complication occurs because encoders are not deterministic and will not produce the same bit stream if the video or audio input is repeated. There is little chance that the same alignment will exist between I, P, and B pictures and the video frames. If a decoder fails a given test, it may not fail the next time the test is run, making fault-finding difficult. A failure with a given encoder does not determine whether the fault lies with the encoder or the decoder. The coding difficulty depends heavily on the nature of the program material, and any given program material will not necessarily exercise every parameter over the whole coding range.

To make tests that have meaningful results, two tools are required:

A known source of compliant test signals that deliberately explore the whole coding range. These signals must be deterministic so that a decoder failure will give repeatable symptoms. The Sarnoff compliant bit streams are designed to perform this task.

An elementary stream analyzer that allows the entire syntax from an encoder to be checked for compliance.

9.9 Sarnoff compliant bit streams

These bit streams have been specifically designed by the David Sarnoff Research Center for decoder compliance testing. They can be multiplexed into a transport stream feeding a decoder. No access to the internal working of the decoder is required. To avoid the need for lengthy analysis of the decoder output, the bit streams have been designed to create a plain picture when they complete, so that it is only necessary to connect a picture monitor to the decoder output to view them.

There are a number of these simple pictures. Figure 9.15 shows the gray verify screen. The user should examine the verify screen to look for discrepancies that will display well against the gray field. There are also some verify pictures which are not gray.

Some tests will result in no picture at all if there is a failure. These tests display the word "SUCCESS!" on screen when they complete.

Further tests require the viewer to check for smooth motion of a moving element across the picture. Timing or ordering problems will cause visible jitter.

The suite of Sarnoff tests may be used to check all of the MPEG syntax elements in turn. In one test, the bit stream begins with I pictures only, adds P pictures, and then adds B pictures to test whether all MPEG picture types can be handled and correctly reordered. Backward compatibility with MPEG-1 can be proven. Another bit stream tests using a range of different GOP structures. There are tests that check the operation of motion vectors over the whole range of values, and there are tests that vary the size of slices or the amount of stuffing.

In addition to providing decoder tests, the Sarnoff streams also include sequences that cause a good decoder to produce standard video test signals to check DACs, signal levels, and composite or Y/C encoders. These sequences turn the decoder into a video test-pattern generator capable of producing conventional video signals such as zone plates, ramps, and color bars.


9.10 Elementary stream analysis

An elementary stream is a payload that the transport stream must deliver transparently. The transport stream will do so whether or not the elementary stream is compliant. In other words, testing a transport stream for compliance simply means checking that it is delivering elementary streams unchanged. It does not mean that the elementary streams were properly assembled in the first place.

The elementary-stream structure or syntax is the responsibility of the compressor. Therefore, an elementary stream test is essentially a form of compressor test. It should be noted that a compressor can produce compliant syntax and yet still have poor audio or video quality. However, if the syntax is incorrect, a decoder may not be able to interpret the elementary stream. Since compressors are algorithmic rather than deterministic, an elementary stream may be intermittently noncompliant if some less common mode of operation is not properly implemented.

Figure 9.14.

Figure 9.15.


As transport streams often contain several programs that come from different coders, elementary-stream problems tend to be restricted to one program, whereas transport stream problems tend to affect all programs. If problems are noted with the output of a particular decoder, then the Sarnoff compliance tests should be run on that decoder. If these are satisfactory, the fault may lie in the input signal. If the transport stream syntax has been tested, or if other programs are working without fault, then an elementary stream analysis is justified.

Elementary stream analysis can begin at the top level of the syntax and continue downwards. Sequence headers are very important, as they tell the decoder all of the relevant modes and parameters used in the compression. The elementary-stream syntax described in Sections 4.1 and 4.2 should be used as a guide. Figure 9.16 shows part of a sequence header displayed on an MTS 200. At a lower level of syntax, Figure 9.17 shows a picture header.

9.11 Creating a transport stream

Whenever the decoder is suspect, it is useful to be able to generate a test signal of known quality. Figure 9.18 shows that an MPEG transport stream must include Program Specific Information (PSI), such as PAT, PMT, and NIT, describing one or more program streams. Each program stream must contain its own program clock reference (PCR) and elementary streams having periodic time stamps.

A DVB transport stream will contain additional service information, such as BAT, SDT, and EIT tables. A PSI/SI editor enables insertion of any desired compliant combination of PSI/SI into a custom test stream.

Clearly, each item requires a share of the available transport-stream rate. The multiplexer provides a rate gauge to display the total bit rate used. The remainder of the bit rate is used up by inserting stuffing packets with a PID that contains all 1s, which a decoder will reject.

9.12 Jitter generation

The MPEG decoder has to recreate a continuous clock by using the clock samples in PCR data to drive a phase-locked loop. The loop needs filtering and damping so that jitter in the time of arrival of PCR data does not cause instability in the clock.

To test the phase-locked loop performance, a signal with known jitter is required; otherwise, the test is meaningless. The MTS 200 can generate simulated jitter for this purpose. Because it is a reference generator, the MTS 200 has highly stable clock circuits, and the actual output jitter is very small. To create the effect of jitter, the timing of the PCR data is not changed at all. Instead, the PCR values are modified so that the PCR count they contain is slightly different from the ideal. The modified value results in phase errors at the decoder that are indistinguishable from real jitter.

The advantage of this approach is that jitter of any required magnitude can easily be added to any program stream simply by modifying the PCR data and leaving all other data intact. Other program streams in the transport stream need not have jitter added. In fact, it may be best to have a stable program stream to use as a reference.

For different test purposes, the time base may be modulated in a number of ways that determine the spectrum of the loop phase error in order to test the loop filtering. Square-wave jitter alternates between values which are equally early or late. Sinusoidal jitter values cause the phase error to be a sampled sine wave. Random jitter causes the phase error to be similar to noise.

The MTS 200 also has the ability to add a fixed offset to the PCR data. This offset will cause a constant shift in the decode time. If two programs in the same transport stream are decoded to baseband video, a PCR shift in one program produces a change of sync phase with respect to the unmodified program.
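The jitter-simulation idea of section 9.12 (leave the packet timing alone and offset only the PCR values) can be sketched for the three modulation shapes the text mentions; the amplitudes and periods below are illustrative.

```python
import math
import random

def jittered_pcrs(pcrs, shape="sine", amplitude=90, period=8):
    # Packet timing is untouched; only the PCR counts are offset, which
    # the decoder cannot distinguish from real arrival-time jitter.
    out = []
    for n, pcr in enumerate(pcrs):
        if shape == "square":        # equally early or late
            offset = amplitude if n % period < period // 2 else -amplitude
        elif shape == "sine":        # sampled sine-wave phase error
            offset = round(amplitude * math.sin(2 * math.pi * n / period))
        else:                        # random, noise-like phase error
            offset = random.randint(-amplitude, amplitude)
        out.append(pcr + offset)
    return out

clean = [n * 100_000 for n in range(8)]     # ideal, evenly spaced PCR counts
square = jittered_pcrs(clean, "square", 90, 4)
assert square[0] - clean[0] == 90           # early half-cycle
assert square[2] - clean[2] == -90          # late half-cycle
```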


Figure 9.16.

Figure 9.17.

(Figure 9.18 diagram: PAT, PMT, and NIT tables are multiplexed together with several programs, each carrying video and audio with PTS/DTS time stamps, into the output TS.)


9.13 DVB tests

DVB transmission testing can be divided into distinct areas. The purpose of the DVB equipment (FEC coder, transmitter, receiver, and error correction) is to deliver a transport stream to the receiver with negligible error. In this sense, the DVB layer is (or should be) transparent to MPEG, and certain testing can assume that the DVB transmission layer is intact. This assumption is reasonable because any defect in the DVB layer will usually result in excessive bit errors, and this condition will cause both an MPEG decoder and a syntax tester to misinterpret the stream. Consequently, if a transport stream tester connected to a DVB/ATSC receiver finds a wide range of errors, the problem is probably not in the MPEG equipment but due to errors in the transmission.

Finding transmission faults requires the ability to analyze various points in the DVB chain, checking for conformance with DVB specifications and MPEG standards. Figure 8.1 showed the major processes in the DVB chain. For example, if an encoder and an IRD (integrated receiver decoder) are incompatible, which one is in error? Or, if the energy dispersal randomizing is not correct at either encoder or decoder, how is one to proceed?

The solution is to use an analyzer that can add or remove compliant energy dispersal. Figure 9.19 shows that if energy dispersal is removed, the encoder's randomizer can be tested. Alternatively, if randomized data are available, the energy-dispersal removal at the IRD can be tested.

Similarly, the MTS 200 can add and check Reed-Solomon protection and can interleave and deinterleave data. These stages in the encoder and the reverse stages in the decoder can be tested. The inner redundancy after interleaving can also be added and checked. The only process the MTS 200 cannot perform is the modulation, which is an analog process. However, the MTS 100 can drive a modulator with a compliant signal so that it can be tested on a constellation analyzer or with an IRD. This approach is also a very simple way of producing compliant test transmissions.

Once the DVB layer is found to be operating at an acceptable error rate, a transport stream tester at the receiver is essentially testing the transport stream as it left the encoder or remultiplexer. Therefore, the transport-stream test should be performed at a point prior to the transmitter. The transport stream tests should then run through the tests outlined above. Most of these tests operate on transport stream headers, and so there should be no difficulty if the program material is scrambled.

In addition to these tests, the DVB-specific tests include testing for the presence and the frequency of the SDT (Service Description Table), NIT (Network Information Table), and the EIT (Event Information Table). This testing can be performed using the DVB option in the MTS 200. Figure 9.20 shows a hierarchic view of DVB-SI tables from a transport stream. Figure 9.21 shows a display of decoded descriptor data.


Figure 9.19. Testing energy dispersal: the test transport stream (TS) is passed either through a randomizer (normally part of the encoder) or through an energy-dispersal remover.
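The randomizer and the energy-dispersal remover of Figure 9.19 are two views of the same operation: DVB energy dispersal (EN 300 421/EN 300 744) XORs the data with a fixed pseudo-random binary sequence generated by the polynomial 1 + x^14 + x^15, so applying it a second time restores the original bytes. A minimal sketch of that self-inverse property (the register wiring is simplified, and the sync-byte inversion and 8-packet grouping of the standard are omitted):

```python
def prbs_bytes(n, init=0b100101010000000):
    """Generate n bytes of a PRBS from the DVB energy-dispersal
    polynomial 1 + x^14 + x^15.  The 15-bit register is loaded with
    the initialization sequence 100101010000000 (per EN 300 421)."""
    reg = init
    out = bytearray()
    for _ in range(n):
        byte = 0
        for _ in range(8):
            bit = ((reg >> 14) ^ (reg >> 13)) & 1   # taps x^15, x^14
            reg = ((reg << 1) | bit) & 0x7FFF
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

def disperse(payload: bytes) -> bytes:
    """XOR the payload with the PRBS.  Because XOR is its own inverse,
    the same function both applies and removes energy dispersal."""
    return bytes(b ^ p for b, p in zip(payload, prbs_bytes(len(payload))))

packet = bytes(range(188))
assert disperse(disperse(packet)) == packet   # randomize, then de-randomize
```

This symmetry is why the analyzer needs only the two blocks shown in the figure: feeding de-randomized data through the encoder's randomizer, or randomized data through the remover, exercises the same XOR in either direction.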

Figure 9.20. Hierarchic view of DVB-SI tables from a transport stream.

Figure 9.21. Decoded descriptor data.


Glossary

AAU - Audio Access Unit. See Access unit.

AC-3 - An audio-coding technique used with ATSC. (See Section 4, Elementary Streams.)

Access unit - The coded data for a picture or block of sound and any stuffing (null values) that follows it.

ATSC - Advanced Television Systems Committee.

Bouquet - A group of transport streams in which programs are identified by a combination of network ID and PID (part of DVB-SI).

CAT - Conditional Access Table. Packets having a PID (see Section 7, Transport Streams) code of 1 that contain information about the scrambling system. See ECM and EMM.

Channel code - A modulation technique that converts raw data into a signal that can be recorded or transmitted by radio or cable.

CIF - Common Interchange Format. A 352 x 240 pixel format for 30 fps video conferencing.

Closed GOP - A Group Of Pictures in which the last pictures do not need data from the next GOP for bidirectional coding. A closed GOP is used to make a splice point in a bit stream.

Coefficient - A number specifying the amplitude of a particular frequency in a transform.

DCT - Discrete Cosine Transform.

DTS - Decoding Time Stamp. Part of the PES header indicating when an access unit is to be decoded.

DVB - Digital Video Broadcasting. The broadcasting of television programs using digital modulation of a radio frequency carrier. The broadcast can be terrestrial or from a satellite.

DVB-IRD - See IRD.

DVB-SI - DVB Service Information. Information carried in a DVB multiplex describing the contents of different multiplexes. Includes NIT, SDT, EIT, TDT, BAT, RST, and ST. (See Section 8, Introduction to DVB/ATSC.)

DVC - Digital Video Cassette.

Elementary Stream - The raw output of a compressor carrying a single video or audio signal.

ECM - Entitlement Control Message. Conditional access information specifying control words or other stream-specific scrambling parameters.

EIT - Event Information Table. Part of DVB-SI.

EMM - Entitlement Management Message. Conditional access information specifying the authorization levels or services of specific decoders. An individual decoder or a group of decoders may be addressed.

ENG - Electronic News Gathering. Term used to describe the use of video recording instead of film in news coverage.

EPG - Electronic Program Guide. A program guide delivered by data transfer rather than printed paper.

FEC - Forward Error Correction. A system in which redundancy is added to the message so that errors can be corrected dynamically at the receiver.

GOP - Group Of Pictures. A GOP starts with an I picture and ends with the last picture before the next I picture.

Inter-coding - Compression that uses redundancy between successive pictures; also known as temporal coding.

Interleaving - A technique used with error correction that breaks up burst errors into many smaller errors.

Intra-coding - Compression that works entirely within one picture; also known as spatial coding.

IRD - Integrated Receiver Decoder. A combined RF receiver and MPEG decoder that is used to adapt a TV set to digital transmissions.

Level - The size of the input picture in use with a given profile. (See Section 2, Compression in Video.)

Macroblock - The screen area represented by several luminance and color-difference DCT blocks that are all steered by one motion vector.

Masking - A psychoacoustic phenomenon whereby certain sounds cannot be heard in the presence of others.

NIT - Network Information Table. Information in one transport stream that describes many transport streams.

Null packets - Packets of "stuffing" that carry no data but are necessary to maintain a constant bit rate with a variable payload. Null packets always have a PID of 8191 (all 1s). (See Section 7, Transport Streams.)

PAT - Program Association Table. Data appearing in packets having a PID (see Section 7, Transport Streams) code of zero, which the MPEG decoder uses to determine which programs exist in a transport stream. The PAT points to the PMTs, which, in turn, point to the video, audio, and data content of each program.

PCM - Pulse Code Modulation. A technical term for an analog source waveform, for example, an audio or video signal, expressed as periodic, numerical samples. PCM is an uncompressed digital signal.

PCR - Program Clock Reference. The sample of the encoder clock count that is sent in the program header to synchronize the decoder clock.

PCRI - Interpolated Program Clock Reference. A PCR estimated from a previous PCR and used to measure jitter.
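The PCR and PCRI glossary entries describe how jitter is measured: a PCRI is interpolated from the previous PCR and the number of bits sent since it, assuming a constant transport rate, and the difference between each received PCR and its PCRI is the jitter. A sketch in 27 MHz clock counts (the function and parameter names are illustrative):

```python
def pcr_jitter(pcrs, byte_positions, ts_bitrate):
    """For each received PCR (in 27 MHz clock counts), interpolate a
    PCRI from the previous PCR and the number of bits sent since it,
    assuming a constant transport rate, and return PCR - PCRI."""
    jitter = []
    for k in range(1, len(pcrs)):
        bits = (byte_positions[k] - byte_positions[k - 1]) * 8
        pcri = pcrs[k - 1] + bits * 27_000_000 / ts_bitrate
        jitter.append(pcrs[k] - pcri)
    return jitter
```

A jitter-free constant-rate stream yields zeros; nonzero values show how far each PCR deviates from its interpolated arrival time.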


PID - Packet Identifier. A 13-bit code in the transport packet header. PID 0 indicates that the packet contains the PAT. (See Section 7, Transport Streams.) PID 1 indicates a packet that contains the CAT, and PID 8191 (all 1s) indicates null (stuffing) packets. All packets belonging to the same elementary stream have the same PID.

PMT - Program Map Table. A table, pointed to by the PAT, that identifies the video, audio, and data content of each program in a transport stream.

Packets - A term used in two contexts: in program streams, a packet is a unit that contains one or more presentation units; in transport streams, a packet is a small, fixed-size data quantum.

Preprocessing - The video signal processing that occurs before MPEG encoding. Noise reduction, downsampling, cut-edit identification, and 3:2 pulldown identification are examples of preprocessing.

Profile - Specifies the coding syntax used.

Program Stream - A bit stream containing compressed video, audio, and timing information.

PTS - Presentation Time Stamp. The time at which a presentation unit is to be available to the viewer.

PSI - Program Specific Information. Information that keeps track of the different programs in an MPEG transport stream and of the elementary streams in each program. PSI includes the PAT, PMT, NIT, CAT, ECM, and EMM.

PSI/SI - A general term for combined MPEG PSI and DVB-SI.

PU - Presentation Unit. One compressed picture or block of audio.

QCIF - One-quarter-resolution (176 x 144 pixels) Common Interchange Format. See CIF.

QSIF - One-quarter-resolution Source Input Format. See SIF.

RLC - Run Length Coding. A coding scheme that counts the number of similar bits instead of sending them individually.

SDI - Serial Digital Interface. A serial coaxial cable interface standard intended for production digital video signals.

STC - System Time Clock. The common clock used to encode video and audio in the same program.

SDT - Service Description Table. A table listing the providers of each service in a transport stream.

SI - See DVB-SI.

SIF - Source Input Format. A half-resolution input signal used by MPEG-1.

ST - Stuffing Table.

Scalability - A characteristic of MPEG-2 that provides for multiple quality levels by providing layers of video data. Multiple layers of data allow a complex decoder to produce a better picture by using more layers of data, while a simpler decoder can still produce a picture using only the first layer of data.

Stuffing - Meaningless data added to maintain a constant bit rate.

Syndrome - The initial result of an error-checking calculation. Generally, if the syndrome is zero, there is assumed to be no error.

TDAC - Time Domain Aliasing Cancellation. A coding technique used in AC-3 audio compression.

TDT - Time and Date Table. Used in DVB-SI.

T-STD - Transport Stream System Target Decoder. A decoder having a certain amount of buffer memory that is assumed to be present by an encoder.

Transport stream - A multiplex of several program streams that are carried in packets. Demultiplexing is achieved by different packet IDs (PIDs). See PSI, PAT, PMT, and PCR.

Truncation - Shortening the wordlength of a sample or coefficient by removing low-order bits.

VAU - Video Access Unit. One compressed picture in a program stream.

VLC - Variable Length Coding. A compression technique that allocates short codes to frequent values and long codes to infrequent values.

VOD - Video On Demand. A system in which television programs or movies are transmitted to a single consumer only when requested.

Vector - A motion compensation parameter that tells a decoder how to shift part of a previous picture to more closely approximate the current picture.

Wavelet - A transform whose basis function is not of fixed length but grows longer as frequency reduces.

Weighting - A method of changing the distribution of the noise that is due to truncation by premultiplying values.
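Several entries in this glossary (PID, PAT, CAT, null packets) refer to fields of the 4-byte transport packet header defined in ISO/IEC 13818-1. Decoding it can be sketched as follows (the dictionary layout is illustrative):

```python
def parse_ts_header(packet: bytes) -> dict:
    """Decode the 4-byte MPEG-2 transport packet header (ISO/IEC 13818-1)."""
    if len(packet) < 4 or packet[0] != 0x47:
        raise ValueError("not a transport packet (sync byte must be 0x47)")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],   # 13-bit PID
        "scrambling": (packet[3] >> 6) & 0x3,
        "continuity_counter": packet[3] & 0x0F,
    }

# A null (stuffing) packet carries PID 8191 (all 1s):
hdr = parse_ts_header(bytes([0x47, 0x1F, 0xFF, 0x10]) + bytes(184))
assert hdr["pid"] == 8191
```

Because the PID, scrambling control, and continuity counter all sit in these four unscrambled bytes, the transport-stream tests described in this guide can run even when the payload itself is scrambled.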



10/97 FL5419 25W–11418–0

Copyright © 1997, Tektronix, Inc. All rights reserved. Tektronix products are covered by U.S. and foreign patents, issued and pending. Information in this publication supersedes that in all previously published material. Specification and price change privileges reserved. TEKTRONIX and TEK are registered trademarks.

For further information, contact Tektronix: World Wide Web: http://www.tek.com; ASEAN Countries (65) 356-3900; Australia & New Zealand 61 (2) 888-7066; Austria, Eastern Europe, & Middle East 43 (1) 7 0177-261; Belgium 32 (2) 725-96-10; Brazil and South America 55 (11) 3741 8360; Canada 1 (800) 661-5625; Denmark 445 (44) 850700; Finland 358 (9) 4783 400; France & North Africa 33 (1) 69 86 81 08; Germany 49 (221) 94 77-400; Hong Kong (852) 2585-6688; India 91 (80) 2275577; Italy 39 (2) 250861; Japan (Sony/Tektronix Corporation) 81 (3) 3448-4611; Mexico, Central America, & Caribbean 52 (5) 666-6333; The Netherlands 31 23 56 95555; Norway 47 (22) 070700; People's Republic of China (86) 10-62351230; Republic of Korea 82 (2) 528-5299; Spain & Portugal 34 (1) 372 6000; Sweden 46 (8) 629 6500; Switzerland 41 (41) 7119192; Taiwan 886 (2) 765-6362; United Kingdom & Eire 44 (1628) 403300; USA 1 (800) 426-2200

From other areas, contact: Tektronix, Inc. Export Sales, P.O. Box 500, M/S 50-255, Beaverton, Oregon 97077-0001, USA (503) 627-1916