8/14/2019 A guide to MPEG
Copyright 1997, Tektronix, Inc. All rights reserved.

A Guide to MPEG Fundamentals and Protocol Analysis (Including DVB and ATSC)
Contents

Section 1 Introduction to MPEG
1.1 Convergence
1.2 Why compression is needed
1.3 Applications of compression
1.4 Introduction to video compression
1.5 Introduction to audio compression
1.6 MPEG signals
1.7 Need for monitoring and analysis
1.8 Pitfalls of compression

Section 2 Compression in Video
2.1 Spatial or temporal coding?
2.2 Spatial coding
2.3 Weighting
2.4 Scanning
2.5 Entropy coding
2.6 A spatial coder
2.7 Temporal coding
2.8 Motion compensation
2.9 Bidirectional coding
2.10 I, P, and B pictures
2.11 An MPEG compressor
2.12 Preprocessing
2.13 Profiles and levels
2.14 Wavelets

Section 3 Audio Compression
3.1 The hearing mechanism
3.2 Subband coding
3.3 MPEG Layer 1
3.4 MPEG Layer 2
3.5 Transform coding
3.6 MPEG Layer 3
3.7 AC-3

Section 4 Elementary Streams
4.1 Video elementary stream syntax
4.2 Audio elementary streams

Section 5 Packetized Elementary Streams (PES)
5.1 PES packets
5.2 Time stamps
5.3 PTS/DTS

Section 6 Program Streams
6.1 Recording vs. transmission
6.2 Introduction to program streams

Section 7 Transport Streams
7.1 The job of a transport stream
7.2 Packets
7.3 Program Clock Reference (PCR)
7.4 Packet Identification (PID)
7.5 Program Specific Information (PSI)

Section 8 Introduction to DVB/ATSC
8.1 An overall view
8.2 Remultiplexing
8.3 Service Information (SI)
8.4 Error correction
8.5 Channel coding
8.6 Inner coding
8.7 Transmitting digits

Section 9 MPEG Testing
9.1 Testing requirements
9.2 Analyzing a Transport Stream
9.3 Hierarchic view
9.4 Interpreted view
9.5 Syntax and CRC analysis
9.6 Filtering
9.7 Timing Analysis
9.8 Elementary stream testing
9.9 Sarnoff compliant bit streams
9.10 Elementary stream analysis
9.11 Creating a transport stream
9.12 Jitter generation
9.13 DVB tests

Glossary
SECTION 1
INTRODUCTION TO MPEG

MPEG is one of the most popular audio/video compression techniques because it is not just a single standard. Instead, it is a range of standards suitable for different applications but based on similar principles. MPEG is an acronym for the Moving Picture Experts Group, which was set up by the ISO (International Standards Organization) to work on compression.

MPEG can be described as the interaction of acronyms. As ETSI stated, "The CAT is a pointer to enable the IRD to find the EMMs associated with the CA system(s) that it uses." If you can understand that sentence, you don't need this book.
1.1 Convergence
Digital techniques have made rapid progress in audio and video for a number of reasons. Digital information is more robust and can be coded to substantially eliminate error. This means that generation loss in recording and losses in transmission are eliminated. The Compact Disc was the first consumer product to demonstrate this.

While the CD has an improved sound quality with respect to its vinyl predecessor, comparison of quality alone misses the point. The real point is that digital recording and transmission techniques allow content manipulation to a degree that is impossible with analog. Once audio or video are digitized, they become data. Such data cannot be distinguished from any other kind of data; therefore, digital video and audio become the province of computer technology.

The convergence of computers and audio/video is an inevitable consequence of the key inventions of computing and Pulse Code Modulation. Digital media can store any type of information, so it is easy to utilize a computer storage device for digital video. The nonlinear workstation was the first example of an application of convergent technology that did not have an analog forerunner. Another example, multimedia, mixed the storage of audio, video, graphics, text and data on the same medium. Multimedia is impossible in the analog domain.
1.2 Why compression is needed
The initial success of digital video was in post-production applications, where the high cost of digital video was offset by its limitless layering and effects capability. However, production-standard digital video generates over 200 megabits per second of data, and this bit rate requires extensive capacity for storage and wide bandwidth for transmission. Digital video could only be used in wider applications if the storage and bandwidth requirements could be eased; easing these requirements is the purpose of compression.

Compression is a way of expressing digital audio and video by using less data. Compression has the following advantages:

A smaller amount of storage is needed for a given amount of source material. With high-density recording, such as with tape, compression allows highly miniaturized equipment for consumer and Electronic News Gathering (ENG) use. The access time of tape improves with compression because less tape needs to be shuttled to skip over a given amount of program. With expensive storage media such as RAM, compression makes new applications affordable.

When working in real time, compression reduces the bandwidth needed. Additionally, compression allows faster-than-real-time transfer between media, for example, between tape and disk.

A compressed recording format can afford a lower recording density, and this can make the recorder less sensitive to environmental factors and maintenance.
1.3 Applications of compression
Compression has a long association with television. Interlace is a simple form of compression giving a 2:1 reduction in bandwidth. The use of color-difference signals instead of GBR is another form of compression. Because the eye is less sensitive to color detail, the color-difference signals need less bandwidth. When color broadcasting was introduced, the channel structure of monochrome had to be retained and composite video was developed. Composite video systems, such as PAL, NTSC and SECAM, are forms of compression because they use the same bandwidth for color as was used for monochrome.
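The bandwidth saving from color-difference working can be made concrete with a little arithmetic. The Python sketch below compares the data in a full-bandwidth frame with a 4:2:2 component frame, where the two color-difference signals are carried at half the luma horizontal sample rate; the 8-bit standard-definition numbers are illustrative assumptions, not figures from this guide.

```python
def frame_bits(width, height, scheme):
    """Bits per 8-bit frame for full-bandwidth vs chroma-subsampled video."""
    luma = width * height * 8
    if scheme == "4:4:4":            # GBR or full-bandwidth components
        chroma = 2 * width * height * 8
    elif scheme == "4:2:2":          # color difference at half horizontal rate
        chroma = 2 * (width // 2) * height * 8
    else:
        raise ValueError(scheme)
    return luma + chroma

full = frame_bits(720, 576, "4:4:4")
sub = frame_bits(720, 576, "4:2:2")
print(sub / full)   # 4:2:2 carries two-thirds of the full-bandwidth data
```

The same bookkeeping extends to 4:2:0 sampling, which also halves the vertical chroma rate.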
Figure 1.1a shows that in traditional television systems, the GBR camera signal is converted to Y, Pr, Pb components for production and encoded into analog composite for transmission. Figure 1.1b shows the modern equivalent. The Y, Pr, Pb signals are digitized and carried as Y, Cr, Cb signals in SDI form through the production process prior to being encoded with MPEG for transmission. Clearly, MPEG can be considered by the broadcaster as a more efficient replacement for composite video. In addition, MPEG has greater flexibility because the bit rate required can be adjusted to suit the application. At lower bit rates and resolutions, MPEG can be used for video conferencing and video telephones.

DVB and ATSC (the European- and American-originated digital-television broadcasting standards) would not be viable without compression because the bandwidth required would be too great. Compression extends the playing time of DVD (digital video/versatile disc), allowing full-length movies on a standard size compact disc. Compression also reduces the cost of Electronic News Gathering and other contributions to television production.

In tape recording, mild compression eases tolerances and adds reliability in Digital Betacam and Digital-S, whereas in SX, DVC, DVCPRO and DVCAM, the goal is miniaturization. In magnetic disk drives, such as the Tektronix Profile storage system, that are used in file servers and networks (especially for news purposes), compression lowers storage cost. Compression also lowers bandwidth, which allows more users to access a given server. This characteristic is also important for VOD (Video On Demand) applications.
1.4 Introduction to video compression
In all real program material, there are two types of components of the signal: those which are novel and unpredictable and those which can be anticipated. The novel component is called entropy and is the true information in the signal. The remainder is called redundancy because it is not essential. Redundancy may be spatial, as it is in large plain areas of picture where adjacent pixels have almost the same value. Redundancy can also be temporal, as it is where similarities between successive pictures are used. All compression systems work by separating the entropy from the redundancy in the encoder. Only the entropy is recorded or transmitted, and the decoder computes the redundancy from the transmitted signal. Figure 1.2a shows this concept.

An ideal encoder would extract all the entropy and only this will be transmitted to the decoder. An ideal decoder would then reproduce the original signal. In practice, this ideal cannot be reached. An ideal coder would be complex and cause a very long delay in order to use temporal redundancy. In certain applications, such as recording or broadcasting, some delay is acceptable, but in videoconferencing it is not. In some cases, a very complex coder would be too expensive. It follows that there is no one ideal compression system.

In practice, a range of coders is needed which have a range of processing delays and complexities. The power of MPEG is that it is not a single compression format, but a range of standardized coding tools that can be combined flexibly to suit a range of applications. The way in which coding has been performed is included in the compressed data so that the decoder can automatically handle whatever the coder decided to do.

MPEG coding is divided into several profiles that have different complexity, and each profile can be implemented at a different level depending on the resolution of the input picture. Section 2 considers profiles and levels in detail.

There are many different digital video formats and each has a different bit rate. For example, a high definition system might have six times the bit rate of a standard definition system. Consequently, just knowing the bit rate out of the coder is not very useful. What matters is the compression factor, which is the ratio of the input bit rate to the compressed bit rate, for example 2:1, 5:1, and so on.

Unfortunately, the number of variables involved makes it very difficult to determine a suitable compression factor. Figure 1.2a shows that for an ideal coder, if all of the entropy is sent, the quality is good. However, if the compression factor is increased in order to reduce the bit rate, not all of the entropy is sent and the quality falls. Note that in a compressed system, when quality loss occurs, it is steep (Figure 1.2b). If the available bit rate is inadequate, it is better to avoid this area by reducing the entropy of the input picture. This can be done by filtering. The loss of resolution caused by the filtering is subjectively more acceptable than the compression artifacts.
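The compression factor itself is simple arithmetic. The Python sketch below works it out for an 8-bit 4:2:2 standard-definition source feeding a hypothetical 6 Mbit/s channel; both numbers are assumptions chosen for illustration, not figures from this guide.

```python
def bit_rate(width, height, fps, bits_per_pixel):
    """Uncompressed video bit rate in bits per second."""
    return width * height * fps * bits_per_pixel

# 8-bit 4:2:2 sampling carries 16 bits per pixel (8 luma + 8 shared chroma).
source = bit_rate(720, 576, 25, 16)     # active video, roughly 166 Mbit/s
channel = 6_000_000                     # a hypothetical 6 Mbit/s channel
factor = source / channel
print(f"{source/1e6:.0f} Mbit/s -> {channel/1e6:.0f} Mbit/s, {factor:.0f}:1")
```

As the text notes, knowing that the answer here is roughly 28:1 says nothing about quality; that depends on how much entropy the source contains.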
Figure 1.1. a) Traditional system: the camera's GBR signals are matrixed to Y, Pr, Pb, pass through the production process, and are composite encoded to give analog composite out (PAL, NTSC or SECAM). b) Modern equivalent: GBR is matrixed and digitized (ADC) to Y, Cr, Cb, carried as SDI through the production process, then MPEG coded to give a digital compressed output.
To identify the entropy perfectly, an ideal compressor would have to be extremely complex. A practical compressor may be less complex for economic reasons and must send more data to be sure of carrying all of the entropy. Figure 1.2b shows the relationship between coder complexity and performance. The higher the compression factor required, the more complex the encoder has to be.

The entropy in video signals varies. A recording of an announcer delivering the news has much redundancy and is easy to compress. In contrast, it is more difficult to compress a recording with leaves blowing in the wind or one of a football crowd that is constantly moving and therefore has less redundancy (more information or entropy). In either case, if all the entropy is not sent, there will be quality loss. Thus, we may choose between a constant bit-rate channel with variable quality or a constant quality channel with variable bit rate. Telecommunications network operators tend to prefer a constant bit rate for practical purposes, but a buffer memory can be used to average out entropy variations if the resulting increase in delay is acceptable. In recording, a variable bit rate may be easier to handle, and DVD uses variable bit rate, speeding up the disc where difficult material exists.

Intra-coding (intra = within) is a technique that exploits spatial redundancy, or redundancy within the picture; inter-coding (inter = between) is a technique that exploits temporal redundancy. Intra-coding may be used alone, as in the JPEG standard for still pictures, or combined with inter-coding as in MPEG.

Intra-coding relies on two characteristics of typical images. First, not all spatial frequencies are simultaneously present, and second, the higher the spatial frequency, the lower the amplitude is likely to be. Intra-coding requires analysis of the spatial frequencies in an image. This analysis is the purpose of transforms such as wavelets and DCT (discrete cosine transform). Transforms produce coefficients which describe the magnitude of each spatial frequency. Typically, many coefficients will be zero, or nearly zero, and these coefficients can be omitted, resulting in a reduction in bit rate.

Inter-coding relies on finding similarities between successive pictures. If a given picture is available at the decoder, the next picture can be created by sending only the picture differences. The picture differences will be increased when objects move, but this increase can be offset by using motion compensation, since a moving object does not generally change its appearance very much from one picture to the next. If the motion can be measured, a closer approximation to the current picture can be created by shifting part of the previous picture to a new location. The shifting process is controlled by a vector that is transmitted to the decoder. The vector transmission requires less data than sending the picture-difference data.

MPEG can handle both interlaced and non-interlaced images. An image at some point on the time axis is called a "picture," whether it is a field or a frame. Interlace is not ideal as a source for digital compression because it is in itself a compression technique. Temporal coding is made more complex because pixels in one field are in a different position to those in the next.

Motion compensation minimizes but does not eliminate the differences between successive pictures. The picture difference is itself a spatial image and can be compressed using transform-based intra-coding as previously described. Motion compensation simply reduces the amount of data in the difference image.
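The effect of motion compensation on the difference picture can be seen in a toy one-dimensional sketch. The "pictures" here are just short lists of pixel values invented for illustration, not anything from the MPEG syntax itself.

```python
def shift(picture, vector):
    """Shift a 1-D 'picture' by a motion vector (positive = to the right)."""
    n = len(picture)
    return [picture[(i - vector) % n] for i in range(n)]

def difference(current, prediction):
    """Pixel-by-pixel difference between the actual picture and a prediction."""
    return [c - p for c, p in zip(current, prediction)]

previous = [0, 0, 9, 9, 0, 0, 0, 0]   # an "object" at positions 2-3
current  = [0, 0, 0, 9, 9, 0, 0, 0]   # the same object moved right by one

plain = difference(current, previous)            # large residual
mc = difference(current, shift(previous, 1))     # residual after compensation
print(plain)   # → [0, 0, -9, 0, 9, 0, 0, 0]
print(mc)      # → [0, 0, 0, 0, 0, 0, 0, 0]
```

With the correct vector, the difference collapses to zero; in real pictures it merely becomes much smaller, and it is that smaller residual that is transform coded.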
The efficiency of a temporal coder rises with the time span over which it can act. Figure 1.2c shows that if a high compression factor is required, a longer time span in the input must be considered and thus a longer coding delay will be experienced. Clearly, temporally coded signals are difficult to edit because the content of a given output picture may be based on image data which was transmitted some time earlier. Production systems will have to limit the degree of temporal coding to allow editing, and this limitation will in turn limit the available compression factor.
Figure 1.2. a) Quality versus compression factor for PCM video: an ideal coder sends only the entropy; a non-ideal coder has to send more, and a short-delay coder has to send even more. b) Quality versus coder complexity. c) Compression factor versus latency.
1.5 Introduction to audio compression

The bit rate of a PCM digital audio channel is only about one megabit per second, which is about 0.5% of 4:2:2 digital video. With mild video compression schemes, such as Digital Betacam, audio compression is unnecessary. But, as the video compression factor is raised, it becomes necessary to compress the audio as well.

Audio compression takes advantage of two facts. First, in typical audio signals, not all frequencies are simultaneously present. Second, because of the phenomenon of masking, human hearing cannot discern every detail of an audio signal. Audio compression splits the audio spectrum into bands by filtering or transforms, and includes less data when describing bands in which the level is low. Where masking prevents or reduces audibility of a particular band, even less data needs to be sent.

Audio compression is not as easy to achieve as is video compression because of the acuity of hearing. Masking only works properly when the masking and the masked sounds coincide spatially. Spatial coincidence is always the case in mono recordings but not in stereo recordings, where low-level signals can still be heard if they are in a different part of the soundstage. Consequently, in stereo and surround sound systems, a lower compression factor is allowable for a given quality. Another factor complicating audio compression is that delayed resonances in poor loudspeakers actually mask compression artifacts. Testing a compressor with poor speakers gives a false result, and signals which are apparently satisfactory may be disappointing when heard on good equipment.

1.6 MPEG signals

The output of a single MPEG audio or video coder is called an Elementary Stream. An Elementary Stream is an endless near real-time signal. For convenience, it can be broken into convenient-sized data blocks in a Packetized Elementary Stream (PES). These data blocks need header information to identify the start of the packets and must include time stamps because packetizing disrupts the time axis.

Figure 1.3 shows that one video PES and a number of audio PES can be combined to form a Program Stream, provided that all of the coders are locked to a common clock. Time stamps in each PES ensure lip-sync between the video and audio. Program Streams have variable-length packets with headers. They find use in data transfers to and from optical and hard disks, which are error free and in which files of arbitrary sizes are expected. DVD uses Program Streams.

For transmission and digital broadcasting, several programs and their associated PES can be multiplexed into a single Transport Stream. A Transport Stream differs from a Program Stream in that the PES packets are further subdivided into short fixed-size packets and in that multiple programs encoded with different clocks can be carried. This is possible because a transport stream has a program clock reference (PCR) mechanism which allows transmission of multiple clocks, one of which is selected and regenerated at the decoder. A Single Program Transport Stream (SPTS) is also possible, and this may be found between a coder and a multiplexer. Since a Transport Stream can genlock the decoder clock to the encoder clock, the SPTS is more common than the Program Stream.

A Transport Stream is more than just a multiplex of audio and video PES. In addition to the compressed audio, video and data, a Transport Stream includes a great deal of metadata describing the bit stream. This includes the Program Association Table (PAT) that lists every program in the transport stream. Each entry in the PAT points to a Program Map Table (PMT) that lists the elementary streams making up each program. Some programs will be open, but some programs may be subject to conditional access (encryption) and this information is also carried in the metadata.

The Transport Stream consists of fixed-size data packets, each containing 188 bytes. Each packet carries a packet identifier code (PID). Packets in the same elementary stream all have the same PID, so that the decoder (or a demultiplexer) can select the elementary stream(s) it wants and reject the remainder. Packet-continuity counts ensure that every packet that is needed to decode a stream is received. An effective synchronization system is needed so that decoders can correctly identify the beginning of each packet and deserialize the bit stream into words.

Figure 1.3. Video and audio encoders feed packetizers; the resulting video and audio PES, plus data, enter either a Program Stream multiplexer (as used by DVD) or a Transport Stream multiplexer (producing, for example, a Single Program Transport Stream).
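The fixed 188-byte packet structure makes a Transport Stream straightforward to parse. The Python sketch below extracts the PID and continuity count from a packet header, following the standard MPEG-2 transport packet layout (0x47 sync byte, 13-bit PID, 4-bit continuity counter); the sample packet is a hand-made null packet, not captured data.

```python
def parse_ts_header(packet: bytes):
    """Parse the 4-byte header of a 188-byte MPEG transport stream packet."""
    if len(packet) != 188 or packet[0] != 0x47:    # 0x47 is the sync byte
        raise ValueError("not a valid transport stream packet")
    pid = ((packet[1] & 0x1F) << 8) | packet[2]    # 13-bit packet identifier
    continuity = packet[3] & 0x0F                  # 4-bit continuity counter
    payload_start = bool(packet[1] & 0x40)         # payload_unit_start_indicator
    return pid, continuity, payload_start

# A null packet (PID 0x1FFF) padded out to 188 bytes:
pkt = bytes([0x47, 0x1F, 0xFF, 0x10]) + bytes(184)
print(parse_ts_header(pkt))   # → (8191, 0, False)
```

A demultiplexer is essentially this function in a loop, keeping only the packets whose PID matches the elementary streams it wants and checking that the continuity counter increments without gaps.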
1.7 Need for monitoring and analysis
The MPEG transport stream is an extremely complex structure using interlinked tables and coded identifiers to separate the programs and the elementary streams within the programs. Within each elementary stream, there is a complex structure, allowing a decoder to distinguish between, for example, vectors, coefficients and quantization tables.

Failures can be divided into two broad categories. In the first category, the transport system correctly multiplexes and delivers information from an encoder to a decoder with no bit errors or added jitter, but the encoder or the decoder has a fault. In the second category, the encoder and decoder are fine, but the transport of data from one to the other is defective. It is very important to know whether the fault lies in the encoder, the transport, or the decoder if a prompt solution is to be found.

Synchronizing problems, such as loss or corruption of sync patterns, may prevent reception of the entire transport stream. Transport-stream protocol defects may prevent the decoder from finding all of the data for a program, perhaps delivering picture but not sound. Correct delivery of the data but with excessive jitter can cause decoder timing problems.

If a system using an MPEG transport stream fails, the fault could be in the encoder, the multiplexer, or in the decoder. How can this fault be isolated? First, verify that the transport stream is compliant with the MPEG coding standards. If the stream is not compliant, a decoder can hardly be blamed for having difficulty. If it is, the decoder may need attention.

Traditional video testing tools, the signal generator, the waveform monitor and vectorscope, are not appropriate in analyzing MPEG systems, except to ensure that the video signals entering and leaving an MPEG system are of suitable quality. Instead, a reliable source of valid MPEG test signals is essential for testing receiving equipment and decoders. With a suitable analyzer, the performance of encoders, transmission systems, multiplexers and remultiplexers can be assessed with a high degree of confidence. As a long-standing supplier of high-grade test equipment to the video industry, Tektronix continues to provide test and measurement solutions as the technology evolves, giving the MPEG user the confidence that complex compressed systems are correctly functioning and allowing rapid diagnosis when they are not.
1.8 Pitfalls of compression
MPEG compression is lossy in that what is decoded is not identical to the original. The entropy of the source varies, and when entropy is high, the compression system may leave visible artifacts when decoded. In temporal compression, redundancy between successive pictures is assumed. When this is not the case, the system fails. An example is video from a press conference where flashguns are firing. Individual pictures containing the flash are totally different from their neighbors, and coding artifacts become obvious.

Irregular motion or several independently moving objects on screen require a lot of vector bandwidth, and this requirement may only be met by reducing the picture-data bandwidth. Again, visible artifacts may occur whose level varies and depends on the motion. This problem often occurs in sports-coverage video.

Coarse quantizing results in luminance contouring and posterized color. These can be seen as blotchy shadows and blocking on large areas of plain color. Subjectively, compression artifacts are more annoying than the relatively constant impairments resulting from analog television transmission systems.

The only solution to these problems is to reduce the compression factor. Consequently, the compression user has to make a value judgment between the economy of a high compression factor and the level of artifacts.

In addition to extending the encoding and decoding delay, temporal coding also causes difficulty in editing. In fact, an MPEG bit stream cannot be arbitrarily edited at all. This restriction occurs because in temporal coding the decoding of one picture may require the contents of an earlier picture and the contents may not be available following an edit. The fact that pictures may be sent out of sequence also complicates editing.

If suitable coding has been used, edits can take place only at splice points, which are relatively widely spaced. If arbitrary editing is required, the MPEG stream must undergo a read-modify-write process, which will result in generation loss.

The viewer is not interested in editing, but the production user will have to make another value judgment about the edit flexibility required. If greater flexibility is required, the temporal compression has to be reduced and a higher bit rate will be needed.
SECTION 2
COMPRESSION IN VIDEO

This section shows how video compression is based on the perception of the eye. Important enabling techniques, such as transforms and motion compensation, are considered as an introduction to the structure of an MPEG coder.

2.1 Spatial or temporal coding?

As was seen in Section 1, video compression can take advantage of both spatial and temporal redundancy. In MPEG, temporal redundancy is reduced first by using similarities between successive pictures. As much as possible of the current picture is created or "predicted" by using information from pictures already sent. When this technique is used, it is only necessary to send a difference picture, which eliminates the differences between the actual picture and the prediction. The difference picture is then subject to spatial compression. As a practical matter it is easier to explain spatial compression prior to explaining temporal compression.

Spatial compression relies on similarities between adjacent pixels in plain areas of picture and on dominant spatial frequencies in areas of patterning. The JPEG system uses spatial compression only, since it is designed to transmit individual still pictures. However, JPEG may be used to code a succession of individual pictures for video. In the so-called "Motion JPEG" application, the compression factor will not be as good as if temporal coding was used, but the bit stream will be freely editable on a picture-by-picture basis.

2.2 Spatial coding

The first step in spatial coding is to perform an analysis of spatial frequency using a transform. A transform is simply a way of expressing a waveform in a different domain, in this case, the frequency domain. The output of a transform is a set of coefficients that describe how much of a given frequency is present. An inverse transform reproduces the original waveform. If the coefficients are handled with sufficient accuracy, the output of the inverse transform is identical to the original waveform.

The most well known transform is the Fourier transform. This transform finds each frequency in the input signal. It finds each frequency by multiplying the input waveform by a sample of a target frequency, called a basis function, and integrating the product. Figure 2.1 shows that when the input waveform does not contain the target frequency, the integral will be zero, but when it does, the integral will be a coefficient describing the amplitude of that component frequency.

The results will be as described if the frequency component is in phase with the basis function. However, if the frequency component is in quadrature with the basis function, the integral will still be zero. Therefore, it is necessary to perform two searches for each frequency, with the basis functions in quadrature with one another so that every phase of the input will be detected.

The Fourier transform has the disadvantage of requiring coefficients for both sine and cosine components of each frequency. In the cosine transform, the input waveform is time-mirrored with itself prior to multiplication by the basis functions. Figure 2.2 shows that this mirroring cancels out all sine components and doubles all of the cosine components. The sine basis function is unnecessary and only one coefficient is needed for each frequency.

The discrete cosine transform (DCT) is the sampled version of the cosine transform and is used extensively in two-dimensional form in MPEG. A block of 8 x 8 pixels is transformed to become a block of 8 x 8 coefficients. Since the transform requires multiplication by fractions, there is wordlength extension, resulting in coefficients that have longer wordlength than the pixel values. Typically an 8-bit
Figure 2.1. Multiplying the input by a basis function: high correlation if the frequency is the same, no correlation if the frequency is different.

Figure 2.2. Time-mirroring the input: the cosine component is coherent through the mirror, while the sine component inverts at the mirror and cancels.
pixel block results in an 11-bit coefficient block. Thus, a DCT does not result in any compression; in fact, it results in the opposite. However, the DCT converts the source pixels into a form in which compression is easier.

Figure 2.3 shows the results of an inverse transform of each of the individual coefficients of an 8 x 8 DCT. In the case of the luminance signal, the top-left coefficient is the average brightness, or DC component, of the whole block. Moving across the top row, horizontal spatial frequency increases. Moving down the left column, vertical spatial frequency increases. In real pictures, different vertical and horizontal spatial frequencies may occur simultaneously, and a coefficient at some point within the block will represent all possible horizontal and vertical combinations.

Figure 2.3 also shows the coefficients as one-dimensional horizontal waveforms. Combining these waveforms with various amplitudes and either polarity can reproduce any combination of 8 pixels. Thus, combining the 64 coefficients of the 2-D DCT will result in the original 8 x 8 pixel block. Clearly, for color pictures, the color difference samples will also need to be handled. Y, Cr, and Cb data are assembled into separate 8 x 8 arrays and are transformed individually.

In much real program material, many of the coefficients will have zero or near-zero values and, therefore, will not be transmitted. This fact results in significant compression that is virtually lossless. If a higher compression factor is needed, then the wordlength of the non-zero coefficients must be reduced. This reduction will reduce the accuracy of these coefficients and will introduce losses into the process. With care, the losses can be introduced in a way that is least visible to the viewer.
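To make the transform concrete, here is a small sketch (illustrative code, not from the guide; all function names are invented) that applies an orthonormal 2-D DCT to a plain 8 x 8 block. Only the DC coefficient is non-zero, and the inverse transform restores the pixels exactly.

```python
import math

N = 8

def dct_1d(v):
    # DCT-II with orthonormal scaling
    out = []
    for k in range(N):
        s = sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        c = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(c * s)
    return out

def idct_1d(c):
    out = []
    for n in range(N):
        s = 0.0
        for k in range(N):
            ck = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
            s += ck * c[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        out.append(s)
    return out

def dct_2d(block):
    rows = [dct_1d(r) for r in block]                                  # transform rows
    cols = [dct_1d([rows[i][j] for i in range(N)]) for j in range(N)]  # then columns
    return [[cols[j][i] for j in range(N)] for i in range(N)]

def idct_2d(coeffs):
    cols = [idct_1d([coeffs[i][j] for i in range(N)]) for j in range(N)]
    rows = [[cols[j][i] for j in range(N)] for i in range(N)]
    return [idct_1d(r) for r in rows]

flat = [[128.0] * N for _ in range(N)]   # a plain (flat) block
coeffs = dct_2d(flat)
print(round(coeffs[0][0]))               # only the DC term survives
restored = idct_2d(coeffs)
print(all(abs(restored[i][j] - 128.0) < 1e-9 for i in range(N) for j in range(N)))
```

A real block with picture detail would spread some energy into the AC coefficients, but most of them remain small, which is what makes the later truncation effective.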
2.3 Weighting

Figure 2.4 shows that the human perception of noise in pictures is not uniform but is a function of the spatial frequency. More noise can be tolerated at high spatial frequencies. Also, video noise is effectively masked by fine detail in the picture, whereas in plain areas it is highly visible. The reader will be aware that traditional noise measurements are always weighted so that the technical measurement relates to the subjective result.

Compression reduces the accuracy of coefficients and has a similar effect to using shorter-wordlength samples in PCM; that is, the noise level rises. In PCM, the result of shortening the wordlength is that the noise level rises equally at all frequencies. As the DCT splits the signal into different frequencies, it becomes possible to control the spectrum of the noise. Effectively, low-frequency coefficients are
Figure 2.4. Human vision sensitivity as a function of spatial frequency.
Figure 2.5 shows that, in the weighting process, the coefficients from the DCT are divided by constants that are a function of two-dimensional frequency. Low-frequency coefficients will be divided by small numbers, and high-frequency coefficients will be divided by large numbers. Following the division, the least-significant bit is discarded, or truncated. This truncation is a form of requantizing. In the absence of weighting, this requantizing would have the effect of doubling the size of the quantizing step, but with weighting, it increases the step size according to the division factor. As a result, coefficients representing low spatial frequencies are requantized with relatively small steps and suffer little increased noise. Coefficients representing higher spatial frequencies are requantized with large steps and suffer more noise. However, fewer steps means that fewer bits are needed to identify the step, and compression is obtained.

In the decoder, a low-order zero will be added to return the weighted coefficients to their correct magnitude. They will then be multiplied by inverse weighting factors. Clearly, at high frequencies the multiplication factors will be larger, so the requantizing noise will be greater. Following inverse weighting, the coefficients will have their original DCT output values, plus requantizing error, which will be greater at high frequency than at low frequency.

As an alternative to truncation, weighted coefficients may be nonlinearly requantized so that the quantizing step size increases with the magnitude of the coefficient. This technique allows higher compression factors but worse levels of artifacts.

Clearly, the degree of compression obtained and, in turn, the output bit rate obtained, is a function of the severity of the requantizing process. Different bit rates will require different weighting tables. In MPEG, it is possible to use various different weighting tables, and the table in use can be transmitted to the decoder, so that correct decoding automatically occurs.
Figure 2.5. Weighting example: input DCT coefficients (a more complex block) are divided first by the quant matrix values (the value used corresponds to the coefficient location, rising from 8 at DC toward 83 at the highest two-dimensional frequencies) and then by a single quant scale value used for the complete 8 x 8 block; quant scale codes map to either a linear or a nonlinear scale (not all code values are shown). The output coefficient values shown are for display only, not actual results.
Figure 2.6. a) Zigzag or classic scan, nominally for frames; b) alternate scan, nominally for fields.
2.4 Scanning

In typical program material, the significant DCT coefficients are generally found in the top-left corner of the matrix. After weighting, low-value coefficients might be truncated to zero. More efficient transmission can be obtained if all of the non-zero coefficients are sent first, followed by a code indicating that the remainder are all zero. Scanning is a technique that increases the probability of achieving this result, because it sends coefficients in descending order of magnitude probability. Figure 2.6a shows that in a non-interlaced system, the probability of a coefficient having a high value is highest in the top-left corner and lowest in the bottom-right corner. A 45-degree diagonal zig-zag scan is the best sequence to use here.

In Figure 2.6b, the scan for an interlaced source is shown. In an interlaced picture, an 8 x 8 DCT block from one field extends over twice the vertical screen area, so that for a given picture detail, vertical frequencies will appear to be twice as great as horizontal frequencies. Thus, the ideal scan for an interlaced picture will be on a diagonal that is twice as steep. Figure 2.6b shows that a given vertical spatial frequency is scanned before scanning the same horizontal spatial frequency.

2.5 Entropy coding

In real video, not all spatial frequencies are simultaneously present; therefore, the DCT coefficient matrix will have zero terms in it. Despite the use of scanning, zero coefficients will still appear between the significant values. Run length coding (RLC) allows these coefficients to be handled more efficiently. Where repeating values, such as a string of zeros, are present, run length coding simply transmits the number of zeros rather than each individual value.

The probability of occurrence of particular coefficient values in real video can be studied. In practice, some values occur very often; others occur less often. This statistical information can be used to achieve further compression using variable length coding (VLC). Frequently occurring values are converted to short code words, and infrequent values are converted to long code words. To aid deserialization, no code word can be the prefix of another.

2.6 A spatial coder

Figure 2.7 ties together all of the preceding spatial coding concepts. The input signal is assumed to be 4:2:2 SDI (Serial Digital Interface), which may have 8- or 10-bit wordlength. MPEG uses only 8-bit resolution; therefore, a rounding stage will be needed when the SDI signal contains 10-bit words. Most MPEG profiles operate with 4:2:0 sampling; therefore, a vertical low-pass filter/interpolation stage will be needed. Rounding and color subsampling introduce a small irreversible loss of information and a proportional reduction in bit rate. The raster-scanned input format will need to be stored so that it can be converted to 8 x 8 pixel blocks.
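The classic zig-zag scan and zero run-length coding can be sketched together as follows. This is illustrative code; real MPEG-2 entropy coding uses standardized run/level VLC tables, which are omitted here.

```python
N = 8

def zigzag_order(n=N):
    # Visit each anti-diagonal in turn, alternating direction (classic scan).
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()   # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

def run_length(values):
    # Encode as (zero_run, value) pairs, ending with an EOB marker.
    out, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    out.append("EOB")
    return out

# A sparse coefficient block: only three low-frequency values survive weighting.
block = [[0] * N for _ in range(N)]
block[0][0], block[0][1], block[1][0] = 122, 8, 3
scanned = [block[i][j] for i, j in zigzag_order()]
print(run_length(scanned))   # three run/value pairs and an EOB replace 64 values
```

Because the three non-zero values sit in the top-left corner, the scan delivers them first and the remaining 61 zeros collapse into the single EOB code.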
Figure 2.7. A spatial coder: full-bit-rate 10-bit data are converted from 4:2:2 to 8-bit 4:2:0 (information lost, data reduced), transformed by the DCT (no loss, no data reduced), quantized under rate control (information lost, data reduced), then entropy coded and buffered to form the compressed data (data reduced, no loss). Entropy coding reduces the number of bits for each coefficient and can give preference to certain coefficients; variable length coding uses short words for the most frequent values (like Morse code), and run length coding sends a unique code word instead of strings of zeros.
Figure 2.8.
The DCT stage transforms the picture information to the frequency domain. The DCT itself does not achieve any compression. Following the DCT, the coefficients are weighted and truncated, providing the first significant compression. The coefficients are then zig-zag scanned to increase the probability that the significant coefficients occur early in the scan. After the last non-zero coefficient, an EOB (end of block) code is generated.

Coefficient data are further compressed by run length and variable length coding. In a variable bit-rate system, the quantizing is fixed, but in a fixed bit-rate system, a buffer memory is used to absorb variations in coding difficulty. Highly detailed pictures will tend to fill the buffer, whereas plain pictures will allow it to empty. If the buffer is in danger of overflowing, the requantizing steps will have to be made larger, so that the compression factor is effectively raised.

In the decoder, the bit stream is deserialized and the entropy coding is reversed to reproduce the weighted coefficients. The inverse weighting is applied, and coefficients are placed in the matrix according to the zig-zag scan to recreate the DCT matrix. Following an inverse transform, the 8 x 8 pixel block is recreated. To obtain a raster-scanned output, the blocks are stored in RAM, which is read a line at a time. To obtain a 4:2:2 output from 4:2:0 data, a vertical interpolation process will be needed, as shown in Figure 2.8.

The chroma samples in 4:2:0 are positioned halfway between luminance samples in the vertical axis so that they are evenly spaced when an interlaced source is used.

2.7 Temporal coding

Temporal redundancy can be exploited by inter-coding or transmitting only the differences between pictures. Figure 2.9 shows that a one-picture delay combined with a subtractor can compute the picture differences. The picture difference is an image in its own right and can be further compressed by the spatial coder as was previously described. The decoder reverses the spatial coding and adds the difference picture to the previous picture to obtain the next picture.

There are some disadvantages to this simple system. First, as only differences are sent, it is impossible to begin decoding after the start of the transmission. This limitation makes it difficult for a decoder to provide pictures following a switch from one bit stream to another (as occurs when the viewer changes channels). Second, if any part of the difference data is incorrect, the error in the picture will propagate indefinitely.

The solution to these problems is to use a system that is not completely differential. Figure 2.10 shows that periodically complete pictures are sent. These are called intra-coded pictures (or I pictures), and they are obtained by spatial compression only. If an error or a channel switch occurs, it will be possible to resume correct decoding at the next I picture.
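The delay-and-subtract loop of Figure 2.9 can be sketched with tiny one-dimensional "pictures" (illustrative code, not from the guide):

```python
# Encoder: send the first picture, then only differences.
pictures = [
    [10, 10, 10, 10],
    [10, 12, 10, 10],   # small change
    [10, 12, 14, 10],   # another small change
]

stream = [pictures[0]]  # an intra picture starts the sequence
for prev, cur in zip(pictures, pictures[1:]):
    stream.append([c - p for c, p in zip(cur, prev)])

# Decoder: add each difference picture to the previously decoded picture.
decoded = [stream[0]]
for diff in stream[1:]:
    decoded.append([p + d for p, d in zip(decoded[-1], diff)])

print(stream[1])            # difference pictures are mostly zeros
print(decoded == pictures)  # exact reconstruction
```

The mostly-zero difference pictures are what the spatial coder then compresses so effectively; the sketch also shows why a corrupted difference would propagate until the next intra picture resets the loop.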
Figure 2.8 (key): chroma sample positions for 4:2:2 (Rec. 601), 4:1:1, and 4:2:0 on lines N through N+3 (legend: luminance samples Y; chrominance samples Cb, Cr).

Figure 2.9. A picture delay and a subtractor compute the difference between this picture and the previous picture.

Figure 2.10. After the start, intra pictures are sent periodically between runs of temporal (difference) pictures, providing restart points.
Figure 2.11.
2.8 Motion compensation

Motion reduces the similarities between pictures and increases the data needed to create the difference picture. Motion compensation is used to increase the similarity. Figure 2.11 shows the principle. When an object moves across the TV screen, it may appear in a different place in each picture, but it does not change in appearance very much. The picture difference can be reduced by measuring the motion at the encoder. This is sent to the decoder as a vector. The decoder uses the vector to shift part of the previous picture to a more appropriate place in the new picture.

One vector controls the shifting of an entire area of the picture that is known as a macroblock. The size of the macroblock is determined by the DCT coding and the color subsampling structure. Figure 2.12a shows that, with a 4:2:0 system, the vertical and horizontal spacing of color samples is exactly twice the spacing of luminance. A single 8 x 8 DCT block of color samples extends over the same area as four 8 x 8 luminance blocks; therefore, this is the minimum picture area that can be shifted by a vector. One 4:2:0 macroblock contains four luminance blocks, one Cr block, and one Cb block.

In the 4:2:2 profile, color is only subsampled in the horizontal axis. Figure 2.12b shows that in 4:2:2, a single 8 x 8 DCT block of color samples extends over two luminance blocks. A 4:2:2 macroblock contains four luminance blocks, two Cr blocks, and two Cb blocks.

The motion estimator works by comparing the luminance data from two successive pictures. A macroblock in the first picture is used as a reference. When the input is interlaced, pixels will be at different vertical locations in the two fields, and it will, therefore, be necessary to interpolate one field before it can be compared with the other. The correlation between the reference and the next picture is measured at all possible displacements with a resolution of half a pixel over the entire search range. When the greatest correlation is found, this correlation is assumed to represent the correct motion.

The motion vector has a vertical and a horizontal component. In typical program material, motion continues over a number of pictures. A greater compression factor is obtained if the vectors are transmitted differentially. Consequently, if an object moves at constant speed, the vectors do not change and the vector difference is zero.

Motion vectors are associated with macroblocks, not with real objects in the image, and there will be occasions where part of the macroblock moves and part of it does not. In this case, it is impossible to compensate properly. If the motion of the moving part is compensated by transmitting a vector, the stationary part will be incorrectly shifted, and it will need difference data to be corrected. If no vector is sent, the stationary part will be correct, but difference data will be needed to correct the moving part. A practical compressor might attempt both strategies and select the one that requires the least difference data.
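A full-search block match at integer-pixel resolution can be sketched as follows (illustrative code; a real estimator searches two-dimensionally at half-pixel resolution, as described above):

```python
# Find the displacement that minimizes the sum of absolute differences
# (SAD) between a reference block and candidate positions in the next picture.

def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def best_vector(ref, pic, block_x, block_w, search):
    best = None
    for dx in range(-search, search + 1):
        x = block_x + dx
        if 0 <= x and x + block_w <= len(pic):
            score = sad(ref, pic[x:x + block_w])
            if best is None or score < best[0]:
                best = (score, dx)
    return best[1]

# One-dimensional "pictures": an edge moves 2 pixels to the right.
pic0 = [0, 0, 9, 9, 9, 0, 0, 0, 0, 0]
pic1 = [0, 0, 0, 0, 9, 9, 9, 0, 0, 0]
ref = pic0[2:6]                     # "macroblock" from the first picture
vector = best_vector(ref, pic1, block_x=2, block_w=4, search=3)
print(vector)                       # the edge moved 2 pixels
```

With the vector found, the shifted block matches the next picture exactly, so the prediction error for this block is zero.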
Figure 2.11. Motion compensation of part of a moving object: 1) compute the motion vector; 2) shift data from picture N using the vector to make a predicted picture N+1; 3) compare the actual picture with the predicted picture; 4) send the vector and the prediction error.

Figure 2.12. a) 4:2:0 has one quarter as many chroma sampling points as Y: a 16 x 16 macroblock contains four 8 x 8 Y blocks, one Cr block, and one Cb block. b) 4:2:2 has twice as much chroma data as 4:2:0: four 8 x 8 Y blocks, two Cr blocks, and two Cb blocks.
Figure 2.13.
2.9 Bidirectional coding

When an object moves, it conceals the background at its leading edge and reveals the background at its trailing edge. The revealed background requires new data to be transmitted, because the area of background was previously concealed and no information can be obtained from a previous picture. A similar problem occurs if the camera pans: new areas come into view, and nothing is known about them. MPEG helps to minimize this problem by using bidirectional coding, which allows information to be taken from pictures before and after the current picture. If a background is being revealed, it will be present in a later picture, and the information can be moved backwards in time to create part of an earlier picture.

Figure 2.13 shows the concept of bidirectional coding. On an individual macroblock basis, a bidirectionally coded picture can obtain motion-compensated data from an earlier or later picture, or even use an average of earlier and later data. Bidirectional coding significantly reduces the amount of difference data needed by improving the degree of prediction possible. MPEG does not specify how an encoder should be built, only what constitutes a compliant bit stream. However, an intelligent compressor could try all three coding strategies and select the one that results in the least data to be transmitted.

2.10 I, P, and B pictures

In MPEG, three different types of pictures are needed to support differential and bidirectional coding while minimizing error propagation:

I pictures are intra-coded pictures that need no additional information for decoding. They require a lot of data compared to other picture types, and therefore they are not transmitted any more frequently than necessary. They consist primarily of transform coefficients and have no vectors. I pictures allow the viewer to switch channels, and they arrest error propagation.

P pictures are forward-predicted from an earlier picture, which could be an I picture or another P picture. P-picture data consist of vectors describing where, in the previous picture, each macroblock should be taken from, together with the transform coefficients that describe the correction or difference data that must be added to that macroblock. P pictures require roughly half the data of an I picture.

B pictures are bidirectionally predicted from earlier or later I or P pictures. B-picture data consist of vectors describing where in earlier or later pictures data should be taken from, together with the transform coefficients that provide the correction. Because bidirectional prediction is so effective, the correction data are minimal, and this helps a B picture to typically require one quarter the data of an I picture.
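On a per-macroblock basis, a coder can compare forward, backward, and averaged predictions and keep the cheapest, as sketched here (illustrative code with invented values):

```python
def cost(pred, actual):
    # Size of the difference data, approximated as a sum of absolute errors.
    return sum(abs(a - p) for a, p in zip(actual, pred))

def choose_prediction(earlier, later, current):
    average = [(e + l) / 2 for e, l in zip(earlier, later)]
    candidates = {"forward": earlier, "backward": later, "average": average}
    return min(candidates, key=lambda k: cost(candidates[k], current))

# A macroblock whose background is being revealed: the later picture
# already contains the revealed detail, so backward prediction wins.
earlier = [5, 5, 5, 5]
later   = [5, 7, 9, 5]
current = [5, 7, 9, 5]
print(choose_prediction(earlier, later, current))
```

For a macroblock in smooth, constant motion the averaged candidate often wins instead, because averaging two noisy predictions reduces the residual.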
Figure 2.13. Over time, the area revealed behind a moving object is not in picture N but is present in picture N+2, so bidirectional coding can take it from the later picture.
Figure 2.14.
Figure 2.14 introduces the concept of the GOP, or Group Of Pictures. The GOP begins with an I picture and then has P pictures spaced throughout. The remaining pictures are B pictures. The GOP is defined as ending at the last picture before the next I picture. The GOP length is flexible, but 12 or 15 pictures is a common value. Clearly, if data for B pictures are to be taken from a future picture, that data must already be available at the decoder. Consequently, bidirectional coding requires that picture data are sent out of sequence and temporarily stored. Figure 2.14 also shows that the P-picture data are sent before the B-picture data. Note that the last B pictures in the GOP cannot be transmitted until after the I picture of the next GOP, since this data will be needed to bidirectionally decode them. In order to return pictures to their correct sequence, a temporal reference is included with each picture. As the picture rate is also embedded periodically in headers in the bit stream, an MPEG file may be displayed by, for example, a personal computer, in the correct order and time scale.
Sending picture data out of sequence requires additional memory at the encoder and decoder and also causes delay. The number of bidirectionally coded pictures between intra- or forward-predicted pictures must be restricted to reduce cost and to minimize delay, if delay is an issue.

Figure 2.14 (key): Rec. 601 video frames in display order (I B B P B B P B B I B B P) become the elementary stream I P B B P B B I B B P B B, each picture carrying a temporal_reference (0 3 1 2 6 4 5 0 7 8 3 1 2).

Figure 2.15. Bit rate (Mbit/s) against a constant-quality curve: for the same quality, an I-only sequence needs the highest bit rate, IB less, and IBBP the least.
Figure 2.15 shows the tradeoff that must be made between compression factor and coding delay. For a given quality, sending only I pictures requires more than twice the bit rate of an IBBP sequence. Where the ability to edit is important, an IB sequence is a useful compromise.
Figure 2.16a.
2.11 An MPEG compressor

Figures 2.16a, b, and c show a typical bidirectional motion compensator structure. Preprocessed input video enters a series of frame stores that can be bypassed to change the picture order. The data then enter the subtractor and the motion estimator. To create an I picture (see Figure 2.16a), the end of the input delay is selected and the subtractor is turned off, so that the data pass straight through to be spatially coded. Subtractor output data also pass to a frame store that can hold several pictures. The I picture is held in the store.
(I pictures: the shaded areas, including the predictors, motion estimator, and spatial decoders, are unused.)
Figure 2.16b. The same compensator coding P pictures (shaded areas are unused).
To encode a P picture (see Figure 2.16b), the B pictures in the input buffer are bypassed, so that the future picture is selected. The motion estimator compares the I picture in the output store with the P picture in the input store to create forward motion vectors. The I picture is shifted by these vectors to make a predicted P picture. The predicted P picture is subtracted from the actual P picture to produce the prediction error, which is spatially coded and sent along with the vectors. The prediction error is also added to the predicted P picture to create a locally decoded P picture that also enters the output store.
Figure 2.16c. The same compensator coding B pictures (the shaded area is unused).
The output store then contains an I picture and a P picture. A B picture from the input buffer can now be selected. The motion compensator (see Figure 2.16c) will compare the B picture with the I picture that precedes it and the P picture that follows it to obtain bidirectional vectors. Forward and backward motion compensation is performed to produce two predicted B pictures. These are subtracted from the current B picture. On a macroblock-by-macroblock basis, forward or backward data are selected according to which represent the smallest differences. The picture differences are then spatially coded and sent with the vectors.

When all of the intermediate B pictures are coded, the input memory will once more be bypassed to create a new P picture based on the previous P picture. The output store contains exactly what the store in the decoder will contain, so that the results of all previous coding errors are present. These will automatically be reduced when the predicted picture is subtracted from the actual picture, because the difference data will be more accurate.

Figure 2.17 shows an MPEG coder. The motion compensator output is spatially coded, and the vectors are added in a multiplexer. Syntactical data are also added, which identify the type of picture (I, P, or B) and provide other information to help a decoder (see Section 4). The output data are buffered to allow temporary variations in bit rate. If the bit rate shows a long-term increase, the buffer will tend to fill up, and to prevent overflow the quantization process will have to be made more severe. Equally, should the buffer show signs of underflow, the quantization will be relaxed to maintain the average bit rate.
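The buffer feedback loop can be caricatured as follows (all numbers invented; real coders use the standardized VBV buffer model):

```python
# Each coded picture adds bits to a buffer that the channel drains at a
# constant rate; quantizer coarseness follows buffer fullness.
channel_rate = 60            # bits removed per picture period
high_mark, low_mark = 150, 50
fullness = 0
quant_scale = 8              # larger = coarser quantizing = fewer bits

picture_bits = [60, 70, 80, 30, 20, 90]   # varying coding difficulty
history = []
for bits in picture_bits:
    coded = bits * 8 // quant_scale       # coarser quantizing shrinks pictures
    fullness = max(0, fullness + coded - channel_rate)
    if fullness > high_mark:              # danger of overflow: requantize harder
        quant_scale += 2
    elif fullness < low_mark:             # danger of underflow: relax
        quant_scale = max(2, quant_scale - 2)
    history.append((coded, fullness, quant_scale))

print(history)
```

The trace shows the scale relaxing while the buffer is empty and tightening again after a run of difficult pictures fills it, which is the behavior described in the text.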
Figure 2.17. An MPEG coder: the bidirectional coder (Figure 2.13) supplies spatial data and motion vectors, which pass with syntactical data through entropy and run length coding into a multiplexer and buffer; buffer fullness provides rate control back to the quantizing tables, and a demand clock empties the buffer.
2.12 Preprocessing

A compressor attempts to eliminate redundancy within the picture and between pictures. Anything that reduces that redundancy is undesirable. Noise and film grain are particularly problematic because they generally occur over the entire picture. After the DCT process, noise results in more non-zero coefficients, which the coder cannot distinguish from genuine picture data. Heavier quantizing will be required to encode all of the coefficients, reducing picture quality. Noise also reduces similarities between successive pictures, increasing the difference data needed.

Residual subcarrier in video decoded from composite video is a serious problem because it results in high spatial frequencies that are normally at a low level in component programs. Subcarrier also alternates from picture to picture, causing an increase in difference data. Naturally, any composite decoding artifact that is visible in the input to the MPEG coder is likely to be reproduced at the decoder.

Any practice that causes unwanted motion is to be avoided. Unstable camera mountings, in addition to giving a shaky picture, increase picture differences and vector transmission requirements. This will also happen with telecine material if film weave or hop due to sprocket-hole damage is present. In general, video that is to be compressed must be of the highest quality possible. If high quality cannot be achieved, then noise reduction and other stabilization techniques will be desirable.

If a high compression factor is required, the level of artifacts can increase, especially if input quality is poor. In this case, it may be better to reduce the entropy entering the coder using prefiltering. The video signal is subject to two-dimensional low-pass filtering, which reduces the number of coefficients needed and reduces the level of artifacts. The picture will be less sharp, but less sharpness is preferable to a high level of artifacts.

In most MPEG-2 applications, 4:2:0 sampling is used, which requires a chroma downsampling process if the source is 4:2:2. In MPEG-1, the luminance and chroma are further downsampled to produce an input picture, or SIF (Source Input Format), that is only 352 pixels wide. This technique reduces the entropy by a further factor. For very high compression, the QSIF (Quarter Source Input Format) picture, which is 176 pixels wide, is used. Downsampling is a process that combines a spatial low-pass filter with an interpolator. Downsampling interlaced signals is problematic because vertical detail is spread over two fields, which may decorrelate due to motion.

When the source material is telecine, the video signal has different characteristics than normal video. In 50 Hz video, pairs of fields represent the same film frame, and there is no motion between them. Thus, the motion between fields alternates between zero and the motion between frames. Since motion vectors are sent differentially, this behavior would result in a serious increase in vector data. In 60 Hz video, 3:2 pulldown is used to obtain 60 Hz from 24 Hz film. One frame is made into two fields, the next is made into three fields, and so on. Consequently, one field in five is completely redundant. MPEG handles film material best by discarding the redundant third field in 3:2 systems. A 24 Hz code in the transmission alerts the decoder to recreate the 3:2 sequence by re-reading a field store. In 50 and 60 Hz telecine, pairs of fields are deinterlaced to create frames, and then motion is measured between frames. The decoder can recreate interlace by reading alternate lines in the frame store.

A cut is a difficult event for a compressor to handle because it results in an almost complete prediction failure, requiring a large amount of correction data. If a coding delay can be tolerated, a coder may detect cuts in advance and modify the GOP structure dynamically, so that the cut is made to coincide with the generation of an I picture. In this case, the cut is handled with very little extra data. The last B pictures before the I picture will almost certainly need to use forward prediction. In some applications that are not real-time, such as DVD mastering, a coder could take two passes at the input video: one pass to identify the difficult or high-entropy areas and create a coding strategy, and a second pass to actually compress the input video.
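The 3:2 pulldown pattern mentioned above can be sketched as follows (illustrative code): each pair of 24 Hz film frames becomes five 60 Hz fields, so one field in five is a repeat.

```python
def three_two_pulldown(film_frames):
    # Alternate 2 fields and 3 fields per film frame; the third field of
    # each 3-field frame is a repeat of its first field.
    fields = []
    for n, frame in enumerate(film_frames):
        top, bottom = (frame, "top"), (frame, "bottom")
        fields += [top, bottom]
        if n % 2 == 1:          # every second frame contributes a third field
            fields.append(top)  # the redundant repeated field
    return fields

film = ["A", "B", "C", "D"]     # 4 film frames at 24 Hz
video = three_two_pulldown(film)
print(len(video))               # 10 fields: 60 Hz video from 24 Hz film
```

An MPEG coder that detects this cadence can drop the repeated fields and code only the four underlying frames, signaling the decoder to re-read a field store.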
Figure 2.18.
picture with moderate signal-to-noise ratio results. If, however,that picture is locally decodedand subtracted pixel-by-pixelfrom the original, a quantizingnoise picture results. This picturecan be compressed and trans-mitted as the helper signal. Asimple decoder only decodesthe main, noisy bit stream, but a
more complex decoder candecode both bit streams andcombine them to produce alow noise picture. This is theprinciple of SNR scaleability.
As an alternative, coding onlythe lower spatial frequencies ina HDTV picture can produce amain bit stream that an SDTVreceiver can decode. If the lowerdefinition picture is locallydecoded and subtracted fromthe original picture, a definition-enhancing picture would result.This picture can be coded into ahelper signal. A suitable decodercould combine the main andhelper signals to recreate theHDTV picture. This is theprinciple of Spatial scaleability.
The High profile supports bothSNR and spatial scaleability aswell as allowing the option of4:2:2 sampling.
The 4:2:2 profile has beendeveloped for improved compat-ibility with digital production
2.13 Profiles and levels

MPEG is applicable to a wide range of applications requiring different performance and complexity. Using all of the encoding tools defined in MPEG, millions of combinations are possible. For practical purposes, the MPEG-2 standard is divided into profiles, and each profile is subdivided into levels (see Figure 2.18). A profile is basically a subset of the entire coding repertoire requiring a certain complexity. A level is a parameter such as the size of the picture or the bit rate used with that profile. In principle, there are 24 combinations, but not all of these have been defined. An MPEG decoder having a given profile and level must also be able to decode lower profiles and levels.

The simple profile does not support bidirectional coding, so only I and P pictures are output. This reduces the coding and decoding delay and allows simpler hardware. The simple profile has been defined only at main level (SP@ML).

The main profile is designed for a large proportion of uses. The low level uses a low-resolution input having only 352 pixels per line. The majority of broadcast applications will require the MP@ML (Main Profile at Main Level) subset of MPEG, which supports SDTV (Standard Definition TV).

The high-1440 level is a high-definition scheme that doubles the definition compared to the main level. The high level not only doubles the resolution but maintains that resolution in the 16:9 format by increasing the number of horizontal samples from 1440 to 1920.

The 4:2:2 profile was developed for compatibility with digital production equipment. This profile allows 4:2:2 operation without requiring the additional complexity of using the high profile. For example, an HP@ML decoder must support SNR scaleability, which is not a requirement for production. The 4:2:2 profile has the same freedom of GOP structure as other profiles, but in practice it is commonly used with short GOPs, making editing easier. 4:2:2 operation requires a higher bit rate than 4:2:0, and the use of short GOPs requires an even higher bit rate for a given quality.

In compression systems using spatial transforms and requantizing, it is possible to produce scaleable signals. A scaleable process is one in which the input results in a main signal and a "helper" signal. The main signal can be decoded alone to give a picture of a certain quality, but if the information from the helper signal is added, some aspect of the quality can be improved. For example, a conventional MPEG coder, by heavily requantizing coefficients, encodes a
Figure 2.18. MPEG-2 profiles and levels.

Profile @ Level        Chroma          Max resolution   Max bit rate   Pictures
Simple @ Main          4:2:0           720x576          15 Mb/s        I, P
Main @ Low             4:2:0           352x288          4 Mb/s         I, P, B
Main @ Main            4:2:0           720x576          15 Mb/s       I, P, B
Main @ High-1440       4:2:0           1440x1152        60 Mb/s        I, P, B
Main @ High            4:2:0           1920x1152        80 Mb/s        I, P, B
4:2:2 @ Main           4:2:2           720x608          50 Mb/s        I, P, B
SNR @ Low              4:2:0           352x288          4 Mb/s         I, P, B
SNR @ Main             4:2:0           720x576          15 Mb/s        I, P, B
Spatial @ High-1440    4:2:0           1440x1152        60 Mb/s        I, P, B
High @ Main            4:2:0, 4:2:2    720x576          20 Mb/s        I, P, B
High @ High-1440       4:2:0, 4:2:2    1440x1152        80 Mb/s        I, P, B
High @ High            4:2:0, 4:2:2    1920x1152        100 Mb/s       I, P, B
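The decoder-compatibility rule (a decoder at a given profile and level must also handle lower ones) can be sketched as a small lookup. The names and structure below are illustrative, not part of any MPEG API, and only a subset of the defined combinations is shown; the 4:2:2 profile is omitted because it sits outside the simple profile ordering.

```python
# Illustrative sketch of MPEG-2 profile/level constraints.
# Ordered from lowest to highest; a conforming decoder must also
# handle everything at or below its own profile and level.
PROFILES = ["simple", "main", "snr", "spatial", "high"]
LEVELS = ["low", "main", "high-1440", "high"]

# (profile, level) -> (chroma, max resolution, max bit rate in Mb/s)
DEFINED = {
    ("simple", "main"): ("4:2:0", (720, 576), 15),
    ("main", "low"): ("4:2:0", (352, 288), 4),
    ("main", "main"): ("4:2:0", (720, 576), 15),
    ("main", "high-1440"): ("4:2:0", (1440, 1152), 60),
    ("main", "high"): ("4:2:0", (1920, 1152), 80),
    ("high", "main"): ("4:2:2", (720, 576), 20),
    ("high", "high-1440"): ("4:2:2", (1440, 1152), 80),
    ("high", "high"): ("4:2:2", (1920, 1152), 100),
}

def can_decode(decoder, stream):
    """True if a decoder conforming to `decoder` must decode `stream`."""
    dp, dl = decoder
    sp, sl = stream
    if stream not in DEFINED:
        return False  # only some of the 24 combinations are defined
    return (PROFILES.index(sp) <= PROFILES.index(dp) and
            LEVELS.index(sl) <= LEVELS.index(dl))

# An MP@HL decoder must also decode MP@ML, but not the reverse:
assert can_decode(("main", "high"), ("main", "main"))
assert not can_decode(("main", "main"), ("main", "high"))
```

A real conformance checker would also verify parameters the standard constrains per level, such as frame rate and buffer sizes.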
2.14 Wavelets

All transforms suffer from uncertainty: the more accurately the frequency domain is known, the less accurately the time domain is known (and vice versa). In most transforms, such as the DFT and DCT, the block length is fixed, so the time and frequency resolution is fixed. The frequency coefficients represent evenly spaced values on a linear scale. Unfortunately, because human senses are logarithmic, the even scale of the DFT and DCT gives inadequate frequency resolution at one end and excess resolution at the other.

The wavelet transform is not affected by this problem because its frequency resolution is a fixed fraction of an octave and therefore has a logarithmic characteristic. This is done by changing the block length as a function of frequency: as frequency goes down, the block becomes longer. Thus, a characteristic of the wavelet transform is that the basis functions all contain the same number of cycles, and these cycles are simply scaled along the time axis to search for different frequencies. Figure 2.19 contrasts the fixed block size of the DFT/DCT with the variable size of the wavelet.

Wavelets are especially useful for audio coding because they automatically adapt to the conflicting requirements of the accurate location of transients in time and the accurate assessment of pitch in steady tones. For video coding, wavelets have the advantage of producing resolution-scaleable signals with almost no extra effort. In moving video, the advantages of wavelets are offset by the difficulty of assigning motion vectors to a variable-size block, but in still-picture or I-picture coding this difficulty does not arise.

Figure 2.19. The FFT uses constant-size windows, whereas the wavelet transform uses a constant number of cycles in each basis function.
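The constant-cycles property can be illustrated in a few lines of code. This is only a sketch under assumed parameters (five cycles per basis function, 48 kHz sampling); real wavelet codecs use specific filter banks.

```python
import math

# Sketch: wavelet-style basis functions keep a constant number of
# cycles, so the analysis window shortens as frequency rises.
CYCLES = 5            # every basis function contains this many cycles
SAMPLE_RATE = 48000   # Hz (illustrative)

def basis_length(freq_hz):
    """Window length in samples for a given analysis frequency."""
    return round(CYCLES * SAMPLE_RATE / freq_hz)

def basis_function(freq_hz):
    """A cosine containing CYCLES cycles under a Hann window."""
    n = basis_length(freq_hz)
    return [
        math.cos(2 * math.pi * CYCLES * i / n) *
        0.5 * (1 - math.cos(2 * math.pi * i / n))  # Hann window
        for i in range(n)
    ]

# Low frequencies get long blocks, high frequencies short ones,
# but the cycle count inside each block is identical.
for f in (100, 1000, 10000):
    print(f, "Hz ->", basis_length(f), "samples")
```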
SECTION 3
AUDIO COMPRESSION

Lossy audio compression is based entirely on the characteristics of human hearing, which must be considered before any description of compression is possible. Surprisingly, human hearing, particularly in stereo, is actually more critically discriminating than human vision, and consequently audio compression should be undertaken with care. As with video compression, audio compression requires a number of different levels of complexity according to the required compression factor.

3.1 The hearing mechanism

Hearing comprises physical processes in the ear and nervous/mental processes that combine to give us an impression of sound. The impression we receive is not identical to the actual acoustic waveform present in the ear canal because some entropy is lost. Audio compression systems that lose only that part of the entropy that would in any case be lost in the hearing mechanism will produce good results.

The physical hearing mechanism consists of the outer, middle, and inner ears. The outer ear comprises the ear canal and the eardrum. The eardrum converts the incident sound into a vibration in much the same way as a microphone diaphragm does. The inner ear works by sensing vibrations transmitted through a fluid. The impedance of fluid is much higher than that of air, and the middle ear acts as an impedance-matching transformer that improves power transfer.

Figure 3.1 shows that vibrations are transferred to the inner ear by the stirrup bone, which acts on the oval window. Vibrations in the fluid travel up the cochlea, a spiral cavity in the skull (shown unrolled in Figure 3.1 for clarity). The basilar membrane is stretched across the cochlea. This membrane varies in mass and stiffness along its length. At the end near the oval window, the membrane is stiff and light, so its resonant frequency is high. At the distant end, the membrane is heavy and soft and resonates at low frequency. The range of resonant frequencies available determines the frequency range of human hearing, which in most people is from 20 Hz to about 15 kHz.

Different frequencies in the input sound cause different areas of the membrane to vibrate. Each area has different nerve endings to allow pitch discrimination. The basilar membrane also has tiny muscles controlled by the nerves that together act as a kind of positive-feedback system that improves the Q factor of the resonance.

The resonant behavior of the basilar membrane is an exact parallel with the behavior of a transform analyzer. According to the uncertainty theory of transforms, the more accurately the frequency domain of a signal is known, the less accurately the time domain is known. Consequently, the better a transform can discriminate between two frequencies, the less well it can discriminate between the times of two events. Human hearing has evolved with a compromise that balances time discrimination against frequency discrimination; in the balance, neither ability is perfect.

The imperfect frequency discrimination results in an inability to separate closely spaced frequencies. This gives rise to auditory masking, defined as the reduced sensitivity to one sound in the presence of another.

Figure 3.1. The hearing mechanism: the ear canal and eardrum (outer ear), the stirrup bone (middle ear), and the basilar membrane stretched along the unrolled cochlea (inner ear).
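The time/frequency trade-off described for the transform analyzer can be made concrete: for a block transform of length N at sample rate fs, the frequency resolution is fs/N and the time resolution is N/fs, so improving one always worsens the other. A minimal sketch, with illustrative parameters:

```python
# Sketch of the transform uncertainty trade-off: a block transform of
# length N analyzes N/fs seconds into bins spaced fs/N Hz apart, so
# the product of the two resolutions is always 1.

def resolutions(block_len, sample_rate=48000):
    """Return (frequency resolution in Hz, time resolution in s)."""
    return sample_rate / block_len, block_len / sample_rate

for n in (256, 1024, 4096):
    freq_res, time_res = resolutions(n)
    # Halving one resolution doubles the other, just as in the
    # resonant basilar membrane's fixed compromise.
    print(n, freq_res, time_res)
```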
3.2 Subband coding

Figure 3.2a shows that the threshold of hearing is a function of frequency. The greatest sensitivity is, not surprisingly, in the speech range. In the presence of a single tone, the threshold is modified as in Figure 3.2b. Note that the threshold is raised for tones at higher frequencies and, to some extent, at lower frequencies. In the presence of a complex input spectrum, such as music, the threshold is raised at nearly all frequencies. One consequence of this behavior is that the hiss from an analog audio cassette is only audible during quiet passages in music. Companding makes use of this principle by amplifying low-level audio signals prior to recording or transmission and returning them to their correct level afterwards.

The imperfect time discrimination of the ear is due to its resonant response. The Q factor is such that a given sound has to be present for at least about 1 millisecond before it becomes audible. Because of this slow response, masking can still take place even when the two signals involved are not simultaneous. Forward and backward masking occur when the masking sound continues to mask sounds at lower levels before and after the masking sound's actual duration. Figure 3.3 shows this concept.

Masking raises the threshold of hearing, and compressors take advantage of this effect by raising the noise floor, which allows the audio waveform to be expressed with fewer bits. The noise floor can only be raised at frequencies at which there is effective masking. To maximize effective masking, it is necessary to split the audio spectrum into different frequency bands to allow the introduction of different amounts of companding and noise in each band.

Figure 3.4 shows a band-splitting compandor. The band-splitting filter is a set of narrow-band, linear-phase filters that overlap and all have the same bandwidth. The output in each band consists of samples representing a waveform. In each frequency band, the audio input is amplified up to maximum level prior to transmission. Afterwards, each level is returned to its correct value. Noise picked up in the transmission is thereby reduced in each band. If the noise reduction is compared with the threshold of hearing, it can be seen that greater noise can be tolerated in some bands because of masking. Consequently, in each band after companding, it is possible to reduce the wordlength of the samples. This technique achieves compression because the noise introduced by the loss of resolution is masked.

Figure 3.2a. The threshold of hearing in quiet as a function of frequency, from 20 Hz to 20 kHz.
Figure 3.2b. The masking threshold in the presence of a 1 kHz sine wave.
Figure 3.3. Pre-masking and post-masking: the threshold remains raised before and after the masking sound.
Figure 3.4. A band-splitting compandor: in each band, a level detector derives the gain factor by which the sub-band filter output is multiplied to produce companded audio.
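The idea of raising the noise floor only as far as masking allows can be sketched numerically. Assuming roughly 6 dB of quantizing-noise reduction per bit, a coder can pick the shortest wordlength whose noise stays below each band's masking threshold. The function names and figures below are illustrative, not taken from any standard:

```python
# Sketch: choosing a per-band wordlength so that requantizing noise
# stays under that band's masking threshold (values illustrative).

def quant_noise_db(bits, full_scale_db=0.0):
    """Approximate noise floor of a `bits`-bit uniform quantizer,
    relative to full scale (~6.02 dB per bit)."""
    return full_scale_db - 6.02 * bits

def bits_needed(masking_threshold_db):
    """Smallest wordlength whose noise sits below the threshold."""
    for bits in range(2, 17):
        if quant_noise_db(bits) <= masking_threshold_db:
            return bits
    return 16

# A strongly masked band tolerates a high noise floor -> fewer bits.
assert bits_needed(-30.0) < bits_needed(-80.0)
print(bits_needed(-30.0), bits_needed(-80.0))
```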
3.3 MPEG Layer 1

Figure 3.5 shows a simple band-splitting coder as used in MPEG Layer 1. The digital audio input is fed to a band-splitting filter that divides the spectrum of the signal into a number of bands; in MPEG this number is 32. The time axis is divided into blocks of equal length. In MPEG Layer 1, this is 384 input samples, so at the output of the filter there are 12 samples in each of the 32 bands. Within each band, the level is amplified by multiplication to bring it up to maximum. The gain required is constant for the duration of a block, and a single scale factor is transmitted with each block for each band in order to allow the process to be reversed at the decoder.

The filter-bank output is also analyzed to determine the spectrum of the input signal. This analysis drives a masking model that determines the degree of masking that can be expected in each band. The more masking is available, the less accurate the samples in each band need to be. The sample accuracy is reduced by requantizing to reduce wordlength. This reduction is also constant for every word in a band, but different bands can use different wordlengths. The wordlength needs to be transmitted as a bit-allocation code for each band to allow the decoder to deserialize the bit stream properly.

Figure 3.6 shows an MPEG Layer 1 audio bit stream. Following the synchronizing pattern and the header, there are 32 bit-allocation codes of four bits each. These codes describe the wordlength of the samples in each subband. Next come the 32 scale factors used in the companding of each band; these determine the gain needed in the decoder to return the audio to the correct level. The scale factors are followed, in turn, by the audio data in each band.

Figure 3.7 shows the Layer 1 decoder. The synchronization pattern is detected by the timing generator, which deserializes the bit-allocation and scale-factor data. The bit-allocation data then allows deserialization of the variable-length samples. The requantizing is reversed, and the companding is reversed by the scale-factor data to put each band back to its correct level. The 32 separate bands are then combined in a combiner filter, which produces the audio output.

Figure 3.5. An MPEG Layer 1 coder: the band-splitting filter produces 32 subbands whose samples are scaled and quantized under control of a dynamic bit and scale-factor allocator driven by the masking thresholds; 384 PCM input samples correspond to 8 ms at 48 kHz.
Figure 3.6. The MPEG Layer 1 bit stream: a 12-bit sync pattern and 20-bit system header, an optional CRC, 4-bit bit-allocation codes and 6-bit scale factors for each of the 32 subbands, the subband samples in 12 granules, and ancillary data of unspecified length.
Figure 3.7. The Layer 1 decoder: after sync detection, a demultiplexer recovers the bit allocation, scale factors, and samples for each of the 32 bands feeding the output filter.
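The Layer 1 block companding can be sketched as follows. This is a simplified model, not the standard's algorithm: the band-splitting filter is omitted (`subbands` stands in for its output of 32 bands of 12 samples), the bit allocation is taken as given rather than derived from a masking model, and all names are our own.

```python
# Sketch of Layer 1 block companding: one scale factor and one
# wordlength per band per block, reversible at the decoder.

def compand_block(subbands, allocation):
    """subbands: 32 lists of 12 samples in [-1, 1]; allocation: 32
    wordlengths.  Returns (scale_factors, quantized samples)."""
    assert len(subbands) == 32 and all(len(b) == 12 for b in subbands)
    scale_factors, coded = [], []
    for band, bits in zip(subbands, allocation):
        sf = max(abs(s) for s in band) or 1.0  # gain to reach full scale
        steps = 2 ** (bits - 1)
        coded.append([round(s / sf * (steps - 1)) for s in band])
        scale_factors.append(sf)
    return scale_factors, coded

def expand_block(scale_factors, coded, allocation):
    """Decoder side: undo the requantizing and the companding."""
    return [
        [q / (2 ** (bits - 1) - 1) * sf for q in band]
        for sf, band, bits in zip(scale_factors, coded, allocation)
    ]

# Round-trip one block with an 8-bit allocation in every band:
bands = [[0.01 * i for i in range(12)] for _ in range(32)]
alloc = [8] * 32
sfs, coded = compand_block(bands, alloc)
decoded = expand_block(sfs, coded, alloc)
```

The round-trip error is bounded by half a quantizing step scaled by the band's scale factor, which is exactly the noise the masking model is meant to hide.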
3.4 MPEG Layer 2

Figure 3.8 shows that when the band-splitting filter is used to drive the masking model, the spectral analysis is not very accurate, since there are only 32 bands and the energy could be anywhere within a band. The noise floor cannot be raised very much because, in the worst case shown, the masking may not operate. A more accurate spectral analysis would allow a higher compression factor. In MPEG Layer 2, the spectral analysis is therefore performed by a separate process: a 512-point FFT working directly from the input drives the masking model instead. To resolve frequencies more accurately, the time span of the transform has to be increased, which is done by raising the block size to 1152 samples.

While the block-companding scheme is the same as in Layer 1, not all of the scale factors are transmitted, since they contain a degree of redundancy on real program material. The scale factors of successive blocks in the same band differ by more than 2 dB less than 10% of the time, and advantage is taken of this characteristic by analyzing sets of three successive scale factors. On stationary program material, only one scale factor out of three is sent. As transient content increases in a given subband, two or three scale factors will be sent. A scale-factor select code is also sent to allow the decoder to determine what has been sent in each subband. This technique effectively halves the scale-factor bit rate.

3.5 Transform coding

Layers 1 and 2 are based on band-splitting filters in which the signal is still represented as a waveform. Layer 3, however, adopts transform coding similar to that used in video coding. As was mentioned above, the ear performs a kind of frequency transform on the incident sound, and because of the Q factor of the basilar membrane, the response cannot increase or decrease rapidly. Consequently, if an audio waveform is transformed into the frequency domain, the coefficients do not need to be sent very often. This principle is the basis of transform coding. For higher compression factors, the coefficients can be requantized, making them less accurate. This process produces noise, which is placed at frequencies where the masking is greatest. A by-product of a transform coder is that the input spectrum is accurately known, so a precise masking model can be created.

3.6 MPEG Layer 3

This complex level of coding is really only required when the highest compression factor is needed. It has a degree of commonality with Layer 2. A discrete cosine transform is used having 384 output coefficients per block. This output can be obtained by direct processing of the input samples, but in a multi-level coder, it is possible to use a hybrid transform incorporating the 32-band filtering of Layers 1 and 2 as a basis. If this is done, the 32 subbands from the QMF (Quadrature Mirror Filter) are each further processed by a 12-band MDCT (Modified Discrete Cosine Transform) to obtain 384 output coefficients.

Two window sizes are used to avoid pre-echo on transients. The window switching is performed by the psychoacoustic model; it has been found that pre-echo is associated with the entropy in the audio rising above its average value. To obtain the highest compression factor, nonuniform quantizing of the coefficients is used along with Huffman coding, which allocates the shortest wordlengths to the most common code values.

3.7 AC-3

The AC-3 audio coding technique is used with the ATSC system instead of one of the MPEG audio coding schemes. AC-3 is a transform-based system that obtains coding gain by requantizing frequency coefficients.

The PCM input to an AC-3 coder is divided into overlapping windowed blocks as shown in Figure 3.9. These blocks contain 512 samples each, but because each block overlaps its neighbors, every sample appears in two blocks, giving 100% redundancy. After the transform, there are 512 coefficients in each block, but because of the redundancy, these coefficients can be decimated to 256 coefficients using a technique called Time Domain Aliasing Cancellation (TDAC).

Figure 3.8. With a wide subband, the masking model cannot tell where within the band the energy lies, so the actual masking may be weaker than assumed.
Figure 3.9. The AC-3 input is divided into overlapping windowed blocks of 512 samples.
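The Layer 2 scale-factor selection can be sketched as below. The real standard uses defined scale-factor transmission patterns and a 2-bit select code per band; this simplified version, with an illustrative 2 dB threshold and made-up code values, only shows the principle of sending one, two, or three scale factors per set of three blocks.

```python
import math

# Sketch: compare three successive scale factors in one band and
# transmit only the ones that actually differ, plus a select code
# telling the decoder what was sent.  Thresholds are illustrative.

def db_diff(a, b):
    """Absolute level difference between two scale factors in dB."""
    return abs(20 * math.log10(a / b))

def select_scale_factors(sf1, sf2, sf3, threshold_db=2.0):
    """Return (select_code, list of scale factors to transmit)."""
    if db_diff(sf1, sf2) <= threshold_db and db_diff(sf2, sf3) <= threshold_db:
        return 0, [max(sf1, sf2, sf3)]   # stationary: send one
    if db_diff(sf1, sf2) <= threshold_db:
        return 1, [max(sf1, sf2), sf3]   # send two
    if db_diff(sf2, sf3) <= threshold_db:
        return 2, [sf1, max(sf2, sf3)]   # send two
    return 3, [sf1, sf2, sf3]            # transient: send all three

code, sent = select_scale_factors(1.0, 1.05, 0.98)
print(code, sent)  # stationary case: one scale factor covers the set
```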
The input waveform is analyzed, and if there is a significant transient in the second half of a block, the waveform is split into two to prevent pre-echo. In this case, the number of coefficients remains the same, but the frequency resolution is halved and the temporal resolution is doubled. A flag is set in the bit stream to indicate to the decoder that this has been done.

The coefficients are output in floating-point notation as a mantissa and an exponent; the representation is the binary equivalent of scientific notation. Exponents are effectively scale factors. The set of exponents in a block produces a spectral analysis of the input to a finite accuracy on a logarithmic scale, called the spectral envelope. This spectral analysis is the input to the masking model that determines the degree to which noise can be raised at each frequency.

The masking model drives the requantizing process, which reduces the accuracy of each coefficient by rounding the mantissae. A significant proportion of the transmitted data consists of these mantissae.

The exponents are also transmitted, but not directly, as there is further redundancy within them that can be exploited. Within a block, only the first (lowest-frequency) exponent is transmitted in absolute form. The remainder are transmitted differentially, and the decoder adds each difference to the previous exponent. Where the input audio has a smooth spectrum, the exponents in several frequency bands may be the same, so exponents can be grouped into sets of two or four with flags that describe what has been done.

Sets of six blocks are assembled into an AC-3 sync frame. The first block of the frame always carries full exponent data, but for stationary signals, later blocks in the frame can reuse the same exponents.

SECTION 4
ELEMENTARY STREAMS

An elementary stream is basically the raw output of an encoder and contains no more than is necessary for a decoder to approximate the original picture or audio. The syntax of the compressed signal is rigidly defined in MPEG so that decoders can be guaranteed to operate on it. The encoder is not defined except that it must somehow produce the right syntax.

The advantage of this approach is that it suits the real world, in which there are likely to be many more decoders than encoders. By standardizing the decoder, decoders can be made at low cost. In contrast, the encoder can be more complex and more expensive without a great cost penalty, but with the potential for better picture quality as complexity increases. When the encoder and the decoder differ in complexity, the coding system is said to be asymmetrical.

The MPEG approach also allows for the possibility that quality will improve as coding algorithms are refined, while still producing bit streams that can be understood by earlier decoders. The approach also allows the use of proprietary coding algorithms, which need not enter the public domain.

4.1 Video elementary stream syntax

Figure 4.1 shows the construction of the elementary video stream. The fundamental unit of picture information is the DCT block, which represents an 8 x 8 array of pixels that can be Y, Cr, or Cb. The DC coefficient is sent first and is represented more accurately than the other coefficients. Following the remaining coefficients, an end-of-block (EOB) code is sent.

Figure 4.1. The video elementary stream carries, among other elements, a sync pattern, picture size, aspect ratio, progressive/interlace flag, chroma type, global vectors, and motion vectors.
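The DCT-block serialization just described can be sketched in a few lines: the DC coefficient leads, and the trailing run of zero coefficients is replaced by an end-of-block code. This ignores the zigzag scanning and entropy coding that real MPEG also applies; the names are illustrative.

```python
# Sketch: serialize one quantized DCT block, DC coefficient first,
# replacing the trailing run of zeros with a single EOB code.

EOB = "EOB"  # stand-in for the end-of-block code word

def serialize_block(coeffs):
    """coeffs: 64 quantized coefficients in transmission order,
    DC first.  Trailing zeros are replaced by a single EOB."""
    assert len(coeffs) == 64
    last = max((i for i, c in enumerate(coeffs) if c != 0), default=0)
    return coeffs[:last + 1] + [EOB]

block = [50, 8, -3, 1] + [0] * 60  # typical block: energy near DC
print(serialize_block(block))      # -> [50, 8, -3, 1, 'EOB']
```

Because most blocks have their energy concentrated near DC, the EOB code removes the bulk of each block's 64 coefficient slots, which is a large part of the coding gain.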