716 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 16, NO. 6, JUNE 2006

Modeling of Transmission-Loss-Induced Distortion in Decoded Video

Yao Wang, Fellow, IEEE, Zhenyu Wu, Member, IEEE, and Jill M. Boyce, Member, IEEE

Abstract—This paper analyzes the distortion in decoded video caused by random packet losses in the underlying transmission network. A recursion model is derived that relates the average channel-induced distortion in successive P-frames. The model is applicable to all video encoders using the block-based motion-compensated prediction framework (including the H.261/263/264 and MPEG-1/2/4 video coding standards) and allows for any motion-compensated temporal concealment method at the decoder. The model explicitly considers the interpolation operation invoked for motion-compensated temporal prediction and concealment with sub-pel motion vectors. The model also takes into account the two new features of the H.264/AVC standard, namely intraprediction and in-loop deblocking filtering. A comparison with simulation data shows that the model is very accurate over a large range of packet loss rates and encoder intrablock rates. The model is further adapted to characterize the channel distortion in subsequent received frames after a single lost frame. This allows one to easily evaluate the impact of a single frame loss.

Index Terms—Deblocking filter, end-to-end distortion, error concealment, error propagation, H.264/AVC, intraprediction, packet loss.

I. INTRODUCTION

THE QUALITY of the decoded video in a networked video application depends both on the quantization incurred at the encoder and the distortion due to channel errors (e.g., packet losses) that have occurred during transmission. We refer to the latter as the channel distortion, and it depends on channel loss characteristics, the coder error resilience features (e.g., the intrarate, which is the frequency at which a macroblock (MB) is coded in the INTRA mode), and the decoder error concealment method. Accurate modeling of the channel-induced distortion is important for jointly determining parameters for source coding and channel error control, and for rate-distortion-optimized mode decision in the encoder.

The past research for determining the channel (or total) distortion can be categorized based on whether the distortion is estimated and tracked at the pixel level, at the MB level, or at the frame level. The well-known recursive optimal per-pixel estimation (ROPE) method [1] recursively calculates the first- and second-order moments of the decoded value for each pixel, from which one can determine the total distortion [in terms of the mean squared error (MSE)] of each pixel.

Manuscript received October 24, 2005. A preliminary version of this work was presented at the IEEE International Conference on Multimedia and Expo, June 2005. This paper was recommended by Associate Editor H. Gharavi.

Y. Wang is with the Polytechnic University, Brooklyn, NY 11201 USA (e-mail: [email protected]).

Z. Wu and J. M. Boyce are with Corporate Research, Thomson, Inc., Princeton, NJ 08540 USA (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TCSVT.2006.875203

A problem with the ROPE method is that it is only applicable when the motion vectors (MVs) have integer accuracy and when the decoder conceals lost MBs using the simple frame copy method. Although the ROPE method provides very accurate distortion estimation under the above constraints, it requires intense computation, since it involves tracking two moments at every pixel. The extension of ROPE to sub-pel motion compensation is considered in [2], which requires substantially more computation.

The work in [3] computes and tracks the average distortion at each MB. In our opinion, the recursion formula used to determine the distortion caused by predicting from previously concealed blocks does not account for the error propagation effect precisely. It also ignores the error propagation associated with temporal error concealment. As with the ROPE method, it does not consider sub-pel motion compensation. The more recent work in [4] also estimates the total distortion at the MB level. It determines the distortion of the current MB by considering the cases when the corresponding matching block in the previous frame is received and lost, separately. Thus, it needs to track the distortion when a block is received and lost separately, for each MB in the past frame. Finally, it assumes that encoder motion vectors are available at the decoder for error concealment. It does attempt to capture the effect of filtering associated with sub-pel motion compensation and deblocking through an attenuation factor, which was determined manually for each test sequence.

The preceding pixel-level and MB-level distortion estimation methods can be used for encoder mode decision. They can also be used to determine the distortion at each pixel or the average distortion of each MB or each frame, given the chosen coding modes and MVs for all MBs. However, these methods cannot estimate the expected distortion before encoding a sequence (and, hence, deciding on the coding mode and MV of each MB), based on a given intrarate. In many applications, it is desirable to estimate the average distortion for an entire video sequence or a segment using different quantization parameters and intrarates, without coding the sequence with all possible settings. Such estimation is required either for determining the video coder configuration (e.g., the quantization parameter and the intrarate) to meet a certain quality and/or rate target, or for jointly optimizing the video coder configuration and the channel code rate, under a total channel bandwidth constraint.

To enable the aforementioned operations, it is desirable to have a mathematical model that relates the average channel (or total) distortion over individual frames or a group of frames with the average intrarate and packet-loss rate. Such a model was presented in [5], which was derived by characterizing the spatial-temporal error propagation behavior after an initial loss using a linear filter, with a leakage parameter accounting for the filtering incurred for sub-pel motion compensation and/or deblocking. It further assumes that the effects of losses occurring at different times are additive, which is only valid at low loss rates. There have been concerns that its estimation accuracy may be inadequate [4]. He et al. [6] developed a frame-level recursion formula, which relates the channel-induced distortion in a current frame with that in the previous frame. However, it does not take into account the filtering incurred for sub-pel motion compensation and/or deblocking filtering, and assumes the decoder uses frame copy for error concealment.

None of the prior work, at the pixel, MB, or frame level, considers intraprediction, nor does it consider deblocking filtering explicitly. Some of it also assumes that the decoder uses the simple frame copy method for error concealment. Intraprediction and deblocking filtering are two new features of the latest H.264/AVC video-coding standard [7]. Furthermore, motion-compensated temporal concealment is known to improve the decoded video quality significantly. Therefore, it is important to consider these features in channel distortion modeling. The work presented here is similar to [6] in that we also model the average frame-level distortion through a recursion formula. We, however, extend the scope of that work substantially. We consider both inter- and intraprediction, and deblocking. We also allow for motion-compensated temporal concealment at the decoder. For motion-compensated prediction and concealment, we explicitly consider the interpolation operation incurred when motion vectors are noninteger. Finally, we also adapt the proposed model to characterize the channel distortion in subsequent received frames after a single lost frame. This allows one to easily evaluate the impact of a single frame loss. This can be helpful, e.g., for performing unequal error protection at the frame level.

This paper is organized as follows. Section II introduces the notations and assumptions underlying our model. Section III develops a frame-level recursion formula for the channel-induced distortion due to random packet loss, starting from the case when the encoder does not employ intraprediction and deblocking filtering, and then moving on to consider intraprediction and deblocking filtering, successively. Section IV verifies the proposed model with simulation data. Section V describes the proposed model for channel distortion due to a single frame loss, and examines its accuracy using simulation data. Section VI concludes the paper by summarizing the main results, and discussing limitations of the current work and possible solutions.

II. NOTATION AND GENERAL ASSUMPTION

A. Definition of Frame-Level Channel Distortion

Let $f_n^i$ denote the original pixel value in frame $n$ and pixel $i$, $\hat{f}_n^i$ the reconstructed signal at the encoder, and $\tilde{f}_n^i$ the reconstructed signal at the decoder. In the presence of transmission errors, $\hat{f}_n^i$ and $\tilde{f}_n^i$ are generally different. The total distortion in terms of MSE at pixel $i$ is defined as $d_n^i = E\{(f_n^i - \tilde{f}_n^i)^2\}$, where $E\{\cdot\}$ represents the expectation taken over all possible channel realizations. The average distortion over all pixels in frame $n$ is

$D(n) = E_S\{E\{(f_n^i - \tilde{f}_n^i)^2\}\}$   (1)

where $E_S\{\cdot\}$ denotes the averaging-over-pixel operation. For the remainder of this paper, we use $E_{SC}\{\cdot\}$ to denote the concatenated operation $E_S\{E\{\cdot\}\}$.

The total average distortion can be written as

$D(n) = E_{SC}\{(f_n^i - \hat{f}_n^i + \hat{f}_n^i - \tilde{f}_n^i)^2\}.$

If we assume the encoder-introduced error $f_n^i - \hat{f}_n^i$ (caused by quantization) and the channel-induced error $\hat{f}_n^i - \tilde{f}_n^i$ are uncorrelated, then the total distortion can be decomposed as

$D(n) = D_e(n) + D_c(n)$   (2)

where

$D_e(n) = E_{SC}\{(f_n^i - \hat{f}_n^i)^2\}$   (3)

is the average encoder-induced distortion per pixel in frame $n$, and

$D_c(n) = E_{SC}\{(\hat{f}_n^i - \tilde{f}_n^i)^2\}$   (4)

is the average channel-induced distortion per pixel in frame $n$. He et al. have shown with experimental data [6] that the encoder-introduced error and the channel-induced error are indeed quite uncorrelated. Since the encoder distortion can be determined accurately at the encoder, the challenging problem in determining the total distortion lies in the estimation of the channel distortion. This paper deals with the modeling of the average channel-induced distortion $D_c(n)$ in each frame $n$.
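As an illustration of the decomposition in (2)–(4), the following Python sketch (ours, with synthetic frames standing in for real reconstructions) measures the encoder, channel, and total distortions and checks the near-additivity reported by He et al. [6].

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two frames (2-D arrays)."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

# Hypothetical frames: f = original, f_enc = encoder reconstruction,
# f_dec = decoder reconstruction after channel losses and concealment.
rng = np.random.default_rng(0)
f = rng.integers(0, 256, size=(144, 176)).astype(np.float64)   # original QCIF frame
f_enc = f + rng.normal(0, 3.0, f.shape)                         # quantization error
f_dec = f_enc + rng.normal(0, 5.0, f.shape)                     # channel-induced error

D_total = mse(f, f_dec)     # D(n) in (1)
D_e = mse(f, f_enc)         # encoder (quantization) distortion, (3)
D_c = mse(f_enc, f_dec)     # channel-induced distortion, (4)

# If the two error sources are uncorrelated, D_total ~= D_e + D_c, as in (2).
print(D_total, D_e + D_c)
```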

B. Assumptions Underlying the Model

Encoder: We assume a video sequence is partitioned into groups of pictures (GoPs) and each GoP starts with an I-frame, followed by P-frames.1 In an I-frame, every MB is coded in the INTRA mode, without making use of previous frame information. In the INTRA mode in the earlier video-coding standards, pixels are coded directly using transform coding. In the H.264 standard, intraprediction is applied, by which an MB is first predicted using previously coded neighboring pixels in this frame, and then the prediction error is coded using transform coding. In a P-frame, an MB is coded either in the INTRA mode or the INTER mode, by which the MB is predicted by a corresponding MB in a previous frame. When the motion vector (MV) for this MB is an integer vector, pixels in this MB are predicted by corresponding pixels in the best-matching MB in the previous frame directly. But when the MV is noninteger, each pixel in the current MB is predicted by interpolating from a set of corresponding pixels in the previous frame. We will refer to INTRA-coded MBs as I-MBs, and INTER-coded MBs as P-MBs.

1Although we do not consider B-frames in this paper, the proposed model can be used to model channel distortions in successive P-frames, even if there are B-frames between two adjacent P-frames. The model can also be extended to consider the channel distortion in B-frames.


We assume the length of each GoP is $N$ and the percentage of I-MBs in each P-frame $n$ is $\beta_n$. An MB can be coded in the INTRA mode either for coding efficiency or for error resilience. Generally, the number of I-MBs due to coding efficiency varies from frame to frame. Therefore, $\beta_n$ may not be a constant.

Packetization and Loss Pattern: We assume that the MBs in a frame are grouped into slices so that there is no dependency between coded data in separate slices, and each slice has its own header and is carried in a separate transport packet. We also assume that the loss of any bits in a packet will make the entire corresponding video slice undecodable. We further assume that with proper packet interleaving, the packet (and, hence, slice) loss event can be characterized as an i.i.d. random process with an average loss rate $P$. We do not make any specific assumptions on the slice length (in terms of the number of MBs contained) and pattern (in terms of the order by which the MBs are put into a slice). The slice length may be constant or variable. We only require that a slice contain either a subset or all MBs in a frame, but not MBs from different frames, and that all coded data (including MVs) for an MB are put into the same slice.2 Note that the slice pattern can have a significant impact on the accuracy of error concealment.

Decoder: We assume the decoder performs error concealment on MBs that are contained in lost slices. Generally, one can use either spatial or temporal concealment, or both. Here, we consider only temporal concealment for P-frames, by which the MV of a missing MB is first estimated (typically based on the MVs of the adjacent MBs in the current frame or the previous frame), and then the MB is replaced by the corresponding MB in the previous frame based on the estimated MV. When the estimated MV is noninteger, each pixel is a weighted average of some pixels in the previous frame. A commonly used concealment method is frame copy, which is a special case discussed here with all of the MVs set to zero. We assume the I-frames are concealed by spatial concealment, where a missing pixel is recovered from a weighted average of nearby received or previously concealed pixels in the same frame. Because the I-frames are concealed by spatial interpolation, any error propagation stops at an I-frame. Therefore, we only need to model the channel error propagation in successive P-frames.

III. CHANNEL DISTORTION MODEL FOR RANDOM PACKET LOSSES

A. Overview of the Model

The proposed channel distortion model is a frame-recursive model, relating the channel distortion of frame $n$ with that of frame $n-1$ within the same GoP. It has the general form of

$D_c(n) = P\,D_{ECP}(n) + \alpha(n)\,D_c(n-1).$   (5)

The variable $D_{ECP}(n)$ denotes the average concealment distortion of frame $n$, defined as the average channel distortion per pixel of this frame if only one slice in this frame is lost and no other slices in this frame or previous frames are lost (to be defined more rigorously later).

2The proposed model will still work if the coding mode and motion information are transported separately and reliably. In that case, P denotes the loss rate for the slice containing texture data.

This is the new distortion introduced when any slice in frame $n$ experiences packet losses, and its value depends on the underlying error concealment method and the characteristics of that frame. We assume this term can be estimated at the encoder.

From the recursion formula in (5), $D_c(n)$ is the sum of the concealment distortion introduced in this frame and the product of the channel distortion in the previous frame with a factor $\alpha(n)$, which depends on the loss probability $P$ and the intrarate $\beta_n$. We will call $\alpha(n)$ the error propagation factor, because it characterizes how the channel distortion in a previous frame propagates into the current frame. The particular form of $\alpha(n)$ depends on the encoder configuration (with or without intraprediction and deblocking filtering, integer versus sub-pel motion estimation) and the decoder error-concealment algorithm (with or without motion compensation, with integer or sub-pel MVs).

In the following subsections, we derive the forms of $\alpha(n)$ for different encoder and decoder configurations. We will start with the simplest case when the encoder does not use intraprediction or deblocking filtering and, for temporal prediction, uses only integer-pel MVs. Similarly, the decoder uses only integer-pel MVs for error concealment. We will then move on to consider sub-pel MVs for encoder motion compensation and decoder error concealment, intraprediction, and deblocking filtering, in that order. As will be seen, when constrained intraprediction is used, the expression for $\alpha(n)$ stays the same as in the case without intraprediction. However, nonconstrained intraprediction calls for a different form of $\alpha(n)$ to account for the spatial error propagation caused by intraprediction.

In the subsequent derivation, we assume that if a slice is lost in frame $n$ (with probability $P$), all of the MBs in this slice will be concealed using motion-compensated temporal concealment, with an average distortion $D_L(n)$. If a slice is received (with probability $1-P$), the decoded MBs in this slice can still have channel distortion due to errors in the previous frame or previously coded pixels in the same frame. Denoting the average distortion in received I-MBs and P-MBs by $D_{IR}(n)$ and $D_{PR}(n)$, respectively, the average channel distortion is

$D_c(n) = P\,D_L(n) + (1-P)\big[\beta_n\,D_{IR}(n) + (1-\beta_n)\,D_{PR}(n)\big].$   (6)

This is the general relation we use to derive the distortion models for different cases.
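The following minimal Python sketch (ours, not part of the original formulation) implements the frame-level recursion (5) for a generic error propagation factor; the specific forms of $\alpha(n)$ derived in the following subsections can be plugged in through the `alpha` argument.

```python
from typing import Callable, Sequence

def channel_distortion(d_ecp: Sequence[float],
                       loss_rate: float,
                       alpha: Callable[[int], float],
                       d_c0: float = 0.0) -> list[float]:
    """Iterate the recursion D_c(n) = P*D_ECP(n) + alpha(n)*D_c(n-1)
    over the P-frames of a GoP (frame 0 is the I-frame with distortion d_c0)."""
    d_c, prev = [], d_c0
    for n, d in enumerate(d_ecp, start=1):
        cur = loss_rate * d + alpha(n) * prev
        d_c.append(cur)
        prev = cur
    return d_c

# Example with a constant propagation factor (placeholder value).
print(channel_distortion(d_ecp=[40.0] * 10, loss_rate=0.05, alpha=lambda n: 0.9))
```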

B. Integer-Pel Motion Vectors, No Intraprediction, No Deblocking Filtering

For ease of understanding, we first develop the recursion model assuming the encoder does not use intraprediction or deblocking filtering. In this case, for a received I-MB, there will be no channel distortion (i.e., $D_{IR}(n) = 0$). For a P-MB, even if it is received, its reconstruction may have channel distortion due to errors in the previous frame. Because we assume interprediction uses only integer MVs, each pixel $\hat{f}_n^i$ in this MB is predicted from a pixel $\hat{f}_{n-1}^j$, where $j$ refers to the spatial index of the matching pixel in frame $n-1$. In the receiver, if the MB is received, the prediction is based on $\tilde{f}_{n-1}^j$. Because the prediction error signal for this MB is received correctly, the channel distortion is purely caused by the mismatch between the prediction references, with distortion

$D_{PR}(n) = E_{SC}\{(\hat{f}_{n-1}^j - \tilde{f}_{n-1}^j)^2\} = D_c(n-1).$   (7)

Note that, by definition, $D_c(n-1)$ is the average distortion among all pixel locations $j$ corresponding to all possible $i$ over a frame. In deriving (7), we have assumed that the corresponding $j$ for all possible $i$ cover an entire frame evenly, so that the average over $j$ is equal to the average over all pixels in frame $n-1$. Note that in certain special cases, such as during a camera zoom, the $j$ for all possible $i$ in frame $n$ may correspond to a subregion in frame $n-1$, making (7) less accurate.

If an MB is lost, we assume that it will be concealed using temporal concealment with an estimated MV, regardless of the coding mode. Let $j'$ denote the spatial index of the estimated matching pixel in frame $n-1$ for pixel $i$, based on the estimated MV; then the concealed value for pixel $i$ is $\tilde{f}_n^i = \tilde{f}_{n-1}^{j'}$. The channel distortion is thus

$D_L(n) = E_{SC}\{(\hat{f}_n^i - \tilde{f}_{n-1}^{j'})^2\} = D_{ECP}(n) + D_c(n-1)$   (8)

with

$D_{ECP}(n) = E_{SC}\{(\hat{f}_n^i - \hat{f}_{n-1}^{j'})^2\}.$   (9)

Note that $\hat{f}_{n-1}^{j'}$ would have been the concealed value for $\hat{f}_n^i$ if there were no channel-induced distortion up to frame $n-1$. Therefore, $\hat{f}_n^i - \hat{f}_{n-1}^{j'}$ represents the newly introduced error at pixel $i$ caused by the loss in frame $n$ only. In deriving the above result, we have assumed that this new error is uncorrelated with the channel-induced error up to frame $n-1$ at the corresponding pixel, $\hat{f}_{n-1}^{j'} - \tilde{f}_{n-1}^{j'}$. We believe this is a quite reasonable assumption. Similar assumptions are used to derive subsequent results.

The term $D_{ECP}(n)$ represents the distortion associated with a particular temporal concealment algorithm, in the absence of error propagation from previous frames. We will refer to it as concealment distortion. For example, if we use the simple frame-copy error concealment method, then $j' = i$ and $D_{ECP}(n) = E_{SC}\{(\hat{f}_n^i - \hat{f}_{n-1}^i)^2\}$ is the mean squared difference between two successively coded frames. Generally, the underlying slice pattern affects the estimation of the MVs and, hence, $D_{ECP}(n)$. For example, if an entire frame is coded into a single slice, the MVs in a current frame have to be estimated from the MVs in the previous frame or simply set to zero. On the other hand, if a frame is divided into two interleaving slices, then the MVs in one slice can be more accurately estimated from the MVs for the neighboring MBs in the other slice. As is clear from the definition, the concealment distortion also depends on the encoder quantization parameter.

Substituting $D_{IR}(n) = 0$, (7), and (8) into (6), we obtain the recursion model in (5) with

$\alpha(n) = (1-P)(1-\beta_n) + P = 1 - \beta_n(1-P).$   (10)

Note that in this case, $\alpha(n)$ is equal to the percentage of pixels at which error propagation will continue. Generally, $0 < \alpha(n) < 1$, except in the unlikely cases of $\beta_n = 0$, or $P = 1$, or $\beta_n = 1$ and $P = 0$. Clearly, by increasing the intrarate, we can reduce $\alpha(n)$, so that the error propagation decays faster.
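A short numerical illustration (ours, with illustrative values) of the integer-pel factor in (10), showing how a higher intrarate shrinks the residual propagated error:

```python
def alpha_integer_pel(P: float, beta: float) -> float:
    """Propagation factor of (10): received I-MBs stop propagation,
    received P-MBs and concealed MBs pass the previous frame's error on."""
    return (1.0 - P) * (1.0 - beta) + P   # equals 1 - beta*(1 - P)

P = 0.05
for beta in (0.0, 0.1, 0.3):
    a = alpha_integer_pel(P, beta)
    # fraction of a unit error left after 10 subsequent received frames
    print(f"beta={beta:.1f}  alpha={a:.3f}  residual after 10 frames={a**10:.3f}")
```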

C. Sub-Pel Motion Vectors, No Intraprediction, No Deblocking Filtering

In all of the state-of-the-art video coders, sub-pel MVs are used to improve the prediction accuracy. To take into account the interpolation operation incurred for sub-pel MVs, we assume that a pixel $\hat{f}_n^i$ in a P-MB in frame $n$ is predicted by a weighted sum of several neighboring pixels in frame $n-1$, denoted by

$\sum_{k=1}^{K} a_k \hat{f}_{n-1}^{j_k}$   (11)

where $j_k$ refers to the spatial index of the $k$th pixel in frame $n-1$ that was used to predict $\hat{f}_n^i$. Note that this formulation is also applicable to overlapped block motion compensation (OBMC) employed in the H.263 codec. The interpolation coefficients should satisfy $\sum_k a_k = 1$. Note that the values for $K$ and $a_k$ depend on the MV for the MB and the interpolation filter employed for sub-pel motion compensation. For example, with the bilinear interpolation filter, if the MV is half-pel in both the horizontal and vertical directions, then $K = 4$ and $a_k = 1/4$. On the other hand, if the MV is half-pel in one direction but integer in the other, $K = 2$ and $a_k = 1/2$. If the MV is integer in both directions, $K = 1$ and $a_1 = 1$.

At the receiver, if the MB is received correctly, the prediction is based on

$\sum_{k=1}^{K} a_k \tilde{f}_{n-1}^{j_k}.$   (12)

Defining $e_{n-1}^{j} = \hat{f}_{n-1}^{j} - \tilde{f}_{n-1}^{j}$, the average channel distortion for the pixels associated with a particular set of $\{a_k\}$ is

$E_{SC}\Big\{\Big(\sum_{k} a_k e_{n-1}^{j_k}\Big)^2\Big\} = \Big(\sum_k a_k^2 + \rho \sum_{k \neq l} a_k a_l\Big) D_c(n-1).$   (13)


Note that, generally, the channel-induced errors on neighboring pixels are correlated, especially if these pixels belong to the same slice. To simplify the analysis, while deriving (13), we assumed that the correlation coefficients between errors in every two neighboring pixels are the same, represented by $\rho$. One can think of $\rho$ as the average correlation coefficient of the channel-induced errors.

The fact that different values of $\{a_k\}$ are used for different received P-MBs can be accounted for by taking the average of the factors $\sum_k a_k^2 + \rho \sum_{k\neq l} a_k a_l$ used over all P-MBs in a frame, and denoting the average value as

$a = E\Big\{\sum_k a_k^2 + \rho \sum_{k \neq l} a_k a_l\Big\}.$   (14)

Further assuming that this average value is the same in different frames, the average channel distortion for received P-MBs with all possible MVs is

$D_{PR}(n) = a\,D_c(n-1).$   (15)

The definition of $a$ in (14) assumes that the correlation between channel-induced errors in neighboring pixels is a constant. Because received I-MBs do not have any channel errors, the average correlation in a frame is likely to decrease with the intrarate in this frame. We have confirmed this conjecture with our simulation data, and found that a more accurate model is

(16)

with

(17)
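Under the equal-correlation assumption used in deriving (13), the per-MB attenuation contributed by sub-pel interpolation is $\sum_k a_k^2 + \rho\sum_{k\neq l} a_k a_l$; the sketch below (ours) evaluates it for the bilinear-filter examples quoted earlier, treating $\rho$ as a free parameter. Averaging such values over the P-MBs of a frame gives the model parameter $a$ of (14).

```python
def interp_attenuation(coeffs, rho):
    """sum_k a_k^2 + rho * sum_{k!=l} a_k a_l for one interpolation filter."""
    s1 = sum(c * c for c in coeffs)
    s_all = sum(coeffs) ** 2        # = s1 + sum_{k!=l} a_k a_l, since coeffs sum to 1
    return s1 + rho * (s_all - s1)

rho = 0.5                            # illustrative correlation coefficient
for name, coeffs in [("integer", [1.0]),
                     ("half-pel, one direction", [0.5, 0.5]),
                     ("half-pel, both directions", [0.25] * 4)]:
    print(name, interp_attenuation(coeffs, rho))
```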

If an MB is lost, it will be concealed using temporal concealment with an estimated MV. Generally, the estimated MV may also be a sub-pel vector, and the concealed value can be denoted by

$\tilde{f}_n^i = \sum_{k=1}^{\tilde{K}} \tilde{a}_k \tilde{f}_{n-1}^{\tilde{j}_k}$   (18)

where $\tilde{K}$ represents the number of pixels used for temporal concealment, and $\tilde{j}_k$ is the spatial index of the $k$th pixel in frame $n-1$ used for interpolating pixel $i$ in frame $n$. Note that because the estimated MV may not be the same as the MV used in the encoder, $\tilde{K}$ and $\tilde{j}_k$ generally differ from $K$ and $j_k$ for the same pixel.

The channel distortion for MBs with the same set of $\{\tilde{a}_k\}$ is

$E_{SC}\Big\{\Big(\hat{f}_n^i - \sum_k \tilde{a}_k \tilde{f}_{n-1}^{\tilde{j}_k}\Big)^2\Big\} = D_{ECP}(n) + \Big(\sum_k \tilde{a}_k^2 + \rho \sum_{k\neq l}\tilde{a}_k\tilde{a}_l\Big) D_c(n-1)$

with

$D_{ECP}(n) = E_{SC}\Big\{\Big(\hat{f}_n^i - \sum_k \tilde{a}_k \hat{f}_{n-1}^{\tilde{j}_k}\Big)^2\Big\}.$   (19)

Averaging over MBs with different $\{\tilde{a}_k\}$ yields

$D_L(n) = D_{ECP}(n) + b\,D_c(n-1)$   (20)

with

$b = E\Big\{\sum_k \tilde{a}_k^2 + \rho \sum_{k \neq l} \tilde{a}_k \tilde{a}_l\Big\}.$   (21)

Note that $\sum_k \tilde{a}_k \hat{f}_{n-1}^{\tilde{j}_k}$ would have been the concealed value using the same estimated MV in the absence of transmission errors in the previous frame. Therefore, $D_{ECP}(n)$ represents the average concealment distortion for frame $n$ when loss occurs only in this frame. When deriving the above results, we have again assumed that the concealment error at frame $n$ is uncorrelated with the channel-induced errors in frame $n-1$. We also assumed that the channel-induced errors in neighboring pixels in frame $n-1$ have the same pairwise correlation coefficient $\rho$.

Substituting (15), (20), and $D_{IR}(n) = 0$ into (6) yields the recursion model in (5) with

$\alpha(n) = (1-P)(1-\beta_n)\,a + P\,b.$   (22)


With the more accurate expression for $D_{PR}(n)$ in (16), the factor $\alpha(n)$ becomes

(23)

We will refer to the distortion model using (5) and (22) as Model1, and the one using (5) and (23) as Model2. Model2 is more accurate than Model1, but it requires one extra parameter.

Comparing (22) to (10), we see that the effect of sub-pel MVs is to introduce the constants $a$ and $b$ into the error propagation factor. Recall that the interpolation filter coefficients satisfy $0 \le a_k \le 1$ and $\sum_k a_k = 1$. Because the correlation coefficient satisfies $\rho \le 1$, we have $a \le 1$. In the special case of $\rho = 1$, we have $a = 1$. Similarly, $b \le 1$. Therefore, the error propagation factor $\alpha(n)$ with sub-pel MVs is typically smaller than when only integer-pel MVs are used. Thus, the spatial filtering incurred either for sub-pel motion-compensated prediction or sub-pel temporal concealment has the effect of attenuating the temporal error propagation, as is well known in the video-coding community.
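To make the comparison of (22) with (10) concrete, the following sketch (ours, with illustrative values for $a$ and $b$) evaluates both propagation factors; because $a, b \le 1$, the sub-pel factor never exceeds the integer-pel one.

```python
def alpha_model1(P, beta, a, b):
    """Propagation factor of (22): sub-pel interprediction scales the
    received-P-MB term by a, sub-pel concealment scales the lost-MB term by b."""
    return (1.0 - P) * (1.0 - beta) * a + P * b

def alpha_integer(P, beta):
    """Integer-pel special case (10), i.e., a = b = 1."""
    return (1.0 - P) * (1.0 - beta) + P

P, beta = 0.1, 0.1
a, b = 0.9, 0.95   # illustrative attenuation constants (a, b <= 1)
print(alpha_model1(P, beta, a, b), alpha_integer(P, beta))
```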

To apply the above model to real video data, the model parameters can be predetermined by fitting the model to the measured distortion data from simulations. This is discussed further in Section IV-B. The concealment distortion $D_{ECP}(n)$ can be directly measured by the encoder after coding frame $n$, by randomly setting certain slices in this frame as lost and running the assumed error concealment method on selected MBs in the lost slices.

The distortion model in [6] has the same form as (5) and (22) but with $b = 1$, because it assumes frame copy for concealment. The constant $a$ was introduced to account for the so-called motion randomness. The derivation here shows clearly the relation of $a$ with the interpolation coefficients used for motion compensation with sub-pel motion vectors and the correlation between channel-induced errors in neighboring pixels. The model in [6] assumes the concealment distortion is proportional to the mean squared frame difference, where the proportionality factor is a model parameter. This assumption is only valid for the frame-copy error-concealment method. In our case, we assume $D_{ECP}(n)$ can be measured by the encoder, by running the decoder error concealment method on selected MBs.
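A possible encoder-side measurement of $D_{ECP}(n)$ along the lines described above is sketched below; the slice map and the frame-copy concealment routine are hypothetical placeholders for whatever concealment the decoder actually uses.

```python
import numpy as np

def measure_d_ecp(recon_cur, recon_prev, slice_map, lost_slices,
                  conceal=lambda cur, prev, mask: np.where(mask, prev, cur)):
    """Average squared concealment error over the pixels of the lost slices.

    recon_cur, recon_prev : encoder reconstructions of frames n and n-1
    slice_map             : integer slice index per pixel
    lost_slices           : iterable of slice indices simulated as lost
    conceal               : concealment routine; default is frame copy
    """
    mask = np.isin(slice_map, list(lost_slices))
    concealed = conceal(recon_cur, recon_prev, mask)
    err = (recon_cur - concealed)[mask]
    return float(np.mean(err ** 2)) if err.size else 0.0

# Hypothetical usage with two horizontal slices per frame.
rng = np.random.default_rng(1)
cur = rng.normal(128, 20, (144, 176))
prev = cur + rng.normal(0, 4, cur.shape)
slices = np.repeat(np.array([0, 1]), 72)[:, None] * np.ones((1, 176), dtype=int)
print(measure_d_ecp(cur, prev, slices, lost_slices=[1]))
```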

D. With Intraprediction, No Deblocking Filtering

When an I-MB makes use of intraprediction, there will be error propagation within the same frame. We now extend the previous derivation to consider this effect. We will assume that a pixel $\hat{f}_n^i$ in an I-MB in frame $n$ is predicted by a weighted sum of several previously coded neighboring pixels in the same frame, denoted by

$\sum_{k=1}^{K'} a'_k \hat{f}_n^{m_k}$   (24)

where $m_k$ refers to the spatial index of the $k$th previously coded pixel in frame $n$ that is used to predict $\hat{f}_n^i$. For example, in the H.264 standard, there are many different modes of intraprediction, and each leads to a different set of values for $K'$ and $a'_k$. In each case, the coefficients satisfy $\sum_k a'_k = 1$.

If an I-MB is received, the intrapredicted value at the decoder will be

$\sum_{k=1}^{K'} a'_k \tilde{f}_n^{m_k}.$   (25)

The channel distortion will be due to the difference in the predicted value only. Generally, the neighboring pixels used to predict a current pixel may come from either I-MBs or P-MBs. We will call these neighboring pixels I-neighbors and P-neighbors, respectively. These pixels are also received, because they belong to the same slice as the current MB. The average distortion of P-neighbors is $D_{PR}(n)$ by definition. Denoting the average distortion of I-neighbors by $D_{IR}(n)$, the distortion of the current I-MB can be written as

(26)

with

(27)

(28)

In deriving the above result, we have assumed that the channel-induced errors in I-neighbors and those in P-neighbors are uncorrelated.3 Equation (26) shows that the distortion of I-MBs evolves in a recursive manner. This means that intraprediction causes spatial error propagation within the same slice. We will assume that this distortion quickly converges after a few I-MBs, so that

Substituting this assumption into (26) and making use of (15) yields

(29)

3This assumption may not be strictly true. But our experimental data show that the resulting relation in (30) is quite accurate, with c possibly taking into account the discounted correlation.


Assuming the number of P-neighbors and, hence, are proportional to , and similarly the number of I-neighbors and, hence, are proportional to , the factor will have the form of , where and are two scaling parameters. When is small, this factor can be approximated by , which equals when is close to 1. From experimental data, we have found that this approximation with only one parameter works quite well. This leads to4

(30)

For a P-MB, if it is received, the channel distortion $D_{PR}(n)$ is the same as in (15), with generally sub-pel MVs. With intraprediction, received I-MBs do not stop error propagation any more. Therefore, the correlation coefficient between adjacent pixels does not necessarily decrease with the intrarate. Hence, we do not need to apply the more complicated "Model2" for $D_{PR}(n)$.

For a lost MB, with motion-compensated temporal concealment, the distortion stays as in (20). Substituting (30), (15), and (20) into (6) yields the same recursion form as (5) with

(31)

The preceding analysis assumes unconstrained intraprediction. With constrained intraprediction as in H.264, only intracoded neighboring pixels in the same slice can be used for intraprediction, so that received I-MBs are predicted only from error-free pixels. Consequently, $D_{IR}(n) = 0$, the overall distortion stays the same as in the case without intraprediction, and it can be characterized by either Model1 or Model2.

The distortion model using (5) and (31) will be referred to as Model3, which is applicable when the encoder applies nonconstrained intraprediction.

With constrained intraprediction or without intraprediction, the factor $\alpha(n)$ in either Model1 or Model2 is a decreasing function of $\beta_n$. This is because received I-MBs stop error propagation. This is not the case with unconstrained intraprediction. In fact, (31) reveals that $\alpha(n)$ is not a monotonically decreasing function of $\beta_n$. Rather, $\alpha(n)$ reaches its maximum at some intermediate intrarate. This phenomenon is confirmed by our experimental results given in Section IV.

E. With Deblocking Filtering

Deblocking filtering within the encoding loop is an important tool for enhancing the visual quality of coded video, and is incorporated in the earlier H.261 standard and the latest H.264 standard. In the H.264 standard, deblocking filtering is applied in the so-called "in-loop" manner, so that the filtered value for a pixel can be used for filtering the following pixels.

4In an earlier work [8], we have assumed $D_{IR}(n) = c(1-\beta_n)\,D_{PR}(n)$. We have since found the current relation is slightly more accurate.

Let $\hat{f}_n^i$ represent the reconstructed value for pixel $i$ before filtering, and $\hat{g}_n^i$ the reconstructed value after filtering. Mathematically, we can describe the deblocking operation by5

$\hat{g}_n^i = \sum_{k:\, l_k \in \mathcal{A}_f} w_k\, \hat{g}_n^{l_k} + \sum_{k:\, l_k \in \mathcal{A}_u} w_k\, \hat{f}_n^{l_k}.$   (32)

The index $l_k$ represents the $k$th neighboring pixel used for filtering pixel $i$. The set $\mathcal{A}_f$ includes neighboring pixels that have been filtered, and the set $\mathcal{A}_u$ includes those that have not been filtered. Different from the intra- and interpredictions discussed earlier, one of the $l_k$ is pixel $i$ itself, and it is in the set $\mathcal{A}_u$. The filter coefficients $w_k$ are location and content dependent, satisfying $\sum_k w_k = 1$. Typically, stronger filtering is applied along the block boundaries, and over smoother regions of a frame.

In the decoder, if an MB is received, the same filtering is applied to the decoded values. Let $\tilde{f}_n^i$ denote the reconstructed value for pixel $i$ before filtering and $\tilde{g}_n^i$ after filtering; the filtered value can be expressed as

$\tilde{g}_n^i = \sum_{k:\, l_k \in \mathcal{A}_f} w_k\, \tilde{g}_n^{l_k} + \sum_{k:\, l_k \in \mathcal{A}_u} w_k\, \tilde{f}_n^{l_k}.$   (33)

The average distortion for a received MB is

with

(34)

(35)

When deriving the above result, we have assumed that the channel-induced errors in the past pixels are uncorrelated with those in the future pixels. Although this assumption may not be strictly true, the model derived based on this assumption is quite accurate, with the final model parameters possibly incorporating this discounted correlation.

Assuming the distortion quickly converges so that, we have

5In the INTRA mode of H.264, the intrapredicted values are based on the encoded values before deblocking filtering. Therefore, deblocking can be considered as a postfiltering operation.


where is the distortion for a received MB if no deblocking filtering is applied. Substituting (15) and (30) for P- and I-MBs, respectively, yields

(36)

(37)

where the two constants correspond to I- and P-MBs, respectively. In general, their values differ because different deblocking filters are typically applied for I- and P-MBs.

For a lost MB, deblocking is typically not applied after concealment. Therefore, its distortion stays as in (20). If deblocking is applied, it will have the effect of changing the parameter $b$ in (20). Hence, the average distortion has the same form as (5) and (31), but with the definitions of the parameters possibly changed to include the multiplicative effect of the deblocking filtering. Depending on the relative magnitudes of the filter coefficients, these multiplicative factors can be either smaller or greater than 1. Therefore, recursive deblocking filtering can either attenuate or exacerbate error propagation.

F. Summary of the Models for Different Encoder and Decoder Configurations

To summarize, we propose three models. They all follow the same recursion formula in (5), but with different forms for the error propagation factor $\alpha(n)$. The definitions of $\alpha(n)$ for Model1, Model2, and Model3 are given in (22), (23), and (31), respectively. Model1 and Model2 are applicable when the encoder either does not apply intraprediction or applies constrained intraprediction, with or without deblocking filtering. Model1 has two parameters, $a$ and $b$, and Model2 has three parameters. As will be shown by the experimental results, Model2 is more accurate than Model1. Model3 is applicable when the encoder applies nonconstrained intraprediction, with or without deblocking filtering. It has three parameters, $a$, $b$, and $c$. When the encoder uses integer-pel MVs only and does not apply deblocking filtering, $a = 1$, and Model2 reduces to Model1. When the decoder uses integer-pel MVs only for error concealment and does not apply deblocking filtering after concealment, $b = 1$. More generally, the model parameters depend on the weighting coefficients used for interprediction, intraprediction, temporal concealment, and deblocking filtering. Because the actual weighting coefficients used are location and content dependent, it is not easy to determine them directly. But they can be obtained by fitting the distortion model to measured channel distortion values under different loss rates when a video is coded using different intrarates. This will be discussed further in Section IV-B.

G. Model Simplification

The previous analysis assumes that $D_{ECP}(n)$ and $\beta_n$ vary from frame to frame and can be measured accurately. A simplified model results if we assume that these values stay fairly constant and their average values can be predetermined. Let $\bar{D}_{ECP}$ and $\bar{\beta}$ denote the average concealment distortion and intrarate over all of the P-frames; then (5) becomes

$D_c(n) = P\,\bar{D}_{ECP} + \bar{\alpha}\,D_c(n-1)$   (38)

with

(39)

(40)

(41)

Assume frame 0 is coded in the I-mode and has a channel distortion of $D_c(0)$. Applying (38) recursively yields

$D_c(k) = \bar{\alpha}^k D_c(0) + \dfrac{1-\bar{\alpha}^k}{1-\bar{\alpha}}\,P\,\bar{D}_{ECP}.$   (42)

The average distortion over a GoP consisting of 1 I-frame and $N-1$ P-frames is

$\bar{D}_c = \dfrac{1}{N}\sum_{k=0}^{N-1} D_c(k).$   (43)

Because $\bar{\alpha} < 1$ in general, (42) tells us that $D_c(k)$ is generally an exponentially increasing function of $k$ and converges to

$D_c(\infty) = \dfrac{P\,\bar{D}_{ECP}}{1-\bar{\alpha}}.$   (44)

Without intraprediction or with constrained intraprediction, in the special case of integer-pel MVs and no deblocking filtering ($a = b = 1$, so that $\bar{\alpha} = 1 - \bar{\beta}(1-P)$), the converged value is

$D_c(\infty) = \dfrac{P\,\bar{D}_{ECP}}{\bar{\beta}(1-P)}.$   (45)

In this case, $D_c(\infty)$ is inversely related to $\bar{\beta}$. For small $P$, $D_c(\infty)$ increases with $P$ approximately linearly. Because the values of $a$ and $b$ are close to 1 even with sub-pel MVs and deblocking filtering, the above relation will stay approximately true. The converged form (44) and its special case in (45) are very useful for analytical studies involving channel distortion.
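The simplified model lends itself to a few lines of code; the sketch below (ours, with illustrative parameter values) iterates (38), compares the trajectory with the converged value of (44)/(45), and computes the GoP average of (43).

```python
def simplified_trajectory(P, d_ecp_avg, alpha_avg, d_c0=0.0, num_frames=30):
    """Iterate D_c(k) = P*D_ECP_avg + alpha_avg*D_c(k-1) over the P-frames."""
    out, d = [], d_c0
    for _ in range(num_frames):
        d = P * d_ecp_avg + alpha_avg * d
        out.append(d)
    return out

P, d_ecp_avg, beta_avg = 0.05, 40.0, 0.1
alpha_avg = 1.0 - beta_avg * (1.0 - P)          # integer-pel special case of (45)
traj = simplified_trajectory(P, d_ecp_avg, alpha_avg)
limit = P * d_ecp_avg / (1.0 - alpha_avg)       # converged value, (44)
gop_avg = sum(traj) / len(traj)                 # average over the P-frames, cf. (43)
print(traj[-1], limit, gop_avg)
```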


Fig. 1. Average channel distortion over all P-frames versus packet loss rates for different average intrarates. Constrained intraprediction.

IV. VERIFICATION OF THE MODEL

A. Simulation Setup

The H.264 reference software encoder (JM9.6) with different encoding options in the Baseline profile was used to generate the test sequences. To verify Model1 and Model2, we used the "constrained intraprediction" option. To verify Model3, we used the "nonconstrained intraprediction" option. We used one slice per frame so that a lost slice leads to a lost frame. The encoder used up to quarter-pel accuracy for motion estimation, with a motion search range of 16 pels. We also limited the prediction reference frame to only the previous frame. Four encoding configurations were examined: constrained intraprediction without deblocking filtering, constrained intraprediction with deblocking filtering, nonconstrained intraprediction without deblocking filtering, and nonconstrained intraprediction with deblocking filtering.

Two types of error concealment methods were examined: frame copy, which simply replaces a missing frame by the previously reconstructed frame; and motion copy, which copies the motion vector data from the previously reconstructed frame to the current missing frame, and then applies motion-compensated prediction to reconstruct the missing frame (more details will be given in Section IV-D). Using either concealment method, we determine the concealment distortion for each P-frame by setting that frame alone as lost.

Four common QCIF sequences were examined: two high-motion sequences, "Football" and "Stefan," one medium-motion sequence, "Foreman," and one low-motion sequence, "News." The actual encoding frame rates varied among the different experiments and are described later. For each sequence, only the first 60 frames were coded, where the first frame was coded as an I-frame, while the remaining 59 frames were coded as P-frames using different forced intrarates. A constant quantization parameter is used in all experiments. The P-frame data were subject to random frame losses at rates of 1%, 5%, 10%, and 15%. For a given target loss rate and a forced intrarate, 200 to 500 loss traces were generated. The channel distortion for each frame is determined by averaging the distortion (in terms of MSE) resulting from all loss traces.

B. Model Parameter Estimation

By definition, the parameters in (14) depend on the distribution of the interpolation filters used for sub-pel motion compensation which, in turn, depends on the distribution of the resolutions (integer-pel versus half-pel versus quarter-pel) of the motion vectors used for temporal prediction in the INTER mode. Similarly, parameter $b$ in (21) depends on the distribution of the resolutions of the motion vectors used for temporal concealment. Finally, parameter $c$ in (27)–(29) depends on the distribution of different INTRA predictors, and the percentage of neighboring pels used for intraprediction that are coded in the INTER mode. All of these parameters also depend on the distribution of the deblocking filters applied, and the correlation between the channel-induced errors in adjacent pixels. Deriving the model parameters based on the aforementioned statistics is, however, quite difficult. Instead, we choose to use the least-squares fitting technique to derive the model parameters from the training data.


Fig. 2. Distortion versus frames for selected cases. Constrained intraprediction, no deblocking filtering. B indicates the average intrarate, and P the average loss rate.

Specifically, we apply least-squares fitting to the recursion formula (5), with $\alpha(n)$ defined for the different cases, using the data obtained at different loss rates and intrarates. For example, for Model2, given by (5) and (23), the parameters are determined by minimizing

(46)

where the outer sum covers all of the tested pairs of loss rate and forced intrarate, and the sum over $n$ considers all P-frames in the decoded sequence obtained with a particular pair. The terms in (46) denote the measured channel distortion, concealment distortion, intrarate, and loss rate for frame $n$ obtained with this pair of loss rate and forced intrarate. The parameter estimation for the other models is carried out similarly.
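A sketch of this fitting step for Model1 is given below (the factor is taken as $\alpha(n) = (1-P)(1-\beta_n)a + Pb$ per (22)); the record layout is a hypothetical placeholder for the measured simulation data, and scipy.optimize.least_squares is used in place of whatever solver the authors employed.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_model1(records):
    """Fit (a, b) of Model1 by least squares on the recursion residual.

    records: list of dicts, one per tested (loss rate, forced intrarate) pair:
      'P'     : packet-loss rate used in the simulation
      'd_c'   : measured D_c for frames 0..N-1 (numpy array; frame 0 = I-frame)
      'beta'  : per-frame intrarates, same length (numpy array)
      'd_ecp' : per-frame concealment distortions, same length (numpy array)
    """
    def residuals(params):
        a, b = params
        res = []
        for r in records:
            P, d_c = r['P'], r['d_c']
            beta, d_ecp = r['beta'][1:], r['d_ecp'][1:]     # P-frames 1..N-1
            alpha = (1.0 - P) * (1.0 - beta) * a + P * b    # Model1 factor, cf. (22)
            res.append(d_c[1:] - (P * d_ecp + alpha * d_c[:-1]))
        return np.concatenate(res)

    sol = least_squares(residuals, x0=[0.9, 0.9], bounds=([0.0, 0.0], [1.0, 1.0]))
    return sol.x

# records = [...measured simulation data...]; a, b = fit_model1(records)
```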

C. Results With the Frame-Copy Error Concealment Method

In the first set of experiments, the decoder conceals a lost frame by copying from the previous reconstructed frame. The concealment distortion $D_{ECP}(n)$ is simply the mean squared difference between two encoded frames and can be easily measured. Recall that with frame copy, the parameter $b$ is 1 according to our analysis. So we set $b = 1$ and obtain the other model parameters through least-squares fitting as described in Section IV-B. For this experiment, we only tested two sequences, "Foreman" and "Football," encoded at 15 and 30 fps. The forced intrarates included 3/99, 9/99, and 33/99. For a given target loss rate and a forced intrarate, 500 simulations were run.

We first examine the results for the case with constrained intraprediction using Model1 and Model2. Fig. 1 shows the average channel-induced distortion over all P-frames versus the packet loss rate. The top two figures are obtained without deblocking filtering, and the bottom two are with deblocking filtering. The model parameters derived in each case are indicated in the figure legend. We see that "Model2" fits the experimental data very well over the large range of loss rates and intrarates examined (less accurately at high loss rates). "Model1," with one less parameter, is less accurate for "Foreman" (but almost as good for "Football"), but still provides a quite good approximation. "Model1" and "Model2" are obtained by using the actual $D_{ECP}(n)$. "Model2-avg-DECP" is computed by using the average concealment distortion over all frames, which is almost as accurate as "Model2." Therefore, the model can estimate the average channel distortion accurately even if we only know the average concealment distortion.

On each figure, different sets of curves correspond to the results obtained using different forced intrarates, and the values indicated on the figure are the corresponding average intrarates over all P-frames.


Fig. 3. Average channel distortion over all P-frames versus packet-loss rates for different average intrarates. Nonconstrained intraprediction.

Notice that with the same forced intrarate, the actual intrarate for "Football" is much higher than for "Foreman," because "Football" has complicated motion and interprediction is not effective for many blocks.

Comparing the results obtained with and without deblocking filtering, we see that the distortion curves and the model parameters are very close in these two cases for the same video sequence. This reveals that the multiplicative factors due to deblocking filtering are fairly close to 1. As shown in our model derivation, deblocking may either increase or decrease the various model parameters due to the recursive nature of the filtering operation. Therefore, recursive deblocking filtering does not necessarily further suppress the error propagation.

The previous figure shows that the estimated average distortion over all P-frames in a GoP matches quite well with the measured average distortion. Fig. 2 shows the distortion versus frame number for a few selected pairs of forced intrarate and packet-loss rate. We see that the estimated distortion matches the actual frame-level distortion quite well.6 To show the dependency between the channel distortion in a frame and the concealment distortion for that frame, we also show the concealment distortion (scaled down by the packet-loss rate for that frame) versus the frame number as well. Recall that if there were no error propagation, the channel distortion for a frame would be equal to the concealment distortion multiplied by the packet-loss rate.

6Only the results with Model2 are shown. The curves corresponding to Model1 follow the same trend, but are slightly less accurate.

The difference between the two curves shows the effect of error propagation due to interprediction. By comparing the two curves, we can also see the influence of the concealment distortion on the total channel distortion.

On the same figure, we also show the estimated distortion based on the average concealment distortion and average intrarate over all P-frames. As expected, the result does not match well with the actual distortion. But as shown in Fig. 1, the average channel distortion over all P-frames based on the model does match well with the measured average distortion. This tells us that if the goal is to estimate the channel distortion in each frame accurately, one needs to estimate or measure the concealment distortion for that frame accurately. But if one is only concerned about the average channel distortion over a video segment, then having an accurate estimate for the average concealment distortion for this segment is sufficient.

Next, we examine results obtained with nonconstrained intraprediction using Model3. Fig. 3 shows that Model3 fits the experimental data quite well, both with the actual concealment distortion in each frame and with their average value. Comparing Figs. 1 and 3, we see that under the same intrarate and packet-loss rate, the channel distortion is much higher (especially for the "Football" sequence) when nonconstrained intraprediction is used. Therefore, for error resilience purposes, constrained intraprediction is much preferred. As with Fig. 1, the distortion curves and the model parameters are very close with and without deblocking filtering. This again confirms that deblocking filtering essentially modifies the model parameters only slightly.


Fig. 4. Average channel distortion over all P-frames versus average loss rates, for different intrarates. Constrained prediction with deblocking filtering. Motion copy error concealment.

For this set of results, the least-squares method did not give the best fit and, for some cases, the derived parameter exceeds 1. We have manually adjusted the parameters slightly to obtain a better fit with the experimental data.

Note that for "Foreman," the experimental curves corresponding to the first two forced intrarates are very close to each other. This is because with nonconstrained intraprediction, INTRA-coded MBs do not always stop error propagation, unless all of the blocks are coded as INTRA. Therefore, increasing the intrarate does not necessarily decrease the channel distortion. In fact, the error propagation factor $\alpha(n)$, as defined in (31), is a nonmonotonic function of the intrarate, and reaches its maximum at some intermediate intrarate. We will show figures relating the channel distortion to the intrarate for the next set of experiments.

D. Results With the Motion Copy Error Concealment Method

In the second set of experiments, the decoder conceals a lost frame by using the motion information of the previously reconstructed frame. Specifically, each MB in the lost frame is assigned the same mode and motion vectors as its co-located MB in the previous decoded frame. If the co-located MB is coded in an INTRA mode, then the current MB is assigned the SKIP mode, and the motion vector for this MB is estimated from the MVs of its surrounding previously decoded MBs. The decoder then invokes the conventional motion compensation operation to recover each MB [9]. For this experiment, we tested four sequences: "News," "Foreman," and "Stefan," encoded at 10 fps, and "Football," encoded at 30 fps. The forced intrarates included 0, 11/99, 22/99, and 33/99. For a given target loss rate and a forced intrarate, 200 simulations were run. For each sequence, we derive the model parameters through least-squares fitting as described in Section IV-B. For this experiment, we examined only the following two cases: constrained intraprediction with deblocking, and nonconstrained intraprediction with deblocking.

Fig. 4 shows the results with constrained intraprediction for all test sequences. To reduce the clutter of the curves, for each intrarate, we only show the experimental data and the modeled result using Model2. The difference between Model2 and Model2-avg-DECP, and that between Model2 and Model1, is very similar to the trend shown in Fig. 1. We see that the modeling results are very accurate for all four sequences, at different packet-loss rates and intrarates, as has been shown in the previous experiment (Fig. 1). We see that with motion copy, the parameter $b$ is in the range of 0.65–0.95.

To examine the influence of the intrarate on the channel distortion more closely, Fig. 5 shows the same set of results, but plotting the channel distortion versus the intrarate for different packet-loss rates. We see that the model was able to accurately track the variation of the channel distortion with the intrarate. The channel distortion basically decreases with the intrarate in the form of $1/\bar{\beta}$, as suggested by (45). We also see that increasing the forced intrarate from 0 to about 11% greatly reduces the channel distortion, but further increasing the intrarate does not bring as significant a gain.


Fig. 5. Average channel distortion over all P-frames versus average intrarates for different loss rates. Constrained intraprediction with deblocking filtering. Motion copy error concealment.

We next show the results with nonconstrained intraprediction. Fig. 6 presents the curves of the distortion versus packet-loss rate. Fig. 7 presents the curves of the distortion versus the intrarate. For this case, the experimental data obtained for some two adjacent intrarates are very close to each other (the middle two intrarates for "Foreman" and "News," and the first two intrarates for "Football"), and Model3 was not able to reproduce this nonmonotonic relation with the intrarate accurately. But overall, the model is still fairly accurate over the range of intrarates and packet-loss rates tested.

Recall that with Model3, we model the channel distortion in received INTRA-coded MBs using (29). By assuming the factor in (29) to be proportional to a simple function of the intrarate, we further approximate (29) with (30). This approximation leads to the $\alpha(n)$ defined in (31). Although the so-derived $\alpha(n)$ reflects the general nonmonotonic relation of the distortion with the intrarate, it fails to capture the exact relation. We believe that $D_{IR}(n)$ is probably related to $\beta_n$ in a more complicated form than assumed, and a more accurate model will require more parameters.

E. Summary of Model Parameters

Table I summarizes the model parameters for different sequences under different encoder and decoder configurations. It can be seen that the parameter for Model1 (for constrained intraprediction) and that for Model3 (for nonconstrained intraprediction) fall in the range of 0.86–0.99. With Model2 (for constrained intraprediction), the sum of its two parameters should be in the same range as in Model1. For nonconstrained intraprediction, the additional parameter is around 0.7 for all sequences except “Foreman.” With motion copy, a common parameter value is a good choice, except for “Stefan,” which requires a much higher value; this sequence also requires a higher value for another parameter.

In practical applications, it may not be feasible to determine the model parameters from training data. We had hoped that sequences with similar motion content would have similar parameters. However, although “Football” and “Stefan” are the most similar in terms of motion content among our test sequences, they have quite different model parameters. Recall that some of the parameters depend on the distribution of the motion vector resolutions, while another depends on the distribution of the intrapredictors. It is possible that these two high-motion sequences have very different distributions in this regard. Ideally, one should categorize video sequences based on these statistics and predetermine the parameters for each category.

V. CHANNEL DISTORTION MODEL FOR SINGLE FRAME LOSS

The distortion model described so far assumes that packet loss occurs randomly with a given loss rate. We now extend it to consider the case when only a single frame in a GoP is lost. Assuming one can determine the concealment distortion of any frame when that frame alone is lost, we would like to model the channel distortion in all subsequent frames in the GoP due to the prediction loop in the encoder and decoder. This model allows one to determine the average distortion over the GoP due to the loss of any one frame in the GoP.


Fig. 6. Average channel distortion over all P-frames versus average loss rates, for different intrarates. Nonconstrained intraprediction with deblocking filtering. Motion copy error concealment.

Such a measure is useful, for example, when unequal error protection is being considered across different frames. Stronger error protection should be applied to the frames whose loss leads to higher average distortion.

Assume that a given frame is lost and concealed with a certain concealment distortion. Due to the use of interprediction, the channel distortion in this frame will be propagated to the following frames, even if they are all received. Denote the channel distortion in a subsequent frame due to the loss of this frame as the propagated distortion. This distortion can be determined by using (5), specialized to the case in which all subsequent frames are received. This yields

(47)

where the error propagation factor can be determined using (22) or (23) if constrained intraprediction is used, or (31) if nonconstrained intraprediction is used, again specialized to the case in which all subsequent frames are received. Specifically, we have

(48)

(49)

(50)

When the intrarates of different frames are fairly constant and can be approximated by an average intrarate, the above recursion reduces to

(51)

with

(52)

(53)

(54)

Equation (51) tells us that the channel distortion in the subsequent frames decays exponentially. The average distortion over the GoP is

(55)

Note that this average reveals the impact of a given frame on the average video quality if this frame alone is lost. Equation (55) thus provides a simple way to determine the importance of each frame.
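As a concrete illustration of (51) and (55), the following sketch (in Python) computes the propagated distortion of the subsequent frames and the GoP-average distortion when a single frame is lost, assuming a constant error propagation factor; the function and variable names are ours and serve only to illustrate the computation.

    def single_frame_loss_impact(conceal_dist, alpha, loss_index, gop_size):
        # Frames before the lost one are unaffected; from the lost frame onwards
        # the channel distortion decays geometrically, as in (51).
        per_frame = [0.0] * loss_index
        per_frame += [conceal_dist * alpha ** m for m in range(gop_size - loss_index)]
        # Average over the whole GoP, in the spirit of (55).
        return per_frame, sum(per_frame) / gop_size

Evaluating the returned average for every candidate loss position, with the measured concealment distortion of each frame, provides the frame ranking needed for unequal error protection.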

To verify the proposed model for single-frame loss, we conducted another set of experiments with the test sequences “Foreman” (10 fps) and “Football” (30 fps). We only verified the results when the encoder applies constrained intraprediction. For each sequence, we coded the first 60 frames (one GoP) with constrained intraprediction and deblocking filtering turned on.


Fig. 7. Average channel distortion over all P-frames versus average intrarates, for different loss rates. Nonconstrained intraprediction with deblocking filtering. Motion copy error concealment.

TABLE I
SUMMARY OF MODEL PARAMETERS FOR DIFFERENT SEQUENCES

We then set each of the 59 P-frames, in turn, to be lost and decoded the entire GoP with the motion-copy error concealment method.

For each chosen index of the lost frame, we measured the channel distortion of all decoded frames. We also recorded the average distortion over the GoP. To apply our model, we used the measured per-frame distortion data to determine the parameter for Model1, or the two parameters for Model2, using the least squares method. The so-determined parameters yielded a slightly better fit with the experimental data than the parameters derived previously from the random packet-loss data.
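For the single-frame-loss data, one simple alternative to fitting the recursion directly is a log-domain linear regression of the measured decay. The sketch below (in Python) illustrates this, assuming strictly positive distortions and a roughly geometric decay; it is only one possible procedure and not necessarily the one used to produce the results reported here.

    import numpy as np

    def fit_decay_factor(frame_dists):
        # frame_dists: measured distortions of the lost frame and the following
        # received frames for one chosen loss index.
        d = np.asarray(frame_dists, dtype=float)
        offsets = np.arange(len(d))                     # frame offset from the lost frame
        slope, _intercept = np.polyfit(offsets, np.log(d), 1)
        return float(np.exp(slope))                     # estimated per-frame propagation factor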

Fig. 8 compares the experimental data and the modeled results. The top two figures show the distortions of all subsequent frames when the first P-frame is lost. Model1 refers to (47) and (48), and Model2 refers to (47) and (49). For this experiment, we did not force any MBs to be coded in the INTRA mode. For “Foreman,” there are very few MBs coded in the INTRA mode, so the actual intrarate is very close to zero in all frames. This caused a problem in finding the correct parameters for Model2 with the least squares method, so we only show the result with Model1. For this sequence, the parameter is very close to 1, and so is the error propagation factor, which is why the channel distortion decreases quite slowly. For “Football,” the parameter for Model1 (or the corresponding parameters for Model2) is smaller, and we observe the expected exponential decay behavior. The modeled results with either model follow the measured data quite closely, but consistently underestimate the data. We believe this is because the simple least squares method does not yield the best possible parameters; when we manually increased the parameters slightly, a better fit was obtained.


Fig. 8. Channel distortion when a single frame is lost. Constrained intraprediction with deblocking filtering. Motion copy error concealment. Forced intrarate = 0.

The bottom two figures in Fig. 8 compare the measured average distortion over the 60 frames with that derived from our model, for all possible loss-frame indices. We see that the model yielded very accurate estimates in most cases. On the same figure, we also plot the concealment distortion data. From (55), the average distortion depends both on the concealment distortion of the lost frame and on the location of that frame. When all frames have the same concealment distortion, a loss that occurs early in a GoP affects more frames in the GoP and, hence, leads to a larger average distortion. In general, the concealment distortion is not constant across frames, so the average distortion does not always decrease with the loss-frame index. In fact, as shown in the figure, the loss of some later frames with large concealment distortion can lead to a higher average distortion.

VI. CONCLUSION

In this paper, we proposed a mathematical model for the average channel distortion at the frame level. The model takes into account the two new features of the H.264 video-coding standard, namely intraprediction and in-loop deblocking filtering. The proposed model represents the current-frame distortion as the sum of the current-frame concealment distortion multiplied by the packet-loss rate, and the previous-frame distortion multiplied by an error propagation factor. This factor depends on the packet-loss rate and the intrarate. Three variations of the model with different error propagation factors were introduced:

Model1 and Model2 are for the case when the encoder either does not apply intraprediction or applies only constrained intraprediction. Model2 is more accurate but requires one extra parameter. Model3 is for the case when the encoder employs nonconstrained intraprediction. We showed that deblocking filtering does not change the functional form of the error propagation factor; rather, it merely changes the model parameters slightly. We also discussed how to derive the model parameters based on simulation data. A comparison of the measured channel distortion data from simulations with the modeled results showed that Model2 is very accurate over a large range of packet-loss rates (1% to 15%) and intrarates (0 to 33%). Model1 and Model3 are slightly less accurate, but still quite good.
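To make the model concrete for a reader wishing to experiment with it, the following sketch (in Python) iterates the frame-level recursion described above. The error propagation factor is passed in directly rather than computed from (22), (23), or (31), and the function and variable names are ours.

    def predict_channel_distortion(conceal_dists, loss_rate, alpha):
        # Illustrative frame-level recursion:
        #   D_c(n) = P * D_conc(n) + alpha * D_c(n-1),
        # where alpha is the error propagation factor (which in the proposed
        # models depends on the packet-loss rate and the intrarate).
        d_prev, out = 0.0, []
        for d_conc in conceal_dists:
            d_curr = loss_rate * d_conc + alpha * d_prev
            out.append(d_curr)
            d_prev = d_curr
        return out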

One potential application of the proposed model is in the rate-distortion-optimized configuration of source and channel coders. Given the total channel bandwidth, the proposed channel distortion model, together with appropriate source models that relate the source rate and the encoder distortion to the intrarate and the quantization parameter, enables one to determine the optimal intrarate, quantization parameter, and channel code rate that minimize the total distortion.
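A minimal sketch of such a configuration search is given below (in Python). The three model callables (rate_model, encoder_distortion_model, channel_distortion_model) and the residual-loss mapping loss_rate_for are hypothetical placeholders to be supplied by the user, and the exhaustive grid search is used only to keep the illustration simple.

    import itertools

    def choose_configuration(total_rate_budget, qps, intra_rates, code_rates,
                             rate_model, encoder_distortion_model,
                             channel_distortion_model, loss_rate_for):
        # Search over (quantization parameter, intrarate, channel code rate) for
        # the combination that minimizes the estimated total distortion while
        # respecting the total channel bandwidth constraint.
        best, best_dist = None, float("inf")
        for qp, beta, r in itertools.product(qps, intra_rates, code_rates):
            if rate_model(qp, beta) / r > total_rate_budget:   # channel-coding overhead
                continue
            p = loss_rate_for(r)                               # residual packet-loss rate
            total = encoder_distortion_model(qp, beta) + channel_distortion_model(p, beta)
            if total < best_dist:
                best, best_dist = (qp, beta, r), total
        return best, best_dist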

We also showed how to adapt the proposed distortion model for random packet losses to consider a single frame loss. A comparison with simulation data showed that the model for this scenario is also very accurate.


In applications where unequal error protection may be provided to different encoded frames, the most challenging problem is how to rank the frames based on the impact of the loss of a single frame on the overall video quality. The proposed channel distortion model for single frame loss enables one to easily predict the channel distortion in subsequent received frames once a single frame is lost and, consequently, to determine the average distortion over a group of frames when any one frame in this segment is lost.

To apply either model, one must choose appropriate model parameters; this is the major obstacle to the application of the proposed models. By definition, the model parameters depend on the distribution of the motion vector resolutions (integer versus half- versus quarter-pel) and the distribution of the intrapredictors. However, categorizing video sequences based on these statistics is not an easy task. We have found that video sequences with high motion can have quite different parameters, probably because the distributions of their motion vector resolutions are quite different. We also found that the least squares method, applied to the recursion equation directly, does not always yield very good parameters. How to determine the model parameters more accurately and efficiently (without actually running many channel simulations for the underlying sequence) remains an open question.

REFERENCES

[1] R. Zhang, S. L. Regunathan, and K. Rose, “Video coding with optimal inter/intra-mode switching for packet loss resilience,” IEEE J. Sel. Areas Commun., vol. 18, no. 6, pp. 966–976, Jun. 2000.

[2] H. Yang and K. Rose, “Recursive end-to-end distortion estimation with model-based cross-correlation approximation,” in Proc. IEEE Conf. Image Processing, Barcelona, Spain, 2003, vol. 3, pp. 469–472.

[3] G. Cote, S. Shirani, and F. Kossentini, “Optimal mode selection and synchronization for robust video communications over error-prone networks,” IEEE J. Sel. Areas Commun., vol. 18, no. 6, pp. 952–965, Jun. 2000.

[4] S. Ekmekci and T. Sikora, “Recursive decoder distortion estimation based on AR(1) source modeling for video,” in Proc. IEEE Conf. Image Processing, Singapore, 2004, vol. 1, pp. 187–190.

[5] K. Stuhlmueller, N. Faerber, M. Link, and B. Girod, “Analysis of video transmission over lossy channels,” IEEE J. Sel. Areas Commun., vol. 18, no. 6, pp. 1012–1032, Jun. 2000.

[6] Z. He, H. Cai, and C. W. Chang, “Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 6, pp. 511–523, Jun. 2002.

[7] J. Ostermann, “Video coding with H.264/AVC: Tools, performance, and complexity,” IEEE Circuits Syst. Mag., vol. 4, no. 1, pp. 7–28, First Quarter 2004.

[8] Y. Wang, Z. Wu, J. Boyce, and X. Lu, “Modeling of distortion caused by packet losses in video transport,” in Proc. IEEE Int. Conf. Multimedia and Expo, Amsterdam, The Netherlands, Jul. 2005, pp. 1206–1209.

[9] S. Bandyopadhyay, Z. Wu, P. Pandit, and J. Boyce, “Frame loss error concealment for H.264/AVC,” in Proc. ISO/IEC MPEG and ITU-T VCEG, JVT-P072, Poznan, Poland, Jul. 2005.

Yao Wang (F’04) received the B.S. and M.S. degrees in electronic engineering from Tsinghua University, Beijing, China, in 1983 and 1985, respectively, and the Ph.D. degree in electrical and computer engineering from the University of California, Santa Barbara, in 1990.

Since 1990, she has been with the Electrical and Computer Engineering faculty of Polytechnic University, Brooklyn, NY. She is the lead author of the textbook Video Processing and Communications (Prentice-Hall, 2002).

Dr. Wang received the New York City Mayor’s Award for Excellence in Science and Technology in the Young Investigator Category in 2000. She was elected Fellow of the IEEE for contributions to video processing and communications. She was a co-winner of the IEEE Communications Society Leonard G. Abraham Prize Paper Award in the field of communications systems in 2004.

Zhenyu Wu (S’99–M’05) received the B.S. degree in telecommunication engineering and the M.S. degree in signal and information processing engineering from Shanghai University, Shanghai, China, in 1996 and 1998, respectively, and the Ph.D. degree in electrical engineering from the University of Arizona, Tucson, in 2004.

Currently, he is a Technical Staff Member at Thomson Inc. Corporate Research, Princeton, NJ. His research interests include multimedia communications, source and channel coding, and signal processing.

Jill M. Boyce (S’86–M’92) received the B.S. degree in electrical engineering from the University of Kansas, Lawrence, in 1988, and the M.S.E. degree in electrical engineering from Princeton University, Princeton, NJ, in 1990.

Currently, she is Manager of Mobility Systems Research at Thomson Corporate Research, Princeton, NJ. Her research specialties encompass a wide range of video compression and transmission topics, including preprocessing, encoding, decoding, postprocessing, network streaming, and error considerations. Her current primary research focus is enabling video services to mobile devices over wireless networks. She was previously with Lucent Technologies Bell Laboratories, Holmdel, NJ; AT&T Labs, Holmdel, NJ; and Hitachi America, Princeton, NJ. She has actively participated in several international standardization bodies, including ITU-T, MPEG, ATSC, and SMPTE. She was an active contributor to the Joint Video Team (JVT) standardization of ITU-T H.264/MPEG-4 AVC and had several contributions adopted into the standard. She also chaired JVT breakout groups on B-frames and Weighted Prediction topics and participated in the editing of the H.264/AVC standard. She gave an invited tutorial on H.264/AVC at the International Conference on Consumer Electronics (ICCE) 2005. She has also been an active contributor to the MPEG Scalable Video Coding (SVC) standardization effort currently under development by the JVT. She currently holds 46 granted U.S. patents, with many more pending. She has authored more than 30 conference and journal papers and book chapters, and has made over 30 contributions to international standards bodies.

Ms. Boyce is a member of the IEEE Circuits and Systems Society Multimedia Applications Technical Committee. She was the Video Networking Track Chair and co-organized a tutorial on IPTV for ICCE 2006. Currently, she is an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY.