maximum-likelihood header estimation: a cross-layer methodology for wireless multimedia

3946 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 6, NO. 11, NOVEMBER 2007

Maximum-Likelihood Header Estimation: A Cross-Layer Methodology for Wireless Multimedia

Syed Ali Khayam, Member, IEEE, and Hayder Radha, Member, IEEE

Abstract— We propose a novel cross-layer header estimation methodology that can be used by UDP-based wireless multimedia applications to estimate corrupted packet headers, thereby realizing significant throughput improvements. The proposed methodology requires only minor modifications to the protocol stack at the receiver while no modifications are needed to senders or intermediate nodes. We formulate header estimation as a problem of maximum-likelihood estimation of known parameters in noise. We derive likelihood functions for two wireless channel models, namely Markov and multifractal wavelet models. Our trace-driven video simulations at 2, 5.5 and 11 Mbps data rates of an 802.11b LAN demonstrate that significant improvements over normal UDP and UDP Lite can be achieved by employing header estimation with UDP.

Index Terms— Multimedia communications, cross-layer design, wireless multimedia, 802.11 LANs.

I. INTRODUCTION

WIRELESS channels incur unpredictable and time-varying packet losses due to channel interference and node mobility. This data loss is particularly detrimental for real-time communications since their delay constraints generally do not allow retransmission-based recovery of lost packets. Consequently, recent multimedia standards have introduced enhanced error resilience and concealment features (e.g., slices in JVT/H.264 [1] and reversible VLC in MPEG-4 [2]) to cater for bandwidth-constrained and error-prone wireless channels. Distortion in multimedia quality at a wireless receiver can be substantially decreased if corrupted packets, instead of being dropped, are relayed to the multimedia application.

To improve packet throughput at a wireless receiver, enhanced robustness is provided at the physical layer of emerging wireless protocol stacks. Nevertheless, residual errors [3] not corrected by the physical layer cause checksum failures at higher (MAC and transport) layers, leading to a significant number of packet drops. To address this problem, a UDP Lite protocol was proposed in [4]. UDP Lite provides partial protection by a transport layer checksum that only covers a packet’s headers, while no checksum is provided for the data payload. Moreover, MAC layer checksum is disabled so that corrupted packets can be passed to higher layers. UDP Lite verifies the partial checksum and drops a packet only if it has errors in the transport or the application layer headers. In summary, UDP Lite based transport schemes ignore errors in the application layer payload, but drop all packets that have one or more bit-errors in the IP, the UDP, or the application layer headers.

Manuscript received July 12, 2005; revised September 4, 2006; accepted June 3, 2007. The associate editor coordinating the review of this paper and approving it for publication was Q. Zhang.

S. A. Khayam is with the NUST Institute of Information Technology (NIIT), National University of Sciences & Technology (NUST), Rawalpindi, 46000 Pakistan (e-mail: [email protected]).

H. Radha is with the Department of Electrical & Computer Engineering, Michigan State University (MSU), East Lansing, 48824 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TWC.2007.05532.

It has been shown that UDP Lite based partial protection with application layer forward error correction (FEC) improves wireless bandwidth utilization [4]–[10]. Support of partial protection necessitates changes to the standard protocols at the multimedia transmitter and/or intermediate network nodes. In many realistic scenarios, modifications to multimedia servers and/or intermediate nodes cannot be dictated by the end-receivers. We argue that the requirement of transmitter modifications in UDP Lite has hampered its widespread deployment. Furthermore, frequent header errors result in significant packet drops for UDP Lite, especially at high data rates¹.

In this paper, we remedy the above shortcomings by proposing a cross-layer header estimation methodology that only requires an accurate MAC layer bit-error channel model to estimate the corrupted critical header fields (CHF) of a packet while non-critical header fields are simply ignored. At a header estimation-based UDP multimedia receiver, the most likely transmitted CHF are estimated solely through channel parameters. The proposed scheme requires no modifications to the standard protocols at senders and/or intermediate nodes. Only minor protocol stack modifications are needed at the receiver. We map header estimation to a problem of maximum-likelihood (ML) estimation of known parameters in noise [11]. We derive likelihood functions for two important classes of wireless MAC layer channels: Markov [12] and multifractal wavelet [13] wireless channels. Trace-driven video simulations at varying data rates of an 802.11b LAN show that the proposed scheme provides significantly better throughput and multimedia quality than normal UDP and UDP Lite.

The rest of this paper is structured as follows. Section II provides background on Markov and multifractal channel models for wireless channels. Section III details the header estimation methodology. Section IV derives header estimation likelihood functions for the channel models under consideration. Section V compares the performance of ML header estimation with UDP and UDP Lite. Section VI summarizes key conclusions of this paper.

¹ In [7] the authors showed that under realistic settings of an 802.11b network, packets dropped by a UDP Lite based protocol stack are 5.87% and 36.7% at 5.5 and 11 Mbps, respectively.

1536-1276/07$25.00 © 2007 IEEE

II. BACKGROUND

A. Full-State Markov Chains

Wireless bit-error traces are generally represented as a binary time-series $\{x[i]\}_{i=1}^{l}$, where $x[i] \in \{0, 1\}$, zero represents an error-free bit, $l$ is the length of the time-series, and $i$ represents the bit time index. For a memory-length of $k$ bits, states of a full-state Markov (FSM) chain [15] correspond to all the $2^k$ possible combinations of $k$ consecutive bits. (Memory-length of a Markov chain is also referred to as its order.) Transition probabilities between states are computed by sliding a $k$-bit window over the data and counting the number of times a bit-pattern $[x_1, x_2, \ldots, x_k]$ is followed by another bit-pattern $[y_1, y_2, \ldots, y_k]$. Due to the binary nature of the FSM channel, in one transition an FSM chain in state $i$ can transit to either state $(2i) \bmod 2^k$ or state $(2i + 1) \bmod 2^k$. The authors’ prior work [15] showed that FSM chains of order-9 (512 states) and order-10 (1024 states) are required to characterize 802.11b bit-errors at 5.5 and 2 Mbps, respectively.
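The sliding-window counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `train_fsm`, the dictionary-based storage, and the use of empirical state-visit frequencies as a stand-in for steady-state probabilities are all choices made here, not details taken from [15].

```python
from collections import Counter


def train_fsm(bits, k):
    """Estimate order-k FSM probabilities from a 0/1 bit-error trace.

    Returns (trans, steady): trans[(s, t)] is the estimated probability of
    moving from state s to state t, and steady[s] is the fraction of
    transitions that started in state s (an assumed stand-in for the
    steady-state probability).
    """
    mask = (1 << k) - 1
    state = 0
    for b in bits[:k]:                      # first k bits give the initial state
        state = (state << 1) | b
    visits, moves = Counter(), Counter()
    for b in bits[k:]:
        nxt = ((state << 1) | b) & mask     # (2*state + b) mod 2^k
        visits[state] += 1
        moves[(state, nxt)] += 1
        state = nxt
    total = sum(visits.values())
    trans = {(s, t): c / visits[s] for (s, t), c in moves.items()}
    steady = {s: c / total for s, c in visits.items()}
    return trans, steady
```

Because each state has only two possible successors, a dictionary keyed by `(state, next_state)` stays sparse even for the order-9 and order-10 chains mentioned above.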

B. Multifractal Wavelet Model

A multifractal wavelet model (MWM) to analyze and model long-range dependent (LRD) network data was proposed in [13]. The MWM relies on the premise that network data are inherently non-negative and generally spiky. To capture these characteristics, the MWM employs the Haar wavelet family and applies a constraint that the input training data are always non-negative. For the Haar wavelet, the scaling and wavelet coefficients are computed recursively as

$$U_{j+1,2k} = \frac{U_{j,k} + W_{j,k}}{\sqrt{2}} \quad \text{and} \quad U_{j+1,2k+1} = \frac{U_{j,k} - W_{j,k}}{\sqrt{2}}, \qquad (1)$$

where $U_{j,k}$ and $W_{j,k}$ respectively represent the scaling and wavelet coefficients at time $k$ and scale/level $j$. With the Haar scaling function, the scaling coefficients are simply averaged versions of the input signal and thus, due to the non-negative nature of the training data, the scaling coefficients are always non-negative, $U_{j,k} \ge 0$. In the first equation of (1), to keep the next level’s scaling coefficients ($U_{j+1,2k}$’s) non-negative, the wavelet coefficients are constrained as $W_{j,k} \ge -U_{j,k}$. Similarly, to keep the $U_{j+1,2k+1}$’s non-negative, the wavelet coefficients are constrained as $W_{j,k} \le U_{j,k}$. Combining these two constraints gives a non-negativity constraint as

$$|W_{j,k}| \le U_{j,k}. \qquad (2)$$

The above constraint simply ensures that once the inverse transform is taken, the resultant process is always non-negative. Alternatively, the constraint can be implemented as

$$W_{j,k} = A_{j,k} U_{j,k}, \qquad (3)$$

where $A_{j,k}$ is a random variable defined over the interval $[-1, 1]$.
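To see why the multiplier form (3) guarantees non-negativity, the refinement in (1) can be run from a single coarse coefficient down to the finest scale. The sketch below is illustrative only: `haar_synthesize` and `draw_multiplier` are hypothetical names, and the caller supplies the distribution of $A_{j,k}$ (the fitted distributions used in [13] are not reproduced here).

```python
import math


def haar_synthesize(u_root, levels, draw_multiplier):
    """Refine one coarse scaling coefficient into 2**levels finer ones.

    draw_multiplier() must return a value A in [-1, 1]; each wavelet
    coefficient is then W = A * U as in (3), and the two children follow (1).
    """
    coeffs = [u_root]
    for _ in range(levels):
        finer = []
        for u in coeffs:
            a = draw_multiplier()
            w = a * u                                  # eq. (3): |W| <= U
            finer.append((u + w) / math.sqrt(2))       # U_{j+1,2k}
            finer.append((u - w) / math.sqrt(2))       # U_{j+1,2k+1}
        coeffs = finer
    return coeffs
```

For instance, calling `haar_synthesize(8.0, 4, lambda: rng.uniform(-1, 1))` with `rng = random.Random(0)` yields 16 values, none of them negative, because $u \pm au \ge 0$ whenever $u \ge 0$ and $|a| \le 1$. The uniform draw is an assumption made purely for the demonstration.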

In order to train the MWM to match the wireless bit-error traces, two random variables need to be characterized. The first random variable is the scaling coefficient at the coarsest scale $U_{j_0,k_0}$. The second set of random variables comprises the $A_{j,k}$’s at each level which in turn yield the wavelet coefficients $W_{j,k}$ at that level. Once a general sense of probability distribution is ascertained for these random variables, expectation-maximization is used to fit that distribution to the actual dataset. The training and synthesis algorithm is detailed in [13]. The complexity of synthesizing a length $N$ trace using the MWM is $O(N)$. In [15], it was shown that 802.11b 11 Mbps bit-errors can be modeled using an MWM.

III. THE HEADER ESTIMATION METHODOLOGY

In this paper, we focus on 802.11b wireless channels. Header estimation only estimates the so-called critical header fields (CHF) [16] that can uniquely identify a UDP multimedia session at a receiver and are not liable to change during the course of the multimedia session. In our experiments, we treat the following as CHF: (i) destination MAC address, (ii) source IP address, (iii) destination IP address, (iv) source port, and (v) destination port. Nevertheless, all mathematical treatment is provided for a general case of N critical fields. Also, the header estimation scheme presented here is different from [16] since it does not require any a priori information.

Under the proposed methodology, a list of active CHF (i.e., CHF of sessions that are currently being received) is provided to a header estimation module by the multimedia application(s). On receiving the first error-free packet of a new session, the multimedia application adds the new session’s CHF information to the list of active multimedia sessions. Whenever a corrupted packet is received, a likelihood score of its critical fields is computed with respect to each entry of the CHF list. The CHF rendering the highest likelihood are chosen as the estimated CHF of the received (corrupted) packet.

At this point, let us reiterate that the main objective of header estimation is to pass the maximum number of (error-free and corrupted) packets to the application layer using only parameters of a MAC layer bit-error channel model. We defer discussion on how an application can make use of the corrupted packets to subsequent sections.

A. Functionality at and below a Receiver’s MAC layer

Fig. 1 outlines the interactions between the proposed header estimation module and different layers of a wireless receiver’s protocol stack. The packets after wireless physical layer processing are passed to the MAC layer which verifies the packet’s checksum to determine if the received packet has errors. Instead of dropping a corrupted packet, the packet and its checksum information (i.e., packet passed/failed the checksum) are passed to a module that checks the transport type, the destination MAC address, and the destination IP address of the received packet. Header estimation is invoked only for UDP packets, while TCP and network layer traffic are handled by the conventional protocol stack. Furthermore, the MAC layer does not attempt retransmission-based recovery of corrupt UDP packets, i.e., ACKs are sent even for corrupt UDP packets. Instead of MAC retransmissions, header estimation with application layer FEC is used to recover from errors in the packet. Such retransmission-less recovery is well-suited for delay-sensitive real-time communications.

[Figure 1: block diagram. Blocks: Wireless Channel; Physical Layer; MAC Layer without UDP Pkt Drops; Header Estimation Module; Network and Transport Layers; Network and Transport Layers with Disabled Checksums; Application Layer. Labeled flows: received pkt after physical layer processing; error-free pkt; corrupt UDP pkt which has either dst MAC or dst IP address of local receiver; corrupt UDP pkt with estimated CHF; pkt after network and transport layer processing; list of active CHF; updated channel model parameters.]

Fig. 1. Interactions between the UDP-based header estimation module and different layers of a wireless receiver’s protocol stack; modified protocol stack layers are shown in broken-line boxes and dotted lines represent communications that are not related to packet reception.

Header estimation is invoked when all of the following conditions are satisfied: (i) a corrupt UDP packet is received, (ii) either the destination MAC or the destination IP address matches the local receiver’s addresses, and (iii) there are one or more active multimedia sessions on the receiver. Three scenarios exist when a packet is received:

1) Packet is error-free: No need to perform header estimation.

2) Packet is corrupt and the packet is intended for the local receiver: Header estimation is invoked and an ACK is sent to the last hop network entity to avoid MAC layer retransmissions.

3) Packet is corrupt and the packet is not intended for the local receiver: This case represents a false alarm when, due to channel errors, either destination MAC or destination IP of a packet not intended for the local receiver gets mapped to the MAC or IP address of a receiver. Due to the receiver-based nature of the present scheme, false alarms cannot be detected at a receiver’s MAC layer. Thus header estimation is invoked even for false alarm packets, and a MAC layer ACK is sent to the last hop network entity.
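The three invocation conditions listed above amount to a short predicate. The following Python sketch is a hypothetical rendering: the `Frame` record and its field names are invented for illustration and do not correspond to any driver interface described in the paper.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical view of a received frame after the MAC checksum test."""
    checksum_ok: bool   # True if the MAC layer checksum passed
    is_udp: bool        # transport type read from the (possibly corrupt) header
    dst_mac: str
    dst_ip: str


def needs_header_estimation(frame, local_mac, local_ip, num_active_sessions):
    """Return True only when conditions (i)-(iii) all hold."""
    if frame.checksum_ok or not frame.is_udp:
        return False                                   # (i) corrupt UDP packet
    addressed_here = (frame.dst_mac == local_mac
                      or frame.dst_ip == local_ip)     # (ii) MAC or IP matches
    return addressed_here and num_active_sessions >= 1  # (iii) active session
```

Note that the predicate cannot tell scenario 2 from scenario 3: a false-alarm packet whose corrupted address happens to match the receiver passes the same test, which is why detection is deferred to higher layers.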

B. The Header Estimation Module

The header estimation module employs a likelihood function to find the most likely transmitted CHF given: (i) the received CHF, (ii) a list of active CHF, and (iii) parameters of the MAC layer error channel model. The list of active CHF is provided by the receiver’s application layer as shown in Fig. 1. The transmitted/active CHF that renders the maximum value of the likelihood function is chosen as the estimated CHF. The corrupt packet and the estimated CHF are passed to higher layers. In essence, the present header estimation problem is the estimation-theoretic problem of maximum-likelihood (ML) estimation of known parameters in noise [11]. Details of the estimation technique are provided in the following section.

C. Processing at a Receiver’s Network, Transport and Application Layers

The corrupted packets along with the estimated CHF are passed by the header estimation module to the receiver’s network layer. The network layer performs its regular operation with two modifications: (a) instead of the (possibly corrupted) IP addresses in the network layer header, the estimated IP addresses are treated as the true IP addresses; (b) network layer checksum on IP headers is disabled. At the UDP layer, source and destination ports are taken from the estimated CHF and the corrupted packets are passed to the (estimated) multimedia application.

IV. LIKELIHOOD FUNCTIONS FOR HEADER ESTIMATION

In this section, we derive header estimation likelihood functions for two important classes of MAC layer wireless channel models, namely the full-state Markov (FSM) model and the multifractal wavelet model (MWM). Let $\Lambda_i = \{x_{i1}, x_{i2}, \ldots, x_{iN}\}$ denote an ordered set of $N$ critical header fields for an arbitrary multimedia session $i$. For the simulations in this paper we have $N = 5$, where $x_{i1}$, $x_{i2}$, $x_{i3}$, $x_{i4}$, and $x_{i5}$ correspond to the destination MAC, source IP, destination IP, source port, and destination port of multimedia session $i$, respectively. A receiver receives $M \ge 1$ simultaneous multimedia streams. Let $\Omega = \{\Lambda_1, \Lambda_2, \ldots, \Lambda_M\}$ denote an unordered set of CHF, each corresponding to a currently active multimedia session on a given receiver. Note that each $\Lambda_i = \{x_{i1}, x_{i2}, \ldots, x_{iN}\} \in \Omega$ is in turn a set of critical fields corresponding to a given session, where the first subscript of $x$ is the session index and the second subscript is the CHF index. Let $\tilde{\Lambda}_r$ denote the set of CHF of a received packet, i.e., $\tilde{\Lambda}_r = \{\tilde{x}_{r1}, \tilde{x}_{r2}, \ldots, \tilde{x}_{rN}\}$ is a possibly corrupted version of a $\Lambda_i \in \Omega$. $\hat{\Lambda}_r$ denotes the estimated CHF.

Let $X$ represent a stochastic MAC layer channel model characterizing the bit-error channel over which a receiver is receiving its packets. Then, for a critical header field $x_{ij}$ (i.e., critical field $j$ for a multimedia session $i$), our objective is to derive the likelihood function $\Pr\{\tilde{x}_{rj} \mid x_{ij}, X\}$ in terms of the parameters of $X$. In other words, given parameters of a channel model $X$, we want to find the likelihood that a transmitted critical header field $x_{ij}$ (after possible channel corruptions) was received as $\tilde{x}_{rj}$. We assume that the likelihood functions of all CHF are independent. Thus $\Pr\{\tilde{x}_{rj} \mid x_{ij}, X\}$’s for each critical field can be ascertained independently and then the overall likelihood considering all critical fields is:

$$\Pr\{\tilde{\Lambda}_r \mid \Lambda_i, X\} = \prod_{j=1}^{N} \Pr\{\tilde{x}_{rj} \mid x_{ij}, X\}, \qquad (4)$$

where $1 \le i \le M$ is the session index and $j$ is the CHF index. Once $\Pr\{\tilde{\Lambda}_r \mid \Lambda_i, X\}$ has been computed for all $1 \le i \le M$, the CHF estimate $\hat{\Lambda}_r$ is simply the $\Lambda_i$ that renders the maximum $\Pr\{\tilde{\Lambda}_r \mid \Lambda_i, X\}$.

The challenge of this ML-based header estimation lies in the derivation of a likelihood function $\Pr\{\tilde{x}_{rj} \mid x_{ij}, X\}$ of a critical field, given parameters of a wireless channel model. In the following sections, we derive likelihood functions of two important channel models.
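The product in (4) and the subsequent maximization over sessions are independent of the channel model, so they can be sketched with the per-field likelihood passed in as a function. In the Python sketch below every name is hypothetical, fields are represented as equal-length bit strings for convenience, and `toy_field_likelihood` is a placeholder memoryless model used only so the example runs; it is not one of the likelihood functions derived in this paper.

```python
import math


def session_likelihood(received, candidate, field_likelihood):
    """Eq. (4): product of per-field likelihoods over the N critical fields."""
    return math.prod(field_likelihood(r, x) for r, x in zip(received, candidate))


def estimate_chf(received, active_chf, field_likelihood):
    """Return (best_candidate, likelihood) over all active sessions."""
    scored = [(session_likelihood(received, c, field_likelihood), c)
              for c in active_chf]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def toy_field_likelihood(received_bits, sent_bits, p=0.1):
    """Placeholder: independent bit errors with an assumed probability p."""
    d = sum(a != b for a, b in zip(received_bits, sent_bits))
    return (p ** d) * ((1.0 - p) ** (len(sent_bits) - d))
```

With two active sessions `("1010", "0000")` and `("0101", "1111")`, a received pair `("1011", "0000")` is attributed to the first session, since it differs from it in a single bit. For long fields the raw product can underflow, so a practical variant would likely sum log-likelihoods instead.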

A. Header Estimation Likelihood Function for FSM Chains

Errors and losses over many contemporary wireless channels have been shown to be Markov in nature (see [15] and references therein). These channels are modeled using full-state Markov (FSM) chains. In this section, we derive the CHF likelihood function for a $k$-th order FSM chain $X_n$, where $n$ is the bit time index. We focus on one arbitrary critical field $x_{ij} \in \Lambda_i$ by fixing the CHF index $j$. Henceforth $x_i$ and $\tilde{x}_r$ respectively represent the critical field $j$ from $\Lambda_i$ and the received critical field $j$. Let us define a new variable:

$$z_i = \tilde{x}_r \oplus x_i, \qquad (5)$$

where $\oplus$ represents a binary XOR operation. $z_i$ comprises bit locations that are different between $\tilde{x}_r$ and $x_i$. Assuming that the different bits are in fact the errors introduced by the FSM channel, $\Pr\{\tilde{x}_r \mid x_i, X_n\}$ is the likelihood of observing error pattern $z_i$ on the channel.

Recall from Section II-A that an FSM chain in state $v_i$ can only transit to two FSM states, $2v_i + 0$ or $2v_i + 1$; all FSM states are mod $2^k$. Thus, when the bit added to $2v_i$ is $z_i[k+1]$ (the bit at the $(k+1)$-st location of $z_i$), the FSM chain transits to state $v_i \to 2v_i + z_i[k+1]$. From state $2v_i + z_i[k+1]$, the process will transit to $2(2v_i + z_i[k+1]) + z_i[k+2] = 4v_i + 2z_i[k+1] + z_i[k+2]$. Using similar logic, the process will next transit to

$$2(4v_i + 2z_i[k+1] + z_i[k+2]) + z_i[k+3] = 8v_i + 4z_i[k+1] + 2z_i[k+2] + z_i[k+3].$$

Generalizing this notion yields the following recursive multiplicative expression for the FSM likelihood function:

$$\Pr\{\tilde{x}_r \mid x_i, X_n\} = \pi_{v_i} \Pr\{v_i \to (2v_i + z_i[k+1])\} \prod_{a=2}^{W-k-1} \Pr\left\{ \left(2^{a-1} v_i + \sum_{b=0}^{a-2} 2^{a-2-b} z_i[k+1+b]\right) \to \left(2^{a} v_i + \sum_{b=0}^{a-1} 2^{a-1-b} z_i[k+1+b]\right) \right\}, \qquad (6)$$

where all state indices are mod $2^k$, $n$ is the bit time index, $W$ represents the number of bits in the critical field, $\pi_x$ represents the steady-state probability of being in FSM state $x$, and $\Pr\{x \to y\}$ is the transition probability of going from FSM state $x$ to state $y$.

The FSM likelihood function given above answers the following question: What is the probability that channel errors have changed $x_i$ to $\tilde{x}_r$? Since $z_i = \tilde{x}_r \oplus x_i$ denotes the bit pattern that would be observed if the channel changed $x_i$ to $\tilde{x}_r$, we have to find the probability that the channel $X_n$ produced the bit-error pattern $z_i$. Clearly, the FSM channel’s initial state must be $v_i$, where $v_i$ denotes the FSM state represented by the first memory-window of $z_i$, leading to the $\pi_{v_i}$ term. This initial state must be followed by a unique sequence of state transitions that result in the bit-error pattern $z_i$. To quantify the probability that an FSM channel will follow this “unique state sequence”, recall that in one transition the FSM process can only transit to two possible states. Also, due to the well-known Markov property, the probability of transiting to one of the two possible states is only dependent on the present state. The final likelihood score of $x_i$ is hence characterized by a multiplication of the transition probabilities of this unique state sequence, as represented by the multiplicative $\Pr\{x \to y\}$ terms in the likelihood function.
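The walk along the unique state sequence can be written as a short loop. The Python sketch below is one possible reading of the FSM likelihood, under assumptions: it uses 0-based indexing, takes the first $k$ bits of $z_i$ as the initial state, and consumes every remaining bit of the field as one transition. The function name and the dictionary layout of `steady` and `trans` are hypothetical.

```python
def fsm_likelihood(received, candidate, k, steady, trans):
    """Likelihood that an order-k FSM channel turned `candidate` into `received`.

    received, candidate: equal-length lists of 0/1 bits (length W > k).
    steady[s]:      steady-state probability of FSM state s.
    trans[(s, t)]:  transition probability from state s to state t.
    Unseen states or transitions are treated as probability 0 (an assumption).
    """
    z = [r ^ x for r, x in zip(received, candidate)]   # eq. (5): error pattern
    mask = (1 << k) - 1
    state = 0
    for bit in z[:k]:                                  # first memory-window of z
        state = (state << 1) | bit
    likelihood = steady.get(state, 0.0)                # the pi_{v_i} term
    for bit in z[k:]:
        nxt = ((state << 1) | bit) & mask              # (2*state + bit) mod 2^k
        likelihood *= trans.get((state, nxt), 0.0)
        state = nxt
    return likelihood
```

As a small check with an order-1 chain where `steady = {0: 0.9, 1: 0.1}` and `trans = {(0, 0): 0.95, (0, 1): 0.05, (1, 0): 0.45, (1, 1): 0.55}`, the error pattern `[0, 0, 1, 0]` scores $0.9 \times 0.95 \times 0.05 \times 0.45$.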

B. Header Estimation Likelihood Function of MWM

The multifractal wavelet model (MWM) [13] captures self-similar and long-range-dependent network phenomena. As explained in Section II-B, MWM uses expectation-maximization to model two random variables: (i) the scaling coefficient at the coarsest scale $U_{j_0,k_0}$, where $j_0$ and $k_0$ represent the coarsest scale and time, respectively; (ii) $A_{j,k}$ random variables defined over a $[-1, 1]$ interval, $j$ and $k$ representing the scale and time, respectively. In [15], we showed that the 11 Mbps bit-errors have long-range dependence which can be captured using the MWM. We used the number of bit-errors in a packet as the training sequence for the MWM.

It was shown in [13] that due to the use of the Haar wavelet transform, the MWM-predicted number of errors $e[n]$ in packet $n$ can be expressed as $e[n] = 2^{-j/2} U_{j,k}$ for $n = 0, 1, \ldots, 2^j - 1$. If the packets have a fixed size $C$, then the probability of bit-errors in the packet received at packet time index $n$ is $p[n] = e[n]/C = 2^{-j/2} U_{j,k}/C$. Now note that each received bit is basically a value taken from a binary time series of length $l$, i.e., $\{x[i]\}_{i=0}^{l}$, $x[i] \in \{0, 1\}$, and $i$ represents the discrete bit time index. Based on equation (5), $\sum_{a=1}^{W} z_i[a]$ yields the total number of bits that are different between $\tilde{x}_r$ and $x_i$, i.e., the Hamming distance between $\tilde{x}_r$ and $x_i$. If the bits of $z_i$ are in fact the errors introduced by an MWM wireless channel, then the probability of having $\sum_{a=1}^{W} z_i[a]$ errors is $(p[n])^{\sum_{a=1}^{W} z_i[a]}$, and the probability of having $W - \sum_{a=1}^{W} z_i[a]$ correct bits is $(1 - p[n])^{W - \sum_{a=1}^{W} z_i[a]}$. Likelihood of the bit pattern $z_i$ is then a multiplication of the above events. Thus for a critical header field $\tilde{x}_r$ received over an MWM channel $X_n$, the likelihood of a known critical header field $x_i$ corresponding to session $i$ is

$$\Pr\{\tilde{x}_r \mid x_i, X_n\} = \left(\frac{2^{-j/2} U_{j,k}}{C}\right)^{\sum_{a=1}^{W} z_i[a]} \left(1 - \frac{2^{-j/2} U_{j,k}}{C}\right)^{W - \sum_{a=1}^{W} z_i[a]} \qquad (7)$$

where $n = 0, 1, \ldots, 2^j - 1$ is the packet time index, $j$ is the number of scales used to train the MWM, $C$ is the number of bits in a packet, $W$ is the number of bits in the critical field, and $U_{j,k}$ is the scaling coefficient at scale $j$ and time $k$.

Similar to (6), the MWM likelihood function renders the probability that the bit-error pattern $z_i = \tilde{x}_r \oplus x_i$ is observed on an MWM channel. In the above derivation, we discussed that the probability of bit-error in packet $n$ is given by $2^{-j/2} U_{j,k}/C$. Thus the probability of observing $\sum_{a=1}^{W} z_i[a]$ bit-errors in packet $n$ is $(p[n])^{\sum_{a=1}^{W} z_i[a]} = \left(2^{-j/2} U_{j,k}/C\right)^{\sum_{a=1}^{W} z_i[a]}$. Treating error-free and corrupted bits as the two outputs of a Bernoulli random variable yields the MWM likelihood expression.


Once $\Pr\{\tilde{x}_r \mid x_i, X_n\}$’s for all currently active sessions, $1 \le i \le M$, are computed using the FSM or MWM likelihood functions, the session $i$ that renders the maximum $\Pr\{\tilde{\Lambda}_r \mid \Lambda_i, X\}$ is chosen as the estimated CHF, $\hat{\Lambda}_r$. We also introduce a provision that if the maximum likelihood is less than 0.25 then the packet is dropped since the estimation confidence is very low.
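The MWM likelihood and the 0.25 drop provision can be illustrated together. The Python sketch below uses hypothetical function names and represents fields as equal-length bit sequences; how the scaling coefficient is obtained from a trained MWM is outside its scope, and how the threshold interacts with the product over fields is not specified here, so `pick_session` simply compares whatever scores it is given.

```python
def bit_error_probability(u_jk, j, packet_bits):
    """p[n] = 2^(-j/2) * U_{j,k} / C for a packet of C = packet_bits bits."""
    return (2.0 ** (-j / 2.0)) * u_jk / packet_bits


def mwm_field_likelihood(received, candidate, p_bit):
    """Eq. (7): p^d * (1 - p)^(W - d), with d the Hamming distance."""
    w = len(candidate)
    d = sum(r != x for r, x in zip(received, candidate))
    return (p_bit ** d) * ((1.0 - p_bit) ** (w - d))


def pick_session(scores, threshold=0.25):
    """Index of the highest-scoring session, or None if the packet is dropped."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None
```

For example, with $U_{j,k} = 32$, $j = 4$ and 4096-bit packets, `bit_error_probability` returns $8/4096 \approx 0.00195$; `pick_session([0.1, 0.6, 0.3])` selects session index 1, while `pick_session([0.1, 0.2])` returns `None` because no candidate reaches the threshold.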

C. Complexity Reduction for Effective Header Estimation

The complexity of an MWM to generate a length $l$ sequence is linear. However, the complexity of FSM chains grows exponentially with respect to memory-length. Such exponential complexity of FSM chains requires very high computational power and memory at the receiver, both scarce resources in complexity- and power-constrained wireless environments. Thus, FSM chains in their present form are unreasonably complex to be employed in the header estimation framework on resource-constrained wireless devices.

In the Appendix, we derive a new class of folded Markov chains (FMCs) which aggregate similar FSM states to create a lumped process. Using the FMC state aggregation procedure, any $2^k$-state FSM chain can be reduced to a $2^m$-state Markov chain, where $m < k$. The FSM chain likelihood function given in (6) can be easily extended to the FMCs because FMCs retain the inherent structure of the FSM transition probability matrix, i.e., each row of the FMC transition probability matrix has only two non-zero entries. Thus, using (6) the likelihood function for an order-$m$ FMC can be rewritten as:

$$\Pr\{\tilde{x}_r \mid x_i, X_n\} = \pi_{S_{v_i}} \Pr\left\{S_{v_i} \to S_{2v_i + z_i[k+1]}\right\} \prod_{a=2}^{W-m-1} \Pr\left\{ S_{2^{a-1} v_i + \sum_{b=0}^{a-2} 2^{a-2-b} z_i[k+1+b]} \to S_{2^{a} v_i + \sum_{b=0}^{a-1} 2^{a-1-b} z_i[k+1+b]} \right\}, \qquad (8)$$

where the subscripts of all aggregate FMC states $S_x$ are modulo $2^m$ and all other parameters are defined in equation (6). The low complexity of FMCs makes them a natural alternative to FSM chains in the present header estimation methodology. In all subsequent performance evaluations of the header estimation methodology, we use FMCs instead of FSM chains and show that the likelihood function rendered by the FMCs is highly accurate.

D. Practical Considerations

The number of multimedia sessions on a receiver is a function of the multimedia application. If the number of multimedia sessions is small, a simple estimator (using, for example, a distance measure with a memory-less channel assumption) should be sufficient. The likelihood functions derived in previous sections are merely exemplary in nature, detailing how different channel models can be leveraged for header estimation. The main message of this paper is that UDP with header estimation over wireless channels improves application layer throughput.
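As a brief worked illustration of the simple estimator mentioned above, suppose (purely as an assumption for this sketch) that bit errors are independent with a fixed probability $p$, $0 < p < 1/2$. Let $d_{ij}$ be the Hamming distance between the received and candidate field $j$ of session $i$, let $D_i$ be the total distance over all fields, and let $T$ be the total number of CHF bits; these symbols are introduced here and do not appear elsewhere in the paper.

```latex
\begin{align*}
\Pr\{\tilde{\Lambda}_r \mid \Lambda_i\}
  &= \prod_{j=1}^{N} p^{\,d_{ij}} (1-p)^{W_j - d_{ij}}
   = p^{\,D_i} (1-p)^{T - D_i}
   = (1-p)^{T} \left(\frac{p}{1-p}\right)^{D_i}, \\
D_i &= \sum_{j=1}^{N} d_{ij}, \qquad T = \sum_{j=1}^{N} W_j .
\end{align*}
% Since 0 < p/(1-p) < 1 for p < 1/2, the likelihood is strictly
% decreasing in D_i, so
\[
\hat{\Lambda}_r = \arg\max_{i} \Pr\{\tilde{\Lambda}_r \mid \Lambda_i\}
               = \arg\min_{i} D_i .
\]
```

Under this memory-less assumption the ML estimate therefore reduces to picking the active session at minimum total Hamming distance, and no channel parameter needs to be estimated beyond knowing that $p < 1/2$.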

The cost of using a channel model for ML header estimation is incurred in terms of the model’s training complexity. Both MWM and FMC have linear complexity with respect to the training data. For stationary channels, this training has to be done only once and hence does not pose a complexity issue. As a final comment, note that the header estimation methodology does not require any hardware modifications. Thus header estimation can be easily implemented on an open-source protocol stack. Even for a closed-source protocol stack, the proposed modules can be implemented by opening a raw socket on the link layer and implementing a simple packet processor above the socket.

V. PERFORMANCE EVALUATION OF THE HEADER ESTIMATION METHODOLOGY

A. Experimental Setup

1) Wireless Trace Collection: For this study, five wireless receivers were used to simultaneously collect error traces on an 802.11b LAN. The receivers were placed at different locations in a room, while the access point (AP) was placed in a room across a hallway from the receivers to simulate a realistic home/classroom/office setting. The receivers’ MAC layer device drivers were modified to pass corrupted packets to higher layers. To capture packets at high transmission rates, packet dissectors were implemented inside the device drivers. These packet dissectors ensured that only packets pertinent to our wireless experiment were processed, while all other packets were simply dropped. Each experiment comprised one million packets with a payload of 1,000 bytes each, i.e., each trace has approximately 1 GB of data.

A wired sender was used to send multicast packets with a predetermined payload on the wireless LAN; multicasting disabled MAC layer retransmissions. The sender used different transmission rates ranging from 500 Kbps to 1 Mbps for each experiment. At the physical layer, the auto rate selection feature of the AP was disabled and for each experiment the AP was forced to transmit at a fixed data rate. Each trace collection experiment was repeated multiple times at 2, 5.5 and 11 Mbps physical layer data rates and at different times of day.

Table I provides some statistics of the traces collected for this study. As expected, the average packet error rate increases with an increase in the physical layer data rate. (Note that these packet errors directly correspond to packet drops for the normal UDP protocol stack.) In particular, the average packet error rate increases from approximately 10% at 5.5 Mbps to almost 40% at 11 Mbps. Thus normal UDP experiences profound losses at 11 Mbps, thereby presenting a potential for considerable improvement. Since the wireless receivers were placed at different locations, the receivers experienced different packet error rates. The minimum and maximum error rates in Table I outline that the receivers were experiencing both good and bad link conditions.

TABLE I
STATISTICS OF ERROR TRACES USED IN THIS STUDY

Data rate    Avg Pkt Err Rate (PER)    Min PER    Max PER
2 Mbps       5.97%                     0.75%      14.31%
5.5 Mbps     9.79%                     0.61%      22.74%
11 Mbps      39.5%                     10.99%     77.83%

2) Video Simulations using Wireless Traces: For video evaluations, we report throughput, FEC and PSNR results for five multimedia receivers. Each receiver receives multiple video streams with a maximum of five video streams. At each physical layer data rate, we repeat video experiments using three distinct wireless trace-sets that were collected at different times of day. Video experiments for each trace-set are repeated 25 times starting at different randomly selected locations inside the error traces. Thus the throughput and FEC results for 2, 5.5 and 11 Mbps are each averaged over 3 × 5 × 5 × 25 = 1,875 received video streams. Due to the high complexity of video decoding, for each trace-set the PSNR results are reported for one (randomly selected) video experiment, that is, PSNR results for 2, 5.5 and 11 Mbps are each averaged over 3 × 5 × 5 = 75 received video streams.

For each packet transmission, a 512-byte packet (452 bytes of video payload and 60 bytes of headers) was corrupted using the bit-error traces. The models used for likelihood computation on all receivers were trained using error traces which were not used in the video experiments. In accordance with [15], FSM chains of order-9 (512 states) and order-10 (1024 states) were employed for the 5.5 and 2 Mbps bit-error processes, and an MWM was used for the 11 Mbps process. Each FSM chain was folded to a 4-state FMC.

Video sequences were compressed using the latest available version of the H.264 video standard [1], [17]. The sequences had a QCIF frame size and were encoded at a frame-rate of 30 fps. The streams were encoded at different source coding bitrates ranging from 100 kbps to 1 Mbps. A slice mode with a fixed number of 452 bytes per slice was used for encoding [1]. Intra frame period was set to 12, i.e., each group of pictures (GOP) had 12 frames. Varying numbers of video streams were assigned to the wireless receivers. Transmission of packets from each stream was simulated in a round robin fashion according to source bitrates. In order to achieve successful video decoding, in the simulations we introduced a provision that the first frame of the video sequence (i.e., the very first I-frame of the first GOP) is always received correctly.

B. Throughput Performance

The term throughput here refers to the ratio of the total number of packets relayed to the receiver’s application to the total number of packets sent by the sender’s application layer. That is, throughput comprises both error-free and corrupted packets. The percentage packet drop rate is (1 − throughput) × 100. Fig. 2 outlines the packet drops incurred by UDP Normal, UDP Lite and UDP with header estimation at 2, 5.5, and 11 Mbps. The results are averaged over all receivers and multimedia streams and hence the packet drops are referred to as average packet drops. The leftmost points in Fig. 2(a), (b), and (c) depict the simplest case where each receiver is receiving only one multimedia stream. The number of video streams per receiver is then incremented. More than one multimedia stream per receiver is an important scenario for video conferencing applications.

1) Comparison of Packet Drops: It can be clearly seen in Fig. 2 (a), (b) and (c) that header estimation always incurs fewer packet drops than normal UDP and UDP Lite. The header estimation packet drops include: (i) packets that were dropped because both the destination IP and the destination MAC address were corrupted, and (ii) packets whose critical fields were incorrectly estimated (resulting in false alarms). At 2 Mbps, header estimation packet drops are approximately 0.2%, as opposed to approximately 0.4% and 1% in case of UDP Lite and normal UDP, respectively. From Table I it can be seen that the 2 Mbps channel has receivers with very low packet error rates and therefore the margin of improvement is small. At 5.5 Mbps, UDP with header estimation provides approximately 4% and 2% throughput improvements over normal UDP and UDP Lite, respectively. Due to the very high data rate at 11 Mbps, the header estimation packet drops increase to about 3%, but this packet drop rate is still substantially lower than that of UDP Lite (≈ 15%) and normal UDP (≈ 30%).

2) False Alarm Rate: A false alarm is a packet that is not intended for a multimedia session, but is relayed to that session. There are three sources of false alarms: (i) due to channel errors, either destination MAC or destination IP address of a packet (not intended for the local receiver) gets mapped to the MAC or IP address of the receiver; (ii) a corrupted packet is inaccurately estimated; (iii) a corrupt non-multimedia UDP packet is received when one or more multimedia sessions are active.

For the five streams per receiver case, cumulative false alarm rates are 0.07%, 0.52%, and 1.3% at 2, 5.5 and 11 Mbps. While these false alarm rates are quite low, false alarms must be detected because they can desynchronize the video and/or FEC decoders. To detect false alarms, we protected the 2 byte H.264 slice sequence numbers with 4 bytes of redundancy to ensure that these sequence numbers can always be recovered at the receiver. A receiver dropped all packets whose slice numbers were much larger or smaller than the next/expected slice number. For applications which do not have a slice/packet sequence number, a small incremental packet sequence number with parity bytes can be easily inserted into each packet by the sender’s application layer. This sequence number based scheme also provides erasure locations (i.e., dropped packets) to the FEC decoder.
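The sequence-number check described above can be sketched as follows. This is only an illustration: the text says packets are dropped when the slice number is "much larger or smaller" than expected but gives no threshold, so the window of 64 is a hypothetical value, and the wraparound handling assumes the 2 byte sequence number counts modulo 2^16. The function names are also illustrative.

```python
def accept_slice(slice_number, expected, window=64, modulus=1 << 16):
    """Return True if slice_number is within +/- window of expected.

    window is a hypothetical threshold; distances are measured
    modulo 2^16 because the sequence number is 2 bytes long.
    """
    forward = (slice_number - expected) % modulus
    backward = (expected - slice_number) % modulus
    return min(forward, backward) <= window


def erasure_locations(accepted_numbers, first, count):
    """Sequence numbers in [first, first + count) that were never accepted.

    These positions can be handed to the FEC decoder as erasures.
    Wraparound inside the range is ignored in this sketch.
    """
    seen = set(accepted_numbers)
    return [n for n in range(first, first + count) if n not in seen]
```

A packet carrying slice number 40000 when slice 100 is expected would be rejected as a likely false alarm, while slice 105 would be accepted.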

C. FEC Performance

We now evaluate the amount of FEC redundancy required by the application to recover from errors and packet drops in the multimedia content. Since the corrupted packets contain many error-free bytes, this error-free data should facilitate application layer FEC decoding. It is well-known that for a maximum-distance separable (MDS) FEC code, if a codeword has $2t$ redundant symbols then a maximum of $t$ transmission errors in that block can be corrected [18]. For the same amount of redundancy, $2t$ erasures can be recovered. In the UDP Lite and UDP with header estimation scenarios, for an FEC codeword with $e_1$ erasures (i.e., packet drops) and $e_2$ errors, if $e_1 \le 2t$ then the FEC decoding algorithm can recover the $e_1$ erasures. After erasure decoding, $e_2$ errors can be corrected if $e_2 \le \lfloor (2t - e_1)/2 \rfloor$.
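The two-stage condition above (erasures first, then errors with the remaining parity) can be written as a short check. This is a sketch of the counting condition only, not of an RS decoder; the function and parameter names are illustrative.

```python
def is_decodable(parity_symbols, erasures, errors):
    """Errors-and-erasures condition for an MDS code.

    parity_symbols is 2t, erasures is e1, errors is e2.
    Erasures are recoverable if e1 <= 2t; the remaining parity
    corrects errors if e2 <= floor((2t - e1) / 2).
    """
    if erasures > parity_symbols:
        return False
    return errors <= (parity_symbols - erasures) // 2
```

With six parity symbols, two erasures plus two errors are recoverable, but three erasures plus two errors are not.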

Fig. 2. Average packet drops for UDP Normal, UDP Lite and UDP with Header Estimation at different data rates and for varying number of video streams per receiver; each point is averaged over 3 × (# of video streams) × 5 × 25 received video streams. [Panels: (a) 2 Mbps, (b) 5.5 Mbps, (c) 11 Mbps. Axes: video streams per receiver (1 to 5) versus average packet drops (%). Curves: UDP Normal, UDP Lite, UDP Hdr Est.]

Fig. 3. RS codeword construction for video FEC simulations. [Diagram: packets 1 to 30, each with a packet header and a 452 byte payload; RS codewords 1 to 452 each take one byte from every packet. A packet drop will introduce an erasure in all the 452 RS codewords.]

We simulate Reed-Solomon (RS) forward error correction with a joint error-erasure decoding algorithm for all three (UDP Normal, UDP Lite, UDP with header estimation) protocol variants. An RS codeword length of N = 30 bytes is used for all experiments. Each RS codeword is composed of one byte from a different packet, where each packet consists of 452 bytes of data payload. Thus each packet contributes to 452 separate RS codewords, and each codeword spans over 30 packets. The FEC construction is shown pictorially in Fig. 3. For all protocol stack variants, we treat packet drops as erasures in the received codewords. Note in Fig. 3 that a packet drop results in an erasure in 452 codewords.
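The interleaved construction can be illustrated with a short Python sketch that reads a block of 30 packets column-wise into 452 codewords. It covers only the byte-to-codeword mapping and the effect of a packet drop; RS encoding and decoding are not implemented, and which of the 30 positions carry parity is left unspecified. The names `codewords_from_block`, `PAYLOAD_BYTES` and `CODEWORD_LEN` are illustrative.

```python
PAYLOAD_BYTES = 452  # payload bytes per packet, as stated in the text
CODEWORD_LEN = 30    # RS codeword length N, i.e. packets per FEC block


def codewords_from_block(packets):
    """Read one FEC block column-wise.

    packets: list of CODEWORD_LEN entries, each a bytes object of
    PAYLOAD_BYTES bytes, or None for a dropped packet.
    Returns PAYLOAD_BYTES codewords; each is a list of CODEWORD_LEN
    symbols in which None marks an erasure.
    """
    if len(packets) != CODEWORD_LEN:
        raise ValueError("an FEC block holds exactly CODEWORD_LEN packets")
    for p in packets:
        if p is not None and len(p) != PAYLOAD_BYTES:
            raise ValueError("each payload must hold PAYLOAD_BYTES bytes")
    # Codeword j takes byte j of every packet in the block.
    return [
        [None if p is None else p[j] for p in packets]
        for j in range(PAYLOAD_BYTES)
    ]
```

Dropping a single packet from the block leaves exactly one erased symbol, at the same position, in every one of the 452 codewords.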

Since normal UDP does not have corrupted packets, all parity bytes are used for erasure decoding. Unlike normal UDP, FEC codewords for UDP with header estimation have errors due to corrupted packets and erasures due to incorrect estimations and/or false alarms. Similarly, FEC codewords for UDP Lite have errors due to corrupted packets and erasures due to packets with corrupted headers. For performance evaluation, we define a simple measure called decodable probability:

$$p_d = \frac{\text{decodable codewords received}}{\text{codewords transmitted}},$$

where a codeword with $e_1$ erasures and $e_2$ errors is decodable only if $2t \ge e_1 + 2e_2$. Clearly, $0 \le p_d \le 1$ and $p_d = 1$ implies that all received codewords were successfully decoded.
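The decodable probability can be computed directly from per-codeword erasure and error counts. The sketch below assumes those counts are already known (in a simulation they would come from comparing received and transmitted codewords); the function name and the input format are illustrative.

```python
def decodable_probability(codewords, parity_symbols):
    """Fraction of transmitted codewords that satisfy 2t >= e1 + 2*e2.

    codewords: iterable of (erasures, errors) pairs, one per
    transmitted codeword. parity_symbols is 2t.
    """
    counts = list(codewords)
    if not counts:
        raise ValueError("at least one codeword is required")
    decodable = sum(
        1 for e1, e2 in counts if parity_symbols >= e1 + 2 * e2
    )
    return decodable / len(counts)
```

For instance, with two parity symbols a codeword with one erasure and one error is not decodable, while a codeword with two erasures or with a single error is.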

Fig. 4 outlines the decodable probability as a function of the number of message bytes in an RS codeword for the five streams per receiver experiment. At each data rate, the results are averaged over all the experiments. From Fig. 4(a), it is clear that at 2 Mbps normal UDP and UDP Lite require 6 redundant bytes per RS codeword for almost 100% recovery; that is, approximately 20% bandwidth is wasted in redundancy. UDP with header estimation achieves almost error-free recovery even if two redundant bytes are sent per 28 message bytes; that is, approximately 7% bandwidth is used for redundant symbols. From Fig. 4(b), it can be observed that, due to the increased error-rate at 5.5 Mbps, the performance gap between UDP with header estimation and the other protocols widens. Normal UDP and UDP Lite waste approximately 33% bandwidth on FEC redundancy to achieve almost 100% recovery. UDP with header estimation achieves almost 100% recovery by wasting much less (about 20%) bandwidth on FEC redundancy. Fig. 4(c) shows that at 11 Mbps the improvements provided by UDP with header estimation are quite significant; UDP with header estimation requires approximately 27% redundancy for almost 100% recovery, while both normal UDP and UDP Lite require approximately 53% redundancy. Thus header estimation salvages the high error rate 11 Mbps channel.
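The redundancy percentages quoted above follow from the fixed codeword length of 30 bytes. A small sketch of that arithmetic, with an illustrative function name:

```python
def redundancy_percent(parity_bytes, codeword_len=30):
    """Share of each RS codeword spent on parity, in percent."""
    if not 0 <= parity_bytes <= codeword_len:
        raise ValueError("parity_bytes must lie between 0 and codeword_len")
    return 100.0 * parity_bytes / codeword_len
```

Six parity bytes out of 30 give 20%, and two parity bytes (28 message bytes) give roughly 7%, matching the 2 Mbps figures in the text.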

In general UDP Lite and UDP with header estimation will perform worse than normal UDP over channels where the FEC symbol error-rate is approximately equal to the packet-loss rate; for instance, in extremely low or extremely high error rate channels. Due to the present FEC construction [see Fig. 3], at extremely high and extremely low error-rates the number of errors in the received codewords of UDP Lite and UDP with header estimation will be equal to or more than the number of erasures observed in normal UDP’s received codewords. Since erasure decoding requires less parity than error decoding, when the number of errors is approximately the same as the number of erasures, the performances of the joint error-erasure FEC decoders of UDP Lite and UDP with header estimation will be inferior to the erasure-only decoder of normal UDP. We are currently working on deriving tighter conditions where normal UDP will perform better than header estimation UDP.
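A small worked case makes the parity argument concrete. It rests on a simplifying assumption introduced here for illustration: the same $n$ packets of a block are affected under every protocol, normal UDP drops them all, and the error-tolerant variants relay them all with the codeword symbol corrupted. Applying the decodability condition $2t \ge e_1 + 2e_2$ to one codeword gives:

```latex
\begin{align*}
\text{normal UDP (all dropped):} \quad & e_1 = n,\ e_2 = 0 \;\Rightarrow\; 2t \ge n, \\
\text{relayed but corrupted:} \quad & e_1 = 0,\ e_2 = n \;\Rightarrow\; 2t \ge 2n.
\end{align*}
```

Under this assumption the joint error-erasure decoder needs twice the parity of the erasure-only decoder. Relaying corrupted packets pays off only when many codeword symbols taken from those packets are in fact error-free.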

D. Video Performance

In this section, we present results for the 5 streams per receiver experiment, with a fixed rate FEC having two redundant bytes per RS codeword of 30 bytes. The average GOP-by-GOP peak signal-to-noise ratio (PSNR) plots at different data rates are given in Fig. 5. All PSNR results are averaged over 75 received video streams. Since we allow the very first video (I) frame of the first GOP to be received without any errors and losses, PSNR of the first GOP is not plotted. The dotted line in Fig. 5 represents PSNR of error-free video which is the performance upper bound for the protocols under consideration. PSNR of UDP with header estimation is the closest to the PSNR of the error-free video at all data rates. At 2 and 5.5 Mbps, respective average PSNRs of normal UDP and UDP Lite are approximately 10 dB and 25 dB lower than the PSNR of UDP with header estimation. However, at 11 Mbps the PSNR of UDP with header estimation is approximately 25 dB higher than the PSNRs of normal UDP and UDP Lite, both of which render equally and extremely low PSNRs at 11 Mbps.

Fig. 4. Average FEC redundancy required by UDP Normal, UDP Lite and UDP with Header Estimation at different data rates of an 802.11b LAN; each point is averaged over 3 × 5 × 5 × 25 = 1875 received video streams. [Panels: (a) 2 Mbps, (b) 5.5 Mbps, (c) 11 Mbps. Axes: message bytes per block (16 to 28) versus average decodable probability. Curves: UDP Normal, UDP Lite, UDP Hdr Est.]

Fig. 5. Average PSNR of video sequences for UDP Normal, UDP Lite and UDP with Header Estimation using a 30 byte RS codeword with 2 parity bytes; each graph is averaged over 3 × 5 × 5 = 75 received video streams. [Panels: (a) 2 Mbps, (b) 5.5 Mbps, (c) 11 Mbps. Axes: GOP (5 to 20) versus average PSNR. Curves: Error-free, UDP Normal, UDP Lite, UDP Hdr Est.]

VI. CONCLUSIONS

In this paper, we proposed a receiver-based maximum-likelihood header estimation methodology to improve multimedia quality over wireless channels. Evaluations for 802.11b LANs demonstrated that the proposed methodology provides significantly better throughput and multimedia quality than normal UDP and UDP Lite.

APPENDIX

FOLDED MARKOV CHAINS FOR WIRELESS CHANNEL MODELING

Let the state space of an FSM chain with $2^k$ states be given as $H = \{0, 1, \ldots, 2^k - 1\}$, where $k$ is the FSM memory-length. Now consider a new lumped process with state space $S = \{S_0, S_1, \ldots, S_{N-1}\}$, where $N \le 2^k$. To create a lumped process, FSM states are partitioned and each partition is aggregated into a state of the lumped process [12]. We first prove the following necessary condition for defining partitions of the FSM state space:

Lemma 1: The next state in an aggregate process can be accurately determined only if the FSM states $(2i) \bmod 2^k$ and $(2i + 1) \bmod 2^k$ do not belong to the same aggregate state,

$$(2i) \bmod 2^k \in S_j \;\Rightarrow\; (2i + 1) \bmod 2^k \notin S_j, \qquad (9)$$

where $k$ is the memory-length, $i \in H$ and $S_j \in S$.

Proof: This lemma implies that both transition possibilities of an FSM state cannot be aggregated in a single state. Let there exist an aggregate state $S_j$ that contains both FSM states $2i$ and $2i + 1$, and let $S_q$ be an aggregate state containing state $i$. Then, $\Pr\{S_q \to S_j\}$ does not give any information about whether a good- or a bad-bit should be added to the memory-window.
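The condition in Lemma 1 is easy to test for a candidate partition. The sketch below represents a partition as a list of Python sets of FSM state indices; this representation and the function name are assumptions made for illustration.

```python
def satisfies_lemma1(partition, k):
    """Check condition (9) for a partition of the states 0 .. 2^k - 1.

    partition: list of sets of FSM state indices (aggregate states).
    Returns False if some aggregate state contains both successors
    (2i) mod 2^k and (2i + 1) mod 2^k of an FSM state i.
    """
    n = 1 << k
    for block in partition:
        for i in range(n):
            if (2 * i) % n in block and (2 * i + 1) % n in block:
                return False
    return True
```

For $k = 2$, the partition {0, 2}, {1, 3} passes the check, whereas {0, 1}, {2, 3} fails because states 0 and 1 are the two successors of state 0.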

The strong lumpability theorem [12] asserts that all FSM states belonging to an aggregate state should have the same probability of moving out of the aggregate state. It is unlikely that FSM chains trained using actual wireless traces will satisfy this condition. Therefore, we modify an FSM chain’s state transition probabilities such that the modified chain can be divided into two equal-sized partitions that satisfy the strong lumpability condition.

We first note that to reach FSM states $2i$ and $2i + 1$ for $0 \le i \le 2^{k-1} - 1$ in a single transition, the current state of the FSM chain should be either state $i$ or state $(2^{k-1} + i)$. In other words, the following pairs of FSM states have the same set of next possible states: $(0, 2^{k-1}), (1, 1 + 2^{k-1}), \ldots, (2^{k-1} - 1, 2^{k-1} - 1 + 2^{k-1} = 2^k - 1)$. Based on the observation that the state pair $(i, 2^{k-1} + i)$ has the same one-step transition possibilities, we propose to modify an FSM chain’s transition probability matrix as follows:

$$\hat{p}_{i,(2i) \bmod 2^k} = \hat{p}_{2^{k-1}+i,(2i) \bmod 2^k} = \frac{p_{i,(2i) \bmod 2^k} + p_{2^{k-1}+i,(2i) \bmod 2^k}}{2}$$

and

$$\hat{p}_{i,(2i+1) \bmod 2^k} = \hat{p}_{2^{k-1}+i,(2i+1) \bmod 2^k} = 1 - \hat{p}_{i,(2i) \bmod 2^k},$$

for $i = 0, 1, \ldots, 2^k - 1$, where $p_{i,j}$ and $\hat{p}_{i,j}$ represent the transition probabilities of the original and modified FSM chains. After this transformation, state pairs $(i, 2^{k-1} + i)$ in the modified transition probability matrix clearly satisfy the lumpability constraint and can be aggregated together.
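The averaging step can be sketched in Python on a transition matrix stored as a list of rows. This is an illustrative sketch rather than the authors' implementation: the function name is hypothetical, and the loop runs over the $2^{k-1}$ distinct pairs $(i, 2^{k-1} + i)$, which is assumed to cover every row exactly once.

```python
def equalize_pairs(P, k):
    """Average the transition probabilities of rows i and 2^(k-1) + i.

    P: 2^k x 2^k transition matrix as a list of lists, where row i has
    non-zero entries only at columns (2i) mod 2^k and (2i + 1) mod 2^k.
    Returns a modified copy in which both rows of each pair are equal.
    """
    n = 1 << k
    half = n >> 1
    P_hat = [row[:] for row in P]
    for i in range(half):
        partner = half + i
        s0 = (2 * i) % n      # shared successor reached by one bit value
        s1 = (2 * i + 1) % n  # shared successor reached by the other
        avg = (P[i][s0] + P[partner][s0]) / 2.0
        for row in (i, partner):
            P_hat[row][s0] = avg
            P_hat[row][s1] = 1.0 - avg
    return P_hat
```

After this step the two states of each pair leave their aggregate state with identical probabilities, which is what the lumpability condition requires.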

Using the above strategy, any $2^k \times 2^k$ FSM transition probability matrix can be modified and folded about $2^{k-1}$ to give a Markov chain with exactly half the number of states. Since the basic transition probability structure is retained after the folding operation, this state reduction procedure can in fact be applied recursively to a $2^k$ state FSM chain to give a $2^m$ state folded Markov chain, where $m$ is an integer such that $1 \le m < k$. We refer to these models as folded Markov chains (FMCs). A folded process is a coarse on-average approximation of an FSM chain because folding simply ensures a non-zero transition probability between two aggregate states.
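Recursive folding can be sketched compactly if the chain is stored as a list `p0`, where `p0[i]` is the probability of moving from state $i$ to state $(2i) \bmod 2^k$ (the other successor then has probability `1 - p0[i]`). This compact representation, and the function names, are assumptions made for this sketch. In this representation, averaging `p0[i]` with `p0[i + 2^(k-1)]` both equalizes the pair and yields the corresponding entry of the folded chain.

```python
def fold_once(p0):
    """Fold a 2^k-state chain about 2^(k-1), giving 2^(k-1) states."""
    n = len(p0)
    if n < 4 or n & (n - 1):
        raise ValueError("length must be a power of two and at least 4")
    half = n // 2
    # Folded state i aggregates original states i and half + i.
    return [(p0[i] + p0[half + i]) / 2.0 for i in range(half)]


def fold_to(p0, m):
    """Fold repeatedly until 2^m states remain, with 1 <= m < k."""
    k = len(p0).bit_length() - 1
    if len(p0) != 1 << k or not 1 <= m < k:
        raise ValueError("need len(p0) == 2^k and 1 <= m < k")
    folded = list(p0)
    while len(folded) > (1 << m):
        folded = fold_once(folded)
    return folded
```

Under this representation, the order-9 FSM chain mentioned in the experiments (512 states) would be reduced to a 4-state FMC by calling `fold_to(p0, 2)`.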

REFERENCES

[1] ISO/IEC JTC 1/SC29/WG11 and ITU-T SG16 Q.6, “Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264|ISO/IEC 14496-10 AVC),” Mar. 2003.

[2] ISO/IEC JTC 1/SC29/WG11, “Text of ISO/IEC 14496-2:2001 (Unifying N2502, N3301, N3056, and N3664),” Doc. N4350, July 2001.

[3] M. Zorzi and R. R. Rao, “On the statistics of block errors in bursty channels,” IEEE Trans. Commun., vol. 45, no. 6, pp. 660–667, 1997.

[4] L-A. Larzon, M. Degermark, S. Pink, L-E. Jonsson, and G. Fairhurst, “The lightweight user datagram protocol (UDP-Lite),” RFC 3828, July 2004.

[5] A. Singh, A. Konrad, and A. D. Joseph, “Performance evaluation of UDP Lite for cellular video,” ACM NOSSDAV, 2001.

[6] H. Zheng and J. Boyce, “An improved UDP protocol for video transmission over Internet-to-wireless networks,” IEEE Trans. Multimedia, vol. 3, no. 3, pp. 356–365, 2001.

[7] S. A. Khayam, S. Karande, H. Radha, and D. Loguinov, “Performance analysis and modeling of errors and losses over 802.11b LANs for high-bitrate real-time multimedia,” Signal Proc.: Image Commun., vol. 18, no. 7, pp. 575–595, 2003.

[8] A. Servetti and J. C. De Martin, “Error tolerant MAC extension for speech communications over 802.11 WLANs,” IEEE VTC, 2005.

[9] C. H. Shih, Y. M. Tou, C. K. Shieh, and W. S. Hwang, “A self-regulated redundancy control scheme for wireless video transmission,” IEEE WirelessCom, 2005.

[10] E. Masala, M. Bottero, and J. C. De Martin, “MAC-level partial checksum for H.264 video transmission over 802.11 ad hoc wireless networks,” IEEE VTC, 2005.

[11] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: Wiley, 2001.

[12] J. G. Kemeny and J. L. Snell, Finite Markov Chains. New York: Springer-Verlag, 1976.

[13] R. Riedi, M. Crouse, V. Ribeiro, and R. Baraniuk, “A multifractal wavelet model with application to network traffic,” IEEE Trans. Inf. Theory, vol. 45, no. 3, pp. 992–1018, 1999.

[14] S. A. Khayam and H. Radha, “Linear-complexity models for wireless MAC-to-MAC channels,” ACM WINET, vol. 11, no. 5, pp. 543–555, 2005.

[15] S. A. Khayam, H. Radha, S. Aviyente, and J. R. Deller, Jr., “Markov and multifractal wavelet models for wireless channels,” Performance Evaluation, vol. 64, no. 4, pp. 298–314, May 2007.

[16] S. A. Khayam, S. Karande, M. U. Ilyas, and H. Radha, “Header detection to improve multimedia quality over wireless networks,” IEEE Trans. Multimedia, vol. 9, no. 2, pp. 377–385, Feb. 2007.

[17] H.264/AVC Software Coordination Web page: http://iphome.hhi.de/suehring/tml.

[18] R. E. Blahut, Theory and Practice of Error Control Codes. Addison-Wesley, May 1984.

Syed Ali Khayam received his B.E. degree in Computer Systems Engineering from National University of Sciences and Technology (NUST), Pakistan, in 1999 and his M.S. and Ph.D. degrees in Electrical Engineering from Michigan State University in 2003 and 2006, respectively. In February 2007, he joined the NUST Institute of Information Technology (NIIT), National University of Sciences & Technology (NUST), Pakistan as an assistant professor. He also worked at Communications Enabling Technologies from October 2000 to August 2001. His research interests include analysis and modeling of statistical phenomena in computer networks, network security, cross-layer design for wireless networks, and real-time multimedia communications.

Hayder Radha received the B.S. degree (with honors) from Michigan State University (MSU) in 1984, the M.S. degree from Purdue University in 1986, and the Ph.M. and Ph.D. degrees from Columbia University in 1991 and 1993, respectively (all in electrical engineering). He joined MSU in 2000 as associate professor in the Department of Electrical and Computer Engineering where he is a full professor now. From 1986–1996, he was with Bell Laboratories. From 1996–2000, he worked at Philips Research USA and became a Philips Research Fellow in 2000. His research interests include wireless and multimedia communications and networking, stochastic modeling, and image and video coding and compression. He has more than 25 patents in these areas. He served as cochair and editor of the ATM and LAN Video Coding Experts Group of the ITU-T in 1994–1996. He is a member of the IEEE Signal Processing Multimedia Technical Committee. He is a recipient of the Bell Labs Distinguished Member of Technical Staff Award (1993), the Withrow Distinguished Scholar Award (2003), and the Microsoft Research Content and Curriculum Award (2004).
