

Wireless Scalable Video Coding Using a Hybrid Digital ... staff.ustc.edu.cn/~yulei/WSVC.pdf

Wireless Scalable Video Coding Using a Hybrid Digital-Analog Scheme

Lei Yu, Houqiang Li, Senior Member, IEEE, and Weiping Li, Fellow, IEEE

Abstract—Wireless video broadcast/multicast and mobile video communication pose a challenge to the conventional video transmission scheme (which consists of separate digital source coding and digital channel coding). The reason is that the separate coding scheme is based on a non-scalable coding design, which unavoidably leads to the “cliff effect” and limits the ability to support multiple users with diverse channel conditions. In this paper, we propose a novel wireless scalable video coding (WSVC) framework. Specifically, we present a hybrid digital-analog (HDA) joint source-channel coding (JSCC) scheme that integrates the advantages of digital coding and analog coding. Moreover, the proposed JSCC is able to broadcast one video at different resolutions to fit various devices with different display resolutions. Compared to most state-of-the-art video delivery methods, it avoids the “staircase effect” and realizes Continuous Quality Scalability (CQS) on condition that the channel quality is within the expected range, and it has strong adaptability to channel variation with higher coding efficiency and better fairness among all receivers. Therefore, it is very suitable for wireless video broadcast/multicast transmission and mobile video applications. The experimental results show that for broadcasting/multicasting videos with CIF and QCIF resolutions the proposed WSVC 1) outperforms SoftCast (a recent analog scheme) by 0.60∼5.90 dB and 3.39∼9.97 dB on average, respectively, 2) outperforms SVC+HM (which combines the H.264/SVC codec and hierarchical modulation) by 3.87∼9.13 dB and 0.27∼10.47 dB on average, respectively, and 3) outperforms DCast (an up-to-date video delivery scheme) by about 0.2∼3.3 dB for the video with CIF resolution. These results verify the effectiveness of our proposed WSVC framework.

Index Terms—Wireless scalable video coding (WSVC), hybrid digital-analog (HDA), wireless video broadcast/multicast, joint source-channel coding (JSCC), robust video communication, continuous quality scalability (CQS).

I. INTRODUCTION

WITH the rapid development of wireless networks and mobile terminals, wireless video services are becoming increasingly important and popular, involving a number of diverse applications, such as mobile TV, wireless video surveillance, mobile video conferencing, etc. The conventional wireless video delivery scheme typically consists of a separate digital video compression coding part (e.g., H.264/AVC [9]) and a digital channel coding part. Such a separate “digitally-coded” framework is appropriate for many scenarios that involve a single sender-receiver pair communicating over a relatively static channel whose characteristics vary slowly over time. However, it is unsuitable for the emerging wireless video services for two main reasons, as follows.

Manuscript received January 7, 2013; revised June 14, 2013. This paper was recommended by Associate Editor Enrico Magli. This work was supported by the Natural Science Foundation of China (NSFC) under contract No. 61272316, the 973 Program under contract No. 2013CB329004, and the National Major Special Projects in Science and Technology of China under contract No. 2012ZX03001033-2.

The authors are with the Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China (e-mail: [email protected], [email protected], [email protected]).

Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to [email protected].

The first reason is that the conventional scheme is unable to transmit video with reliable quality to mobile receivers; moreover, it may cause a performance bottleneck for receivers with high channel signal-to-noise ratio (SNR). This comes from the well-known “cliff effect”, which includes the “threshold effect” and the “leveling-off effect” [23], [16], [32]. 1) The threshold effect [18] refers to the fact that when the channel SNR falls beneath a certain threshold, the received video quality degrades drastically. This problem, which is well known in the literature, is due to entropy coding’s sensitivity to bit errors and the total breakdown of most powerful error-correcting codes at low channel SNRs. 2) The leveling-off effect refers to the fact that when the channel SNR increases above the threshold, the received video quality does not improve. This effect is due to the unrecoverable quantization distortion which limits the system performance at high channel SNRs. For mobile terminals, wireless channels are varying almost all the time, which may make a receiver suffer the threshold effect at some times and the leveling-off effect at others. Therefore, the conventional digital scheme for wireless video unavoidably faces the problem of the cliff effect.

The second reason is that for video broadcast/multicast services, the bitrate selected by the conventional wireless video delivery scheme cannot fit all receivers at the same time. If the video is transmitted at a high bitrate, it can be decoded only by those receivers with better-quality channels, which is unfair to those receivers with worse-quality channels; on the contrary, if it is transmitted at a low bitrate supported by all receivers, it reduces the performance of the receivers with better-quality channels, which is unfair to them.

In order to overcome the cliff effect over the target channel SNR range (the range covering all receivers’ channel SNRs), some layered digital schemes consisting of layered source coding and layered channel coding have been proposed [5]-[8]. These schemes are based on purely-digital source-channel coding approaches. Typically, they couple scalable video coding (SVC) (e.g., H.264/SVC) with hierarchical modulation (HM) [1]-[4], where each layer of SVC is mapped onto the corresponding layer of HM. However, the layered digital scheme is unable to remove


the cliff effect completely, and it causes the “staircase effect”.

Besides digital source-channel coding schemes, some analog or near-analog joint source-channel coding (JSCC) schemes for video delivery have been proposed recently, such as SoftCast^1 [15], [16]. SoftCast transmits the linear transform of the video signal directly over the analog channel without quantization, entropy coding or forward error correction (FEC). Therefore, it is able to provide Continuous Quality Scalability (CQS) owing to the nature of analog coding. However, in theory, such an analog scheme with a linear mapping (from source signals to channel signals) is relatively inefficient for most sources with large memory, such as video [24], [25].

Based on the analysis above, a wireless video delivery scheme is desired to perform well in the following three aspects:

1) Quality Scalability (Mobility): it measures how gracefully one receiver’s reconstructed video quality varies with its channel variation over time. It reflects the robustness and adaptability of a scheme to the receiver’s channel variation. The more gracefully the video quality varies with the varying channel, the higher the Quality Scalability (Mobility) is. Different from the wired case, the ultimate objective of wireless video delivery is to realize CQS. In fact, owing to the nature of analog coding, some analog schemes, such as SoftCast, have already realized this.

2) Multicast (or Broadcast) Efficiency: it measures the relative video quality among all receivers and the absolute video quality of each individual receiver for a broadcast/multicast with multiple receivers. The relative video quality reflects the fairness among all receivers, while the absolute video quality reflects the overall source-channel coding efficiency. A scheme with high Multicast Efficiency should have the characteristic that the decoded video quality of each receiver matches its channel quality.

3) Spatial and Temporal Scalability (Multiresolution Scalability): it measures the scalability of a scheme to meet the needs of different spatial and temporal resolutions. The more resolutions one scheme can simultaneously provide, the higher its Spatial and Temporal Scalability is. Usually, for schemes with Spatial and Temporal Scalability, the video with different resolutions is encoded into one bitstream or analog data stream at the sender and transmitted to multiple receivers that demand different resolutions. Although a scheme without Spatial and Temporal Scalability is also able to realize spatial or temporal multiresolution, by decoding the high-resolution (HR) video and then downsampling it to the low-resolution (LR) video, or by decoding the LR video and then upsampling it to the HR video, it will increase computational complexity and memory consumption in the former case, or will fail to achieve good reconstruction quality in the latter case.

^1 When implemented on hardware in [15], [16], SoftCast adopts one of the densest constellations available nowadays, 64K-QAM, to modulate its transmitted signals. Therefore, strictly speaking, the implemented SoftCast system is digital or “near-analog”. However, the design of SoftCast is based on the analog coding idea, and it theoretically supports a purely analog implementation (or a digital implementation with arbitrary accuracy). Therefore, SoftCast can be considered an analog scheme.

Fig. 1. The objective curve for broadcasting/multicasting a video source to multiple receivers.

Both Quality Scalability and Multicast Efficiency require that the video delivery scheme be able to provide better received quality over all receivers’ channel SNR range. Therefore, the objective of JSCC for wireless video delivery changes to optimizing the received video quality over a given channel SNR range instead of at a specific channel SNR. A receiver at any channel SNR, as long as it is within the channel SNR range, should be able to reconstruct a video signal with the optimized quality at that channel SNR. We denote the new Received-Quality Channel-Quality (RC) function for a given channel SNR range as the “RC function for multicast”. Fig. 1 illustrates this point.

For unicast (i.e., point-to-point) communication, according to the source-channel separation principle [19], [22], optimal RC performance can be achieved by separate (or independent) design of the source and channel codes. For broadcast/multicast, however, except for some special cases (e.g., a Gaussian source communicated over a Gaussian broadcast channel with bandwidth match), achieving optimal RC performance is still an open problem, even for a parallel Gaussian source communicated over a parallel Gaussian broadcast channel [24], let alone for cases involving a video source. In addition, it has been proved that for broadcasting/multicasting a Gaussian source over a Gaussian broadcast channel with bandwidth expansion (channel symbol rate larger than source symbol rate), it is not possible to achieve the optimal RC performance of the unicast case for each receiver, and this still holds even for a broadcast/multicast with only two receivers [20], [21]. Therefore, as shown in Fig. 1, it is significant to realize the desired objective of JSCC for wireless video delivery: achieving suboptimal RC performance for all SNRs in the channel SNR range (i.e., high Multicast Efficiency) and a continuous curve (i.e., CQS) that parallels the RC curve, with a single transmitted signal stream. This is also the objective of our scheme in this paper.

In this paper, we propose a novel wireless scalable video coding (WSVC) framework. In particular, we present a hybrid digital-analog (HDA) joint source-channel coding (JSCC) scheme (where the digital coding part codes the base layer and the analog coding part codes the enhancement layer) that integrates the advantages of digital coding and analog coding: high coding efficiency and graceful quality variation as channel quality varies over time; moreover, it is able to broadcast


one video with different resolutions to fit devices with different display resolutions. Relative to most state-of-the-art video delivery methods, our WSVC has remarkable performance in Quality Scalability, Multicast Efficiency, and Spatial and Temporal Scalability. Furthermore, it realizes CQS like SoftCast. Therefore, it is very suitable for wireless video broadcast/multicast and mobile video scenarios.

The rest of the paper is organized as follows. Section II summarizes the related work on wireless video broadcast/multicast. Section III gives an overview of our proposed WSVC framework. The coding part and the denoising part of WSVC are elaborated in Sections IV and V, respectively. Next, Section VI extends WSVC to support full scalability and bandwidth adaptation, and also gives a complexity analysis of WSVC. Then, the evaluation environment and the experimental results are presented in Section VII. Finally, Section VIII concludes the paper.

II. RELATED WORK

Wireless video delivery schemes can be categorized into three types: the digital scheme, which consists of digital source coding (e.g., quantization and entropy coding) and digital channel coding (e.g., forward error correction (FEC) and digital modulation); the analog scheme, which consists of analog source coding and analog mapping (also named “analog modulation”) that maps the (analog) output signal of source coding into the (analog) channel signal directly; and the hybrid digital-analog scheme, which combines digital coding techniques with analog coding techniques, either by transmitting a superposition of the digital modulation signal and the analog modulation signal or by transmitting them in parallel (e.g., by time-sharing or bandwidth-sharing).
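As a toy illustration of the superposition option (all constellations, power levels and variable names here are our own assumptions, not those of any cited scheme), a high-power digital QPSK layer and a low-power analog layer can share one channel use; the receiver decodes the digital layer first, subtracts it, and then reads the analog layer:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000

# Digital layer: QPSK with power P_d; analog layer: a scaled real-valued
# residual with much lower power P_a. Values are illustrative only.
P_d, P_a, noise_var = 1.0, 0.05, 0.01
bits = rng.integers(0, 2, size=(n, 2))
qpsk = np.sqrt(P_d / 2) * ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1))
residual = rng.normal(0.0, 1.0, n)
analog = np.sqrt(P_a) * residual

tx = qpsk + analog  # superposed hybrid digital-analog signal

noise = rng.normal(0, np.sqrt(noise_var / 2), n) \
    + 1j * rng.normal(0, np.sqrt(noise_var / 2), n)
rx = tx + noise

# Receiver: hard-decide the (high-power) digital layer first...
bits_hat = np.stack([rx.real > 0, rx.imag > 0], axis=1).astype(int)
# ...then subtract the re-modulated digital signal to expose the analog layer.
qpsk_hat = np.sqrt(P_d / 2) * ((2 * bits_hat[:, 0] - 1) + 1j * (2 * bits_hat[:, 1] - 1))
residual_hat = (rx - qpsk_hat).real / np.sqrt(P_a)

assert np.mean(bits_hat == bits) > 0.99
```

Successive decoding works here because the digital layer is allocated far more power than the analog layer plus noise; the analog estimate `residual_hat` then degrades smoothly with the channel noise rather than falling off a cliff.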

A. Digital Scheme

The conventional video delivery scheme with separate source coding and channel coding is based on Shannon’s source-channel separation principle, and it is the most classic digital scheme. As mentioned in the previous section, owing to its cliff effect, unfairness in broadcast/multicast, and single-resolution characteristic, such a separation coding scheme usually performs poorly in Quality Scalability, Multicast Efficiency, and Spatial and Temporal Scalability.

To overcome the shortcomings of the conventional scheme, some scalable digital video delivery schemes [5]-[8] have been proposed, where the video is coded with multiple levels of quality or resolution and transmitted with unequal error protection (UEP), such as HM, which is accepted by the DVB-T/H/SH standards [1]-[4]. These schemes are usually considered a kind of digital JSCC for video delivery. Such a scheme usually uses scalable video coding (SVC) (e.g., H.264/SVC [10], [11]) at the sender, which fragments a video stream into a base layer and several enhancement layers, so this method is named SVC+HM for short. In order to achieve scalability, the H.264/SVC method increases the bitrate by nearly 10%-50% compared with H.264/AVC at the same video quality to encode one video in two layers [10]. However, the SVC encoder has to trade off between coding efficiency and scalable granularity, and the channel coding efficiency becomes lower while the decoder complexity becomes higher as the number of HM layers increases. Therefore, it is almost unrealistic for the scalable digital schemes to realize Fine Granular Scalability (FGS) (which requires adopting many more layers than Coarse Granular Scalability (CGS)). Although the scalable digital scheme performs better in Quality Scalability, Multicast Efficiency, and Spatial and Temporal Scalability than the separation coding scheme, it is unable to remove the cliff effect completely (owing to its use of entropy coding and powerful error-correcting codes), and it causes the “staircase effect”.

Besides, there are other studies on scalable video streaming in the literature, most of which adopt scalable source coding and adaptive modulation and coding (AMC), such as [12]-[14]. Unicast scheduling of scalable video streams by dynamically allocating resources among all video streams has recently been addressed in [12], [13]. Each stream is coded by a scalable source codec, H.264/SVC, and an AMC module which dynamically chooses the appropriate FEC and modulation to match the channel condition. However, this scheme is not suitable for the video broadcast scenario, since no single FEC and modulation choice accommodates multiple channel conditions simultaneously (owing to the cliff effect). In [14], the problem of rate allocation for video multicast streaming over wireless mesh networks, with the aim of minimizing video distortion, is investigated, where the links interfering with each other time-share their transmission opportunities. This means that the mutually interfering links transmit streams individually in a time-sharing way, even when the end nodes of these links have the same video stream or correlated video streams (e.g., one stream contains the other) to transmit to their multiple neighboring nodes. Different from the existing scalable video streaming schemes, we focus on a more general scenario involving both mobile video and multiresolution video broadcast, and adopt a one-size-fits-all video delivery scheme to deal with it, which not only avoids dynamic resource allocation but also better accommodates video broadcast by applying a superposition-based coding technique.

B. Analog Scheme

Some analog JSCC schemes for wireless video delivery, in which no digital coding techniques (e.g., entropy coding or digital channel coding) are adopted, have also been proposed to cope with the cliff effect. These schemes can optimize the received distortion by jointly designing analog source coding (e.g., a decorrelation transform) and analog modulation. SoftCast [15], [16] is a typical and recent analog scheme. It is a simple but comprehensive design for wireless video broadcast/multicast and mobile video, which integrates transform coding with linear analog amplitude modulation. At the source end, SoftCast consists of three steps: discrete cosine transform (DCT), power allocation and Hadamard transform.

The DCT operation can remove the spatial and temporal redundancy of video frames in one Group of Pictures (GOP). Power allocation is employed to minimize the received distortion


by optimally scaling the transform coefficients. The Hadamard transform converts the coefficients with unequal power and importance into ones with equal average power and equal importance, so as to cope with packet loss.

At the receiver end, SoftCast uses a Linear Least Square Estimator (LLSE) as the inverse operation of power allocation and the Hadamard transform. Since all the operations in each step are linear, the channel noise is linearly transformed into the reconstructed video.
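As a toy sketch of this pipeline, the code below applies a SoftCast-style power allocation (the commonly cited scaling, with the gain of chunk i proportional to its variance to the power −1/4) and an LLSE decoder to synthetic Gaussian “chunks”; the Hadamard whitening step is omitted, and all variable names and parameter values are our own illustrative assumptions, not taken from the SoftCast papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four synthetic zero-mean "DCT chunks" with very different variances,
# standing in for decorrelated video coefficients (illustrative values).
variances = np.array([100.0, 25.0, 4.0, 1.0])
n = 1000
x = rng.normal(0.0, np.sqrt(variances)[:, None], size=(len(variances), n))

# Power allocation: scale chunk i by g_i proportional to variance^(-1/4),
# normalized so the total transmit power equals the budget P_total.
P_total = 4.0
g = variances ** -0.25
g *= np.sqrt(P_total / np.sum(g**2 * variances))

# AWGN channel acting on the scaled coefficients.
noise_var = 0.1
y = g[:, None] * x + rng.normal(0.0, np.sqrt(noise_var), size=x.shape)

# LLSE decoder: per-chunk Wiener weights undo the scaling while
# attenuating chunks that arrive with poor effective SNR.
w = g * variances / (g**2 * variances + noise_var)
x_hat = w[:, None] * y

mse = np.mean((x - x_hat) ** 2)
assert mse < np.mean(variances)  # far better than the trivial zero estimate
```

Because every step is linear, the reconstruction error grows smoothly with `noise_var`, which is exactly the CQS behavior described above.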

SoftCast is able to provide CQS thanks to the nature of analog coding, but its coding efficiency is relatively low for the following two main reasons. First, for a parallel Gaussian source with m source components and a memoryless parallel Gaussian channel with an equal number of sub-channels, the loss in performance of the linear analog approach with respect to the digital approach for sufficiently large transmit powers can be written as

$$\frac{\text{Analog MSE Distortion}}{\text{Digital MSE Distortion}} = \left( \frac{(\sigma_1 + \sigma_2 + \cdots + \sigma_m)/m}{(\sigma_1 \sigma_2 \cdots \sigma_m)^{1/m}} \right)^2 \geq \left( \frac{\sigma_{\max}/m}{(\sigma_{\max}^{m-1} \sigma_{\min})^{1/m}} \right)^2 = \frac{1}{m^2} \left( \frac{\sigma_{\max}^2}{\sigma_{\min}^2} \right)^{1/m}, \quad (1)$$

where $\sigma_j^2$ denotes the variance of the j-th source component, and $\sigma_{\max}^2$ and $\sigma_{\min}^2$ denote the maximum and minimum among all $\sigma_j^2$, $j = 1, 2, \cdots, m$, respectively [24], [25]. Thus, this gap grows with $\sigma_{\max}^2/\sigma_{\min}^2$ and can be arbitrarily large.

Unfortunately, after decorrelation, most videos (with large memory) are transformed into multi-dimensional sources with a large ratio of maximal variance to minimal variance. Second, (intra- and inter-) prediction coding is one of the most efficient compression techniques in video coding, but in order to avoid error propagation, SoftCast employs 3D-DCT instead of prediction coding. This leads SoftCast to be inefficient. In conclusion, SoftCast is able to provide better fairness among all receivers as well as better Quality Scalability, but its coding efficiency is relatively low. In addition, due to the lack of a multiresolution characteristic in the DCT operation, SoftCast has no Spatial and Temporal Scalability.
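To get a feel for how quickly this gap grows, the following quick check (with hypothetical variances of our own choosing) evaluates both sides of (1) for m = 4:

```python
import numpy as np

# Hypothetical standard deviations of m = 4 decorrelated source components.
sigma = np.sqrt(np.array([100.0, 10.0, 1.0, 0.1]))
m = len(sigma)

# Left side of (1): squared ratio of arithmetic to geometric mean of sigma_j.
ratio = (sigma.mean() / sigma.prod() ** (1.0 / m)) ** 2

# Right side of (1): the bound (1/m^2) * (sigma_max^2 / sigma_min^2)^(1/m).
bound = (sigma.max() ** 2 / sigma.min() ** 2) ** (1.0 / m) / m**2

assert ratio >= bound  # the analog scheme loses a factor `ratio` in MSE
```

Even with this modest 30 dB spread in component variance, the linear analog mapping loses roughly a factor of four (about 6 dB) in MSE; after decorrelating real video, the variance spread is typically far larger.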

C. Hybrid Digital-Analog Scheme

Besides digital schemes and analog schemes, several hybrid digital-analog (HDA) JSCC schemes have been proposed in recent years [27]-[30]. These schemes integrate digital coding techniques with analog coding techniques by transmitting a superposition of the digital modulation signal and the analog modulation signal, or by transmitting them in the way of time-sharing or bandwidth-sharing. However, so far, most of the existing HDA methods (e.g., [23]-[26]) focus on theoretical study (e.g., the performance of Gaussian sources over Gaussian channels with equal or unequal bandwidth); except for the VQHDA image delivery scheme in [27], the HDA image delivery scheme proposed in [28], the DCast video delivery scheme in [30], and the WaveCast video delivery scheme in [29], there is hardly any practical HDA scheme for image and video delivery. The VQHDA scheme in [27] is designed for image delivery. In order to achieve better Quality Scalability as well as low complexity and low delay, it applies a VQ cascaded directly with binary phase shift keying (BPSK) modulation (without FEC) in the digital part, and linear coding in the analog part. The HDA scheme proposed in [28] is also designed for image delivery; moreover, it is optimized for one time-varying channel, i.e., for unicast. It combines digital scalable coding with UEP and analog linear coding in the way of time-sharing or bandwidth-sharing to achieve better Quality Scalability. Two versions of DCast have been proposed: the one in [30], which is a state-of-the-art HDA scheme for video delivery based on SoftCast, and the one in [31], which is a state-of-the-art analog scheme. Here, we briefly introduce the HDA version of DCast. DCast [30] combines separated digital coding and analog linear coding (SoftCast) in the way of time-sharing or bandwidth-sharing. Considering that the low coding efficiency of SoftCast results from using a purely-analog scheme to transmit DCT coefficients, DCast employs coset coding and syndrome coding (two typical techniques used in distributed source coding (DSC)) to code the coefficients so as to effectively reduce their amplitudes (variances). According to formula (1), these two techniques make the analog part of DCast more efficient. In addition, DCast applies motion compensated prediction (MCP) to remove the inter-frame correlation. Therefore, it is reported that DCast outperforms SoftCast by up to 2 dB in video PSNR [30]. However, like SoftCast, since DCast applies DCT to remove the spatial and temporal redundancy, it has no Spatial and Temporal Scalability either. WaveCast [29] is another HDA scheme based on SoftCast. It adopts motion compensated temporal filtering (MCTF) to exploit more inter-frame redundancy instead of the DCT operation. Similar to DCast, WaveCast also outperforms SoftCast by 2 dB in video PSNR at low channel SNR [29]. Since WaveCast applies DWT, it has the potential to support Spatial and Temporal Scalability, although this is not mentioned in the original paper [29].
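To make the amplitude-reduction idea behind coset coding concrete, here is a minimal integer sketch (our own toy, not DCast’s actual design): the sender transmits only x mod q, and the receiver recovers x from correlated side information y, provided |x − y| < q/2.

```python
def coset_encode(x: int, q: int) -> int:
    # transmitted value has amplitude below q, regardless of |x|
    return x % q

def coset_decode(c: int, y: int, q: int) -> int:
    # pick the member of the coset {c + k*q} closest to the side information y
    k = round((y - c) / q)
    return c + k * q

x, y, q = 1003, 998, 16    # source value, side information, coset modulus
c = coset_encode(x, q)     # a value below 16 replaces the large source value
assert coset_decode(c, y, q) == x
```

The transmitted value is bounded by q however large the source is, which is precisely the variance reduction that, by formula (1), makes the analog part more efficient.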

III. WSVC OVERVIEW

The proposed WSVC is able to realize quality scalability, spatial scalability and temporal scalability. In this paper, we mainly focus on the WSVC with quality scalability and spatial scalability. As for temporal scalability, it can be realized on this basis by removing and replacing some operations (the detailed process will be presented in Section VI).

In order to take advantage of both digital and analog schemes, our WSVC adopts a hybrid scheme which combines low-bitrate digital coding (as base layer coding) with linear analog coding (as enhancement layer coding), i.e., modified SoftCast (MSoftCast), by superposing the digital modulation signal and the analog modulation signal. Such a hybrid scheme can avoid the staircase effect as well as make full use of the channel capacity on condition that the channel quality is within the expected range. In addition, to improve coding efficiency, a video compression codec with (intra- and inter-) prediction coding is added into the source coding of the digital part, and in order to simplify the system design, we directly take H.264/AVC


[Fig. 2(a), the encoder of WSVC: the HR video is decomposed by spatial 2D-DWT; the LL band (LR video) goes through the digital encoder (H.264/AVC encoder, then FEC & modulation and power allocation), while the LL-band residual error and the LH, HL, HH bands go through the analog (MSoftCast) encoder (temporal DCT, power allocation, then partition, random reordering & mapping); the digital and analog signals are superposed and sent over the OFDM channel, along with the H.264 bitstream and side information.]

[Fig. 2(b), the decoder of WSVC: the received signal is demodulated and decoded to obtain the H.264 reconstructed video; subtracting the re-modulated digital signal yields the analog signal, which is processed by demapping & order restoring, LLSE decoding and temporal IDCT to reconstruct the LL-band residual error and the LH, HL, HH bands; the reconstructed LR video (H.264 output plus residual) and the reconstructed bands pass through spatial 2D-IDWT to give the reconstructed HR video; video denoising, aided by auxiliary information, produces the denoised LR and HR videos.]

Fig. 2. Framework of our proposed WSVC with quality scalability and spatial scalability.

[9], a highly efficient video compression codec, as the digital source coding (i.e., base layer coding).

As shown in Fig. 2, WSVC consists of the coding part, which includes both the digital and analog codecs, and the denoising part. At the sender side, original HR video frames are firstly decomposed by spatial 2D-DWT (discrete wavelet transform), and the resulting data are divided into LL, LH, HL and HH bands. Next, as the video source of the base layer, the LL band with LR is encoded using H.264/AVC, and the output bitstream is channel-coded, modulated and power-allocated by the sender. Then, the residual of the LL band (i.e., the difference between the original LL band and the reconstructed LL band after decoding) and the other three bands, as the source of the enhancement layer, are processed sequentially by temporal DCT, power allocation and random reordering. Finally, the output signals of the digital encoder (as coding signals of the base layer) and the output signals of the analog encoder (as coding signals of the enhancement layer) are superposed and then transmitted. At the receiver side, the decoder first decodes the digital signal correctly; next, it obtains the analog signal by subtracting the digital signal from the received signal; then, it reconstructs the LR video by adding the digital part (decoded by H.264/AVC) and the reconstructed residual of the LL band (decoded by the analog decoder); finally, it reconstructs the HR video by 2D-IDWT based on the above reconstructed LR video (i.e., LL band) and the reconstructed LH, HL and HH bands decoded by the analog decoder. However, due to the effect of analog coding, it is unavoidable that the video reconstructed using the digital and analog codecs may contain analog noise. Therefore, it is desirable to denoise the reconstructed video to achieve better visual quality at the receivers, especially for receivers with bad-quality channels.
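The sender-side decomposition described above can be sketched in a few lines of NumPy. This is a toy sketch, not the paper's code: it uses an orthonormal Haar filter in place of the Daubechies 9/7 filter specified later in Table I, and the helper names (`haar_split`, `dwt2_one_level`) are ours; only the subband layout (LL as the QCIF base-layer source; the LL residual and LH/HL/HH feeding the analog layer) mirrors the text.

```python
import numpy as np

def haar_split(x, axis):
    """Orthonormal Haar analysis along one axis (even length assumed)."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def dwt2_one_level(frame):
    """One-level spatial 2D-DWT: split a frame into LL, LH, HL, HH bands."""
    lo, hi = haar_split(frame, axis=0)   # vertical filtering
    ll, lh = haar_split(lo, axis=1)      # horizontal filtering of the lows
    hl, hh = haar_split(hi, axis=1)      # horizontal filtering of the highs
    return ll, lh, hl, hh

# A CIF-sized toy "HR frame": the LL band plays the role of the QCIF LR
# video fed to the digital (H.264/AVC) base layer, while the LL coding
# residual and the LH/HL/HH bands feed the analog (MSoftCast) layer.
rng = np.random.default_rng(0)
hr = rng.standard_normal((288, 352))
ll, lh, hl, hh = dwt2_one_level(hr)
assert ll.shape == (144, 176)            # QCIF-sized LR source
```

Because the toy filter is orthonormal, the subband energies sum to the frame energy, which is what makes the per-PAU variance bookkeeping of the later sections meaningful.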

IV. WSVC’S CODING PART

A. Digital Coding Part

For the digital coding part, WSVC combines H.264/AVC with FEC and modulation. At the sender side, original HR video frames are firstly decomposed by spatial 2D-DWT, and the resulting LL band, as the video source of the base layer, is encoded using H.264/AVC. The output bitstream is channel-coded, modulated and power-allocated by the sender. To be specific, WSVC employs Binary Phase Shift Keying (BPSK) modulation with low-rate convolutional codes [33] to adapt robustly to channel SNR variability.

B. Analog Coding Part

For the analog part, WSVC employs our proposed MSoftCast scheme. Compared with SoftCast, MSoftCast differs in two aspects.

First, considering the multiresolution nature of DWT, MSoftCast adopts spatial 2D-DWT and temporal DCT instead of the 3D-DCT in SoftCast to remove spatial and temporal correlation within and among video frames. Each frame of the current GOP is firstly decomposed by spatial 2D-DWT to obtain the LR video source. In order to guarantee that this is equivalent to an approximately orthogonal transformation cascaded with a scale transform, WSVC employs a form of the Daubechies 9-tap/7-tap filter as in Table I, which is suggested in [37] but slightly different from the form adopted by the JPEG2000 standard [38], [39]. The scaling factor of the equivalent scale transform is around 1/√2. Next, as LR video, the LL band is encoded by H.264/AVC. Then, the residual of the LL band as well as the other bands are transformed by temporal DCT. Finally, every frame of the current GOP is divided into a number of H_P × W_P blocks called power allocation units (PAU, whose role


TABLE I
DAUBECHIES 9/7 ANALYSIS AND SYNTHESIS FILTER COEFFICIENTS

Analysis Filter Coefficients
  i    Low-Pass Filter hL(i)    High-Pass Filter hH(i)
  0        0.602949018236           0.557543526229
 ±1        0.266864118443          -0.295635881557
 ±2       -0.078223266529          -0.028771763114
 ±3       -0.016864118443           0.045635881557
 ±4        0.026748757411

Synthesis Filter Coefficients
  i    Low-Pass Filter hL(i)    High-Pass Filter hH(i)
  0        1.115087052458           1.205898036472
 ±1        0.591271763114          -0.533728236886
 ±2       -0.057543526228          -0.156446533058
 ±3       -0.091271763114           0.033728236886
 ±4                                  0.053497514822

is the same as that of a Chunk in SoftCast), and power allocation is performed on all PAUs (just like in SoftCast).
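The two analysis filters can be assembled directly from the half-filters listed in Table I; the sketch below (our helper names, and without the symmetric boundary extension a real codec would use) shows the filter-then-downsample analysis step. As a sanity check, the 9-tap low-pass filter sums to 1 and the 7-tap high-pass filter sums to 0, so a constant signal passes entirely into the low band.

```python
import numpy as np

# Daubechies 9/7 analysis filters, assembled from the symmetric halves
# listed in Table I (index i = 0, +-1, ..., +-4).
h_low_half = [0.602949018236, 0.266864118443, -0.078223266529,
              -0.016864118443, 0.026748757411]
h_high_half = [0.557543526229, -0.295635881557, -0.028771763114,
               0.045635881557]

def symmetric(half):
    """Expand h(0), h(1), ... into the full symmetric filter h(-n..n)."""
    return np.array(half[:0:-1] + half)

h_low = symmetric(h_low_half)     # 9 taps
h_high = symmetric(h_high_half)   # 7 taps

def analysis_1d(x):
    """One analysis step: filter, then downsample by 2.  Sketch only;
    a real codec would use symmetric extension, not zero padding."""
    lo = np.convolve(x, h_low, mode='same')[0::2]
    hi = np.convolve(x, h_high, mode='same')[0::2]
    return lo, hi
```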

In addition, the standard deviations of the output of the temporal DCT, σ_s(k), 1 ≤ k ≤ N_P (where σ_s(k) is the standard deviation of the k-th PAU in one GOP and N_P is the number of PAUs in each GOP), are transmitted in the digital part as the side information of MSoftCast, considering their importance to MSoftCast decoding. Therefore, σ_s(k) needs to be rounded off into a digital signal as follows:

    σ_s(k) ← ⌊σ_s(k) + 1/2⌋,   (2)

i.e., σ_s(k) is replaced by its rounded value.

For the side information of MSoftCast, σ_s(k), Unary Binarization [35] and Arithmetic Coding [34], [36] are utilized for compression. Unary Binarization is the binarization method that converts an unsigned integer symbol into a unary code word consisting of "1" bits plus a terminating "0" bit.
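Unary binarization is easy to sketch (hypothetical helper names; the arithmetic-coding stage that follows it is omitted):

```python
def unary_binarize(n):
    """Unary binarization: unsigned integer n -> n '1' bits plus a
    terminating '0' bit (the form used for the rounded sigma_s(k)
    side information before arithmetic coding)."""
    return '1' * n + '0'

def unary_debinarize(bits):
    """Inverse: count the leading '1's up to the terminating '0' and
    return (value, remaining bits)."""
    n = bits.index('0')
    return n, bits[n + 1:]
```

For example, `unary_binarize(3)` yields `'1110'`.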

Second, in order to prevent burst noise with large power from affecting the digital part, and furthermore to guarantee the decoding performance of the HDA scheme, the output signals of the MSoftCast encoder need to be whitened. Therefore, MSoftCast employs random reordering (whose effect is similar to that of interleaving in FEC) instead of the Hadamard transform in SoftCast. Before the random reordering, the PAUs in one GOP need to be partitioned into two parts: one consisting of the PAUs with larger variance and the other consisting of the PAUs with smaller variance (the reasons for doing so will be clarified in the next subsection). After power allocation, the MSoftCast encoder firstly sorts all rounded-off standard deviations of one GOP, σ_s(k), 1 ≤ k ≤ N_P, in descending order, i.e., σ_s(i_1), σ_s(i_2), ..., σ_s(i_{N_P}), and then divides them into two parts of equal size: the first half σ_s(i_1), σ_s(i_2), ..., σ_s(i_{N_P/2}) and the last half σ_s(i_{N_P/2+1}), σ_s(i_{N_P/2+2}), ..., σ_s(i_{N_P}). The coefficients in the i_1, i_2, ..., i_{N_P/2}-th PAUs are directly mapped to the Q (quadrature) component of the analog transmitted signal x_a, while all the coefficients in the i_{N_P/2+1}, i_{N_P/2+2}, ..., i_{N_P}-th PAUs are firstly permuted by the overall random reordering and then mapped to the I (in-phase) component of x_a. Obviously, the variance of the I component of x_a is smaller than that of its Q component. Assuming the seeds of the random reordering at the sender are known to all the receivers, every receiver can
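The partition-and-reorder step can be sketched as below. For illustration it permutes PAU indices rather than the coefficient streams themselves, and the seed value is an arbitrary placeholder for the seed shared between sender and receivers; the helper names are ours, not the paper's.

```python
import numpy as np

def partition_and_reorder(sigmas, seed=1234):
    """Sort PAUs by (rounded) standard deviation in descending order,
    keep the larger-variance half for the Q component, and randomly
    reorder the smaller-variance half for the I component.  Sketch:
    indices stand in for the coefficient streams, and `seed` is an
    arbitrary placeholder for the seed shared with all receivers."""
    order = np.argsort(-np.asarray(sigmas))      # descending sigma
    half = len(order) // 2
    q_part = order[:half]                        # larger variance -> Q
    i_part = order[half:]                        # smaller variance -> I
    perm = np.random.default_rng(seed).permutation(len(i_part))
    return q_part, i_part[perm], perm

def restore_order(i_scrambled, perm):
    """Receiver side: invert the random reordering using the shared
    seed's permutation."""
    restored = np.empty_like(i_scrambled)
    restored[perm] = i_scrambled
    return restored
```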

For the integrity of the paper, the power allocation and Linear Least Square Estimator (LLSE) of SoftCast [15], which are also employed in MSoftCast, are briefly described below. Assuming that s(k) is the output coefficient of the temporal DCT in the k-th PAU, the power allocation that minimizes the mean square reconstruction error for s(k) is given by

    x_a0(k) = g(k)·s(k),  g(k) = √( (N_P·P_a/2) / (σ_s(k) ∑_{k=1}^{N_P} σ_s(k)) ),   (3)

where x_a0(k) is the output of the power allocation for s(k), g(k) is the power allocation scaling factor for s(k), σ_s(k) is the rounded-off standard deviation of s(k), and P_a is the average power allocated to the analog signal x_a. The LLSE provides a high-quality estimate of the DCT components by leveraging the knowledge of the statistics of the DCT components as well as the statistics of the channel noise, and is given by

    y_a0(k) = x_a0(k) + n(k),  ŝ(k) = [g(k)σ_s²(k) / (g²(k)σ_s²(k) + σ_n²(k))] · y_a0(k),   (4)

where y_a0(k) is the received signal for x_a0(k), σ_n²(k) is the variance of the channel noise, and ŝ(k) is the output of the LLSE for s(k).
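Expressions (3) and (4) translate directly into code. The sketch below (hypothetical helper names) treats `sigma` as the rounded standard deviations of one GOP's PAUs and P_a/2 as the power budget per I/Q component, as in the text:

```python
import numpy as np

def power_allocation_gains(sigma, P_a):
    """Scaling factors g(k) of expression (3):
    g(k) = sqrt(N_P * P_a/2 / (sigma(k) * sum_k sigma(k)))."""
    sigma = np.asarray(sigma, dtype=float)
    return np.sqrt(len(sigma) * P_a / 2.0 / (sigma * sigma.sum()))

def llse_decode(y, g, sigma, sigma_n2):
    """LLSE of expression (4):
    s_hat(k) = g(k) sigma^2(k) / (g^2(k) sigma^2(k) + sigma_n^2(k)) * y(k)."""
    s2 = np.asarray(sigma, dtype=float) ** 2
    return g * s2 / (g ** 2 * s2 + sigma_n2) * y
```

With the gains of expression (3), the average transmit power per coefficient works out to exactly P_a/2, and with zero channel noise the LLSE reduces to y/g, i.e., perfect inversion.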

We assume that the indexes of the PAUs for LR video are from 1 to N_P^LR, where N_P^LR is the number of PAUs for LR video. Then, according to the power allocation expression (3), the total power allocated to LR video is

    P_t^LR = ∑_{k=1}^{N_P^LR} (g(k)σ_s(k))².   (5)

If we only consider the power allocation for the LR video part, with the total power also set to P_t^LR, then we have the scaling factors

    g_LR(k) = √( P_t^LR / (σ_s(k) ∑_{k=1}^{N_P^LR} σ_s(k)) )
            = √( ∑_{k=1}^{N_P^LR} (g(k)σ_s(k))² / (σ_s(k) ∑_{k=1}^{N_P^LR} σ_s(k)) )
            = √( (N_P·P_a/2) / (σ_s(k) ∑_{k=1}^{N_P} σ_s(k)) ) = g(k),  1 ≤ k ≤ N_P^LR.   (6)

Expression (6) indicates that the power allocation strategy employed in WSVC (expression (3)) achieves optimal power allocation for both the HR and LR video parts at the same time.

C. HDA Modulation and Power Allocation

Superposition-based HDA modulation is used in our WSVC. The transmitted signal, i.e., the overall modulation signal, is obtained by superposing the outputs of the digital encoder and the analog encoder. As shown in Fig. 3, the transmitted signal x is the sum of the BPSK-modulated signal x_d and the output of MSoftCast x_a (to make full use of the channel, the signal coding rates of x_d, x_a and x are designed to be the same and equal to the channel bandwidth), i.e.,

    x = x_d + x_a.   (7)


Fig. 3. Mapping output of the digital and analog encoders to I/Q components of the transmitted signal.

A point worth noting is that the digital decoder treats the analog signal x_a as noise. Specifically, the decoding performance of FEC with BPSK is only affected by the I component of x_a (i.e., x_a2 in Fig. 3), and is independent of the Q component (i.e., x_a1 in Fig. 3). Therefore, the I component of x_a should be kept as small as possible so as to achieve better decoding performance. That is the reason why each GOP is divided into two parts x_a1 and x_a2 with different variances in the previous subsection.

In order to decode the H.264/AVC bitstream and side information correctly, the overall power allocation (i.e., dividing the average available power P_t into the average power of digital signals P_d and the average power of analog signals P_a) is needed. The two power allocation operations, 1 and 2, should be jointly considered because x_a is treated as noise when decoding the digital part. Assume that γ_0(P_E^T) is the lowest SNR in the I component (i.e., the ratio of the power of the BPSK-modulated signals to the total power of the analog and noise signals in the I component) needed for the adopted FEC to guarantee that the decoding bit error rate (BER) is not larger than the target BER P_E^T. Then, to guarantee that the decoding BER is not larger than P_E^T, the following inequality should be satisfied:

    P_d / (P_a2 + N_0/2) ≥ γ_0(P_E^T),   (8)

where P_d and P_a2 are the average powers allocated to x_d and x_a2 respectively, and N_0 is the average power of the channel noise.

From expression (3), it can be derived that

    P_a1 / P_a2 = ( ∑_{k=1}^{N_P/2} σ_s(i_k) ) / ( ∑_{k=N_P/2+1}^{N_P} σ_s(i_k) ) ≜ µ,   (9)

where P_a1 is the average power allocated to x_a1. Obviously,

    P_a = P_a1 + P_a2.   (10)

In addition, the average transmitting power is usually constrained by the average available power P_t, i.e.,

    P_d + P_a ≤ P_t.   (11)

By combining expressions (8)-(11), it can be deduced that

    P_a ≤ (1+µ)(P_t − γ_0(P_E^T)·N_0/2) / (1 + µ + γ_0(P_E^T)).   (12)

In order to achieve the best video quality, the power should be fully used. Therefore, it is reasonable to set P_a to the maximum. In addition, considering the varying channel, the above expression should hold when the maximum noise N_m occurs, i.e.,

    P_a = (1+µ)(P_t − γ_0(P_E^T)·N_m/2) / (1 + µ + γ_0(P_E^T)),  P_d = P_t − P_a.   (13)

Assume that γ_m denotes the maximum channel SNR, i.e., γ_m = P_t/N_m; then expression (13) can be rewritten as

    P_a = (1+µ)(1 − γ_0(P_E^T)/(2γ_m)) / (1 + µ + γ_0(P_E^T)) · P_t,  P_d = P_t − P_a.   (14)

V. WSVC’S DENOISING PART (OPTIONAL)

Due to the effect of analog coding, the reconstructed video unavoidably contains some analog noise. Therefore, it is desirable to denoise the reconstructed video so as to achieve better visual quality. However, for a receiver with a good channel, the noise power is low enough that the noisy video is acceptable. In such cases, the denoising part can be an optional operation in the WSVC scheme.

In the WSVC scheme, the macroblock coding modes (e.g., intra-prediction mode, inter-prediction mode, etc.) obtained from the H.264/AVC decoder are utilized as auxiliary information in the denoising part. Therefore, the denoising operation can be realized using a block-based filter along the prediction direction. WSVC employs the temporal LMMSE (linear minimum mean squared error) filter proposed in [42]. It is a temporal linear filter designed to filter the decoded video along the motion vector (MV). It is embedded in the H.264/AVC decoder and has very low complexity; thereby it can be processed online. However, the temporal LMMSE is only applicable to inter-prediction macroblocks (P and B macroblocks). In WSVC, in order to denoise intra-prediction macroblocks (I macroblocks), we add a spatial LMMSE denoising algorithm to the original temporal LMMSE denoising filter. The spatial LMMSE denoising filter can recursively process pixel by pixel along the intra-prediction direction obtained from the H.264/AVC decoder, but it needs to update the filter coefficients for each pixel, which would cause large computational complexity. To avoid this, we apply the same filter coefficients to each block. In the following, we introduce the spatial LMMSE denoising algorithm in detail.

The noise-corrupted pixel I_n(i,j) can be expressed as

    I_n(i,j) = I(i,j) + N(i,j),   (15)

where I(i,j) is the original value of the pixel at position (i,j) in one frame, and N(i,j) is additive noise with zero mean. Similar to the derivation of the temporal denoising filter in [42], the optimal spatial filter is given by

    I_d(i,j) = ω_1·I_n(i,j) + ω_2·I_d(i′,j′) + α,   (16)

where (i,j) is the position of the current pixel, (i′,j′) is the position of the previous pixel along the current block prediction direction, I_d(i,j) is the denoising filter output for the current pixel, and I_d(i′,j′) is the spatial prediction output for


the previous pixel, and ω_1, ω_2 and α are the filter coefficients, satisfying

    ω_2 = σ²_N(i,j) / σ²_R′(i,j),  ω_1 = 1 − ω_2,  α = ω_2 · R̄′(i,j),   (17)

where

    R′(i,j) = I_n(i,j) − I_d(i′,j′),   (18)

R̄′(i,j) and σ²_R′(i,j) are the mean and variance of R′(i,j) respectively, and σ²_N(i,j) is the variance of N(i,j).

On one hand, the filter coefficients ω_1, ω_2 and α are derived from I_d(i′,j′); on the other hand, I_d(i′,j′) is derived from ω_1, ω_2 and α. This is a classic chicken-and-egg dilemma. We adopt an approximate calculation to solve it. We assume that I_d(i′,j′) ≈ I(i′,j′), since the denoising output is a close approximation of the original signal. Then expression (18) can be rewritten as

    R′(i,j) ≈ I_n(i,j) − I(i′,j′) = I_n(i,j) − I_n(i′,j′) + N(i′,j′) = R_n(i,j) + N(i′,j′),   (19)

where

    R_n(i,j) = I_n(i,j) − I_n(i′,j′).   (20)

Thus, we have

    R̄′(i,j) ≈ R̄_n(i,j),  σ²_R′(i,j) ≈ σ²_Rn(i,j) + σ²_N(i,j),   (21)

where σ²_N(i,j) can be estimated as the average MSE of the current GOP from the MSoftCast decoder. According to expressions (3) and (4), we have

    σ²_N(i,j) ≡ MSE_GOP = (4/N_P) ∑_{k=1}^{N_P} [ σ_s²(k)σ_n²(k) / (g²(k)σ_s²(k) + σ_n²(k)) ],   (22)

where the factor 4 comes from the 2D-IDWT transform. This means that all pixels in one GOP use the same estimate of the noise variance. Assuming that the channel noise power during the transmission of one GOP is a constant N_0, for receivers with high SNR the estimated noise can be approximated as

    σ²_N(i,j) ≈ (4/N_P) ∑_{k=1}^{N_P} σ_s²(k)σ_n²(k) / (g²(k)σ_s²(k)) ≈ (4N_0/P_a) ( ∑_{k=1}^{N_P} σ_s(k) / N_P )²,  ∀ i, j.   (23)

VI. FULL SCALABILITY SUPPORT, BANDWIDTH ADAPTATION AND COMPLEXITY ANALYSIS

The proposed WSVC is able to realize quality scalability, spatial scalability and temporal scalability. Although only the WSVC with quality scalability and spatial scalability in Fig. 2 is introduced above, on this basis, temporal scalability can be realized by removing the temporal DCT operation and replacing the spatial 2D-DWT with a 3D-DWT.

Like SoftCast, our WSVC accommodates different bandwidths by discarding the transform coefficients in the PAUs with the smaller variances when the channel bandwidth is lower than the source bandwidth, or retransmitting the transform coefficients in the PAUs with the larger variances when the channel bandwidth is higher than the source bandwidth. The WSVC encoder still needs to inform the decoder of the locations of the discarded or retransmitted PAUs, but this overhead is significantly smaller since each PAU represents many transform components. WSVC sends this location information as a bitmap. Again, due to energy concentration, the bitmap has long runs of consecutive retained PAUs and can be compressed using run-length encoding.
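The bitmap compression can be sketched as ordinary run-length coding; the (bit, run-length) pair representation below is an illustrative choice, not the paper's bitstream syntax:

```python
def rle_encode(bitmap):
    """Run-length encode the retained-PAU bitmap as (bit, run) pairs."""
    runs = []
    for b in bitmap:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand (bit, run) pairs back into the bitmap."""
    return [b for b, n in runs for _ in range(n)]
```

Because the retained PAUs cluster at the high-energy end, a long bitmap typically collapses into only a handful of runs.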

Our WSVC adds the MSoftCast and LMMSE denoising operations on top of an H.264/AVC codec with fixed modulation and convolutional coding. In addition, the LMMSE denoising only uses a linear filter with very low complexity. Therefore, the main additional complexity comes from the DCT/IDCT and 2D-DWT/2D-IDWT operations of MSoftCast. For an n-point DCT/IDCT operation, the computational complexity is O(n log n) [40]; for an n-point DWT/IDWT operation, the computational complexity is O(n) [41]. Thus, for WSVC, the additional computational complexities of one GOP for HR video and LR video are O(V_HR log L) and O(V_LR log L) respectively, where V_HR and V_LR are the numbers of pixels in one GOP for HR video and LR video respectively, and L is the GOP size (the number of frames in each GOP).

VII. EVALUATION AND RESULTS

A. Reference Baselines and Testing Setup

To evaluate the performance of our proposed WSVC, we compare it with the following baselines with multiresolution broadcast/multicast:

1) Typical scalable digital scheme: H.264/SVC with convolutional codes and hierarchical modulation [5], which is denoted as SVC+HM. For SVC+HM with spatial scalability, a receiver with a bad channel that demands HR video may only be able to decode the base layer with LR. In this case, it upsamples the reconstructed LR video into HR video by an IDWT operation. Similarly, a receiver with a good channel that demands LR video may be able to decode the enhancement layer with HR. In this case, to achieve better quality, it can downsample the reconstructed HR video into LR video by a DWT operation, but decoding the HR video may increase the computational complexity and memory requirements so much that a receiver demanding LR video cannot afford it. Therefore, we consider SVC+HM with two modes: high complexity (HC) and low complexity (LC). In the HC mode, the receiver performs the HR video decoding and downsampling operations, while in the LC mode, the receiver decodes the base layer with LR as the reconstructed LR video.

2) Typical analog scheme: the latest version of SoftCast [16], which uses the 3D-DCT. However, it does not have the capability of multiresolution broadcast/multicast. Therefore, we use SoftCast to transmit the HR video, and obtain the LR video by downsampling the reconstructed HR video with a DWT operation.

3) Up-to-date video delivery scheme: the latest analog version of DCast [31], which adopts coset coding, motion estimation and power-distortion optimization modules.


All schemes are implemented using MATLAB's Communications Toolbox. All convolutional codes used in our experiments are taken from [33], which have the best error correction capability, and all convolutional decoders apply 3-bit soft-decision decoding.

In our experiments, we extract the luminance component to generate monochrome video sequences for testing. Some representative reference video sequences [44] with CIF resolution (352x288) and a frame rate of 30 fps are used as HR video in all the experiments, while the LL band of their one-level spatial 2D-DWT is used as the LR video source. Obviously, the LR video has QCIF (176x144) resolution. We use them to test the performance of all the schemes in two situations: individual video quality and overall video quality of stationary receivers in multicast, and individual video quality of mobile receivers in multicast. We test the performance of all the schemes over an additive white Gaussian noise (AWGN) channel and set the target SNR range to 0-25 dB. In addition, in order to show the performance degradation outside the target SNR range, we measure SNR over a span from -1 to 25 dB. All the schemes transmit at the same power and use the same wireless bandwidth of 1.15 MHz, which guarantees that nearly 3/4 of the coding coefficients can be sent by the analog coding part.

For the baselines, we used the reference implementations available online. Specifically, we generate the H.264/SVC stream using the JSVM implementation [45], which allows us to control the number of layers, and we also generate H.264/AVC streams by using the JSVM to encode the video with a single layer. We use the Open SVC Decoder [46] with error concealment to decode SVC streams. The error concealment is realized in this way: when a bit error is detected in an enhancement layer, each pixel value of the concealed frame is copied from the corresponding pixel of the corresponding decoded frame in the lower layer. In order to achieve the best performance for H.264/AVC or SVC, both WSVC's digital part and SVC+HM apply the hierarchical B coding structure, with IntraPeriod (the period of I frames) set to 32 and NumberBFrames (the number of B frames inserted between adjacent I or P frames) set to 7. In addition, for a fair comparison, WSVC's analog part, SoftCast and DCast all set the GOP size to 32.

In the implementation of our WSVC, the size of each PAU is set to 44 × 36. This means each GOP is divided into 4096 PAUs. We apply rate-1/8 convolutional codes in WSVC. In order to guarantee that P_E^T is less than 6.14 × 10^-7, we set γ_0(P_E^T) = -1 dB and γ_m = 0 dB in expression (14) (this set of parameters was obtained from our tests). This guarantees that, for all test sequences (with no more than 300 frames), the average bit error count of the H.264/AVC-encoded bitstream is less than one. Moreover, the higher the channel SNR, the smaller the BER.

We compare the schemes in terms of the Peak Signal-to-Noise Ratio (PSNR) [43] and subjective quality. The PSNR is a standard objective measure of video/image quality and is defined as a function of the mean squared error (MSE) between all pixels of the decoded video and the original version as follows:

    PSNR = 10 log_10( (2^M − 1)² / MSE ),   (24)

where M is the number of bits used to encode a luminance pixel of the original (uncompressed) video, typically 8 bits.
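Expression (24) in code (hypothetical helper name; `bit_depth` is M):

```python
import numpy as np

def psnr(original, decoded, bit_depth=8):
    """PSNR of expression (24) for bit_depth-bit luminance samples."""
    err = np.asarray(original, float) - np.asarray(decoded, float)
    mse = np.mean(err ** 2)
    return 10.0 * np.log10((2 ** bit_depth - 1) ** 2 / mse)
```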

In addition, since we use the 2D-DWT shown in Table I to produce the LR video source, which is not designed to maximize the coding efficiency of H.264/SVC, it may happen that, for a video coded by H.264/SVC with spatial scalability, the LR video obtained by downsampling the enhancement layer is no better than the base layer. Therefore, for SVC+HM in HC mode, by comparing the quality of the video downsampled from the reconstructed enhancement layer with the reconstructed base layer, we adaptively choose the better one as the reconstructed LR video.

B. Spatial and Temporal Scalability

Our WSVC is able to provide any number of different resolutions by applying multiple levels of wavelet decomposition in the spatial 2D-DWT part, and SVC+HM is also able to provide any number of different resolutions by encoding the video with multiple levels of spatial scalability. However, the original version of SoftCast has no Spatial or Temporal Scalability.

C. Multicast Efficiency with Spatial Scalability

For the evaluation in terms of Multicast Efficiency with Spatial Scalability, we conduct two groups of experiments to evaluate its two performance indexes: individual quality and overall quality.

1) Individual Quality Evaluation:

Method: For the individual video quality comparison, we can evaluate on one stationary receiver. Therefore, we run a group of simple unicast experiments with a stationary receiver, whose SNR is fixed, for the different schemes: SVC+HM, SoftCast, DCast and our WSVC. In each experiment, all schemes are tested at the same SNR, and we conduct a group of such experiments under different SNRs.

We run SVC+HM with different numbers of layers and different modulations and convolutional code rates for each layer. For the 2-layer SVC case, we encode the video with spatial scalability and different code rates: the base layer with QPSK and code rate 1/10, and the enhancement layer with QPSK and code rates 1/8, 1/4 and 1/2 respectively. For the 3-layer SVC case, we encode the video with spatial scalability and quality scalability: the base layer with LR transmitted with QPSK and code rate 1/10, the first enhancement layer with HR transmitted with QPSK and code rate 1/8, and the second enhancement layer with HR transmitted with QPSK and code rate 1/4.

Results: The experimental results for the performance comparison of the different schemes are listed in Table II, and the PSNR curves of the reconstructed videos with different resolutions for the "Foreman" sequence are shown in Fig. 4 and Fig. 5. In addition, the 28-th frames of the "Foreman" sequence reconstructed by each scheme for HR video and LR video at SNR 0 dB and 7 dB are shown in Fig. 6-Fig. 9, respectively. Since we obtain the data points of DCast in Fig.


[Fig. 4 panels (a) and (b): PSNR-vs-SNR curves.]

Fig. 4. Individual multiresolution video PSNR comparison for different schemes: (a) HR video PSNR comparison and (b) LR video PSNR comparison; test sequence: "Foreman".

Fig. 5. Individual HR video PSNR comparison for different schemes: WSVC,DCast [31] and SoftCast, channel bandwidth: 1.33 MHz, target channel SNRrange: 5∼25 dB, test sequence: “Foreman”.

5 from Fig. 9(a) of [31], in order to keep the experimental conditions consistent across schemes, for Fig. 5 we set the experimental conditions to be the same as those for Fig. 9(a) of [31]: channel bandwidth 1.33 MHz and target channel SNR range 5-25 dB; for the other results, the experimental conditions remain as described in Subsection A. For this changed experimental condition, the parameters of our WSVC are adjusted accordingly: the rate of the convolutional codes is 1/6, γ_0(P_E^T) = 2 dB and γ_m = 5 dB.

Besides, since DCast has no Spatial and Temporal Scalability, for multicast with spatial scalability, as shown in Fig. 5, we only compare the performance of DCast and WSVC for the HR video. From the table and figures, it can be concluded:

1) Table II shows that for the target SNR range (i.e., 0-25 dB), WSVC is on average 0.60-5.90 dB and 3.39-9.97 dB better than SoftCast for the HR and LR videos respectively, as well as on average 3.87-9.13 dB

Fig. 6. The 28-th frame of the HR video comparison for Individual Quality Evaluation with a stationary receiver at SNR 0 dB; test sequence: "Foreman". Panels: (a) SoftCast, (b) SVC+HM, (c) WSVC before Denoising, (d) WSVC after Denoising, (e) Original.

Fig. 7. The 28-th frame of the LR video comparison for Individual Quality Evaluation with a stationary receiver at SNR 0 dB; test sequence: "Foreman". Panels: (a) SoftCast, (b) SVC+HM, (c) WSVC before Denoising, (d) WSVC after Denoising.


TABLE II
PERFORMANCE COMPARISON OF DIFFERENT SCHEMES FOR WIRELESS VIDEO BROADCAST (HR: CIF, LR: QCIF; CHANNEL SNR AND VIDEO PSNR BOTH IN dB)

Each scheme lists HR / LR video PSNR; the last four columns give WSVC's PSNR improvement over SVC+HM (QPSK 1/10 + QPSK 1/8 + QPSK 1/4, HC) and over SoftCast.

Video        SNR | SVC+HM        | SoftCast      | WSVC          | vs. SVC+HM    | vs. SoftCast
Sequence         | HR      LR    | HR      LR    | HR      LR    | HR      LR    | HR      LR
Akiyo         0  | 35.11  49.31  | 32.95  33.75  | 42.54  49.26  |  7.43  -0.05  |  9.59  15.51
              4  | 35.11  49.31  | 36.46  37.40  | 45.37  50.93  | 10.26   1.62  |  8.91  13.53
             10  | 46.08  49.31  | 41.94  42.96  | 49.95  53.63  |  3.87   4.32  |  8.01  10.67
             13  | 46.08  49.31  | 44.65  45.60  | 52.03  54.82  |  5.95   5.51  |  7.38   9.22
             16  | 46.08  49.31  | 47.39  48.17  | 53.73  55.92  |  7.65   6.61  |  6.34   7.75
             22  | 49.78  52.08  | 52.31  52.47  | 55.78  57.77  |  6.00   5.69  |  3.47   5.30
             25  | 49.78  52.08  | 54.85  54.70  | 56.26  58.51  |  6.48   6.43  |  1.41   3.81
Hall          0  | 29.24  41.57  | 29.18  30.00  | 36.86  42.58  |  7.62   1.01  |  7.68  12.58
Monitor       4  | 29.24  41.57  | 32.66  33.61  | 39.46  44.49  | 10.22   2.92  |  6.80  10.88
             10  | 38.93  41.62  | 38.14  39.25  | 43.90  48.43  |  4.97   6.81  |  5.76   9.18
             13  | 38.93  41.62  | 40.92  42.07  | 46.18  50.49  |  7.25   8.87  |  5.26   8.42
             16  | 38.93  41.62  | 43.71  44.83  | 48.27  52.34  |  9.34  10.72  |  4.56   7.51
             22  | 40.95  43.77  | 48.92  49.76  | 51.24  55.47  | 10.29  11.70  |  2.32   5.71
             25  | 40.95  43.77  | 51.17  51.79  | 52.05  56.96  | 11.10  13.19  |  0.88   5.17
Foreman       0  | 30.12  37.99  | 26.42  27.29  | 33.20  37.78  |  3.08  -0.21  |  6.78  10.49
              4  | 30.12  37.99  | 29.67  30.76  | 35.20  39.73  |  5.08   1.74  |  5.53   8.97
             10  | 36.83  40.30  | 34.96  36.35  | 39.17  43.97  |  2.34   3.67  |  4.21   7.62
             13  | 36.83  40.30  | 37.68  39.18  | 41.29  46.37  |  4.46   6.07  |  3.61   7.19
             16  | 36.83  40.30  | 40.39  42.01  | 43.19  48.69  |  6.36   8.39  |  2.80   6.68
             22  | 40.17  44.07  | 45.34  47.33  | 45.83  52.44  |  5.66   8.37  |  0.49   5.11
             25  | 40.17  44.07  | 47.32  49.62  | 46.54  53.85  |  6.37   9.78  | -0.78   4.23
Paris         0  | 23.90  39.40  | 26.27  27.91  | 31.26  38.79  |  7.36  -0.61  |  4.99  10.88
              4  | 23.90  39.40  | 29.50  31.39  | 34.09  41.00  | 10.19   1.60  |  4.59   9.61
             10  | 34.35  39.40  | 34.70  36.92  | 39.00  45.26  |  4.65   5.86  |  4.30   8.34
             13  | 34.35  39.40  | 37.35  39.76  | 41.64  47.54  |  7.29   8.14  |  4.29   7.78
             16  | 34.35  39.40  | 39.88  42.57  | 44.30  49.67  |  9.95  10.27  |  4.42   7.10
             22  | 40.31  44.83  | 44.12  47.83  | 49.17  52.96  |  8.86   8.13  |  5.05   5.13
             25  | 40.31  44.83  | 45.56  50.04  | 51.11  54.12  | 10.80   9.29  |  5.55   4.08
Football      0  | 29.17  30.81  | 26.24  26.84  | 30.25  31.62  |  1.08   0.81  |  4.01   4.78
(b)           4  | 29.17  30.81  | 29.50  30.26  | 32.40  34.00  |  3.23   3.19  |  2.90   3.74
             10  | 31.07  32.62  | 34.83  35.78  | 36.83  38.84  |  5.76   6.22  |  2.00   3.06
             13  | 31.07  32.62  | 37.66  38.67  | 39.33  41.55  |  8.26   8.93  |  1.67   2.88
             16  | 31.07  32.62  | 40.51  41.54  | 41.80  44.28  | 10.73  11.66  |  1.29   2.74
             22  | 35.01  37.32  | 46.01  46.96  | 46.04  49.35  | 11.03  12.03  |  0.03   2.39
             25  | 35.01  37.32  | 48.51  49.37  | 47.55  51.47  | 12.54  14.15  | -0.96   2.10
Mobile        0  | 22.44  28.57  | 22.16  23.71  | 24.52  29.29  |  2.08   0.72  |  2.36   5.58
              4  | 22.44  28.57  | 25.29  27.12  | 26.57  31.35  |  4.13   2.78  |  1.28   4.23
             10  | 29.18  33.03  | 30.50  32.62  | 30.86  35.76  |  1.68   2.73  |  0.36   3.14
             13  | 29.18  33.03  | 33.24  35.49  | 33.25  38.32  |  4.07   5.29  |  0.01   2.83
             16  | 29.18  33.03  | 35.99  38.37  | 35.58  40.91  |  6.40   7.88  | -0.41   2.54
             22  | 33.07  37.51  | 41.03  43.93  | 39.36  45.64  |  6.29   8.13  | -1.67   1.71
             25  | 33.07  37.51  | 43.10  46.49  | 40.61  47.49  |  7.54   9.98  | -2.49   1.00
Average       0  |               |               |               |  4.77   0.27  |  5.90   9.97
              4  |               |               |               |  7.18   2.30  |  5.00   8.49
             10  |               |               |               |  3.87   4.93  |  4.10   7.00
             13  |               |               |               |  6.21   7.13  |  3.70   6.38
             16  |               |               |               |  8.40   9.25  |  3.16   5.72
             22  |               |               |               |  8.02   9.00  |  1.61   4.22
             25  |               |               |               |  9.13  10.47  |  0.60   3.39

and 0.27∼10.47 dB better than SVC+HM with 3 layers for the HR and LR videos, respectively. For SVC+HM schemes with 2 layers and different modulations and channel code rates, a conclusion similar to that for SVC+HM with 3 layers can be drawn from Fig. 4. In a word, WSVC has higher coding efficiency than SoftCast and SVC+HM;

2) Fig. 4 shows that the PSNR of WSVC varies gracefully as the channel SNR varies, whereas the PSNR curve of SVC+HM exhibits a staircase effect; therefore, WSVC provides better fairness among all receivers and better Quality Scalability than SVC+HM. (For some HR videos at very high channel SNR, WSVC may be slightly worse than SoftCast. This is because WSVC adopts DWT instead of DCT to provide Spatial and Temporal Scalability, while the decorrelation performance of DWT is inferior to that of DCT.)

3) Fig. 5 shows that WSVC outperforms DCast by about 0.2∼3.3 dB over the channel SNR range of 5∼25 dB for the HR video.²
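Point 2) above attributes WSVC's slight loss to SoftCast at high SNR to the weaker decorrelation of the DWT compared with the DCT. The following toy sketch illustrates that energy-compaction gap; the AR(1) source and the Haar wavelet are our illustrative stand-ins, not the paper's actual video data or filter bank:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    # Orthonormal DCT-II basis; rows are basis vectors.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def haar_matrix(n: int) -> np.ndarray:
    # Orthonormal Haar wavelet basis; n must be a power of two.
    h = np.array([[1.0]])
    while h.shape[0] < n:
        top = np.kron(h, [1.0, 1.0])
        bottom = np.kron(np.eye(h.shape[0]), [1.0, -1.0])
        h = np.vstack([top, bottom]) / np.sqrt(2.0)
    return h

def top_k_energy_fraction(basis: np.ndarray, x: np.ndarray, k: int = 8) -> float:
    # Fraction of the signal energy captured by the k largest coefficients.
    c2 = np.sort((basis @ x) ** 2)[::-1]
    return float(c2[:k].sum() / c2.sum())

# Correlated AR(1) source (rho = 0.95) as a crude stand-in for a video row.
rng = np.random.default_rng(0)
x = np.zeros(64)
for t in range(1, 64):
    x[t] = 0.95 * x[t - 1] + rng.standard_normal()

dct_frac = top_k_energy_fraction(dct_matrix(64), x)
haar_frac = top_k_energy_fraction(haar_matrix(64), x)
print(f"top-8 energy fraction: DCT {dct_frac:.3f}, Haar {haar_frac:.3f}")
```

For highly correlated sources the DCT is close to the optimal decorrelating transform, so it typically concentrates more energy in a few coefficients than the Haar DWT; the trade-off WSVC makes is accepting this small loss in exchange for the spatial scalability the DWT provides.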

Figs. 6-9 show that, compared with SVC+HM, the videos reconstructed by WSVC with the denoising part are "cleaner", especially the HR ones, and that compared with SoftCast they contain less analog noise. Therefore, among these three schemes, WSVC has the best visual quality, whether for the HR or LR video and for a good or bad channel.

In addition, to show the effect of denoising on reconstructed video PSNR for WSVC, we measure the reconstructed video PSNR before and after denoising. The results are shown in Table III. The design objective of the denoising part is to improve the subjective visual quality of the reconstructed video. However, because a simple denoising algorithm is employed, it leads to a small PSNR loss. In any case, in this paper, we are

² Since DCast adopts an IPPP coding structure while both WSVC and SoftCast adopt 3D-DCT operations (which code one GOP as a whole), the coding structural delay of DCast is much less than that of WSVC and SoftCast. Therefore, this performance comparison is somewhat unfair to DCast.


Fig. 8. The 28th frame of the HR video for Individual Quality Evaluation with a stationary receiver at SNR 7 dB, test sequence: "Foreman". Panels: (a) SoftCast, (b) SVC+HM, (c) WSVC before denoising, (d) WSVC after denoising, (e) original.

Fig. 9. The 28th frame of the LR video for Individual Quality Evaluation with a stationary receiver at SNR 7 dB, test sequence: "Foreman". Panels: (a) SoftCast, (b) SVC+HM, (c) WSVC before denoising, (d) WSVC after denoising.

TABLE III
EFFECT OF DENOISING ON RECONSTRUCTED VIDEO PSNR FOR WSVC (HR: CIF, LR: QCIF; CHANNEL SNR AND VIDEO PSNR BOTH IN dB)

| Channel SNR | Before denoising, HR PSNR | Before denoising, LR PSNR | After denoising, HR PSNR | After denoising, LR PSNR | HR PSNR diff. | LR PSNR diff. |
| 0       | 33.20 | 37.78 | 32.80 | 37.26 | -0.40 | -0.52 |
| 7       | 37.09 | 41.70 | 36.86 | 41.51 | -0.23 | -0.19 |
| 16      | 43.19 | 48.69 | 42.87 | 48.54 | -0.32 | -0.15 |
| 25      | 46.54 | 53.85 | 45.92 | 53.68 | -0.62 | -0.17 |
| Average |       |       |       |       | -0.39 | -0.26 |

Fig. 10. Video PSNR comparison for multiresolution multicast with stationary receivers 1-6, whose SNRs are 0 dB, 10 dB, and 22 dB and which demand the HR and LR videos, respectively; test sequence: "Foreman".

focused on WSVC's coding part rather than its denoising part; we may employ more efficient denoising algorithms in a future scheme to improve the visual quality.
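The paper describes its denoising stage only as a simple algorithm. One common receiver-side choice in this analog setting is a pixelwise temporal LMMSE shrinkage toward the temporal mean, which we sketch below; the function, window length, and known-noise-variance assumption are ours, not the paper's exact filter:

```python
import numpy as np

def lmmse_denoise(frames: np.ndarray, noise_var: float) -> np.ndarray:
    """Pixelwise LMMSE shrinkage over a temporal window of frames.

    frames: (T, H, W) noisy frames; noise_var: analog channel noise
    variance, assumed known at the receiver from the channel SNR.
    """
    mu = frames.mean(axis=0)                    # temporal mean per pixel
    var = frames.var(axis=0)                    # observed variance (signal + noise)
    sig_var = np.maximum(var - noise_var, 0.0)  # estimated clean-signal variance
    gain = sig_var / (sig_var + noise_var + 1e-12)
    return mu + gain * (frames - mu)            # shrink each frame toward the mean

# Static scene plus AWGN: temporal shrinkage should reduce the MSE.
rng = np.random.default_rng(1)
clean = np.stack([rng.uniform(0, 255, size=(8, 8))] * 5)
noisy = clean + rng.normal(0.0, 10.0, clean.shape)
denoised = lmmse_denoise(noisy, noise_var=100.0)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print(mse_denoised < mse_noisy)  # expected: True
```

Such a filter suppresses visible analog noise but, like the paper's simple algorithm, can trade a small amount of objective PSNR for subjective quality on moving content, where the temporal mean blurs motion.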

2) Overall Quality Evaluation:
Method: To test how much better the overall video quality of our WSVC is than that of SVC+HM in the worst case, we perform an experiment with a single sender and six multicast receivers whose SNRs are 0 dB, 10 dB, and 22 dB (the turning points for SVC+HM with 3 layers, as shown in Fig. 4) and which demand the HR and LR videos, respectively. We test SVC+HM (QPSK1/10+QPSK1/8+QPSK1/4, HC), SoftCast, and our WSVC transmitting video to the multicast receivers.
Results: For the "Foreman" sequence, the PSNRs of the different schemes are compared in Fig. 10. From the figure, it can be concluded that our WSVC is clearly better than the other schemes. In addition, in this case, its mean PSNR over the multicast group for the HR video is 3.85 dB and 3.69 dB higher than those of SoftCast and SVC+HM respectively, and its mean PSNR over the multicast group for the LR video is 7.78 dB and 3.94 dB higher than those of SoftCast and SVC+HM respectively.

D. Quality Scalability

Method: To compare the Quality Scalability of all schemes, we evaluate it on one mobile receiver. We run a simple unicast experiment with a mobile receiver whose SNR varies uniformly from 12 dB to 6 dB. For SVC+HM, we assume that the configuration with 2 layers (LR, HR), QPSK1/10+QPSK1/8, is selected in advance.
Results: Fig. 11 shows the PSNR variation with frame number for the "Foreman" sequence. From the figure, it can be concluded that our WSVC provides the highest PSNR (except for a few


Fig. 11 (panels (a) and (b)). PSNR variation with frame number for mobility, with a mobile receiver whose SNR varies uniformly from 12 dB to 6 dB; test sequence: "Foreman".

TABLE IV
PERFORMANCE COMPARISON OF DIFFERENT SCHEMES FOR WIRELESS VIDEO BROADCAST

| Scheme   | Spatial and Temporal Scalability | Multicast Efficiency | Quality Scalability |
| SVC+HM   | High | Low  | Low  |
| SoftCast | Low  | Low  | High |
| WSVC     | High | High | High |

frames at the locations of the I frames of SVC+HM) and varies more gracefully with channel SNR, which verifies its strong adaptability to channel variation.
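This graceful behavior is exactly what theory predicts for the analog part of the system: for uncoded linear transmission of a unit-variance source over AWGN, the per-sample LMMSE distortion is 1/(1 + SNR), so reconstruction quality rises smoothly with channel SNR instead of in steps. A toy simulation over the 6-12 dB range used above (Gaussian source and per-sample LMMSE decoding are our simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(100_000)       # unit-variance analog "coefficients"

recon_db = []
for snr_db in (6, 8, 10, 12):          # the mobile receiver's SNR range
    snr = 10.0 ** (snr_db / 10.0)
    y = x + rng.standard_normal(x.size) / np.sqrt(snr)  # AWGN channel, power 1
    x_hat = y * snr / (1.0 + snr)      # per-sample LMMSE estimate
    mse = np.mean((x_hat - x) ** 2)    # theory: 1 / (1 + snr)
    recon_db.append(10.0 * np.log10(1.0 / mse))
    print(f"channel {snr_db:2d} dB -> reconstruction {recon_db[-1]:.2f} dB")
```

The reconstruction SNR follows 10·log10(1 + SNR) dB, a smooth curve with no cliff, in contrast to a digital scheme whose quality is flat between modulation/code-rate switching points and collapses below them.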

The performance comparisons among all the schemes above are summarized in Table IV.

VIII. CONCLUSION

In this paper, we propose a novel WSVC framework. Specifically, we present an HDA JSCC scheme that integrates the advantages of digital coding and analog coding: high coding efficiency and graceful quality variation as the channel quality varies over time. Moreover, the proposed scheme is able to broadcast one video at different resolutions to fit devices with different display resolutions. The performance evaluation in terms of Quality Scalability, Multicast Efficiency, and Spatial and Temporal Scalability shows that (1) our WSVC has strong Spatial and Temporal Scalability; (2) our WSVC achieves an average 0.60∼5.90 dB and 3.39∼9.97 dB performance gain over SoftCast for broadcasting/multicasting videos with CIF and QCIF resolutions, respectively; (3) our WSVC achieves an average 3.87∼9.13 dB and 0.27∼10.47 dB performance gain over SVC+HM for broadcasting/multicasting videos with CIF and QCIF resolutions, respectively; (4) our WSVC outperforms DCast by about 0.2∼3.3 dB for the video with CIF resolution; (5) our WSVC avoids the staircase effect and realizes CQS on condition that the channel quality is within the expected range. Therefore, WSVC is very suitable for wireless video broadcast/multicast transmission and mobile video applications.


Lei Yu received the B.S. degree from the University of Science and Technology of China, Hefei, China, in 2010. He is now pursuing the Ph.D. degree at the University of Science and Technology of China.

His research interests include image/video coding and transmission, and the theory and practice of joint source-channel coding (JSCC).

Houqiang Li (SM'12) received the B.S., M.Eng., and Ph.D. degrees from the University of Science and Technology of China (USTC) in 1992, 1997, and 2000, respectively, all in electronic engineering.

He is currently a professor in the Department of Electronic Engineering and Information Science (EEIS), USTC. His research interests include video coding and communication, multimedia search, and image/video analysis. He has authored or co-authored over 100 papers in journals and conferences. He is an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology and is on the Editorial Board of the Journal of Multimedia. He has served on technical/program committees and organizing committees, and as program co-chair and track/session chair, for over 10 international conferences. He was the recipient of the Best Paper Award at Visual Communications and Image Processing (VCIP) in 2012, the Best Paper Award at the International Conference on Internet Multimedia Computing and Service (ICIMCS) in 2012, and the Best Paper Award at the ACM International Conference on Mobile and Ubiquitous Multimedia (ACM MUM) in 2011, and he is a senior author of the Best Student Paper of the 5th International Mobile Multimedia Communications Conference (MobiMedia) in 2009.

Weiping Li (F'00) received his B.S. degree from the University of Science and Technology of China (USTC) in 1982, and his M.S. and Ph.D. degrees from Stanford University in 1983 and 1988, respectively, all in electrical engineering.

In 1987, he joined the faculty of Lehigh University as an Assistant Professor in the Department of Electrical Engineering and Computer Science. In 1993, he was promoted to Associate Professor with tenure, and in 1998 to Full Professor. From 1998 to 2010, he worked in several high-tech companies in Silicon Valley with technical and management responsibilities. In March 2010, he was appointed Dean of the School of Information Science and Technology at USTC.

Dr. Li has been elected a Fellow of IEEE for contributions to image and video coding algorithms, standards, and implementations. He served as the Editor-in-Chief of IEEE Transactions on Circuits and Systems for Video Technology and as a Guest Editor for a special issue of the Proceedings of the IEEE. He served as the Chair of several Technical Committees in the IEEE Circuits and Systems Society and at IEEE International Conferences, and as the Chair of the Best Student Paper Award Committee for the SPIE Visual Communications and Image Processing Conference. He has made many contributions to international standards. His inventions on Fine Granularity Scalable Video Coding and Shape-Adaptive Wavelet Coding have been included in the MPEG-4 International Standard. He served as a member of MPEG (Moving Picture Experts Group) of ISO (International Organization for Standardization) and an Editor of the MPEG-4 International Standard, and as a founding member of the Board of Directors of the MPEG-4 Industry Forum. As a technical advisor, he also contributed to the Chinese Audio Video coding Standard (AVS) and its applications. He received a Certificate of Appreciation from ISO/IEC as a Project Editor in the development of an International Standard in 2004, the Spira Award for Excellence in Teaching at Lehigh University in 1992, and the first Guo Mo-Ruo Prize for Outstanding Student at USTC in 1980.