
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 62, NO. 9, SEPTEMBER 2016 5093

Multi-Class Source-Channel Coding

Irina E. Bocharova, Senior Member, IEEE, Albert Guillén i Fàbregas, Senior Member, IEEE, Boris D. Kudryashov, Senior Member, IEEE, Alfonso Martinez, Senior Member, IEEE, Adrià Tauste Campo, Member, IEEE, and Gonzalo Vazquez-Vilar, Member, IEEE

Abstract— This paper studies an almost-lossless source-channel coding scheme in which source messages are assigned to different classes and encoded with a channel code that depends on the class index. The code performance is analyzed by means of random-coding error exponents and validated by simulation of a low-complexity implementation using existing source and channel codes. While each class code can be seen as a concatenation of a source code and a channel code, the overall performance improves on that of separate source-channel coding and approaches that of joint source-channel coding when the number of classes increases.

Index Terms— Source-channel coding, error exponent, unequal error protection, UEP, LDPC.

Manuscript received October 31, 2014; revised September 20, 2015 and March 18, 2016; accepted June 20, 2016. Date of publication July 7, 2016; date of current version August 16, 2016. This work was supported in part by the European Research Council under Grant 259663, in part by the Spanish Ministry of Economy and Competitiveness under Grant FPDI-2013-18602, Grant RYC-2011-08150, Grant TEC2012-38800-C03-03, and Grant TEC2013-41718-R, and in part by the European Union's 7th Framework Programme under Grant 303633 and Grant 329837. This work was presented at the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, June 29–July 4, 2014, and at the 8th International Symposium on Turbo Codes and Iterative Information Processing, Bremen, Germany, August 18–22, 2014.

I. E. Bocharova and B. D. Kudryashov are with the St. Petersburg University of Information Technologies, Mechanics and Optics, Saint Petersburg 197101, Russia (e-mail: [email protected]; [email protected]).

A. Guillén i Fàbregas is with the Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona 08018, Spain, also with the Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona 08010, Spain, and also with the Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, U.K. (e-mail: [email protected]).

A. Martinez is with the Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona 08018, Spain (e-mail: [email protected]).

A. Tauste Campo is with the Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona 08018, Spain, and also with the Hospital del Mar Medical Research Institute, Barcelona 08003, Spain (e-mail: [email protected]).

G. Vazquez-Vilar was with the Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona 08018, Spain. He is now with the Signal Theory and Communications Department, Universidad Carlos III de Madrid, Leganés 28911, Spain, and also with the Gregorio Marañón Health Research Institute, Madrid 28007, Spain (e-mail: [email protected]).

Communicated by E. Tuncel, Associate Editor for Source Coding.

Digital Object Identifier 10.1109/TIT.2016.2588496

I. INTRODUCTION

RELIABLE transmission of a source through a communication channel can be achieved by using separate source and channel codes, as shown by Shannon's source-channel coding theorem [1]. This means that a concatenation of a (channel-independent) source code followed by a (source-independent) channel code achieves vanishing error probability as the block length goes to infinity, as long as the source entropy is smaller than the channel capacity [1]. However, in the non-asymptotic regime joint source-channel codes can perform strictly better. This improvement (i.e., reduction in error probability) has been quantified in terms of error exponents [2], [3] and in terms of source and channel dispersion [4], [5]. Joint design has an error exponent at most twice that of separate codes [6], and a dispersion gain that depends on the target error probability; for vanishing values of the latter, the dispersion of joint design is at best half of the dispersion of separate design [5]. This potential gain justifies the interest in practical finite-length joint source-channel codes.

Several practical joint source-channel coding schemes have been considered in the past. One possible approach is to adapt existing channel coding techniques to exploit the knowledge of the source statistics at the decoder side. Examples include a modification of the Viterbi decoding algorithm to use the a priori probabilities of the source bits [7], punctured turbo codes with a modified iterative decoder [8], and source and channel LDPC codes with a decoder exploiting the joint graph structure of the codes and the source [9]. Other schemes exploit the source statistics both at the encoder and the decoder. In [10], source bits are matched to a non-systematic LDPC code via scrambling or splitting. In [11]–[13] the authors propose a trellis-structure description of the Huffman code and an appropriate channel code so that joint decoding is possible. This technique has been extended to arithmetic [14] and Lempel-Ziv source coding [15]. These source-channel coding schemes share the underlying idea of approximating the (optimum) maximum a posteriori (MAP) decoder by using certain properties of the source statistics.

In this paper, we analyze an almost-lossless source-channel coding scheme in which source messages are assigned to disjoint classes and encoded by codes that depend on the class index. Under MAP decoding, this scheme attains the joint source-channel reliability function in the cases where it is known to be tight [16]. We are interested in characterizing the performance of this coding scheme under simpler, sub-optimal decoding. First, we process the channel output in parallel for each class using a bank of maximum likelihood (ML) decoders. Then, the decoded message is selected from the outputs of the ML decoders based on a MAP criterion. While this construction fails to achieve the best performance of joint source-channel coding, it has a smaller complexity for a fixed number of classes. This scheme is shown to improve on the error exponent of separate coding and, as the number of classes increases, to approach the error exponent of joint source-channel coding [3].

Fig. 1. Block diagram of the multi-class source-channel coding scheme. Class #0 is reserved for a declared error, and it is not decoded at the receiver.

The proposed coding scheme can be interpreted as being based on unequal error protection (UEP). The most probable messages are encoded with low-rate channel codes, and hence they receive increased protection against channel errors. Analogously, less probable messages are assigned to classes that receive less protection against channel errors. UEP can also be implemented via an alternative coding scheme in which codewords are divided into two parts: a prefix that identifies which class the message belongs to, and a payload that encodes the message within the class. When the number of classes grows sub-exponentially with the block length, the prefix encodes an information source of effectively zero rate. Even in this case, for channels with no zero-error capacity, the prefix length is required to grow linearly with the code length. Therefore, this prefix scheme incurs a loss in exponent (see the discussion after [3, Lemma 2]) and a loss in finite-length performance [17, Sec. II.C].

Shkel, Tan and Draper also proposed an alternative UEP coding scheme in [17]. In their scheme, codewords belonging to different classes generally have different minimum distances, hence UEP is guaranteed via the code construction. In contrast, our scheme does not require a UEP codebook: the codes for the different classes are selected and optimized independently. Instead, UEP is achieved at the decoding stage by giving priority to some classes over the others. As a result, the source-channel code proposed here can be designed and implemented with reduced complexity using existing source and channel codes, as shown with several examples.

The structure of the paper is as follows. In Section II the system model and our multi-class source-channel coding scheme are introduced. Section III presents a random-coding analysis of this scheme. Section IV validates these results by means of simulation of a reduced-complexity implementation based on LDPC codes, and Section V concludes the paper.

II. SYSTEM MODEL AND CODING SCHEME

We consider the transmission of a length-$k$ discrete memoryless source over a memoryless channel using length-$n$ block codes. We define $t \triangleq k/n$. The source output $v = (v_1, \ldots, v_k) \in \mathcal{V}^k$, where $\mathcal{V}$ is a discrete alphabet, is distributed according to $P^k(v) = \prod_{i=1}^{k} P(v_i)$, where $P(v)$ is the source symbol distribution. Without loss of generality, we assume that $P(v) > 0$ for all $v$; if $P(v) = 0$ for some $v$, we define a new source without this symbol. The channel input $x = (x_1, \ldots, x_n) \in \mathcal{X}^n$ and output $y = (y_1, \ldots, y_n) \in \mathcal{Y}^n$, where $\mathcal{X}$ and $\mathcal{Y}$ respectively denote the input and output alphabets, are related via a channel law $W^n(y|x) = \prod_{i=1}^{n} W(y_i|x_i)$, where $W(y|x)$ denotes the channel transition probability. For the sake of clarity, in the following we consider discrete channels. The analysis carries over to the case of continuous output alphabets, replacing the corresponding sums by integrals.

A source-channel code is defined by an encoder and a decoder. The encoder maps the message $v$ to a length-$n$ codeword $x(v)$. Based on the channel output $y$, the decoder selects a message $\hat{v}(y)$. When clear from context, we avoid writing the dependence of the decoder output on the channel output explicitly. Throughout the paper, random variables are denoted by capital letters and the specific values they take on by the corresponding lower-case letters. The error probability of a source-channel code is thus given by

$$\epsilon_n = \Pr\bigl\{\hat{V} \neq V\bigr\}. \quad (1)$$

We characterize this probability in terms of error exponents. An exponent $E(P, W, t) > 0$ is said to be achievable if there exists a sequence of codes with $n = 1, 2, \ldots$, and $k = 1, 2, \ldots$, whose error probabilities $\epsilon_n$ satisfy

$$\epsilon_n \leq e^{-n E(P,W,t) + o(n)}, \quad (2)$$

where $o(n)$ is a sequence such that $\lim_{n\to\infty} o(n)/n = 0$. The supremum of all achievable exponents $E(P, W, t)$ is usually referred to as the reliability function.

Our coding scheme splits the source-message set into subsets, and uses concatenated source and channel codes for each subset. At the receiver, each channel code is decoded in parallel, and the final output is selected based on the MAP criterion. A block diagram of this scheme is shown in Fig. 1.

For each $k$, we define a partition $\mathcal{P}_k$ of the source-message set $\mathcal{V}^k$ into $N_k + 1$ disjoint subsets $\mathcal{A}_i^k$, $i = 0, 1, \ldots, N_k$. We shall refer to these subsets as classes. Sometimes, we consider sequences of sources, channels and partitions where $N_k$ grows with $k$. The asymptotic number of classes as $k \to \infty$ is $N \triangleq \lim_{k\to\infty} N_k$, hence $N \in \mathbb{N} \cup \{\infty\}$. More specifically, we consider partitions in which source messages are assigned to classes depending on their probability,

$$\mathcal{A}_i^k = \bigl\{v \,\big|\, \gamma_i^k < P^k(v) \leq \gamma_{i+1}^k\bigr\}, \quad i = 0, \ldots, N_k, \quad (3)$$

with $0 = \gamma_0^k \leq \gamma_1^k \leq \ldots \leq \gamma_{N_k+1}^k = 1$. Since the sets $\mathcal{A}_i^k$ are unions of type classes, $N_k$ grows (at most) subexponentially in $k$. We define the rate of each class as

$$R_i \triangleq \frac{1}{n} \log \bigl|\mathcal{A}_i^k\bigr|, \quad i = 0, \ldots, N_k. \quad (4)$$

All the messages in the class $\mathcal{A}_0^k$ are encoded with the same codeword $x(v) = x_0$ and are assumed to lead to a decoding error. For each remaining class $\mathcal{A}_i^k$, messages are encoded with a channel code $\mathcal{C}_i$ of rate $R_i$. At the receiver, we use a two-step decoder (see Fig. 1). For each class $\mathcal{A}_i^k$, $i = 1, \ldots, N_k$, the $i$-th ML decoder selects


a message $\hat{v}_i \in \mathcal{A}_i^k$ as

$$\hat{v}_i = \arg\max_{v \in \mathcal{A}_i^k} W^n(y|x(v)). \quad (5)$$

Next, the decoder selects from the set $\{\hat{v}_i\}_{i=1}^{N_k}$ the source message with the largest MAP decoding metric. That is, the final output is $\hat{v} = \hat{v}_{\hat{\imath}}$, where the class index selected by the MAP decoder corresponds to

$$\hat{\imath} = \arg\max_{i=1,\ldots,N_k} q(\hat{v}_i, y), \quad (6)$$

where $q(v, y) \triangleq P^k(v)\, W^n(y|x(v))$.
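As an illustration of the two-step rule in (5)-(6), the toy sketch below (our own, not the authors' implementation; the codebook, priors and BSC likelihood are all assumptions) runs a bank of per-class ML decoders and then applies the MAP selection across their outputs:

```python
# Sketch of the two-step decoder: per-class ML decoding (5), then MAP
# selection (6) with metric q(v, y) = P^k(v) * W^n(y | x(v)).

def Wn_bsc(y, x, d=0.1):
    """Memoryless BSC likelihood W^n(y|x) with crossover d (assumption)."""
    flips = sum(a != b for a, b in zip(y, x))
    return d ** flips * (1 - d) ** (len(x) - flips)

def two_step_decode(y, classes, codewords, Pk, Wn):
    """classes: message ids per class A_i^k (class 0 excluded);
    codewords[v] = x(v); Pk[v] = P^k(v)."""
    # Step 1, eq. (5): one ML decoder per class, run independently.
    candidates = [max(cls, key=lambda v: Wn(y, codewords[v]))
                  for cls in classes]
    # Step 2, eq. (6): pick the candidate with the largest MAP metric.
    return max(candidates, key=lambda v: Pk[v] * Wn(y, codewords[v]))

# Toy example: four messages, two classes, length-3 codewords.
codewords = {0: (0, 0, 0), 1: (1, 1, 1), 2: (0, 1, 1), 3: (1, 0, 1)}
Pk = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}
print(two_step_decode((0, 0, 1), [[0, 1], [2, 3]], codewords, Pk, Wn_bsc))
```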

III. ERROR EXPONENT ANALYSIS

To analyze the random-coding error exponent of the scheme described in Section II, we define three different error events. The first occurs when a source message belongs to the set $\mathcal{A}_0^k$ (source error); the second occurs when, for a source message belonging to class $\mathcal{A}_i^k$, the $i$-th ML decoder makes an error (ML error); and the third occurs when the $i$-th ML decoder output is correct but the MAP decoder makes an error (MAP error). More precisely, these three error events are defined respectively as

$$\mathcal{E}_{\mathrm{S}} \triangleq \bigl\{v \in \mathcal{A}_0^k\bigr\}, \quad (7)$$
$$\mathcal{E}_{\mathrm{ML}}(i) \triangleq \bigl\{v \in \mathcal{A}_i^k,\ \hat{v}_i \neq v\bigr\}, \quad (8)$$
$$\mathcal{E}_{\mathrm{MAP}}(i) \triangleq \bigl\{v \in \mathcal{A}_i^k,\ \hat{v}_i = v,\ \hat{\imath} \neq i\bigr\}. \quad (9)$$

Using that these error events are disjoint, we write the error probability as

$$\epsilon_n = \Pr\Bigl\{\mathcal{E}_{\mathrm{S}} \cup \Bigl(\bigcup_{i=1}^{N_k} \mathcal{E}_{\mathrm{ML}}(i)\Bigr) \cup \Bigl(\bigcup_{i=1}^{N_k} \mathcal{E}_{\mathrm{MAP}}(i)\Bigr)\Bigr\} \quad (10)$$
$$= \Pr\bigl\{V \in \mathcal{A}_0^k\bigr\} + \sum_{i=1}^{N_k} \Pr\bigl\{V \in \mathcal{A}_i^k,\ \hat{V}_i \neq V\bigr\} + \sum_{i=1}^{N_k} \Pr\bigl\{V \in \mathcal{A}_i^k,\ \hat{V}_i = V,\ \hat{I} \neq i\bigr\}. \quad (11)$$

To lower-bound the error exponent, we start by upper-bounding every term in the third summand in (11) as

$$\Pr\bigl\{V \in \mathcal{A}_i^k,\ \hat{V}_i = V,\ \hat{I} \neq i\bigr\} = \Pr\bigl\{\hat{I} \neq i \,\big|\, V \in \mathcal{A}_i^k,\ \hat{V}_i = V\bigr\}\, \Pr\bigl\{V \in \mathcal{A}_i^k,\ \hat{V}_i = V\bigr\} \quad (12)$$
$$\leq \Pr\bigl\{\hat{I} \neq i \,\big|\, V \in \mathcal{A}_i^k,\ \hat{V}_i = V\bigr\}\, \Pr\bigl\{V \in \mathcal{A}_i^k\bigr\}, \quad (13)$$

where (12) follows from the chain rule, and (13) by upper-bounding $\Pr\{V \in \mathcal{A}_i^k,\ \hat{V}_i = V\}$ by $\Pr\{V \in \mathcal{A}_i^k\}$. Using the MAP decoding rule in (6), the first factor in the right-hand side of (13) can be upper-bounded as

$$\Pr\bigl\{\hat{I} \neq i \,\big|\, V \in \mathcal{A}_i^k,\ \hat{V}_i = V\bigr\} \leq \Pr\Bigl\{q(\hat{V}_i, Y) \leq \max_{j=1,\ldots,N_k,\ j \neq i} q(\hat{V}_j, Y) \,\Big|\, V \in \mathcal{A}_i^k,\ \hat{V}_i = V\Bigr\} \quad (14)$$
$$\leq \Pr\Bigl\{q(V, Y) \leq \max_{v \neq V,\ v \notin \mathcal{A}_0^k} q(v, Y) \,\Big|\, V \in \mathcal{A}_i^k\Bigr\}, \quad (15)$$

where (14) follows from (6) by assuming that ties are decoded as errors, and (15) follows by applying the condition $\hat{V}_i = V$ and by enlarging the set of source messages over which the maximum is computed.

Substituting (13) and (15) in (11), via the chain rule, yields

$$\epsilon_n \leq \Pr\bigl\{V \in \mathcal{A}_0^k\bigr\} + \sum_{i=1}^{N_k} \Pr\bigl\{V \in \mathcal{A}_i^k,\ \hat{V}_i \neq V\bigr\} + \Pr\Bigl\{V \notin \mathcal{A}_0^k,\ q(V, Y) \leq \max_{v \neq V,\ v \notin \mathcal{A}_0^k} q(v, Y)\Bigr\}. \quad (16)$$

In the following, we find it useful to define the channel-coding and source-coding exponents. For $\rho \geq 0$ and $Q$ an arbitrary distribution over $\mathcal{X}$, let Gallager's channel and source functions be given by

$$E_0(\rho, Q) \triangleq -\log \sum_{y} \Bigl(\sum_{x} Q(x)\, W(y|x)^{\frac{1}{1+\rho}}\Bigr)^{1+\rho} \quad (17)$$

and

$$E_{\mathrm{s}}(\rho) \triangleq \log \Bigl(\sum_{v} P(v)^{\frac{1}{1+\rho}}\Bigr)^{1+\rho}, \quad (18)$$

respectively. For channel coding alone, the random-coding exponent at rate $R$ for an input distribution $Q$ is achievable and is given by [2]

$$E_{\mathrm{r}}(R, Q) = \max_{\rho \in [0,1]} \bigl\{E_0(\rho, Q) - \rho R\bigr\}. \quad (19)$$

For source coding alone, the reliability function of a source $P$ at rate $R$, denoted by $e(R)$, is given by [18]

$$e(R) = \sup_{\rho \geq 0} \bigl\{\rho R - E_{\mathrm{s}}(\rho)\bigr\}. \quad (20)$$
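As a quick numerical companion (our own sketch, not part of the paper), the quantities (17)-(20) can be evaluated directly for small alphabets. Here a binary symmetric channel with an assumed crossover probability stands in for the generic discrete channel $W$:

```python
# Sketch: evaluating Gallager's functions (17)-(18) and the exponents
# (19)-(20) on grids. The BSC crossover delta = 0.05 is an assumption.
import numpy as np

def E_s(rho, P):
    """Source function (18): log( sum_v P(v)^{1/(1+rho)} )^{1+rho}."""
    return (1 + rho) * np.log(np.sum(P ** (1.0 / (1 + rho))))

def E_0(rho, Q, W):
    """Channel function (17); W[x, y] is the transition matrix."""
    inner = Q @ (W ** (1.0 / (1 + rho)))   # sum_x Q(x) W(y|x)^{1/(1+rho)}
    return -np.log(np.sum(inner ** (1 + rho)))

def E_r(R, Q, W):
    """Random-coding exponent (19): max over rho in [0, 1]."""
    return max(E_0(r, Q, W) - r * R for r in np.linspace(0, 1, 501))

def e_src(R, P):
    """Source reliability function (20); the sup over rho >= 0 is
    approximated on a truncated grid."""
    return max(r * R - E_s(r, P) for r in np.linspace(0, 30, 3001))

P = np.array([0.9, 0.1])                   # BMS with P(1) = 0.1
delta = 0.05                               # assumed BSC crossover
W = np.array([[1 - delta, delta], [delta, 1 - delta]])
Q = np.array([0.5, 0.5])                   # equiprobable input
print(E_r(0.4, Q, W), e_src(0.6, P))       # exponents in nats
```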

We upper-bound (16) via a random-coding argument. For every $k$, $n$, we assign a distribution $Q_i(x)$ to each class $\mathcal{A}_i^k$, $i = 0, \ldots, N_k$, and randomly generate a codeword $x(v)$ according to $Q_i^n(x) \triangleq \prod_{j=1}^{n} Q_i(x_j)$ for each source message $v \in \mathcal{A}_i^k$ and each $i = 1, \ldots, N_k$. For the class $\mathcal{A}_0^k$, we select a symbol distribution $Q_0$ that assigns mass 1 to a predetermined null symbol. Then, its Gallager function satisfies $E_0(\rho_0, Q_0) = 0$ for any $\rho_0 \in [0,1]$. We also define $R_{N+1} \triangleq 0$, such that $e\bigl(\frac{R_{N+1}}{t}\bigr) = 0$.

The next result follows from (16) using the exponential bounds [2, Th. 5.6.1], [16, Th. 1], [18, Th. 5.2].

Theorem 1: There exists a sequence of codes, partitions and decoders as defined in Section II that achieves the exponent

$$\min_{i=0,\ldots,N} \Bigl\{E_{\mathrm{r}}(R_i, Q_i) + t\, e\Bigl(\frac{R_{i+1}}{t}\Bigr)\Bigr\}, \quad (21)$$

where $N = \lim_{k\to\infty} N_k$. Furthermore, for the set of rates $\{R_i\}$ maximizing (21) there exists a one-to-one relationship between each $R_i$ and the corresponding threshold $\gamma_i$ (see Lemma 2 in Appendix I).

Proof: See Appendix I.

Under certain assumptions the lower bound in Theorem 1 coincides with an upper bound to the error exponent derived in [19, Th. 2] for the family of codes described in Section II. This is the case for a given class of channels (such as the binary symmetric channel, the binary erasure channel or the phase-shift-keying modulated additive white Gaussian noise (AWGN) channel), when the intermediate rates optimizing (21) are above the critical rate of the channel and the codes $\mathcal{C}_1, \ldots, \mathcal{C}_{N_k}$ are linear. While this converse result only applies to a class of codes and channels, it shows that in these cases there is no loss in exponent by considering the bound in Theorem 1.

Further analysis involves optimization over the rates $R_i$ (i.e., the thresholds $\gamma_i$) and the distributions $Q_i$, $i = 1, \ldots, N$. The bound in Theorem 1 can be relaxed to obtain an alternative expression. We define $E_0(\rho) \triangleq \max_Q E_0(\rho, Q)$.

Theorem 2: There exists a sequence of codes, partitions and decoders as defined in Section II with $N \geq 2$ that achieves the exponent

$$\max_{R' \geq R \geq 0} \min\Bigl\{\max_{\hat{\rho} \geq 0} \bigl\{\hat{\rho} R' - t E_{\mathrm{s}}(\hat{\rho})\bigr\},\ \max_{\rho \in [0,1]} \Bigl\{E_0(\rho) - t E_{\mathrm{s}}(\rho) - \rho\, \frac{R' - R}{N-1}\Bigr\},\ \max_{\bar{\rho} \in [0,1]} \bigl\{E_0(\bar{\rho}) - \bar{\rho} R\bigr\}\Bigr\}. \quad (22)$$

Moreover, the rate of the $i$-th class in the partition is

$$R_i = R + (i-1)\, \frac{R' - R}{N-1}, \quad i = 1, \ldots, N, \quad (23)$$

where $R$ and $R'$ are the values optimizing (22).

Proof: See Appendix II.

The bound in Theorem 2 is simple to evaluate since it only involves the well-known functions $E_{\mathrm{s}}(\cdot)$ and $E_0(\cdot)$, and the optimization is performed over a fixed number of parameters ($\hat{\rho}$, $\rho$, $\bar{\rho}$, $R$ and $R'$), independent of $N$. Furthermore, as we verify next with an example, it is sometimes indistinguishable from the bound in Theorem 1.
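A brute-force evaluation of the relaxed bound (22) is straightforward to sketch. The code below is our own illustration, not the authors': it uses a BSC with an assumed crossover as a stand-in, takes the equiprobable input (optimal for symmetric channels) for $E_0(\rho)$, and grid-searches over $R$, $R'$ and the three $\rho$ parameters:

```python
# Sketch: grid-search evaluation of (22). The BSC crossover delta = 0.05
# and all grid ranges/truncations are assumptions.
import numpy as np

P = np.array([0.9, 0.1])                     # BMS with P(1) = 0.1
delta = 0.05                                 # assumed BSC crossover
W = np.array([[1 - delta, delta], [delta, 1 - delta]])
Q = np.array([0.5, 0.5])                     # equiprobable (symmetric ch.)

def Es(rho):
    return (1 + rho) * np.log(np.sum(P ** (1.0 / (1 + rho))))

def E0(rho):
    inner = Q @ (W ** (1.0 / (1 + rho)))
    return -np.log(np.sum(inner ** (1 + rho)))

def bound22(N, t=1.0, Rmax=1.5, npts=40):
    g01 = np.linspace(0, 1, 51)              # grids for rho and rho-bar
    ghat = np.linspace(0, 30, 301)           # truncated grid for rho-hat
    Es01 = np.array([Es(r) for r in g01])
    E001 = np.array([E0(r) for r in g01])
    Eshat = np.array([Es(r) for r in ghat])
    best = 0.0
    for R in np.linspace(0, Rmax, npts):
        for Rp in np.linspace(R, Rmax, npts):
            a = np.max(ghat * Rp - t * Eshat)       # first term of (22)
            b = np.max(E001 - t * Es01 - g01 * (Rp - R) / (N - 1))
            c = np.max(E001 - g01 * R)              # last term of (22)
            best = max(best, min(a, b, c))
    return best

print(bound22(N=2), bound22(N=12))  # grows with N toward (25)
```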

For $N = 1$, we have that $E_{\mathrm{r}}(R_0, Q_0) = 0$ and $e(R_2) = 0$. Optimizing (21) over the intermediate rate $R = R_1$ and the distribution $Q_1$, Theorem 1 recovers the separate source-channel exponent [3],

$$\max_{R \geq 0} \min\Bigl\{E_{\mathrm{r}}(R),\ t\, e\Bigl(\frac{R}{t}\Bigr)\Bigr\}, \quad (24)$$

where $E_{\mathrm{r}}(R) \triangleq \max_Q E_{\mathrm{r}}(R, Q)$.

Let $N_k$ grow (subexponentially) with $k$ in such a way that $\lim_{k\to\infty} N_k = \infty$. For this discussion only, we allow $R$ and $R'$ to depend on $k$ as $R_k$ and $R'_k$, respectively. Let us choose the sequences $R_k$ and $R'_k$ such that $\lim_{k\to\infty} R_k = 0$, $\lim_{k\to\infty} R'_k = \infty$ and $\lim_{k\to\infty} \frac{R'_k - R_k}{N_k - 1} = 0$, i.e., $R'_k = o(N_k)$. In this case, the first and last terms within the minimization in (22) become irrelevant and the bound in Theorem 2 recovers Gallager's source-channel error exponent [2, p. 534, Prob. 5.16],

$$\max_{\rho \in [0,1]} \bigl\{E_0(\rho) - t E_{\mathrm{s}}(\rho)\bigr\}. \quad (25)$$

In several cases of interest, the exponent (25) coincides with the joint source-channel reliability function. However, for specific source and channel pairs the following exponent gives a tighter bound to the reliability function [3], [6],

$$\min_{R \geq 0} \Bigl\{E_{\mathrm{r}}(R) + t\, e\Bigl(\frac{R}{t}\Bigr)\Bigr\} = \max_{\rho \in [0,1]} \bigl\{\bar{E}_0(\rho) - t E_{\mathrm{s}}(\rho)\bigr\}, \quad (26)$$

where $\bar{E}_0(\rho)$ denotes the concave hull of $E_0(\rho)$, defined pointwise as the supremum over convex combinations of any two values of the function $E_0(\rho)$ [20, p. 36]. While the bound in Theorem 2 does not attain (26), this error exponent can be recovered from Theorem 1 by identifying the classes with the source-type classes $P_i$, $i = 1, \ldots, N_k$. In this case, $R_i = t H(P_i)$ and $R_{i+1} = t H(P_{i+1})$ become infinitely close to each other and they uniformly cover the interval $[0, t \log |\mathcal{V}|]$ for $i = 1, 2, \ldots$. As a result, (21) recovers the left-hand side of (26). This shows that the gap between the bounds in Theorems 1 and 2 can be strictly positive.

A. Example

A binary memoryless source (BMS) with parameter $p \triangleq P(1) \leq 1/2$ is to be transmitted over a binary-input AWGN channel with signal-to-noise ratio (SNR) $E_s/N_0$. For comparison purposes, we normalize $E_s/N_0$ with respect to the number of transmitted information bits if the source were compressed to entropy, i.e., $t H(V)$. Let $h_2(p) = -p \log_2 p - (1-p) \log_2(1-p)$ denote the binary entropy function in bits. We define a signal-to-noise ratio per source bit $E_b/N_0$ as

$$\frac{E_b}{N_0} \triangleq \frac{n}{k\, h_2(p)}\, \frac{E_s}{N_0}. \quad (27)$$

Figure 2 shows the achievable error exponents for different coding schemes as a function of $E_b/N_0$ in decibels. The error exponents in the figure correspond to separate source-channel coding (24), joint source-channel coding (25), and the multi-class scheme with $N = 2, 3, 5, 12$. The bound in Theorem 1 has been optimized over the parameters $\rho_i$, $i = 0, \ldots, N$, and the thresholds $\gamma_i$, $i = 1, \ldots, N$. The bound in Theorem 2 has been optimized over the parameters $\hat{\rho}$, $\rho$, $\bar{\rho}$, $R$ and $R'$. In both cases the channel input distribution has been chosen to be equiprobable. From the figure we can see that the bound in Theorem 1 and the relaxed version in Theorem 2 coincide for $N = 2, 3$. For $N = 2$, the multi-class scheme shows a 0.4-0.7 dB improvement over separate source-channel coding, with just a small increase in complexity. Moreover, from the curves for $N = 2, 3, 5, 12$ we can see that the multi-class construction approaches the joint source-channel error exponent as the number of classes increases, confirming the results of Theorems 1 and 2 (since (25) and (26) coincide for this example).

Fig. 2. Error exponent bounds. BMS with $P(1) = 0.1$ transmitted over a binary-input AWGN channel for $t = 1$.
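For concreteness, the normalization (27) is easy to reproduce numerically; the snippet below is our own check (the 0 dB channel SNR is an arbitrary assumption):

```python
# Check of the normalization (27) for p = 0.1, t = 1 (so n/k = 1/t = 1).
import math

def h2(p):
    """Binary entropy function in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p, t = 0.1, 1.0
EsN0 = 10 ** (0.0 / 10)                  # Es/N0 = 0 dB (assumed)
EbN0 = (1.0 / t) * EsN0 / h2(p)          # eq. (27) with n/k = 1/t
print(h2(p))                             # ~0.469 bits/source symbol
print(10 * math.log10(EbN0))             # ~3.29 dB
```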

IV. PRACTICAL CODE DESIGN

Based on the proposed scheme, we now design a practical joint source-channel code for the transmission of a BMS with $P(1) \leq 1/2$ over a binary-input AWGN channel. In particular, we consider a two-class code composed of a fixed-to-variable lossless source code followed by two linear codes with different rates. The lossless source code corresponds to the class selector in Fig. 1. The ML decoders in Fig. 1 can be replaced by standard quasi-ML decoders. This fixed-to-variable-to-fixed source-channel code allows a simple implementation using existing source and channel codes.

First, the length-$k$ binary source sequence is encoded using a fixed-to-variable coding scheme that assigns shorter codewords to the most probable messages, i.e., messages with the smallest Hamming weight. Two examples are enumerative [21] and arithmetic coding [22]. For a source message $v$, the length of the source codeword $L(v)$ determines which code will be used to encode each source message. Since the source code is assumed lossless, this is equivalent to assigning source messages to classes based on their probability.

As channel codes we consider two linear $(n, k_i)$-codes $\mathcal{C}_i$, $i = 1, 2$. If $L(v) \leq k_1$ the channel code $\mathcal{C}_1$ is used for transmission; otherwise, if $L(v) \leq k_2$ the second code, $\mathcal{C}_2$, is used. If $L(v) > k_2$ an arbitrary codeword is transmitted and a source coding error is reported. In this coding scheme, $k_i - L(v)$ leftover bits may appear due to a mismatch between the source and channel code, $i = 1, 2$. These bits can be used to include additional redundancy checks (see [23] for details); however, we set them to zero for the sake of simplicity. Due to these leftover bits, we do not use all the codewords belonging to each of the channel codes, in contrast to the analysis in Section III. However, in general, $k_i \approx \log_2 |\mathcal{A}_i^k|$, $i = 1, 2$, and the performance loss is small.

At the decoder, two ML (or quasi-ML) parallel decoding attempts are performed, one for each channel code. Both decoder outputs are then checked to verify whether they are valid source sequences. If only one of the two outputs is a valid source message, the corresponding data are used. If both decoders fail, a predetermined message, for example the all-zero data sequence, is used. Finally, if both source decoders report success, the message with the larger a posteriori likelihood is selected.
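The class-selection logic just described reduces to comparing the source codeword length $L(v)$ against $k_1$ and $k_2$. The sketch below is our own illustration; the weight-based enumerative length formula is an assumed stand-in for the source coders of [21], [22]:

```python
# Sketch of the two-class selection rule: L(v) decides the channel code.
import math

def source_code_length(v):
    """Assumed enumerative-style length: ceil(log2 C(k, w)) bits for the
    index within the weight class plus ceil(log2 (k+1)) bits for the
    weight w itself."""
    k, w = len(v), sum(v)
    return (math.ceil(math.log2(math.comb(k, w))) +
            math.ceil(math.log2(k + 1)))

def select_class(v, k1, k2):
    """1 -> code C1 (lower rate, most protected), 2 -> code C2,
    0 -> declared source-coding error (class A_0)."""
    L = source_code_length(v)
    if L <= k1:
        return 1
    if L <= k2:
        return 2
    return 0

v = [0] * 70 + [1] * 10                  # k = 80, Hamming weight 10
print(source_code_length(v), select_class(v, k1=40, k2=60))
```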

A. Code Optimization

The specific pair of $(n, k_i)$-codes depends on the signal-to-noise ratio $E_b/N_0$. Obviously, the choice of the code rates and of the codes themselves is critical for the system performance. If the block length $n$ is small, we can obtain a set of good channel codes with different coding rates using techniques from, e.g., [24]–[26]. Then, for each $E_b/N_0$ the best pair of codes from this set can be selected by simulating the system performance. While this optimization procedure is feasible for short block lengths, it becomes computationally intractable as the block length or the rate granularity grows large.

In these cases, we may resort to the error exponents derived in Section III to estimate the optimal coding-rate pair. To this end we compute the optimal rates $R_1$ and $R_2$ from either Theorem 1 or Theorem 2, and select two codes of rates $R_1$ and $R_2$. Since the exponential behavior dominates for large block lengths, these rates become asymptotically optimal as the block length grows large. As we will see in the simulations section, Theorems 1 and 2 give a good approximation of the optimal coding rates for moderate block lengths ($n \approx 1000$).

B. Lower Bound on the Error Probability

We derive a lower bound on the error probability of a two-class linear coding scheme for a BMS. This lower bound will serve as a benchmark for the performance of practical codes.

Disregarding the last summand in (11), we lower-bound the error probability of a given code as

$$\epsilon_n \geq \Pr\bigl\{V \in \mathcal{A}_0^k\bigr\} + \sum_{i=1,2} \Pr\bigl\{V \in \mathcal{A}_i^k,\ \hat{V}_i \neq V\bigr\} \quad (28)$$
$$= \Pr\bigl\{V \in \mathcal{A}_0^k\bigr\} + \sum_{i=1,2} \Pr\bigl\{V \in \mathcal{A}_i^k\bigr\}\, \Pr\bigl\{\hat{V}_i \neq V \,\big|\, V \in \mathcal{A}_i^k\bigr\}. \quad (29)$$

A lower bound on the error probability of a channel code of rate $R$ is given by Shannon's sphere-packing bound [27]. Let codewords be distributed over the surface of an $n$-dimensional hypersphere with squared radius $E = nE_s$ and centered at the origin of coordinates. Let $\theta$ be the half-angle of a cone with vertex at the origin and with axis going through one arbitrary codeword. We let $Q(\theta)$ denote the probability that such a codeword is moved outside the cone by effect of the Gaussian noise. We choose $\theta_{n,R}$ such that the solid angle subtended by a cone of half-angle $\theta_{n,R}$ is equal to $\Omega_n / 2^{nR}$, where $\Omega_n$ is the surface of the $n$-dimensional hypersphere. Then, $Q(\theta_{n,R_i})$ is a lower bound on the error probability of the $i$-th (length-$n$) linear code under ML decoding (when ties are resolved randomly), i.e.,

$$\Pr\bigl\{\hat{V}_i \neq V \,\big|\, V \in \mathcal{A}_i^k\bigr\} \geq Q(\theta_{n,R_i}), \quad i = 1, 2. \quad (30)$$

This bound is accurate for low SNRs and relatively short codes [28]. In order to compute (30) we shall use the approximation from [29], known to be accurate for error probabilities below 0.1.

For a BMS with $p = P(1) \leq 1/2$, it is possible to obtain a closed-form expression for the source terms $\Pr\{V \in \mathcal{A}_i^k\}$, $i = 0, 1, 2$. Consider a class $\mathcal{A}_i^k$ composed of the sequences with Hamming weights $w \in [w_1, w_2]$, where $w_1$ and $w_2$ are two arbitrary integers. Then, it follows that

$$\Pr\bigl\{V \in \mathcal{A}_i^k\bigr\} = B_{k,p}(w_1, w_2), \quad (31)$$

where we defined

$$B_{k,p}(w_1, w_2) \triangleq \sum_{w=w_1}^{w_2} \binom{k}{w} p^w (1-p)^{k-w}. \quad (32)$$


The best coding strategy is to encode the sequences of Hamming weight $w \in \{0, \ldots, w_1\}$ with the first (lower-rate) channel code and the sequences of weight $w \in \{w_1+1, \ldots, w_2\}$ with the second (higher-rate) code. All other sequences are transmitted by some fixed codeword, which leads to a decoding error. Therefore, using (30) and (31) in (29), we obtain the following result.

Theorem 3: Consider a length-$k$ BMS with $p = P(1) \leq 1/2$ to be transmitted over a binary-input AWGN channel using a length-$n$ block code. The error probability of any two-class scheme using linear channel codes and ML decoding (with randomly resolved ties) is lower-bounded as

$$\epsilon_n \geq \min_{\substack{w_1 = 0, \ldots, k \\ w_2 = w_1+1, \ldots, k}} \Bigl\{B_{k,p}(0, w_1)\, Q\bigl(\theta_{n,R(0,w_1)}\bigr) + B_{k,p}(w_1+1, w_2)\, Q\bigl(\theta_{n,R(w_1+1,w_2)}\bigr) + B_{k,p}(w_2+1, k)\Bigr\}, \quad (33)$$

where the rate $R(w_1, w_2)$ is given by

$$R(w_1, w_2) = \frac{1}{n} \Bigl\lceil \log_2 \sum_{w=w_1}^{w_2} \binom{k}{w} \Bigr\rceil. \quad (34)$$
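A direct, brute-force evaluation of (33)-(34) is simple to sketch. The code below is our own illustration: $B_{k,p}$ and $R(w_1, w_2)$ follow (32) and (34) exactly, while the sphere-packing term $Q(\theta_{n,R})$ is left as a caller-supplied stub, since the paper computes it via the approximation of [29]:

```python
# Sketch of the lower bound (33)-(34) with a stubbed Q(theta_{n,R}).
import math

def B(k, p, w1, w2):
    """Binomial mass on Hamming weights w1..w2, eq. (32)."""
    return sum(math.comb(k, w) * p ** w * (1 - p) ** (k - w)
               for w in range(w1, w2 + 1))

def R_rate(n, k, w1, w2):
    """Code rate (34) needed to index the weight-w1..w2 messages."""
    count = sum(math.comb(k, w) for w in range(w1, w2 + 1))
    return math.ceil(math.log2(count)) / n

def theorem3_bound(n, k, p, Q_sp):
    """Brute-force minimization of (33) over the split (w1, w2);
    Q_sp(n, R) must return the sphere-packing probability Q(theta_{n,R})."""
    best = 1.0
    for w1 in range(k + 1):
        for w2 in range(w1 + 1, k + 1):
            val = (B(k, p, 0, w1) * Q_sp(n, R_rate(n, k, 0, w1))
                   + B(k, p, w1 + 1, w2) * Q_sp(n, R_rate(n, k, w1 + 1, w2))
                   + B(k, p, w2 + 1, k))
            best = min(best, val)
    return best

# Smoke test with a dummy Q_sp; a real one would implement [27]/[29].
print(theorem3_bound(100, 80, 0.1, lambda n, R: 0.5))
```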

C. Simulation Results

In this subsection we show simulation results for different implementations of a two-class scheme in short and moderate block-length scenarios. The source probability is fixed to $P(1) = 0.1$.

1) Short Block Length Scenario ($k = 80$, $n \approx 100$): Figure 3 shows the simulated frame error rate (FER) performance of an implementation using tail-biting codes and ML decoding. As source code we use an enumerative coding scheme, and as channel codes we have chosen a family of tail-biting (TB) codes of rates $R = 1/2$, $3/5$ and $3/4$. The code of rate $R = 1/2$ was taken from [30], and the codes of rates $3/5$ and $3/4$ were chosen by doing a short search for high-rate convolutional codes using techniques from [24] and [25]. Among the most efficient ML decoding algorithms we have selected BEAST [31], which allows ML decoding of codes of length 100 with acceptable complexity. The curves "Separate" and "Two-class" show the best performance obtained within the corresponding family of codes. The two-class scheme outperforms separate coding by about 1 dB, in agreement with the values predicted by the random-coding analysis. Also, from the figure we see that the lower bound (33) can be used to predict not only the gain but also the best error probability.

Table I shows the best code-rate pairs obtained for different values of $E_b/N_0$ in this scenario. The table compares the values obtained by simulating pairs of TB codes of rates $R = 1/2$, $3/5$ and $3/4$ with the asymptotic results obtained from (23) in Theorem 2. We can see that there is a discrepancy between simulation and asymptotic analysis, due to the short block length considered or possibly to the coarse granularity of the coding rates.

Fig. 3. Enumerative + TB coding, $n = 100$, $k = 80$. Frame error rate for separate and two-class source-channel coding.

TABLE I. Enumerative + TB coding, $n = 100$, $k = 80$. Optimal rate pairs $(R_1, R_2)$ for a two-class coding scheme.

Fig. 4. Enumerative + LDPC coding, $n = 1008$, $k = 1000$. Frame error rate for separate and two-class source-channel coding.

TABLE II. Enumerative + LDPC coding, $n = 1008$, $k = 1000$. Optimal rate pairs $(R_1, R_2)$ for a two-class coding scheme.

2) Moderate Block Length Scenario ($k = 1000$, $n \approx 1000$): Figure 4 shows the FER for an implementation using LDPC codes and iterative decoding. We use enumerative source coding and a family of quasi-cyclic (QC) LDPC codes as channel codes. In particular, we consider a set of codes with a 24-column base matrix and coding rates $R = 12/24, 13/24, \ldots, 16/24$. For constructing these parity-check matrices we used the optimization algorithm from [26]. The only exception is the code of rate $R = 18/24$, which is borrowed from [32, code A]. The decoding algorithm is stopped after 50 iterations of belief-propagation decoding, and we require at least 50 block error events for each simulated point.

Each separate source-channel code presents an error floor due to the effect of the source coding error events, which do not depend on the channel SNR. This phenomenon results in a staggered behavior of both the separate and the two-class code curves in Fig. 4. For clarity, no individual separate source-channel curves have been plotted, only the best performance within the family ("Separate"). We can see that the two-class scheme ("Two-class"), optimized for each SNR point, outperforms separate coding by 0.4-0.7 dB. In this case the gap to the lower bound (33) is larger than in Fig. 3 because of the suboptimal decoding algorithm, whose performance is far from that of ML decoding.

Table II shows the best code-rate pairs in this scenario. We observe a better agreement between the asymptotic and the simulation results. This is due to the larger block length, which makes the asymptotic approximations more accurate. This fact justifies the use of the asymptotic analysis of Section III to guide the design of good finite-length codes.

V. CONCLUDING REMARKS

In this paper we have presented a source-channel coding scheme in which the source messages are divided into classes based on their probability, and a channel code and ML decoding are used for each of the classes. We have shown that the overall scheme outperforms separate source-channel coding and approaches the performance of joint source-channel coding as the number of classes increases.

The multi-class scheme can be implemented with reduced complexity using existing source and channel codes. Simulation results for a binary memoryless source transmitted over a binary-input additive white Gaussian noise channel show that using two classes offers a 0.5-1.0 dB gain compared to separate source-channel coding. This is consistent with the theoretically predicted values. Moreover, the analytical results have been shown to offer a practical guideline for the design of finite-length source-channel codes in the memoryless setting. While the analysis is restricted to memoryless sources and channels, the multi-class scheme could be easily implemented for sources and channels with memory by using appropriate source and channel codes.

APPENDIX I
PROOF OF THEOREM 1

In order to prove Theorem 1 we start by introducing a number of properties of the partition of the source-message set. The main proof is then given in Section B of this appendix.

A. Properties of the Partition $\{\mathcal{A}_i^k\}$ in (3)

Let us define the function

$$E_{\mathrm{s},i}(\rho) \triangleq \lim_{k\to\infty} \frac{1}{k} \log \Bigl(\sum_{v \in \mathcal{A}_i^k} P^k(v)^{\frac{1}{1+\rho}}\Bigr)^{1+\rho}, \quad (35)$$

which takes over the role of Gallager's source function $E_{\mathrm{s}}(\cdot)$ when dealing with multiple classes (see, e.g., [16]). In principle, the functions $E_{\mathrm{s},i}(\cdot)$ are difficult to evaluate, since they involve summing over an exponential number of terms (one for each sequence) and the computation of a limit. The following result provides a simple characterization of $E_{\mathrm{s},i}(\cdot)$ for a sequence of partitions of the form (3). We denote the derivative of $E_{\mathrm{s}}(\rho)$ evaluated at $\hat{\rho}$ as

$$E_{\mathrm{s}}'(\hat{\rho}) \triangleq \frac{\partial E_{\mathrm{s}}(\rho)}{\partial \rho}\Big|_{\rho=\hat{\rho}} \quad (36)$$

and we define the tilted distribution

$$P_{\sigma}(v) \triangleq \frac{P(v)^{\sigma}}{\sum_{v'} P(v')^{\sigma}}. \quad (37)$$

Lemma 1: Consider a sequence of memoryless sources $P^k$ and partitions $\{\mathcal{A}_i^k\}$ in (3), $k = 1, 2, \ldots$. Then, for any $\rho \in \mathbb{R}$, $\gamma_i \leq \max_v P(v)$ and $\gamma_{i+1} > \min_v P(v)$,

$$E_{\mathrm{s},i}(\rho) = \begin{cases} E_{\mathrm{s}}(\hat{\rho}_i) + (\rho - \hat{\rho}_i)\, E_{\mathrm{s}}'(\hat{\rho}_i), & \frac{1}{1+\rho} < \frac{1}{1+\hat{\rho}_i}, \\[2pt] E_{\mathrm{s}}(\rho), & \frac{1}{1+\hat{\rho}_i} \leq \frac{1}{1+\rho} \leq \frac{1}{1+\hat{\rho}_{i+1}}, \\[2pt] E_{\mathrm{s}}(\hat{\rho}_{i+1}) + (\rho - \hat{\rho}_{i+1})\, E_{\mathrm{s}}'(\hat{\rho}_{i+1}), & \frac{1}{1+\rho} > \frac{1}{1+\hat{\rho}_{i+1}}, \end{cases} \quad (38)$$

where $\hat{\rho}_i$, $i = 0, \ldots, N+1$, are given by the solution to the implicit equation $\sum_v P_{\frac{1}{1+\hat{\rho}_i}}(v) \log P(v) = \log \gamma_i$ as long as $\min_v P(v) \leq \gamma_i \leq \max_v P(v)$. When $\gamma_i < \min_v P(v)$, $\hat{\rho}_i = -1^-$, and for $\gamma_i > \max_v P(v)$, $\hat{\rho}_i = -1^+$.

For $\gamma_i > \max_v P(v)$ or $\gamma_{i+1} \leq \min_v P(v)$, the $i$-th class is empty and $E_{\mathrm{s},i}(\rho) = -\infty$.

Proof: See Appendix I-C.

In principle, the values of $\hat{\rho}_i$ appearing in Lemma 1 can be negative. If we restrict ourselves to the range $\rho \geq 0$, the thresholds yielding negative values of $\hat{\rho}_i$ are uninteresting to us, since they correspond to classes that never dominate the exponent. Therefore, for the present work, we may restrict the value of the thresholds $\gamma_i$ to satisfy $\sum_v \frac{1}{|\mathcal{V}|} \log P(v) \leq \log \gamma_i \leq \sum_v P(v) \log P(v)$, $i = 1, \ldots, N$. In this case, the three regions in $\rho$ appearing in (38) can be equivalently written as $\{\rho > \hat{\rho}_i\}$, $\{\hat{\rho}_{i+1} \leq \rho \leq \hat{\rho}_i\}$, and $\{\rho < \hat{\rho}_{i+1}\}$, respectively, with $\hat{\rho}_0 = \infty$ and $\hat{\rho}_{N+1} = 0$.

An example of the characterization in Lemma 1 for $\rho \geq 0$ is shown in Fig. 5 for a three-class partition. We observe that $E_{\mathrm{s},i}(\rho)$ is equal to $E_{\mathrm{s}}(\rho)$ in the interval $\hat{\rho}_{i+1} \leq \rho \leq \hat{\rho}_i$, and corresponds to a straight line tangent to $E_{\mathrm{s}}(\rho)$ outside those intervals. Since the thresholds $\gamma_0$ and $\gamma_{N+1}$ are fixed to 0 and 1, respectively, then $\hat{\rho}_0 = \infty$ and $\hat{\rho}_{N+1} = 0$. For the remaining thresholds, we can obtain any finite value of $\hat{\rho}_i \in [0, \infty)$ by appropriately choosing the threshold $\gamma_i$, $i = 1, \ldots, N$, between $\exp\bigl(\sum_v \frac{1}{|\mathcal{V}|} \log P(v)\bigr)$ and $\exp\bigl(\sum_v P(v) \log P(v)\bigr)$.

Fig. 5. Example of the characterization in Lemma 1 of the $E_{\mathrm{s},i}(\cdot)$ functions with three classes ($N = 2$).

Lemma 2: For a sequence of memoryless sources $P^k$ and partitions $\{\mathcal{A}_i^k\}$ in (3), $k = 1, 2, \ldots$, each threshold $\min_v P(v) \leq \gamma_i \leq \max_v P(v)$ in (3) univocally determines the corresponding coding rate $R_i$ in (4) for each $i = 1, \ldots, N$, with $N = \lim_{k\to\infty} N_k$. In particular,

$$R_i = t\, E_{\mathrm{s}}'(\hat{\rho}_i), \quad (39)$$

where $\hat{\rho}_i$ is given by the solution to the implicit equation $\sum_v P_{\frac{1}{1+\hat{\rho}_i}}(v) \log P(v) = \log \gamma_i$.

Proof: See Appendix I-D.

Lemmas 1 and 2 imply that, asymptotically, it is equivalent to optimize the partition over either the set of thresholds $\{\gamma_i\}$ or the rates $\{R_i\}$. Furthermore, they provide an alternative representation of the asymptotic probability of the set $\mathcal{A}_i^k$, as shown by the next result.

Lemma 3: Consider a sequence of memoryless sources $P^k$ and partitions $\{\mathcal{A}_i^k\}$ in (3), $k = 1, 2, \ldots$. When $\log \gamma_{i+1} \leq \sum_v P(v) \log P(v)$, $i = 1, \ldots, N$, it holds that

$$\lim_{k\to\infty} \frac{1}{k} \log \Bigl(\sum_{v \in \mathcal{A}_i^k} P^k(v)\Bigr) = -e\Bigl(\frac{R_{i+1}}{t}\Bigr). \quad (40)$$

Proof: From (35) we have that

$$\lim_{k\to\infty} \frac{1}{k} \log \Bigl(\sum_{v \in \mathcal{A}_i^k} P^k(v)\Bigr) = E_{\mathrm{s},i}(0) \quad (41)$$
$$= E_{\mathrm{s}}(\hat{\rho}_{i+1}) - \hat{\rho}_{i+1}\, E_{\mathrm{s}}'(\hat{\rho}_{i+1}) \quad (42)$$
$$= -\max_{\rho \geq 0} \Bigl\{\rho\, \frac{R_{i+1}}{t} - E_{\mathrm{s}}(\rho)\Bigr\}, \quad (43)$$

where (42) follows from (38) given the assumptions in the lemma, which imply $\hat{\rho}_{i+1} \geq 0$, and in (43) we used Lemma 2 and the fact that $\hat{\rho}_{i+1}$ is the point where $E_{\mathrm{s}}(\rho)$ has slope $\frac{R_{i+1}}{t}$, i.e., it maximizes the quantity in brackets. The result thus follows from (43) by using the definition (20) of the error exponent of a discrete memoryless source compressed to rate $\frac{R_{i+1}}{t}$.

Then, the asymptotic coding rate of the $i$-th class is uniquely determined by the lower threshold $\gamma_i$ defining this class, as shown in Lemma 2. Similarly, combining Lemma 2 and Lemma 3, we obtain that the exponent of the probability of the $i$-th class is determined by the upper threshold $\gamma_{i+1}$.
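Numerically, the threshold-rate correspondence in Lemma 2 reduces to one-dimensional root finding. The sketch below is our own illustration (the per-symbol threshold value is an arbitrary choice within the admissible range discussed after Lemma 1): it solves the implicit equation for $\hat{\rho}_i$ by bisection and evaluates $R_i/t = E_{\mathrm{s}}'(\hat{\rho}_i)$ via (76):

```python
# Sketch: threshold gamma_i -> rho_i -> class rate R_i / t (Lemma 2).
import numpy as np

def tilted(P, sigma):
    T = P ** sigma
    return T / T.sum()                       # tilted distribution (37)

def slope(P, rho):
    # E_s'(rho) = -sum_v P_{1/(1+rho)}(v) log P_{1/(1+rho)}(v), eq. (76)
    T = tilted(P, 1.0 / (1.0 + rho))
    return -np.sum(T * np.log(T))

def rho_of_gamma(P, gamma, lo=0.0, hi=1e3, iters=200):
    # f(rho) = sum_v P_{1/(1+rho)}(v) log P(v) - log gamma is decreasing
    # in rho (its derivative in sigma is a variance), so bisection works.
    f = lambda r: np.sum(tilted(P, 1.0 / (1.0 + r)) * np.log(P)) \
        - np.log(gamma)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

P = np.array([0.9, 0.1])      # BMS with P(1) = 0.1
gamma = 0.5                   # assumed per-symbol threshold in the
                              # admissible range discussed after Lemma 1
rho_i = rho_of_gamma(P, gamma)
print(rho_i, slope(P, rho_i)) # rho_i and R_i / t from (39)
```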

B. Proof of Theorem 1

We now proceed with the proof of Theorem 1. Under the assumption that the number of classes $N_k$ behaves sub-exponentially in $k$, the error exponent is given by the minimum of the individual exponents of each of the summands in (16), namely

$$-\lim_{n\to\infty} \frac{1}{n} \log \epsilon_n = \min\Bigl\{-\lim_{n\to\infty} \frac{1}{n} \log \Pr\bigl\{V \in \mathcal{A}_0^k\bigr\},\ \min_{i=1,\ldots,N}\, -\lim_{n\to\infty} \frac{1}{n} \log \Pr\bigl\{V \in \mathcal{A}_i^k,\ \hat{V}_i \neq V\bigr\},\ -\lim_{n\to\infty} \frac{1}{n} \log \Pr\bigl\{V \notin \mathcal{A}_0^k,\ q(V, Y) \leq \max_{v \neq V,\ v \notin \mathcal{A}_0^k} q(v, Y)\bigr\}\Bigr\}. \quad (44)$$

We next analyze each of the terms in the minimum separately. As discussed after Lemma 1, we consider partitions with thresholds $\gamma_i$ satisfying $\sum_v \frac{1}{|\mathcal{V}|} \log P(v) \leq \log \gamma_i \leq \sum_v P(v) \log P(v)$, $i = 1, \ldots, N$. Then, Lemma 3 yields the exponent of the first term in the minimum in (44), that is,

$$-\lim_{n\to\infty} \frac{1}{n} \log \Pr\bigl\{V \in \mathcal{A}_0^k\bigr\} = t\, e\Bigl(\frac{R_1}{t}\Bigr). \quad (45)$$

We now upper-bound the second term in (44). First, we use the chain rule to express the probability, for $i = 1, \ldots, N$, as

$$\Pr\bigl\{V \in \mathcal{A}_i^k,\ \hat{V}_i \neq V\bigr\} = \Pr\bigl\{\hat{V}_i \neq V \,\big|\, V \in \mathcal{A}_i^k\bigr\}\, \Pr\bigl\{V \in \mathcal{A}_i^k\bigr\}. \quad (46)$$

The first factor corresponds to the error probability of a channel-coding problem with $M_i$ messages transmitted over the channel $W^n$. We can lower-bound its exponent in terms of the random-coding exponent for input distribution $Q_i$. For each class $\mathcal{A}_i^k$, $i = 1, \ldots, N$, there exists a code $\mathcal{C}_i$ whose error probability over the memoryless channel $W$ satisfies [2, Th. 5.6.1]

$$-\lim_{n\to\infty} \frac{1}{n} \log \Pr\bigl\{\hat{V}_i \neq V \,\big|\, V \in \mathcal{A}_i^k\bigr\} \geq \max_{\rho_i \in [0,1]} \bigl\{E_0(\rho_i, Q_i) - \rho_i R_i\bigr\} \quad (47)$$
$$= E_{\mathrm{r}}(R_i, Q_i). \quad (48)$$

As in (45), the exponent of the second factor in (46) is

$$-\lim_{n\to\infty} \frac{1}{n} \log \Pr\bigl\{V \in \mathcal{A}_i^k\bigr\} = t\, e\Bigl(\frac{R_{i+1}}{t}\Bigr). \quad (49)$$

Combining (48) and (49) we thus obtain

$$-\lim_{n\to\infty} \frac{1}{n} \log \Pr\bigl\{V \in \mathcal{A}_i^k,\ \hat{V}_i \neq V\bigr\} \geq E_{\mathrm{r}}(R_i, Q_i) + t\, e\Bigl(\frac{R_{i+1}}{t}\Bigr), \quad i = 1, \ldots, N. \quad (50)$$

Finally, we identify the last term in (16) as the error exponent of a specific joint source-channel coding problem, where the source message probabilities do not add up to 1. In the random-coding argument, codewords are generated according to a class-dependent input distribution $Q_i$, $i = 1, \ldots, N$. We can thus use [16, Th. 1] to bound the exponent

$$-\lim_{n\to\infty} \frac{1}{n} \log \Pr\Bigl\{q(V, Y) \leq \max_{v \neq V,\ v \notin \mathcal{A}_0^k} q(v, Y),\ V \notin \mathcal{A}_0^k\Bigr\} \geq \min_{i=1,\ldots,N} \bigl\{E_0(\rho_i, Q_i) - t\, E_{\mathrm{s},i}(\rho_i)\bigr\}, \quad (51)$$

for any $\rho_i \in [0,1]$. Here we used that the proof of [16, Th. 1] is valid also for defective source message probabilities.

From Lemma 1, we infer that the source function $E_{\mathrm{s},i}(\rho)$ is non-decreasing and convex, with a non-decreasing derivative. Moreover, Lemma 2 shows that the derivative approaches the limiting value $\frac{R_i}{t}$ as $\rho \to \infty$. Therefore, the source function $E_{\mathrm{s},i}(\rho)$ satisfies the following simple upper bound for non-negative $\rho$:

$$E_{\mathrm{s},i}(\rho) \leq E_{\mathrm{s},i}(0) + \rho\, \frac{R_i}{t} \quad (52)$$
$$= -e\Bigl(\frac{R_{i+1}}{t}\Bigr) + \rho\, \frac{R_i}{t}, \quad (53)$$

where we used (42). Substituting (53) in the right-hand side of (51) we obtain

$$E_0(\rho_i, Q_i) - t\, E_{\mathrm{s},i}(\rho_i) \geq E_0(\rho_i, Q_i) - \rho_i R_i + t\, e\Bigl(\frac{R_{i+1}}{t}\Bigr). \quad (54)$$

Since this inequality holds for arbitrary $\rho_i \in [0,1]$ and input distribution $Q_i$, we conclude that for each value of $i = 1, \ldots, N$, the corresponding exponent in (51) is lower-bounded by the exponent in (50). Hence, this term can be omitted in the minimum in (44).

Finally, we observe that $Q_0$ satisfies $E_{\mathrm{r}}(R, Q_0) = 0$ for any rate $R$. Then, from (44), using the intermediate results (45), with $t\, e\bigl(\frac{R_1}{t}\bigr)$ replaced by $t\, e\bigl(\frac{R_1}{t}\bigr) + E_{\mathrm{r}}(R_1, Q_0)$, and (50), we get the desired

$$-\lim_{n\to\infty} \frac{1}{n} \log \epsilon_n \geq \min_{i=0,\ldots,N} \Bigl\{E_{\mathrm{r}}(R_i, Q_i) + t\, e\Bigl(\frac{R_{i+1}}{t}\Bigr)\Bigr\}. \quad (55)$$

Lemma 2 shows that for $t\, E_{\mathrm{s}}'(0) \leq R_i \leq \lim_{\rho\to\infty} t\, E_{\mathrm{s}}'(\rho)$ the correspondence between $\gamma_i$ and $R_i$ is one-to-one. Since the set $\{R_i\}$ that maximizes the right-hand side of (55) is always in this range, we conclude that it is asymptotically equivalent to optimize the partition over the thresholds $\{\gamma_i\}$ or the rates $\{R_i\}$.

C. Proof of Lemma 1

For $\sigma \in \mathbb{R}$ and $k = 1, 2, \ldots$, let us define the random variable $Z_{\sigma,k} \triangleq \log P^k(V)$ with underlying distribution

$$P^k_{\sigma}(v) \triangleq \frac{P^k(v)^{\sigma}}{\sum_{v'} P^k(v')^{\sigma}}. \quad (56)$$

This distribution is the multi-letter version of (37). The asymptotic normalized log-moment generating function of $Z_{\sigma,k}$ is given by

$$\kappa_{\sigma}(\tau) \triangleq \lim_{k\to\infty} \frac{1}{k} \log \mathbb{E}\bigl[e^{\tau Z_{\sigma,k}}\bigr] \quad (57)$$
$$= \log \biggl(\frac{\sum_v P(v)^{\sigma+\tau}}{\sum_v P(v)^{\sigma}}\biggr). \quad (58)$$

It follows that

$$\Lambda_i(\sigma) \triangleq \lim_{k\to\infty} \frac{1}{k} \log \Bigl(\sum_{v \in \mathcal{A}_i^k} P^k(v)^{\sigma}\Bigr) \quad (59)$$
$$= \lim_{k\to\infty} \frac{1}{k} \log \Bigl(\sum_{v} P^k(v)^{\sigma}\Bigr) + \lim_{k\to\infty} \frac{1}{k} \log \Bigl(\sum_{v \in \mathcal{A}_i^k} P^k_{\sigma}(v)\Bigr) \quad (60)$$
$$= \log \Bigl(\sum_{v} P(v)^{\sigma}\Bigr) + \lim_{k\to\infty} \frac{1}{k} \log \bigl(\Pr\bigl\{\log \gamma_i^k < Z_{\sigma,k} \leq \log \gamma_{i+1}^k\bigr\}\bigr). \quad (61)$$

Applying the Gärtner-Ellis theorem [33, Th. II.6.1] to the term $\Pr\{\log \gamma_i^k < Z_{\sigma,k} \leq \log \gamma_{i+1}^k\}$, and given the smoothness properties of $\kappa_{\sigma}(\tau)$ in (58), we obtain

$$\Lambda_i(\sigma) = \sup_{\log \gamma_i \leq r \leq \log \gamma_{i+1}}\ \inf_{\tau}\ \phi(r, \tau), \quad (62)$$

where

$$\phi(r, \tau) \triangleq \log \Bigl(\sum_{v} P(v)^{\sigma}\Bigr) - \bigl(r\tau - \kappa_{\sigma}(\tau)\bigr) \quad (63)$$
$$= -r\tau + \log \Bigl(\sum_{v} P(v)^{\sigma+\tau}\Bigr). \quad (64)$$

The function $\phi(r, \tau)$ is twice continuously differentiable and its Hessian is given by

$$\nabla^2 \phi(r, \tau) = \begin{bmatrix} 0 & -1 \\ -1 & \frac{\partial^2 \phi(r,\tau)}{\partial \tau^2} \end{bmatrix}. \quad (65)$$

Hence, its determinant is $\bigl|\nabla^2 \phi(r, \tau)\bigr| = -1 < 0$ and the solution of (62) is a saddle point provided that the constraints are non-active. By taking the derivative of $\phi(r, \tau)$ with respect to $\tau$ and equating it to zero we obtain that at the optimal point

$$r = \sum_{v} P_{\sigma+\tau}(v) \log P(v). \quad (66)$$

By taking the derivative of $\phi(r, \tau)$ with respect to $r$ and equating it to zero it follows that at the optimal point

$$\tau = 0, \quad (67)$$

provided that the constraints in (62) are non-active.

We translate the constraints on $r$ to the domain of $\sigma$. Let $\hat{\sigma}_i$ be given by the solution to the implicit equation

$$\sum_{v} P_{\hat{\sigma}_i}(v) \log P(v) = \log \gamma_i, \quad (68)$$

as long as $\min_v P(v) \leq \gamma_i \leq \max_v P(v)$. In case $\gamma_i < \min_v P(v)$, then $\hat{\sigma}_i = -\infty$; if $\gamma_i > \max_v P(v)$, then $\hat{\sigma}_i = \infty$. Using (66) and (67), the constraints in (62), $\log \gamma_i \leq r \leq \log \gamma_{i+1}$, can be equivalently written as $\hat{\sigma}_i \leq \sigma \leq \hat{\sigma}_{i+1}$, $i = 0, \ldots, N$.


1) When $\hat{\sigma}_i \leq \sigma \leq \hat{\sigma}_{i+1}$ the constraints are non-active and the saddle point occurs at

$$r = \sum_{v} P_{\sigma}(v) \log P(v), \quad \tau = 0. \quad (69)$$

Substituting these values in (62) we obtain

$$\Lambda_i(\sigma) = \log \Bigl(\sum_{v} P(v)^{\sigma}\Bigr). \quad (70)$$

2) For $\sigma < \hat{\sigma}_i$, the optimal $r$ is given by

$$r = \log \gamma_i = \sum_{v} P_{\hat{\sigma}_i}(v) \log P(v), \quad (71)$$

and using (66), we obtain $\tau = \hat{\sigma}_i - \sigma$. Substituting these values in (62) yields

$$\Lambda_i(\sigma) = (\sigma - \hat{\sigma}_i) \sum_{v} P_{\hat{\sigma}_i}(v) \log P(v) + \log \Bigl(\sum_{v} P(v)^{\hat{\sigma}_i}\Bigr). \quad (72)$$

3) Proceeding in an analogous way to the previous case, for $\sigma > \hat{\sigma}_{i+1}$, we obtain

$$\Lambda_i(\sigma) = (\sigma - \hat{\sigma}_{i+1}) \sum_{v} P_{\hat{\sigma}_{i+1}}(v) \log P(v) + \log \Bigl(\sum_{v} P(v)^{\hat{\sigma}_{i+1}}\Bigr). \quad (73)$$

Substituting (70), (72) and (73), $i = 0, \ldots, N$, in the corresponding range of the parameter $\sigma$ and rearranging terms, we obtain

$$\frac{1}{\sigma} \Lambda_i(\sigma) = \begin{cases} G(\sigma, \hat{\sigma}_i), & \sigma < \hat{\sigma}_i, \\[2pt] \frac{1}{\sigma} \log \bigl(\sum_v P(v)^{\sigma}\bigr), & \hat{\sigma}_i \leq \sigma \leq \hat{\sigma}_{i+1}, \\[2pt] G(\sigma, \hat{\sigma}_{i+1}), & \sigma > \hat{\sigma}_{i+1}, \end{cases} \quad (74)$$

where

$$G(\sigma, s) \triangleq \frac{1}{s} \log \Bigl(\sum_{v} P(v)^{s}\Bigr) - \Bigl(\frac{1}{\sigma} - \frac{1}{s}\Bigr) \sum_{v} P_{s}(v) \log P_{s}(v). \quad (75)$$

The expression $\frac{1}{\sigma} \Lambda_i(\sigma)$ in (74) corresponds precisely with $E_{\mathrm{s},i}(\rho)$ when $\sigma = \frac{1}{1+\rho}$. Then, the result follows from the definition of $E_{\mathrm{s}}(\rho)$ in (18), using that

$$E_{\mathrm{s}}'(\rho) = -\sum_{v} P_{\frac{1}{1+\rho}}(v) \log P_{\frac{1}{1+\rho}}(v). \quad (76)$$

D. Proof of Lemma 2

Using the characterization in Lemma 1 it follows that

$$\lim_{\rho\to\infty} \frac{1}{\rho} E_{\mathrm{s},i}(\rho) = \lim_{\rho\to\infty} \frac{1}{\rho} \bigl(E_{\mathrm{s}}(\hat{\rho}_i) + (\rho - \hat{\rho}_i)\, E_{\mathrm{s}}'(\hat{\rho}_i)\bigr) \quad (77)$$
$$= E_{\mathrm{s}}'(\hat{\rho}_i), \quad (78)$$

as long as $\hat{\rho}_i < \infty$.

Also, using the definition (35), we have that

$$\lim_{\rho\to\infty} \frac{1}{\rho} E_{\mathrm{s},i}(\rho) = \lim_{\rho\to\infty} \lim_{k\to\infty} \frac{1}{\rho k} \log \Bigl(\sum_{v \in \mathcal{A}_i^k} P^k(v)^{\frac{1}{1+\rho}}\Bigr)^{1+\rho} \quad (79)$$
$$= \lim_{k\to\infty} \lim_{\rho\to\infty} \frac{1}{\rho k} \log \Bigl(\sum_{v \in \mathcal{A}_i^k} P^k(v)^{\frac{1}{1+\rho}}\Bigr)^{1+\rho} \quad (80)$$
$$= \lim_{k\to\infty} \frac{1}{k} \lim_{\rho\to\infty} \frac{1+\rho}{\rho} \log \Bigl(\sum_{v \in \mathcal{A}_i^k} P^k(v)^{\frac{1}{1+\rho}}\Bigr) \quad (81)$$
$$= \lim_{k\to\infty} \frac{1}{k} \log \bigl|\mathcal{A}_i^k\bigr| \quad (82)$$
$$= \frac{R_i}{t}, \quad (83)$$

where in (80) we applied the Moore-Osgood theorem [34, p. 619], since the expression

$$\frac{1}{\rho k} \log \Bigl(\sum_{v \in \mathcal{A}_i^k} P^k(v)^{\frac{1}{1+\rho}}\Bigr)^{1+\rho} \quad (84)$$

converges uniformly in $k$ as $\rho \to \infty$, and pointwise as $k \to \infty$, as we show next. Then, using (77)-(78) and (79)-(83), we obtain (39). The result thus follows from the definition of $\hat{\rho}_i$ in Lemma 1.

We show the convergence properties of (84). We write

$$\frac{1}{k} \log \Bigl(\sum_{v \in \mathcal{A}_i^k} P^k(v)^{\frac{1}{1+\rho}}\Bigr)^{\frac{1+\rho}{\rho}} - \frac{1}{k} \log \bigl|\mathcal{A}_i^k\bigr| \leq \frac{1}{k} \Biggl(\log \Bigl(\sum_{v \in \mathcal{A}_i^k} 1^{\frac{1}{1+\rho}}\Bigr)^{\frac{1+\rho}{\rho}} - \log \bigl|\mathcal{A}_i^k\bigr|\Biggr) \quad (85)$$
$$= \frac{1}{k} \Bigl(\log \bigl|\mathcal{A}_i^k\bigr|^{\frac{1+\rho}{\rho}} - \log \bigl|\mathcal{A}_i^k\bigr|\Bigr) \quad (86)$$
$$= \frac{1}{k\rho} \log \bigl|\mathcal{A}_i^k\bigr| \quad (87)$$
$$= \frac{R_i}{t\rho}. \quad (88)$$

Similarly,

$$\frac{1}{k} \log \bigl|\mathcal{A}_i^k\bigr| - \frac{1}{k} \log \Bigl(\sum_{v \in \mathcal{A}_i^k} P^k(v)^{\frac{1}{1+\rho}}\Bigr)^{\frac{1+\rho}{\rho}} \leq \frac{1}{k} \Biggl(\log \bigl|\mathcal{A}_i^k\bigr| - \log \Bigl(\sum_{v \in \mathcal{A}_i^k} \bigl(\min_{v} P(v)\bigr)^{\frac{k}{1+\rho}}\Bigr)^{\frac{1+\rho}{\rho}}\Biggr) \quad (89)$$
$$= \frac{1}{k} \Bigl(\log \bigl|\mathcal{A}_i^k\bigr| - \log \bigl|\mathcal{A}_i^k\bigr|^{\frac{1+\rho}{\rho}} - \log \bigl(\min_{v} P(v)\bigr)^{\frac{k}{\rho}}\Bigr) \quad (90)$$
$$= -\frac{1}{k\rho} \log \bigl|\mathcal{A}_i^k\bigr| - \frac{1}{\rho} \log \min_{v} P(v) \quad (91)$$
$$= \frac{1}{\rho} \Bigl(-\log \min_{v} P(v) - \frac{R_i}{t}\Bigr). \quad (92)$$


Since (88) and (92) do not depend on $k$, (84) converges uniformly in $k$ as $\rho \to \infty$. Pointwise convergence of (84) as $k \to \infty$ follows from (38).

APPENDIX II
PROOF OF THEOREM 2

We start by writing (21) in dual form, that is, as explicit maximizations over the parameters $\rho_i$ and $\hat{\rho}_i$:

$$-\lim_{n\to\infty} \frac{1}{n} \log \epsilon_n \geq \min_{i=0,\ldots,N} \Bigl\{\max_{\rho_i \in [0,1]} \bigl\{E_0(\rho_i, Q_i) - \rho_i R_i\bigr\} + \max_{\hat{\rho}_i \in [0,\infty)} \bigl\{\hat{\rho}_i R_{i+1} - t E_{\mathrm{s}}(\hat{\rho}_i)\bigr\}\Bigr\}. \quad (93)$$

For $i = 0$ we have $E_{\mathrm{r}}(R, Q_0) = 0$, and for $i = N$ we have $e\bigl(\frac{R_{N+1}}{t}\bigr) = 0$. In the range $i = 1, \ldots, N-1$ we may fix $\hat{\rho}_i = \rho_i$ without violating the inequality in (93). Then, optimizing over $Q_i$, $i = 1, \ldots, N$, we obtain

$$-\lim_{n\to\infty} \frac{1}{n} \log \epsilon_n \geq \max_{R_1 \geq \ldots \geq R_N \geq 0} \min\Bigl\{\max_{\hat{\rho}_0 \in [0,\infty)} \bigl\{\hat{\rho}_0 R_1 - t E_{\mathrm{s}}(\hat{\rho}_0)\bigr\},\ \min_{i=1,\ldots,N-1}\ \max_{\rho_i \in [0,1]} \bigl\{E_0(\rho_i) - t E_{\mathrm{s}}(\rho_i) - \rho_i (R_i - R_{i+1})\bigr\},\ \max_{\rho_N \in [0,1]} \bigl\{E_0(\rho_N) - \rho_N R_N\bigr\}\Bigr\}. \quad (94)$$

Noting that the inner minimization in (94) is maximized with respect to $\{R_i\}$ when $R_i - R_{i+1}$ is constant, $i = 1, \ldots, N-1$, the result follows.

REFERENCES

[1] C. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379–423 and 623–656, Jul./Oct. 1948.
[2] R. G. Gallager, Information Theory and Reliable Communication. New York, NY, USA: Wiley, 1968.
[3] I. Csiszár, "Joint source-channel error exponent," Probl. Control Inf. Theory, vol. 9, no. 5, pp. 315–328, Sep. 1980.
[4] D. Wang, A. Ingber, and Y. Kochman, "The dispersion of joint source-channel coding," in Proc. 49th Annu. Allerton Conf. Commun., Control, Comput., Sep. 2011, pp. 180–187.
[5] V. Kostina and S. Verdú, "Lossy joint source-channel coding in the finite blocklength regime," IEEE Trans. Inf. Theory, vol. 59, no. 5, pp. 2545–2575, May 2013.
[6] Y. Zhong, F. Alajaji, and L. L. Campbell, "On the joint source-channel coding error exponent for discrete memoryless systems," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1450–1468, Apr. 2006.
[7] J. Hagenauer, "Source-controlled channel decoding," IEEE Trans. Commun., vol. 43, no. 9, pp. 2449–2457, Sep. 1995.
[8] J. Garcia-Frias and Y. Zhao, "Compression of binary memoryless sources using punctured turbo codes," IEEE Commun. Lett., vol. 6, no. 9, pp. 394–396, Sep. 2002.
[9] M. Fresia, F. Pérez-Cruz, H. V. Poor, and S. Verdú, "Joint source and channel coding," IEEE Signal Process. Mag., vol. 27, no. 6, pp. 104–113, Nov. 2010.
[10] G. I. Shamir, J. J. Boutros, A. Alloum, and L. Wang, "Non-systematic LDPC codes for redundant data," in Proc. Inaugural Workshop Center Inf. Theory Appl., San Diego, CA, USA, Feb. 2006, pp. 1–5.
[11] R. Bauer and J. Hagenauer, "Symbol-by-symbol MAP decoding of variable length codes," in Proc. 3rd ITG Conf. Source Channel Coding, Jan. 2000, pp. 111–116.
[12] J. Hagenauer and R. Bauer, "The turbo principle in joint source channel decoding of variable length codes," in Proc. IEEE Inf. Theory Workshop, Cairns, Australia, Sep. 2001, pp. 33–35.
[13] A. Guyader, E. Fabre, C. Guillemot, and M. Robert, "Joint source-channel turbo decoding of entropy-coded sources," IEEE J. Sel. Areas Commun., vol. 19, no. 9, pp. 1680–1696, Sep. 2001.
[14] M. Grangetto, P. Cosman, and G. Olmo, "Joint source/channel coding and MAP decoding of arithmetic codes," IEEE Trans. Commun., vol. 53, no. 5, pp. 1007–1016, Jul. 2005.
[15] S. Lonardi, W. Szpankowski, and M. D. Ward, "Error resilient LZ'77 data compression: Algorithms, analysis, and experiments," IEEE Trans. Inf. Theory, vol. 53, no. 5, pp. 1799–1813, May 2007.
[16] A. Tauste Campo, G. Vazquez-Vilar, A. Guillén i Fàbregas, T. Koch, and A. Martinez, "A derivation of the source-channel error exponent using nonidentical product distributions," IEEE Trans. Inf. Theory, vol. 60, no. 6, pp. 3209–3217, Jun. 2014.
[17] Y. Y. Shkel, V. Y. F. Tan, and S. C. Draper, "Unequal message protection: Asymptotic and non-asymptotic tradeoffs," IEEE Trans. Inf. Theory, vol. 61, no. 10, pp. 5396–5416, Oct. 2015.
[18] F. Jelinek, Probabilistic Information Theory. New York, NY, USA: McGraw-Hill, 1968.
[19] I. E. Bocharova, A. Guillén i Fàbregas, B. D. Kudryashov, A. Martinez, A. Tauste Campo, and G. Vazquez-Vilar, "Source-channel coding with multiple classes," in Proc. IEEE Int. Symp. Inf. Theory, Honolulu, HI, USA, Jun./Jul. 2014, pp. 1514–1518.
[20] R. T. Rockafellar, Convex Analysis. Princeton, NJ, USA: Princeton Univ. Press, 1970.
[21] T. M. Cover, "Enumerative source encoding," IEEE Trans. Inf. Theory, vol. 19, no. 1, pp. 73–77, Jan. 1973.
[22] I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Commun. ACM, vol. 30, no. 6, pp. 520–540, 1987.
[23] I. E. Bocharova, A. Guillén i Fàbregas, B. D. Kudryashov, A. Martinez, A. Tauste Campo, and G. Vazquez-Vilar, "Low-complexity fixed-to-fixed joint source-channel coding," in Proc. 8th Int. Symp. Turbo Codes Iterative Inf. Process., Bremen, Germany, Aug. 2014, pp. 132–136.
[24] F. Hug, I. E. Bocharova, R. Johannesson, and B. D. Kudryashov, "Searching for high-rate convolutional codes via binary syndrome trellises," in Proc. IEEE Int. Symp. Inf. Theory, Seoul, South Korea, Jun./Jul. 2009, pp. 1358–1362.
[25] I. E. Bocharova and B. D. Kudryashov, "Rational rate punctured convolutional codes for soft-decision Viterbi decoding," IEEE Trans. Inf. Theory, vol. 43, no. 4, pp. 1305–1313, Jul. 1997.
[26] I. E. Bocharova, B. D. Kudryashov, and R. Johannesson, "Combinatorial optimization for improving QC LDPC codes performance," in Proc. IEEE Int. Symp. Inf. Theory, Istanbul, Turkey, Jul. 2013, pp. 2651–2655.
[27] C. E. Shannon, "Probability of error for optimal codes in a Gaussian channel," Bell Syst. Tech. J., vol. 38, no. 3, pp. 611–656, May 1959.
[28] I. Sason and S. Shamai (Shitz), "Performance analysis of linear codes under maximum-likelihood decoding: A tutorial," in Foundations and Trends in Communications and Information Theory. Norwell, MA, USA: NOW Publishers, 2006.
[29] S. Vialle and J. Boutros, "Performance of optimal codes on Gaussian and Rayleigh fading channels: A geometrical approach," in Proc. 37th Annu. Allerton Conf. Commun., Control, Comput., Allerton, IL, USA, Sep. 1999, pp. 515–524.
[30] I. E. Bocharova, R. Johannesson, B. D. Kudryashov, and P. Stahl, "Tailbiting codes: Bounds and search results," IEEE Trans. Inf. Theory, vol. 48, no. 1, pp. 137–148, Jan. 2002.
[31] I. E. Bocharova, R. Johannesson, B. D. Kudryashov, and M. Loncar, "BEAST decoding for block codes," Eur. Trans. Telecommun., vol. 15, no. 4, pp. 297–305, Jul./Aug. 2004.
[32] Air Interface for Fixed and Mobile Broadband Wireless Access Systems, IEEE Standard P802.16e/D12, Oct. 2005.
[33] R. S. Ellis, Entropy, Large Deviations, and Statistical Mechanics, vol. 271. Berlin, Germany: Springer, 1985.
[34] W. F. Osgood, Lehrbuch der Funktionentheorie, vol. 1, 5th ed. Leipzig, Germany: Teubner, 1928.

Irina E. Bocharova was born in Leningrad, U.S.S.R., on July 31, 1955. She received the Diploma in Electrical Engineering in 1978 from the Leningrad Electro-technical Institute and the Ph.D. degree in technical sciences in 1986 from the Leningrad Institute of Aircraft Instrumentation.

From 1986 until 2007, she was a Senior Researcher, Assistant Professor, and then Associate Professor at the Leningrad Institute of Aircraft Instrumentation (now the State University of Aerospace Instrumentation, St. Petersburg, Russia). Since 2007 she has been an Associate Professor at the State University of Information Technologies, Mechanics and Optics. Her research interests include convolutional codes, communication systems, source coding and its applications to speech, audio and image coding. She has more than 70 papers published in journals and proceedings of international conferences, and seven U.S. patents in speech, audio and video coding. She has authored the textbook Compression for Multimedia (Cambridge University Press, 2010).

Professor Bocharova was twice awarded the Lise Meitner Visiting Chair in Engineering at Lund University, Lund, Sweden (January–June 2005 and 2011). In 2014 she received the Scopus Russia Award.


Albert Guillén i Fàbregas (S'01–M'05–SM'09) received the Telecommunication Engineering degree and the Electronics Engineering degree from Universitat Politècnica de Catalunya and Politecnico di Torino, respectively, in 1999, and the Ph.D. in Communication Systems from École Polytechnique Fédérale de Lausanne (EPFL) in 2004.

Since 2011 he has been an ICREA Research Professor at Universitat Pompeu Fabra. He is also an Adjunct Researcher at the University of Cambridge. He has held appointments at the New Jersey Institute of Technology, Telecom Italia, the European Space Agency (ESA), Institut Eurécom, the University of South Australia, and the University of Cambridge, as well as visiting appointments at EPFL, École Nationale des Télécommunications (Paris), Universitat Pompeu Fabra, the University of South Australia, Centrum Wiskunde & Informatica, and Texas A&M University in Qatar. His research interests are in the areas of information theory, coding theory and communication theory.

Dr. Guillén i Fàbregas is a Member of the Young Academy of Europe, and received the Starting Grant from the European Research Council, the Young Authors Award of the 2004 European Signal Processing Conference, the 2004 Best Doctoral Thesis Award from the Spanish Institution of Telecommunications Engineers, and a Research Fellowship of the Spanish Government to join ESA. He is also an Associate Editor of the IEEE TRANSACTIONS ON INFORMATION THEORY and an Editor of Foundations and Trends in Communications and Information Theory (Now Publishers), and was an Editor of the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS.

Boris D. Kudryashov was born in Leningrad, U.S.S.R., on July 9, 1952. He received the Diploma in Electrical Engineering in 1974 and the Ph.D. degree in technical sciences in 1978, both from the Leningrad Institute of Aerospace Instrumentation, and the Doctor of Science degree in 2005 from the Institute of Problems of Information Transmission, Moscow.

In 1978 he became an Assistant Professor, in 1983 an Associate Professor, and in 2005 a Professor at the Leningrad Institute of Aerospace Instrumentation (now the State University of Aerospace Instrumentation, St. Petersburg, Russia). Since November 2007, he has been a Professor at the State University of Information Technologies, Mechanics and Optics, St. Petersburg, Russia. His research interests include coding theory, information theory and applications to speech, audio and image coding. He has authored textbooks on information theory and on coding theory, and coauthored a textbook on data communications (all books in Russian). He has more than 80 papers published in journals and proceedings of international conferences and more than 25 international patents in image, speech and audio coding.

Alfonso Martinez (SM'11) was born in Zaragoza, Spain, in October 1973. He is currently a Ramón y Cajal Research Fellow at Universitat Pompeu Fabra, Barcelona, Spain. He obtained his Telecommunications Engineering degree from the University of Zaragoza in 1997. From 1998 to 2003 he was a Systems Engineer at the research centre of the European Space Agency (ESA-ESTEC) in Noordwijk, The Netherlands. His work on APSK modulation was instrumental in the definition of the physical layer of DVB-S2. From 2003 to 2007 he was a Research and Teaching Assistant at Technische Universiteit Eindhoven, The Netherlands, where he conducted research on digital signal processing for MIMO optical systems and on optical communication theory. Between 2008 and 2010 he was a post-doctoral fellow with the Information-Theoretic Learning Group at Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands. In 2011 he was a Research Associate with the Signal Processing and Communications Lab at the Department of Engineering, University of Cambridge, Cambridge, U.K. His research interests lie in the fields of information theory and coding, with emphasis on digital modulation and the analysis of mismatched decoding; in this area he has coauthored a monograph on Bit-Interleaved Coded Modulation. More generally, he is intrigued by the connections between information theory, optical communications, and physics, particularly by the links between classical and quantum information theory.

Adrià Tauste Campo (S'08–M'12) is a Research Associate with the Hospital del Mar Research Institute and Universitat Pompeu Fabra (UPF), Barcelona, Spain. He received the Mathematics degree in 2005 and the Telecommunications Engineering degree in 2006, both from Universitat Politècnica de Catalunya (UPC), Barcelona, Spain, the M.Sc. in Electrical Engineering from Stanford University, U.S., in 2009, and the Ph.D. in Engineering from the University of Cambridge, U.K., in 2011. From 2013 to 2015 he was a Marie Curie Intra-European Research Fellow with Universitat Pompeu Fabra. Before that, he was a post-doctoral researcher at Universitat Pompeu Fabra and at the Department of Engineering, University of Cambridge, under the Engineering and Physical Sciences Research Council (EPSRC) Doctoral Prize programme.

He has held research appointments at the University of South Australia (UNISA), Adelaide, Australia, and Telefónica Research Labs, Barcelona, Spain, and visiting appointments at the Università degli Studi di Cassino, Cassino, Italy, and the Universidad Nacional Autónoma de México (UNAM), Mexico D.F., Mexico. His research interests include information theory and statistical learning with applications to computational neuroscience. Dr. Tauste Campo currently serves as vice-president of the Spanish Network for the Interaction of Cognitive and Computational Neuroscience.

Gonzalo Vazquez-Vilar (S'08–M'12) received the Telecommunication Engineering degree from the University of Vigo, Spain, in 2004, the Master of Science degree from Stanford University, U.S., in 2008, and the Ph.D. in Communication Systems from the University of Vigo, Spain, in 2011. From 2011 to 2014 he was a post-doctoral fellow in the Department of Information and Communication Technologies, Universitat Pompeu Fabra, Spain, and since 2014 he has been a Visiting Professor in the Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Spain.

He has held appointments as a visiting researcher at Stanford University, U.S., the University of Cambridge, U.K., and Princeton University, U.S. His research interests lie in the field of Shannon theory, with emphasis on finite-length information theory and communications.