
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 50, NO. 12, DECEMBER 2004 3331

Coded Modulation Using Superimposed Binary Codes

Xiao Ma and Li Ping, Member, IEEE

Abstract—In this correspondence, we investigate in a comprehensive fashion a one-layer coding/shaping scheme resembling a perfectly cooperated multiple-access system. At the transmitter, binary data are encoded by either single-level or multilevel codes. The coded bits are first randomly interleaved and then entered into a signal mapper. At each time, the signal mapper accepts as input multiple binary digits and delivers as output an amplitude signal, where the inputs are first independently mapped into 2-PAM signals (possibly having different amplitudes) and then superimposed to form the output. The receiver consists of an iterative decoding/demapping algorithm with an entropy-based stopping criterion. In the special cases when all the 2-PAM signals have equal amplitudes, based on an irregular trellis, we propose an optimal soft-input–soft-output (SISO) demapping algorithm with quadratic rather than exponential complexity. In the general cases, when multilevel codes are employed, we propose power-allocation strategies to facilitate the iterative decoding/demapping algorithm. Using the unequal power allocations and the Gaussian-approximation-based suboptimal demapping algorithm (with linear complexity), coded modulation with high bandwidth efficiency can be implemented.

Index Terms—Coded modulation, coding/shaping scheme, iterative decoding/demapping algorithm, iterative multistage decoding, multilevel coding, sigma-mapping, soft-input–soft-output (SISO) demapping algorithm.

I. INTRODUCTION

The ideal additive white Gaussian noise (AWGN) channel model is an important channel model from both theoretical and practical points of view [1], [2]. To transmit a sequence of binary digits (information bearers) through such a channel, a modulator is required to map binary digits into real signals. The simplest method of digital signaling through such a channel is to use one-dimensional pulse amplitude modulation (PAM), or equivalently, two-dimensional narrow-sense quadrature amplitude modulation (QAM) [3]. The commonly used one-dimensional $M$-PAM constellation consists of $M \ge 2$ equispaced real symbols centered on the origin. In the power-limited regime with low signal-to-noise ratio (SNR), equiprobable 2-PAM signaling is nearly optimal and binary linear codes suffice to approach the channel capacity. The mapping is quite simple, say, $x_t = \sqrt{E_s}\,(1 - 2u_t)$, where $u_t \in \mathbb{F}_2 = \{0, 1\}$ is one (coded) binary digit at time $t$, $E_s$ is the transmission energy per symbol, and hence $x_t \in \mathbb{R}$ is one 2-PAM real signal. However, in the high-SNR bandwidth-limited regime, the channel capacity cannot be achieved if the equispaced $M$-PAM signal points are used with equal probabilities [3]. As $M \to \infty$, the reduction in capacity, or equivalently, the increase in SNR, asymptotically approaches $\pi e/6$ ($\approx 1.53$ dB), which is called the ultimate shaping gain [3], [4]. On the other hand, the mutual information achieved with equiprobable $M$-PAM (called the "equiprobable $M$-PAM capacity" in [2]) can be approached by binary coset codes [5]. The

Manuscript received April 10, 2003; revised May 17, 2004. This work was supported by the Research Grants Council of Hong Kong SAR, China, under Grant CityU 1164/03E.

X. Ma was with the Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong. He is now with the Department of Electronics and Communication Engineering, Sun Yat-sen University, Guangzhou 510275, China (e-mail: [email protected]).

L. Ping is with the Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong (e-mail: [email protected]).

Communicated by K. A. S. Abdel-Ghaffar, Associate Editor for Coding Theory.

Digital Object Identifier 10.1109/TIT.2004.838104

Fig. 1. The general two-layer coding/shaping scheme.

mapping can in principle be chosen as any one-to-one mapper from $\mathbb{F}_2^k$ onto the $M$-PAM constellation,¹ where $M = 2^k$. Therefore, to approach the channel capacity in the high-SNR bandwidth-limited regime, coding techniques must be supplemented with shaping techniques. There are two methods to combine coding and shaping techniques, which are reviewed separately in Sections I-A and I-B.

A. Two-Layer Scheme

The first method is a two-layer scheme [3], which is shown in Fig. 1 and described as follows. A given (finite or infinite) $n$-dimensional signal constellation $\mathcal{A} \subset \mathbb{R}^n$ is partitioned into $m$ disjoint subsets $\mathcal{A}_i$, $1 \le i \le m$. Each of these subsets consists of signal points to be transmitted. The binary data stream $u$ is divided into two substreams, $u^{(c)}$ and $u^{(s)}$, according to appropriate rates. The substream $u^{(c)}$ drives an error-correcting code, resulting in a coded sequence $c = (c_1, \ldots, c_j, \ldots, c_L)$, where $L$ is the code length. The coded sequence $c$ is then utilized to select a sequence of subsets $s = (s_1, \ldots, s_t, \ldots, s_N)$, where $s_t = \mathcal{A}_i$ for some $1 \le i \le m$ and $N$ is the frame length. This layer is called a channel coding layer. Given the selected subset sequence $s$, the other substream $u^{(s)}$ (possibly encoded by a shaping code) is utilized to select a particular signal sequence $x = (x_1, \ldots, x_t, \ldots, x_N)$ such that $x_t \in s_t$ for $1 \le t \le N$. This layer is called a source coding layer.

The function of the channel coding layer is to introduce redundancy while transforming a binary data sequence $u^{(c)}$ into a subset sequence $s$ such that all subset sequences can be distinguished with high probability in the presence of noise. The goal of the source coding layer is to maximize the transmission rate for a fixed average energy, or equivalently, to minimize the average energy for a fixed rate. Assuming that the channel is nearly clean (guaranteed by the channel coding layer), the problem is therefore equivalent to maximizing the entropy of the source (signal constellation) under the energy constraints. It has been proved [6] that the optimal distribution is the Maxwell–Boltzmann distribution.²

The optimality of the two-layer scheme in the high-SNR regime is based on the separability of coding and shaping [4]. It has been proved that, for large SNR, the loss in terms of capacity is negligible if the channel coding layer and the source coding layer are properly combined for lattice-based constellations [7]. It has also been illustrated via simulations that the two-layer coding/shaping scheme can perform within about 1 dB of the Shannon limits at a bit-error rate (BER) around $10^{-5}$ [8], [9].

B. One-Layer Scheme

The second method to combine coding and shaping is a one-layer scheme, as shown in Fig. 2. The binary data stream $u$ is encoded by a binary code, resulting in a coded sequence $c$. Then the coded sequence

¹Precisely, for any given one-to-one mapper from $\mathbb{F}_2^k$ onto the $M$-PAM constellation, there exist binary coset codes approaching the equiprobable $M$-PAM capacity. For the definition of a binary coset code, see [5, p. 206].

²If continuous approximations are applied, the optimal distribution is the truncated Gaussian distribution [4].

0018-9448/04$20.00 © 2004 IEEE


Fig. 2. The one-layer coding/shaping scheme.

$c$ is mapped to a signal sequence $x = (x_1, \ldots, x_t, \ldots, x_N)$. Suppose that $x$ is transmitted and its noisy version $y$ is received. Theoretically, one may perform the optimal a posteriori probability (APP) decoding/demapping algorithm to minimize the BER. In practice, a suboptimal iterative decoding/demapping algorithm may be implemented to reduce complexity. When this is the case, a random interleaver is implicitly assumed to exist as a part of the binary encoder.

The design of the one-layer system consists of two steps. First, a signal mapper is designed such that its mutual information rate is as close as possible to the channel capacity under certain constraints. Second, a binary code is designed to approach the mutual information of the designed signal mapper. The optimality of the one-layer scheme is based on Gallager's lemma (see Section II).

The signal points in a fixed constellation can be utilized with unequal probabilities. In contrast to the two-layer coding/shaping scheme, the optimal distribution for a fixed constellation is generally not the Maxwell–Boltzmann distribution, but it can be found numerically by using the Blahut–Arimoto algorithm [10], [11] or its variations [12]. Raphaeli and Gurevitz [13] proposed a scheme that combines turbo codes with multiple-to-one signal mappers that induce nonuniform distributions over signal constellations. Varnica, Ma, and Kavcic [12] proposed a serially concatenated coding scheme with a trellis code as an inner code, where the function of the inner trellis code is to transform an i.u.d.-like sequence $c$ into a signal sequence $x$ that matches the channel. In this correspondence, the acronym i.u.d. stands for "independent and uniformly distributed." Generally, if $\{Z_1, \ldots, Z_t, \ldots, Z_n\}$ are $n$ independent and identically distributed (i.i.d.) random variables taking values from a finite set $\mathcal{Z}$ with equal probabilities, we call $Z = (Z_1, \ldots, Z_t, \ldots, Z_n)$ an i.u.d. sequence.

Also note that the signal constellation can be chosen as nonequispaced. Sun and van Tilborg [14] proved that the channel capacity can be achieved by equiprobable signaling with nonequispaced (geometrically Gaussian-like) signal sets. Fragouli et al. [15] showed by simulation that such a nonequispaced 8-PAM can offer an improvement of approximately 0.2 dB over the conventional 8-PAM.

In 1997, Duan, Rimoldi, and Urbanke [16] applied a multiple-access signaling method (named sigma-mapping in this correspondence) to multilevel coding (MLC) systems. It has been pointed out that, with properly designed parameters, MLC using sigma-mappers can approach the channel capacity. Another feature of sigma-mapping is that the corresponding mapping/demapping algorithms can be implemented in an algorithmic (instead of table-lookup-based) manner, as required for coding systems with large signal constellations.

C. Outline and Organization

In this correspondence, we investigate one-layer coding/shaping systems with sigma-mappers. Our work can be considered as an extension of [16]. At the transmitter, coded bits are randomly interleaved before entering the sigma-mapper. At the receiver, iterative (instead of stripping) decoding/demapping algorithms are performed. In the special cases when all the 2-PAM signals have equal amplitudes, based on an irregular trellis, we propose an optimal soft-input–soft-output (SISO) demapping algorithm with quadratic rather than exponential complexity. In the general cases, a suboptimal SISO demapping algorithm [17] with linear complexity is rederived, which is based on Gaussian approximations. In order to approach the channel capacity by using sigma-mappers, we investigate two coding schemes: the single-level coding scheme and the multilevel coding scheme. For the single-level coding/sigma-mapping scheme, we illustrate how to choose coding rates for the "inner code" (sigma-mapper) and the outer code. For the multilevel coding/sigma-mapping scheme, we show how to design the sigma-mapper based on power allocations, which is more convenient than rate allocations [8]. The employed iterative decoding/demapping algorithms with entropy-based stopping criteria are explicitly described as message-passing/processing algorithms over the normal realizations of the whole system.

It is worthwhile pointing out that the investigated coding system can easily be generalized to QAM (two-dimensional signaling) systems, although we only consider the one-dimensional signaling method (PAM) in this correspondence.

The rest of this correspondence is organized as follows. In Section II, we first introduce the concept of mutual information for a given signal mapper, and then compare several mapping methods by computing their mutual information. In Section III, optimal and suboptimal SISO demapping algorithms are described. In Section IV, we investigate two coding schemes using sigma-mappers: the single-level coding scheme and the multilevel coding scheme. In Section V, we design three coding/sigma-mapping examples to verify the theoretical analysis. Section VI concludes this correspondence.

II. INFORMATION-THEORETIC ANALYSIS OF SIGNAL MAPPERS

A. The Ideal AWGN Channel Model

An ideal AWGN channel is characterized by $Y_t = X_t + W_t$, where $X_t \in \mathbb{R}$ and $Y_t \in \mathbb{R}$ are the channel input and output at time $t$, respectively. The additive noise sequence $W = (W_1, \ldots, W_t, \ldots)$ is assumed to be an i.i.d. sequence, where $W_t$ is a Gaussian random variable with mean $0$ and variance $\sigma^2$, denoted by $W_t \sim \mathcal{N}(0, \sigma^2)$. Let the average input energy per dimension be constrained, i.e., $E(X_t^2) \le E_s$, where $E(\cdot)$ stands for the statistical expectation. The channel capacity equals the maximum mutual information, and $X_t \sim \mathcal{N}(0, E_s)$ is the optimal distribution. The capacity in bits per dimension (bits/dim) is given by Shannon [1]

$$C = \frac{1}{2}\log_2(1 + \mathrm{SNR}) \qquad (1)$$

where $\mathrm{SNR} = E_s/\sigma^2$. Without loss of generality, we assume that $\sigma^2 = 1$ in the rest of this correspondence.
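As a concrete check, Eq. (1) can be evaluated directly (a minimal Python sketch; the function name `awgn_capacity` is ours):

```python
import math

def awgn_capacity(snr: float) -> float:
    # Eq. (1): C = (1/2) * log2(1 + SNR), in bits per dimension.
    return 0.5 * math.log2(1.0 + snr)

# At SNR = 3 (about 4.77 dB) the capacity is exactly 1 bit/dim,
# which is the operating point used in the numerical examples later on.
print(awgn_capacity(3.0))  # -> 1.0
```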

B. Mutual Information of a Signal Mapper

In this correspondence, we only consider the one-dimensional signaling method for the ideal AWGN channel.

Let $k$ be a positive integer. A mathematical mapping $\phi : \mathbb{F}_2^k \mapsto \mathbb{R}$ is called a one-dimensional signal mapper if, for any

$$v = (v^{(0)}, \ldots, v^{(k-1)}) \in \mathbb{F}_2^k$$

there exists a unique $x \in \mathbb{R}$ such that $\phi(v) = x$. The image of $\phi$, denoted by $\mathcal{A} = \phi(\mathbb{F}_2^k)$, is called the signal constellation. Clearly, the cardinality of $\mathcal{A}$ (denoted by $|\mathcal{A}|$) is at most $2^k$. Let $V = (V^{(0)}, \ldots, V^{(k-1)})$ be a binary i.u.d. sequence with realizations $v \in \mathbb{F}_2^k$. Then the signal mapper $\phi$ naturally induces a real random variable $X = \phi(V)$ with probability mass function (pmf)

$$P_X(x) = \frac{1}{2^k}\sum_{v \in \mathbb{F}_2^k} \chi(\phi(v) = x), \qquad x \in \mathcal{A}. \qquad (2)$$

Hereafter, the indicator function $\chi(\mathcal{P}) = 1$ if the proposition $\mathcal{P}$ is true; otherwise, $\chi(\mathcal{P}) = 0$.
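The induced pmf of Eq. (2) amounts to counting how many input vectors land on each amplitude. A short sketch (assuming a mapper given as a Python callable; the name `induced_pmf` is ours):

```python
from collections import Counter
from itertools import product

def induced_pmf(mapper, k):
    # Eq. (2): P_X(x) = (number of v in F_2^k with phi(v) = x) / 2^k.
    counts = Counter(mapper(v) for v in product((0, 1), repeat=k))
    return {x: c / 2 ** k for x, c in counts.items()}

# A two-bit superposition mapper phi(v) = (1 - 2v0) + (1 - 2v1) induces the
# nonuniform pmf (1/4, 1/2, 1/4) on the constellation {-2, 0, 2}.
pmf = induced_pmf(lambda v: (1 - 2 * v[0]) + (1 - 2 * v[1]), k=2)
print(sorted(pmf.items()))  # -> [(-2, 0.25), (0, 0.5), (2, 0.25)]
```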


Let $Y$ denote the observation of $X$ through the AWGN channel. Then $V \to X \to Y$ constitutes a Markov chain. The mutual information $I_\phi$ of a given signal mapper $\phi$ is defined as the mutual information $I(V; Y)$. It is not difficult to see that

$$I_\phi = I(V; Y) = I(X; Y) = E\left[\log_2 \frac{P_{Y|X}(y|x)}{P_Y(y)}\right] \qquad (3)$$

where

$$P_{Y|X}(y|x) = \frac{1}{\sqrt{2\pi}}\exp(-(y - x)^2/2)$$

is the channel transition probability density function (pdf) and

$$P_Y(y) = \sum_{x \in \mathcal{A}} P_X(x) P_{Y|X}(y|x).$$

The average energy per symbol of $\phi$ is calculated as $E_\phi = E(X^2)$. A scaled version of $\phi$ with respect to a scalar $\alpha$ is defined as $v \mapsto \alpha\phi(v)$ for $v \in \mathbb{F}_2^k$, which is simply denoted by $\alpha\phi$. Clearly, $E_{\alpha\phi} = \alpha^2 E_\phi$. As $\alpha \to \infty$, $I_{\alpha\phi}$ approaches

$$H(X) = -\sum_{x \in \mathcal{A}} P_X(x) \log_2 P_X(x)$$

nondecreasingly.

We have mentioned in the Introduction that, for equiprobable $M$-PAM signaling (either equispaced as commonly used or nonequispaced as designed in [14]), there exist binary coset codes that approach the corresponding mutual information. More generally, we have the following.

Lemma 0.1 (Gallager [5]): For any given signal mapper $\phi$, $I_\phi$ can be achieved by binary coset codes.

Proof: See Gallager [5, pp. 206–209].

Motivated by Gallager's lemma, we may utilize a serially concatenated system to approach the channel capacity. The signal mapper $\phi$ is first designed such that $I_\phi$ is as close as possible to the channel capacity, and then an outer binary code is designed to approach $I_\phi$. Similar ideas have been exploited to design good codes for partial-response channels [18].

C. Natural Mapping

Let $\mathcal{A}$ be the conventional $M$-PAM signal constellation of size $M = 2^k$; that is, $\mathcal{A}$ is the scaled version of $\{(M-1)/2 - i : 0 \le i \le M-1\}$. Obviously, any one-to-one mapper $\phi_U : \mathbb{F}_2^k \mapsto \mathcal{A}$ has the same mutual information, denoted by $I_U$. One algorithmic mapper, called natural mapping, is characterized as

$$\phi_U(v) = \alpha\left(\frac{M-1}{2} - \sum_{i=0}^{k-1} v^{(i)} 2^i\right), \qquad \text{for } v \in \mathbb{F}_2^k \qquad (4)$$

where the scaling factor $\alpha$ is chosen to satisfy $E_\phi = E_s$, i.e.,

$$\alpha = \sqrt{12 E_s/(M^2 - 1)}.$$
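Eq. (4) can be sketched directly in Python (the function name `natural_mapper` and the example parameters are ours):

```python
import math
from itertools import product

def natural_mapper(v, Es=1.0):
    # Eq. (4): phi_U(v) = alpha * ((M-1)/2 - sum_i v_i * 2^i),
    # with alpha = sqrt(12*Es / (M^2 - 1)) so that the average energy is Es.
    k = len(v)
    M = 2 ** k
    alpha = math.sqrt(12.0 * Es / (M ** 2 - 1))
    return alpha * ((M - 1) / 2.0 - sum(b << i for i, b in enumerate(v)))

# With k = 2 and Es = 5, alpha = sqrt(60/15) = 2 and the image is the
# equispaced 4-PAM set {-3, -1, 1, 3}, whose average energy is indeed 5.
points = sorted(natural_mapper(v, Es=5.0) for v in product((0, 1), repeat=2))
print(points)  # -> [-3.0, -1.0, 1.0, 3.0]
```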

D. Gallager Mapping

Given $\mathcal{A}$, let $X^*$ be a random variable with pmf $\{P_{X^*}(x), x \in \mathcal{A}\}$ maximizing $I(X; Y)$, possibly under certain constraints. Given $k$, a signal mapper $\phi_G$ is chosen such that the induced random variable $X$ is as close as possible to $X^*$, say, in the sense that the Kullback–Leibler distance [19] $D(P_{X^*} \| P_X)$ is minimized under certain constraints. The mapping method using $\phi_G$ is called Gallager mapping, which was proposed by Gallager [5]. The development here is slightly different from the original description in [5], where $P_{X^*}$ is selected as the one maximizing the error exponent rather than the mutual information. Applying turbo codes to Gallager-type signal mappers, Raphaeli and Gurevitz [13] were able to come within about 1.1 dB of the channel capacity at rates 2 and 3 bits/dim.

E. Sigma-Mapping

In 1997, Duan, Rimoldi, and Urbanke [16] applied a multiple-access signaling method to single-user systems. Though signals of any type could be applied, we will only consider the cases when antipodal signals are adopted by each "user." The resulting mapping method is called sigma-mapping, denoted by $\phi_\Sigma$. More precisely, for any $v \in \mathbb{F}_2^k$

$$\phi_\Sigma(v) = \sum_{i=0}^{k-1} \alpha_i (1 - 2v^{(i)}) \qquad (5)$$

where the $\alpha_i$'s are chosen to satisfy $E_\phi = E_s$, i.e.,

$$\sum_{0 \le i \le k-1} \alpha_i^2 = E_s.$$

If all $\alpha_i$'s are equal, $\phi_\Sigma$ is called a type-I sigma-mapper; otherwise, $\phi_\Sigma$ is called a type-II sigma-mapper. For a type-I sigma-mapper, the induced $X$ is in general nonuniform.³ For a type-II sigma-mapper, the resulting constellation is in general nonequispaced.
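The superposition of Eq. (5) is a one-line computation. A sketch covering both types (the name `sigma_mapper` and the example amplitudes for the type-II case, taken from the numerical examples below, are the only assumptions):

```python
import math

def sigma_mapper(v, alphas):
    # Eq. (5): superimpose k independently modulated 2-PAM signals,
    # phi_Sigma(v) = sum_i alpha_i * (1 - 2*v_i).
    return sum(a * (1 - 2 * b) for a, b in zip(alphas, v))

# Type-I (equal amplitudes): the output is alpha*(k - 2*wt(v)).
print(sigma_mapper((0, 1, 0), (1.0, 1.0, 1.0)))  # -> 1.0
# Type-II (unequal amplitudes), with alpha_0^2 + alpha_1^2 = Es = 3:
x = sigma_mapper((0, 1), (math.sqrt(1.84), math.sqrt(1.16)))
```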

F. Numerical Examples

The mutual information $I_\phi$ defined in (3) can be evaluated by the Monte Carlo method or other numerical integration methods. Since the natural mappers and the type-I sigma-mappers are easily determined for a given SNR, we plot their mutual information in Fig. 3. Also shown in Fig. 3 is the capacity. It can be seen that, for example, the type-I sigma-mappers with $k = 3$ are nearly optimal at rates around 1 bit/dim.

In the following, we describe four concrete signal mappers at $\mathrm{SNR} = 3$ ($\approx 4.77$ dB). In this case, the channel capacity is 1 bit/dim.

1) Gallager mapping for equispaced 4-PAM: Consider $k = 5$. A Gallager mapper is determined by

$$\phi_G(v) = \begin{cases} 3\alpha, & \text{if } 0 \le D(v) < 3 \\ \alpha, & \text{if } 3 \le D(v) < 16 \\ -\alpha, & \text{if } 16 \le D(v) < 29 \\ -3\alpha, & \text{if } 29 \le D(v) < 32 \end{cases} \qquad (6)$$

where $v \in \mathbb{F}_2^5$ and $D(v) = \sum_{0 \le i \le k-1} v^{(i)} 2^i$ stands for the decimal representation of $v$. It can be verified that $P_X = (3, 13, 13, 3)/32$ and $\mathcal{A} = \alpha(-3, -1, 1, 3)$ with $\alpha = \sqrt{6/5}$.

2) Natural mapping for equispaced 4-PAM: In this case, $P_X = (1, 1, 1, 1)/4$ and $\mathcal{A} = \alpha(-3, -1, 1, 3)$ with $\alpha = \sqrt{3/5}$.

3) Type-I sigma-mapping: Consider $k = 3$. In this case, $P_X = (1, 3, 3, 1)/8$ and $\alpha_0 = \alpha_1 = \alpha_2 = 1$. The signal constellation is $\mathcal{A} = (-3, -1, 1, 3)$.

4) Type-II sigma-mapping: Consider $k = 2$. Obviously, the parameters $\alpha_0$ and $\alpha_1$ can be chosen to maximize $I_\phi$. Here, we simply set $\alpha_0 = \sqrt{1.84}$ and $\alpha_1 = \sqrt{1.16}$. Therefore, $P_X = (1, 1, 1, 1)/4$ and

$$\mathcal{A} = (-\alpha_0 - \alpha_1, -\alpha_0 + \alpha_1, \alpha_0 - \alpha_1, \alpha_0 + \alpha_1).$$

³Actually, $X$ is a linear transform of a binomial random variable.
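The pmf and energy claimed for the Gallager mapper of Eq. (6) can be verified by exhaustive enumeration (a sketch; `gallager_level` is our name for the level of $\phi_G(v)$ measured in multiples of $\alpha$):

```python
from fractions import Fraction
from itertools import product

def gallager_level(v):
    # Eq. (6): select the 4-PAM level from the decimal representation D(v).
    D = sum(b << i for i, b in enumerate(v))
    if D < 3:
        return 3
    if D < 16:
        return 1
    if D < 29:
        return -1
    return -3

counts = {3: 0, 1: 0, -1: 0, -3: 0}
for v in product((0, 1), repeat=5):
    counts[gallager_level(v)] += 1
print(counts)  # -> {3: 3, 1: 13, -1: 13, -3: 3}, i.e., P_X = (3, 13, 13, 3)/32

# Energy check: with alpha^2 = 6/5, E[X^2] = (6/5)*(27 + 13 + 13 + 27)/32 = 3 = Es.
E = Fraction(6, 5) * sum(c * s * s for s, c in counts.items()) / 32
print(E)  # -> 3
```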


Fig. 3. Mutual information versus SNR curves for different signal mappers.

TABLE I
INFORMATION-THEORETIC ANALYSIS OF FOUR SPECIFIC SIGNAL MAPPERS

The properties of these four signal mappers are listed in Table I. It can be seen that the type-I sigma-mapper performs nearly as well as the Gallager mapper in terms of mutual information. The geometrical and statistical properties of the corresponding signal constellations are also depicted in Fig. 4, where a signal point is represented by a black square with its amplitude at the centroid and its probability proportional to the area.
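The Monte Carlo evaluation of $I_\phi$ mentioned above can be sketched as follows (a simplified estimator, not the authors' implementation; the function name and sample size are ours, and $\sigma^2 = 1$ as assumed in the text):

```python
import math
import random

def mutual_information_mc(points, probs, n=100_000, seed=7):
    # Monte Carlo estimate of Eq. (3): I_phi = E[ log2( p(y|x) / p(y) ) ],
    # averaging over X ~ P_X and the unit-variance Gaussian noise.
    rng = random.Random(seed)
    def p_y_given_x(y, x):
        return math.exp(-(y - x) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)
    total = 0.0
    for _ in range(n):
        x = rng.choices(points, probs)[0]
        y = x + rng.gauss(0.0, 1.0)
        p_y = sum(p * p_y_given_x(y, xp) for xp, p in zip(points, probs))
        total += math.log2(p_y_given_x(y, x) / p_y)
    return total / n

# Type-I sigma-mapper, k = 3: A = (-3, -1, 1, 3), P_X = (1, 3, 3, 1)/8, Es = 3,
# so the estimate should land just below the channel capacity of 1 bit/dim.
I_hat = mutual_information_mc([-3, -1, 1, 3], [1 / 8, 3 / 8, 3 / 8, 1 / 8])
```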

III. SISO DEMAPPING ALGORITHMS

In this section, we first describe the message processing/passing algorithms on general normal graphs, and then derive the optimal/suboptimal SISO demapping algorithms based on normal realizations.

A. Message Processing/Passing Algorithms on Normal Graphs

Let $Z$ be a random variable with realizations $z \in \mathcal{Z}$. The notation $Z \sim P_Z(z)$ is used to represent that $Z$ has pmf $\{P_Z(z), z \in \mathcal{Z}\}$ if $Z$ is discrete, and also to represent that $Z$ has pdf $\{P_Z(z), z \in \mathcal{Z} \subset \mathbb{R}\}$ if $Z$ is continuous. Generally, a discrete variable is characterized by its pmf (a nonnegative vector of length $|\mathcal{Z}|$), while a continuous variable is characterized by parameters that determine its pdf. For example, a binary random variable is characterized by $\{P_Z(0), P_Z(1)\}$, and a Gaussian random variable is characterized by its mean and variance. In particular, we use $Z \sim \chi(Z = z)$ to represent that $Z$ is a random

Fig. 4. Four specific signal constellations with bit labelings. A signal point is represented by a black square with its amplitude at the centroid and its probability proportional to the area. (a) Gallager mapping for equispaced 4-PAM with $k = 5$. (b) Natural mapping for equispaced 4-PAM. (c) Type-I sigma-mapping with $k = 3$. (d) Type-II sigma-mapping with $k = 2$.


variable with an observation $z$. We call $Z \sim P_Z$ a message⁴ indicating that $\{P_Z(z), z \in \mathcal{Z}\}$ may not be the exact pmf/pdf of $Z$.

Let $\mathcal{Z} = \{Z_1, \ldots, Z_i, \ldots, Z_n\}$ be $n$ distinct random variables that constitute a subsystem $S^{(0)}$. Following Forney [20], this subsystem can be represented as a normal subgraph with edges representing $\mathcal{Z}$ and a vertex $S^{(0)}$ representing the subsystem constraints, as shown in Fig. 5. Each half-edge (ending with a dongle) may potentially be coupled to some half-edge in other subsystems. For example, $Z_1$ and $Z_5$ are shown to be connected to subsystems $S^{(1)}$ and $S^{(m)}$, respectively. In this case, the corresponding edge is called a full-edge.

Consider a normal graph with vertices (subsystems) $\{S^{(i)}, 0 \le i \le m\}$ shown in Fig. 5. Assume that, for every random variable $Z$ involved in this system, an a priori message $Z \sim P_Z^{(a)}$ is available. A visit to the vertex $S^{(0)}$ with respect to variables $\{Z_{i_j}\} \subset \mathcal{Z}$, denoted by $\mathcal{V}(S^{(0)}, \{Z_{i_j}\})$, is defined as the two sequential operations in the following.

1) Message processing: For each variable $Z_{i_j}$ of interest, re-estimate its distribution by considering the subsystem constraints $S^{(0)}$ and all available a priori messages $\{Z_i \sim P_{Z_i}^{(a)}\}$ except $Z_{i_j} \sim P_{Z_{i_j}}^{(a)}$. The newly estimated distribution is written as

$$P_{Z_{i_j}}^{(e)}(z_{i_j}) = P_{Z_{i_j}}\left(z_{i_j} \,\|\, S^{(0)}, \{Z_i \sim P_{Z_i}^{(a)}, i \ne i_j\}\right) \qquad (7)$$

for $z_{i_j} \in \mathcal{Z}_{i_j}$, where the notation $P_Z(\cdot \| \cdot)$ rather than $P_{Z|\cdot}(\cdot|\cdot)$ implies that $P_{Z_{i_j}}^{(e)}$ may not be a conditional probability function (in the traditional sense), since the conditions $S^{(0)}$ and $\{Z_i \sim P_{Z_i}^{(a)}, i \ne i_j\}$ may not be a probabilistic event. We call $Z_{i_j} \sim P_{Z_{i_j}}^{(e)}$ an extrinsic message.

2) Message passing: For each full-edge $Z_{i_j}$ of interest, update the a priori message $Z_{i_j} \sim P_{Z_{i_j}}^{(a)}$ by the extrinsic message $Z_{i_j} \sim P_{Z_{i_j}}^{(e)}$, i.e., set

$$P_{Z_{i_j}}^{(a)}(z_{i_j}) = P_{Z_{i_j}}^{(e)}(z_{i_j}), \qquad \text{for } z_{i_j} \in \mathcal{Z}_{i_j}. \qquad (8)$$

Obviously, we can define a visit to any vertex with respect to any collection of variables. Such a visit is also called a message processing/passing algorithm or a soft-input–soft-output (SISO) algorithm. Note that it is not suitable to start the updating (the message-passing step in a visit) unless all extrinsic messages of interest have been re-estimated. Also note that updating the message with respect to a half-edge is not allowed. A sequence of visits

$$\mathcal{V}(S_0, \cdot) \to \mathcal{V}(S_1, \cdot) \to \cdots \to \mathcal{V}(S_{q-1}, \cdot)$$

stands for a sequential algorithm consisting of $q$ steps. At step $\ell$, the algorithm makes a visit to $S_\ell$, where $S_\ell = S^{(i)}$ for some $0 \le i \le m$.

B. SISO Demapping for General Signal Mappers

The combination of a general signal mapper and the AWGN channel can be considered as a (sub)system $\phi$, as shown in Fig. 6(a). The available a priori messages include $V^{(i)} \sim P_{V^{(i)}}^{(a)}$ for all $i$, $W \sim \mathcal{N}(0, 1)$, and $Y \sim \chi(Y = y)$. Then three types of extrinsic messages can be estimated.

⁴When more general (other than discrete and continuous) random variables are considered, a message can be defined as a measure. More generally, a message can be defined as any quantities that specify a measure. For example, a message with respect to a binary random variable can be the log-likelihood ratio (LLR) $\log(P_Z(0)/P_Z(1))$.

Fig. 5. A normal graph of a general (sub)system.

Fig. 6. Normal graphs of signal mappers over the AWGN channel. (a) A general signal mapper $\phi$. (b) A general sigma-mapper $\phi_\Sigma$ with parameters $\alpha_0, \alpha_1, \ldots, \alpha_{k-1}$.

Fig. 7. A trellis diagram for type-I sigma-mappers with k = 3.

• For all V (i), m 2 2, P(e)

V(m) is given as

PV m k �; V (j) � P(a)

V; j 6= i ;

W � N (0; 1); Y � �(y) (9)

/v2

�(v(i) = m)PV (v n v(i))PY jX(yj�(v)) (10)

where

PY jX(yj�(v)) = 1p2�

exp(�(y � �(v))2=2)

Page 6: Coded modulation using superimposed binary codes

3336 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 50, NO. 12, DECEMBER 2004

is the conditional transition pdf of the channel and

$$P_V\bigl(v \setminus v^{(i)}\bigr) = \prod_{j \ne i} P^{(a)}_{V^{(j)}}\bigl(v^{(j)}\bigr) \tag{11}$$

is an assumption which (approximately) holds if the normal graph representing the whole system has no short cycles and all a priori messages are collected from other subsystems rather than the current subsystem $\Sigma$.

• Obviously, $P^{(e)}_Y$ is a linear combination of several Gaussian pdfs

$$P^{(e)}_Y(y) = \sum_{v \in \mathbb{F}_2^k} P_V(v)\, P_{Y|X}\bigl(y \mid \sigma(v)\bigr) \tag{12}$$

for $y \in \mathbb{R}$. For simplicity, we make the following approximations:

$$P^{(e)}_Y(y) \approx \frac{1}{\sqrt{2\pi\sigma_Y^2}} \exp\left(-\frac{(y - \mu_Y)^2}{2\sigma_Y^2}\right) \tag{13}$$

for $y \in \mathbb{R}$, where

$$\mu_Y = \sum_{v \in \mathbb{F}_2^k} P_V(v)\, \sigma(v) \tag{14}$$

and

$$\sigma_Y^2 = 1 + \sum_{v \in \mathbb{F}_2^k} P_V(v)\, \sigma^2(v) - \mu_Y^2. \tag{15}$$

In words, conditioned on all messages from other variables, $Y$ is approximated as a Gaussian random variable. Such an approximation, called Gaussian approximation, becomes accurate when $V$ is nearly determined. The extrinsic message with respect to $Y$ can be utilized to design stop criteria for iterative decoding/demapping algorithms; see Section IV.

• We can also estimate the extrinsic message with respect to $W$, which could be utilized to adaptively modify the channel parameters for time-varying channels. In particular, when $V$ is known at the receiver, such a computation is equivalent to channel estimation.
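The first two extrinsic computations above can be sketched numerically. The following is a minimal sketch (function names are ours): a brute-force demapper enumerating all $2^k$ input vectors per (9)–(11), and the mean/variance statistics (14)–(15) behind the Gaussian approximation of $P^{(e)}_Y$. Unit noise variance and strictly positive bit priors are assumed.

```python
import itertools
import math

def siso_demap_exact(y, alphas, priors):
    """Extrinsic bit messages of a general signal mapper over the
    unit-variance AWGN channel, by brute-force enumeration (cf. (9)-(11)).
    priors[i] = (P(V(i)=0), P(V(i)=1)), assumed strictly positive.
    Complexity O(2^k)."""
    k = len(alphas)
    ext = [[0.0, 0.0] for _ in range(k)]
    for v in itertools.product((0, 1), repeat=k):
        x = sum(a * (1 - 2 * b) for a, b in zip(alphas, v))  # sigma-mapping
        p = math.exp(-(y - x) ** 2 / 2)                      # channel pdf, constant dropped
        for i, b in enumerate(v):
            p *= priors[i][b]                                # joint prior as in (11)
        for i, b in enumerate(v):
            ext[i][b] += p / priors[i][b]                    # divide out own prior -> extrinsic
    return [(e0 / (e0 + e1), e1 / (e0 + e1)) for e0, e1 in ext]

def gaussian_stats_of_Y(alphas, priors):
    """Mean and variance of Y per (14)-(15), used by the Gaussian
    approximation (13) of the extrinsic message with respect to Y."""
    mu = ex2 = 0.0
    for v in itertools.product((0, 1), repeat=len(alphas)):
        pv = math.prod(priors[i][b] for i, b in enumerate(v))
        x = sum(a * (1 - 2 * b) for a, b in zip(alphas, v))
        mu += pv * x
        ex2 += pv * x * x
    return mu, 1.0 + ex2 - mu * mu   # unit channel noise plus signal variance
```

With uniform priors and $y = 0$ the extrinsic messages stay at $1/2$ by symmetry, as expected.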

C. SISO Demapping for Type-I Sigma-Mappers

Evidently, the computation complexity of the SISO demapping algorithm increases exponentially with $k$ for general signal mappers. In this subsection, we will show that, for a type-I sigma-mapper, an irregular trellis can be constructed to specify the relationship between the input $V$ and the noiseless output $X$; see Fig. 7 for an example. Therefore, we may compute all extrinsic messages with respect to $V$ based on the Bahl–Cocke–Jelinek–Raviv (BCJR) algorithm [21] and hence reduce the computation complexity to order $k^2$.

The trellis has $k$ stages. At stage $i$, there are $i + 1$ states, and the initial state is denoted by $s_0 = 0$. Emitted from each state $s_i$, there are two branches corresponding to $v^{(i)} = 0$ and $v^{(i)} = 1$, respectively. A binary vector $v$ corresponds to a path through the trellis with $s_{i+1} = s_i + v^{(i)}$. Therefore, the terminal state $s_k$ is equal to the Hamming weight of $v$, and the transmitted signal $x$ is hence equal to $\alpha(k - 2s_k)$, where $\alpha$ is the parameter of the type-I sigma-mapper. To a given branch $b$ with label $v^{(i)}$, a metric $\gamma[b] = P^{(a)}_{V^{(i)}}(v^{(i)})$ is assigned. The forward recursion variables are initialized in the obvious way, while the backward variables are initialized by, for $0 \le s \le k$,

$$\beta_k(s) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(y - \alpha(k - 2s))^2}{2}\right). \tag{16}$$

For a general complexity analysis and the notation, see [22].
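The weight-trellis BCJR above can be sketched as follows. This is our own rendering (plain forward/backward arrays rather than the notation of [22]), assuming unit noise variance; the branch prior of the bit of interest is excluded so the outputs are extrinsic, as in Section III-B.

```python
import math

def siso_demap_trellis(y, alpha, priors):
    """SISO demapping for a type-I sigma-mapper via BCJR on the weight
    trellis of Fig. 7 (states = partial Hamming weights); complexity O(k^2).
    priors[i] = (P(V(i)=0), P(V(i)=1)); unit noise variance assumed."""
    k = len(priors)
    # forward recursion: f[i][s] = prob. of reaching weight s after bits 0..i-1
    f = [[0.0] * (k + 1) for _ in range(k + 1)]
    f[0][0] = 1.0
    for i in range(k):
        for s in range(i + 1):
            f[i + 1][s] += f[i][s] * priors[i][0]
            f[i + 1][s + 1] += f[i][s] * priors[i][1]
    # backward recursion, initialized at the terminal stage with the
    # channel metric of (16)
    b = [[0.0] * (k + 1) for _ in range(k + 1)]
    for s in range(k + 1):
        b[k][s] = math.exp(-(y - alpha * (k - 2 * s)) ** 2 / 2)
    for i in range(k - 1, -1, -1):
        for s in range(i + 1):
            b[i][s] = priors[i][0] * b[i + 1][s] + priors[i][1] * b[i + 1][s + 1]
    # extrinsic messages: the branch's own prior is deliberately excluded
    ext = []
    for i in range(k):
        e0 = sum(f[i][s] * b[i + 1][s] for s in range(i + 1))      # v(i) = 0
        e1 = sum(f[i][s] * b[i + 1][s + 1] for s in range(i + 1))  # v(i) = 1
        ext.append((e0 / (e0 + e1), e1 / (e0 + e1)))
    return ext
```

For small $k$ the result agrees with the exponential-complexity enumeration, while the trellis only touches $O(k^2)$ state pairs.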

D. Suboptimal SISO Demapping for Sigma-Mappers

The graph shown in Fig. 6(b) is the normal realization of the following equation:

$$Y = \sum_{i=0}^{k-1} \alpha_i \bigl(1 - 2V^{(i)}\bigr) + W. \tag{17}$$

Let $V^{(j)}$ be of interest. We may rewrite (17) as

$$Y = \alpha_j \bigl(1 - 2V^{(j)}\bigr) + \tilde{W} \tag{18}$$

where

$$\tilde{W} = W + \sum_{i \ne j} \alpha_i \bigl(1 - 2V^{(i)}\bigr) \tag{19}$$

is the superposition of the channel noise and the "inter-user interferences." As discussed below (15), $\tilde{W}$ may be approximated as a Gaussian variable [17], [23], [24]. The mean and variance of $\tilde{W}$ are calculated as

$$\mu_{\tilde{W}} = \sum_{i \ne j} \alpha_i \bigl(1 - 2\mu_{V^{(i)}}\bigr) \tag{20}$$

and

$$\sigma_{\tilde{W}}^2 = 1 + \sum_{i \ne j} 4\alpha_i^2 \sigma_{V^{(i)}}^2 \tag{21}$$

where

$$\mu_{V^{(i)}} = P^{(a)}_{V^{(i)}}(1) \tag{22}$$

and

$$\sigma_{V^{(i)}}^2 = P^{(a)}_{V^{(i)}}(0)\, P^{(a)}_{V^{(i)}}(1). \tag{23}$$

Combining the available messages

$$Y \sim \delta(y) \quad \text{and} \quad \tilde{W} \sim \mathcal{N}\bigl(\mu_{\tilde{W}}, \sigma_{\tilde{W}}^2\bigr)$$

with (18), we can easily estimate $P^{(e)}_{V^{(j)}}$ as follows:

$$P^{(e)}_{V^{(j)}}(m) \propto \exp\left(-\frac{\bigl(y - \alpha_j(1 - 2m) - \mu_{\tilde{W}}\bigr)^2}{2\sigma_{\tilde{W}}^2}\right) \tag{24}$$

for $m \in \mathbb{F}_2$. Note that the derivations above apply to any linear mapping and hence to the natural mapping. For all linear signal mappers, the complexity of the demapping algorithm using the Gaussian approximation increases linearly with $k$.
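The linear-complexity demapper (18)–(24) can be sketched as follows (a sketch with our own function names, unit noise variance assumed). Precomputing the totals once lets each bit's interference statistics be obtained by a single subtraction.

```python
import math

def siso_demap_gaussian(y, alphas, priors):
    """Suboptimal SISO demapping by the Gaussian approximation (18)-(24):
    for bit j, the other bits' signals plus noise are treated as one
    Gaussian interference term. priors[i] = (P(V(i)=0), P(V(i)=1));
    complexity O(k) per received symbol."""
    k = len(alphas)
    mu_v = [p[1] for p in priors]                   # (22): mu_V = P(V = 1)
    var_v = [p[0] * p[1] for p in priors]           # (23)
    # totals over all bits, so each j needs only one subtraction
    mu_all = sum(a * (1 - 2 * m) for a, m in zip(alphas, mu_v))
    var_all = sum(4 * a * a * s for a, s in zip(alphas, var_v))
    ext = []
    for j in range(k):
        mw = mu_all - alphas[j] * (1 - 2 * mu_v[j])          # (20)
        vw = 1.0 + var_all - 4 * alphas[j] ** 2 * var_v[j]   # (21)
        e = [math.exp(-(y - alphas[j] * (1 - 2 * m) - mw) ** 2 / (2 * vw))
             for m in (0, 1)]                                # (24)
        ext.append((e[0] / (e[0] + e[1]), e[1] / (e[0] + e[1])))
    return ext
```

With uniform priors the interference mean vanishes and a positive $y$ pushes every bit toward 0, matching the sign convention $1 - 2v$.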

IV. CODED MODULATION SCHEMES WITH SIGMA-MAPPERS

In this section, we investigate possible coding/decoding schemes with sigma-mappers. The target rate is assumed to be $r$ bits/dim, and hence the Shannon limit is $E^* = 2^{2r} - 1$ from (1) and the assumption $\sigma^2 = 1$.

A. Single-Level Coding With Sigma-Mappers

Encoding: As shown in Fig. 8, a binary sequence $u$ of length $K$ is encoded by a binary code (typically, a turbo-like code), resulting in a coded sequence $c$ of length $L$. The randomly interleaved version $\tilde{c}$ of $c$ is converted by a serial-to-parallel converter into $k$ sequences $v^{(i)}$, $0 \le i \le k - 1$, of length $N$. These sequences are then mapped to a signal sequence $x$ of length $N$ by a sigma-mapper $\sigma_{\vec\alpha}$ such that $x_t = \sigma_{\vec\alpha}(v_t)$, where $v_t = \bigl(v_t^{(0)}, \ldots, v_t^{(k-1)}\bigr)$. Several rules for choosing the parameters of this system are listed as follows.


Fig. 8. The single-level coding/sigma-mapping scheme.

1) The key is how to choose $k$. Given a tolerated energy loss $\delta > 0$, choose $k$ as small as possible such that $I_{\vec\alpha} > r$ and $E_{\vec\alpha} < E^* + \delta$.

2) The coding rate $r_o$ of the binary code is then determined by $r_o = r/k$. Then choose $K$ and $L$ such that $K/L = r_o$ and $L$ is a multiple of $k$. Of course, $K$ should be large enough in order to approach the capacity.

3) The interleaver is a random interleaver of length $L$.

Decoding: The normal graph for the single-level coding/sigma-mapping scheme is shown in Fig. 9, where we have combined the random interleaver with the serial-to-parallel converter into a single vertex (subsystem) $\Pi$. Therefore, the whole system $S$ consists of three subsystems $C$, $\Pi$, and $\Sigma$. Assume that

$$U_i \sim \bigl(P^{(a)}_U(0),\, P^{(a)}_U(1)\bigr), \quad \text{for } 1 \le i \le K; \qquad W_t \sim \mathcal{N}(0, 1)$$

and $Y_t \sim \delta(y_t)$ for $1 \le t \le N$. The APP algorithm is to compute $P^{(e)}_{U_i}$ by considering all the a priori messages but $U_i \sim P^{(a)}_U$ as well as the whole system constraints. Once this is done, we make the following decisions:

$$\hat{u}_i = \begin{cases} 0, & \text{if } P^{(a)}_U(0)\, P^{(e)}_{U_i}(0) > P^{(a)}_U(1)\, P^{(e)}_{U_i}(1) \\ 1, & \text{otherwise.} \end{cases} \tag{25}$$

The APP algorithm is optimal in the sense that the BER is minimized. However, due to the existence of the large interleaver in the system $S$, there seems to be no easy way to perform the APP algorithm. An implementable alternative is the suboptimal iterative algorithm, which is based on the well-known turbo principle [25]–[27]. There are many schedules (especially when the binary code is a turbo-like code) to perform the iterative decoding/demapping algorithm on the normal graph. In this correspondence, we employ the following serial procedure.

Algorithm 1

• Initialization:
  1) All intermediate binary random variables $C$ and $V$ are assumed to be Bernoulli-$1/2$ distributed.
  2) Set a maximum iteration number $J$ and an iteration variable $j = 1$.
  3) Set $h_0(Y) = 0$ and a threshold $\epsilon \ge 0$.

• Repeat while $j \le J$:
  1) Perform visits

$$\mathcal{V}(\Sigma, V) \to \mathcal{V}(\Pi, C) \to \mathcal{V}(C, \{U, C\}) \to \mathcal{V}(\Sigma, V). \tag{26}$$

  2) Perform the visit $\mathcal{V}(\Sigma, Y)$ and estimate the entropy rate of $Y$ by

$$h_j(Y) = -\frac{1}{N} \sum_{t=1}^{N} \log P^{(e)}_{Y_t}(y_t). \tag{27}$$

  3) If $|h_j(Y) - h_{j-1}(Y)| < \epsilon$, set $j = J + 1$; else set $j = j + 1$.

• Make decisions: Determine $\hat{u}$ according to (25).

Fig. 9. A normal graph for the single-level coding/sigma-mapping scheme.

Though the vertex $\Sigma$ involves $(k + 2)N$ variables, the visit to $\Sigma$ is essentially equivalent to $N$ SISO processors as described in the previous section. This is because both the sigma-mapper and the AWGN channel are memoryless. In particular, the visit $\mathcal{V}(\Sigma, Y)$ is equivalent to estimating $\mu_{Y_t}$ and $\sigma_{Y_t}^2$ for all $t$ if $Y_t$ is viewed as a Gaussian random variable, which is similar to the calculations described in Section III-D. The visit to $\Pi$ is quite simple, since the only constraint existing in $\Pi$ is the physical positions. The visit to $C$ will be described in Section IV-B using specific examples.

The objective of the stop criterion is to avoid unnecessary iterations provided that the iterative algorithm has already reached a "fixed point." Similar to other stop criteria [28]–[30], the proposed criterion is based on empirical observations. For most realizations $U \sim \delta(u)$ and $W \sim \delta(w)$, the extrinsic messages with respect to $Y$ will not change much after many iterations. Especially, when most a priori messages with respect to $V$ are nearly "degenerate," that is, $P^{(a)}_{V_t^{(i)}}(0) \approx 0$ or $1$ for most $0 \le i \le k - 1$ and $1 \le t \le N$, we have

$$\mu_{Y_t} \approx x_t + z_t \tag{28}$$

and

$$\sigma_{Y_t}^2 \approx \sigma_W^2 = 1 \tag{29}$$

where $x_t$ and $z_t$ are the true value of the transmitted signal and the error at time $t$, respectively. Therefore, for large $N$,

$$h_j(Y) = -\frac{1}{N} \sum_{t=1}^{N} \log P^{(e)}_{Y_t}(y_t) \tag{30}$$

$$= -\frac{1}{N} \sum_{t=1}^{N} \left[ \log \frac{1}{\sqrt{2\pi\sigma_{Y_t}^2}} - \frac{(y_t - \mu_{Y_t})^2}{2\sigma_{Y_t}^2} \right] \tag{31}$$

$$\approx \frac{1}{2} \log(2\pi e) + \frac{1}{2N} \sum_{t=1}^{N} \bigl(z_t^2 - 2w_t z_t\bigr) \tag{32}$$

where $w_t$ is the noise value and the law of large numbers has been applied in the derivation above. Consequently, if few errors exist after many iterations, $h_j(Y)$ will approach a value near $\frac{1}{2}\log(2\pi e)$. If we make a further assumption^5 that $Z_t$ and $W_t$ are uncorrelated, $h_j(Y)$ will approach a value near $\frac{1}{2}\log(2\pi e) + \frac{1}{2}E(Z^2)$. Therefore, "blind"

^5 For a turbo system with large interleavers, such an assumption is in general reasonable since $Z_t$ is caused by unreliable extrinsic messages from other local constraints related to other noise samples.


Fig. 10. The multilevel coding/sigma-mapping scheme.

simulation (without knowing $U$) could be performed. We do not go into details here.
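The limiting value in (30)–(32) is easy to check numerically. The sketch below simulates the error-free case ($z_t = 0$), where $y_t - \mu_{Y_t}$ reduces to the noise sample $w_t$ and the empirical entropy rate should converge to $\frac{1}{2}\log(2\pi e)$ by the law of large numbers.

```python
import math
import random

# Numerical check of (30)-(32) in the error-free case z_t = 0:
# -log P(y_t) = (1/2) log(2*pi) + w_t^2 / 2 with unit noise variance,
# whose average tends to (1/2) log(2*pi*e) ~ 1.4189.
random.seed(1)
N = 200_000
h = sum(0.5 * math.log(2 * math.pi) + random.gauss(0.0, 1.0) ** 2 / 2
        for _ in range(N)) / N
target = 0.5 * math.log(2 * math.pi * math.e)
```

The standard deviation of the empirical average here is roughly $\sqrt{0.5/N} \approx 0.0016$, so the two values agree to a few hundredths.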

B. Multilevel Coding With Sigma-Mappers

Encoding: As shown in Fig. 10, a binary data sequence $u$ of length $K$ is partitioned into $k$ subsequences $u^{(i)}$ of length $K^{(i)}$, $0 \le i \le k - 1$. The $i$th subsequence $u^{(i)}$ is encoded by a component code at the $i$th level, resulting in a sequence $c^{(i)}$ of length $N$. The randomly interleaved versions $v^{(i)}$ of $c^{(i)}$ for all $i$ are then mapped to a signal sequence $x$ such that $x_t = \sigma_{\vec\alpha}(v_t)$, where $v_t = \bigl(v_t^{(0)}, \ldots, v_t^{(k-1)}\bigr)$.

Obviously, the multilevel coding/sigma-mapping scheme can be considered as an instance of Imai's MLC scheme [31]. Therefore, we may apply the rate-allocation strategies of [8] to allocate the total rate among "users" when the sigma-mappers are fixed. On the other hand, due to the special features of sigma-mappers, we can apply the following power-allocation strategies to facilitate the iterative multistage decoding (MSD) algorithm.

• Power-allocation A: Assume that all component codes $C^{(i)}$, and hence the rates $r^{(i)}$, $0 \le i \le k - 1$, have been fixed. We propose the following simulation-based recursive search algorithm, which is conceptually simple, to determine the scalars $\alpha_i$, $0 \le i \le k - 1$. Assume that the scalars $\alpha_i$, $k - j \le i \le k - 1$, have been determined for a $j$-level ($j < k$) system with component codes $C^{(i)}$, $k - j \le i \le k - 1$, such that the BER of the $j$-level system is around the designated BER. We then add one more level with the component code $C^{(k-j-1)}$ and choose $\alpha_{k-j-1}$ by simulation such that the BER of the $(j + 1)$-level system is around the designated BER.

• Power-allocation B: Evidently, power-allocation A is time consuming, especially for systems with large $k$ and complex component codes. When all component codes are identical, we can use the following power-allocation method, which is not as (sub)optimal as power-allocation A (see [32] for details). We simply choose $\alpha_i$ such that $\sum_{0 \le i \le k-1} \alpha_i^2 = E_s$ and

$$\frac{\alpha_i^2}{1 + \sum_{j > i} \alpha_j^2} = \alpha_{k-1}^2.$$

Given $E_s$, we have

$$\alpha_i^2 = \Bigl(\sqrt[k]{1 + E_s} - 1\Bigr)\Bigl(\sqrt[k]{1 + E_s}\Bigr)^{k-1-i}, \quad \text{for } 0 \le i \le k - 1.$$

Similar equations can be found in [17], [33].
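Power-allocation B admits a one-line implementation. The sketch below (our own function name) computes the amplitudes from the closed form with $\beta = \sqrt[k]{1 + E_s}$; both defining constraints, total energy $E_s$ and a constant per-level "post-cancellation" SNR, can be verified numerically.

```python
def power_allocation_b(Es, k):
    """Power-allocation B: amplitudes alpha_i with
    alpha_i^2 = (beta - 1) * beta**(k - 1 - i), where beta = (1 + Es)**(1/k).
    This choice satisfies sum_i alpha_i^2 = Es and
    alpha_i^2 / (1 + sum_{j > i} alpha_j^2) = alpha_{k-1}^2 for every i."""
    beta = (1.0 + Es) ** (1.0 / k)
    return [((beta - 1.0) * beta ** (k - 1 - i)) ** 0.5 for i in range(k)]
```

Note that the geometric-series telescoping is what makes both constraints hold simultaneously: $1 + \sum_{j>i}\alpha_j^2 = \beta^{\,k-1-i}$, so each ratio collapses to $\beta - 1 = \alpha_{k-1}^2$.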

Decoding: The normal graph for the multilevel coding/sigma-mapping scheme is shown in Fig. 11. One iterative decoding/demapping procedure with stop criteria is described as follows.

Algorithm 2

• Initialization:
  1) All intermediate binary random variables $C$ and $V$ are assumed to be Bernoulli-$1/2$ distributed.
  2) Set a maximum iteration number $J$. Set an iteration variable $j = 1$.
  3) Set $h_0(Y) = 0$ and a threshold $\epsilon \ge 0$.

• Repeat while $j \le J$:
  1) Perform visits

$$\mathcal{V}(\Sigma, V^{(0)}) \to \mathcal{V}(\Pi^{(0)}, C^{(0)}) \to \mathcal{V}\bigl(C^{(0)}, \{U^{(0)}, C^{(0)}\}\bigr) \to \mathcal{V}(\Sigma, V^{(0)})$$
$$\mathcal{V}(\Sigma, V^{(1)}) \to \mathcal{V}(\Pi^{(1)}, C^{(1)}) \to \mathcal{V}\bigl(C^{(1)}, \{U^{(1)}, C^{(1)}\}\bigr) \to \mathcal{V}(\Sigma, V^{(1)})$$
$$\vdots$$
$$\mathcal{V}(\Sigma, V^{(k-1)}) \to \mathcal{V}(\Pi^{(k-1)}, C^{(k-1)}) \to \mathcal{V}\bigl(C^{(k-1)}, \{U^{(k-1)}, C^{(k-1)}\}\bigr) \to \mathcal{V}(\Sigma, V^{(k-1)}).$$

  2) Perform the visit $\mathcal{V}(\Sigma, Y)$ and estimate the entropy rate of $Y$ by

$$h_j(Y) = -\frac{1}{N} \sum_{t=1}^{N} \log P^{(e)}_{Y_t}(y_t). \tag{33}$$

  3) If $|h_j(Y) - h_{j-1}(Y)| < \epsilon$, set $j = J + 1$; else set $j = j + 1$.

• Make decisions: Determine $\hat{u}^{(i)}$ for all $i$ according to (25).

C. Remarks on the Coding/Sigma-Mapping Schemes

We have several comments on the coding/sigma-mapping schemes.

• The single-level coding/sigma-mapping scheme can be considered as a bit-interleaved coded modulation (BICM) scheme [34] with a specific signal mapper. The main difference is explained as follows. For the conventional BICM with QAM signaling and Gray mapping [34]–[36], the iterative processing between the decoder and the demapper is not required.^6 However, it is generally required to perform the iterative decoding/demapping algorithm for the single-level coding/sigma-mapping scheme.

• Since the multilevel coding/sigma-mapping scheme is an instance of the MLC scheme, the asymptotic performance of the system can be analyzed by using the tools developed in [8], [39]. It is worth noting that, for the conventional MLC with lattice-based constellations and set-partitioning-based bit mappings [40], different

^6 When other mapping methods are adopted, the iterative decoding/demapping algorithm can be used to improve the performance [37], [38].


Fig. 11. A normal graph for the multilevel coding/sigma-mapping scheme.

levels are usually protected by codes with different rates. In contrast, by choosing appropriate amplitudes $\alpha_i$ in the multilevel coding/sigma-mapping systems, the component codes at different levels can be the same.

• The multilevel coding/sigma-mapping scheme can be viewed as an extension of the scheme proposed in [16]. At the transmitter, a random interleaver is introduced at each level between the encoder and the sigma-mapper. At the receiver, an iterative (instead of stripping) decoding/demapping algorithm is performed.

• The multilevel coding/sigma-mapping system can be treated as a multiuser system by viewing each level as one user. So it is not surprising that the most important methods and results for multiuser systems are applicable here. In addition, since the cooperation among different "users" is perfect, we have more design freedom at both the transmitter and the receiver. Some closely related references are [17], [23], [33], [41], [44].

V. EXAMPLES

Before we go into specific examples, we first introduce the notion of $\mathrm{SNR_{norm}}$ that appeared in [2], which is defined as

$$\mathrm{SNR_{norm}} = \frac{E_s}{2^{2r} - 1}$$

where $2^{2r} - 1$ is the Shannon limit at rate $r$ bits/dim, and $E_s$ is the transmission energy per dimension. We will plot BER versus $\mathrm{SNR_{norm}}$ curves, so that the gap to the Shannon limit can be read off directly. The coding gain over the uncoded conventional $M$-PAM can be evaluated by $9 - \mathrm{SNR_{norm}}$ [in decibels].
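The normalization is a one-liner (our own function name; unit noise variance assumed). By construction, 0 dB on this axis is the Shannon limit at rate $r$, so any operating point reads directly as a gap to capacity.

```python
import math

def snr_norm_db(Es, r):
    """Normalized SNR in dB: SNRnorm = Es / (2**(2*r) - 1).
    Es is the transmission energy per dimension (unit noise variance),
    r the rate in bits/dim; 0 dB is the Shannon limit at rate r."""
    return 10.0 * math.log10(Es / (2.0 ** (2.0 * r) - 1.0))
```

For example, at $r = 1$ bit/dim the Shannon limit is $E_s = 3$, which maps to exactly 0 dB.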

A. Transmission 1 Bit Per Dimension

We choose $k = 3$ for the type-I sigma-mapper since it is nearly optimal. If the single-level coding scheme is employed, we need a binary error-correcting code of rate $1/3$. The code we used is a turbo code. The encoder is shown in Fig. 12(a). The iterative decoding/demapping algorithm (Algorithm 1) has been described in Section IV-A. What we need to specify is the visit to $C$. The normal graph for the decoder is shown in Fig. 12(b). The visit $\mathcal{V}(C, \{U, C\})$ we performed in this correspondence is listed as follows.

Algorithm 3

$$\mathcal{V}\bigl(T^{(0)}, U^{(0)}\bigr) \to \mathcal{V}\bigl(=, U^{(1)}\bigr) \to \mathcal{V}\bigl(\Pi, \tilde{U}^{(1)}\bigr)$$
$$\to \mathcal{V}\bigl(T^{(1)}, \tilde{U}^{(1)}\bigr) \to \mathcal{V}\bigl(\Pi, U^{(1)}\bigr)$$
$$\to \mathcal{V}\bigl(=, \{U, U^{(0)}, U^{(1)}\}\bigr)$$
$$\to \mathcal{V}\bigl(T^{(0)}, \{C^{(0)}, C^{(1)}\}\bigr)$$
$$\to \mathcal{V}\bigl(\Pi, \tilde{U}^{(1)}\bigr) \to \mathcal{V}\bigl(T^{(1)}, C^{(2)}\bigr).$$

In this example, the optimal SISO demapping algorithm is performed based on the trellis shown in Fig. 7. The performance is shown in Fig. 13. Like other turbo-like systems, the code performance improves as the frame length increases. At a BER of $10^{-5}$, this coding/sigma-mapping system with frame length $N = 1\,000\,000$ is about 1.2 dB from the Shannon limit. Compared with the uncoded 1-bit/dim system (2-PAM), this simple coding/sigma-mapping system with frame length $N = 1000$ can offer a coding gain (without sacrificing bandwidth) of 5 dB at a BER of $10^{-5}$.

Fig. 12. A turbo code of rate $1/3$. (a) The encoder. (b) The normal graph for the decoder.

B. Transmission 1.5 Bits Per Dimension

We apply the standard turbo code [42] to a multilevel coding system. In this case, the system has three levels. The length of the binary data stream at each level is set to be 5000, that is, $K^{(0)} = K^{(1)} = K^{(2)} = 5000$. The scalars $\alpha_2$ and $\alpha_1$ are fixed as 1.12 and 1.67, respectively, which are obtained by the simulation-based power-allocation algorithm (power-allocation A).

The iterative decoding/demapping algorithm (Algorithm 2) has been described in Section IV-B. Obviously, for many iterations in the beginning, the extrinsic messages from the high levels are not useful since the corresponding "channels" are too noisy. Though those extrinsic messages are not harmful either (as observed from simulations), we prefer not to perform the visit to the next level unless the turbo algorithm at the current level has reached a (temporary) "fixed point." More precisely, we replace the visit $\mathcal{V}\bigl(C^{(i)}, \{U^{(i)}, C^{(i)}\}\bigr)$ in Algorithm 2 with the following iterative algorithm.

Algorithm 4

• Set $j_c = 1$.
• Repeat while $j_c \le J_c$:
  1) Perform the visit $\mathcal{V}\bigl(C^{(i)}, \{U^{(i)}, C^{(i)}\}\bigr)$.
  2) Perform visits $\mathcal{V}(\Sigma, V^{(i)}) \to \mathcal{V}(\Sigma, Y)$ and estimate the entropy rate of $Y$ by

$$h_{j_c}^{(i)}(Y) = -\frac{1}{N} \sum_{t=1}^{N} \log P^{(e)}_{Y_t}(y_t). \tag{34}$$

  3) If $\bigl|h_{j_c}^{(i)}(Y) - h_{j_c-1}^{(i)}(Y)\bigr| < \epsilon$, set $j_c = J_c + 1$; else set $j_c = j_c + 1$.

The maximum iteration number $J_c > 0$ for the inner iteration (Algorithm 4) described above is predetermined, and the threshold $\epsilon \ge 0$ can be chosen as the one for the outer iteration (Algorithm 2) described in Section IV-B. The entropy rate $h_0^{(i)}(Y)$ is initialized to zero.


Fig. 13. Performance of the single-level coding/sigma-mapping system with rate 1 bit/dim. The maximum iteration number $J = 50$ and the threshold $\epsilon = 10^{-6}$.

Fig. 14. Performance of the three-level coding/sigma-mapping system taking Berrou et al.'s turbo code of length 10 000 as component codes. The coding rate is 1.5 bits/dim.


Fig. 15. A normal graph for the doped code.

Fig. 16. Performance of the four-level coding/sigma-mapping system taking ten Brink's doped code of length 500 000 as component codes. The coding rate is 2 bits/dim. The maximum outer iteration number = 6, the inner iteration number = 100, and the threshold = 10 .

Fig. 17. The signal constellation of the type-II sigma-mapper with $k = 4$ at SNR = 12.52 dB. The parameters $\alpha_i$ are determined by power-allocation B. The corresponding mutual information is 2.0776 bits/dim.

Since $C^{(i)}$ is a turbo code, the visit $\mathcal{V}\bigl(C^{(i)}, \{U^{(i)}, C^{(i)}\}\bigr)$ in Algorithm 4 is similar to Algorithm 3 described in the previous example.

In this example, the suboptimal (Gaussian-approximation) SISO demapping algorithm is utilized. The simulation results are shown in Fig. 14. The MSD algorithm (also called the stripping algorithm [16]) is performed as usual. After 30 iterations at the $i$th level (starting from $i = 0$), the "inter-user interferences" are canceled from the received sequence based on the hard decisions of the coded bits. For the iterative MSD, we set $J = 3$, $J_c = 20$, and $\epsilon = 10^{-6}$. It can be seen that, by introducing random interleavers at the front end of the sigma-mapper, the iterative MSD can be utilized to improve the BER performance. On the other hand, if no interleavers exist between the encoders and the sigma-mapper, the performance cannot be improved much by the iterative MSD.

C. Transmission 2 Bits Per Dimension

In this example, we apply the doped code [43] to a four-level coding system. The length $K^{(i)}$ of the binary data stream $U^{(i)}$ at each level is set to be 250 000. The parameters of the sigma-mapper are determined by power-allocation B.

The iterative decoding/demapping algorithm for this example is similar to Algorithm 2/Algorithm 4. We only need to specify the visit to the doped code $C^{(i)}$ at each level in Algorithm 4. The normal graph of the doped code is shown in Fig. 15, where the vertex $R$ represents the outer repeat code of rate $1/2$ and the vertex $A$ represents the inner "accumulator" with doped outputs; see [43] for details. The visit $\mathcal{V}\bigl(C^{(i)}, \{U^{(i)}, C^{(i)}\}\bigr)$ is specified as follows.

Algorithm 5

$$\mathcal{V}\bigl(A, \tilde{O}\bigr) \to \mathcal{V}\bigl(\Pi, O\bigr) \to \mathcal{V}\bigl(R, \{U^{(i)}, O\}\bigr)$$
$$\to \mathcal{V}\bigl(\Pi, \tilde{O}\bigr) \to \mathcal{V}\bigl(A, C^{(i)}\bigr).$$

In this example, we utilize the optimal SISO demapping algorithm described in Section III-B. The simulation results are shown in Fig. 16. At SNR = 12.52 dB, the BER is below $10^{-5}$, which is slightly better than the equiprobable equispaced 16-PAM capacity limit. It is interesting to note that the resulting signal constellation at SNR = 12.52 dB is neither equispaced nor equiprobable since


$\sigma_{\vec\alpha}(\{0, 1, 0, 1\})$ and $\sigma_{\vec\alpha}(\{1, 0, 0, 0\})$ (symmetrically, $\sigma_{\vec\alpha}(\{0, 1, 1, 1\})$ and $\sigma_{\vec\alpha}(\{1, 0, 1, 0\})$) are almost indistinguishable in the presence of noise, as shown in Fig. 17. Also note that this result is better than that presented by Wachsmann et al. in [8] (as shown in Fig. 16), which is the best simulation result at rate 2 bits/dim known to us in the literature. In [8], a multilevel code with four levels is designed for the 64-QAM constellation. One level is a trellis shaping code that constitutes the source coding layer, while the other three levels constitute the channel coding layer. At a rate of 4 bits per two dimensions, the simulated SNR at BER $= 10^{-5}$ is almost the same as the equiprobable conventional 32-QAM capacity limit. See [8, Fig. 10 and Sec. VIII-B] for details.
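The non-equispaced nature of the superimposed constellation is easy to reproduce. The sketch below is illustrative only: it enumerates the 16 amplitudes of a $k = 4$ sigma-mapper with power-allocation-B amplitudes, where the value of $E_s$ is a hypothetical stand-in obtained by reading SNR = 12.52 dB as $E_s$ in dB with unit noise variance; the paper's exact amplitudes are not reproduced here.

```python
import itertools

# Enumerate the 16 superimposed amplitudes of a k = 4 sigma-mapper with
# power-allocation-B amplitudes. Es is a hypothetical stand-in (see lead-in),
# not the paper's exact operating point.
Es = 10 ** 1.252                       # ~17.9, assumed from SNR = 12.52 dB
k = 4
beta = (1 + Es) ** (1 / k)
alphas = [((beta - 1) * beta ** (k - 1 - i)) ** 0.5 for i in range(k)]
points = sorted(sum(a * (1 - 2 * b) for a, b in zip(alphas, v))
                for v in itertools.product((0, 1), repeat=k))
gaps = [points[i + 1] - points[i] for i in range(len(points) - 1)]
# min(gaps) is a small fraction of max(gaps): some point pairs sit much
# closer than the unit noise standard deviation, so the constellation is
# clearly not equispaced.
```

With these illustrative amplitudes, the closest pair of points is separated by well under a fifth of the noise standard deviation while the widest gap exceeds two, mirroring the near-coincident pairs visible in Fig. 17.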

VI. CONCLUSION

We have distinguished between two coding/shaping schemes: the two-layer scheme and the one-layer scheme. Unlike the two-layer coding/shaping scheme, which has been investigated by many authors since the 1980s, the one-layer coding/shaping scheme has rarely been mentioned before. To show the optimality of the one-layer coding/shaping scheme, we have taken the mutual information as the criterion. The signal mapper is first designed such that its mutual information is as high as possible under certain constraints. Then an outer code is designed to approach the mutual information of the signal mapper. The soundness of such a two-step design is guaranteed by Gallager's lemma.

We have shown that sigma-mapping is nearly optimal by comparing

the corresponding mutual information with the channel capacity. We also provided several rules for choosing the parameters of the whole coding/shaping system. If a single-level code is utilized, we illustrated how to choose coding rates for the "inner code" (sigma-mapper) and the outer code. If a multilevel code is utilized, we proposed designing the sigma-mapper based on power allocation, which is more convenient than designing component codes based on rate allocation.

We have shown that a type-I sigma-mapper can be represented by

an irregular trellis. Therefore, the implementation of the optimal SISO demapping algorithm for type-I sigma-mappers is simpler than for general signal mappers. The suboptimal SISO demapping algorithm using Gaussian approximations was rederived, which can be generalized to any linear signal mapper.

We have generalized the concept of extrinsic messages to the received sequence. The estimated extrinsic messages with respect to the received sequence have been utilized to design stop criteria for iterative decoding/demapping algorithms.

ACKNOWLEDGMENT

The authors are grateful to Prof. Bai, Dr. Leung, Mr. Liu, and Ms. Wu for some useful discussions.

REFERENCES

[1] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., pt. I, vol. 27, pp. 379–423, 1948.

[2] G. D. Forney, Jr. and G. Ungerboeck, "Modulation and coding for linear Gaussian channels," IEEE Trans. Inform. Theory, vol. 44, pp. 2384–2415, Oct. 1998.

[3] G. D. Forney, Jr., R. G. Gallager, G. R. Lang, F. M. Longstaff, and S. U. Qureshi, "Efficient modulation for band-limited channels," IEEE J. Select. Areas Commun., vol. SAC-2, pp. 632–647, Sept. 1984.

[4] G. D. Forney, Jr. and L.-F. Wei, "Multidimensional constellations–Part I: Introduction, figures of merit, and generalized cross constellations," IEEE J. Select. Areas Commun., vol. 7, pp. 877–892, Aug. 1989.

[5] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.

[6] F. R. Kschischang and S. Pasupathy, "On optimal nonuniform signaling for Gaussian channels," IEEE Trans. Inform. Theory, vol. 39, pp. 913–929, May 1993.

[7] G. D. Forney, Jr., M. D. Trott, and S.-Y. Chung, "Sphere-bound-achieving coset codes and multilevel coset codes," IEEE Trans. Inform. Theory, vol. 46, pp. 820–850, May 2000.

[8] U. Wachsmann, R. F. H. Fischer, and J. B. Huber, "Multilevel codes: Theoretical concepts and practical design rules," IEEE Trans. Inform. Theory, vol. 45, pp. 1361–1391, July 1999.

[9] Q. Wang, L. Wei, and R. A. Kennedy, "Iterative Viterbi decoding, trellis shaping, and multilevel structure for high-rate parity-concatenated TCM," IEEE Trans. Commun., vol. 50, pp. 48–55, Jan. 2002.

[10] R. E. Blahut, "Computation of channel capacity and rate-distortion functions," IEEE Trans. Inform. Theory, vol. IT-18, pp. 460–473, July 1972.

[11] S. Arimoto, "An algorithm for computing the capacity of arbitrary discrete memoryless channels," IEEE Trans. Inform. Theory, vol. IT-18, pp. 14–20, Jan. 1972.

[12] N. Varnica, X. Ma, and A. Kavcic, "Capacity of power constrained memoryless AWGN channels with fixed input constellations," in Proc. IEEE Global Telecommunications Conf. (GLOBECOM), Taipei, Taiwan, China, Nov. 2002, pp. 1339–1343.

[13] D. Raphaeli and A. Gurevitz, "Constellation shaping for pragmatic turbo-coded modulation," Electron. Lett., vol. 38, pp. 717–719, July 2002.

[14] F.-W. Sun and H. C. A. van Tilborg, "Approaching capacity by equiprobable signaling on the Gaussian channel," IEEE Trans. Inform. Theory, vol. 39, pp. 1714–1716, Sept. 1993.

[15] C. Fragouli, R. D. Wesel, D. Sommer, and G. P. Fettweis, "Turbo codes with nonuniform constellations," in Proc. IEEE Int. Conf. Communications (ICC 2001), June 2001, pp. 70–73.

[16] L. Duan, B. Rimoldi, and R. Urbanke, "Approaching the AWGN channel capacity without active shaping," in Proc. IEEE Int. Symp. Information Theory, Ulm, Germany, June/July 1997, p. 374.

[17] N. Chayat and S. Shamai (Shitz), "Iterative soft onion peeling for multi-access and broadcast channels," in Proc. 9th IEEE Int. Symp. Personal, Indoor and Mobile Radio Communications, Boston, MA, Sept. 1998.

[18] A. Kavcic, X. Ma, and N. Varnica, "Matched information rate codes for partial response channels," IEEE Trans. Inform. Theory, submitted for publication. Presented in part at the IEEE Int. Symp. Information Theory, Lausanne, Switzerland, June/July 2002.

[19] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.

[20] G. D. Forney, Jr., "Codes on graphs: Normal realizations," IEEE Trans. Inform. Theory, vol. 47, pp. 520–548, Feb. 2001.

[21] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. IT-20, pp. 284–287, Mar. 1974.

[22] X. Ma and A. Kavcic, "Path partitions and forward-only trellis algorithms," IEEE Trans. Inform. Theory, vol. 49, pp. 38–52, Jan. 2003.

[23] F. Brännström, T. M. Aulin, and L. K. Rasmussen, "Iterative detectors for trellis-code multiple-access," IEEE Trans. Commun., vol. 50, pp. 1478–1485, Sept. 2002.

[24] L. Ping, K.-Y. Wu, L.-H. Liu, and W.-K. Leung, "A simple unified approach to nearly optimal multiuser detection and space-time coding," in Proc. IEEE Information Theory Workshop, Bangalore, India, Oct. 2002, pp. 53–56.

[25] S. Benedetto, D. Divsalar, and J. Hagenauer, "Guest editorial–Concatenated coding techniques and iterative decoding: Sailing toward channel capacity," IEEE J. Select. Areas Commun., vol. 16, pp. 137–139, Feb. 1998.

[26] P. H. Siegel, D. Divsalar, E. Eleftheriou, J. Hagenauer, and D. Rowitch, "Guest editorial–The turbo principle: From theory to practice," IEEE J. Select. Areas Commun., vol. 19, pp. 793–799, May 2001.

[27] ——, "Guest editorial–The turbo principle: From theory to practice II," IEEE J. Select. Areas Commun., vol. 19, pp. 1657–1661, Sept. 2001.

[28] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Trans. Inform. Theory, vol. 42, pp. 429–445, Mar. 1996.

[29] R. Y. Shao, S. Lin, and M. P. C. Fossorier, "Two simple stopping criteria for turbo decoding," IEEE Trans. Commun., vol. 47, pp. 1117–1120, Aug. 1999.

[30] Y. Wu, B. D. Woerner, and W. J. Ebel, "A simple stopping criterion for turbo decoding," IEEE Commun. Lett., vol. 4, pp. 258–260, Aug. 2000.

[31] H. Imai and S. Hirakawa, "A new multilevel coding method using error-correcting codes," IEEE Trans. Inform. Theory, vol. IT-23, pp. 371–377, May 1977.

[32] X. Ma and L. Ping, "Power allocations for multilevel coding with sigma-mapping," Electron. Lett., vol. 40, pp. 609–611, May 2004.

[33] A. J. Viterbi, "Very low rate convolutional codes for maximum theoretical performance of spread-spectrum multiple-access channels," IEEE J. Select. Areas Commun., vol. 8, pp. 641–649, May 1990.

[34] G. Caire, G. Taricco, and E. Biglieri, "Bit-interleaved coded modulation," IEEE Trans. Inform. Theory, vol. 44, pp. 927–946, May 1998.


[35] S. Le Goff, "Signal constellations for bit-interleaved coded modulation," IEEE Trans. Inform. Theory, vol. 49, pp. 307–313, Jan. 2003.

[36] A. Banerjee, D. J. Costello, Jr., T. E. Fuja, and P. C. Massey, "Bit-interleaved coded modulation using multiple turbo codes," in Proc. IEEE Int. Symp. Information Theory, Lausanne, Switzerland, June/July 2002, p. 269.

[37] S. ten Brink, J. Speidel, and R.-H. Yan, "Iterative demapping for QPSK modulation," Electron. Lett., vol. 34, pp. 1459–1460, July 1998.

[38] X. Li, A. Chindapol, and J. A. Ritcey, "Bit-interleaved coded modulation with iterative decoding and 8PSK signaling," IEEE Trans. Commun., vol. 50, pp. 1250–1257, Aug. 2002.

[39] Y. Kofman, E. Zehavi, and S. Shamai (Shitz), "Performance analysis of a multilevel coded modulation system," IEEE Trans. Commun., vol. 42, pp. 299–312, Feb./Mar./Apr. 1994.

[40] G. Ungerboeck, "Channel coding with multilevel/phase signals," IEEE Trans. Inform. Theory, vol. IT-28, pp. 55–67, Jan. 1982.

[41] P. R. Chevillat, "N-user trellis coding for a class of multiple-access channels," IEEE Trans. Inform. Theory, vol. IT-27, pp. 114–120, Jan. 1981.

[42] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes," in Proc. IEEE Int. Conf. Communications, Geneva, Switzerland, May 1993, pp. 1064–1070.

[43] S. ten Brink, "Rate one-half code for approaching the Shannon limit by 0.1 dB," Electron. Lett., vol. 36, pp. 1293–1294, July 2000.

[44] H. Schoeneich and P. A. Hoeher, "Adaptive interleave-division multiple access–A potential air interface for 4G bearer services and wireless LANs," in Proc. 1st IEEE and IFIP Int. Conf. Wireless and Optical Communications and Networks (WOCN 2004), Muscat, Oman, June 2004, pp. 179–182.

Geometrical Uniformity of a Class of Space–Time Trellis Codes

Zhiyuan Yan, Member, IEEE, and D. Mihai Ionescu, Senior Member, IEEE

Abstract—The continued relevance of the Euclidean distance in (flat) fading scenarios renders meaning to distance-preserving properties for codes designed for fading channels. Geometrical uniformity can shed more light on the structure of various techniques for jointly encoding across multiple transmit antennas, and can assist in the design of, or systematic search for, better coding schemes. In this correspondence, a family of space–time codes introduced by Ionescu et al. is treated as signal space codes, then proved to be generalized coset codes, and thereby geometrically uniform. It is then shown that geometrical uniformity does remain meaningful in fading channels, for an optimally designed space–time code.

Index Terms—Coset codes, Euclidean distance, geometrical uniformity, space–time codes.

I. INTRODUCTION

It has been observed that good codes exhibit a certain regular structure [25], and some have been shown to have various geometrical properties [3], [5], [9]–[11]. In particular, it is shown in [11] that most of the known, good trellis codes share a property called geometrical uniformity. A code is geometrically uniform if for any two codewords c1 and c2 there exists an isometry that leaves the code invariant while mapping c1 to c2. Geometrical uniformity of a code is a sufficient condition for certain useful properties; for example, distance profiles are transparent to the choice of a reference codeword if a code is geometrically uniform. Further details about geometrically uniform codes can be found in [11].

Manuscript received March 23, 2003; revised February 17, 2004. This work was supported in part by Nokia Corporation, and in part by the National Science Foundation under Grant ITR-0085929 and by Lehigh University.
Z. Yan was with the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. He is now with the Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18015 USA (e-mail: [email protected]).
D. M. Ionescu is with Nokia Research Center, Irving, TX 75039 USA (e-mail: [email protected]).
Communicated by K. A. S. Abdel-Ghaffar, Associate Editor for Coding Theory.
Digital Object Identifier 10.1109/TIT.2004.838107

Lately, various techniques for jointly encoding across multiple

transmit antennas have been proposed for the multiple-input multiple-output (MIMO) channel (see, for example, [1], [2], [6]–[8], [12]–[24], [26]). Geometrical uniformity, among other invariance properties (see, e.g., [19]) of space–time codes, is of interest because not only does it provide knowledge about the algebraic structure of space–time codes, but it also facilitates the performance analysis, as well as the design (perhaps via systematic search) of space–time codes. While the former reason is obvious, the latter needs some justification; this is because the measures of performance for space–time codes seem unrelated to the Euclidean distance between codewords (see, for example, [24]), while geometrical uniformity is usually defined in the context of Euclidean distance (an isometry is defined in a metric space, which is usually the Euclidean space). It was shown in [14, Theorem 5] that the Euclidean distance remains relevant in flat fading, in spite of the multiplicative nature of flat-fading distortions, which apparently questions the relevance of distance-preserving properties. The role of the Euclidean distance in fading channels has also been advocated in [4], [13], [14], [23]. In that respect, features derived from Euclidean-distance characterizations, including geometrical uniformity, remain potentially meaningful for flat-fading channels, and space–time codes in particular. For example, geometrical uniformity can reduce the complexity of systematic searches for good space–time codes. The performance analysis for space–time codes is usually based on pairwise error probabilities over all possible codeword pairs, and hence a systematic search has to check all possible codeword pairs. However, if the space–time code is geometrically uniform, it suffices to consider only the case where one of the codewords in the codeword pair is fixed. An example which illustrates the application of geometrical uniformity, and a more detailed discussion of the relevance of Euclidean distance in fading channels, are given in this correspondence.

Geometrical uniformity for space–time codes has been studied in the

literature (see, for example, [16], [24]). The geometrical uniformity of signal sets and codes in [16], considered in the context of MIMO channels, is not in the sense defined by Forney in [11]. In contrast, the geometrical uniformity considered herein and in [24] is in the sense defined by Forney. In [24], space–time codes were proposed by Tarokh et al. and proved, therein, to be geometrically uniform by definition; however, a proof by definition would be difficult in the case of more general codes. Since many space–time codes can be viewed as signal space codes [11], the approach in [11] can be used to prove or disprove geometrical uniformity of such coding schemes. This correspondence treats the space–time trellis-coded modulation schemes proposed in [14] and [15] as signal space codes, and presents a constructive proof, via the approach in [11], of the geometrically uniform nature of these codes. The terminology used in the sequel follows Forney [11].
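Before turning to the formal development, the definition of geometrical uniformity and the search reduction it enables can be illustrated on a toy example. The sketch below (an illustration only, using a plain 4-PSK signal set under rotations rather than the space–time codes of [14], [15]) brute-force checks the definition and verifies that the distance profile is the same from every reference codeword.

```python
import cmath

# Toy "code": the 4-PSK signal set, viewed as length-1 codewords.
code = [cmath.exp(1j * cmath.pi * k / 2) for k in range(4)]

# Candidate isometries: rotations by multiples of 90 degrees.
isometries = [lambda z, r=cmath.exp(1j * cmath.pi * k / 2): r * z
              for k in range(4)]

def invariant(iso, code, tol=1e-9):
    """True if the isometry maps the code onto itself."""
    return all(any(abs(iso(c1) - c2) < tol for c2 in code) for c1 in code)

def geometrically_uniform(code, isometries, tol=1e-9):
    """For every pair (c1, c2), some code-invariant isometry maps c1 to c2."""
    return all(
        any(abs(iso(c1) - c2) < tol and invariant(iso, code)
            for iso in isometries)
        for c1 in code for c2 in code
    )

def profile(ref, code):
    """Sorted squared Euclidean distances from a reference codeword."""
    return sorted(round(abs(ref - c) ** 2, 9) for c in code if c != ref)

print(geometrically_uniform(code, isometries))  # True
# Distance profiles are transparent to the reference codeword, so a
# pairwise search over |C|^2 pairs collapses to |C| - 1 pairs.
print(all(profile(c, code) == profile(code[0], code) for c in code))  # True
```

Dropping one point of the constellation (e.g., keeping only {1, i, -1}) destroys the property: no rotation then permutes the set while mapping 1 to i, and the same brute-force check returns False.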

II. GEOMETRICALLY UNIFORM CODES

The proofs of geometrical uniformity presented in this correspondence follow Forney's approach [11]. First of all, the members of the

0018-9448/04$20.00 © 2004 IEEE