
Chapter 8

Trellis Codes

In this chapter we discuss the construction of signals via a trellis. That is, signals are constructed by labeling the branches of an infinite trellis with signals from a small set. Because the trellis is of infinite length, this is conceptually different from the signals created in the previous chapter. When the codes generated are linear (the sum of any two sequences is also a valid sequence) the codes are known as convolutional codes. We first discuss convolutional codes, then optimum decoding of convolutional codes, then ways to evaluate the performance of convolutional codes. Finally we discuss the more general trellis codes for QAM and PSK types of modulation.

1. Convolutional Codes

Unlike block codes, convolutional codes are not of fixed length. The encoder instead processes the information bit sequence with a sliding window to produce a channel bit sequence. The window operates on a number of information bits at a time to produce a number of channel bits. For example, the encoder shown below examines three consecutive information bits and produces two channel bits. The encoder then shifts in a new information bit and produces another set of two channel bits based on the new information bit and the previous two information bits. In general the encoder stores M information bits. Based on these bits and the current set of k input bits it produces n channel bits. The memory of the encoder is M. The constraint length is the largest number of consecutive input bits on which any particular output depends. In the above example the outputs depend on a maximum of 3 consecutive input bits. The rate is k/n.

The operation to produce the channel bits is a linear combination of the information bits in the encoder. Because of this linearity, each output of the encoder is a convolution of the input information stream with some impulse response of the encoder, and hence the name convolutional codes.

Example (K=3, M=2, rate 1/2 code). In this example, the input to the encoder is the sequence of information symbols $\{u_j : j \in \{\ldots, -2, -1, 0, 1, 2, 3, \ldots\}\}$. The output of the top part of the encoder is $\{c_j^{(0)}\}$ and the output of the bottom part of the encoder is $\{c_j^{(1)}\}$. The relation between the input and the output $c^{(0)}$ is (all additions modulo 2)

$$c_j^{(0)} = u_j \oplus u_{j-1} \oplus u_{j-2} = \sum_{l=0}^{M} u_{j-l}\, g_l^{(0)}$$

where $M = 2$ and $g_0^{(0)} = 1$, $g_1^{(0)} = 1$ and $g_2^{(0)} = 1$. Similarly the relation between the input and the output $c^{(1)}$ is

$$c_j^{(1)} = u_j \oplus u_{j-2} = \sum_{l=0}^{M} u_{j-l}\, g_l^{(1)}$$

Figure 8.1: Convolutional Encoder (input $u_j$; outputs $c_j^{(0)}$ and $c_j^{(1)}$)

where $g_0^{(1)} = 1$, $g_1^{(1)} = 0$ and $g_2^{(1)} = 1$. The above relations are convolutions of the input with the vectors $g^{(0)}$ and $g^{(1)}$, known as the generators of the code. From the above equations it is easy to check that the sum of any two codewords generated by two information sequences corresponds to the codeword generated from the sum of the two information sequences. Thus the code is linear. Because of this we can assume in our analysis, without loss of generality, that the all zero information sequence (and codeword) is the transmitted sequence.
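To make the sliding-window computation concrete, here is a minimal Python sketch of this encoder (the function name and interface are ours, not from the text; the default generators (1,1,1) and (1,0,1) are those of Figure 8.1):

```python
def conv_encode(u, g0=(1, 1, 1), g1=(1, 0, 1)):
    """K=3, M=2, rate 1/2 encoder: c_j = sum_l u_{j-l} g_l (mod 2)."""
    M = len(g0) - 1
    out = []
    for j in range(len(u) + M):      # M extra steps flush the encoder back to state 0
        window = [u[j - l] if 0 <= j - l < len(u) else 0 for l in range(M + 1)]
        c0 = sum(w * g for w, g in zip(window, g0)) % 2   # top adder:    c_j^(0)
        c1 = sum(w * g for w, g in zip(window, g1)) % 2   # bottom adder: c_j^(1)
        out += [c0, c1]
    return out

# A single 1 produces the impulse response 11 10 11 (the two generators interleaved):
print(conv_encode([1]))   # [1, 1, 1, 0, 1, 1]
```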

The operation of the encoder can be described completely by way of a state transition diagram. The state transition diagram is a directed graph with a node for each possible encoder content and transitions between nodes corresponding to the results of different input bits to the encoder. The transitions are labeled with the output bits of the code. This is shown in Figure 8.2 for the previous example.

2. Maximum Likelihood Sequence Detection of States of a Markov Chain

Consider a finite state Markov chain. Let $x_m$ be the sequence of random variables representing the state at time $m$. Let $x_0$ be the initial state of the process, with $p(x_0) = 1$. Later on we will denote the states by the integers $1, 2, \ldots, N$. Since this is a Markov process we have that

$$p(x_{m+1} \mid x_m, x_{m-1}, \ldots, x_1, x_0) = p(x_{m+1} \mid x_m).$$

Let $w_m = (x_{m+1}, x_m)$ be the state transition at time $m$. There is a one-to-one correspondence between state sequences and transition sequences. In addition, if we know the initial state and the final state then there is also a one-to-one correspondence between the state sequence and the sequence of input symbols $u_0, \ldots, u_N$. By some mechanism (e.g. a noisy channel) a noisy version $\{z_m\}$ of the state transition sequence is observed. Based on this noisy version of $\{w_m\}$ we wish to estimate the state sequence $\{x_m\}$ or the transition sequence $\{w_m\}$. Since $\{w_m\}$ and $\{x_m\}$ contain the same information we have that

$$p(z \mid x) = p(z \mid w)$$

where $z = (z_0, z_1, \ldots, z_{M-1})$, $x = (x_0, x_1, \ldots, x_M)$, and $w = (w_0, \ldots, w_{M-1})$. If the channel is memoryless then we have that

$$p(z \mid w) = \prod_{m=0}^{M-1} p(z_m \mid w_m).$$

So given an observation $z$, find the state sequence $x$ for which the a posteriori probability $p(x \mid z)$ is largest. This minimizes the probability that we choose the wrong sequence.

Thus the optimum (minimum sequence error probability) decoder chooses the $x$ which maximizes $p(x \mid z)$, i.e.

$$\hat{x} = \arg\max_x p(x \mid z) = \arg\max_x p(x, z) = \arg\min_x \left[-\log p(x, z)\right] = \arg\min_x \left[-\log p(z \mid x)\, p(x)\right].$$

Figure 8.2: State Transition Diagram of Encoder

Using the memoryless property of the channel we obtain

$$p(z \mid x) = \prod_{m=0}^{M-1} p(z_m \mid w_m)$$

and using the Markov property of the state sequence

$$p(x) = \prod_{m=0}^{M-1} p(x_{m+1} \mid x_m)\, p(x_0).$$

Define $\lambda(w_m)$ as follows:

$$\lambda(w_m) = -\ln p(x_{m+1} \mid x_m) - \ln p(z_m \mid w_m).$$

Then

$$\hat{x} = \arg\min_x \sum_{m=0}^{M-1} \lambda(w_m).$$

This problem formulation leads to a recursive solution. The recursive solution is called the Viterbi algorithm by communication engineers and is a form of dynamic programming as studied by control engineers; they are really the same thing.

VITERBI ALGORITHM (DYNAMIC PROGRAMMING)


Let $\Gamma(x_m)$ be the length (optimization criterion) of the shortest (optimum) path to state $x_m$ at time $m$. Let $\hat{x}(x_m)$ be the shortest path to state $x_m$ at time $m$. Let $\hat\Gamma(x_{m+1}, x_m)$ be the length of the path to state $x_{m+1}$ at time $m+1$ that goes through state $x_m$ at time $m$.

Then the algorithm works as follows.

Storage: $m$, the time index; $\hat{x}(x_m)$, $x_m \in \{1, 2, \ldots, N\}$; $\Gamma(x_m)$, $x_m \in \{1, 2, \ldots, N\}$.

Initialization: $m = 0$; $\hat{x}(x_0) = x_0$; $\hat{x}(x)$ arbitrary for $x \neq x_0$; $\Gamma(x_0) = 0$; $\Gamma(x) = \infty$ for $x \neq x_0$.

Recursion: for each $x_{m+1}$,

$$\hat\Gamma(x_{m+1}, x_m) = \Gamma(x_m) + \lambda(w_m)$$
$$\Gamma(x_{m+1}) = \min_{x_m} \hat\Gamma(x_{m+1}, x_m).$$

Let $\hat{x}_m(x_{m+1}) = \arg\min_{x_m} \hat\Gamma(x_{m+1}, x_m)$; then $\hat{x}(x_{m+1}) = (\hat{x}(\hat{x}_m(x_{m+1})), x_{m+1})$.

Justification: Basically we are interested in finding the shortest length path through the trellis. At time $m$ we find the shortest length paths to each of the possible states at time $m$ by computing all possible ways of getting to state $x_m = u$ from a state at time $m-1$. If the shortest path (denoted by $\hat{x}(u)$) to get to $x_m = u$ at time $m$ goes through state $x_{m-1} = v$ at time $m-1$ (i.e. $\hat{x}(u) = (\hat{x}(v), u)$), then the corresponding path $\hat{x}(v)$ to state $x_{m-1} = v$ must be the shortest path to state $v$ at time $m-1$: if there were a shorter path, say $\tilde{x}(v)$, to state $v$ at time $m-1$, then the path $(\tilde{x}(v), u)$ to state $u$ at time $m$ that used this shorter path would be shorter than what we assumed was the shortest path. Stated another way, if the shortest way of getting to state $u$ at time $m$ is by going through state $v$ at time $m-1$, then the path used to get to state $v$ at time $m-1$ must be the shortest of all paths to state $v$ at time $m-1$.
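As a concrete illustration, here is a short Python sketch of the recursion for the four-state example (data structures and names are ours; the branch metric is left pluggable, anticipating the Hamming-distance result for the BSC below):

```python
def viterbi(z, n_states, next_state, output, metric):
    """Shortest path through the trellis.

    z:          received symbols, one per trellis stage
    next_state: next_state[s][b] for input bit b in state s
    output:     output[s][b], the channel bits labeling that branch
    metric:     metric(z_m, output) = branch metric lambda(w_m)
    """
    INF = float("inf")
    gamma = [0.0] + [INF] * (n_states - 1)   # Gamma(x_0) = 0, infinity elsewhere
    paths = [[] for _ in range(n_states)]    # x_hat(x_m): surviving input bits
    for zm in z:
        new_gamma = [INF] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if gamma[s] == INF:
                continue                      # state not yet reachable
            for b in (0, 1):
                t = next_state[s][b]
                g = gamma[s] + metric(zm, output[s][b])  # Gamma_hat(x_{m+1}, x_m)
                if g < new_gamma[t]:                     # min over predecessors
                    new_gamma[t] = g
                    new_paths[t] = paths[s] + [b]
        gamma, paths = new_gamma, new_paths
    best = min(range(n_states), key=lambda s: gamma[s])
    return paths[best]

# Trellis of the K=3 code above; state = (u_{j-1}, u_{j-2}) as an integer.
next_state = [[0, 2], [0, 2], [1, 3], [1, 3]]
output = [[(0, 0), (1, 1)], [(1, 1), (0, 0)],
          [(1, 0), (0, 1)], [(0, 1), (1, 0)]]
hamming = lambda zm, c: sum(a != b for a, b in zip(zm, c))
print(viterbi([(1, 1), (1, 0), (1, 1)], 4, next_state, output, hamming))
# -> [1, 0, 0]: the input bit 1 followed by the two flush zeros
```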

We identify a state transition with the pair of bits transmitted. The received pair of decision statistics is our noisy information about the transition. Thus $p(z \mid w)$ in this case is just the transition probability from the input of the channel to the output of the channel. This is because knowing the state transition determines the channel input.

First consider the binary symmetric channel.

[Binary symmetric channel: inputs 0 and 1 are received correctly with probability $1-p$ and flipped with probability $p$.]

$$p(z \mid w) = p^{d_H(z,c)} (1-p)^{N - d_H(z,c)} = \left(\frac{p}{1-p}\right)^{d_H(z,c)} (1-p)^N$$

$$\ln p(z \mid w) = d_H(z,c)\, \ln\left(\frac{p}{1-p}\right) + N \ln(1-p)$$

Since $\ln\frac{p}{1-p} < 0$ for $p < 1/2$, minimizing the metric $-\ln p(z \mid w)$ is the same as choosing the sequence with the closest Hamming distance.


Example 2

Consider an additive white Gaussian noise channel,

$$r_i = c_i + n_i.$$

The noise $n_i$ is Gaussian with mean 0 and variance $N_0/2$. The possible inputs to the channel are a finite set of real numbers (e.g. $u_i \in \{-\sqrt{E}, +\sqrt{E}\}$). These are obtained by the simple mapping $u = (-1)^{c_j}\sqrt{E}$. The transition probability is

$$p(z \mid u) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\sigma} \exp\left\{-\frac{1}{2\sigma^2}(z_i - u_i)^2\right\} = \left(\frac{1}{\sqrt{2\pi}\sigma}\right)^{N} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(z_i - u_i)^2\right\} = \left(\frac{1}{\sqrt{2\pi}\sigma}\right)^{N} \exp\left\{-\frac{1}{2\sigma^2}\, d_E^2(z, u)\right\}$$

where $d_E^2(z, u) = \sum_{i=1}^{N} (z_i - u_i)^2$ is the squared Euclidean distance between the input and the output of the channel. Thus finding $u$ to maximize $p(z \mid u)$ is equivalent to finding $u$ to minimize $d_E^2(z, u)$.

Thus in these two cases we can equivalently use a distance function (between what is received and what would have been transmitted along a given path) as the function we minimize. This should be obvious from the case of block codes.

3. Weight Enumerator for Convolutional Codes

In this section we show how to determine the weight enumerator polynomials for convolutional codes. The weight enumerator polynomial is a method for counting the number of codewords of a given Hamming weight with a certain number of input ones and a certain length. It will be useful when error probability bounds are discussed.

Consider the earlier example with four states. We would like to determine the number of paths that start in the zero state (say), diverge from the zero state for some time, and then remerge with the zero state, such that there are a certain number of input ones, a certain number of output ones, and a certain length. To do this let us split the zero state into a beginning state and an ending state. In addition we label each branch with three variables $(x, y, z)$. The power on $x$ is the number of input ones; the power on $y$ is the number of output ones; the power on $z$ is the length of that branch (namely one). To get the parameters for two consecutive branches we multiply these variables.

Let $T_{10}$ represent the number of paths starting in the all zero state and ending in state 10. This includes paths that go through state 10 any number of times. Similarly let $T_{11}$ be the number of paths starting from the 00 state and ending in state 11. Finally let $T_{01}$ be the number of paths starting from the 00 state and ending in state 01. Then we can write the following equations:

$$T_{10} = x y^2 z + x z\, T_{01}$$
$$T_{11} = x y z\, T_{10} + x y z\, T_{11}$$
$$T_{01} = y z\, T_{10} + y z\, T_{11}$$

Figure 8.3: State transition diagram of encoder, with each branch labeled $x^i y^j z$

From these equations we can solve for $T_{01}$. Then the number of paths starting at 00 and ending in 00 is $A(x,y,z) = y^2 z\, T_{01}$. In this case the solution is

$$A(x,y,z) = \frac{x y^5 z^3}{1 - x y z (1 + z)} = x y^5 z^3 + x^2 y^6 z^4 + \cdots$$

Thus there is one path through the trellis with one input one, 5 output ones, and length 3. There is one path diverging and remerging with 2 input ones, length 4, and 6 output ones. The minimum (or free) distance of a convolutional code is the minimum number of output ones on any path that diverges from the all zero state and then remerges. This code has $d_f$ of 5.
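The three linear equations can be solved mechanically. A small sympy sketch (assuming sympy is available) reproduces $A(x,y,z)$ and its expansion:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
T10, T11, T01 = sp.symbols('T10 T11 T01')

# The three path-counting equations from the split state diagram.
sol = sp.solve([sp.Eq(T10, x*y**2*z + x*z*T01),
                sp.Eq(T11, x*y*z*T10 + x*y*z*T11),
                sp.Eq(T01, y*z*T10 + y*z*T11)],
               [T10, T11, T01])
A = sp.cancel(y**2 * z * sol[T01])
print(A)                                         # x*y**5*z**3/(1 - x*y*z*(1+z)), up to rearrangement
print(sp.series(A.subs({x: 1, z: 1}), y, 0, 9))  # y**5 + 2*y**6 + 4*y**7 + 8*y**8 + ...
```

Setting $x = z = 1$ gives $A(1,y,1) = y^5/(1-2y)$, i.e. $A_l = 2^{l-5}$, the coefficient sequence used in the error bounds of the next section.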

To calculate the $A_l$ used in the first event error probability bound we calculate $A(1, y, 1)$. The coefficient of $y^l$ is $A_l$.

In order to determine the bit error probability, the next section shows that the polynomial given by

$$w(y) = \left.\frac{\partial A(x,y,z)}{\partial x}\right|_{x=1,\, z=1}$$

is needed. For this example the polynomial is

$$w(y) = \frac{y^5}{(1 - 2y)^2} = y^5 + 4y^6 + \cdots$$

The encoder shown in Figure 8.4 is a very common encoder for a variety of applications. This code has $d_f$ of 10. The weight enumerator polynomial is given as follows. The first event error probability is determined by $a(y)/b(y)$. The bit error probability is determined by $c(y)/b^2(y)$.

Figure 8.4: Encoder for K=7, M=6, rate 1/2 code

$$\begin{aligned}
a(y) ={}& 11y^{10} + 6y^{12} + 25y^{14} + y^{16} + 93y^{18} + 15y^{20} + 176y^{22} + 76y^{24} + 243y^{26} + 417y^{28} + 228y^{30} + 1156y^{32}\\
&+ 49y^{34} + 2795y^{36} + 611y^{38} + 5841y^{40} + 1094y^{42} + 9575y^{44} + 1097y^{46} + 11900y^{48} + 678y^{50} + 11218y^{52}\\
&+ 235y^{54} + 8068y^{56} + 18y^{58} + 4429y^{60} + 20y^{62} + 1838y^{64} + 8y^{66} + 562y^{68} + y^{70} + 120y^{72} + 16y^{76} + y^{80}
\end{aligned}$$

$$\begin{aligned}
b(y) ={}& 1 + 4y^{2} + 6y^{4} + 30y^{6} + 40y^{8} + 85y^{10} + 81y^{12} + 345y^{14}\\
&+ 262y^{16} + 844y^{18} + 403y^{20} + 1601y^{22} + 267y^{24}\\
&+ 2509y^{26} + 389y^{28} + 3064y^{30} + 2751y^{32} + 2807y^{34}\\
&+ 8344y^{36} + 1960y^{38} + 16133y^{40} + 1184y^{42} + 21746y^{44}\\
&+ 782y^{46} + 21403y^{48} + 561y^{50} + 15763y^{52} + 331y^{54}\\
&+ 8766y^{56} + 131y^{58} + 3662y^{60} + 30y^{62} + 1123y^{64}\\
&+ 3y^{66} + 240y^{68} + 32y^{72} + 2y^{76}
\end{aligned}$$

$$\begin{aligned}
c(y) ={}& 36y^{10} + 77y^{12} + 140y^{14} + 813y^{16} + 269y^{18} + 4414y^{20} + 321y^{22} + 14884y^{24} + 5273y^{26} + 40509y^{28} + 39344y^{30}\\
&+ 83884y^{32} + 177469y^{34} + 111029y^{36} + 608702y^{38} + 29527y^{40} + 1820723y^{42} + 817086y^{44} + 4951082y^{46}\\
&+ 3436675y^{48} + 12279246y^{50} + 10300306y^{52} + 27735007y^{54} + 25648025y^{56} + 56773811y^{58} + 55659125y^{60}\\
&+ 104376199y^{62} + 106695512y^{64} + 170819460y^{66} + 180836818y^{68} + 247565043y^{70} + 270555690y^{72}\\
&+ 317381295y^{74} + 356994415y^{76} + 360595622y^{78} + 415401723y^{80} + 364292177y^{82} + 426295756y^{84}\\
&+ 328382391y^{86} + 385686727y^{88} + 264812337y^{90} + 307287819y^{92} + 191225378y^{94} + 215144035y^{96}\\
&+ 123515898y^{98} + 131946573y^{100} + 71124860y^{102} + 70570661y^{104} + 36310569y^{106} + 32722089y^{108}\\
&+ 16308558y^{110} + 13052172y^{112} + 6380604y^{114} + 4433332y^{116} + 2147565y^{118} + 1265046y^{120} + 612040y^{122}\\
&+ 297721y^{124} + 144665y^{126} + 56305y^{128} + 27569y^{130} + 8232y^{132} + 4066y^{134} + 874y^{136}\\
&+ 435y^{138} + 60y^{140} + 30y^{142} + 2y^{144} + y^{146}
\end{aligned}$$

The weight enumerator polynomial for determining the bit error probability is given by

$$\frac{c(y)}{b^2(y)} = 36y^{10} + 211y^{12} + 1404y^{14} + 11633y^{16} + 77433y^{18} + 502690y^{20} + 3322763y^{22} + 21292910y^{24} + 134365911y^{26} + 843425871y^{28} + \cdots$$

4. Error Bounds for Convolutional Codes

We are interested in determining the probability that the decoder makes an error. We will define several types of errors. Without loss of generality we will assume that the information sequence is the all zeros sequence, so the transmitted codeword is the all zeros codeword. Furthermore we will assume the trellis starts at time unit 0. Normally a code is truncated (forced back to the all zero state) after a large number of code symbols have been transmitted, but we don't require this.

1. First Event Error

A first event error is said to occur at time $m$ if the all zeros path (correct path) is eliminated for the first time at time $m$; that is, if the path with the shortest distance to the 0 state at time $m$ is not the all zeros path and this is the first time that the all zero path has been eliminated. At time $m$ an incorrect path will be chosen over the correct path if $P[\text{incorrect path} \mid \text{received sequence}]$ is greater than $P[\text{correct path} \mid \text{received sequence}]$. If the incorrect path has (output) weight $d$ then the probability that the incorrect path is more likely than the all zeros path is denoted $P_2(d)$. This is easily calculated for most channels since it is just the error probability of a repetition code of length $d$. For an additive white Gaussian noise channel it is given by

$$P_2(d) = Q\left(\sqrt{\frac{2Ed}{N_0}}\right).$$

For a binary symmetric channel with crossover probability $p$ it is given by

$$P_2(d) = \begin{cases} \displaystyle\sum_{n=(d+1)/2}^{d} \binom{d}{n} p^n (1-p)^{d-n} & d \text{ odd} \\[2ex] \displaystyle\sum_{n=d/2+1}^{d} \binom{d}{n} p^n (1-p)^{d-n} + \frac{1}{2}\binom{d}{d/2} p^{d/2} (1-p)^{d/2} & d \text{ even.} \end{cases}$$

The first event error probability at time $m$, $P_{f,m}(E)$, can then be bounded (using the union bound) as

$$P_{f,m}(E) \le \sum_{l=0}^{\infty} A_l\, P_2(l)$$

where $A_l$ is the number of paths through the trellis with output weight $l$. This is a union type bound. However, it is also an upper bound since at any finite time $m$ there will only be a finite number of incorrect paths that merge with the correct path at time $m$. We have included in the upper bound paths of all lengths, as if the trellis started at $t = -\infty$. This makes the bound independent of the time index $m$. (We will show later how to calculate $A_l$ for all $l$ via a generating function.)

Since each term in the infinite sum is nonnegative, the sum is either a finite positive number or $\infty$. For example, the standard code of constraint length 3 has $A_l = 2^{l-5}$, so unless the pairwise error probability decreases faster than $2^{-l}$ the above bound will be infinite. The pairwise error probability will decrease fast enough for reasonably large signal-to-noise ratio or reasonably small crossover probability. In general the sequence $\{A_l\}$ may have a periodic component that is zero but otherwise is a positive increasing sequence. $P_2(l)$ for reasonable channels is a decreasing sequence. If the channel is fairly noisy then the above upper bound on first event error probability may


converge to something larger than 1, even $\infty$. In this case clearly 1 is an upper bound on any probability we are interested in, so the above bound can be "improved" to

$$P_{f,m}(E) \le \min\left\{1,\ \sum_{l=0}^{\infty} A_l\, P_2(l)\right\}.$$

For example, the well known constraint length 7, rate 1/2 convolutional code has coefficients that grow no faster than $(2.3876225)^l$, so that provided $P_2(l)$ decreases faster than $(2.3876225)^{-l}$ the bound above will converge. Since $P_2(l) \le D^l$ where (for hard decisions) $D = 2\sqrt{p(1-p)}$, the above bound converges for $p < 0.046$. For soft decisions (and additive white Gaussian noise) $D = e^{-E/N_0}$, and thus the bound converges provided $E/N_0 > -0.6$ dB.

2. Bit Error Probability

Below we find an upper bound on the error probability for binary convolutional codes. The generalization to nonbinary codes is straightforward.

In order to calculate the bit error probability in decoding between time $m$ and $m+1$ we need to consider all paths through the trellis that are not in the all zeros state at either time unit $m$ or $m+1$. We also need to realize that not all incorrect paths will cause an error. First consider a rate $1/n$ code (i.e. $k = 1$), so each transition from one state to the next is determined by a single bit. To compute an upper bound on the bit error probability we will do a union bound on all paths that cause a bit error. We assume that the trellis started at $t = -\infty$. We do this in two steps. First we look at each path diverging and then remerging to the all zero state. This is called an error event. (An error event can only diverge once from the all zero state.) Then we sum over all possible starting times (times when the path diverges from the all zero state) for each of these error events. So take a particular error event of length $l$ corresponding to a state sequence with $i$ input ones and $j$ output ones, and let $A_{i,j,l}$ be the number of such paths. If the error event started at time unit $m$ then that clearly would cause an error, since the input bit must be one upon starting an error event (diverging from the all zero state). However, if the event ended at time unit $m+1$ in the all zero state then there would not be a bit error made, since remerging to the all zero state corresponds to an input bit of 0. Of the $l$ phases that overlap with the transition from $m$ to $m+1$ there are exactly $i$ that will cause a bit error. So for each error event we need to weight the probability by the number of input ones on that path. Thus the bit error probability (for $k = 1$) can be upper bounded by

$$P_b \le \sum_{i,j,l} i\, A_{i,j,l}\, P_2(j)$$

where $P_2(j)$ is the probability of error between two codewords that differ in $j$ positions. As before, this bound is independent of the time index $m$ since we have included all paths as if the trellis started at $t = -\infty$ and goes on to $t = +\infty$.

If we define the weight enumerator polynomial for a convolutional code as

$$A(x, y, z) = \sum_{i,j,l} A_{i,j,l}\, x^i y^j z^l$$

and upper bound $P_2(j)$ using the Chernoff or Bhattacharyya bound by $D^j$, then the upper bound on first event error probability is just

$$P_f(E) \le A(x,y,z)\Big|_{x=1,\ y=D,\ z=1}.$$

Similarly the bit error probability can be further upper bounded by

$$P_b \le \left.\frac{\partial A(x,y,z)}{\partial x}\right|_{x=1,\ y=D,\ z=1}.$$

As mentioned previously, the above bounds may be larger than one (for small signal-to-noise ratio or high crossover probability). This will happen for a larger range of parameters when we use the generating function with the Bhattacharyya bound as opposed to just the union bound. There is a way for certain codes to use just the union bound for the first, say, $L$ terms and the Bhattacharyya bound for the remaining terms, to get a tighter bound than the bound based on the generating function but without the infinite computation required for just the union bound. (See the problems.) The above bounds are only finite for a certain range of the parameter $D$, depending on the specific code. However, for practical codes and reasonable signal-to-noise ratios or crossover probabilities the above bounds are finite. (See the problems.)

Now consider a rate $k/n$ convolutional code. The trellis for such a code has $2^k$ branches emerging from each state. We will consider the bit error probability for the $r$-th bit in each $k$ bit input sequence. Let $A_{i_1,\ldots,i_k,j,l}$ be the number of paths through the trellis with $j$ output ones, length $l$, and $i_r$ input ones in the $r$-th input bit ($1 \le r \le k$) of the sequence of $k$ bit inputs. Then clearly

$$A_{i,j,l} = \sum_{i_1,\ldots,i_k :\, \sum_{r=1}^{k} i_r = i} A_{i_1,\ldots,i_k,j,l}.$$

The bit error probability for the $r$-th bit is then bounded by

$$P_{b,r} \le \sum_{i_1,\ldots,i_k,j,l} i_r\, A_{i_1,\ldots,i_k,j,l}\, P_2(j).$$

The average bit error probability is

$$P_b = \frac{1}{k}\sum_{r=1}^{k} P_{b,r} \le \frac{1}{k}\sum_{r=1}^{k}\ \sum_{i_1,\ldots,i_k,j,l} i_r\, A_{i_1,\ldots,i_k,j,l}\, P_2(j) = \frac{1}{k}\sum_{j,l} P_2(j) \sum_{i_1,\ldots,i_k} A_{i_1,\ldots,i_k,j,l} \sum_{r=1}^{k} i_r.$$

Now consider the last two sums:

$$\sum_{i_1,\ldots,i_k} \left(\sum_{r=1}^{k} i_r\right) A_{i_1,\ldots,i_k,j,l} = \sum_i\ \sum_{i_1,\ldots,i_k :\, \sum i_r = i} \left(\sum_{r=1}^{k} i_r\right) A_{i_1,\ldots,i_k,j,l} = \sum_i\ \sum_{i_1,\ldots,i_k :\, \sum i_r = i} i\, A_{i_1,\ldots,i_k,j,l} = \sum_i i \sum_{i_1,\ldots,i_k :\, \sum i_r = i} A_{i_1,\ldots,i_k,j,l} = \sum_i i\, A_{i,j,l}.$$

Thus

$$P_b \le \frac{1}{k}\sum_{i,j,l} i\, A_{i,j,l}\, P_2(j).$$

We can write this as

$$P_b \le \frac{1}{k}\sum_{j} w_j\, P_2(j)$$

where

$$w_j = \sum_{i,l} i\, A_{i,j,l}.$$

We can upper bound the bit error probability by

$$P_b \le \sum_j w_j\, P_2(j) \le \sum_j w_j\, D^j = w(D).$$

The first bound is the union bound. It is impossible to exactly evaluate this bound because there are an infinite number of terms in the summation. Dropping all but the first $N$ terms gives an approximation; it may no longer be an upper bound though.

Figure 8.5: Error probability of constraint length 3 convolutional code on an additive white Gaussian noise channel with soft decision decoding (upper bound, simulation and lower bound)

If the weight enumerator is known we can get arbitrarily close to the union bound and still get a bound, as follows.

$$P_b \le \sum_j w_j\, P_2(j) = \sum_{j=d_f}^{N} w_j\, P_2(j) + \sum_{j=N+1}^{\infty} w_j\, P_2(j) \le \sum_{j=d_f}^{N} w_j\, P_2(j) + \sum_{j=N+1}^{\infty} w_j\, D^j = \sum_{j=d_f}^{N} w_j \left[P_2(j) - D^j\right] + \sum_{j=d_f}^{\infty} w_j\, D^j = \sum_{j=d_f}^{N} w_j \left[P_2(j) - D^j\right] + w(D).$$

The second term is the union-Bhattacharyya (U-B) bound. The first term is clearly less than zero (since $P_2(j) \le D^j$), so we get something that is tighter than the U-B bound. By choosing $N$ sufficiently large we can sometimes get significant improvements over the U-B bound. Below we show the error probability bounds and simulation for the constraint length 3 (memory 2) convolutional code. Note that the upper bound is fairly tight when the bit error probability is less than 0.001.
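For the constraint length 3 example, $w(y) = y^5/(1-2y)^2$ gives $w_j = (j-4)\,2^{j-5}$ for $j \ge 5$, so the tightened bound is easy to evaluate. A sketch using the helpers above (for the rate 1/2 code we take the symbol energy to be $E = E_b/2$, so $D = e^{-E_b/2N_0}$; valid when $2D < 1$):

```python
def w_coeff(j):
    """Coefficient of y^j in w(y) = y^5 / (1 - 2y)^2, i.e. (j-4) 2^(j-5)."""
    return (j - 4) * 2**(j - 5) if j >= 5 else 0

def pb_bound(EbN0, N=60, d_f=5):
    """Truncated union bound plus Bhattacharyya tail for the K=3, rate 1/2 code."""
    D = math.exp(-0.5 * EbN0)                 # Bhattacharyya parameter with E = Eb/2
    wD = D**5 / (1 - 2 * D)**2                # w(D): the plain U-B bound
    head = sum(w_coeff(j) * (P2_awgn(j, 0.5 * EbN0) - D**j)
               for j in range(d_f, N + 1))    # negative correction terms
    return min(1.0, head + wD)

print(pb_bound(10**(5 / 10)))                 # Eb/N0 = 5 dB, passed as a linear value
```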

Example (K=7, M=6, rate 1/2 code). This code is used in many commercial systems, including the IS-95 standard for digital cellular. It is also a NASA standard code.

Figure 8.6: Error probability of constraint length 4 convolutional code on an additive white Gaussian noise channel with soft decision decoding (upper bound, simulation)

Figure 8.7: Error probability of constraint length 7 rate 1/2 convolutional code on an additive white Gaussian noise channel (hard and soft decisions)

Figure 8.8: Error probability of constraint length 7 rate 1/2 convolutional code on an additive white Gaussian noise channel with soft decision decoding (upper bound, simulation and uncoded BPSK)


Figure 8.9: Error probability of constraint length 9 convolutional code on an additive white Gaussian noise channel (hard and soft decisions)

Rate 1/2 maximum free distance codes:

    Constraint length    Generators (octal)    d_free
           3                  5     7             5
           4                 15    17             6
           5                 23    35             7
           6                 53    75             8
           7                133   171            10
           8                247   371            10
           9                561   753            12
          10               1167  1545            12

Rate 1/3 maximum free distance codes:

    Constraint length    Generators (octal)       d_free
           3                 5     7     7            8
           4                13    15    17           10
           5                25    33    37           12
           6                47    53    75           13
           7               133   145   175           15
           8               225   331   367           16
           9               557   663   711           18
          10              1117  1365  1633           20

Note that low rate convolutional codes do not perform any better than a rate 1/2 convolutional code concatenated with a repetition code. There are better approaches to achieving high distance at low rates. This usually involves concatenating a convolutional code with an orthogonal code as described below (to be added).

5. Dual-k Convolutional Codes

Consider using a code with symbols in the alphabet $\{0, 1, \ldots, M-1\}$. The code symbols are mapped to signals from an orthogonal signal set. The performance of the receiver depends on the Euclidean distance between signals. For orthogonal signal sets the Euclidean distance between signals corresponds to the Hamming distance between code symbols. Thus designing a code with maximum Hamming distance will yield a coded modulation with maximum Euclidean distance when the signals are from an orthogonal signal set.

We will treat the data and channel symbols as elements of GF$(2^k)$ (the finite field with $2^k$ elements). The code rate will be denoted $1/v$, $v = 2, 3, \ldots, 2^k - 1$. The encoder is shown below.

[Encoder: information symbol $I \in$ GF$(2^k)$ and state $S \in$ GF$(2^k)$; the combination $g_0(x) I + g_1(x) S$ is sent to the channel.]

As can be seen from the structure of the code, dual-k codes are linear codes. For every information symbol $I$ and encoder state $S$ the output of the encoder (input to the discrete time channel) will be $g_0(x) I + g_1(x) S$ where

$$g_0(x) = g_{0,0} + g_{0,1} x + \cdots + g_{0,v-1} x^{v-1}$$
$$g_1(x) = g_{1,0} + g_{1,1} x + \cdots + g_{1,v-1} x^{v-1}.$$

That is, for every encoder input symbol there are $v$ outputs (namely $g_{0,0} I + g_{1,0} S, \ldots, g_{0,v-1} I + g_{1,v-1} S$). The encoder output can be expressed in polynomial form as

$$c(x) = \sum_{i=0}^{v-1} \left(g_{0,i}\, I + g_{1,i}\, S\right) x^{i} = \sum_{i=0}^{v-1} g_{0,i}\left(I + \frac{g_{1,i}}{g_{0,i}}\, S\right) x^{i}.$$

If any of the $g_{0,i} = 0$ then the distance could be improved by making $g_{0,i} \neq 0$: all paths leaving the all zero state should have all nonzero symbols. So without loss of generality we can assume $g_{0,i} = 1$ for $i = 0, 1, \ldots, v-1$.

Upon leaving the zero state the encoder output is a nonzero multiple of $(1, 1, \ldots, 1)$ ($v$ ones). Returning to the all zero state the output is a nonzero multiple of $(g_{1,0}, \ldots, g_{1,v-1})$. Since we can return to the zero state from any state, maximum distance is obtained if all the components of $(g_{1,0}, \ldots, g_{1,v-1})$ are nonzero. This means the distance will be $2v$.

The code will be noncatastrophic if an input sequence of infinite Hamming weight never produces a coded sequence with finite weight. This will not occur in dual-k codes if the coefficients $g_{1,0}, \ldots, g_{1,v-1}$ are not all the same. The state diagram with paths labeled by appropriate weights is shown below.

Figure 8.10: State Transition Diagram of Dual-k Encoder

We would like to find the number of paths starting at state 0, leaving state 0 and returning back, with a given number of input ones, a given length, and a given output weight. Label each branch by $D^i N^j L$ where $i$ is the number of nonzero output symbols and $j$ is the number of nonzero input symbols.


The transfer function from state $a$ to state $b$ is $\sum_{i,j,l} a_{i,j,l}\, D^i N^j L^l$ where $a_{i,j,l}$ is the number of paths from $a$ to $b$ with output weight $i$, input weight $j$, and length $l$.

Let

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_{2^k-1} \end{bmatrix}$$

where $x_i$ is the transfer function from the all zero state to state $i$. Let

$$B = \begin{bmatrix} B_1 \\ \vdots \\ B_{2^k-1} \end{bmatrix}$$

where $B_i$ is the label on the path from state 0 to state $i$. Let $A = [a_{i,j}]$ be the matrix of transfer functions from state $j$ to state $i$, and let $C = [C_1, \ldots, C_{2^k-1}]$ where $C_i$ is the transfer function from state $i$ to zero. Then

$$X = AX + B.$$

Let $T(D, N, L)$ be the transfer function from state zero, leaving state 0 and then returning to state 0:

$$T(D, N, L) = CX = CAX + CB.$$

For dual-k codes

$$B = D^v N L \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}, \qquad C = D^v L\, [1\ 1\ \cdots\ 1],$$

$$CB = D^{2v} N L^2 (2^k - 1).$$

Notes: (1) $CA$ is a row vector with the $j$-th component being the sum of the $j$-th column of $A$ multiplied by $D^v L$. (2) For $i \neq 0$, $a_{i,j} = N L D^{n(i,j)}$ where $n(i,j)$ is the number of nonzero symbols in $i\mathbf{1} + j g_1$.

Let $\alpha$ denote the common column sum of $A$:

$$\alpha = \sum_{i=1}^{2^k-1} a_{i,j} = NL \sum_{i=1}^{2^k-1} D^{n(i,j)}.$$

Let $\beta_l$ be the number of times the symbol $l$ appears in $g_1$, $1 \le l \le 2^k - 1$. Then

$$0 \le \beta_l \le v, \qquad \sum_{l=1}^{2^k-1} \beta_l = v,$$

and

$$n(lj, j) = v - \beta_l.$$

As $l$ goes through all nonzero elements of GF$(2^k)$, $lj$ goes through all nonzero elements also. Thus

$$\sum_{i=1}^{2^k-1} D^{n(i,j)} = \sum_{l=1}^{2^k-1} D^{v - \beta_l}.$$

This is independent of $j$. Thus

$$\alpha = \sum_{i=1}^{2^k-1} a_{i,j} = NL \sum_{l=1}^{2^k-1} D^{v - \beta_l}.$$

It follows that $CAX = \alpha\, CX$, so

$$CX = CAX + CB = \alpha\, CX + CB$$
$$CX (1 - \alpha) = CB \implies CX = \frac{CB}{1 - \alpha}$$

$$T(D,N,L) = \frac{CB}{1-\alpha} = \frac{D^{2v} N L^2 (2^k-1)}{1 - NL\sum_{l=1}^{2^k-1} D^{v-\beta_l}} = D^{2v} N L^2 (2^k-1)\left[1 + NL\sum_{l=1}^{2^k-1} D^{v-\beta_l} + \left(NL\sum_{l=1}^{2^k-1} D^{v-\beta_l}\right)^2 + \cdots\right].$$

To "optimize" the code, make the $\beta_l$ as small as possible:

$$\beta_1 = \beta_2 = \cdots = \beta_v = 1, \qquad \beta_{v+1} = \cdots = \beta_{2^k-1} = 0.$$

That is, make all terms in $g_1$ distinct. Then

$$T(D,N,L) = \frac{D^{2v} N L^2 (2^k-1)}{1 - NL\left[v D^{v-1} + (2^k - 1 - v) D^v\right]}$$

$$T(D,N) = T(D,N,1) = \frac{D^{2v} N (2^k-1)}{1 - N\left[v D^{v-1} + (2^k - 1 - v) D^v\right]}.$$

Using

$$\frac{\partial}{\partial x}\left(\frac{x}{1-ax}\right) = \frac{1}{1-ax} + \frac{ax}{(1-ax)^2} = \frac{1}{(1-ax)^2},$$

we get

$$\left.\frac{\partial T(D,N)}{\partial N}\right|_{N=1} = \frac{(2^k-1)\, D^{2v}}{\left[1 - \left(v D^{v-1} + (2^k-1-v) D^v\right)\right]^2}.$$

1. Example

For the dual-2 ($k = 2$) code with $v = 2$ and rate 1/2 the code generators are determined as follows. First we need a representation for GF(4). Let the set $\{0, 1, a, b\}$ be GF(4) with the following arithmetic:

$$a^2 = b, \quad b^2 = a, \quad ab = 1, \quad 1 + a = b, \quad 1 + b = a,$$
$$1 + 1 = a + a = b + b = 0, \quad 1 + a + b = 0.$$

Then the code generators are

$$g_{0,1} = 1, \quad g_{1,1} = 1, \quad g_{0,2} = 1, \quad g_{1,2} = a.$$

The trellis is illustrated below.


[Trellis section for the dual-2 code: states $S \in \{0, 1, a, b\}$, with branches labeled by the pair of output symbols.]
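A table-driven Python sketch of this dual-2 encoder (our own encoding of the field elements $0, 1, a, b$ as the integers 0 to 3; the tables transcribe the arithmetic above, and the state is the previously stored symbol):

```python
# GF(4) = {0, 1, a, b} encoded as 0, 1, 2, 3.
ADD = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 1], [3, 2, 1, 0]]
MUL = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]

def dual2_encode(info, g0=(1, 1), g1=(1, 2)):
    """Dual-2 encoder: per input symbol I, emit g0_i*I + g1_i*S; the new state is I."""
    S, out = 0, []
    for I in info:
        out += [ADD[MUL[gi0][I]][MUL[gi1][S]] for gi0, gi1 in zip(g0, g1)]
        S = I
    return out

# Diverge from and remerge with state 0: input (a, 0) gives 4 nonzero output
# symbols, one of the 3 paths at the free distance 2v = 4.
print(dual2_encode([2, 0]))   # [2, 2, 2, 3]
```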

The weight enumerator polynomial is

$$T(D,N,L) = \frac{3 D^4 L^2 N}{1 - NL(2D + D^2)}$$

$$T(D) = T(D,1,1) = \frac{3D^4}{1 - 2D - D^2} = 3D^4\left[1 + (2D + D^2) + (2D + D^2)^2 + \cdots\right] = 3D^4\left[1 + 2D + 5D^2 + \cdots\right] = 3D^4 + 6D^5 + 15D^6 + \cdots$$

Since diverging paths have distance 2 and remerging paths have distance 2, the total distance is at least 4. From the weight enumerator there are exactly 3 paths of distance 4. For calculating the symbol error probability the following weight enumerator is needed:

$$\left.\frac{\partial T(D,N)}{\partial N}\right|_{N=1} = \frac{3D^4}{(1 - 2D - D^2)^2} = 3D^4\left[1 + 2(2D + D^2) + 3(2D + D^2)^2 + \cdots\right] = 3D^4\left[1 + 4D + 14D^2 + \cdots\right] = 3D^4 + 12D^5 + 42D^6 + \cdots$$
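A quick sympy series check of both expansions (assuming sympy is available):

```python
import sympy as sp

D, N, L = sp.symbols('D N L')
T = 3 * D**4 * L**2 * N / (1 - N * L * (2 * D + D**2))

print(sp.series(T.subs({N: 1, L: 1}), D, 0, 7))
# 3*D**4 + 6*D**5 + 15*D**6 + O(D**7)
print(sp.series(sp.diff(T, N).subs({N: 1, L: 1}), D, 0, 7))
# 3*D**4 + 12*D**5 + 42*D**6 + O(D**7)
```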


6. Minimum Bit Error Probability Decoding

In the previous sections we derived the optimal decoder for minimizing the codeword error probability, that is, minimizing the probability that the decoder chooses the wrong information sequence. In this section we derive an algorithm for minimizing the probability of bit error.

Consider a finite state Markov chain with state space $\{0, 1, \ldots, M-1\}$. Let $x_m$ be the sequence of random variables representing the state at time $m$. Let $x_0$ be the initial state of the process with $p(x_0) = 1$ and let $x_J$ be the final state. For a convolutional code the input bits $u_1, u_2, \ldots$ determine the sequence of state transitions. Since $x_j$ is a Markov process we have that

$$p(x_{j+1} \mid x_j, x_{j-1}, \ldots, x_1, x_0) = p(x_{j+1} \mid x_j).$$

Let $w_j = (x_j, x_{j-1})$ be the state transition at time $j$. There is a one-to-one correspondence between state sequences and transition sequences. That is, the two sequences

$$x_0, x_1, \ldots, x_J \qquad \text{and} \qquad w_1, w_2, \ldots, w_J$$

contain the same information. Let $x_k^l = (x_k, \ldots, x_l)$ denote a sequence. By some mechanism (e.g. a noisy channel) a noisy version $\{z_j\}$ of the state transition sequence is observed. Based on this noisy version of $\{w_j\}$ we wish to determine the following probabilities:

$$p(x_{j-1} = m', x_j = m \mid z_1^J)$$

and

$$p(x_{j-1} = m' \mid z_1^J).$$

These two quantities can be calculated from

$$\lambda_j(m) = P[x_j = m,\, z_1^J]$$
$$\sigma_j(m', m) = P[x_{j-1} = m',\, x_j = m,\, z_1^J]$$

by appropriate normalization. Now let

$$\alpha_j(m) = P[x_j = m,\, z_1^j]$$
$$\beta_j(m) = P[z_{j+1}^J \mid x_j = m]$$
$$\gamma_j(m', m) = P[x_j = m,\, z_j \mid x_{j-1} = m'].$$

We can calculate $\lambda_j(m)$ as follows:

$$\lambda_j(m) = P[z_{j+1}^J \mid x_j = m,\, z_1^j]\, P[x_j = m,\, z_1^j] = P[z_{j+1}^J \mid x_j = m]\, \alpha_j(m) = \beta_j(m)\, \alpha_j(m).$$

We can calculate $\sigma_j(m', m)$ as follows:

$$\begin{aligned}
\sigma_j(m', m) &= P[x_{j-1} = m',\, x_j = m,\, z_1^J]\\
&= P[z_{j+1}^J \mid x_j = m,\, x_{j-1} = m',\, z_1^j]\, P[x_j = m,\, x_{j-1} = m',\, z_1^j]\\
&= P[z_{j+1}^J \mid x_j = m]\, P[x_j = m,\, z_j \mid x_{j-1} = m']\, P[x_{j-1} = m',\, z_1^{j-1}]\\
&= \beta_j(m)\, \gamma_j(m', m)\, \alpha_{j-1}(m').
\end{aligned}$$

We now develop recursions for $\alpha_j(m)$ and $\beta_j(m)$. For $j = 0, 1, \ldots, J$ we have

$$\begin{aligned}
\alpha_j(m) &= P[x_j = m,\, z_1^j] = \sum_{m'=0}^{M-1} P[x_{j-1} = m',\, x_j = m,\, z_1^j]\\
&= \sum_{m'=0}^{M-1} P[x_j = m,\, z_j \mid x_{j-1} = m',\, z_1^{j-1}]\, P[x_{j-1} = m',\, z_1^{j-1}]\\
&= \sum_{m'=0}^{M-1} \gamma_j(m', m)\, \alpha_{j-1}(m').
\end{aligned}$$

The boundary conditions are given as

$$\alpha_0(m) = \begin{cases} 1 & m = 0 \\ 0 & m \neq 0. \end{cases}$$

Here we are assuming the Markov chain starts in state 0 and ends in state 0 at time $J$ ($P[x_J = 0] = 1$). The recursion for $\beta_j(m)$ is given as follows:

$$\begin{aligned}
\beta_j(m) &= P[z_{j+1}^J \mid x_j = m] = \sum_{m'=0}^{M-1} P[x_{j+1} = m',\, z_{j+1}^J \mid x_j = m]\\
&= \sum_{m'=0}^{M-1} P[z_{j+2}^J \mid x_{j+1} = m',\, z_{j+1},\, x_j = m]\, P[x_{j+1} = m',\, z_{j+1} \mid x_j = m]\\
&= \sum_{m'=0}^{M-1} P[z_{j+2}^J \mid x_{j+1} = m']\, P[x_{j+1} = m',\, z_{j+1} \mid x_j = m]\\
&= \sum_{m'=0}^{M-1} \beta_{j+1}(m')\, \gamma_{j+1}(m, m').
\end{aligned}$$

The boundary condition is

$$\beta_J(m) = \begin{cases} 1 & m = 0 \\ 0 & m \neq 0. \end{cases}$$

Finally we can calculate $\gamma_j(m', m)$ as follows:

$$\begin{aligned}
\gamma_j(m', m) &= P[x_j = m,\, z_j \mid x_{j-1} = m'] = \sum_{w_j} P[x_j = m,\, z_j,\, w_j \mid x_{j-1} = m']\\
&= \sum_{w_j} P[z_j \mid x_j = m,\, w_j,\, x_{j-1} = m']\, P[x_j = m,\, w_j \mid x_{j-1} = m']\\
&= \sum_{w_j} P[z_j \mid w_j]\, P[w_j \mid x_j = m,\, x_{j-1} = m']\, P[x_j = m \mid x_{j-1} = m'].
\end{aligned}$$

The first term is the transition probability of the channel. The second term identifies the output of the encoder when transitioning from state $m'$ to state $m$. The last term is the probability of going to state $m$ from state $m'$; this will be either a nonzero constant (e.g. 1/2) or zero.

The algorithm works as follows. First initialize $\alpha$ and $\beta$. After receiving the vector $z_1, \ldots, z_J$, perform the recursions on $\alpha$ and $\beta$. Then combine $\alpha$ and $\beta$ to determine $\lambda$ and $\sigma$. Normalize to determine the desired probabilities.
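In code the two recursions look as follows (a Python sketch in our own notation; `gamma[j][mp][m]` holds $\gamma_j(m', m)$ for $j = 1, \ldots, J$, and each stage is rescaled to sum to one, anticipating the normalized "alternative algorithm" below):

```python
def forward_backward(gamma, n_states, J):
    """Compute alpha and beta, each stage normalized to sum to 1.

    gamma[j][mp][m] = P[x_j = m, z_j | x_{j-1} = mp] for j = 1..J
    (index 0 of gamma is unused).  Rescaling leaves the ratios, and
    hence the log-likelihood ratio, unchanged.
    """
    alpha = [[0.0] * n_states for _ in range(J + 1)]
    beta = [[0.0] * n_states for _ in range(J + 1)]
    alpha[0][0] = 1.0                        # chain starts in state 0
    for j in range(1, J + 1):
        a = [sum(gamma[j][mp][m] * alpha[j - 1][mp] for mp in range(n_states))
             for m in range(n_states)]
        s = sum(a)                           # = P[z_j | z_1^{j-1}] under this scaling
        alpha[j] = [v / s for v in a]
    beta[J][0] = 1.0                         # chain ends in state 0
    for j in range(J - 1, -1, -1):
        b = [sum(beta[j + 1][mp] * gamma[j + 1][m][mp] for mp in range(n_states))
             for m in range(n_states)]
        s = sum(b)
        beta[j] = [v / s for v in b]
    return alpha, beta

# sigma_j(mp, m) is proportional to alpha[j-1][mp] * gamma[j][mp][m] * beta[j][m];
# the common scale factors cancel in the log-likelihood ratio below.
```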

Now consider a convolutional code which is used to transmit information. The input sequence to the encoder is $u_1, \ldots, u_{J-m}, 0, \ldots, 0$, where $m$ zeros have been appended to the input sequence to force the encoder to the all zero state at time $J$. We wish to determine the minimum bit error probability decision rule for bit $u_j$. The input sequence determines a state transition sequence $\{x_j,\ j = 0, \ldots, J\}$. The state sequence determines the output code symbols $c_1, \ldots, c_N$. The output symbols are modulated and received, and a decision statistic is derived for each coded symbol via a channel $p(z \mid c)$. Based on observing $z$ we wish to determine the optimum rule for deciding whether $u_j = 0$ or $u_j = 1$. The optimal decision rule is to compute the log-likelihood ratio and compare it to zero:

$$\lambda = \log\left[\frac{p(u_j = 0 \mid z)}{p(u_j = 1 \mid z)}\right] = \log\left[\frac{\sum_{(m',m):\, u_j = 0} p(x_j = m,\, x_{j-1} = m' \mid z)}{\sum_{(m',m):\, u_j = 1} p(x_j = m,\, x_{j-1} = m' \mid z)}\right] = \log\left[\frac{\sum_{(m',m):\, u_j = 0} \sigma_j(m', m)}{\sum_{(m',m):\, u_j = 1} \sigma_j(m', m)}\right] = \log\left[\frac{\sum_{(m',m):\, u_j = 0} \alpha_{j-1}(m')\, \gamma_j(m', m)\, \beta_j(m)}{\sum_{(m',m):\, u_j = 1} \alpha_{j-1}(m')\, \gamma_j(m', m)\, \beta_j(m)}\right].$$

Figure 8.11: Turbo Code Encoder (information sequence feeding two recursive systematic convolutional encoders, RSC1 and RSC2, through an interleaver)

Figure 8.12: Recursive Systematic Encoder

Figure 8.13: Decoding Architecture (two constituent decoders connected through interleavers and a deinterleaver)

Figure 8.14: Bit error probability for turbo codes

Alternative algorithm: a normalized version of the recursions avoids numerical problems. Define $\hat\alpha_j(m) = P[x_j = m \mid z_1^j]$ and $\hat\beta_j(m) = P[z_{j+1}^J \mid x_j = m]\,/\,P[z_j^J \mid z_1^{j-1}]$. We can calculate $\hat\sigma_j(m', m) = P[x_{j-1} = m',\, x_j = m \mid z_1^J]$ as follows:

$$\begin{aligned}
\hat\sigma_j(m', m) &= P[x_{j-1} = m',\, x_j = m,\, z_1^J]\, \frac{1}{P[z_1^J]}\\
&= P[z_{j+1}^J \mid x_j = m]\, P[x_j = m,\, z_j \mid x_{j-1} = m']\, P[x_{j-1} = m',\, z_1^{j-1}]\, \frac{1}{P[z_1^J]}\\
&= \frac{P[z_{j+1}^J \mid x_j = m]}{P[z_j^J \mid z_1^{j-1}]}\, \gamma_j(m', m)\, P[x_{j-1} = m' \mid z_1^{j-1}]\, \frac{P[z_1^{j-1}]\, P[z_j^J \mid z_1^{j-1}]}{P[z_1^J]}\\
&= \hat\beta_j(m)\, \gamma_j(m', m)\, \hat\alpha_{j-1}(m')
\end{aligned}$$

since $P[z_1^J] = P[z_1^{j-1}]\, P[z_j^J \mid z_1^{j-1}]$. We now develop recursions for $\hat\alpha_j(m)$ and $\hat\beta_j(m)$. For $j = 1, \ldots, J$ we have

$$\begin{aligned}
\hat\alpha_j(m) &= P[x_j = m \mid z_1^j] = \sum_{m'=0}^{M-1} P[x_{j-1} = m',\, x_j = m,\, z_j \mid z_1^{j-1}]\, \frac{1}{P[z_j \mid z_1^{j-1}]}\\
&= \sum_{m'=0}^{M-1} P[x_j = m,\, z_j \mid x_{j-1} = m']\, P[x_{j-1} = m' \mid z_1^{j-1}]\, \frac{1}{P[z_j \mid z_1^{j-1}]}\\
&= \frac{1}{P[z_j \mid z_1^{j-1}]}\sum_{m'=0}^{M-1} \gamma_j(m', m)\, \hat\alpha_{j-1}(m').
\end{aligned}$$

We can rewrite the denominator as

$$P[z_j \mid z_1^{j-1}] = \sum_{m'=0}^{M-1}\sum_{m=0}^{M-1} P[x_{j-1} = m',\, x_j = m,\, z_j \mid z_1^{j-1}] = \sum_{m=0}^{M-1}\sum_{m'=0}^{M-1} \gamma_j(m', m)\, \hat\alpha_{j-1}(m').$$

Thus the recursion for $\hat\alpha$ is

$$\hat\alpha_j(m) = \frac{\sum_{m'=0}^{M-1} \gamma_j(m', m)\, \hat\alpha_{j-1}(m')}{\sum_{m=0}^{M-1}\sum_{m'=0}^{M-1} \gamma_j(m', m)\, \hat\alpha_{j-1}(m')}.$$

The boundary conditions are given as

$$\hat\alpha_0(m) = \begin{cases} 1 & m = 0 \\ 0 & m \neq 0. \end{cases}$$

Here we are assuming the Markov chain starts in state 0 and ends in state 0 at time $J$ ($P[x_J = 0] = 1$). The recursion for $\hat\beta_j(m)$ is given as follows:

$$\begin{aligned}
\hat\beta_j(m) &= \frac{P[z_{j+1}^J \mid x_j = m]}{P[z_j^J \mid z_1^{j-1}]} = \sum_{m'=0}^{M-1} P[x_{j+1} = m',\, z_{j+1}^J \mid x_j = m]\, \frac{1}{P[z_j^J \mid z_1^{j-1}]}\\
&= \sum_{m'=0}^{M-1} P[z_{j+2}^J \mid x_{j+1} = m']\, P[x_{j+1} = m',\, z_{j+1} \mid x_j = m]\, \frac{1}{P[z_j^J \mid z_1^{j-1}]}\\
&= \sum_{m'=0}^{M-1} \hat\beta_{j+1}(m')\, \gamma_{j+1}(m, m')\, \frac{P[z_{j+1}^J \mid z_1^{j}]}{P[z_j^J \mid z_1^{j-1}]}\\
&= \frac{\sum_{m'=0}^{M-1} \hat\beta_{j+1}(m')\, \gamma_{j+1}(m, m')}{P[z_j \mid z_1^{j-1}]} = \frac{\sum_{m'=0}^{M-1} \hat\beta_{j+1}(m')\, \gamma_{j+1}(m, m')}{\sum_{m=0}^{M-1}\sum_{m'=0}^{M-1} \gamma_j(m', m)\, \hat\alpha_{j-1}(m')}.
\end{aligned}$$

The boundary condition is

$$\hat\beta_J(m) = \begin{cases} 1 & m = 0 \\ 0 & m \neq 0. \end{cases}$$

The quantities $\gamma_j(m', m)$ are calculated exactly as before, and the log-likelihood ratio for bit $u_j$ is formed in the same way with $\hat\alpha$ and $\hat\beta$ in place of $\alpha$ and $\beta$, since the normalization factors cancel in the ratio.

7. Trellis Representation of Block Codes

We can represent linear block codes using a trellis and as such apply the Viterbi algorithm to decode the code. To illustrate this, consider a (7,4) Hamming code with parity check matrix

$$H = \begin{bmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}.$$


Every codeword $x$ must satisfy the parity check equation

$$H x^T = 0.$$

Let $h_i$ represent the $i$-th column of the parity check matrix $H$ and let $x = (x_0, x_1, \ldots, x_{n-1})$. Then the parity check equation can be written as

$$\sum_{i=0}^{n-1} h_i x_i = 0.$$

Now define a trellis which has $2^{n-k}$ states. The state at time 0 is the vector $(0\ 0\ 0)^T$. The first bit of the codeword could be 0 or 1. The partial parity check on just the first bit is $(0\ 0\ 1)^T$ if the first bit is one and $(0\ 0\ 0)^T$ if the first bit is zero. The partial parity check for the first two bits is either $(0\ 0\ 0)^T$, $(0\ 0\ 1)^T$, $(0\ 1\ 0)^T$, or $(0\ 1\ 1)^T$. Once the last bit is considered the parity checks must be all zero. The possible partial parity checks can be described by the trellis below.
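A Python sketch of this syndrome-trellis construction for the $H$ above (the state is the partial syndrome, stored as a 3-bit integer; a full construction would also prune states that cannot reach the zero syndrome):

```python
# Columns h_i of H as 3-bit integers, first row = most significant bit.
H_COLS = [0b001, 0b010, 0b011, 0b100, 0b101, 0b110, 0b111]

def build_trellis(cols=H_COLS):
    """Section i holds edges (state, bit, state XOR bit*h_i) over GF(2)."""
    sections, states = [], {0}                # time-0 state: the zero syndrome
    for i, h in enumerate(cols):
        edges = [(s, b, s ^ (b * h)) for s in sorted(states) for b in (0, 1)]
        if i == len(cols) - 1:                # after the last bit the syndrome must be 0
            edges = [e for e in edges if e[2] == 0]
        sections.append(edges)
        states = {t for _, _, t in edges}
    return sections

trellis = build_trellis()
print([len({s for s, _, _ in sec}) for sec in trellis])  # states per section: [1, 2, 4, 4, 8, 8, 8]
```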


Figure 8.18: Trellis representation for the (7,4) Hamming code (states are the eight possible partial syndromes 000 through 111)