An Introduction to Codes (math.ntnu.edu.tw/~li/note/Code.pdf, 2005-10-21)




CHAPTER 1

An Introduction to Codes

Basic Definitions

The concept of a string is fundamental to the subjects of information and coding theory. Let A = {a1, a2, . . . , an} be a finite nonempty set which we refer to as an alphabet. A string over A is simply a finite sequence of elements of A. Strings will be denoted by boldface letters, such as x and y. If x = x1x2 · · · xk is a string over A, then each xi in x is called an element of x. The length of a string x, denoted by Len(x), is the number of elements in the string x.

A code is nothing more than a set of strings over a certain alphabet. Of course, codes are generally used to encode messages.

Definition 1.1. Let A = {a1, a2, . . . , ar} be a set of r elements, which we call a code alphabet and whose elements are called code symbols. An r-ary code over A is a subset C of the set of all strings over A. The number r is called the radix of the code. The elements of C are called codewords and the number of codewords of C is called the size of C. When A = Z2 and A = Z3, codes over A are referred to as binary codes and ternary codes, respectively.

Definition 1.2. Let S = {s1, s2, . . . , sn} be a finite set which we refer to as a source alphabet. The elements of S are called source symbols and the number of source symbols in S is called the size of S. Let C be a code. An encoding function is a bijective function f : S → C, from S to C. We refer to the ordered pair (C, f) as an encoding scheme for S.

Definition 1.3. If all the codewords in a code C have the same length, we say that C is a fixed length code, or block code. Any encoding scheme that uses a fixed length code will be referred to as a fixed length encoding scheme. If C contains codewords of different lengths, we say that C is a variable length code. Any encoding scheme that uses a variable length code will be referred to as a variable length encoding scheme.

Fixed length codes have advantages and disadvantages over variable length codes. One advantage is that they never require a special symbol to separate the source symbols in the message being coded. Perhaps the main disadvantage of fixed length codes is that source symbols that are used frequently have codes as long as source symbols that are used infrequently. On the other hand, variable length codes, which can encode frequently used source symbols using shorter codewords, can save a great deal of time and space.

Uniquely Decipherable Codes

Definition 1.4. A code C over an alphabet A is uniquely decipherable if no two different sequences of codewords in C represent the same string over A. In symbols, if

c1c2 · · · cm = d1d2 · · · dn

for ci, dj ∈ C, then m = n and ci = di for all i = 1, . . . , n.
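Unique decipherability can be tested mechanically with the Sardinas-Patterson algorithm, which these notes do not develop. The following Python sketch is offered only as an illustration; the function name and the use of strings for codewords are our own conventions.

```python
def is_uniquely_decipherable(code):
    """Sardinas-Patterson test: True iff no two different sequences of
    codewords represent the same string."""
    code = set(code)

    def dangling(a_set, b_set):
        # Suffixes left over when an element of a_set is a proper prefix
        # of an element of b_set.
        return {b[len(a):] for a in a_set for b in b_set
                if a != b and b.startswith(a)}

    current = dangling(code, code)
    seen = set()
    while current:
        if current & code:        # a dangling suffix is itself a codeword
            return False
        seen |= current
        current = (dangling(current, code) | dangling(code, current)) - seen
    return True
```

For example, the code {0, 10, 110} passes the test, while {0, 11, 100, 110} fails it.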


The following theorem, proved by McMillan in 1956, provides some information about the codeword lengths of a uniquely decipherable code.

Theorem 1.5 (McMillan's Theorem). Let C = {c1, c2, . . . , cn} be a uniquely decipherable r-ary code and let li = Len(ci). Then its codeword lengths l1, l2, . . . , ln must satisfy

∑_{i=1}^{n} 1/r^li ≤ 1.

Remark 1.6. Consider the binary code C = {0, 11, 100, 110}. Its codeword lengths 1, 2, 3 and 3 satisfy Kraft's Inequality, but it is not uniquely decipherable; for instance, the string 0110 can be parsed both as 0, 110 and as 0, 11, 0. Hence McMillan's Theorem cannot tell us when a particular code is uniquely decipherable, but only when it is not.
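The arithmetic in Remark 1.6 is easy to reproduce. As an illustration (the helper name is ours), the sum ∑_{i=1}^{n} 1/r^li can be evaluated exactly with rational arithmetic:

```python
from fractions import Fraction

def kraft_sum(lengths, r):
    """Exact value of the sum of 1/r^l over the given codeword lengths."""
    return sum(Fraction(1, r ** l) for l in lengths)

# Remark 1.6: lengths 1, 2, 3, 3 over the binary alphabet.
print(kraft_sum([1, 2, 3, 3], r=2))  # prints 1, so the inequality holds
```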

Instantaneous Codes

Definition 1.7. A code is said to be instantaneous if each codeword in any string of codewords can be decoded (reading from left to right) as soon as it is received.

If a code is instantaneous, then it is also uniquely decipherable. However, there exist codes that are uniquely decipherable but not instantaneous.

Definition 1.8. A code is said to have the prefix property if no codeword is a prefix of any other codeword, that is, if whenever c = x1x2 · · · xn is a codeword, then x1x2 · · · xk is not a codeword for 1 ≤ k < n.

Given a code C, it is easy to determine whether or not it has the prefix property. It is only necessary to compare each codeword with all codewords of greater length to see if it is a prefix. The importance of the prefix property comes from the following proposition.

Proposition 1.9. A code is instantaneous if and only if it has the prefix property.
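The comparison procedure just described takes only a few lines. A Python sketch (the function name is ours):

```python
def has_prefix_property(code):
    """True iff no codeword is a proper prefix of another codeword."""
    return not any(c != d and d.startswith(c) for c in code for d in code)

print(has_prefix_property(["0", "10", "110", "111"]))  # True
print(has_prefix_property(["0", "01", "011"]))         # False: 0 is a prefix of 01
```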

Now we come to a theorem, published by L. G. Kraft in 1949, which gives a simple criterion to determine whether or not there is an instantaneous code with given codeword lengths.

Theorem 1.10 (Kraft's Theorem). There exists an instantaneous r-ary code C with codeword lengths l1, . . . , ln, if and only if these lengths satisfy Kraft's inequality,

∑_{i=1}^{n} 1/r^li ≤ 1.

Remark 1.11. Again we should point out that, as in Remark 1.6, Kraft's Theorem does not say that any code whose codeword lengths satisfy Kraft's inequality must be instantaneous. It says only that some instantaneous code with these codeword lengths can be constructed.
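The construction behind Remark 1.11 can be made explicit: process the lengths in increasing order, and read each codeword off the base-r expansion of the accumulated Kraft sum. The notes do not spell this out; the Python below is our own sketch of that standard construction.

```python
def instantaneous_code(lengths, r=2):
    """Build an instantaneous r-ary code with the given codeword lengths,
    assuming they satisfy Kraft's inequality; return None otherwise."""
    code = []
    w = 0        # accumulated Kraft sum, scaled to have denominator r^l
    prev = 0
    for l in sorted(lengths):
        w *= r ** (l - prev)          # rescale the numerator to length l
        if w >= r ** l:
            return None               # Kraft's inequality is violated
        digits = []
        x = w
        for _ in range(l):            # base-r expansion of w, padded to l digits
            digits.append(str(x % r))
            x //= r
        code.append("".join(reversed(digits)))
        w += 1                        # advance the sum by 1/r^l
        prev = l
    return code

print(instantaneous_code([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```

The resulting code has the prefix property, hence is instantaneous by Proposition 1.9.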

Definition 1.12. An instantaneous code C is said to be maximal instantaneous if C is not contained in any strictly larger instantaneous code.

Corollary 1.13. Let C be an instantaneous r-ary code with codeword lengths l1, . . . , ln. Then C is maximal instantaneous if and only if these lengths satisfy

∑_{i=1}^{n} 1/r^li = 1.


McMillan’s Theorem and Kraft’s Theorem together tell us something interesting about the relationship between uniquely decipherable codes and instantaneous codes. We have the following useful result.

Corollary 1.14. If a uniquely decipherable code exists with codeword lengths l1, . . . , ln, then an instantaneous code must also exist with these same codeword lengths.

Our interest in Corollary 1.14 will come later, when we turn to questions related to codeword lengths. For it tells us that we lose nothing by considering only instantaneous codes rather than all uniquely decipherable codes.

Exercises

(1) What is the minimum possible length for a binary block code containing n codewords?

(2) How many encoding functions are possible from the source alphabet S = {a, b, c} to the code C = {00, 01, 11}? List them.

(3) How many r-ary codes are there with maximum codeword length n over an alphabet A? What is this number for r = 2 and n = 5?

(4) Which of the following codes

C1 = {0, 01, 011, 0111, 01111, 11111} and C2 = {0, 10, 1101, 1110, 1011, 110110}

are uniquely decipherable?

(5) Is it possible to construct a uniquely decipherable code over the alphabet {0, 1, . . . , 9} with nine codewords of length 1, nine codewords of length 2, ten codewords of length 3 and ten codewords of length 4?

(6) For a given binary code C = {0, 10, 11}, let N(k) be the total number of sequences of codewords that contain exactly k bits. For instance, we have N(3) = 5. Show that in this case N(k) = N(k − 1) + 2N(k − 2), for all k ≥ 3.

(7) Suppose that we want an instantaneous binary code that contains the codewords 0, 10 and 1100. How many additional codewords of length 6 could be added to this code?

(8) Suppose that C is a maximal instantaneous code with maximum codeword length m. Show that C must contain at least two codewords of maximum length m.


CHAPTER 2

Noiseless Coding

Optimal Encoding Schemes

In order to achieve unique decipherability, McMillan’s Theorem tells us that we must allow reasonably long codewords. Unfortunately, this tends to reduce the efficiency of a code. On the other hand, it is often the case that not all source symbols occur with the same frequency within a given class of messages. When no errors can occur in the transmission of data, it makes sense to assign the longer codewords to the less frequently used source symbols, thereby improving the efficiency of the code.

Definition 2.1. An information source is an ordered pair I = (S, P), where S = {s1, . . . , sn} is a source alphabet and P is a probability law that assigns to each source symbol si of S a probability P(si). The sequence P(s1), . . . , P(sn) is the probability distribution for I.

For noiseless coding, the measure of efficiency of an encoding scheme is its average codeword length.

Definition 2.2. The average codeword length of an encoding scheme (C, f) for an information source I = (S, P), where S = {s1, . . . , sn}, is defined by

∑_{i=1}^{n} Len(f(si)) P(si).

We should emphasize the fact that the average codeword length of an encoding scheme is not the same as the average codeword length of a code, since the former depends also on the probability distribution.

It is clear that the average codeword length of an encoding scheme is not affected by the nature of the source symbols themselves. Hence, for the purposes of measuring average codeword length, we may assume that the codewords are assigned directly to the probabilities. Accordingly, we may speak of an encoding scheme (c1, . . . , cn) for the probability distribution (p1, . . . , pn). With this in mind, the average codeword length of an encoding scheme C = (c1, . . . , cn) is

AveLen(C) = ∑_{i=1}^{n} pi Len(ci).

Let (C1, f1) and (C2, f2) be two encoding schemes of the information source I such that the corresponding codes have the same radix. We say that (C1, f1) is more efficient than (C2, f2) if AveLen(C1) < AveLen(C2). We should point out that it makes sense to compare the average codeword lengths of different encoding schemes only when the corresponding codes have the same radix. For in general the larger the radix, the shorter we can make the average codeword length.
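AveLen is a one-line computation. A minimal Python sketch (names ours), applied to the scheme (0, 10, 110, 111) for the distribution (1/2, 1/4, 1/8, 1/8):

```python
def ave_len(lengths, probs):
    """AveLen: the sum of p_i * Len(c_i) over an encoding scheme."""
    return sum(p * l for p, l in zip(probs, lengths))

print(ave_len([1, 2, 3, 3], [0.5, 0.25, 0.125, 0.125]))  # 1.75
```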


We will use the notation MinAveLenr(p1, . . . , pn) to denote the minimum average codeword length among all r-ary instantaneous encoding schemes for the probability distribution (p1, . . . , pn).

Definition 2.3. An optimal r-ary encoding scheme for a probability distribution (p1, . . . , pn) is an r-ary instantaneous encoding scheme (c1, . . . , cn) for which

AveLen(c1, . . . , cn) = MinAveLenr(p1, . . . , pn).

Note that optimal encoding schemes are, by definition, instantaneous. By virtue of Corollary 1.14, this minimum is also the minimum over all uniquely decipherable schemes. Hence, we may restrict attention to instantaneous codes.

Huffman Encoding

In 1952 D. A. Huffman published a method for constructing optimal encoding schemes. This method is now known as Huffman encoding.

Since we are dealing with r-ary codes, we may as well assume that the code alphabet is {1, 2, . . . , r}.

Lemma 2.4. Let P = (p1, . . . , pn) be a probability distribution, with p1 ≥ p2 ≥ · · · ≥ pn. Then there exists an optimal r-ary encoding scheme C = (c1, . . . , cn) for P that has exactly s codewords of maximum length, of the form d1, d2, . . . , ds, where s is uniquely determined by the conditions s ≡ n (mod r − 1) and 2 ≤ s ≤ r.

As a result, for such probability distributions, we have

MinAveLenr(p1, . . . , pn) = MinAveLenr(p1, . . . , p_{n−s}, q) + q,

where q = ∑_{i=n−s+1}^{n} pi.

Using Lemma 2.4, we can now present Huffman’s algorithm.

Theorem 2.5. The following algorithm H produces r-ary optimal encoding schemes C for probability distributions P:

(1) If P = (p1, . . . , pn), where n ≤ r, then let C = (1, . . . , n).
(2) If P = (p1, . . . , pn), where n > r, then
    (a) Reorder P if necessary so that p1 ≥ p2 ≥ · · · ≥ pn.
    (b) Let Q = (p1, . . . , p_{n−s}, q), where q = ∑_{i=n−s+1}^{n} pi and s is uniquely determined by the conditions s ≡ n (mod r − 1) and 2 ≤ s ≤ r.
    (c) Perform the algorithm H on Q, obtaining an encoding scheme D = (c1, . . . , c_{n−s}, d).
    (d) Let C = (c1, . . . , c_{n−s}, d1, d2, . . . , ds).
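In the binary case r = 2 we have s = 2 at every step, and the algorithm H can be implemented with a priority queue. The following Python is our own sketch; it tracks which source symbols were merged into each node and returns a list of codewords in the input order rather than the scheme notation used above.

```python
import heapq
from itertools import count

def huffman(probs):
    """Binary Huffman encoding for a probability distribution `probs`."""
    if len(probs) == 1:
        return ["0"]                  # degenerate single-symbol source
    order = count()                   # tie-breaker so the heap never compares lists
    heap = [(p, next(order), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    codes = [""] * len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)   # the two smallest probabilities
        p2, _, group2 = heapq.heappop(heap)
        for i in group1:                      # extend every codeword in the
            codes[i] = "0" + codes[i]         # two merged groups by one symbol
        for i in group2:
            codes[i] = "1" + codes[i]
        heapq.heappush(heap, (p1 + p2, next(order), group1 + group2))
    return codes

print(huffman([0.5, 0.25, 0.125, 0.125]))  # codeword lengths 1, 2, 3, 3
```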

Entropy of a Source

The information obtained from a source symbol should have the property that the less likely a source symbol is to occur, the more information we obtain from an occurrence of that symbol, and conversely. Because the information obtained from a source symbol is not a function of the symbol itself, but rather of the symbol’s probability of occurrence p, we use the notation I(p) to denote the information obtained from a source symbol with probability of occurrence p.


Definition 2.6. For a source alphabet S, the r-ary information Ir(p) obtained from a source symbol s ∈ S with probability of occurrence p, is given by

Ir(p) = logr(1/p).

Ir(p) can be characterized by the fact that it is the only continuous function on (0, 1] with the property that Ir(pq) = Ir(p) + Ir(q) and Ir(1/r) = 1.

Definition 2.7. Let P = {p1, . . . , pn} be a probability distribution. The r-ary entropy of the distribution P is

Hr(P) = ∑_{i=1}^{n} pi Ir(pi) = ∑_{i=1}^{n} pi logr(1/pi).

(When pi = 0 we set pi logr(1/pi) = 0.) If I = (S, P) is an information source, with probability distribution P = {p1, . . . , pn}, then we refer to Hr(I) = Hr(P) as the entropy of the source I.

The quantity Hr(I) is the average information obtained from a single sample of I. It seems reasonable to say that sampling from I with equal probabilities gives an amount of information equal to one r-ary unit. For instance, if S = {0, 1} with P(0) = 1/2 and P(1) = 1/2, then each sample gives us one binary unit of information (or one bit of information). We mention that many books on information theory restrict attention to binary entropy and use the notation H(p1, . . . , pn) for binary entropy.
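The entropy Hr(P) is straightforward to evaluate numerically. A Python sketch (the function name is ours), using the convention pi logr(1/pi) = 0 when pi = 0:

```python
from math import log

def entropy(probs, r=2):
    """r-ary entropy: the sum of p * log_r(1/p) over the distribution."""
    return sum(p * log(1 / p, r) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0: one bit per fair coin flip
```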

To establish the main properties of entropy, we begin with a lemma which can be easily derived from the fact that ln x ≤ x − 1 for all x > 0, with equality only when x = 1.

Lemma 2.8. Let P = {p1, . . . , pn} be a probability distribution. Let Q = {q1, . . . , qn} have the property that 0 ≤ qi ≤ 1 for all i, and ∑_{i=1}^{n} qi ≤ 1. Then

∑_{i=1}^{n} pi logr(1/pi) ≤ ∑_{i=1}^{n} pi logr(1/qi).

(We set 0 · logr(1/0) = 0 and p logr(1/0) = +∞, for p > 0.) Furthermore, equality holds if and only if pi = qi for all i.

With Lemma 2.8 at our disposal, we can determine the range of the entropy function.

Theorem 2.9. For an information source I = (S, P) of size n (i.e. |S| = n), the entropy satisfies

0 ≤ Hr(P) ≤ logr n.

Furthermore, Hr(P) = logr n if and only if the source has a uniform distribution (i.e. all of the source symbols are equally likely to occur), and Hr(P) = 0 if and only if one of the source symbols has probability 1 of occurring.

Theorem 2.9 confirms the fact that, on the average, the most information is obtained from sources for which each source symbol is equally likely to occur.


The Noiseless Coding Theorem

As we know, the entropy H(I) of an information source I is the amount of information contained in the source. Further, since an instantaneous encoding scheme for I captures the information in the source, it is reasonable to believe that the average codeword length of such a code must be at least as large as the entropy. In fact, this is what the Noiseless Coding Theorem says.

Theorem 2.10 (The Noiseless Coding Theorem). For any probability distribution P = (p1, . . . , pn), we have

Hr(p1, . . . , pn) ≤ MinAveLenr(p1, . . . , pn) < Hr(p1, . . . , pn) + 1.

Notice that the condition for equality in Theorem 2.10 is that li = − logr pi, which requires that logr pi be an integer. Since this is not often the case, we cannot often expect equality. In general, if we choose the integer li to satisfy

logr(1/pi) ≤ li < logr(1/pi) + 1,

for all i, then, by Kraft’s Theorem, there is an instantaneous encoding with these codeword lengths. An encoding scheme constructed by this method is referred to as a Shannon-Fano encoding scheme. However, this method does not, in general, give the smallest possible average codeword length.
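The Shannon-Fano choice li = ⌈logr(1/pi)⌉ is immediate to compute. A Python sketch (names ours; note that floating point may misround when 1/pi is an exact power of r, in which case exact arithmetic would be safer):

```python
from math import ceil, log

def shannon_fano_lengths(probs, r=2):
    """Codeword lengths ceil(log_r(1/p)); they satisfy Kraft's inequality,
    so an instantaneous code with these lengths exists."""
    return [ceil(log(1 / p, r)) for p in probs]

print(shannon_fano_lengths([0.4, 0.3, 0.2, 0.1]))  # [2, 2, 3, 4]
```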

The Noiseless Coding Theorem determines MinAveLenr(p1, . . . , pn) to within 1 r-ary unit, but this may still be too much for some purposes. Fortunately, there is a way to improve upon this, based on the following idea.

Definition 2.11. Let S = {x1, . . . , xn} with probability distribution P(xi) = pi, for all i. The k-th extension of I = (S, P) is Ik = (Sk, Pk), where Sk is the set of all strings of length k over S and Pk is the probability distribution defined for x = x1x2 · · · xk ∈ Sk by Pk(x) = P(x1) · · · P(xk).

The entropy of an extension Ik is related to the entropy of I in a very simple way. It seems intuitively clear that, since we get k times as much information from a string of length k as from a single symbol, the entropy of Ik should be k times the entropy of I. The following lemma confirms this.

Lemma 2.12. Let I be an information source and let Ik be its k-th extension. Then Hr(Ik) = kHr(I).

Applying the Noiseless Coding Theorem to the extension Ik and using Lemma 2.12 gives the final version of the Noiseless Coding Theorem.

Theorem 2.13. Let P be a probability distribution and let Pk be its k-th extension. Then

Hr(P) ≤ MinAveLenr(Sk)/k < Hr(P) + 1/k.

Since each codeword in the k-th extension Sk encodes k source symbols from S, the quantity

MinAveLenr(Sk)/k


is the minimum average codeword length per source symbol of S, taken over all uniquely decipherable r-ary encodings of Sk. Theorem 2.13 says that, by encoding a sufficiently long extension of I, we may make the minimum average codeword length per source symbol of S as close to the entropy Hr(P) as desired. The penalty for doing so is that, since |Sk| = |S|^k, the number of codewords required to encode the k-th extension Sk grows exceedingly large as k gets large.

Exercises

(1) Let P = (0.3, 0.1, 0.1, 0.1, 0.1, 0.06, 0.05, 0.05, 0.05, 0.04, 0.03, 0.02). Find the Huffman encodings of P for the given radix r, with r = 2, 3, 4.

(2) Determine possible probability distributions that have (00, 01, 10, 11) and (0, 10, 110, 111) as binary Huffman encodings.

(3) Determine all possible ternary Huffman encodings of sizes 5 and 6.

(4) Let C be a binary Huffman encoding. Prove that C is maximal instantaneous.

(5) Let C be a binary Huffman encoding for the uniform probability distribution P = (1/n, . . . , 1/n) and suppose that Len(ci) = li for i = 1, . . . , n. Let m = max_i{li}.
    (a) Show that C has the minimum total codeword length ∑_{i=1}^{n} li among all instantaneous encodings.
    (b) Show that there exist two codewords c and d in C such that Len(c) = Len(d) = m, and c and d differ only in their last positions.
    (c) Show that m − 1 ≤ li ≤ m for i = 1, . . . , n.
    (d) Let n = α2^k, where 1 < α ≤ 2. Let u be the number of codewords of length m − 1 and let v be the number of codewords of length m. Determine u, v and m in terms of α and k.
    (e) Find MinAveLen2(P).

(6) Prove the following properties of entropy.
    (a) Let {p1, . . . , pn, q1, . . . , qm} be a probability distribution. If p = p1 + · · · + pn, then
        Hr(p1, . . . , pn, q1, . . . , qm) = Hr(p, 1 − p) + p Hr(p1/p, . . . , pn/p) + (1 − p) Hr(q1/(1 − p), . . . , qm/(1 − p)).
    (b) Let P = {p1, . . . , pn} and Q = {q1, . . . , qn} be two probability distributions. For 0 ≤ t ≤ 1, we have
        Hr(tp1 + (1 − t)q1, . . . , tpn + (1 − t)qn) ≥ tHr(p1, . . . , pn) + (1 − t)Hr(q1, . . . , qn).
    (c) Let P = {p1, . . . , pn} be a probability distribution. Suppose that ε is a positive real number such that p1 − ε > p2 + ε ≥ 0. Thus, {p1 − ε, p2 + ε, p3, . . . , pn} is also a probability distribution. Show that
        Hr(p1, . . . , pn) < Hr(p1 − ε, p2 + ε, p3, . . . , pn).

(7) Let S = {0, 1}. In order to guarantee that the average codeword length per source symbol of S is at most 0.01 greater than the entropy of S, which extension of S should we encode? How many codewords would we need?

(8) Let I be an information source and let I2 be its second extension. Is the second extension of I2 equal to the fourth extension of I?

(9) Show that the Noiseless Coding Theorem is best possible by showing that for any ε > 0, there is a probability distribution P = {p1, . . . , pn} for which MinAveLenr(p1, . . . , pn) − Hr(p1, . . . , pn) ≥ 1 − ε.


CHAPTER 3

Noisy Coding

Communications Channels

In the previous chapter, we discussed the question of how to most efficiently encode source information for transmission over a noiseless channel, where we did not need to be concerned about correcting errors. Now we are ready to consider the question of how to encode source data efficiently and, at the same time, minimize the probability of uncorrected errors when transmitting over a noisy channel.

Definition 3.1. A communications channel consists of a finite input alphabet I = {x1, . . . , xs}, an output alphabet O = {y1, . . . , yt}, and a set of forward channel probabilities, or transition probabilities, Pf (yj | xi), satisfying

∑_{j=1}^{t} Pf (yj | xi) = 1, for all i = 1, . . . , s.

Intuitively, we think of Pf (yj | xi) as the probability that yj is received, given that xi is sent through the channel. It is important not to confuse the forward channel probability Pf (yj | xi) with the so-called backward channel probability Pb(xi | yj). In the forward probabilities, we assume a certain input symbol was sent. In the backward probabilities, we assume a certain output symbol is received.

Example 3.2. The noiseless channel, which we discussed in the previous chapter, has the same input and output alphabet I = O = {x1, . . . , xs} and channel probabilities

Pf (xi | xj) = 1 if i = j, and 0 otherwise.

Example 3.3. A communications channel is called symmetric if it has the same input and output alphabet I = O = {x1, . . . , xs} and channel probabilities Pf (xi | xi) = Pf (xj | xj) and Pf (xi | xj) = Pf (xj | xi), for all i, j = 1, . . . , s. Perhaps the most important memoryless channel is the binary symmetric channel, which has I = O = {0, 1} and channel probabilities Pf (1 | 0) = Pf (0 | 1) = p and Pf (0 | 0) = Pf (1 | 1) = 1 − p. Thus, the probability of a symbol error, also called the crossover probability, is p.

Example 3.4. Another important memoryless channel is the binary erasure channel, which has input alphabet I = {0, 1}, output alphabet O = {0, ?, 1} and channel probabilities Pf (1 | 0) = Pf (0 | 1) = q, Pf (? | 0) = Pf (? | 1) = p and Pf (0 | 0) = Pf (1 | 1) = 1 − p − q.

We will deal only with channels that have no memory, in the following sense.

Definition 3.5. A communications channel is said to be memoryless if for strings c = c1 · · · cn over I and d = d1 · · · dn over O, the probability that d is received, given that c is sent, is

Pf (d | c) = ∏_{i=1}^{n} Pf (di | ci).

We will also refer to the probabilities Pf (d | c) as forward channel probabilities.
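For the binary symmetric channel, Definition 3.5 makes Pf (d | c) a product with one factor p for each flipped bit and one factor 1 − p for each unflipped bit. A Python sketch (names ours):

```python
def bsc_forward_prob(d, c, p):
    """Forward probability Pf(d | c) over a memoryless binary symmetric
    channel with crossover probability p."""
    assert len(d) == len(c)
    prob = 1.0
    for di, ci in zip(d, c):
        prob *= p if di != ci else 1 - p
    return prob

print(bsc_forward_prob("0110", "0100", 0.1))  # one flipped bit: 0.1 * 0.9**3
```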


We use the term memoryless because the probability that an output symbol di is received depends only on the current input ci, and not on previous inputs.

Decision Rules

A decision rule for a code C is a partial function f from the set of output strings to the set of codewords C. The process of applying a decision rule is referred to as decoding. The word “partial” refers to the fact that f may not be defined for all output strings. The intention is that, if an output string d is received and f(d) is defined, then the decision rule decides that f(d) is the codeword that was sent; otherwise it declares a decoding error.

Our goal is to find a decision rule that maximizes the probability of correct decoding. The probability of correct decoding can be expressed in a variety of ways.

Conditioning on the codeword sent gives

P(correct decoding) = ∑_{c∈C} ∑_{d∈Bc} Pf (d | c) Pi(c),

where Bc = {d : f(d) = c} and Pi(c) is the probability that c is sent through the channel. The probabilities {Pi(c) : c ∈ C} form the so-called input distribution for the channel.

Conditioning instead on the string received gives

P(correct decoding) = ∑_{d} Pb(f(d) | d) Po(d),

where Po(d) is the probability that d is received through the channel and is called the output distribution for the channel.

The probability of correct decoding can be maximized by choosing a decision rule that maximizes each of the conditional probabilities Pb(f(d) | d).

Definition 3.6. Any decision rule f for which f(d) has the property that

Pb(f(d) | d) = max_{c∈C} Pb(c | d),

for every possible received string d, is called an ideal observer.

Proposition 3.7. An ideal observer decision rule maximizes the probability of correct decoding of received strings among all decision rules.

We remark that an ideal observer decision rule depends on the input distribution, because

Pb(c | d) = Pf (d | c) Pi(c) / ∑_{c′∈C} Pf (d | c′) Pi(c′).

In the case that the input probability distribution is uniform, i.e. Pi(c) = 1/|C|, we have

Pb(c | d) = Pf (d | c) / ∑_{c′∈C} Pf (d | c′).

Now the denominator on the right is a sum of forward channel probabilities and thus depends only on the communications channel. Thus, maximizing Pb(c | d) is equivalent to maximizing Pf (d | c). This leads to the following definition and proposition.

Definition 3.8. Any decision rule f for which f(d) maximizes the forward channel probabilities, that is, for which

Pf (d | f(d)) = max_{c∈C} Pf (d | c),


for every possible received string d, is called a maximum likelihood decision rule.

Proposition 3.9. For the uniform input distribution, an ideal observer is the same as a maximum likelihood decision rule.
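For a memoryless binary symmetric channel with p < 1/2, Pf (d | c) = p^t (1 − p)^{n−t} decreases as the number t of disagreements grows, so a maximum likelihood decision rule simply picks the codeword nearest to the received string in Hamming distance. A Python sketch (names ours):

```python
def hamming(a, b):
    """Number of positions in which the strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def ml_decode(d, code):
    """Maximum likelihood decoding over a binary symmetric channel with
    crossover probability p < 1/2: the nearest codeword wins."""
    return min(code, key=lambda c: hamming(d, c))

code = ["00000", "11111"]          # the binary repetition code of length 5
print(ml_decode("01001", code))    # two flipped bits still decode to 00000
```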

Conditional Entropy and Channel Capacity

In general, knowing the value of the output of a channel will have an effect on our information about the input. This leads us to make the following definition.

Definition 3.10. Consider a communications channel with input alphabet I and output alphabet O. The r-ary conditional entropy of I, given y ∈ O, is defined by

Hr(I | y) = ∑_{x∈I} Pb(x | y) logr(1/Pb(x | y)).

The r-ary conditional entropy of I, given O, is the average conditional entropy defined by

Hr(I | O) = ∑_{y∈O} Hr(I | y) Po(y).

Note that Hr(I | O) measures the amount of information remaining in I after sampling O, and so it can be interpreted as the loss of information about I caused by the channel.

Conditional entropy can also be defined for strings.

Definition 3.11. Let C be a code over the input alphabet I and let D be the set of output strings over the output alphabet O. The r-ary conditional entropy of C, given that d = y1 · · · ym ∈ D, is defined by

Hr(C | d) = ∑_{c∈C} Pb(c | d) logr(1/Pb(c | d)).

The r-ary conditional entropy of C, given D, is defined by

Hr(C | D) = ∑_{d∈D} Hr(C | d) Po(d).

The quantity Ir(I, O) = Hr(I) − Hr(I | O) is the amount of information in I minus the amount of information still in I after knowing O. In other words, Ir(I, O) is the amount of information about I that gets through the channel.

Definition 3.12. The r-ary mutual information of I and O is defined by

Ir(I, O) = Hr(I) − Hr(I | O) = ∑_{x∈I} Pi(x) logr(1/Pi(x)) − Hr(I | O).

Notice that the quantity Ir(I, O) depends upon the input distribution of I as well as the forward channel probabilities Pf (y | x).

We are now ready to define the concept of the capacity of a communications channel. This concept plays a key role in the main results of information theory.

Definition 3.13. The capacity of a communications channel is the maximum mutual information Ir(I, O), taken over all input distributions of I.


Proposition 3.14. Consider a symmetric channel whose input alphabet and output alphabet I both have size r. Then the capacity of this symmetric channel is

1 − ∑_{y ∈ I} Pf(y | x) logr(1/Pf(y | x)),

for any x ∈ I. Furthermore, the capacity is achieved by the uniform input distribution.

Corollary 3.15. The capacity of the binary symmetric channel with crossover probability p is

1 + p log2 p + (1 − p) log2(1 − p).
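As a sanity check on Corollary 3.15 (a sketch of mine, not from the notes), we can maximize I2(I, O) over a grid of input distributions and compare with the closed form. The helper names h2 and bsc_mutual_info are hypothetical, and for convenience the computation uses the identity I(I, O) = H(O) − H(O | I), which equals H(I) − H(I | O).

```python
from math import log2

def h2(t):
    """Binary entropy function: t log2(1/t) + (1 - t) log2(1/(1 - t))."""
    return 0.0 if t in (0.0, 1.0) else t * log2(1 / t) + (1 - t) * log2(1 / (1 - t))

def bsc_mutual_info(p0, p):
    """I2(I, O) for a BSC with crossover p and input distribution (p0, 1 - p0)."""
    q = p0 * (1 - p) + (1 - p0) * p  # probability that the output symbol is 0
    return h2(q) - h2(p)

p = 0.1
# Maximize numerically over a grid of input distributions
best = max(bsc_mutual_info(k / 1000, p) for k in range(1001))
closed_form = 1 + p * log2(p) + (1 - p) * log2(1 - p)  # Corollary 3.15
```

The maximum occurs at the uniform input p0 = 1/2, consistent with Proposition 3.14.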

The Noisy Coding Theorem

It is sometimes said that there are two main results in information theory. One is the Noiseless Coding Theorem, which we discussed in the previous chapter, and the other is the so-called Noisy Coding Theorem.

Before we can state the Noisy Coding Theorem formally, we need to discuss in detail the notion of rate of transmission. Let us suppose that the source information is in the form of strings of length k over the input alphabet I of size r, and that the r-ary block code C consists of codewords of fixed length n over I. Now, since the channel must transmit n code symbols in order to send k source symbols, the rate of transmission is R = k/n source symbols per code symbol. Further, since there are r^k possible source strings, the code must have size at least r^k in order to accommodate all of these strings. Assuming that |C| = r^k, we have k = logr|C| and hence R = logr|C|/n. Thus we have the following.

Definition 3.16. An r-ary block code C of length n and size |C| is called an (n, |C|)-code. The number

R(C) = logr|C| / n

is called the rate of C.

Now, we can state the Noisy Coding Theorem. Let ⌈x⌉ denote the smallest integer greater than or equal to x.

Theorem 3.17 (The Noisy Coding Theorem). Consider a memoryless communications channel with capacity C. For any positive number R < C, there exists a sequence Cn of r-ary block codes and corresponding decision rules fn with the following properties.

(1) Cn is an (n, ⌈r^{nR}⌉)-code. Thus, Cn has length n and rate at least R.
(2) The probability of decoding error of fn approaches 0 as n → ∞.

Roughly speaking, the Noisy Coding Theorem says that, if we choose any transmission rate below the capacity of the channel, there exists a code that can transmit at that rate and yet maintain a probability of decoding error below some predefined limit.

The price we pay for this efficient encoding is that the code length n may be extremely large. Furthermore, the known proofs of this theorem tell us only that such codes must exist, but do not show us how to actually find them.



Exercise

(1) Consider a channel whose input alphabet is the set of all integers between −n and n and whose output is the square of the input. Determine the forward channel probabilities of this channel.

(2) Suppose that codewords from the code {0000, 1111} are being sent over a binary symmetric channel (cf. Example 3.3) with crossover probability p = 0.01. Use the maximum likelihood decision rule to decode the received strings 0000, 0010 and 1010.

(3) Let C be the block code consisting of all 8 binary strings of length 3. Denote the input codeword by i1i2i3 and the received string by o1o2o3. Let B.S.C. denote a binary symmetric channel with crossover probability p = 0.001. Consider the following different channels.
(a) The first channel works as follows: send i1 through the B.S.C. to get o1 and, no matter what i2 and i3 are, choose o2 and o3 randomly.
(b) The second channel works as follows: send i1 through the B.S.C. to get o1, send i2 through the B.S.C. to get o2 and send i3 through the B.S.C. to get o3.
(c) The third channel works as follows: choose o1 = o2 = o3 to be the majority bit among i1, i2 and i3.
Compute the probability of correct decoding for each of these channels, assuming a uniform input distribution. Which channel is best?

(4) Show that for a symmetric channel with uniform input distribution, the output distribution is also uniform.

(5) Let I and O be the input alphabet and the output alphabet of a noiseless communications channel. Show that Hr(I | O) = 0.

(6) Let I and O be the input alphabet and the output alphabet of a communications channel with forward channel probabilities {Pf(y | x) | x ∈ I, y ∈ O}. Suppose that {Pi(x) | x ∈ I} is the input distribution and {Po(y) | y ∈ O} is the output distribution for the channel.
(a) Show that the backward channel probability for x ∈ I and y ∈ O is

Pb(x | y) = Pf(y | x) Pi(x) / Po(y).

(b) Show that for an r-ary symmetric channel,

Ir(I, O) = ∑_{y ∈ O} Po(y) logr(1/Po(y)) − ∑_{y ∈ O} Pf(y | x) logr(1/Pf(y | x)),

for any x ∈ I.
(7) Consider the special case of a binary erasure channel (cf. Example 3.4), which has input alphabet I = {0, 1}, output alphabet O = {0, ?, 1} and channel probabilities Pf(1 | 0) = Pf(0 | 1) = 0, Pf(? | 0) = Pf(? | 1) = p and Pf(0 | 0) = Pf(1 | 1) = 1 − p. Calculate the mutual information I2(I, O) in terms of the input probability Pi(0) = p0. Then determine the capacity of the channel, and an input probability that achieves that capacity.


CHAPTER 4

General Remarks on Codes

Nearest Neighbor Decoding

In general, the problem of finding good codes is a very difficult one. However, by making certain assumptions about the channel, we can at least give the problem a highly intuitive flavor. We begin with a definition.

Definition 4.1. Let x = x1x2 · · · xn and y = y1y2 · · · yn be strings of the same length n over the same alphabet A. The Hamming distance d(x, y) between x and y is the number of positions i in which xi ≠ yi.

For instance, if x = 10112 and y = 20110, then d(x, y) = 2. The following result says that Hamming distance is a metric.
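Definition 4.1 translates directly into code; this short sketch (the function name is my own) recomputes the distance in the example above.

```python
def hamming(x, y):
    """Hamming distance: the number of positions in which x and y differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(1 for a, b in zip(x, y) if a != b)

d = hamming("10112", "20110")  # the strings differ in positions 1 and 5
```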

Proposition 4.2. Let A^n be the set of all strings of length n over the alphabet A. Then the Hamming distance function d : A^n × A^n → N satisfies the following properties. For all x, y and z in A^n,

(1) d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y;
(2) d(x, y) = d(y, x);
(3) d(x, y) ≤ d(x, z) + d(z, y).

In other words, (A^n, d) is a metric space.

Suppose that C is a block code of length n over A. The codewords that are closest to a given received string x are referred to as nearest neighbor codewords. Nearest neighbor decoding, or minimum distance decoding, is the decision rule that decodes a received string as a nearest neighbor codeword. When there is more than one nearest neighbor codeword, we will refer to this situation as a tie. In some cases, we may wish to choose randomly from among the candidates. In other cases, it might be more desirable simply to admit a decoding error. The term complete decoding refers to the case where all received strings are decoded, and the term incomplete decoding refers to the case where we prefer occasionally to simply admit an error, rather than always decode.
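Here is a minimal sketch of nearest neighbor decoding with incomplete decoding on ties (my own illustration; the small binary code of length 5 used below is an arbitrary choice).

```python
def nearest_neighbor_decode(received, code):
    """Decode to the unique closest codeword; return None on a tie
    (incomplete decoding: a tie is reported as a decoding error)."""
    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))
    ranked = sorted(code, key=lambda c: hamming(received, c))
    if len(ranked) > 1 and hamming(received, ranked[0]) == hamming(received, ranked[1]):
        return None  # a tie occurred
    return ranked[0]

code = ["00000", "11100", "00111", "11011"]
ok = nearest_neighbor_decode("10100", code)   # unique nearest neighbor: 11100
tie = nearest_neighbor_decode("01101", code)  # equidistant from 11100 and 00111
```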

There are many channels for which maximum likelihood decoding takes the intuitive form of nearest neighbor decoding. For instance, the r-ary symmetric channel with forward channel probabilities

Pf(xi | xj) = 1 − p if i = j, and Pf(xi | xj) = p/(r − 1) otherwise,

has this property, for p < 1/2.
In implementing nearest neighbor decoding, the following concepts are useful.

Definition 4.3. Let C be a block code with at least two codewords. The minimum distance of C is defined to be

d(C) = min{d(c, d) | c, d ∈ C, c ≠ d}.


An (n, M, d)-code is a block code of size M, length n and minimum distance d. The numbers n, M and d are called the parameters of the code.

Since d(c, d) ≥ 1 for c ≠ d, the minimum distance of a code must be at least 1.

Perfect Code

Definition 4.4. Let x be a string in A^n, where |A| = r, and let ρ > 0. The sphere in A^n with center x and radius ρ is the set

Sr^n(x, ρ) = {y ∈ A^n | d(x, y) ≤ ρ}.

The volume Vr^n(ρ) of the sphere Sr^n(x, ρ) is the number of elements in Sr^n(x, ρ).

This volume is independent of the center and is given by

Vr^n(ρ) = ∑_{k=0}^{⌊ρ⌋} (n choose k)(r − 1)^k,

where ⌊ρ⌋ denotes the greatest integer smaller than or equal to ρ.
We can determine the minimum distance of a code C by simply increasing the radius t of the spheres centered at each codeword of C until just before two spheres become “tangent” (which will happen when d(C) = 2t + 2), or just before two spheres “overlap” (which will happen when d(C) = 2t + 1).
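The volume formula is easy to check by brute force; this sketch (helper names mine) compares it against a direct count over all ternary strings of length 4.

```python
from math import comb, floor
from itertools import product

def volume(n, r, rho):
    """V_r^n(rho) = sum_{k=0}^{floor(rho)} C(n, k) * (r - 1)^k."""
    return sum(comb(n, k) * (r - 1) ** k for k in range(floor(rho) + 1))

# Direct count: ternary strings of length 4 within distance 2 of a fixed center
center = (0, 0, 0, 0)
count = sum(1 for y in product(range(3), repeat=4)
            if sum(a != b for a, b in zip(center, y)) <= 2)
```

Both give 1 + 4·2 + 6·4 = 33, and by symmetry the count is the same for any center.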

Definition 4.5. Let C ⊆ A^n be a code. The packing radius of C is the largest integer ρ for which the spheres Sr^n(c, ρ) centered at each codeword c are disjoint. The covering radius of C is the smallest integer ρ′ for which the spheres Sr^n(c, ρ′) centered at each codeword c cover A^n. We will denote the packing radius of C by pr(C) and the covering radius by cr(C).

Proposition 4.6. The packing radius of an (n, M, d)-code C is pr(C) = ⌊(d − 1)/2⌋.

The following concept plays a major role in coding theory.

Definition 4.7. An r-ary (n, M, d)-code C is perfect if pr(C) = cr(C).

In words, a code C ⊆ A^n is perfect if there exists a number ρ for which the spheres Sr^n(c, ρ) centered at each codeword c are disjoint and cover A^n.
The size of a perfect code is uniquely determined by the length and the minimum distance.

The following result is known as the sphere-packing condition.

Proposition 4.8. Let C be an r-ary (n, M, d)-code. Then C is perfect if and only if d = 2v + 1 is odd and

M · Vr^n(v) = M · ∑_{k=0}^{v} (n choose k)(r − 1)^k = r^n.

It is important to emphasize that the existence of numbers n, M and d = 2v + 1 for which the sphere-packing condition holds does not mean that there is a perfect code with these parameters. The problem of determining all perfect codes has not yet been solved. However, a great deal is known about perfect codes over alphabets whose size is a power of a prime.
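The sphere-packing condition of Proposition 4.8 is a quick arithmetic check. The sketch below (function name mine) verifies it for the parameters of the binary Hamming and Golay codes, which are known to be perfect; as the text cautions, satisfying the condition does not by itself guarantee that a code exists.

```python
from math import comb

def sphere_packing_condition(n, M, d, r=2):
    """True when d = 2v + 1 is odd and M * V_r^n(v) == r^n."""
    if d % 2 == 0:
        return False
    v = (d - 1) // 2
    vol = sum(comb(n, k) * (r - 1) ** k for k in range(v + 1))
    return M * vol == r ** n

hamming_ok = sphere_packing_condition(7, 16, 3)   # binary Hamming code parameters
golay_ok = sphere_packing_condition(23, 4096, 7)  # binary Golay code parameters
rep_ok = sphere_packing_condition(5, 2, 5)        # binary repetition code, n odd
```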


Error Detection and Error Correction

Let u be a positive integer. If u errors occur in the transmission of a codeword, we will say that an error of size u has occurred. It is possible for so many errors to occur as to change the codeword into another codeword, in which case we cannot detect whether any error has occurred at all.

Definition 4.9. A code C is u-error-detecting if, whenever an error of size at least one and at most u has occurred, the resulting string is not a codeword. A code C is exactly u-error-detecting if it is u-error-detecting but not (u + 1)-error-detecting.

The next theorem is essentially just a restatement of the definition of u-error-detecting in terms of minimum distance.

Theorem 4.10. A code C is u-error-detecting if and only if d(C) ≥ u + 1. In particular, C is exactly u-error-detecting if and only if d(C) = u + 1.

Definition 4.11. Let v be a positive integer. A code C is v-error-correcting if nearest neighbor decoding is able to correct v or fewer errors, assuming that if a tie occurs in the decoding process, a decoding error is reported. A code is exactly v-error-correcting if it is v-error-correcting but not (v + 1)-error-correcting.

It should be kept in mind that, as long as the received word is not a codeword, nearest neighbor decoding will decode it as some codeword, but the receiver has no way of knowing whether that codeword is the one that was actually sent. We know only that, under a v-error-correcting code, if no more than v errors were introduced, then nearest neighbor decoding will produce the codeword that was sent.

Theorem 4.12. A code C is v-error-correcting if and only if d(C) ≥ 2v + 1. In particular, C is exactly v-error-correcting if and only if d(C) = 2v + 1 or d(C) = 2v + 2.

Corollary 4.13. A code C has d(C) = d if and only if it is exactly ⌊(d − 1)/2⌋-error-correcting.

The following result is a consequence of Proposition 4.6 and Theorem 4.12. It shows the connection between error correction and pr(C).

Corollary 4.14. Assuming that ties are always reported as errors, a code C is exactly v-error-correcting if and only if pr(C) = v.

Example 4.15. The r-ary repetition code of length n is

Repr(n) = {00 · · · 0, 11 · · · 1, . . . , (r − 1)(r − 1) · · · (r − 1)},

consisting of r codewords, each of length n. The r-ary repetition code of length n can detect up to n − 1 errors in transmission, and so it is exactly (n − 1)-error-detecting. Furthermore, it is exactly ⌊(n − 1)/2⌋-error-correcting.

Suppose that a code C has minimum distance d. If we use C for error detecting only, it can detect up to d − 1 errors. On the other hand, if we want C to also correct errors whenever possible, then it can correct up to ⌊(d − 1)/2⌋ errors, but may no longer be able to detect a situation where more than ⌊(d − 1)/2⌋ but less than d errors have occurred. For if more than ⌊(d − 1)/2⌋ errors are made, nearest neighbor decoding might “correct” the received word to the wrong codeword and thus the errors will go undetected.


We consider the following strategy. Let v be a positive integer. If a string x is received, if the closest codeword c to x is at distance at most v, and if there is only one such codeword, then decode x as c. If there is more than one codeword at minimum distance to x, or if the closest codeword has distance greater than v, then simply declare an error.

Definition 4.16. A code C is simultaneously v-error-correcting and u-error-detecting if, whenever at least one but at most v errors are made, the strategy described above corrects these errors, and whenever at least v + 1 but at most v + u errors are made, the strategy above simply reports an error.

Theorem 4.17. A code C is simultaneously v-error-correcting and u-error-detecting if and only if d(C) ≥ 2v + u + 1.

It is intuitively clear that, given any code C, we may continue to add new codewords to it, so long as doing so does not decrease its minimum distance, until no further such codewords exist. This leads us to make the following definition.

Definition 4.18. An (n, M, d)-code is said to be maximal if it is not contained in any larger code with the same minimum distance, that is, if it is not contained in any (n, M + 1, d)-code.

Thus an (n, M, d)-code C is maximal if and only if, for all strings x ∈ A^n, there is a codeword c ∈ C with the property that d(x, c) < d.

Proposition 4.19. For the binary symmetric channel with crossover probability p, using minimum distance decoding, the probability of a decoding error for a maximal (n, M, d)-code satisfies

∑_{k=d}^{n} (n choose k) p^k (1 − p)^{n−k} ≤ P(decode error) ≤ 1 − ∑_{k=0}^{⌊(d−1)/2⌋} (n choose k) p^k (1 − p)^{n−k}.

Furthermore, for a non-maximal code, the upper bound still holds, but the lower bound may not.

Making New Codes from Old Codes

There are several useful techniques that can be used to obtain new codes from old codes. In the following, we always suppose that our codes are over the alphabet A = Zr = Z/rZ.

Extending a Code. The process of adding one or more additional positions to all the codewords in a code, thereby increasing the length of the code, is referred to as extending the code. The most common way to extend a code is by adding an overall parity check, which is done as follows. If C is an r-ary (n, M, d)-code over Zr, we define the extended code C̄ by

C̄ = {c1c2 · · · cncn+1 | c1c2 · · · cn ∈ C and ∑_{k=1}^{n+1} ck ≡ 0 (mod r)}.

If C is an (n, M, d)-code, then C̄ has parameters n̄ = n + 1, M̄ = M and d̄ = d or d + 1.
We remark that for a binary (n, M, d)-code C, the minimum distance of C̄ depends on the parity of d. In particular, since all of the codewords in C̄ have even sum, the minimum distance of C̄ is even. It follows that if d is even then d(C̄) = d, and if d is odd then d(C̄) = d + 1. Moreover, since ⌊(d(C̄) − 1)/2⌋ = ⌊(d(C) − 1)/2⌋, the error-correcting capabilities of the code do not increase.


Puncturing a Code. The opposite process to extending a code is puncturing a code, in which one or more positions are removed from the codewords. If C is an r-ary (n, M, d)-code and if d ≥ 2, then the code C* obtained by puncturing C once has parameters n* = n − 1, M* = M and d* = d or d − 1.

For binary codes, the processes of extending and puncturing can be used to prove the following useful result.

Lemma 4.20. A binary (n,M, 2v+1)-code exists if and only if a binary (n+1,M, 2v+2)-code exists.

Shortening a Code. Shortening a code refers to the process of keeping only those codewords in a code that have a given symbol in a given position, and then deleting that position. If C is an (n, M, d)-code, then a shortened code has length n − 1 and minimum distance at least d. In fact, shortening a code can result in a substantial increase in the minimum distance, but shortening a code does result in a code with smaller size.

The shortened code formed by taking codewords with an s in the i-th position is referred to as the cross-section xi = s. We will have many occasions to use cross-sections in the sequel.

Augmenting a Code. Augmenting a code simply means adding additional strings to the code. A common way to augment a binary code C is to include the complement of each codeword in C, where the complement of a binary codeword c is the string obtained from c by interchanging all 0’s and 1’s.

Let us denote the complement of c by c^c and denote the set of all complements of the codewords in C by C^c. It is easy to check that if x, y ∈ Z2^n, then d(x, y^c) = n − d(x, y).

Proposition 4.21. Let C be a binary (n, M, d)-code. Suppose that d′ is the maximum distance between codewords in C. Then d(C ∪ C^c) = min{d, n − d′}.

The Direct Sum Construction. If C1 is an r-ary (n1, M1, d1)-code and C2 is an r-ary (n2, M2, d2)-code, the direct sum C1 ⊞ C2 is the code

C1 ⊞ C2 = {cd | c ∈ C1, d ∈ C2}.

Clearly, C1 ⊞ C2 has parameters n = n1 + n2, M = M1M2 and d = min{d1, d2}.

The u(u + v) Construction. A much more useful construction than the direct sum is the following. If C1 is an r-ary (n, M1, d1)-code and C2 is an r-ary (n, M2, d2)-code, then we define a code C1 ⊕ C2 by

C1 ⊕ C2 = {c(c + d) | c ∈ C1, d ∈ C2}.

Certainly, the length of C1 ⊕ C2 is 2n and the size is M1M2. As for the minimum distance, consider two distinct codewords x = c1(c1 + d1) and y = c2(c2 + d2). If d1 = d2, then d(x, y) ≥ 2d1. On the other hand, if d1 ≠ d2, then d(x, y) ≥ d2. Since equality can hold in both cases, we get the following result.

Lemma 4.22. Let C1 be an r-ary (n, M1, d1)-code and C2 be an r-ary (n, M2, d2)-code. Then C1 ⊕ C2 is a (2n, M1M2, d′)-code, where d′ = min{2d1, d2}.
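Lemma 4.22 can be checked on a small instance (my own sketch; helper names are illustrative). Taking C1 to be the even-weight code of length 4 (d1 = 2) and C2 = Rep2(4) (d2 = 4), the construction yields a binary (8, 16, 4)-code, matching min{2d1, d2} = 4.

```python
from itertools import product

def uuv(C1, C2, r=2):
    """The u(u + v) construction: {c (c + d) : c in C1, d in C2}, addition mod r."""
    return [tuple(c) + tuple((a + b) % r for a, b in zip(c, d))
            for c in C1 for d in C2]

def min_distance(code):
    return min(sum(a != b for a, b in zip(x, y))
               for x in code for y in code if x != y)

C1 = [w for w in product(range(2), repeat=4) if sum(w) % 2 == 0]  # d1 = 2
C2 = [(0, 0, 0, 0), (1, 1, 1, 1)]                                 # Rep2(4), d2 = 4
C = uuv(C1, C2)
```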


Equivalence of Codes. There are various definitions of equivalence of codes in the literature. We will adopt the following definition.

Definition 4.23. Two r-ary (n, M)-codes C1 and C2 are equivalent if there exists a permutation σ of the n positions and permutations π1, . . . , πn of the code alphabet for which c1c2 · · · cn ∈ C1 if and only if π1(cσ(1))π2(cσ(2)) · · · πn(cσ(n)) ∈ C2.

In particular, any r-ary code over Zr is equivalent to a code that contains the zero codeword 0 = 00 · · · 0. Furthermore, equivalent codes have the same length, size and minimum distance.

The Main Coding Theory Problem

A good r-ary (n, M, d)-code should have a relatively large size, so that it can be used to encode a large number of source messages, and it should have a relatively large minimum distance, so that it can be used to correct a large number of errors. Not surprisingly, these goals are conflicting.

For given values of n and d, it is customary to let Ar(n, d) denote the largest possible size M for which there exists an r-ary (n, M, d)-code. Any r-ary (n, M, d)-code with M = Ar(n, d) is called an optimal code. The numbers Ar(n, d) play a central role in coding theory and much effort has been expended in attempting to determine their values. In fact, determining the values of Ar(n, d) has come to be known as the main coding theory problem.

Note that in order to show that Ar(n, d) = M, it is enough to show that Ar(n, d) ≤ M and then find a specific r-ary (n, M)-code C for which d(C) ≥ d, which shows that Ar(n, d) ≥ Ar(n, d(C)) ≥ M.

Example 4.24. Let C be a binary (4, M, 3)-code. Without loss of generality, we may assume that C contains the zero codeword 0 = 0000. Now, since d(c, 0) ≥ 3 for any other codeword c in C, every other codeword has at least three 1s. This leaves five possibilities for additional codewords in C, namely:

1110, 1101, 1011, 0111, 1111.

But no two of these are at distance 3 or more from each other, and so only one of them can be included in C. Hence A2(4, 3) = 2.

Example 4.25. Let C be a binary (5, M, 3)-code. Consider the cross-section C0 defined by x1 = 0. We know that C0 has minimum distance d0 where 4 ≥ d0 ≥ 3, and since A2(4, 3) = A2(4, 4) = 2, it follows that C0 has size M0 ≤ 2. Similarly, the cross-section C1 defined by x1 = 1 has size M1 ≤ 2. Thus M = M0 + M1 ≤ 4 and hence A2(5, 3) ≤ 4. On the other hand, the code C = {00000, 11100, 00111, 11011} has minimum distance d(C) = 3 and so A2(5, 3) = 4.
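The second half of Example 4.25 is easy to verify by machine (a brute-force sketch of mine): the exhibited code has minimum distance 3, and no fifth word can be adjoined without dropping below distance 3, consistent with A2(5, 3) = 4.

```python
from itertools import combinations, product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

C = ["00000", "11100", "00111", "11011"]
d = min(hamming(x, y) for x, y in combinations(C, 2))  # minimum distance of C

# Search all 32 binary strings of length 5 for a possible fifth codeword
extendable = [w for w in (''.join(bits) for bits in product('01', repeat=5))
              if w not in C and all(hamming(w, c) >= 3 for c in C)]
```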

The approach used in Example 4.25 will not go very far in determining values of A2(n, d). In fact, very few actual values of A2(n, d) are known. For instance, we only know that 72 ≤ A2(10, 3) ≤ 79.

Let us now turn to the establishment of some general results about the numbers Ar(n, d).

Proposition 4.26. For any n ≥ 1,

(1) Ar(n, d) ≤ r^n for all 1 ≤ d ≤ n;
(2) Ar(n, 1) = r^n;
(3) Ar(n, n) = r.


Let C be an optimal r-ary (n, M, d)-code. By the pigeonhole principle, one of the cross-sections x1 = i of C must contain at least M/r codewords, and so we have the following.

Proposition 4.27. For any n ≥ 2, Ar(n, d) ≤ rAr(n− 1, d).

According to Lemma 4.20, a binary (n, M, 2v + 1)-code exists if and only if a binary (n + 1, M, 2v + 2)-code exists. Hence, we immediately have the following.

Proposition 4.28. If d > 0 is even, then A2(n, d) = A2(n− 1, d− 1).

Thus, for binary codes, it is enough to determine A2(n, d) for all odd values of d.
Let us now turn our attention to some upper and lower bounds on the numbers Ar(n, d) that arise from considering spheres in Zr^n.

Let C = {c1, . . . , cM} be an optimal r-ary (n, M, d)-code over Zr. Thus M = Ar(n, d). Because C has maximal size, there can be no string in Zr^n whose distance from every codeword in C is at least d. In symbols, Zr^n ⊆ ⋃_{i=1}^{M} Sr^n(ci, d − 1). Since |Zr^n| = r^n, this implies that r^n ≤ Vr^n(d − 1) · M. We arrive at the following result, called the sphere-covering bound for Ar(n, d).

Theorem 4.29 (The sphere-covering bound for Ar(n, d)). If Vr^n(ρ) denotes the volume of a sphere of radius ρ in Zr^n, then

r^n / Vr^n(d − 1) ≤ Ar(n, d).

The sphere-covering bound is a lower bound for Ar(n, d). We can derive an upper bound for Ar(n, d) by similar methods. In particular, let C = {c1, . . . , cM} be an optimal (n, M, d)-code. Since pr(C) = ⌊(d − 1)/2⌋ and ⋃_{i=1}^{M} Sr^n(ci, pr(C)) ⊆ Zr^n, we have the sphere-packing bound for Ar(n, d).

Theorem 4.30 (The sphere-packing bound for Ar(n, d)). If Vr^n(ρ) denotes the volume of a sphere of radius ρ in Zr^n, then

Ar(n, d) ≤ r^n / Vr^n(⌊(d − 1)/2⌋).
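Both sphere bounds are one-liners once the volume function is available; this sketch (helper names mine) evaluates them for A2(7, 3), where the sphere-packing bound of 16 is in fact attained by the binary Hamming code.

```python
from math import comb, ceil

def volume(n, r, rho):
    return sum(comb(n, k) * (r - 1) ** k for k in range(rho + 1))

def sphere_covering_lower(n, d, r=2):
    """r^n / V_r^n(d - 1) <= Ar(n, d), rounded up since Ar is an integer."""
    return ceil(r ** n / volume(n, r, d - 1))

def sphere_packing_upper(n, d, r=2):
    """Ar(n, d) <= r^n / V_r^n(floor((d - 1)/2)), rounded down."""
    return r ** n // volume(n, r, (d - 1) // 2)

low = sphere_covering_lower(7, 3)   # 128 / 29, rounded up
high = sphere_packing_upper(7, 3)   # 128 / 8
```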

The sphere-packing bound is not the only useful upper bound on the values of Ar(n, d). We consider two additional bounds.

Let C be an (n, M, d)-code. If we remove the last d − 1 positions from each codeword in C, the resulting shortened codewords must all be distinct. Since the length of the shortened codewords is n − d + 1, we have the following.

Theorem 4.31 (The Singleton bound). Ar(n, d) ≤ r^{n−d+1}.

Example 4.32. According to the Singleton bound, Ar(4, 3) ≤ r^2. On the other hand, the sphere-packing bound gives Ar(4, 3) ≤ r^4/(4r − 3). Thus, for r ≥ 4, the Singleton bound is much better than the sphere-packing bound.

Let C be an r-ary (n, M, d)-code and consider the sum of the distances between codewords, which is given by S = ∑_{c,d ∈ C} d(c, d). Since the minimum distance of C is d, we have S ≥ M(M − 1)d. On the other hand, suppose that the number of j’s in the i-th position of


all codewords in C is kij, where j = 0, . . . , r − 1. Then the i-th position contributes a total of

∑_{j=0}^{r−1} kij(M − kij) = M^2 − ∑_{j=0}^{r−1} kij^2 ≤ M^2 − M^2/r

to S, since the last sum above is smallest when kij = M/r. Since there are n positions, we have M(M − 1)d ≤ S ≤ nM^2(1 − 1/r). Solving for M gives the following result.

Theorem 4.33 (The Plotkin Bound). If n < dr/(r − 1), then

Ar(n, d) ≤ dr / (dr − nr + n).

The Plotkin bound can easily be refined a bit when r = 2.

Theorem 4.34 (The Plotkin Bound for Binary Codes).

(1) If d is even and n < 2d, then

A2(n, d) ≤ 2⌊d/(2d − n)⌋,

and for n = 2d, A2(2d, d) ≤ 4d.
(2) If d is odd and n < 2d + 1, then

A2(n, d) ≤ 2⌊(d + 1)/(2d + 1 − n)⌋,

and for n = 2d + 1, A2(2d + 1, d) ≤ 4d + 4.

The Plotkin bound applies only when the minimum distance d is rather large relative to n. When it applies, it is generally superior to the sphere-packing bound.

Example 4.35. The Plotkin bound can also be used, in conjunction with Proposition 4.27, to give an upper bound when d ≤ n(r − 1)/r. For example, we have A2(13, 5) ≤ 2^3 A2(10, 5) ≤ 96.
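The bound in Example 4.35 can be reproduced mechanically (a sketch of mine; the function encodes Theorem 4.34 and raises an error outside its range of applicability).

```python
def plotkin_binary(n, d):
    """Upper bound on A2(n, d) from Theorem 4.34, when it applies."""
    if d % 2 == 0 and n < 2 * d:
        return 2 * (d // (2 * d - n))
    if d % 2 == 1 and n < 2 * d + 1:
        return 2 * ((d + 1) // (2 * d + 1 - n))
    raise ValueError("the Plotkin bound does not apply to these parameters")

# Apply Proposition 4.27 three times: A2(13, 5) <= 2^3 * A2(10, 5)
bound = 2 ** 3 * plotkin_binary(10, 5)
```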

Exercise

(1) Consider the code C consisting of all strings in Z2^n that have an even number of 1s. What are the length, size, and minimum distance of C?

(2) Let c, d ∈ A^n and consider the sets S = {x ∈ A^n | d(x, c) < d(x, d)} and T = {x ∈ A^n | d(x, c) > d(x, d)}. Show that |S| = |T|.

(3) Construct an explicit example to illustrate that simultaneous error detection and correction can reduce the error-detecting capabilities of a code.

(4) Estimate the probability of a decoding error using the binary repetition code of length 5 under a binary symmetric channel with crossover probability p = 0.001.

(5) Does a binary (8, 4, 5)-code exist? Justify your answer.
(6) Let C be an r-ary (n, M, d)-code over the alphabet Zr. Show that, as long as d < n, there is some position i and a cross-section at that position that has minimum distance d. What can happen if d = n?

(7) Suppose that C is an (n, M, d)-code. Show that C is a cross-section of a larger code with parameters (n + 1, M + 2, 1).

(8) Let C1 = {c1c2c3c4 | c1 + c2 + c3 + c4 ≡ 0 (mod 2)} be the code over Z2.
(a) What are the parameters of C1?


(b) Construct C2 = C1 ⊕ Rep2(4). What are the parameters of C2?
(c) What are the parameters of C3 = C2 ⊕ Rep2(8)?
(d) What are the parameters of C4 = C3 ⊕ Rep2(16)?
(e) Show that we can construct a binary (2^m, 2^{m+1}, 2^{m−1})-code in this fashion.

(9) If C is a code over Zp and C̄ is the code obtained by adding an overall parity check, what is the relation between the minimum distances of C and C̄?

(10) Verify that A2(6, 5) = 2, A2(7, 5) = 2 and A2(8, 5) = 4.
(11) Let C be an (n, M, d)-code.
(a) If C is not maximal, is it always possible to add codewords to C until the resulting code is maximal?
(b) If C is not optimal, is it always possible to add codewords to C until the resulting code is optimal?
(c) Give an example of a code that is maximal but not optimal.
(12) Is there a binary (8, 29, 3)-code? Explain.
(13) Show that Ar(r + 1, 5) ≤ 2r^{r−2}/(r − 1).
(14) Compare the Singleton, Plotkin and sphere-packing upper bounds for A2(9, 5).
(15) Let C be a perfect binary (n, M, 7)-code. Use the sphere-packing condition to show that n = 7 or n = 23.


CHAPTER 5

Linear Codes

Finite Fields

Finite fields play a major role in coding theory and so it is important to gain a solid understanding of the structure of such fields.

Let K and F be fields. If K is an extension of F, we write K/F. In this case, K is also a vector space over F. If the dimension of K over F is finite, we say that K is a finite extension of F and denote this dimension by [K : F]. It is easy to check that if F is a finite field and K is a finite extension of F with d = [K : F], then K is a finite field such that |K| = |F|^d.

If R is a ring and if there exists a positive integer n for which

n · a = a + · · · + a (n terms) = 0

for all a ∈ R, then the smallest such n is called the characteristic of R and is denoted by char(R). If no such positive integer n exists, we say that R has characteristic 0.

In a field of characteristic 0, the elements 1, 1 + 1, 1 + 1 + 1, . . . are all distinct, and so a finite field must have nonzero characteristic. Suppose that the characteristic of a finite field F is n. If n = uv where 1 < u, v < n, then (u · 1)(v · 1) = n · 1 = 0, implying u · 1 = 0 or v · 1 = 0. In either case, we have a contradiction to the fact that n is the smallest positive integer such that n · 1 = 0. Thus, n must be a prime number.

Lemma 5.1. If F is a finite field, then F has prime characteristic. Furthermore, if char(F) = p, then F has p^n elements, for some positive integer n.

From now on, p will represent a prime number and q will represent a prime power.
The following result is a key reason why the theory of finite fields has its characteristic flavor.

Lemma 5.2. If F is a finite field of characteristic p, then

(α + β)^(p^n) = α^(p^n) + β^(p^n),

for any positive integer n and for all α, β ∈ F.
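Lemma 5.2 (sometimes called the “freshman's dream”) is easy to test in the prime field Z/pZ; a quick sketch of mine, with p = 7 as an arbitrary small prime:

```python
p = 7  # any prime; Z/pZ is a field of characteristic p

# (a + b)^p == a^p + b^p holds for all a, b in Z/pZ
frobenius_holds = all(pow(a + b, p, p) == (pow(a, p, p) + pow(b, p, p)) % p
                      for a in range(p) for b in range(p))

# The identity is special to characteristic p: over the integers it fails
fails_in_char_zero = (1 + 1) ** p != 1 ** p + 1 ** p
```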

According to the definition, the set F* of nonzero elements of a field F forms a group under multiplication. If |F| = q, then |F*| = q − 1 and, since the order of every element in a group divides the order of the group, we have α^(q−1) = 1 for all α ∈ F*. In other words, every element of F is a root of the polynomial fq(x) = x^q − x. But since this polynomial has at most q roots, we see that F is the set of all roots of fq(x) and therefore is also the splitting field for fq(x).

Lemma 5.3. If F is a finite field of q elements, then F is both the set of all roots of fq(x) = x^q − x and the splitting field for fq(x).


28 5. LINEAR CODES

Since any two splitting fields for the same polynomial are isomorphic, Lemma 5.3 tells us that any two finite fields of the same size are isomorphic. We will denote a finite field of size q by Fq.

It remains now to determine whether or not there is a finite field of size q for every prime power q = p^n. Let K be the splitting field for fq(x) = x^q − x and let R be the set of roots of fq(x). If α ∈ R and 0 ≠ β ∈ R, then by Lemma 5.2, α + β and αβ^(−1) are also in R. Thus, R is a subfield of K, which implies that R = K. Let us summarize our results.

Theorem 5.4. All finite fields have size q = p^n, for some prime p. On the other hand, for every q = p^n, there exists a unique (up to isomorphism) field of size q.

Our goal now is to describe the subfields of a finite field. Suppose that K is a field of size p^n and let d | n. It is not hard to show that p^d − 1 | p^n − 1 and so x^{p^d} − x | x^{p^n} − x. Hence f_{p^d}(x) = x^{p^d} − x splits into linear factors over K. In other words, K contains a subfield of size p^d.

Theorem 5.5. Let K be a finite field of size p^n. Then K has exactly one subfield of size p^d for each d | n. Furthermore, this accounts for all of the subfields of K.

For a finite field K, the multiplicative group K* could not have a simpler structure: it is cyclic. Recall that if G is a cyclic group of order n, then G contains exactly φ(d) elements of each order d dividing n, where φ is Euler's phi function. This gives the formula

∑_{d | n} φ(d) = n.

Now, suppose that |F*| = q − 1 and α is an element of F* of order d. Thus, d | q − 1. Consider the cyclic subgroup <α> generated by α. Every element of <α> has order dividing d and so is a root of the polynomial x^d − 1. But this polynomial can have at most d roots in F, and so <α> is the set of all roots of x^d − 1. In particular, all of the elements of F of order d must lie in <α>. However, in <α>, there are exactly φ(d) elements of order d. Hence, letting ψ(d) denote the number of elements of F of order d, we have ψ(d) = φ(d) or ψ(d) = 0. Since

∑_{d | q−1} ψ(d) = |F*| = q − 1 = ∑_{d | q−1} φ(d),

and ψ(d) ≤ φ(d) for every d, it follows that ψ(d) = φ(d) for all d | q − 1.

We have the following result.

Theorem 5.6. If F is a finite field of q elements, then F contains exactly φ(d) elements of order d, for each d | q − 1. In particular, the multiplicative group F* of nonzero elements of F is cyclic.
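Theorem 5.6 can be illustrated numerically in the prime field F_13 (a choice of ours; the helper functions below are also our own). Counting the elements of each multiplicative order shows that every order d divides 12 and occurs exactly φ(d) times; in particular, φ(12) = 4 elements generate the cyclic group F_13*:

```python
from math import gcd

def mult_order(a: int, p: int) -> int:
    """Multiplicative order of a modulo the prime p."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

def phi(n: int) -> int:
    """Euler's phi function, by direct count."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

p = 13
counts = {}
for a in range(1, p):
    d = mult_order(a, p)
    counts[d] = counts.get(d, 0) + 1

for d in sorted(counts):
    print(d, counts[d], phi(d))  # counts[d] == phi(d) for every d | 12
```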

Basic Definitions

The set F_q^n of all n-tuples whose components belong to F_q is a vector space over F_q of dimension n. We will write the vector (x1, x2, . . . , xn) in the form x1x2 · · · xn. We can now define the most important and most studied type of code.

Definition 5.7. A code C ⊆ F_q^n that is also a subspace of F_q^n is called a linear code. If C has dimension k and minimum distance d(C) = d, then C is an [n, k, d]-code. When we do not care to emphasize the minimum distance d, we use the notation [n, k]-code. The numbers n, k and d are called the parameters of the linear code.


Note that a linear code C, being a subspace of F_q^n, must contain the zero codeword 0 = 0 · · · 0. Note also that a q-ary linear [n, k, d]-code is an (n, q^k, d)-code.

Since a linear code is a vector space, we can describe it by giving a basis. It is customary to arrange the basis vectors as rows of a matrix.

Definition 5.8. Let C be an [n, k]-code with a basis B = {b1, . . . , bk}. If

b1 = b11 b12 · · · b1n
b2 = b21 b22 · · · b2n
        ...
bk = bk1 bk2 · · · bkn

then the k × n matrix

G = [ b11 b12 · · · b1n ]
    [ b21 b22 · · · b2n ]
    [        ...        ]
    [ bk1 bk2 · · · bkn ]

whose rows are the codewords in B, is called a generator matrix for C.

If C is a q-ary linear [n, k]-code with generator matrix G, then the codewords in C are precisely the elements of the row space of G. Put another way, C = {x · G | x ∈ F_q^k}. Since performing elementary row operations does not change the row space of a matrix, any matrix that is row equivalent to G is also a generator matrix for C. On the other hand, interchanging two columns of G gives us a generator matrix for a code which is equivalent to C.

A generator matrix of the form G = (I_k | M), where I_k is the identity matrix of size k × k and M is a matrix of size k × (n − k), is said to be in left standard form. In view of the previous remarks, every linear code is equivalent to a linear code which has a generator matrix in left standard form. When a k × n generator matrix is in left standard form, it makes both the encoding and decoding processes very simple.

Example 5.9. As we will see later, the matrix

G = [ 1 0 0 0 0 1 1 ]
    [ 0 1 0 0 1 0 1 ]
    [ 0 0 1 0 1 1 0 ]
    [ 0 0 0 1 1 1 1 ]

is a generator matrix for the Hamming code H2(3). The Hamming code H2(3) can encode source words from F_2^4 as follows:

x · G = (x1, x2, x3, x4) G = (x1, x2, x3, x4, x2 + x3 + x4, x1 + x3 + x4, x1 + x2 + x4).

Since G is in left standard form, the original source message appears as the first k symbols of its codeword.
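The encoding map of Example 5.9 can be carried out directly over F_2. This is a minimal sketch using plain Python lists, with names of our own choosing:

```python
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(x, G):
    """Compute the codeword x * G over F_2."""
    n = len(G[0])
    return [sum(xi * row[j] for xi, row in zip(x, G)) % 2 for j in range(n)]

# G is in left standard form, so the source word reappears as the first
# four symbols of its codeword.
print(encode([1, 0, 1, 1], G))  # → [1, 0, 1, 1, 0, 1, 0]
```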


The Dual of a Linear Code

We have seen several ways of constructing new codes from old ones. Now, we describe another method (perhaps the most important one for linear codes).

Definition 5.10. Let x = x1x2 · · · xn and y = y1y2 · · · yn be strings in F_q^n. The inner product of x and y, denoted by x · y, is the element of F_q defined by

x · y = x1y1 + x2y2 + · · · + xnyn,

where the sums and products are taken in F_q.

For any set S ⊆ F_q^n, we let S⊥ denote the set of all strings in F_q^n that are orthogonal to every string in S. Thus,

S⊥ = {x ∈ F_q^n | s · x = 0, ∀ s ∈ S}.

This set is called the orthogonal complement of S.

Lemma 5.11. For any subset S of F_q^n, the set S⊥ is a linear code.

From Lemma 5.11, we have the following definition.

Definition 5.12. The orthogonal complement C⊥ of any code C is a linear code called the dual code of C.

We may apply some basic linear algebra to get the following results, which give some of the most basic properties of dual codes.

Proposition 5.13. Let C be a linear [n, k]-code over F_q, with generator matrix G.

(1) C⊥ is the set of all strings that are orthogonal to every row of G. In symbols,

C⊥ = {x ∈ F_q^n | x · G^t = 0},

where G^t is the transpose of G.
(2) C⊥ is a linear [n, n − k]-code. In other words, dim(C⊥) = n − dim(C).
(3) We have (C⊥)⊥ = C.

We should remark that the properties of the dual of a linear code over a finite field can be quite different from those of the orthogonal complement of a subspace of a vector space over the real numbers. For instance, if W is a subspace of a finite-dimensional real vector space V, then W⊥ ∩ W = {0}, since no nonzero vector is orthogonal to itself. This is not always the case for linear codes over finite fields, however. In fact, as the next example illustrates, we can even have C⊥ = C.

Example 5.14. For the binary [4, 2]-code C = {0000, 1100, 0011, 1111}, we have C ⊆ C⊥, and since C⊥ is also a [4, 2]-code, we get C = C⊥.

Definition 5.15. A linear code C is said to be self-orthogonal if C ⊆ C⊥. A linear code C for which C = C⊥ is said to be self-dual.

It is easy to check that a linear code C with generator matrix G is self-orthogonal if and only if the rows of G are orthogonal to themselves and to each other. Note that a linear [n, k]-code is self-dual if and only if it is self-orthogonal and k = n/2.

By Proposition 5.13 (1), we can describe the dual code as the set of solutions to certain equations. The system of equations x · G^t = 0 is called the parity check equations for the code C⊥. A string x = x1x2 · · · xn ∈ F_q^n is in the dual code C⊥ if and only if its components x1, . . . , xn satisfy the parity check equations for C⊥.

Definition 5.16. A parity check matrix for a linear q-ary [n, k]-code C is a matrix P with the property that

C = {x ∈ F_q^n | x · P^t = 0}.

Note that, unlike a generator matrix, we make no requirement that the rows of P be linearly independent. Of course, parity check matrices in which the rows are linearly independent are smaller and therefore more efficient than other parity check matrices.

Any linear code C has a parity check matrix. In particular, a generator matrix for the dual code C⊥ is a parity check matrix for C. We now have two convenient ways to define a linear code C: by giving a generator matrix or by giving a parity check matrix.

One of the advantages of a generator matrix in left standard form is that such a description makes it easy to encode and decode source messages. Another advantage is that it is easy to construct a parity check matrix from a generator matrix that is in left standard form.

Let G = (I_k | B) be a generator matrix for C. Consider P = (−B^t | I_{n−k}). Then

G P^t = (I_k | B) [ −B      ]
                  [ I_{n−k} ]  = −B + B = O,

where O is the k × (n − k) zero matrix. Hence, the rows of P are orthogonal to the rows of G, and since rank(P) = n − k = dim(C⊥), we deduce that P is a generator matrix for the dual code C⊥. We have the following.

Proposition 5.17. The matrix G = (I_k | B) is a generator matrix for an [n, k]-code C if and only if the matrix P = (−B^t | I_{n−k}) is a parity check matrix for C.
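Proposition 5.17 can be checked mechanically over F_2, where −B^t = B^t. The sketch below (helper names are ours) builds P from the generator matrix of Example 5.9 and verifies that every row of G is orthogonal to every row of P:

```python
def parity_check_from_standard_form(G):
    """Given G = (I_k | B) over F_2, return P = (B^t | I_{n-k})."""
    k, n = len(G), len(G[0])
    B = [row[k:] for row in G]                       # the k x (n-k) block
    Bt = [[B[i][j] for i in range(k)] for j in range(n - k)]
    I = [[int(i == j) for j in range(n - k)] for i in range(n - k)]
    return [Bt[i] + I[i] for i in range(n - k)]

G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]
P = parity_check_from_standard_form(G)
# Every row of G is orthogonal to every row of P.
assert all(sum(g * p for g, p in zip(grow, prow)) % 2 == 0
           for grow in G for prow in P)
print(P)
```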

Example 5.18. The code H2(3) in Example 5.9 has parity check matrix

P = [ 0 1 1 1 1 0 0 ]
    [ 1 0 1 1 0 1 0 ]
    [ 1 1 0 1 0 0 1 ].

In this case, the parity check equations are

x2 + x3 + x4 + x5 = 0
x1 + x3 + x4 + x6 = 0
x1 + x2 + x4 + x7 = 0.

A matrix of the form A = (M | I_k) is said to be in right standard form. By Proposition 5.17, it is easy to go back and forth between generator matrices in left standard form and parity check matrices in right standard form.

The use of parity check matrices that are in right standard form also has some interesting features. For instance, the code H2(3) in Example 5.18 has a parity check matrix in right standard form. A string x = x1x2 · · · x7 is in H2(3) if and only if

x5 = x2 + x3 + x4
x6 = x1 + x3 + x4
x7 = x1 + x2 + x4.

This description of H2(3) is very pleasant, for we can easily generate codewords from it by just picking values for x1, x2, x3 and x4 and substituting, or we can easily determine whether or not a given string is a codeword.


The Minimum Distance of a Linear Code

In order to determine the minimum distance of an arbitrary code C of size M, we need to check each of the M(M − 1)/2 distances d(c, d) between pairs of codewords. For linear codes, we can greatly simplify the task.

Definition 5.19. The weight w(x) of a string x ∈ F_q^n is defined to be the number of nonzero positions in x. The weight of a code C, denoted by w(C), is the minimum weight of all nonzero codewords in C.

Lemma 5.20. d(x, y) = w(x − y) for all strings x, y in F_q^n.

Since for a linear code C, we have that c, d ∈ C implies c − d ∈ C, by Lemma 5.20 we have the following.

Proposition 5.21. If C is a linear code, then d(C) = w(C).
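Proposition 5.21 is easy to confirm by brute force for the self-dual binary [4, 2]-code of Example 5.14, spanned by 1100 and 0011 (a check of our own):

```python
from itertools import product

basis = [(1, 1, 0, 0), (0, 0, 1, 1)]
# All four codewords, as F_2-linear combinations of the basis vectors.
C = [tuple((a * u + b * v) % 2 for u, v in zip(*basis))
     for a, b in product((0, 1), repeat=2)]

def weight(x):
    return sum(1 for xi in x if xi != 0)

def distance(x, y):
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

d = min(distance(x, y) for x in C for y in C if x != y)
w = min(weight(c) for c in C if any(c))
print(d, w)  # → 2 2
```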

It is important to emphasize that Proposition 5.21 holds only for codes which are additive subgroups of F_q^n.

As we have said, a linear code C can be described either by giving a generator matrix G or a parity check matrix P. Both methods have advantages. For instance, it is easier to generate all codewords in C from G. On the other hand, to use P to generate all codewords in C requires solving a system of linear equations. However, it is easier to determine whether or not a given string is in C by using P. Furthermore, there does not seem to be a simple, direct method for determining the minimum weight of a linear code from a generator matrix. However, the following result shows that it is easy to do so from a parity check matrix.

Proposition 5.22. Let P be a parity check matrix for a linear code C. Then the minimum distance of C is the smallest integer r for which there are r linearly dependent columns in P.

Recall the sphere-covering lower bound on A_r(n, d):

r^n / V_r^n(d − 1) ≤ A_r(n, d).

It happens that we can improve upon this bound, in some cases, by considering linear codes and using Proposition 5.22.

Theorem 5.23 (Gilbert-Varshamov bound). There exists a q-ary linear [n, k, d]-code if

q^k < q^n / V_q^{n−1}(d − 2).

Thus, if q^k is the largest power of q satisfying this inequality, we have A_q(n, d) ≥ q^k.

The inequality displayed in Theorem 5.23 is known as the Gilbert-Varshamov inequality. The following example shows that the Gilbert-Varshamov bound can be better than the sphere-covering bound.

Example 5.24. The sphere-covering bound says that A2(5, 3) ≥ 2. On the other hand, the Gilbert-Varshamov bound says that there exists a binary linear (5, 2^k, 3)-code if 2^k < 32/5, and so we may take k = 2, showing that there is a binary linear (5, 4, 3)-code, whence A2(5, 3) ≥ 4.
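The arithmetic behind Example 5.24 (binary, n = 5, d = 3) can be written out as a small script; here V(n, r, q) denotes the volume of a Hamming ball of radius r in F_q^n, and the variable names are our own:

```python
from math import comb

def V(n: int, r: int, q: int = 2) -> int:
    """Volume of a Hamming ball of radius r in F_q^n."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

n, d, q = 5, 3, 2

# Sphere-covering bound: A_2(5, 3) >= 2^5 / V(5, 2) = 32 / 16 = 2.
sphere_covering = q ** n // V(n, d - 1, q)

# Gilbert-Varshamov: a binary linear [5, k, 3]-code exists while
# 2^k < 2^5 / V(4, 1) = 32 / 5, so the largest such k is 2.
k = 0
while q ** (k + 1) * V(n - 1, d - 2, q) < q ** n:
    k += 1
gilbert_varshamov = q ** k

print(sphere_covering, gilbert_varshamov)  # → 2 4
```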


Correcting Errors in a Linear Code

Nearest neighbor decoding involves finding a codeword closest to the received string. There are better methods for decoding with linear codes.

Let us recall a few simple facts about quotient spaces. If W is a subspace of a vector space V over K, the quotient space of V modulo W is defined by V/W = {v + W | v ∈ V}. The set v + W = {v + w | w ∈ W} is called a coset of W. The quotient space is also a vector space over K, where λ(v + W) = λv + W and (v + W) + (v′ + W) = (v + v′) + W for all λ ∈ K and v, v′ ∈ V. Recall that v + W = v′ + W if and only if v − v′ ∈ W.

Now let us suppose that a string x ∈ F_q^n is received. Nearest neighbor decoding requires that we decode x as a codeword c for which x − c has smallest weight. But as c ranges over a linear code C, x − c ranges over the coset x + C. Hence, nearest neighbor decoding requires that we decode x as the codeword c = x − f, where f is a string in x + C of smallest weight.

Let C be a q-ary linear [n, k]-code. The process can be described in terms of a so-called standard array for C:

0            c1               c2               · · ·  c_{q^k}
f2           f2 + c1          f2 + c2          · · ·  f2 + c_{q^k}
f3           f3 + c1          f3 + c2          · · ·  f3 + c_{q^k}
...
f_{q^{n−k}}  f_{q^{n−k}} + c1  f_{q^{n−k}} + c2  · · ·  f_{q^{n−k}} + c_{q^k}

The first row of the array consists of the codewords in C. To form the second row, we choose a string f2 of smallest weight that is not in the first row and add it to each codeword of the first row. This forms the coset f2 + C. In general, the i-th row of the array is formed by choosing a string fi of smallest weight that is not yet in the array and adding it to each codeword of the first row, to form the coset fi + C. The elements fi are called the coset leaders of the array.

The following basic facts about standard arrays will be used repeatedly.

Lemma 5.25. Let C be a q-ary linear [n, k]-code with standard array A.

(1) Every string in F_q^n appears exactly once in A.
(2) The number of rows of A is q^{n−k}.
(3) Two strings x and y in F_q^n lie in the same coset (row) of A if and only if their difference x − y is in C.
(4) Each coset leader has minimum weight among all strings in its coset.

Example 5.26. A standard array for the binary [4, 2]-code C = {0000, 1011, 0110, 1101} is

0000 1011 0110 1101
1000 0011 1110 0101
0100 1111 0010 1001
0001 1010 0111 1100

We remark that standard arrays are not unique. For instance, in the array of the previous example, we could have chosen 0010 to be the coset leader for the third row of the array.
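The construction of a standard array can be sketched in a few lines (our own code; since standard arrays are not unique, the array produced depends on how ties between minimum-weight candidates are broken, so it need not match the table in Example 5.26 row for row):

```python
from itertools import product

C = [(0, 0, 0, 0), (1, 0, 1, 1), (0, 1, 1, 0), (1, 1, 0, 1)]

def add(x, y):
    return tuple((a + b) % 2 for a, b in zip(x, y))

array, seen = [], set()
for f in sorted(product((0, 1), repeat=4), key=sum):   # candidates by weight
    if f in seen:
        continue                  # already placed in an earlier coset
    row = [add(f, c) for c in C]  # the coset f + C, led by f
    array.append(row)
    seen.update(row)

for row in array:
    print(row)
```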

We now come to the purpose of standard arrays, which is to implement nearest neighbor decoding.


Proposition 5.27. Let C be a q-ary [n, k]-code with standard array A. For any string x in F_q^n, the codeword c that lies at the top of the column containing x is a nearest neighbor codeword to x.

Notice that the difference x − c between the received string x and the nearest neighbor interpretation c at the top of the column containing x is the coset leader for the coset containing x. This coset leader is called the error string.

Nearest neighbor ties are always decided when using a standard array. Thus, standard array decoding is complete decoding. Recall that if C is a linear [n, k, d]-code, then it is v-error-correcting, where v = ⌊(d − 1)/2⌋. Put another way, any errors that result in an error string of weight v or less are corrected. It follows that the coset leaders of any standard array for C must include all strings of weight v or less.

One of the advantages of parity check matrices is that they can be used for an efficient implementation of nearest neighbor decoding.

Definition 5.28. Let P be a parity check matrix for a linear code C ⊆ F_q^n. The syndrome S(x) of a string x ∈ F_q^n is the product x · P^t.

We remark that the syndrome function S has the properties that S(x + y) = S(x) + S(y) and S(λx) = λS(x). Note also that the parity check equation x · P^t = 0 is equivalent to S(x) = 0, and so x ∈ C if and only if S(x) = 0.

The main importance of the syndrome comes from the following lemma.

Lemma 5.29. Let C be a linear code. Two strings x and y are in the same coset of any standard array for C if and only if they have the same syndrome.

Recall that under nearest neighbor decoding, the error string e in a received word x is the coset leader of the coset containing x, and that the nearest neighbor codeword is c = x − e. But the syndrome of x is equal to the syndrome of e, and since the syndromes of the coset leaders are all distinct, we can find e simply by comparing the syndrome of x to the syndromes of the coset leaders.

Now, nearest neighbor decoding can be implemented by the following simple algorithm.

(1) Compute the syndrome S(x) of the received string x.
(2) Compare it with the list of syndromes of the coset leaders {fi}. If S(x) = S(fi), then fi is the error string and c = x − fi is a nearest neighbor codeword.

Thus, we need only a list of coset leaders and their syndromes, which we refer to as a syndrome table for C:

coset leader   syndrome
0              0
f2             S(f2)
f3             S(f3)
...            ...
f_{q^{n−k}}    S(f_{q^{n−k}})

This process is referred to as syndrome decoding.

Note that a standard array for a q-ary [n, k]-code C has q^{n−k} rows. If P is a parity check matrix for C and P has linearly independent rows, then it has size (n − k) × n and therefore each syndrome x · P^t is an element of F_q^{n−k}. Since |F_q^{n−k}| = q^{n−k}, we conclude that the set of syndromes is precisely the entire space F_q^{n−k}.
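Syndrome decoding for the binary [4, 2]-code of Example 5.26 can be sketched as follows. The parity check matrix is built by hand from the generator matrix G = (I_2 | B) with rows 1011 and 0110, so B = [[1, 1], [1, 0]] and P = (B^t | I_2); all names here are our own:

```python
from itertools import product

P = [(1, 1, 1, 0),
     (1, 0, 0, 1)]

def syndrome(x):
    """S(x) = x * P^t over F_2."""
    return tuple(sum(a * b for a, b in zip(x, row)) % 2 for row in P)

# Syndrome table: for each syndrome, a coset leader of least weight.
table = {}
for e in sorted(product((0, 1), repeat=4), key=sum):
    table.setdefault(syndrome(e), e)

def decode(x):
    """Nearest neighbor decoding via the syndrome table."""
    e = table[syndrome(x)]                        # the error string
    return tuple((a - b) % 2 for a, b in zip(x, e))

# Transmit 1011; an error in the first position gives the received
# string 0011, whose coset leader is 1000.
print(decode((0, 0, 1, 1)))  # → (1, 0, 1, 1)
```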


Let C be a linear code. We have seen that syndrome decoding results in the correct codeword if and only if the error made in transmission is one of the coset leaders. Assuming a channel wherein the probability that a code symbol is changed to any other code symbol is p, if we let wi be the number of coset leaders that have weight i, then the probability of correct decoding is

P(correct decoding) = ∑_{i=0}^{n} wi p^i (1 − p)^{n−i}.

In general, the problem of determining the number wi of coset leaders of weight i is quite difficult. However, in the case of perfect linear [n, k, d]-codes, we can easily determine these numbers. By using the result in Exercise 11, we have wi = (n choose i) for 0 ≤ i ≤ ⌊(d − 1)/2⌋ and wi = 0 for i > ⌊(d − 1)/2⌋.

An error in the transmission of a codeword c will go undetected if and only if the error string is a nonzero codeword. Hence, if Ai denotes the number of codewords of weight i, then for the channel wherein the probability that a code symbol is changed to any other code symbol is p, the probability of an undetected error is

P(undetected error) = ∑_{i=1}^{n} Ai p^i (1 − p)^{n−i}.

In defining a communications channel, we included the requirement that symbol errors be independent of time. While these assumptions make life a lot simpler, they are not always realistic. This leads us to the concept of a burst error.

Definition 5.30. A burst in F_q^n of length b is a string in F_q^n whose nonzero coordinates are confined to b consecutive positions, the first and last of which must be nonzero.

For example, the string 0001100100 in F_2^10 is a burst of length 5. Note that not all of the coordinates between the first and last 1s need be nonzero.

Note that if a linear code is to correct any burst of length b or less, then no such burst can be a codeword. The following lemma will be useful.

can be a codeword. The following lemma will be useful.

Lemma 5.31. Let C be a linear [n, k]-code over Fq. If C contains no bursts of length bor less, then k ≤ n− b.

We have seen that the more errors we expect a code to detect or correct, the smaller must be the size of the code. This situation for burst error detection is settled by the following result.

Proposition 5.32. If a linear [n, k]-code C can detect all burst errors of length b or less, then k ≤ n − b. Furthermore, there is a linear [n, n − b]-code that will detect all burst errors of length b or less.

Now let us consider burst correction.

Proposition 5.33. If a linear [n, k]-code C can correct all burst errors of length b or less, using nearest neighbor decoding, then k ≤ n − 2b.

If a code can correct any burst error of length b or less, then no two such bursts can lie in the same coset of a standard array of C. Thus, by counting the total number of bursts of length b or less, we get a lower bound on the number of cosets of C, and hence an upper bound on the dimension of C.


Proposition 5.34. If a linear [n, k]-code C over F_q can correct all burst errors of length b or less, using nearest neighbor decoding, then

k ≤ n − b + 1 − log_q[(n − b + 1)(q − 1) + 1].

Finally, we introduce a procedure referred to as majority logic decoding. This procedure often provides a simple method for decoding a linear code.

Definition 5.35. A system of parity check equations for a linear code is said to be orthogonal with respect to the variable xi provided xi appears in every equation of the system, but each other variable appears in exactly one equation.

Now suppose a system of parity check equations is orthogonal with respect to xi and suppose that a single error occurs in transmission. If the error is in the i-th position, then xi is incorrect, but all other xj are correct. Hence, each equation will be unsatisfied. On the other hand, if the error is in any position other than the i-th position, then exactly one of the equations will be unsatisfied. Thus, the number of unsatisfied equations will tell us whether or not the i-th position in the received string is correct (assuming a single error).

More generally, suppose we have r parity check equations which are orthogonal with respect to the variable xi. Suppose further that t ≤ r/2 errors have occurred in transmission. If one of the errors is in the i-th position, then at most t − 1 of the equations can be corrected by the remaining errors, and so at least r − (t − 1) ≥ r/2 + 1 equations will be unsatisfied. On the other hand, if the i-th position does not suffer an error, then at most t ≤ r/2 equations will be unsatisfied. Therefore, the i-th position in the received string is in error if and only if a majority of the equations are unsatisfied. This is majority logic decoding.

Exercise

(1) Let F be an arbitrary field. Prove that if F* is cyclic, then F must be a finite field.
(2) Is the code En consisting of all codewords in F_2^n with even weight a linear code? If so, give a basis, state the dimension and find the minimum weight.

(3) Prove that for a binary linear code C, either all of the codewords have even weight or else exactly half of the codewords have even weight.

(4) Write out all of the codewords for the ternary code with generator matrix

[ 1 0 1 1 ]
[ 0 1 1 2 ]

and find the parameters of the code. Show that it is perfect.
(5) Let C be a binary linear code. Let C^c be the set of complements (c.f. Chapter 4) of codewords in C. Let 1 = 1 · · · 1 ∈ F_2^n.
(a) Show that if 1 ∈ C then C^c = C.
(b) Is C^c also a linear code?
(c) Show that C ∪ C^c is a linear code.

(6) Let C be a linear code and let C̄ (c.f. Chapter 4) be the extended code defined by adding an overall parity check to C.
(a) Show that C̄ is also a linear code.
(b) If P is a parity check matrix for C, what is a parity check matrix for C̄?

(7) Prove that a binary self-dual [n, n/2]-code exists for all positive even integers n.
(8) Let G be a generator matrix for a q-ary linear code C. Show that C is self-dual if and only if distinct rows of G are orthogonal and each row of G has weight divisible by q.


(9) Show that there is no binary linear [90, 78, 5]-code.
(10) Let 1 = 1 · · · 1 ∈ F_2^n.
(a) Show that if C is a binary self-orthogonal code, then all codewords in C have even weight and 1 ∈ C⊥.
(b) Suppose that n is odd. Show that if C is a binary [n, (n − 1)/2]-code, then C⊥ is generated by any basis for C together with the string 1.
(11) Let C be a linear [n, k, d]-code with standard array A. Show that C is perfect if and only if the coset leaders of A are precisely the strings of weight ⌊(d − 1)/2⌋ or less.

(12) Let A and B be mutually orthogonal subsets of F_q^n, that is, a · b = 0 for all a ∈ A and b ∈ B. Suppose furthermore that |A| = q^k and |B| > q^{n−k−1}. Show that A is a linear code.


CHAPTER 6

Some Linear Codes

Maximum Distance Separable Codes

For fixed n and k, we may ask for the largest minimum distance d among all linear [n, k]-codes. This problem has a very simple answer and leads to some fascinating theory. The Singleton bound (Theorem 4.31) or Proposition 5.22 implies the following.

Lemma 6.1. For a linear [n, k]-code, we must have d ≤ n − k + 1.

Definition 6.2. A linear [n, k]-code with minimum distance d = n − k + 1 is called a maximum distance separable code, or an MDS code.

It is not hard to see that q-ary MDS codes exist with parameters [n, n, 1], [n, 1, n] and [n, n − 1, 2]. These codes are referred to as the trivial MDS codes. Thus, any nontrivial MDS [n, k]-code must satisfy 2 ≤ k ≤ n − 2.

Proposition 5.22 says that a linear code has minimum distance d if and only if any d − 1 columns of a parity check matrix are linearly independent but some d columns are linearly dependent. Thus we have the following.

Lemma 6.3. Let C be a linear [n, k]-code with parity check matrix P. Then C is MDS if and only if any n − k columns of P are linearly independent.

If we choose the parity check matrix P of C with the property that the rows of P are linearly independent, then P is a generator matrix for the dual code C⊥. We can characterize MDS codes in terms of their generator matrices.

Proposition 6.4. Let C be a linear [n, k]-code with generator matrix G. Then C is MDS if and only if C⊥ is MDS. Furthermore, we have that C is MDS if and only if any k columns of G are linearly independent.

Here is another beautiful characterization of MDS codes.

Proposition 6.5. Let C be an [n, k]-code with generator matrix G = (I_k | M) in left standard form. Then C is an MDS code if and only if every square submatrix of M is nonsingular.

The support of a vector x ∈ F_q^n is the set of all coordinate positions where x is nonzero. Our next result characterizes MDS codes in yet another way.

Proposition 6.6. A linear [n, k, d]-code C is an MDS code if and only if given any d coordinate positions, there is a codeword whose support is precisely these positions.

Since MDS codes are very special, it is not surprising that the existence of such a code puts strong constraints on the possible values of the parameters of the code.

Lemma 6.7. There are no nontrivial MDS [n, k]-codes for which 1 ≤ k ≤ n − q.


By applying Lemma 6.7 to the dual code C⊥, we get the dual result.

Corollary 6.8. There are no nontrivial MDS [n, k]-codes for which q ≤ k ≤ n.

Lemma 6.7 and Corollary 6.8 can be restated as follows.

Proposition 6.9. If a nontrivial MDS [n, k]-code exists, then n − q + 1 ≤ k ≤ q − 1.

This proposition rules out nontrivial binary MDS codes.

Corollary 6.10. The only binary MDS codes are the trivial codes.

One of the most important problems related to MDS codes is the following. Given k and q, find the largest value of n for which there exists a q-ary MDS [n, k]-code. Let us denote this value of n by m(k, q). According to Proposition 6.9, m(k, q) ≤ k + q − 1.

It is not difficult to construct a family of MDS codes. Let α1, . . . , αu be distinct nonzero elements from a field. The Vandermonde matrix based on these elements is

V(α1, . . . , αu) =

[ 1          1          · · ·  1          ]
[ α1         α2         · · ·  αu         ]
[ α1^2       α2^2       · · ·  αu^2       ]
[            ...                          ]
[ α1^{u−1}   α2^{u−1}   · · ·  αu^{u−1}   ].

Lemma 6.11. The determinant of the Vandermonde matrix is

det[V(α1, . . . , αu)] = ∏_{1 ≤ i < j ≤ u} (αj − αi).
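Lemma 6.11 can be checked in exact integer arithmetic (our own helper code, using naive cofactor expansion):

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def vandermonde(alphas):
    """Rows are the powers 0, 1, ..., u-1 of the given elements."""
    return [[a ** i for a in alphas] for i in range(len(alphas))]

alphas = [1, 2, 3, 5]
lhs = det(vandermonde(alphas))
rhs = 1
for i, j in combinations(range(len(alphas)), 2):
    rhs *= alphas[j] - alphas[i]
print(lhs, rhs)  # → 48 48
```

Since the determinant is a product of differences, it is nonzero exactly when the αi are distinct, which is what makes the Vandermonde construction useful for MDS codes.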

Now let F_q = {0, α1, . . . , α_{q−1}} and consider the following (q − k + 1) × (q + 1) matrix, obtained from a Vandermonde matrix by adding two additional columns:

H1 = [ 1          · · ·  1              1  0 ]
     [ α1         · · ·  α_{q−1}        0  0 ]
     [ α1^2       · · ·  α_{q−1}^2      0  0 ]
     [            ...                        ]
     [ α1^{q−k}   · · ·  α_{q−1}^{q−k}  0  1 ]

where 1 ≤ k ≤ q. Using Lemma 6.11, any q − k + 1 columns of H1 form a nonsingular matrix. Therefore, we have the following.

Proposition 6.12. For 1 ≤ k ≤ q, the matrix H1 is a parity check matrix of a q-ary MDS [q + 1, k]-code.

Notice that we cannot, in general, add additional columns to the matrix H1 and expect to obtain a parity check matrix for an MDS code. For instance, consider the matrix

H2 = [ 1     · · ·  1            1  0  0 ]
     [ α1    · · ·  α_{q−1}      0  1  0 ]
     [ α1^2  · · ·  α_{q−1}^2    0  0  1 ].

Choosing 2 columns from among the first q − 1 columns, along with the (q + 1)-th column, we get

[ 1     1     0 ]
[ αi    αj    1 ]
[ αi^2  αj^2  0 ].


This matrix has determinant αi^2 − αj^2. In order for this to be nonzero for every choice of distinct αi and αj, the field F_q must have the property that if α ≠ β then α^2 ≠ β^2. This says that the characteristic of F_q must be 2, that is, q must be a power of 2. We have the following.

Proposition 6.13. For q = 2^m, the matrix H2 is the parity check matrix of a q-ary MDS [q + 2, q − 1]-code.

Taking into account the dual codes, Propositions 6.12 and 6.13 give the following.

Corollary 6.14. For 1 ≤ k ≤ q, there exist q-ary MDS [q + 1, k]-codes and [q + 1, q − k + 1]-codes. For q = 2^m, there exist q-ary MDS [q + 2, q − 1]-codes and [q + 2, 3]-codes.

We remark that for k ≥ 3 and q odd, we can improve upon this slightly. Thus, for a nontrivial q-ary MDS [n, k]-code, with k ≥ 3 and q odd, we have n ≤ q + k − 2. At this point, we have gathered enough information to determine the value of m(3, q). In fact, we have

m(3, q) = q + 1 if q is odd,
m(3, q) = q + 2 otherwise.

It has been conjectured that, except for the case k = 3 and q even, if there exists a nontrivial MDS [n, k]-code, then m(k, q) = q + 1.

Hamming Codes

The Hamming codes Hq(h) are probably the most famous of all error-correcting codes. They are perfect, linear codes that decode in a very elegant manner.

For a given code alphabet Fq, we can construct a parity check matrix P with h rows and with the maximum possible number of columns such that no two of its columns are linearly dependent (but some set of three columns is linearly dependent). First, pick any nonzero column v1 in Fq^h. Then pick any nonzero column v2 in Fq^h \ {αv1 | α ≠ 0}. We continue to pick nonzero columns, discarding all nonzero scalar multiples of each chosen column, until every nonzero column has been either chosen or discarded. The result is a parity check matrix with (q^h − 1)/(q − 1) columns and with the properties we want.
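The column-selection procedure above is easy to carry out by machine. Below is a minimal sketch for a prime q (so that arithmetic mod q gives the field Fq); the function name hamming_columns is ours, not from the text. Keeping exactly the columns whose first nonzero entry is 1 selects one representative from each set of nonzero scalar multiples.

```python
from itertools import product

def hamming_columns(q, h):
    """One column per set of nonzero scalar multiples in Fq^h, chosen so
    that the first nonzero entry is 1 (sketch; assumes q is prime)."""
    cols = []
    for v in product(range(q), repeat=h):
        if any(v):
            first = next(a for a in v if a)   # first nonzero entry
            if first == 1:                    # canonical representative
                cols.append(v)
    return cols

print(len(hamming_columns(3, 3)))  # (3^3 - 1)/(3 - 1) = 13
```

The count matches (q^h − 1)/(q − 1), since each of the q^h − 1 nonzero columns lies in exactly one set of q − 1 scalar multiples.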

The resulting matrix, known as a Hamming matrix of order h, has the following property.

Theorem 6.15. The Hamming matrix of order h is a parity check matrix of a q-ary linear [n, k, 3]-code with parameters

n = (q^h − 1)/(q − 1),  k = n − h,  d = 3.

This code Hq(h) is known as a q-ary Hamming code of order h. It is an exactly single-error-correcting perfect code.

Notice that the choice of columns is not unique and so there are many different Hamming matrices and Hamming codes with a given set of parameters. However, any Hamming matrix can be obtained from any other with the same parameters by permuting the columns and multiplying some columns by nonzero scalars. Hence any two Hamming codes of the same size are equivalent.

Example 6.16. The binary case is by far the most common, where H2(h) is a binary linear [2^h − 1, 2^h − 1 − h, 3]-code. For instance, the parity check matrix for the binary Hamming


code H2(3) is

H2(3) =
[ 0 0 0 1 1 1 1 ]
[ 0 1 1 0 0 1 1 ]
[ 1 0 1 0 1 0 1 ]

Notice that the i-th column of H2(3) is simply the binary representation of i. Now, if a single error occurs in transmission in the i-th position, resulting in the error vector ei, the syndrome of the received word is eiH2(3)^t, which is just the i-th column of H2(3) written as a row.

The previous example leads to the following.

Proposition 6.17. If a codeword from the binary Hamming code H2(h) suffers a single error, resulting in the received string x, then the syndrome S(x) = xH2(h)^t is the binary representation of the position in x of the error.
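Proposition 6.17 can be checked directly. The sketch below builds H2(3) from the binary representations of 1 through 7 and shows that flipping one position of a codeword produces that position as the syndrome; the particular codeword c is our own choice, verified by the syndrome computation itself.

```python
# Binary Hamming code H2(3): column i of the parity check matrix is the
# binary representation of i, so the syndrome of a single error in
# position i is i written in binary (Proposition 6.17).
H = [[(i >> (2 - r)) & 1 for i in range(1, 8)] for r in range(3)]  # 3 x 7

def syndrome(x):
    return [sum(a * b for a, b in zip(x, row)) % 2 for row in H]

c = [1, 0, 0, 1, 1, 0, 0]    # a codeword: syndrome(c) == [0, 0, 0]
x = c[:]
x[5 - 1] ^= 1                # single error in position 5
s = syndrome(x)
print(4 * s[0] + 2 * s[1] + s[2])  # 5, the error position
```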

In the non-binary case, we can do almost as well by choosing the columns of the parity check matrix Hq(h) in increasing order as q-ary numbers, but requiring that the first nonzero entry in each column is 1. For instance, the parity check matrix for H3(3) is

H3(3) =
[ 0 0 0 0 1 1 1 1 1 1 1 1 1 ]
[ 0 1 1 1 0 0 0 1 1 1 2 2 2 ]
[ 1 0 1 2 0 1 2 0 1 2 0 1 2 ]

Now, if an error occurs in the i-th position, the error will have the form αei for some nonzero scalar α. Hence, the syndrome is αeiHq(h)^t, which is α times the i-th column of Hq(h) written as a row. Because of the way Hq(h) was constructed, we see that α is the first nonzero entry in the syndrome. Multiplying the syndrome by α^(−1) will give us the i-th column of Hq(h), telling us the position of the error.
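A sketch of this normalize-and-look-up procedure for H3(3), using the parity check matrix displayed above (the helper names are ours):

```python
# Single-error decoding for the ternary Hamming code H3(3): normalize the
# syndrome so its first nonzero entry is 1; the result is a column of
# H3(3), whose index is the error position, and the normalizing factor is
# the error magnitude.
H = [[0,0,0,0,1,1,1,1,1,1,1,1,1],
     [0,1,1,1,0,0,0,1,1,1,2,2,2],
     [1,0,1,2,0,1,2,0,1,2,0,1,2]]

def decode_single_error(x):
    s = [sum(a * b for a, b in zip(x, row)) % 3 for row in H]
    if not any(s):
        return None                      # zero syndrome: no error detected
    alpha = next(v for v in s if v)      # first nonzero entry = error value
    inv = pow(alpha, -1, 3)              # its inverse mod 3 (Python 3.8+)
    col = [(inv * v) % 3 for v in s]
    pos = [list(c) for c in zip(*H)].index(col) + 1
    return pos, alpha

c = [0] * 13                             # 0 is always a codeword
x = c[:]
x[6] = 2                                 # error 2*e7
print(decode_single_error(x))            # (7, 2)
```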

Since the Hamming codes have some special properties, it is not surprising that their dual codes also have special properties. We will restrict attention to binary codes. The dual of the binary Hamming code H2(h) is called the simplex code S(h). Since the rows of the parity check matrix H2(h) for H2(h) are linearly independent, H2(h) is a generator matrix for S(h).

The simplex code S(h) is a [2^h − 1, h]-code. To determine the distance properties of the simplex codes, we observe that the generator matrix H2(h + 1) can be obtained from two copies of the matrix H2(h) as follows

H2(h + 1) =
[ 0 · · · 0   1   1 · · · 1 ]
[   H2(h)    0     H2(h)    ]

where the 0 in the bottom block denotes a column of zeros.

Now, any codeword c ∈ S(h+1) is a sum of some of the rows of H2(h+1). Hence, c = xαy, where x is a sum of rows of H2(h) and is therefore a codeword in S(h), α is either 0 or 1, and y is equal to x or xc (the complement of x), depending upon whether or not the first row of H2(h + 1) is included in the sum. These cases are summarized in the following theorem, which completely describes the simplex codes.

Theorem 6.18. The simplex code S(h) can be described as follows.

(1) S(2) = {000, 011, 101, 110}.


(2) For any integer h ≥ 2,

S(h + 1) = {x0x | x ∈ S(h)} ∪ {x1xc | x ∈ S(h)}.

Furthermore, d(c, d) = 2^(h−1) for every pair of distinct codewords c and d in S(h).

Theorem 6.18 explains why the codes S(h) are referred to as simplex codes: the line segments connecting the codewords form a regular simplex.
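The recursion of Theorem 6.18 makes the equidistance property easy to verify by direct computation; the following sketch (function name ours) builds S(h) from S(2):

```python
# Recursive construction of the simplex codes from Theorem 6.18:
# S(h+1) = {x0x : x in S(h)} ∪ {x1x^c : x in S(h)}; every pair of
# distinct codewords then sits at distance 2^(h-1).
def simplex(h):
    if h == 2:
        return ["000", "011", "101", "110"]
    comp = lambda s: s.translate(str.maketrans("01", "10"))
    prev = simplex(h - 1)
    return [x + "0" + x for x in prev] + [x + "1" + comp(x) for x in prev]

S3 = simplex(3)
dists = {sum(a != b for a, b in zip(c, d))
         for c in S3 for d in S3 if c != d}
print(sorted(dists))  # [4]: all pairwise distances equal 2^(3-1)
```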

Golay Codes

There are a total of four Golay codes: two binary codes and two ternary codes. We will define these codes by giving generator matrices, as did Marcel Golay in 1949.

The Binary Golay Code G24. The binary Golay code G24 is a [24, 12]-code whose generator matrix has the form G = [I12 | A], where

A =
[ 0 1 1 1 1 1 1 1 1 1 1 1 ]
[ 1 1 1 0 1 1 1 0 0 0 1 0 ]
[ 1 1 0 1 1 1 0 0 0 1 0 1 ]
[ 1 0 1 1 1 0 0 0 1 0 1 1 ]
[ 1 1 1 1 0 0 0 1 0 1 1 0 ]
[ 1 1 1 0 0 0 1 0 1 1 0 1 ]
[ 1 1 0 0 0 1 0 1 1 0 1 1 ]
[ 1 0 0 0 1 0 1 1 0 1 1 1 ]
[ 1 0 0 1 0 1 1 0 1 1 1 0 ]
[ 1 0 1 0 1 1 0 1 1 1 0 0 ]
[ 1 1 0 1 1 0 1 1 1 0 0 0 ]
[ 1 0 1 1 0 1 1 1 0 0 0 1 ]

We will show that G24 has minimum weight 8. It is straightforward to check that, if r and s are rows of G, then r · s = 0. Hence G24 ⊆ G24⊥. Since G24 and G24⊥ both have dimension 12, they must be equal. By Proposition 5.17, the matrix [A^t | I12] is a parity check matrix of G24. Since G24 is self-dual and A is a symmetric matrix, we have the following.

Lemma 6.19. Let [I12 | A] be the generator matrix for the Golay code G24 in left standard form. Then the matrix [A | I12] is also a generator matrix for G24.

We take advantage of the two generator matrices G1 = [I12 | A] and G2 = [A | I12] for G24. Suppose that c ∈ G24 is nonzero and write c = xy, where x and y are the two halves of c. We have that w(x) ≥ 1 and w(y) ≥ 1. Suppose that w(c) = 4. If w(x) = 1, then c must be a row of G1 and similarly, if w(y) = 1, then c must be a row of G2. None of these rows has weight 4. This leaves only the possibility w(x) = w(y) = 2, which can be ruled out by checking that no sum of any two rows of G1 has weight 4. Hence, there is no codeword in G24 of weight 4. However, if r and s are rows of G1, then w(r + s) ≡ w(r) + w(s) (mod 4), since r · s = 0 forces the overlap of r and s to be even. It follows that the weight of every codeword in G24 is divisible by 4. We can now state the following.

Theorem 6.20. The binary Golay code G24 is a [24, 12, 8]-code.
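With only 2^12 = 4096 codewords, Theorem 6.20 and the self-duality of G24 can be confirmed by brute force. The sketch below enters the matrix A row by row exactly as displayed above:

```python
# Brute-force check that the code generated by G = [I12 | A] is self-dual
# with minimum weight 8 (all weights divisible by 4).
A = [[int(b) for b in row] for row in [
    "011111111111", "111011100010", "110111000101", "101110001011",
    "111100010110", "111000101101", "110001011011", "100010110111",
    "100101101110", "101011011100", "110110111000", "101101110001"]]

G = [[int(i == j) for j in range(12)] + A[i] for i in range(12)]

# self-duality: every pair of rows of G is orthogonal mod 2
assert all(sum(r[t] * s[t] for t in range(24)) % 2 == 0 for r in G for s in G)

weights = set()
for m in range(1, 1 << 12):                       # all nonzero messages
    cw = [sum(((m >> i) & 1) * G[i][j] for i in range(12)) % 2
          for j in range(24)]
    weights.add(sum(cw))
print(min(weights))  # 8
```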

Since G24 is a [24, 12, 8]-code, syndrome decoding would require that we construct 2^24/2^12 = 4096 syndromes. On the other hand, using the structure of G24, we can considerably reduce the work involved in decoding.


Since G24 is self-dual, the matrices G1 = [I12 | A] and G2 = [A | I12] are both parity check matrices for G24. Suppose that 3 or fewer errors occur in the transmission of a codeword, and let x be the received string and e be the error string. Let us write e = fg, where f and g each have length 12. We can compute the syndromes of the received string using both parity check matrices as follows: S1(x) = xG1^t = f + gA and S2(x) = xG2^t = fA + g. Now let us examine the possibilities.

(1) If w(f) = 0, then e = 0g = 0 S2(x) and w(S1(x)) ≥ 5, w(S2(x)) ≤ 3.
(2) If w(g) = 0, then e = f0 = S1(x) 0 and w(S1(x)) ≤ 3, w(S2(x)) ≥ 5.
(3) If w(f) ≥ 1 and w(g) ≥ 1, then w(S1(x)) ≥ 5, w(S2(x)) ≥ 5.

Thus, if either syndrome has weight at most 3, we can easily recover the error string e. If w(S1(x)) and w(S2(x)) are both greater than 3, we know that one of the following holds.

(a) w(f) = 1 and w(g) = 1 or 2: We have f = ei, where ei is the string with 1 in the i-th position and zeros elsewhere. Consider yu = (x + eu0)G2^t = (eig + eu0)G2^t = eiA + g + euA. Then w(yu) = 1 or 2 precisely when u = i; otherwise, w(yu) ≥ 4. Thus, we can determine both the error position i and the second half g by looking at the 12 strings y1, . . . , y12.

(b) w(f) = 2 and w(g) = 1: We have f = ei + ej for some i ≠ j. Consider yu = (x + eu0)G2^t = eiA + ejA + g + euA. Then w(yu) ≥ 4 for all u = 1, . . . , 12. In this case, we use a similar computation with G1^t. Because g = eλ for some λ, we have zu = (x + 0eu)G1^t = f + eλA + euA, which has weight w(zu) = 2 if u = λ and weight w(zu) ≥ 5 for u ≠ λ. Thus we may easily pick out f = zλ and the error position λ by looking at the 12 strings z1, . . . , z12.

In summary, if at most three errors occur, then we can decode correctly by computing at most the 26 syndromes

xG1^t, xG2^t, (x + e1 0)G2^t, . . . , (x + e12 0)G2^t, (x + 0e1)G1^t, . . . , (x + 0e12)G1^t.
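The decoding procedure just described can be sketched as follows (the helper names are ours; the error string is recovered under the assumption of at most 3 errors):

```python
# Syndrome decoder for G24 using the two parity check matrices
# G1 = [I12 | A] and G2 = [A | I12], as described above.
A = [[int(b) for b in row] for row in [
    "011111111111", "111011100010", "110111000101", "101110001011",
    "111100010110", "111000101101", "110001011011", "100010110111",
    "100101101110", "101011011100", "110110111000", "101101110001"]]

def xor(u, v):
    return [a ^ b for a, b in zip(u, v)]

def times_A(v):
    # v * A over F2 (A is symmetric, so this is also A * v)
    return [sum(v[i] & A[i][j] for i in range(12)) % 2 for j in range(12)]

def decode_error(x):
    """Recover the error string e = fg from the received string x,
    assuming at most 3 errors occurred."""
    f, g = x[:12], x[12:]
    s1 = xor(f, times_A(g))            # S1(x) = x G1^t
    s2 = xor(times_A(f), g)            # S2(x) = x G2^t
    if sum(s1) <= 3:                   # all errors in the first half
        return s1 + [0] * 12
    if sum(s2) <= 3:                   # all errors in the second half
        return [0] * 12 + s2
    for u in range(12):                # exactly one error in the first half
        y = xor(s2, A[u])
        if sum(y) <= 2:
            return [int(t == u) for t in range(12)] + y
    for u in range(12):                # exactly one error in the second half
        z = xor(s1, A[u])
        if sum(z) <= 2:
            return z + [int(t == u) for t in range(12)]
    return None                        # more than 3 errors detected

x = [0] * 24                           # corrupt the zero codeword
x[1] = x[13] = x[20] = 1
print(decode_error(x) == x)            # True
```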

The Binary Golay Code G23. The binary Golay code G23 is obtained by puncturing the code G24 in its last coordinate position. (We remark that puncturing the code G24 in any of its coordinate positions will lead to an equivalent code.) The resulting punctured code has length 23, and since the distance between codewords in G24 is greater than 1, all of the punctured codewords are distinct, so G23 has the same size as G24. It is clear that puncturing a code cannot increase the minimum distance nor decrease it by more than 1, and so d(G23) = 7 or 8. But the parameters [23, 12, 7] satisfy the sphere-packing condition, and so d(G23) = 7.

Theorem 6.21. The binary Golay code G23 is a perfect binary [23, 12, 7]-code.

We will see that the code G23 can also be defined as a cyclic code, and this leads to efficient decoding procedures for G23.

The Ternary Golay Codes. The ternary Golay code G12 is the code with generator matrix G = [I6 | B], where

B =
[ 0 1 1 1 1 1 ]
[ 1 0 1 2 2 1 ]
[ 1 1 0 1 2 2 ]
[ 1 2 1 0 1 2 ]
[ 1 2 2 1 0 1 ]
[ 1 1 2 2 1 0 ]


As with the binary Golay code, the ternary Golay code G12 is self-dual, and it is also generated by the matrix [B | I6]. We can also construct the ternary Golay code G11 by puncturing G12 in its last coordinate position.

Theorem 6.22. The ternary Golay code G12 is a [12, 6, 6]-code and the ternary Golay code G11 is a perfect [11, 6, 5]-code.

Many coding theorists have established the uniqueness of the Golay codes. Their results can be summarized by saying that any code (linear or nonlinear) that has the parameters of a Golay code is equivalent to a Golay code.

We also mention another remarkable result concerning the existence of perfect codes. As we have seen, the code consisting of a single codeword, the entire space, and the repetition codes are all perfect. These are referred to as the trivial perfect codes.

Theorem 6.23. For alphabets of prime power size, all nontrivial perfect codes C have the parameters of either a Hamming code or a Golay code. Furthermore,

(1) if C has the parameters of a Golay code, then it is equivalent to that Golay code.
(2) if C is linear and has the parameters of a Hamming code, then it is equivalent to that Hamming code. However, there are nonlinear perfect codes with the Hamming parameters.

However, over any alphabet, the only nontrivial t-error-correcting perfect code with t ≥ 3 is the binary Golay code G23.

Notice that there are some gaps in Theorem 6.23. With regard to alphabets of prime power size, it is not known how many nonequivalent, nonlinear perfect codes there are with the Hamming parameters. In 1962, Vasil′ev discovered a family of such codes, which we discuss in the exercises. More generally, it is still not known whether there are perfect double-error-correcting codes over any alphabet whose size is not a power of a prime. (It is conjectured that there are none.) The issue of how many nonequivalent single-error-correcting perfect codes may exist seems to be extremely difficult.

Reed-Muller Codes

Reed-Muller codes are one of the oldest families of codes and have been widely used in applications. For each positive integer m and each integer r satisfying 0 ≤ r ≤ m, the r-th order Reed-Muller code R(r,m) is a binary linear [n, k, d]-code with parameters

n = 2^m,  k = 1 + (m choose 1) + · · · + (m choose r),  d = 2^(m−r).

At first, we restrict attention to the first order Reed-Muller codes R(m), which are binary linear [2^m, m + 1, 2^(m−1)]-codes.

Definition 6.24. The Reed-Muller codes R(m) are binary codes defined for all integers m ≥ 1 as follows.

(1) R(1) = Z2^2 = {00, 01, 10, 11}

(2) For m ≥ 1, R(m + 1) = {uu | u ∈ R(m)} ∪ {uuc | u ∈ R(m)}.

In words, the codewords in R(m + 1) are formed by juxtaposing each codeword in R(m) with itself and with its complement.
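The recursive definition translates directly into a few lines of code, which also confirm the parameters of these codes; the function name is ours:

```python
# R(1) = {00, 01, 10, 11};  R(m+1) = {uu} ∪ {u u^c}, per Definition 6.24.
def reed_muller_first_order(m):
    """The first order Reed-Muller code R(m) as a list of bit strings."""
    if m == 1:
        return ["00", "01", "10", "11"]
    comp = lambda s: s.translate(str.maketrans("01", "10"))
    prev = reed_muller_first_order(m - 1)
    return [u + u for u in prev] + [u + comp(u) for u in prev]

R3 = reed_muller_first_order(3)
print(len(R3), len(R3[0]))                        # 16 8: a [2^3, 3+1]-code
print(min(c.count("1") for c in R3 if "1" in c))  # 4 = 2^(3-1)
```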


To demonstrate the virtues of an inductive definition, note that R(1) is a linear [2, 2, 1]-code in which every codeword except 00 and 11 has weight 2^0 = 1. We can easily extend this statement to the other Reed-Muller codes by induction.

Theorem 6.25. For m ≥ 1, the Reed-Muller code R(m) is a linear [2^m, m + 1, 2^(m−1)]-code for which every codeword except 0 and 1 has weight 2^(m−1).

The inductive definition of R(m) also allows us to define generator matrices for these codes. If Rm is a generator matrix for R(m), then a generator matrix for R(m + 1) is

R_{m+1} =
[ 0 · · · 0   1 · · · 1 ]
[   Rm          Rm      ]

We can describe the generator matrices Rm directly, both in terms of their rows and their columns.

The first row of Rm consists of a block of 2^(m−1) 0s followed by a block of 2^(m−1) 1s:

0 · · · 0  1 · · · 1     (each block of length 2^(m−1))

The next row of Rm consists of alternating blocks of 0s and 1s of length 2^(m−2):

0 · · · 0  1 · · · 1  0 · · · 0  1 · · · 1     (each block of length 2^(m−2))

In general, the i-th row of Rm consists of alternating blocks of 0s and 1s of length 2^(m−i). The last row of Rm is a row of all 1s.

The columns of Rm can be described as follows. Excluding the last row of Rm, the columns of Rm consist of all possible binary strings of length m, which, read from the top down as binary numbers, are 0, 1, . . . , 2^m − 1, in this order.

It is interesting to compare the characteristics of the Reed-Muller codes with those of the Hamming codes. For approximately the same codeword length, the code size of a Reed-Muller code is significantly smaller than that of a Hamming code. With Hamming codes, we pay for the large code size with a minimum distance of only 3. For the Reed-Muller codes, the relatively large minimum distance grows along with the code size.

Since R(m) is a [2^m, m + 1, 2^(m−1)]-code, it is capable of correcting 2^(m−2) − 1 errors. However, a standard array for R(m) has 2^(2^m − m − 1) rows. Thus, decoding using a syndrome table is time consuming, even for small values of m.

We will describe a special type of majority logic decoding, called Reed decoding, that applies to Reed-Muller codes. Let Rm be the generator matrix for R(m) defined above and denote the rows of Rm by r1, . . . , r_{m+1}. Then for a codeword c = c1 · · · cn, we have

c = α1r1 + · · · + αmrm + α_{m+1}r_{m+1},

for some scalars αi ∈ F2. Fixing i, we would like to find strings xi ∈ F2^n such that xi · rj = δij.

Suppose that rj = r_{j1} · · · r_{jn}. We have that (eu + ev) · rj = r_{ju} + r_{jv}. Therefore, if eu + ev is to be our candidate for xi, then we must have r_{ju} = r_{jv} if j ≠ i and r_{iu} ≠ r_{iv}. Thus, for each row i, we want a pair of columns that are identical except in their i-th row. We refer to such a pair of columns as a good pair for the i-th row. Remark that the last row of Rm consists of all 1s, so the last row has no good pair. On the other hand, the last row will never give any trouble in finding good pairs for the other rows, and so, for now, we can simply ignore the last row.


In fact, let R′m be the matrix obtained from Rm by removing the last row. The columns of R′m consist of the binary representations of the numbers 0, 1, . . . , 2^m − 1, in this order. Hence if r_{ju} = r_{jv} for j ≠ i and r_{iu} = 0, r_{iv} = 1, then u and v must be exactly 2^(m−i) apart. In particular, there are exactly 2^(m−1) good pairs for each row.

Now imagine that a codeword

c = α1r1 + · · · + αmrm + α_{m+1}r_{m+1}

is sent. Using the 2^(m−1) good pairs for row i, we get 2^(m−1) expressions for αi (for i ≤ m). Specifically, if c = c1 · · · cn and columns u and v form a good pair for row i, then αi = (eu + ev) · c = cu + cv. Each of these 2^(m−1) expressions for αi involves different positions in the codeword c. Thus, if no more than 2^(m−2) − 1 errors occur, then at most 2^(m−2) − 1 of the coordinates cj are incorrect and so at most 2^(m−2) − 1 of the expressions for αi are incorrect. This means that at least 2^(m−1) − (2^(m−2) − 1) = 2^(m−2) + 1 of these expressions give the correct value of αi. It follows that we can get the correct value of αi by computing the 2^(m−1) expressions for αi and taking the majority value.

The final step is to obtain the coefficient α_{m+1}. If at most 2^(m−2) − 1 errors have occurred in receiving x, then the error string e = x − c has weight at most 2^(m−2) − 1. Letting d = α1r1 + · · · + αmrm, we have x − d = α_{m+1}r_{m+1} + e = α_{m+1}1 + e. There are two possibilities. If α_{m+1} = 0, then e = x − d, and if α_{m+1} = 1, then e = (x − d)c. Thus, if w(x − d) ≤ 2^(m−2) − 1, we decode α_{m+1} as 0, and if w((x − d)c) ≤ 2^(m−2) − 1, we decode α_{m+1} as 1.

Example 6.26. Suppose that a codeword from R(3) is sent and the received string is x = 11011100. Consider the generator matrix

R3 =
[ 0 0 0 0 1 1 1 1 ]
[ 0 0 1 1 0 0 1 1 ]
[ 0 1 0 1 0 1 0 1 ]
[ 1 1 1 1 1 1 1 1 ]

For row 1 we have 2^(3−1) = 4, so the good pairs are (1, 5), (2, 6), (3, 7), (4, 8).
For row 2 we have 2^(3−2) = 2, so the good pairs are (1, 3), (2, 4), (5, 7), (6, 8).
For row 3 we have 2^(3−3) = 1, so the good pairs are (1, 2), (3, 4), (5, 6), (7, 8).

Thus, if

c = c1 · · · c8 = α1r1 + α2r2 + α3r3 + α4r4,

the expressions for α1 are

α1 = c1 + c5 = c2 + c6 = c3 + c7 = c4 + c8.

The majority logic decision is α1 = 0 and similarly, α2 = 1 and α3 = 0. Thus,

x − (α1r1 + α2r2 + α3r3) = 11011100 − r2 = 11101111.

Since the complement of this string has weight 1 ≤ 2^(3−2) − 1, we decode α4 as 1. It follows that the codeword sent is 11001100.
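Reed decoding for R(3), as carried out by hand above, can be sketched as follows (names ours); applied to the received string 11011100 it reproduces the decoding of Example 6.26:

```python
m = 3
n = 2 ** m
# rows r1, ..., r(m+1) of the generator matrix R3 displayed above
rows = [[(j >> (m - i)) & 1 for j in range(n)] for i in range(1, m + 1)]
rows.append([1] * n)

def reed_decode(x):
    """Majority-logic (Reed) decoding; returns (alphas, codeword)."""
    x = [int(b) for b in x]
    alphas = []
    for i in range(1, m + 1):
        step = 2 ** (m - i)              # good pairs are `step` apart
        votes = [x[u] ^ x[u + step]
                 for u in range(n) if not (u >> (m - i)) & 1]
        alphas.append(1 if sum(votes) > len(votes) // 2 else 0)
    d = [sum(a * r[j] for a, r in zip(alphas, rows)) % 2 for j in range(n)]
    last = 0 if sum(xj ^ dj for xj, dj in zip(x, d)) <= 2 ** (m - 2) - 1 else 1
    alphas.append(last)
    return alphas, "".join(str((dj + last) % 2) for dj in d)

print(reed_decode("11011100"))  # ([0, 1, 0, 1], '11001100'), as in Example 6.26
```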

Now, we introduce the higher order Reed-Muller codes R(r,m). In order to introduce these codes, we begin with a discussion of Boolean functions and Boolean polynomials.

Definition 6.27. A Boolean function of m variables x1, . . . , xm is a function f(x1, . . . , xm) from F2^m to F2.


(1) A Boolean monomial in m variables x1, . . . , xm of degree s is an expression of the form

g(x1, . . . , xm) = x_{i1}x_{i2} · · · x_{is},  1 ≤ i1 < i2 < · · · < is ≤ m.

(2) A Boolean polynomial in m variables x1, . . . , xm is a linear combination of Boolean monomials in these variables with coefficients in F2. The degree of a Boolean polynomial g is the largest of the degrees of the Boolean monomials that form g.

The set Bm of all Boolean functions of m variables forms a vector space of size 2^(2^m) over F2. The set Bm of all Boolean polynomials in m variables is a vector space over F2, as is the set B_{m,r} of all Boolean polynomials in m variables of degree at most r.

Since there are (m choose s) distinct Boolean monomials of degree s in m variables, the total number of distinct Boolean monomials is

1 + (m choose 1) + · · · + (m choose m) = 2^m,

and the total number of distinct Boolean polynomials in m variables is 2^(2^m). This also happens to be the total number of Boolean functions of m variables, which is no mere coincidence.

Proposition 6.28. For every Boolean function f(x1, . . . , xm) in Bm, there is a unique Boolean polynomial g(x1, . . . , xm) in Bm for which f(α1, . . . , αm) = g(α1, . . . , αm) for all (α1, . . . , αm) ∈ F2^m.

If we always agree to list the variables in the same order, we obtain a one-to-one correspondence between Boolean functions f ∈ Bm and binary strings af of length 2^m.

Example 6.29. Suppose that f ∈ B3 with

x1 x2 x3 | f
0  0  0  | 0
0  0  1  | 1
0  1  0  | 1
0  1  1  | 0
1  0  0  | 0
1  0  1  | 0
1  1  0  | 1
1  1  1  | 1

We obtain the binary string af = 01100011. Using a convenient abuse of notation and writing a binary string in place of the corresponding polynomial, we have

01100011 = 0110 + x1(0011 − 0110)
         = 0110 + x1(0101)
         = 01 + x2(10 − 01) + x1(01 + x2(01 − 01))
         = 01 + x2(11) + x1(01 + x2(00))
         = x3 + x2(1) + x1(x3 + x2(0))
         = x3 + x2 + x1x3.
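The splitting used in this computation, applied recursively, computes the Boolean polynomial of any truth table. A sketch (the function name anf, for algebraic normal form, is ours):

```python
def anf(bits, var=1):
    """Monomials of the Boolean polynomial for the truth table `bits`
    over variables x_var, ..., x_m; () denotes the constant 1."""
    if len(bits) == 1:
        return {()} if bits == "1" else set()
    half = len(bits) // 2
    f0, f1 = bits[:half], bits[half:]           # x_var = 0 / x_var = 1 halves
    diff = "".join(str(int(a) ^ int(b)) for a, b in zip(f0, f1))
    # f = f0 + x_var * (f1 - f0), as in Example 6.29
    return anf(f0, var + 1) | {(var,) + mono for mono in anf(diff, var + 1)}

print(sorted(anf("01100011")))  # [(1, 3), (2,), (3,)], i.e. x1x3 + x2 + x3
```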

Definition 6.30. Let 0 ≤ r ≤ m. The r-th order Reed-Muller code R(r,m) is the set of all binary strings ag of length n = 2^m associated with the Boolean polynomials g ∈ B_{m,r}.


Example 6.31. (1) The 0-th order Reed-Muller code R(0,m) consists of the binary strings associated with the constant polynomials 0 and 1, that is, R(0,m) = Rep(2^m). At the other extreme, the m-th order Reed-Muller code R(m,m) consists of all binary strings of length 2^m.

(2) The first order Reed-Muller code of length n = 2^2 is the set of all binary strings associated with the Boolean polynomials p of the form α0 + α1x1 + α2x2, where αi = 0 or 1. Thus, we can list the codewords in R(1, 2) as follows:

Polynomial      Codeword
0               0000
x1              0011
x2              0101
x1 + x2         0110
1               1111
1 + x1          1100
1 + x2          1010
1 + x1 + x2     1001

The Reed-Muller codes can be obtained using the u(u + v)-construction. Recall that if C1 is an (n, M1, d1)-code and C2 is an (n, M2, d2)-code, then the u(u + v)-construction yields a code C1 ⊕ C2 by

C1 ⊕ C2 = {c(c + d) | c ∈ C1, d ∈ C2},

which is a (2n, M1M2, d)-code with d = min{2d1, d2}.

Suppose that 0 < r < m and consider a codeword af ∈ R(r,m), where f ∈ B_{m,r}. We can factor the variable x1 from those terms in which it appears and write f in the form

f(x1, . . . , xm) = x1g(x2, . . . , xm) + h(x2, . . . , xm),

where g ∈ B_{m−1,r−1} and h ∈ B_{m−1,r}. Let ag ∈ R(r − 1, m − 1) and ah ∈ R(r, m − 1) be the binary strings corresponding to the polynomials g and h, respectively. The string corresponding to x1g(x2, . . . , xm) is 0ag, and if we think of h as a Boolean polynomial in m variables x1, . . . , xm, then the string corresponding to h is ahah. Hence, the string corresponding to f is

af = 0ag + ahah = ah(ah + ag).

Theorem 6.32. For the Reed-Muller codes R(r,m), we have

(1) R(0,m) = Rep(2^m),
(2) R(m,m) = F2^n, where n = 2^m,
(3) for 0 < r < m,

R(r,m) = R(r,m − 1) ⊕ R(r − 1,m − 1),

where ⊕ denotes the u(u + v)-construction.

In particular, R(r,m) has minimum distance 2^(m−r).
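Theorem 6.32 gives a direct recursive construction of R(r,m); the sketch below (function name ours) rebuilds the table of Example 6.31 and checks a minimum distance:

```python
def rm(r, m):
    """R(r,m) as a set of binary strings, via Theorem 6.32."""
    n = 2 ** m
    if r == 0:
        return {"0" * n, "1" * n}                      # Rep(2^m)
    if r == m:
        return {format(i, "0" + str(n) + "b") for i in range(2 ** n)}
    # the u(u+v)-construction: c followed by c + d
    join = lambda c, d: c + "".join(str(int(a) ^ int(b)) for a, b in zip(c, d))
    return {join(c, d) for c in rm(r, m - 1) for d in rm(r - 1, m - 1)}

print(sorted(rm(1, 2)))   # the eight codewords listed in Example 6.31
print(min(c.count("1") for c in rm(1, 3) if "1" in c))  # 4 = 2^(3-1)
```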

Corollary 6.33. For r < m, R(r,m) contains codewords of even weight only.

Let af ∈ R(r,m) and ag ∈ R(m − r − 1,m) with f ∈ B_{m,r} and g ∈ B_{m,m−r−1}. Observe that af · ag ≡ w(afg) (mod 2). Since deg(fg) ≤ deg(f) + deg(g) ≤ m − 1, we have afg ∈ R(m − 1,m). According to Corollary 6.33, w(afg) is even, which implies that af · ag = 0.


Theorem 6.34. For 0 < r < m − 1, we have

R(r,m)⊥ = R(m − r − 1,m).

Exercise

(1) Let C be a q-ary MDS [n, k]-code and let Aw be the number of codewords in C of weight w. Show that

Ad = (q − 1) (n choose n − k + 1).

(2) We remarked in Theorem 6.23 that if C is a linear code with the same parameters as the Hamming code H2(h), then C is equivalent to H2(h). We now construct a binary nonlinear code V(h) with the parameters of the Hamming code H2(h + 1).

Let n = 2^h − 1 and let f : H2(h) → Z2 be a nonlinear function with f(0) = 0. Let π : Z2^n → Z2 be the function defined by

π(x) = 0 if w(x) ≡ 0 (mod 2), and π(x) = 1 otherwise.

Now let

V(h) = {x(x + c)(π(x) + f(c)) | x ∈ Z2^n, c ∈ H2(h)}.

Show that V(h) is a binary [2^(h+1) − 1, 2^(h+1) − h − 2, 3]-code. Show also that V(h) is nonlinear.

(3) In this exercise, we define and discuss the Nordstrom-Robinson code. This code has the interesting property that it has strictly larger minimum distance than any linear code with the same length and size.

(a) Let G = [I12 | A] be the generator matrix of G24. Show that, by permuting columns and using elementary row operations, the matrix G can be brought to the form

G′ =
[ 1 0 0 0 0 0 0 1     ]
[ 0 1 0 0 0 0 0 1     ]
[ 0 0 1 0 0 0 0 1     ]
[ 0 0 0 1 0 0 0 1  *  ]
[ 0 0 0 0 1 0 0 1     ]
[ 0 0 0 0 0 1 0 1     ]
[ 0 0 0 0 0 0 1 1     ]
[ 0 0 0 0 0 0 0 0     ]
[ 0 0 0 0 0 0 0 0     ]
[ 0 0 0 0 0 0 0 0  *  ]
[ 0 0 0 0 0 0 0 0     ]
[ 0 0 0 0 0 0 0 0     ]

where the asterisks represent some values.

(b) Let C be the code generated by the generator matrix G′. Show that there are 8 × 2^5 = 256 codewords in C whose first eight coordinates are one of

10000001, 01000001, 00100001, 00010001,
00001001, 00000101, 00000011, 00000000.


(c) The Nordstrom-Robinson code N is the code whose codewords are obtained from the 256 codewords obtained above by deleting the first eight coordinate positions. Show that N is a (16, 256, 6)-code.

(d) Show that there is no linear (16, 256, 6)-code.
(4) Assuming the Reed-Muller code R(4) is used, decode the received word 0111011011100010.
(5) Find the Boolean polynomial corresponding to the binary string 1101111000011001.
(6) Show that for any Boolean function f(x1, . . . , x_{m−1}), the function xm + f(x1, . . . , x_{m−1}) takes on the values 0 and 1 equally often.
(7) Find an expression for a generator matrix for R(r,m) in terms of generator matrices for R(r,m − 1) and R(r − 1,m − 1).


CHAPTER 7

Cyclic Codes

Basic Definitions

Definition 7.1. The right cyclic shift of a string x = x1 · · · x_{n−1}xn is the string xnx1 · · · x_{n−1} obtained by shifting each element one position to the right, wrapping the last element around to the first position.

A linear code C is cyclic if whenever c ∈ C then the right cyclic shift of c is also in C.

As an immediate consequence of this definition, if C is a cyclic code and c ∈ C, then the string obtained by shifting the elements of c any number of positions, with wrapping, is also a codeword in C.

Example 7.2. The binary code D = {0000, 1001, 0110, 1111} is not cyclic, since shifting 1001 gives 1100, which is not in D. However, D is equivalent to the cyclic code C = {0000, 1010, 0101, 1111}.

To get a better understanding of cyclic codes, it pays to think of strings as polynomials. In particular, to each string c = c0c1 · · · c_{n−1}, we associate the polynomial c0 + c1x + c2x^2 + · · · + c_{n−1}x^(n−1). Note that addition and scalar multiplication of strings correspond to the analogous operations for polynomials. Thus, we may think of a linear code C of length n over Fq as a subspace of the space Pn(Fq) of polynomials of degree less than n with coefficients in Fq.

We can express the process of performing a right cyclic shift in terms of operations on polynomials. Notice that multiplying a codeword p(x) = c0 + c1x + · · · + c_{n−1}x^(n−1) by x gives xp(x) = c0x + c1x^2 + · · · + c_{n−1}x^n, which has some resemblance to a right cyclic shift, and indeed would be a right cyclic shift if we replaced x^n by x^0 = 1.

Let Rn(Fq) = Fq[x]/(x^n − 1). Recall that Rn(Fq) is the set of all polynomials over Fq of degree less than n. Addition in Rn(Fq) is the usual addition of polynomials, and multiplication is ordinary multiplication of polynomials followed by division by x^n − 1, keeping only the remainder. Note that taking the product modulo x^n − 1 is very easy, since we simply take the ordinary product and then replace x^n by 1. As an example, in R4(F2),

(x^3 + x^2 + 1)(x^2 + 1) = x^5 + x^4 + x^3 + 1
                         = 1 · x + 1 + x^3 + 1
                         = x^3 + x.

It is also important to note that, since xn − 1 is not irreducible in Fq[x], the product ofnonzero polynomials may equal the zero polynomial.

We can now think of a linear code C over Fq as a subspace of the vector space Rn(Fq). In addition, if p(x) ∈ C, then the right cyclic shift of p(x) is the polynomial xp(x). In general, applying k right cyclic shifts is equivalent to multiplying p(x) by x^k.
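This correspondence between multiplication by x and cyclic shifting is easy to check numerically; the sketch below (names ours) represents a polynomial in Rn(Fq) by its coefficient list:

```python
# Multiplication in Rn(Fq) = Fq[x]/(x^n - 1): multiply, then fold each
# x^(n+i) back onto x^i.  Multiplying by x performs a right cyclic shift.
def poly_mul_mod(p, f, q, n):
    """Product of p and f in Rn(Fq), coefficients (c0, c1, ..., c_{n-1})."""
    prod = [0] * (len(p) + len(f) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(f):
            prod[i + j] = (prod[i + j] + a * b) % q
    out = [0] * n
    for k, coeff in enumerate(prod):
        out[k % n] = (out[k % n] + coeff) % q   # replace x^n by 1
    return out

def right_shift(c):
    return c[-1:] + c[:-1]

c = [1, 0, 1, 1, 0]                             # 1 + x^2 + x^3 in R5(F2)
print(poly_mul_mod(c, [0, 1], 2, 5) == right_shift(c))  # True
```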


Lemma 7.3. A linear code C ⊆ Rn(Fq) is cyclic if and only if p(x) ∈ C implies that f(x)p(x) ∈ C for any f(x) ∈ Rn(Fq).

In the language of abstract algebra, the set Rn(Fq), together with the operations of addition, scalar multiplication, and multiplication modulo x^n − 1, is an algebra over Fq. Any subset C of Rn(Fq) that is a vector subspace and also has the property described in Lemma 7.3 is called an ideal of Rn(Fq). That is, the cyclic codes in Rn(Fq) are precisely the ideals of Rn(Fq).

An ideal C of Rn(Fq) is called a principal ideal if there exists a polynomial g(x) ∈ C such that

C = 〈g(x)〉 = {f(x)g(x) | f(x) ∈ Rn(Fq)}.

The following theorem, in the language of abstract algebra, says that the ring Rn(Fq) is a principal ideal ring.

Theorem 7.4. Let C be a nonzero cyclic code in Rn(Fq). Then there is a unique polynomial g(x) in C that is monic and has the smallest degree among all nonzero polynomials in C. Moreover, C = 〈g(x)〉.

The unique polynomial mentioned in Theorem 7.4 is called the generator polynomial of C.

Example 7.5. For the binary cyclic code C = {0, 1 + x^2, x + x^2, 1 + x}, we have

0 = 0 · (1 + x),          1 + x^2 = x^2 · (1 + x),
x + x^2 = x · (1 + x),    1 + x = 1 · (1 + x),

and so C = 〈1 + x〉. Since 1 + x has minimum degree in C, it is the generator polynomial for C. Notice also that

0 = 0 · (1 + x^2),          1 + x^2 = 1 · (1 + x^2),
x + x^2 = x^2 · (1 + x^2),  1 + x = x · (1 + x^2),

and so C is also generated by the polynomial 1 + x^2. However, since 1 + x^2 does not have minimum degree in C, it is not the generator polynomial for C.

It is very easy to characterize those polynomials that are generator polynomials.

Proposition 7.6. A monic polynomial p(x) ∈ Rn(Fq) is the generator polynomial of a cyclic code in Rn(Fq) if and only if it divides x^n − 1.

Proposition 7.6 is very important, for it tells us that there is precisely one cyclic code in Rn(Fq) for each monic divisor of x^n − 1. Thus, we can find all cyclic codes in Rn(Fq) by factoring x^n − 1.

We have seen that if g(x) is the generator polynomial for a cyclic code C, then C consistsof all polynomial multiples of g(x). We can easily obtain a basis for C from g(x).

Theorem 7.7. Let g(x) = g0 + g1x + · · · + gkx^k be the generator polynomial of a nonzero cyclic code C in Rn(Fq).

(1) C has basis

B = {g(x), xg(x), . . . , x^(n−k−1)g(x)}.

(2) C has dimension n − deg(g(x)) = n − k. In fact,

C = {r(x)g(x) | deg(r(x)) < n − k}.

Page 55: An Introduction to Codesmath.ntnu.edu.tw/~li/note/Code.pdf ·  · 2005-10-21An Introduction to Codes ... Fixed length codes have advantages and disadvantages over variable length

BASIC DEFINITIONS 55

(3) C has generator matrix

G =
[ g0  g1  · · ·  gk  0   0   · · ·  0  ]
[ 0   g0  g1  · · ·  gk  0   · · ·  0  ]
[ 0   0   g0  g1  · · ·  gk         0  ]
[ ...                                  ]
[ 0   0   · · ·  0   g0  g1  · · ·  gk ]

whose n − k rows are such that each row after the first is the right cyclic shift of the row above.
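Theorem 7.7(3) can be sketched in a few lines. Taking g(x) = 1 + x + x^3, which divides x^7 − 1 over F2 (this choice of example is ours), the resulting [7, 4]-code has 16 codewords and minimum weight 3:

```python
from itertools import product

def generator_matrix(g, n):
    """Rows g(x), xg(x), ..., x^(n-k-1)g(x) of Theorem 7.7, as coefficient
    lists of length n (g given as coefficients g0, g1, ..., gk)."""
    k = len(g) - 1                       # deg g
    return [[0] * i + g + [0] * (n - k - 1 - i) for i in range(n - k)]

G = generator_matrix([1, 1, 0, 1], 7)    # g(x) = 1 + x + x^3
codewords = {tuple(sum(m[i] * G[i][j] for i in range(4)) % 2 for j in range(7))
             for m in product([0, 1], repeat=4)}
print(len(codewords), min(sum(c) for c in codewords if any(c)))  # 16 3
```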

We have seen that the generator polynomial g(x) of a cyclic code C ⊆ Rn(Fq) divides x^n − 1. Hence, we can write x^n − 1 = h(x)g(x), where h(x) ∈ Rn(Fq). The polynomial h(x), which has degree equal to the dimension of C, is referred to as the check polynomial of C. Since the generator polynomial is unique, so is the check polynomial. The following theorem shows why the check polynomial is important.

Theorem 7.8. Let h(x) = h_0 + h_1 x + · · · + h_{n−k} x^{n−k} be the check polynomial of a cyclic code C ⊆ R_n(F_q).

(1) The code C can be described by

C = {p(x) ∈ R_n(F_q) | p(x)h(x) = 0}.

(2) The parity check matrix for C is given by

P =

h_{n−k} · · · h_1 h_0  0   0  · · ·  0
  0  h_{n−k} · · · h_1 h_0  0  · · ·  0
  0    0  h_{n−k} · · · h_1 h_0 · · ·  0
  ·                      ·             ·
  0    0   · · ·  0  h_{n−k} · · · h_1 h_0

(3) The dual code C⊥ is the cyclic code of dimension k with generator polynomial

h⊥(x) = h_0^{−1}(h_0 x^{n−k} + h_1 x^{n−k−1} + · · · + h_{n−k}).

Example 7.9. Because x^9 − 1 factors over F_2 into irreducible factors as follows

x^9 − 1 = (x − 1)(x^2 + x + 1)(x^6 + x^3 + 1),

the code C = 〈x^6 + x^3 + 1〉 has check polynomial h(x) = (x − 1)(x^2 + x + 1) = x^3 + 1. Hence, C has parity check matrix

P =

1 0 0 1 0 0 0 0 0
0 1 0 0 1 0 0 0 0
0 0 1 0 0 1 0 0 0
0 0 0 1 0 0 1 0 0
0 0 0 0 1 0 0 1 0
0 0 0 0 0 1 0 0 1
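Both the factorization and the relation x^9 − 1 = h(x)g(x) can be verified by multiplying coefficient lists over F_2. A minimal Python sketch, assuming low-to-high coefficient lists (the helper name pmul is ours):

```python
# Multiply two polynomials over F2 (coefficient lists, low degree first).
def pmul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

g = [1, 0, 0, 1, 0, 0, 1]            # g(x) = 1 + x^3 + x^6
h = pmul([1, 1], [1, 1, 1])          # (x - 1)(x^2 + x + 1) over F2
print(h)                             # coefficients of x^3 + 1
print(pmul(h, g))                    # coefficients of x^9 + 1, i.e. x^9 - 1 over F2
```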

Page 56: An Introduction to Codesmath.ntnu.edu.tw/~li/note/Code.pdf ·  · 2005-10-21An Introduction to Codes ... Fixed length codes have advantages and disadvantages over variable length

56 7. CYCLIC CODES

The Zeros of a Cyclic Code

If we have convenient access to the roots of the polynomial x^n − 1, then it is possible to characterize the cyclic codes in R_n(F_q) in a slightly different way than through generator polynomials. Let x^n − 1 = ∏_i m_i(x) be the factorization of x^n − 1 into monic irreducible factors over F_q. If α is a root of m_i(x) in some extension field of F_q, then m_i(x) is the minimal polynomial of α over F_q. Thus, for any f(x) ∈ F_q[x], we have f(α) = 0 if and only if f(x) = h(x)m_i(x) for some h(x) ∈ F_q[x]. In particular, if we consider f(x) ∈ R_n(F_q), then f(α) = 0 if and only if f(x) ∈ 〈m_i(x)〉.

Now, since x^n − 1 has no multiple roots, if g(x) | x^n − 1 then g(x) = m_1(x) · · · m_t(x) is a product of distinct irreducible factors of x^n − 1. If α_i is a root of m_i(x) for i = 1, . . . , t, then

〈g(x)〉 = {f(x) ∈ R_n(F_q) | f(α_i) = 0, i = 1, . . . , t}.

Definition 7.10. The roots of the generator polynomial of a cyclic code are called the zeros of the code.

The representation of a cyclic code by its zeros can be used to show that some Hamming codes are cyclic codes. The binary case is the easiest, so let us consider it first.

Let n = 2^r − 1. By Theorem 5.6, the set of all n distinct roots of x^n − 1 over F_2 is the multiplicative cyclic group F*_{2^r}, the set of nonzero elements of the field of 2^r elements containing F_2. An element that generates F*_{2^r} is called a primitive field element of F_{2^r}.

Suppose that β is a primitive field element of F_{2^r}. Consider the code C = {f(x) ∈ R_n(F_2) | f(β) = 0}. As mentioned above, C is the binary cyclic code whose generator polynomial g(x) is an irreducible factor of x^n − 1. Since the degree of F_{2^r} over F_2 is r, we have deg(g(x)) = r. Hence, C is an [n, n − r]-code. If there were a polynomial f(x) = x^i + x^j ∈ R_n(F_2) such that f(β) = 0, where 0 ≤ i < j < n, then β^{j−i} = 1, which contradicts the assumption that β is primitive. Hence, the minimum distance of C is at least 3, which implies that it must be equal to 3 (by the sphere-packing condition). Therefore, C is a linear code with the same parameters as the Hamming code H_2(r) and hence equivalent to H_2(r) by Theorem 6.23.

Example 7.11. Consider the Hamming code H_2(4). In this case, n = 2^4 − 1 = 15 and the splitting field for x^n − 1 is F_16. Consider the irreducible polynomial g(x) = x^4 + x + 1 over F_2 and suppose β is a root of g(x). Then we have that β^15 = 1, but β^3 ≠ 1 and β^5 ≠ 1. Hence β is a primitive field element of F_16. We conclude that H_2(4) is equivalent to the cyclic code generated in R_15(F_2) by g(x) = x^4 + x + 1.
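The primitivity of β can be checked mechanically by computing powers of x modulo x^4 + x + 1 over F_2 and finding the first power that returns to 1. A Python sketch, with polynomials stored as low-to-high coefficient lists (the helper names pmul and pmod are ours):

```python
def pmul(a, b):
    # multiply polynomials over F2 (coefficient lists, low degree first)
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

def pmod(a, m):
    # remainder of a modulo a monic polynomial m of degree d, over F2
    d = len(m) - 1
    a = a + [0] * max(0, d - len(a))
    for i in range(len(a) - 1, d - 1, -1):
        if a[i]:
            for j, mj in enumerate(m):
                if mj:
                    a[i - d + j] ^= 1
    return a[:d]

m = [1, 1, 0, 0, 1]                 # x^4 + x + 1
beta_power = [1, 0, 0, 0]           # beta^0 = 1 in F16 = F2[x]/(m)
order = None
for k in range(1, 16):
    beta_power = pmod(pmul(beta_power, [0, 1]), m)   # multiply by beta = x
    if beta_power == [1, 0, 0, 0] and order is None:
        order = k
print("order of beta:", order)      # 15, so beta is primitive
```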

In general, not every q-ary Hamming code is equivalent to a cyclic code. For instance, we can write down all ternary cyclic codes of length 4 and find that none of these codes has minimum distance 3. We see that the ternary Hamming code H_3(2) is not equivalent to a cyclic code. However, we have the following result.

Proposition 7.12. Let n = (q^r − 1)/(q − 1) and assume that gcd(r, q − 1) = 1. Then the q-ary Hamming code H_q(r) is equivalent to a cyclic code.

The Idempotent Generator of a Cyclic Code

We have seen that a complete list of all cyclic codes in R_n(F_q) can be obtained from a factorization of x^n − 1 into monic irreducible factors over F_q. However, factoring x^n − 1 is


not an easy task in general. In this section, we explore another approach to describing cyclic codes, involving a different type of generating polynomial than the generator polynomial.

Definition 7.13. A polynomial e(x) ∈ R_n(F_q) is said to be idempotent in R_n(F_q) if e(x)^2 = e(x).

Example 7.14. The polynomial x^3 + x^5 + x^6 is an idempotent in R_7(F_2) because (x^3 + x^5 + x^6)^2 = x^6 + x^{10} + x^{12} ≡ x^6 + x^3 + x^5 (mod x^7 − 1).
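This squaring computation is easy to replicate: square the coefficient list and reduce modulo x^7 − 1. A Python sketch with low-to-high coefficient lists (the helpers pmul and pmod are ours):

```python
def pmul(a, b):
    # multiply polynomials over F2 (coefficient lists, low degree first)
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

def pmod(a, m):
    # remainder of a modulo a monic polynomial m of degree d, over F2
    d = len(m) - 1
    a = a + [0] * max(0, d - len(a))
    for i in range(len(a) - 1, d - 1, -1):
        if a[i]:
            for j, mj in enumerate(m):
                if mj:
                    a[i - d + j] ^= 1
    return a[:d]

mod7 = [1, 0, 0, 0, 0, 0, 0, 1]     # x^7 - 1 over F2
e = [0, 0, 0, 1, 0, 1, 1]           # x^3 + x^5 + x^6
print(pmod(pmul(e, e), mod7) == e)  # True: e(x)^2 = e(x) in R_7(F2)
```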

Let C be a cyclic code in R_n(F_q) with generator polynomial g(x) and check polynomial h(x). Since x^n − 1 has no multiple roots, g(x) and h(x) are relatively prime and so there exist polynomials a(x) and b(x) for which a(x)g(x) + b(x)h(x) = 1. Let e(x) = a(x)g(x) ∈ C. Then for any p(x) ∈ C, we have that e(x)p(x) = p(x), since p(x)h(x) = 0. In other words, e(x) is the unique multiplicative identity in C, and hence an idempotent in R_n(F_q). Moreover, e(x) generates C, since every polynomial in C is a multiple of e(x).

Proposition 7.15. e(x) is the unique polynomial in C that is both an idempotent and a generator of C.

We will refer to the polynomial e(x) as the generating idempotent of C. We can also compute the generator polynomial g(x) from the generating idempotent e(x). In fact, gcd(e(x), x^n − 1) = gcd(a(x)g(x), h(x)g(x)), because x^n − 1 = g(x)h(x) and e(x) ≡ a(x)g(x) (mod x^n − 1). But a(x) and h(x) are relatively prime and so gcd(e(x), x^n − 1) = g(x). From this, we have the following interesting relationship between the generator polynomial and generating idempotent.
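For a small code, the generating idempotent can be found by brute force: list all codewords and search for the multiplicative identity described above. A Python sketch for the binary [7, 4]-code with g(x) = x^3 + x + 1 (the helpers pmul and pmod and the brute-force approach are ours, purely for illustration):

```python
from itertools import product

def pmul(a, b):
    # multiply polynomials over F2 (coefficient lists, low degree first)
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

def pmod(a, m):
    # remainder of a modulo a monic polynomial m of degree d, over F2
    d = len(m) - 1
    a = a + [0] * max(0, d - len(a))
    for i in range(len(a) - 1, d - 1, -1):
        if a[i]:
            for j, mj in enumerate(m):
                if mj:
                    a[i - d + j] ^= 1
    return a[:d]

g = [1, 1, 0, 1]                     # g(x) = 1 + x + x^3
mod7 = [1, 0, 0, 0, 0, 0, 0, 1]      # x^7 - 1 over F2

# all 16 codewords r(x)g(x) with deg r < 4, reduced in R_7(F2)
codewords = [pmod(pmul(list(r), g), mod7) for r in product([0, 1], repeat=4)]

# the generating idempotent is the unique nonzero identity of C
identity = [c for c in codewords
            if any(c) and all(pmod(pmul(c, p), mod7) == p for p in codewords)]
print(identity)                      # one codeword: x + x^2 + x^4
```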

Proposition 7.16. Let C be a cyclic code in R_n(F_q) with generator polynomial g(x) and generating idempotent e(x).

(1) If α is a root of x^n − 1, then g(α) = 0 if and only if e(α) = 0.

(2) Suppose that f(x) is an idempotent in R_n(F_q) with the property that if α is a root of x^n − 1, then g(α) = 0 if and only if f(α) = 0. Then f(x) = e(x).

Let C be a cyclic code with generating idempotent e(x) and check polynomial h(x). Since

h(x)(1 − e(x)) ≡ h(x)(1 − a(x)g(x)) ≡ h(x) (mod x^n − 1),

we see that 1 − e(x) is the identity in 〈h(x)〉. Hence the cyclic code 〈h(x)〉 has generating idempotent 1 − e(x). Similarly, we have the following result that relates the generating idempotents of a cyclic code and its dual.

Theorem 7.17. Let C be a cyclic code in R_n(F_q) with generating idempotent e(x). Then the dual code C⊥ has generating idempotent 1 − e(x^{n−1}) ∈ R_n(F_q).

Encoding and Decoding with a Cyclic Code

There are two rather straightforward ways to encode message strings using a cyclic code. One is systematic and the other is nonsystematic.

Let C = 〈g(x)〉 be a q-ary cyclic [n, n − r]-code, where deg(g(x)) = r. Thus, C is capable of encoding q-ary messages of length n − r. We consider the nonsystematic method first.

Given a source string a_0 a_1 · · · a_{n−r−1}, we form the message polynomial

a(x) = a_0 + a_1 x + · · · + a_{n−r−1} x^{n−r−1}.

This polynomial is encoded as the product c(x) = a(x)g(x).
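The nonsystematic encoder is thus a single polynomial multiplication. A Python sketch, using g(x) = x^3 + x + 1 as a sample generator polynomial (our choice; any generator polynomial works) and low-to-high coefficient lists:

```python
def pmul(a, b):
    # multiply polynomials over F2 (coefficient lists, low degree first)
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

g = [1, 1, 0, 1]        # g(x) = 1 + x + x^3
a = [1, 0, 0, 1]        # message 1001 as a(x) = 1 + x^3
c = pmul(a, g)          # nonsystematic codeword c(x) = a(x)g(x)
print(c)                # coefficients of 1 + x + x^4 + x^6
```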


To obtain a systematic encoder, we form the message polynomial

b(x) = a_0 x^{n−1} + a_1 x^{n−2} + · · · + a_{n−r−1} x^r.

Notice that b(x) has no terms of degree less than r. Next, we divide b(x) by g(x), writing b(x) = h(x)g(x) + r(x), where deg(r(x)) < r, and send the codeword c(x) = b(x) − r(x).

Definition 7.18. A q-ary (n, q^k)-code is called systematic if there are k positions i_1, i_2, . . . , i_k with the property that, by restricting the codewords to these positions, we get all of the q^k possible q-ary strings of length k. The set {i_1, i_2, . . . , i_k} is called an information set and the codeword symbols in these positions are called information symbols.

Since b(x) and r(x) above have no terms of the same degree, this encoder is systematic. In fact, reading the terms from highest degree to lowest degree, we see that the first n − r positions are information symbols.

Example 7.19. Consider the binary cyclic [7, 4]-code generated by the polynomial g(x) = x^3 + x + 1. Consider the message 1001. Using the systematic encoder, we have b(x) = x^6 + x^3 and since

x^6 + x^3 = (x^3 + x)(x^3 + x + 1) + (x^2 + x),

the encoded message is c(x) = (x^6 + x^3) − (x^2 + x) = x^6 + x^3 + x^2 + x.
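Example 7.19 can be reproduced step by step: place the message digits in the high-order coefficients, divide by g(x), and subtract the remainder (over F_2, subtraction is XOR). A sketch with low-to-high coefficient lists (the helper pmod is ours):

```python
def pmod(a, m):
    # remainder of a modulo a monic polynomial m of degree d, over F2
    d = len(m) - 1
    a = a + [0] * max(0, d - len(a))
    for i in range(len(a) - 1, d - 1, -1):
        if a[i]:
            for j, mj in enumerate(m):
                if mj:
                    a[i - d + j] ^= 1
    return a[:d]

n, g = 7, [1, 1, 0, 1]              # [7,4]-code, g(x) = 1 + x + x^3
msg = [1, 0, 0, 1]                  # message 1001
r = len(g) - 1                      # r = deg(g) = 3
b = [0] * r + msg[::-1]             # b(x) = a0 x^6 + a1 x^5 + a2 x^4 + a3 x^3
rem = pmod(b, g)                    # remainder r(x), deg < r
c = [bi ^ ri for bi, ri in zip(b, rem + [0] * (n - r))]
print(c)                            # coefficients of x + x^2 + x^3 + x^6
```

The printed codeword matches c(x) = x^6 + x^3 + x^2 + x from the example.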

Since a cyclic code is a linear code, we can decode using the polynomial form of syndrome decoding. Let C be a cyclic code. If c(x) ∈ C is the codeword sent and u(x) is the received polynomial, then e(x) = u(x) − c(x) is the error polynomial. The weight of a polynomial is the number of nonzero coefficients.

Definition 7.20. Let C = 〈g(x)〉 be a cyclic [n, n − r]-code with generator polynomial g(x). The syndrome polynomial of a polynomial u(x), denoted by syn(u(x)), is the remainder upon dividing u(x) by g(x), that is,

u(x) = h(x)g(x) + syn(u(x)),    deg(syn(u(x))) < deg(g(x)).

This definition of the syndrome polynomial coincides with the definition of syndrome given for a parity check matrix of a linear code. As expected, a received polynomial u(x) is a codeword if and only if its syndrome polynomial is the zero polynomial. Also, two polynomials have the same syndrome polynomial if and only if they lie in the same coset of C. Thus, the polynomial form of syndrome decoding is analogous to the vector form.

Example 7.21. The binary cyclic [7, 4]-code generated by the polynomial g(x) = x^3 + x + 1 is single-error-correcting. The coset leaders and corresponding syndrome polynomials are

coset leader    syndrome
0               0
1               1
x               x
x^2             x^2
x^3             x + 1
x^4             x^2 + x
x^5             x^2 + x + 1
x^6             x^2 + 1
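The nonzero rows of this table come from computing x^i mod g(x) for each coset leader x^i. A Python sketch with low-to-high coefficient lists (the helper pmod is ours):

```python
def pmod(a, m):
    # remainder of a modulo a monic polynomial m of degree d, over F2
    d = len(m) - 1
    a = a + [0] * max(0, d - len(a))
    for i in range(len(a) - 1, d - 1, -1):
        if a[i]:
            for j, mj in enumerate(m):
                if mj:
                    a[i - d + j] ^= 1
    return a[:d]

g = [1, 1, 0, 1]                    # g(x) = 1 + x + x^3
for i in range(7):
    leader = [0] * i + [1]          # coset leader x^i
    print(f"x^{i}:", pmod(leader, g))   # syndrome coefficients (low degree first)
```

The printed rows agree with the table; for instance x^3 reduces to 1 + x and x^6 to 1 + x^2.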


If, for example, the polynomial u(x) = x^6 + x + 1 is received, we compute its syndrome polynomial:

x^6 + x + 1 = (x^3 + x + 1)(x^3 + x + 1) + (x^2 + x).

Since syn(u(x)) = x^2 + x, its coset leader is e(x) = x^4, and so we decode u(x) as

c(x) = u(x) − e(x) = (x^6 + x + 1) − x^4 = x^6 + x^4 + x + 1.

The main practical difficulty with syndrome decoding is that the coset leader syndrome table might become quite long. However, we can take advantage of the fact that the code in question is cyclic, as follows.

Let us denote by p^(s)(x) the polynomial obtained from p(x) by performing s cyclic shifts. Suppose that u(x) = c(x) + e(x), where c(x) ∈ C is the codeword sent and u(x) is the received polynomial. There must exist some s for which the cyclic shift e^(s)(x) of the error polynomial e(x) has nonzero coefficient of x^{n−1}. Since u^(s)(x) = c^(s)(x) + e^(s)(x), we have syn(u^(s)(x)) = syn(e^(s)(x)). Hence, we only need those rows of the coset leader syndrome table that contain coset leaders of degree n − 1. Let us illustrate this process by an example.

Example 7.22. Consider the binary cyclic [7, 4]-code generated by the polynomial g(x) = x^3 + x + 1. We need only one row of the table: syn(x^6) = x^2 + 1. Suppose that we receive u(x) = x^6 + x + 1. Since syn(u(x)) = x^2 + x is not in the table, we shift u(x) and compute its syndrome, which is syn(u^(1)(x)) = x^2 + x + 1. Again this is not in the table, so we shift again; computing the syndrome gives syn(u^(2)(x)) = x^2 + 1. Since this syndrome is in the table, we deduce that e^(2)(x) = x^6, and hence e(x) = x^4.

Let us take a closer look at the relationship between the unknown error polynomial and the known syndrome polynomial. Suppose that u(x) = c(x) + e(x), where c(x) ∈ C = 〈g(x)〉 is the codeword sent and u(x) is the received polynomial. Since u(x) = h(x)g(x) + syn(u(x)), we have that e(x) − syn(u(x)) ∈ C. Suppose that C is a v-error-correcting code, and suppose that at most v errors have occurred in the transmission. Suppose further that syn(u(x)) has weight at most v. Then e(x) − syn(u(x)) is a codeword of weight at most 2v, which is less than the minimum weight of C, and so it must be the zero codeword. Hence, we have the following.

Lemma 7.23. Let C be a v-error-correcting cyclic code, and suppose that at most v errorshave occurred in the transmission. If the syndrome of the received polynomial u(x) has weightat most v, then the error polynomial is equal to syn(u(x)).

Of course, we may not be lucky enough to encounter a syndrome polynomial of weight at most v. However, if the syndrome polynomial of a cyclic shift of u(x) has weight at most v, then it is almost as easy to obtain the error polynomial from this syndrome polynomial. Suppose that the syndrome polynomial of the cyclic shift u^(s)(x) of u(x) has weight at most v. Then since u^(s)(x) = c^(s)(x) + e^(s)(x), Lemma 7.23 gives e^(s)(x) = syn(u^(s)(x)), and so the error polynomial e(x) can be easily recovered from syn(u^(s)(x)) by shifting an additional n − s places. This strategy is known as error trapping.

Example 7.24. Consider the binary cyclic [7, 4]-code generated by the polynomial g(x) = x^3 + x + 1. Suppose that we receive u(x) = x^6 + x + 1. We have syn(u^(1)(x)) = x^2 + x + 1, syn(u^(2)(x)) = x^2 + 1 and syn(u^(3)(x)) = 1, and hence e^(3)(x) = 1. This implies that e(x) = x^4, just as before.
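The whole error-trapping procedure for a single-error-correcting code fits in a few lines: shift the received word until the syndrome has weight at most v, read off the shifted error polynomial, and shift it back. A Python sketch under the same low-to-high coefficient-list conventions (the helper pmod and the function name trap_decode are ours; the decoder returns None if no trapping shift succeeds):

```python
def pmod(a, m):
    # remainder of a modulo a monic polynomial m of degree d, over F2
    d = len(m) - 1
    a = a + [0] * max(0, d - len(a))
    for i in range(len(a) - 1, d - 1, -1):
        if a[i]:
            for j, mj in enumerate(m):
                if mj:
                    a[i - d + j] ^= 1
    return a[:d]

def trap_decode(u, g, n, v=1):
    # error trapping: find s with wt(syn(u^(s))) <= v, then e^(s) = syn(u^(s))
    w = list(u)
    for s in range(n):
        syn = pmod(w, g)
        if sum(syn) <= v:
            e = syn + [0] * (n - len(syn))
            for _ in range(n - s):              # undo the s cyclic shifts
                e = [e[-1]] + e[:-1]
            return [ui ^ ei for ui, ei in zip(u, e)]
        w = [w[-1]] + w[:-1]                    # one more cyclic shift
    return None                                 # trapping failed

u = [1, 1, 0, 0, 0, 0, 1]                       # u(x) = 1 + x + x^6
print(trap_decode(u, [1, 1, 0, 1], 7))          # corrects the x^4 error
```

Running this reproduces Examples 7.22 and 7.24: the syndrome first has weight 1 after three shifts, and the decoder returns the codeword x^6 + x^4 + x + 1.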


Let C be a v-error-correcting cyclic [n, n − r]-code. If v or fewer errors occur, and if they are confined to r consecutive positions, including wrap-around, then there must exist some s for which the cyclic shift u^(s)(x) of the received polynomial u(x) has its errors confined to the r coefficients of x^0, x^1, . . . , x^{r−1}. Thus, u^(s)(x) = c^(s)(x) + e^(s)(x) with deg(e^(s)(x)) < r. Hence, since c^(s)(x) is a codeword, we have syn(u^(s)(x)) = e^(s)(x). This says that error trapping can correct any v errors that happen to fall within r consecutive positions, including wrap-around.

The result above does not say that any burst of length r or less can be corrected. In fact, this is not possible because, according to Proposition ??, if a cyclic [n, n − r]-code C can correct all burst errors of length b or less, then we must have 2b ≤ r. However, if b(x) ∈ C is a burst of length r or less, then by performing cyclic shifts of b(x), we obtain a codeword in C of degree less than r (the degree of the generator polynomial), which is impossible. Hence, we have the following.

Proposition 7.25. A cyclic [n, n − r]-code C contains no bursts of length r or less. Hence, it can detect any burst error of length r or less.

Exercise

(1) Let C_1 = 〈g_1〉 and C_2 = 〈g_2〉 be two q-ary cyclic codes of length n.
    (a) Show that C_1 ⊆ C_2 if and only if g_2(x) | g_1(x).
    (b) Show that C_1 ∩ C_2 is also a cyclic code.
    (c) Let C_1 + C_2 = {c_1 + c_2 | c_1 ∈ C_1, c_2 ∈ C_2}. Show that C_1 + C_2 is also a linear code.

(2) Let E_n be the set of even weight strings in F_2^n.
    (a) Show that E_n is a cyclic code and E_n = 〈x − 1〉.
    (b) Let C = 〈g(x)〉 be a binary cyclic code of length n. Show that w(c) is even for all c ∈ C if and only if x − 1 | g(x).

(3) Let g(x) be the generator polynomial of a binary cyclic [n, n − r]-code C. Suppose that C contains at least one codeword of odd weight.
    (a) Show that the set E of all codewords in C of even weight is a cyclic code. What is the generator polynomial of E?
    (b) Prove that ∑_{i=0}^{n−1} x^i ∈ C.

(4) Let C_1 and C_2 be two cyclic codes in R_n(F_q) with generating idempotents e_1(x) and e_2(x), respectively.
    (a) Show that C_1 ⊆ C_2 if and only if e_1(x)e_2(x) = e_1(x) in R_n(F_q).
    (b) Show that C_1 ∩ C_2 has generating idempotent e_1(x)e_2(x) in R_n(F_q).
    (c) Show that C_1 + C_2 has generating idempotent e_1(x) + e_2(x) − e_1(x)e_2(x) in R_n(F_q).

(5) Show that any set of k consecutive positions in a cyclic [n, k]-code is an information set.

(6) Let g(x) be the generator polynomial of a binary cyclic [n, n − r]-code C.
    (a) Let s_i(x) be the remainder obtained by dividing x^{r+i} by g(x). Show that x^{r+i} − s_i(x), for i = 0, 1, . . . , n − r − 1, is a basis for C.
    (b) Find the generator matrix for C by using the basis in part (6a), and find a corresponding parity check matrix H.
    (c) Suppose that u(x) = u_0 + u_1 x + · · · + u_{n−1} x^{n−1} is a received polynomial. How is the syndrome polynomial syn(u(x)) related to the syndrome (u_0, u_1, . . . , u_{n−1})H^t?

(7) Let C be a cyclic [n, k]-code with generator polynomial g(x) and let c_i = c_{i,1} c_{i,2} · · · c_{i,n} be codewords in C, for i = 1, . . . , s. We may interleave these codewords by juxtaposing the first position in each codeword, followed by the second position in each codeword, and so on, to obtain the string

c_{1,1} c_{2,1} · · · c_{s,1} c_{1,2} c_{2,2} · · · c_{s,2} · · · c_{1,n} c_{2,n} · · · c_{s,n}.

Let us denote by C(s) the set of all strings formed in this way from all possible choices of s codewords in C (taken in all possible orders).
    (a) Show that C(s) is a cyclic [ns, ks]-code with generator polynomial g(x^s).
    (b) Suppose that C is capable of correcting burst errors of length b or less. Show that C(s) is capable of correcting burst errors of length bs or less.