
SOLUTION MANUAL for
PRINCIPLES OF DIGITAL COMMUNICATION
by ROBERT G. GALLAGER
Cambridge University Press, 2008

Emre Koksal, Tengo Saengudomlert, Shan-Yuan Ho, Manish Bhardwaj, Ashish Khisti, Etty Lee, and Emmanuel Abbe were the Teaching Assistants in Subject 6.450, Principles of Digital Communication, at M.I.T. in the years over which this book was written. They all made important contributions to the evolution of many of the solutions in this manual. This final edited version, however, has been prepared by the author, who takes responsibility for any residual errors.


Principles of Digital Communication
Solutions to Exercises – Chapter 2

Exercise 2.1: What follows is one way of thinking about the problem. It is definitely not the only way – the important point in this question is to realize that we abstract complex physical phenomena using simplified models, and the choice of the model is governed by our objective.

Speech encoder/decoder pairs try to preserve not only recognizability of words but also a host of speaker-dependent information like pitch and intonation. If we do not care about any speaker-specific characteristics of speech, the problem is essentially that of coding the English text that the speaker is producing. Hence, to estimate the rate (in bits per second) that a good source encoder would need, we must estimate two quantities. The first is the rate, in English letters per second, that normal speakers achieve. The second is the average number of bits needed per English letter.

Rough estimates of the former, which can be made by reading this solution with a stop watch, are 15-20 English letters per second. A simple code for 26 letters and a space would need log₂ 27 ≈ 4.75 bits per English letter. By employing more sophisticated models of dependencies in the English language, researchers estimate that one could probably do with as few as 1.34 bits per letter. Hence we could envision source coders that achieve ≈ 20 bits per second (assuming 15 letters per second and 1.34 bits per letter) – which is considerably lower than what the best speech encoders today achieve!
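As a quick check on this arithmetic, the following sketch (an illustration added here, not part of the original solution; the 15 letters/sec and 1.34 bits/letter figures are the estimates quoted above) evaluates both rate estimates:

```python
import math

letters_per_sec = 15              # low end of the 15-20 letters/sec estimate
bits_simple = math.log2(27)       # simple code: 26 letters plus a space
bits_model = 1.34                 # estimate from sophisticated models of English

print(f"simple code: {bits_simple:.2f} bits/letter")                          # ~4.75
print(f"rate, simple code: {letters_per_sec * bits_simple:.0f} bits/sec")     # ~71
print(f"rate, 1.34 bits/letter: {letters_per_sec * bits_model:.0f} bits/sec") # ~20
```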

Exercise 2.2:

(a) V + W is a random variable, so its expectation, by definition, is

E[V + W] = Σ_{v∈V} Σ_{w∈W} (v + w) p_{VW}(v, w)
         = Σ_{v∈V} Σ_{w∈W} v p_{VW}(v, w) + Σ_{v∈V} Σ_{w∈W} w p_{VW}(v, w)
         = Σ_{v∈V} v Σ_{w∈W} p_{VW}(v, w) + Σ_{w∈W} w Σ_{v∈V} p_{VW}(v, w)
         = Σ_{v∈V} v p_V(v) + Σ_{w∈W} w p_W(w)
         = E[V] + E[W].


(b) Once again, working from first principles,

E[V · W] = Σ_{v∈V} Σ_{w∈W} (v · w) p_{VW}(v, w)
         = Σ_{v∈V} Σ_{w∈W} (v · w) p_V(v) p_W(w)      (using independence)
         = ( Σ_{v∈V} v p_V(v) ) ( Σ_{w∈W} w p_W(w) )
         = E[V] · E[W].

(c) To discover a case where E[V · W] ≠ E[V] · E[W], first try the simplest kind of example where V and W are binary with the joint pmf

p_{VW}(0, 1) = p_{VW}(1, 0) = 1/2;   p_{VW}(0, 0) = p_{VW}(1, 1) = 0.

Clearly, V and W are not independent. Also, E[V · W] = 0 whereas E[V] = E[W] = 1/2 and hence E[V] · E[W] = 1/4.

The second case requires some experimentation. One approach is to choose a joint distribution such that E[V · W] = 0 and E[V] = 0. A simple solution is then given by the pmf

p_{VW}(−1, 0) = p_{VW}(0, 1) = p_{VW}(1, 0) = 1/3.

Again, V and W are not independent. Clearly, E[V · W] = 0. Also, E[V] = 0 (what is E[W]?). Hence, E[V · W] = E[V] · E[W].

(d)

σ²_{V+W} = E[(V + W)²] − (E[V + W])²
         = E[V²] + E[W²] + E[2V · W] − (E[V] + E[W])²
         = E[V²] + E[W²] + 2E[V] · E[W] − E[V]² − E[W]² − 2E[V] · E[W]      (using part (b), since V and W are independent)
         = E[V²] − E[V]² + E[W²] − E[W]²
         = σ²_V + σ²_W.
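A small numerical check of parts (a)-(c) may be useful; the sketch below (illustrative only) evaluates the expectations directly from the two pmfs given in part (c):

```python
# pmf of the first example in (c): V, W dependent and E[VW] != E[V]E[W]
pmf1 = {(0, 1): 0.5, (1, 0): 0.5}
# pmf of the second example in (c): V, W dependent, yet E[VW] = E[V]E[W]
pmf2 = {(-1, 0): 1/3, (0, 1): 1/3, (1, 0): 1/3}

def moments(pmf):
    EV = sum(p * v for (v, w), p in pmf.items())
    EW = sum(p * w for (v, w), p in pmf.items())
    EVW = sum(p * v * w for (v, w), p in pmf.items())
    return EV, EW, EVW

for pmf in (pmf1, pmf2):
    EV, EW, EVW = moments(pmf)
    print(f"E[V]={EV:+.3f}  E[W]={EW:+.3f}  E[VW]={EVW:+.3f}  E[V]E[W]={EV*EW:+.3f}")
```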

Exercise 2.3:

(a) This is a frequently useful form for the expectation of a non-negative integer rv. Expressing Pr(N ≥ n) as p_n + p_{n+1} + ···,

Σ_{n=1}^∞ Pr(N ≥ n) = p_1 + p_2 + p_3 + p_4 + ···
                    +       p_2 + p_3 + p_4 + ···
                    +             p_3 + p_4 + ···
                    +                   p_4 + ···
                    = p_1 + 2p_2 + 3p_3 + ··· = Σ_{n=1}^∞ n p_n = E[N].


(b) Viewing the integral as a limiting sum,

∫_0^∞ Pr(X ≥ a) da = lim_{ε→0} Σ_{n>0} ε Pr(X ≥ nε) = lim_{ε→0} E[ε⌊X/ε⌋] = E[X],

where the middle equality follows from part (a) applied to the integer rv ⌊X/ε⌋, and the final limit holds since ε⌊X/ε⌋ differs from X by at most ε.

(c) [Figure omitted: the curve Pr(X ≥ x) with a rectangle of width a and height Pr(X ≥ a) beneath it; the area of the rectangle is a Pr(X ≥ a).] Since the rectangle lies under the curve,

a Pr(X ≥ a) ≤ ∫_0^∞ Pr(X ≥ x) dx = E[X].

(d) As the hint says, let X = (Y − E[Y])². Then for any a ≥ 0,

Pr{(Y − E[Y])² ≥ a} ≤ E[(Y − E[Y])²]/a = σ²_Y / a.

Letting b = √a completes the derivation.
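To see part (a) in action, the sketch below (illustrative; a geometric pmf is used as an arbitrary test case) compares the tail sum Σ_n Pr(N ≥ n) with the direct expectation Σ_n n p_n:

```python
p = 0.3                                        # geometric parameter, chosen arbitrarily
N_MAX = 200                                    # truncation; the tail beyond is negligible
pn = [0.0] + [p * (1 - p) ** (n - 1) for n in range(1, N_MAX)]

E_direct = sum(n * pn[n] for n in range(1, N_MAX))
E_tails = sum(sum(pn[n:]) for n in range(1, N_MAX))   # sum over n of Pr(N >= n)
print(E_direct, E_tails)                       # both close to 1/p = 3.333...
```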

Exercise 2.4:

(a) Since X_1 and X_2 are iid, symmetry implies that Pr(X_1 > X_2) = Pr(X_2 > X_1). These two events are mutually exclusive and the event X_1 = X_2 has 0 probability. Thus Pr(X_1 > X_2) and Pr(X_1 < X_2) sum to 1, so must each be 1/2. Thus Pr(X_1 ≥ X_2) = Pr(X_2 ≥ X_1) = 1/2.

(b) Invoking the symmetry among X_1, X_2, and X_3, we see that each has the same probability of being the smallest of the three (the probability of a tie is 0). These three events are mutually exclusive and their probabilities must add up to 1. Therefore each event occurs with probability 1/3.

(c) The event {N > n} is the same as the event {X_1 is the minimum among the n iid random variables X_1, X_2, ..., X_n}. By extending the argument in part (b), we see that Pr(X_1 is the smallest of X_1, ..., X_n) = 1/n. Finally, Pr{N ≥ n} = Pr{N > n−1} = 1/(n−1) for n ≥ 2.

(d) Since N is a non-negative integer random variable (taking on values from 2 to ∞), we can use Exercise 2.3(a) as follows:

E[N] = Σ_{n=1}^∞ Pr{N ≥ n}
     = Pr{N ≥ 1} + Σ_{n=2}^∞ Pr{N ≥ n}
     = 1 + Σ_{n=2}^∞ 1/(n−1)
     = 1 + Σ_{n=1}^∞ 1/n.


Since the series Σ_{n=1}^∞ 1/n diverges, we conclude that E[N] = ∞.

(e) Since the alphabet has a finite number of letters,¹ Pr(X_1 = X_2) is no longer 0 and depends on the particular probability distribution. Thus, although Pr(X_1 ≥ X_2) = Pr(X_2 ≥ X_1) by symmetry, neither can be found without knowing the distribution.

Out of the alphabet letters with nonzero probability, let a_min be a letter of minimum numeric value. If X_1 = a_min, then no subsequent rv X_2, X_3, ... can have a smaller value, so N = ∞ in this case. Since the event X_1 = a_min occurs with positive probability, E[N] = ∞.
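A simulation sketch of parts (c) and (d) (hypothetical code, added for illustration): it estimates Pr(N ≥ n) = 1/(n−1) empirically, and the sample mean keeps drifting upward as more trials are added, consistent with E[N] = ∞.

```python
import random

def sample_N():
    """N = first index n >= 2 with X_n < X_1, for iid Uniform(0,1) draws."""
    x1, n = random.random(), 2
    while random.random() >= x1:
        n += 1
    return n

samples = [sample_N() for _ in range(100_000)]
for n in (2, 3, 5, 10):
    freq = sum(s >= n for s in samples) / len(samples)
    print(f"Pr(N >= {n}) ~ {freq:.4f}   (theory: {1/(n-1):.4f})")
print("sample mean:", sum(samples) / len(samples))   # grows with the number of trials
```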

Exercise 2.5: First observe that for any n ≥ 1, the number of binary n-tuples with an even number of 1's is equal to the number of binary n-tuples with an odd number of 1's. To see this, note that for each sequence (x_1, x_2, ..., x_{n−1}, 0), there corresponds a unique sequence (x_1, x_2, ..., x_{n−1}, 1). One of these two sequences has an odd number of 1's while the other has an even number of 1's.

Since all 2^n n-sequences are equally likely, the probability of an even number of ones then equals the probability of an odd number, i.e.,

Pr(X_1 ⊕ X_2 ⊕ ··· ⊕ X_n = 0) = Pr(X_1 ⊕ X_2 ⊕ ··· ⊕ X_n = 1) = 1/2.

(a) Since Z = X_1 ⊕ ··· ⊕ X_n, this shows that the binary random variable Z takes on the values 0 and 1 with equal probability. Now, conditional on X_1 = 0, Z will be 0 if X_2 ⊕ ··· ⊕ X_n = 0, which, from above (using X_2, ..., X_n in place of X_1, ..., X_n), has probability 1/2. Similarly, conditional on X_1 = 1, Z again takes on the values 0 and 1 with probability 1/2 each. Thus Z is independent of X_1.

(b) Given that X_1 = x_1, X_2 = x_2, ..., X_{n−1} = x_{n−1}, Z is equal to 0 if X_n = x_1 ⊕ x_2 ⊕ ··· ⊕ x_{n−1}. Since X_n is independent of X_1, ..., X_{n−1} and is equiprobably 0 or 1, it follows that Z = 0 under this conditioning with probability 1/2 and Z = 1 with probability 1/2. Since this is true for all choices of x_1, ..., x_{n−1}, it follows that Z is independent of X_1, ..., X_{n−1}.

(c) Z, X_1, X_2, ..., X_n are NOT statistically independent, and in fact, Z is uniquely determined by X_1, ..., X_n.

(d) Z is NOT statistically independent of X_1 if Pr(X_i = 1) = p ≠ 1/2. We demonstrate this for n = 2. The conditional pmf p_{Z|X_1}(z|x_1) for x_1 = 0 and 1 satisfies

p_{Z|X_1}(0|0) = Pr(X_2 = 0) = 1 − p,
p_{Z|X_1}(0|1) = Pr(X_2 = 1) = p.

Since the conditional pmf for Z depends on X_1, Z and X_1 are statistically dependent.

The purpose of this question is to show that in a group of n+1 random variables, pairwise statistical independence (part (a)) does not imply statistical independence of all the random variables (part (c)), even given the statistical independence of groups of n random variables shown in part (b).

¹The same results can be obtained with some extra work for a countably infinite discrete alphabet.
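The claims in parts (a)-(c) can be verified exhaustively for a small n; a sketch (illustrative only):

```python
from itertools import product

n = 3
tuples = list(product([0, 1], repeat=n))      # 2^n equiprobable n-tuples

def parity(x):                                # Z = X_1 xor X_2 xor ... xor X_n
    return sum(x) % 2

# Part (a): Z is independent of each X_i taken alone.
for i in range(n):
    for b in (0, 1):
        match = [x for x in tuples if x[i] == b]
        pz0 = sum(parity(x) == 0 for x in match) / len(match)
        print(f"Pr(Z=0 | X_{i+1}={b}) = {pz0}")       # always 0.5

# Part (c): Z is a deterministic function of the whole tuple, so
# Z, X_1, ..., X_n cannot be statistically independent.
print("Z given (1,0,1):", parity((1, 0, 1)))          # fixed at 0
```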


Exercise 2.6:

(a) Assume the contrary; i.e., there is a suffix-free code that is not uniquely decodable. Then that code must contain two distinct sequences of source letters, say x_1, x_2, ..., x_n and x'_1, x'_2, ..., x'_m, such that

C(x_1)C(x_2)...C(x_n) = C(x'_1)C(x'_2)...C(x'_m).

Then one of the following must hold:

• C(x_n) = C(x'_m)
• C(x_n) is a suffix of C(x'_m)
• C(x'_m) is a suffix of C(x_n).

In the last two cases we arrive at a contradiction, since the code is hypothesized to be suffix-free. In the first case, x_n must equal x'_m because of the suffix freedom. Simply delete that final letter from each sequence and repeat the argument. Since the sequences are distinct, the final letters must differ after some number of repetitions of the above argument, and at that point one of the latter two cases holds and a contradiction is reached.

Hence, suffix-free codes are uniquely decodable.

(b) Any prefix-free code becomes a suffix-free code if the ordering of symbols in each codeword is reversed. About the simplest such example is {0, 01, 11}, which can be seen to be a suffix-free code (with codeword lengths {1, 2, 2}) but not a prefix-free code.

A codeword in the above code cannot be decoded as soon as its last bit arrives at the decoder. To illustrate a rather extreme case, consider the following output produced by the encoder:

0111111111...

Assuming that source letters {a, b, c} map to {0, 01, 11}, we cannot distinguish between the two possible source sequences

acccccccc...   and   bcccccccc...

until the end of the string is reached. Hence, in this case the decoder might have to wait for an arbitrarily long time before decoding.

(c) There cannot be any code with codeword lengths (1, 2, 2) that is both prefix-free and suffix-free. Without loss of generality, set C_1 = 0. Then a prefix-free code cannot use either of the codewords 00 and 01 for C_2 or C_3, and thus must use 10 and 11, which is not suffix-free.

Exercise 2.7: Consider the set of codeword lengths (1, 2, 2) and arrange them as (2, 1, 2). Then u_1 = 0 is represented as 0.00. Next, u_2 = 1/4 = 0.01 must be represented using 1 bit after the binary point, which is not possible. Hence, the algorithm fails.


Exercise 2.8: The Kraft inequality for D-ary alphabets can be stated thus:

Every prefix-free code for an alphabet X with codeword lengths {ℓ(x), x ∈ X} satisfies

Σ_{x∈X} D^{−ℓ(x)} ≤ 1.

Conversely, if the inequality is satisfied, then a prefix-free code with lengths {ℓ(x)} exists.

The key is to identify the critical steps that make the proof work in the binary case and to recognize that they can always be extended to the D-ary case:

• If we associate the number 0.y_1 y_2 ... y_n = Σ_{i=1}^n y_i D^{−i} with the codeword y_1 y_2 ... y_n, then all codewords that begin with y_1, y_2, ..., y_n lie in the interval [0.y_1 y_2 ... y_n, 0.y_1 y_2 ... y_n + D^{−n}).

• Since the code is prefix-free, no two intervals can overlap.

• Since the intervals are disjoint and all contained in [0, 1), the sum of all the interval lengths is less than or equal to one.

Hence, every prefix-free code for a D-ary alphabet must satisfy the generalized Kraft inequality stated above. We can similarly prove the construction, by essentially mimicking the code construction algorithm outlined at the end of Section 2.3.3.
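The D-ary Kraft sum is easy to evaluate mechanically; the sketch below (using the ternary code of Exercise 2.16 as an example) checks both the sum and the prefix condition:

```python
def kraft_sum(lengths, D):
    """Kraft sum for D-ary codeword lengths; at most 1 for any prefix-free code."""
    return sum(D ** (-l) for l in lengths)

def is_prefix_free(codewords):
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

code = ["0", "1", "20", "21", "220", "221"]           # ternary example (D = 3)
print(kraft_sum([len(c) for c in code], D=3))         # 26/27 <= 1
print(is_prefix_free(code))                           # True
```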

Exercise 2.9:

(a) Assume, as usual, that p_j > 0 for each j. From Eqs. (2.8) and (2.9),

H[X] − L̄ = Σ_{j=1}^M p_j log( 2^{−l_j} / p_j ) ≤ Σ_{j=1}^M p_j ( 2^{−l_j}/p_j − 1 ) log e = 0.

As is evident from Figure 2.7, the inequality is strict unless 2^{−l_j} = p_j for each j. Thus if H[X] = L̄, it follows that 2^{−l_j} = p_j for each j.

(b) First consider Figure 2.4, repeated below, assuming that Pr(a) = 1/2 and Pr(b) = Pr(c) = 1/4. The first-order node 0 corresponds to the letter a and has probability 1/2. The first-order node 1 corresponds to the occurrence of either letter b or c, and thus has probability 1/2.

[Figure 2.4, repeated: the binary code tree for the code]

aa → 00    ab → 011    ac → 010
ba → 110   bb → 1111   bc → 1110
ca → 100   cb → 1011   cc → 1010

Similarly, the second-order node 00 corresponds to aa, which has probability 1/4, and the second-order node 01 corresponds to either ab or ac, which have cumulative probability 1/4. In the same way, the second-order nodes 10 and 11 correspond to c and b respectively, with probabilities 1/4 each. One can proceed with higher-order nodes in the same way, but what is the principle behind this?


In general, when an infinite binary tree is used to represent an unending sequence of letters from an iid source where each letter j has probability p_j = 2^{−l_j}, each node corresponding to an initial sequence of letters x_1, ..., x_n has a probability Π_i 2^{−l_{x_i}} equal to the product of the individual letter probabilities, and an order equal to Σ_i l_{x_i}. Thus each node labeled by a subsequence of letters has a probability 2^{−ℓ}, where ℓ is the order of that node.

The other nodes (those unlabeled in the example above) have a probability equal to the sum of the probabilities of the immediately following labeled nodes. This probability is again 2^{−ℓ} for an ℓth-order node, which can be established by induction if one wishes to be formal.

Exercise 2.10:

(a) Since the code satisfies the Kraft inequality with equality, the corresponding tree is full. Now let l_max be the length of the longest codeword. Instead of asking how large l_max can be for a given M, we ask how small M can be for a given l_max. It is easy to see, by drawing a few trees, that for l_max = 2, M ≥ 3; similarly, for l_max = 3, M ≥ 4. This leads us to hypothesize that for arbitrary l_max, we have M ≥ l_max + 1, achieved by the maximally unbalanced tree sketched below.

[Figure: a full binary tree with one leaf at each depth, giving codewords a, b, c, d, e of lengths 1, 2, 3, 4, 4.]

To verify this, note that the sibling of each intermediate node (including the root) on the path to a given codeword of length l_max must either be a single codeword or a subtree with multiple codewords. Thus M ≥ l_max + 1, and consequently l_max ≤ M − 1. Since there are a finite number of distinct binary sequences of length M − 1 or less, there are finitely many full code trees with M codewords (in the next part, we derive how many such codes there are).

(b) It is obvious that there is only one full tree with 2 leaf nodes. Thus S(2)=1.

Next we use induction to calculate the number of full binary code trees with M > 2 leaves. For given M, first consider the number of trees in which j codewords start with 0 and M − j start with 1 (1 ≤ j ≤ M − 1). Since the set of codewords starting with 0 must form a full code tree with the initial 0 omitted, there are S(j) such trees. As seen above, S(2) = 1, and trivially S(1) = 1. Similarly there are S(M − j) different trees starting with 1. Thus for given j, there are S(j)S(M − j) code trees of M codewords for which j codewords start with 0. It follows that

S(M) = Σ_{j=1}^{M−1} S(j) S(M − j).

One can check the formula for small values of M, getting S(3) = 2, S(4) = 5, etc.
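This recurrence, with S(1) = S(2) = 1, is the Catalan recurrence; a sketch to check the small values quoted above:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def S(M):
    """Number of full binary code trees with M leaves."""
    if M <= 2:
        return 1
    return sum(S(j) * S(M - j) for j in range(1, M))

print([S(M) for M in range(2, 8)])    # [1, 2, 5, 14, 42, 132] -- Catalan numbers
```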

Exercise 2.11:

(a) For n = 2,

( Σ_{j=1}^M 2^{−l_j} )² = ( Σ_{j_1=1}^M 2^{−l_{j_1}} )( Σ_{j_2=1}^M 2^{−l_{j_2}} ) = Σ_{j_1=1}^M Σ_{j_2=1}^M 2^{−(l_{j_1} + l_{j_2})}.

The same approach works for arbitrary  n.

(b) Each source n-tuple x^n = (a_{j_1}, a_{j_2}, ..., a_{j_n}) is encoded into a concatenation C(a_{j_1})C(a_{j_2})...C(a_{j_n}) of binary digits of aggregate length l(x^n) = l_{j_1} + l_{j_2} + ··· + l_{j_n}. Since


there is one n-tuple x^n for each choice of a_{j_1}, a_{j_2}, ..., a_{j_n}, the result of part (a) can be rewritten as

( Σ_{j=1}^M 2^{−l_j} )^n = Σ_{x^n} 2^{−l(x^n)}.   (1)

(c) Rewriting (1) in terms of the number A_i of concatenations of n codewords of aggregate length i,

( Σ_{j=1}^M 2^{−l_j} )^n = Σ_{i=1}^{n l_max} A_i 2^{−i}.

This uses the fact that since each codeword has length at most l_max, each concatenation has length at most n l_max.

(d) From unique decodability, each of these concatenations must be different, so there are at most 2^i concatenations of aggregate length i, i.e., A_i ≤ 2^i. Thus, since the above sum contains at most n l_max terms,

( Σ_{j=1}^M 2^{−l_j} )^n ≤ n l_max.   (2)

(e) Note that

(n l_max)^{1/n} = exp( ln(n l_max) / n ) → exp(0) = 1

as n → ∞. Since (2) must be satisfied for all n, the Kraft inequality must be satisfied.

Exercise 2.12:

(a) and (b) One Huffman code is a → 0, b → 10, c → 110, d → 111, where c and d are the less probable letters, and another is a → 00, b → 01, c → 10, d → 11.

(c) The code a → 00, b → 11, c → 10, d → 01 is also optimal but cannot arise from the Huffman procedure, since the two least likely codewords must be siblings.

Exercise 2.13:

(a) In the Huffman algorithm, we start by combining p_3 and p_4. Since we have p_1 = p_3 + p_4 ≥ p_2, we can combine p_1 and p_2 in the next step, leading to all codewords of length 2. We can also combine the supersymbol obtained by combining symbols 3 and 4 with symbol 2, yielding codewords of lengths 1, 2, 3, and 3 respectively.

(b) Note that p_3 ≤ p_2 and p_4 ≤ p_2, so p_3 + p_4 ≤ 2p_2. Thus

p_1 = p_3 + p_4 ≤ 2p_2,   which implies   p_1 + p_3 + p_4 ≤ 4p_2.

Since p_2 = 1 − p_1 − p_3 − p_4, the latter equation implies that 1 − p_2 ≤ 4p_2, or p_2 ≥ 0.2. From the former equation, then, p_1 ≤ 2p_2 ≤ 0.4 shows that p_1 ≤ 0.4. These bounds can be met by also choosing p_3 = p_4 = 0.2. Thus p_max = 0.4.


(c) Reasoning similarly to part (b), p_2 ≤ p_1 and p_2 = 1 − p_1 − p_3 − p_4 = 1 − 2p_1. Thus 1 − 2p_1 ≤ p_1, so p_1 ≥ 1/3, i.e., p_min = 1/3. This bound is achievable by choosing p_1 = p_2 = 1/3 and p_3 = p_4 = 1/6.

(d) The argument in part (b) remains the same if we assume p_1 ≤ p_3 + p_4 rather than p_1 = p_3 + p_4; i.e., p_1 ≤ p_3 + p_4 implies that p_1 ≤ p_max. Thus assuming p_1 > p_max implies that p_1 > p_3 + p_4. Thus the supersymbol obtained by combining symbols 3 and 4 will be combined with symbol 2 (or perhaps with symbol 1 if p_2 = p_1). Thus the codeword for symbol 1 (or perhaps the codeword for symbol 2) will have length 1.

(e) The lengths of any optimal prefix-free code must be either (1, 2, 3, 3) or (2, 2, 2, 2). If p_1 > p_max, then, from (b), p_1 > p_3 + p_4, so the lengths (1, 2, 3, 3) yield a lower average length than (2, 2, 2, 2).

(f) The argument in part (c) remains almost the same if we start with the assumption that p_1 ≥ p_3 + p_4. In this case p_2 = 1 − p_1 − p_3 − p_4 ≥ 1 − 2p_1. Combined with p_1 ≥ p_2, we again have p_1 ≥ p_min. Thus if p_1 < p_min, we must have p_3 + p_4 > p_1 ≥ p_2. We then must combine p_1 and p_2 in the second step of the Huffman algorithm, so each codeword will have length 2.

(g) It turns out that p_max is still 2/5. To see this, first note that if p_1 = 2/5, p_2 = p_3 = 1/5, and all other symbols have an aggregate probability of 1/5, then the Huffman code construction combines the least likely symbols until they are tied together into a supersymbol of probability 1/5. The completion of the algorithm, as in part (b), can lead to either one codeword of length 1 or 3 codewords of length 2 and the others of longer length. If p_1 > 2/5, then at each stage of the algorithm, two nodes of aggregate probability less than 2/5 are combined, leaving symbol 1 unattached until only 4 nodes remain in the reduced symbol set. The argument in (d) then guarantees that the code will have one codeword of length 1.

Exercise 2.14:

(a) First, we prove that for any equiprobable alphabet, the codeword lengths can differ by at most 1. In the Huffman code tree for an equiprobable alphabet, the parent of the branch corresponding to any symbol j has probability at least 2/M, which is greater than the probability of any other symbol i. Thus by Lemma 2.5.1, l_j − 1 ≤ l_i for every j, i; thus the codeword lengths differ by at most 1.

We have shown that the lengths must be k or k − 1 for some integer k, and it remains to show that k = ⌈log M⌉. If M is a power of 2, then all codewords have length log M and we are done. Thus assume that M is not a power of 2 and let n_k be the number of codewords with length k, where 1 ≤ n_k ≤ M − 1. The remaining M − n_k codewords then have length k − 1. Since the code is optimal, the code tree is full, and

n_k 2^{−k} + (M − n_k) 2^{−(k−1)} = 1.

Thus

2M − n_k = 2^k.   (3)

Since n_k > 0, it follows that 2M > 2^k and thus M > 2^{k−1}. Since n_k < M, it follows that M < 2^k. Thus k = ⌈log M⌉.

(b) From (3), n_k = 2M − 2^k. The expected codeword length is then

L̄ = [ (k − 1)(2^k − M) + k(2M − 2^k) ] / M = k + 1 − 2^k/M.


(c) Letting y = M/2^k,

L̄ − log M = [k − log M] + 1 − 2^k/M = −log y + 1 − 1/y.

Note that 1/2 ≤ y < 1. By setting the derivative with respect to y to zero, we find that L̄ − log M attains its maximum at y = ln 2, with a resulting value of 1 − log(e ln 2), or approximately 0.08607. Hence, for any M, we have

L̄ − log M ≤ 1 − log(e ln 2)
L̄ ≤ H(X) + 0.08607,

which is a much tighter bound than L̄ < H(X) + 1.
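A quick numerical check of the maximization (illustrative sketch):

```python
import math

def excess(y):                      # Lbar - log M as a function of y = M / 2^k
    return -math.log2(y) + 1 - 1 / y

ys = [0.5 + i * 1e-6 for i in range(500_000)]        # grid over [1/2, 1)
best = max(ys, key=excess)
print(best, excess(best))           # ~0.6931 (= ln 2), ~0.08607
```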

Exercise 2.15:

(a) This is the same as Lemma 2.5.1.

(b) Since p_1 < p_{M−1} + p_M, we see that p_1 < p'_{M−1}, where p'_{M−1} is the probability of the node in the reduced code tree corresponding to letters M−1 and M in the original alphabet. Thus, by part (a), l_1 ≥ l'_{M−1} = l_M − 1.

(c) Consider an arbitrary minimum-expected-length code tree. This code tree must be full (by Lemma 2.5.2), so suppose that symbol k is the sibling of symbol M in this tree. If k = 1, then l_1 = l_M; otherwise, p_1 < p_M + p_k, so l_1 must be at least as large as the length of the immediate parent of M, showing that l_1 ≥ l_M − 1.

(d) and (e) We have shown that the shortest and longest lengths differ by at most 1, with some number m ≥ 1 of lengths equal to l_1 and the remaining M − m lengths equal to l_1 + 1. It follows that 2^{l_1+1} = 2m + (M − m) = M + m. From this it follows that l_1 = ⌊log₂ M⌋ and m = 2^{l_1+1} − M.

Exercise 2.16:

(a) Grow a full ternary tree by converting one leaf into an intermediate node with 3 new leaves at each step. The smallest full tree has 3 leaves. At each growth extension we lose 1 leaf but gain 3, for a net gain of 2 leaves. Thus M = 3 + 2n (for n an integer).

(b) It is clear that for optimality, all the unused leaves in the tree must have the same length as the longest codeword. For M even, combine the 2 lowest probability symbols into a node at the first step, then combine the 3 lowest probability nodes for all the rest of the steps until the root node. If M is odd, a full ternary tree is possible, so combine the 3 lowest probability nodes at each step.

(c) If {a, b, c, d, e, f} have symbol probabilities {0.3, 0.2, 0.2, 0.1, 0.1, 0.1} respectively, then the ternary Huffman code will be {a → 0, b → 1, c → 20, d → 21, e → 220, f → 221}.

Exercise 2.17:

(a) Note that H(X) = Σ_i −p_i log p_i and

H(X') = −(p_{M−1} + p_M) log(p_{M−1} + p_M) − Σ_{i=1}^{M−2} p_i log p_i.


Thus

H(X) − H(X') = (p_{M−1} + p_M) log(p_{M−1} + p_M) − p_{M−1} log p_{M−1} − p_M log p_M
             = −p_{M−1} log( p_{M−1}/(p_{M−1} + p_M) ) − p_M log( p_M/(p_{M−1} + p_M) )
             = (p_{M−1} + p_M) H(γ),   where γ = p_M/(p_M + p_{M−1}) or p_{M−1}/(p_M + p_{M−1}).

(b) Each step of the Huffman algorithm reduces the number of symbols by 1 until only 1 node (the root) is left. Thus there are M − 1 intermediate nodes, counting the root.

(c) The reduced code and the original code are the same except for a unit increase in length whenever letters M−1 or M occur. Thus L̄ = L̄' + q_1 as noted. In the same way, letting L̄^{(2)} be the expected code length after the second Huffman step, L̄' = L̄^{(2)} + q_2, so L̄ = L̄^{(2)} + q_1 + q_2. Iterating, and using the fact that the length of the root node is 0 (i.e., L̄^{(M−1)} = 0), we have

L̄ = Σ_{i=1}^{M−1} q_i.

(d) Part (a) shows that H(X) = H(X') + q_1 H(α_1). Letting X^{(2)} be the further reduced ensemble after the second Huffman step, and using part (a) on X', we see that H(X') = H(X^{(2)}) + q_2 H(α_2). Thus H(X) = H(X^{(2)}) + q_1 H(α_1) + q_2 H(α_2). Iterating down to the root, and noting that H(X^{(M−1)}) = 0,

H(X) = Σ_{i=1}^{M−1} q_i H(α_i).

(e) Combining parts (c) and (d),

L̄ − H(X) = Σ_i q_i [1 − H(α_i)].

H(α_i) ≤ 1, with equality if and only if α_i = 1/2. Also, since each p_j > 0, it follows that each q_i > 0. Thus L̄ − H(X) = 0 if and only if each α_i = 1/2.

(f) The above arguments apply to any full code tree using an ordering of the intermediate nodes that goes from leaves to root, i.e., an ordering where v_i is a prefix of v_j only if i ≥ j.

Exercise 2.18:

(a) Applying the Huffman coding algorithm to the code with M+1 symbols with p_{M+1} = 0, we combine symbol M+1 with symbol M, and the reduced code has M symbols with probabilities p_1, ..., p_M. The Huffman code for this reduced set of symbols is simply the code for the original set of symbols with symbol M+1 eliminated. Thus the code including symbol M+1 is the reduced code modified by a unit length increase in the codeword for symbol M. Thus L̄ = L̄' + p_M, where L̄' is the expected length for the code with M symbols.

(b) All n of the zero-probability symbols are combined together in the Huffman algorithm, and the reduced code from this combination is then the same as the code with M+1 symbols in part (a). Thus L̄ = L̄' + p_M again.


Exercise 2.19:

(a) The entropies H(X), H(Y), and H(XY) can be expressed as

H(XY) = −Σ_{x∈X, y∈Y} p_{XY}(x, y) log p_{XY}(x, y)
H(X)  = −Σ_{x∈X, y∈Y} p_{XY}(x, y) log p_X(x)
H(Y)  = −Σ_{x∈X, y∈Y} p_{XY}(x, y) log p_Y(y).

It is assumed that all symbol pairs x, y of zero probability have been removed from these sums, and thus all x (y) for which p_X(x) = 0 (p_Y(y) = 0) are consequently removed. Combining these equations,

H(XY) − H(X) − H(Y) = Σ_{x∈X, y∈Y} p_{XY}(x, y) log [ p_X(x) p_Y(y) / p_{XY}(x, y) ].

(b) Using the standard inequality log x ≤ (x − 1) log e,

H(XY) − H(X) − H(Y) ≤ Σ_{x∈X, y∈Y} p_{XY}(x, y) [ p_X(x) p_Y(y) / p_{XY}(x, y) − 1 ] log e = 0.

Thus H(X, Y) ≤ H(X) + H(Y). Note that this inequality is satisfied with equality if and only if X and Y are independent.

(c) For n symbols X_1, ..., X_n, let Y be the 'super-symbol' (X_2, ..., X_n). Then using (b),

H(X_1, ..., X_n) = H(X_1, Y) ≤ H(X_1) + H(Y) = H(X_1) + H(X_2, ..., X_n).

Iterating this gives the desired result.

An alternate approach generalizes part (b) in the following way:

H(X_1, ..., X_n) − Σ_i H(X_i) = Σ_{x_1,...,x_n} p(x_1, ..., x_n) log [ p(x_1) ··· p(x_n) / p(x_1, ..., x_n) ] ≤ 0,

where we have used log x ≤ (x − 1) log e again.

Exercise 2.20:

(a) Y is 1 if X = 1, which occurs with probability p_1; Y is 0 otherwise. Thus

H(Y) = −p_1 log p_1 − (1 − p_1) log(1 − p_1) = H_b(p_1).

(b) Given  Y =1,  X  = 1 with probability 1, so  H(X  | Y =1) = 0.

(c) Given Y=0, X=1 has probability 0, so X has M−1 possible choices with nonzero probability. The maximum entropy for an alphabet of size M−1 is log(M−1), so H(X|Y=0) ≤ log(M−1). This upper bound is met with equality if Pr(X=j | X≠1) = 1/(M−1) for all j ≠ 1. Since Pr(X=j | X≠1) = p_j/(1 − p_1), this upper bound on H(X | Y=0) is achieved when p_2 = p_3 = ··· = p_M. Combining this with part (b),

H(X | Y) = (1 − p_1) H(X | Y=0) ≤ (1 − p_1) log(M − 1).

(d) Note that

H(XY) = H(Y) + H(X|Y) ≤ H_b(p_1) + (1 − p_1) log(M − 1),

and this is met with equality for p_2 = ··· = p_M. There are now two reasonable approaches. One is to note that H(XY) can also be expressed as H(X) + H(Y|X). Since Y is uniquely specified by X, H(Y|X) = 0, so

H(X) = H(XY) ≤ H_b(p_1) + (1 − p_1) log(M − 1),   (4)

with equality when p_2 = p_3 = ··· = p_M. The other approach is to observe that H(X) ≤ H(XY), which again leads to (4), but this does not immediately imply that equality is met for p_2 = ··· = p_M. Equation (4) is the Fano bound of information theory; it is useful when p_1 is very close to 1 and plays a key role in the noisy channel coding theorem.

(e) The same bound applies to each symbol by replacing p_1 by p_j for any j, 1 ≤ j ≤ M. Thus it also applies to p_max.

Exercise 2.21:

(a) The codewords of the Huffman code for a set of probabilities {0.5, 0.4, 0.1} have lengths {1, 2, 2}, and thus L̄_min = 1.5.

(b) The set of probabilities for (X_1, X_2) is {0.25, 0.2, 0.2, 0.16, 0.05, 0.05, 0.04, 0.04, 0.01}. A tedious construction of the Huffman code leads to lengths {2, 2, 2, 3, 5, 5, 5, 6, 6}. The expected length is 2.78 (as it must be, this exceeds 2H(X) ≈ 2.72), so the expected length per source letter is L̄_min,2 = 1.39.

(c) Note that L̄_min,2 ≤ L̄_min in this example. This result is true in general, since one set of codewords for (X_1, X_2) is the concatenation of the codewords for X_1 with those for X_2. The code of minimal expected length for (X_1, X_2) must be at least this good.
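The expected Huffman length can be computed without constructing codewords, since every merge in the algorithm contributes its merged probability to L̄. The sketch below (illustrative) reproduces L̄_min = 1.5 and L̄_min,2 = 1.39:

```python
import heapq
from itertools import product

def huffman_avg_length(probs):
    """Expected codeword length = sum of the probabilities of all merged nodes."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

p = [0.5, 0.4, 0.1]
print(huffman_avg_length(p))                          # 1.5
pairs = [x * y for x, y in product(p, repeat=2)]
print(huffman_avg_length(pairs) / 2)                  # 1.39 bits per letter
```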

Exercise 2.22: One way to generate a source code for (X_1, X_2, X_3) is to concatenate a Huffman code for (X_1, X_2) with a Huffman code for X_3. The expected length of the resulting code for (X_1, X_2, X_3) is 2L̄_min,2 + L̄_min. The expected length per source letter of this code is (2/3)L̄_min,2 + (1/3)L̄_min. The expected length per source letter of the optimal code for (X_1, X_2, X_3) can be no worse, so

L̄_min,3 ≤ (2/3) L̄_min,2 + (1/3) L̄_min.

Exercise 2.23: (Run-Length Coding)

(a) Let C and C' be the codes mapping source symbols to intermediate integers and intermediate integers to output bits, respectively. If C' is uniquely decodable, then the intermediate integers can be decoded from the received bit stream, and if C is also uniquely decodable, the original source bits can be decoded.

The lengths specified for C' satisfy the Kraft inequality, and thus this code can be made prefix-free and thus uniquely decodable. For example, mapping 8 → 1 and each other integer to 0 followed by its 3-bit binary representation is prefix-free.


C is a variable-to-fixed-length code, mapping {b, ab, a²b, ..., a⁷b, a⁸} to the integers 0 to 8. This set of strings forms a full prefix-free set, and thus any binary string can be parsed into these 'codewords', which are then mapped to the integers 0 to 8. The integers can then be decoded into the 'codewords', which are then concatenated into the original binary sequence. In general, a variable-to-fixed-length code is uniquely decodable if the encoder can parse, which is guaranteed if that set of 'codewords' is full and prefix-free.

(b) Each occurrence of source letter b causes 4 bits to leave the encoder immediately. In addition, each subsequent run of 8 a's causes 1 extra bit to leave the encoder. Thus, for each b, the encoder emits 4 bits with probability 1; it emits an extra bit with probability (0.9)^8; it emits yet a further bit with probability (0.9)^16, and so forth. Letting Y be the number of output bits per input b,

E[Y] = 4 + (0.9)^8 + (0.9)^16 + ··· = 4 + (0.9)^8 / (1 − (0.9)^8) = 4.756.

(c) To count the number of b's out of the source, let B_i = 1 if the i-th source letter is b and B_i = 0 otherwise. Then E[B_i] = 0.1 and σ²_{B_i} = 0.09. Let A_B = (1/n) Σ_{i=1}^n B_i be the number of b's per input in a run of n = 10^20 inputs. This has mean 0.1 and variance (0.9) · 10^{−21}, which is close to 0.1 with very high probability. As the number of trials increases, it is closer to 0.1 with still higher probability.

(d) The total number of output bits corresponding to the (essentially) 10^19 b's in the 10^20 source letters is with high probability close to 4.756 · 10^19 (1 + ε) for small ε. Thus,

L̄ ≈ (0.1) · [ 4 + (0.9)^8 / (1 − (0.9)^8) ] = 0.4756.

Renewal theory provides a more convincing way to fully justify this solution.

Note that the achieved L̄ is impressive considering that the entropy of the source is −(0.9) log(0.9) − (0.1) log(0.1) = 0.469 bits/source symbol.
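The numbers in parts (b) and (d) are easy to reproduce (sketch):

```python
import math

r = 0.9 ** 8                       # probability that a run reaches 8 a's
EY = 4 + r / (1 - r)               # expected output bits per source letter b
print(EY)                          # 4.7558...
print(0.1 * EY)                    # Lbar ~ 0.4756 bits per source symbol

H = -0.9 * math.log2(0.9) - 0.1 * math.log2(0.1)
print(H)                           # source entropy, 0.469 bits/symbol
```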

Exercise 2.25:

(a) Note that W takes on the value −log(2/3) with probability 2/3 and −log(1/3) with probability 1/3. Thus E[W] = log 3 − 2/3. Note that E[W] = H(X). The fluctuation of W around its mean is −1/3 with probability 2/3 and 2/3 with probability 1/3. Thus σ²_W = 2/9.

(b) The bound on the probability of the typical set, as derived using the Chebyshev inequality and stated in (2.2), is (with n = 10^5 and ε = 10^{−2}):

Pr(X^n ∈ T_ε^n) ≥ 1 − σ²_W/(nε²) = 1 − 1/45.

(c) To count the number of a's out of the source, let the rv Y_i(X_i) be 1 for X_i = a and 0 for X_i = b. The Y_i(X_i)'s are iid with mean Ȳ = 2/3 and σ²_Y = 2/9. N_a(X^n) is given by

N_a = Σ_{i=1}^n Y_i(X_i),

which has mean 2n/3 and variance 2n/9.


(d) Since the n-tuple X^n is iid, the sample outcome satisfies w(x^n) = Σ_i w(x_i). Let n_a be the sample value of N_a corresponding to x^n. Since w(a) = −log(2/3) and w(b) = −log(1/3), we have

w(x^n) = n_a (−log 2/3) + (n − n_a)(−log 1/3) = n log 3 − n_a
W(X^n) = n log 3 − N_a.

In other words, W̃(X^n), the fluctuation of W(X^n) around its mean, is the negative of the fluctuation of N_a(X^n); that is, W̃(X^n) = −Ñ_a(X^n).

(e) The typical set is given by:

T_ε^n = { x^n : |w(x^n)/n − E[W]| < ε }
      = { x^n : |w̃(x^n)/n| < ε }
      = { x^n : |ñ_a(x^n)/n| < ε }
      = { x^n : 10^5 (2/3 − ε) < n_a(x^n) < 10^5 (2/3 + ε) },

where we have used w̃(x^n) = −ñ_a(x^n). Thus α = 10^5 (2/3 − ε) and β = 10^5 (2/3 + ε).

(f) From part (c), N̄_a = 2n/3 and σ²_{N_a} = 2n/9.

The CLT says that for n large, the sum of n iid random variables (rvs) has a distribution function close to Gaussian within several standard deviations from the mean. As n increases, the range and accuracy of the approximation increase. In this case, α and β are 10^3 below and above the mean respectively. The standard deviation is √(2 · 10^5/9), so α and β are about 6.7 standard deviations from the mean. The probability that a Gaussian rv is more than 6.7 standard deviations from the mean is about (1.6) · 10^{−10}.

This is not intended as an accurate approximation, but only to demonstrate the weakness of the Chebyshev bound, which is useful in bounding but not for numerical approximation.

Exercise 2.26: Any particular string x^n which has i a's and n−i b's has probability (2/3)^i (1/3)^{n−i}. This is maximized when i = n = 10^5, and the corresponding probability is 10^{−17,609}. Those strings with a single b have a probability 1/2 as large, and those with 2 b's have a probability 1/4 as large. Since there are C(n, i) different sequences that have exactly i a's and n−i b's,

Pr{N_a = i} = C(n, i) (2/3)^i (1/3)^{n−i}.

Evaluating for i = n, n−1, and n−2 for n = 10^5:

Pr{N_a = n}   = (2/3)^n ≈ 10^{−17,609}
Pr{N_a = n−1} = 10^5 (2/3)^{n−1} (1/3) ≈ 10^{−17,604}
Pr{N_a = n−2} = C(10^5, 2) (2/3)^{n−2} (1/3)² ≈ 10^{−17,600}.

What this says is that the probability of any given string with n_a a's decreases as n_a decreases, while the aggregate probability of all strings with n_a a's increases as n_a decreases (for n_a large compared to N̄_a). We saw in the previous exercise that the typical set is the set where n_a is close to N̄_a, and we now see that the most probable individual strings have fantastically small probability, both individually and in the aggregate, and thus can be ignored.
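Probabilities this small must be handled in the log domain; a sketch using math.lgamma reproduces the three exponents above:

```python
import math

n = 10 ** 5
la, lb = math.log10(2 / 3), math.log10(1 / 3)

def log10_binom(n, k):
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(10)

def log10_prob(i):                 # log10 of Pr(N_a = i)
    return log10_binom(n, i) + i * la + (n - i) * lb

for i in (n, n - 1, n - 2):
    print(f"log10 Pr(N_a = {i}) = {log10_prob(i):.1f}")
# prints roughly -17609, -17604, -17600
```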

Exercise 2.27:

(a) The good set G_ε^n differs from the typical set T_ε^n in that it excludes only the sequences that are atypically improbable. Those that are atypically probable are left in. In the previous exercise, the sequences with N_a = n, n−1, or n−2 are atypically probable.

(b) Since T_ε^n is a subset of G_ε^n, Pr(G_ε^n) ≥ Pr(T_ε^n) ≥ 1 − σ²_W/(nε²).

(c) Since every element of G_ε^n has a probability exceeding 2^{−n(H(X)+ε)}, the number of such elements is at most 2^{n(H(X)+ε)}. Thus

|G_ε^n| ≤ 2^{n(H(X)+α)},   where α = ε.

(d) Every element of G_ε^n that is not in T_ε^n has a probability at least 2^{−n(H(X)−ε)}. Thus

|G_ε^n| − |T_ε^n| ≤ 2^{n(H(X)+α)},   where α = −ε.

(e) We need a lower bound on |T_ε^n|, and the one in (2.26) of the text does not suffice. Another possibility is to use the fact that T_ε^n contains T_{ε/2}^n. Using this in (2.26),

|T_ε^n| ≥ |T_{ε/2}^n| ≥ ( 1 − 4σ²_W/(nε²) ) 2^{n(H(X)−ε/2)}.

With this, we see immediately that lim_{n→∞} (|G_ε^n| − |T_ε^n|)/|T_ε^n| = 0, converging as 2^{−nε/2}. In other words, most elements of G_ε^n are in fact typical for large n.

Exercise 2.28:

(a) The probability of an n-tuple x^n = (x_1, ..., x_n) is p_{X^n}(x^n) = Π_{k=1}^n p_X(x_k). This product includes N_j(x^n) terms x_k for which x_k is the letter j, and this is true for each j in the alphabet. Thus

p_{X^n}(x^n) = Π_{j=1}^M p_j^{N_j(x^n)}.   (5)

(b) Taking the log of (5),

−log p_{X^n}(x^n) = Σ_j N_j(x^n) log(1/p_j).   (6)

Using the definition of S_ε^n, all x^n ∈ S_ε^n must satisfy

Σ_j n p_j (1 − ε) log(1/p_j) < Σ_j N_j(x^n) log(1/p_j) < Σ_j n p_j (1 + ε) log(1/p_j),

i.e.,

nH(X)(1 − ε) < Σ_j N_j(x^n) log(1/p_j) < nH(X)(1 + ε).


Combining this with (6), every x^n ∈ S_ε^n satisfies

H(X)(1 − ε) < −log p_{X^n}(x^n) / n < H(X)(1 + ε).   (7)

(c) With ε' = εH(X), (7) shows that for all x^n ∈ S_ε^n,

H(X) − ε' < −log p_{X^n}(x^n) / n < H(X) + ε'.

By (2.25) in the text, this is the defining equation of T_{ε'}^n, so all x^n in S_ε^n are also in T_{ε'}^n.

(d) For each j in the alphabet, the WLLN says that for any given ε > 0 and δ > 0, and for all sufficiently large n,

Pr( |N_j(x^n)/n − p_j| ≥ ε p_j ) ≤ δ/M.   (8)

For all sufficiently large n, (8) is satisfied for all j, 1 ≤ j ≤ M. For all such large enough n, each x^n is either in S_ε^n or is a member of the event that |N_j(x^n)/n − p_j| ≥ ε p_j for some j. The probability of the union over j of the events {|N_j(x^n)/n − p_j| ≥ ε p_j} is upper bounded by δ, so Pr(S_ε^n) ≥ 1 − δ.

(e) The proof here is exactly the same as that of Theorem 2.7.1. Part (b) gives upper and lower bounds on Pr(x^n) for x^n ∈ S_ε^n, and (d) shows that 1 − δ ≤ Pr(S_ε^n) ≤ 1, which together give the desired bounds on the number of elements in S_ε^n.

Exercise 2.30:

(a) First note that the chain is ergodic (i.e., it is aperiodic and all states can be reached from all other states). Thus steady-state probabilities q(s) exist and satisfy the equations Σ_s q(s) = 1 and q(s) = Σ_{s'} q(s') Q(s|s'). For the given chain, these latter equations are

q(1) = q(1)(1/2) + q(2)(1/2) + q(4)(1)
q(2) = q(1)(1/2)
q(3) = q(2)(1/2)
q(4) = q(3)(1).

Solving by inspection, q(1) = 1/2, q(2) = 1/4, and q(3) = q(4) = 1/8.

(b) To calculate H(X_1) we first calculate the pmf p_{X_1}(x) for each x ∈ X. Using the steady-state probabilities q(s) for S_0, we have p_{X_1}(x) = Σ_s q(s) Pr{X_1=x | S_0=s}. Since X_1=a occurs with probability 1/2 from both S_0=1 and S_0=2, and occurs with probability 1 from S_0=4,

p_{X_1}(a) = q(1)(1/2) + q(2)(1/2) + q(4) = 1/2.

Similarly, p_{X_1}(b) = p_{X_1}(c) = 1/4. Hence the pmf of X_1 is (1/2, 1/4, 1/4) and H(X_1) = 3/2.

(c) The pmf of X_1 conditioned on S_0 = 1 is {1/2, 1/2}. Hence, H(X_1|S_0=1) = 1. Similarly, H(X_1|S_0=2) = 1. There is no uncertainty from states 3 and 4, so H(X_1|S_0=3) = H(X_1|S_0=4) = 0.

Since H(X_1|S_0) is defined as Σ_s Pr(S_0=s) H(X_1|S_0=s), we have

H(X_1|S_0) = q(1) H(X_1|S_0=1) + q(2) H(X_1|S_0=2) = 3/4,


which is less than H(X_1), as expected.

(d) We can achieve L̄ = H(X_1|S_0) by achieving L̄(s) = H(X_1|s) for each state s ∈ S. To do that, we use an optimal prefix-free code for each state.

For S_0 = 1, the code {a → 0, b → 1} is optimal with L̄(S_0=1) = 1 = H(X_1|S_0=1).

Similarly, for S_0 = 2, {a → 0, c → 1} is optimal with L̄(S_0=2) = 1 = H(X_1|S_0=2).

Since H(X_1|S_0=3) = H(X_1|S_0=4) = 0, we do not use any code at all for the states 3 and 4. In other words, our encoder does not transmit any bits for symbols that result from transitions from these states.

Now we explain why the decoder can track the state after time 0. The decoder is assumed to know the initial state. When in states 1 or 2, the next codeword from the corresponding prefix-free code uniquely determines the next state. When state 3 is entered, the next state must be 4, since there is a single deterministic transition out of state 3 that goes to state 4 (and this is known without receiving the next codeword). Similarly, when state 4 is entered, the next state must be 1. When states 3 or 4 are entered, the next received codeword corresponds to the subsequent transition out of state 1. In this manner, the decoder can keep track of the state.

(e) The question is slightly ambiguous. The intended meaning is how many source symbols x_1, x_2, ..., x_k must be observed before the new state s_k is known, but one could possibly interpret it as determining the initial state s_0.

To determine the new state, note that the symbol a always drives the chain to state 1 and the symbol b always drives it to state 2. The symbol c, however, could lead to either state 3 or 4. In this case, the subsequent symbol could be c, leading to state 4 with certainty, or could be a, leading to state 1. Thus at most 2 symbols are needed to determine the new state.

Determining the initial state, on the other hand, is not always possible. The symbol  a   couldcome from states 1, 2, or 4, and no future symbols can resolve this ambiguity.

A more interesting problem is to determine the state, and thus to start decoding correctly, when the initial state is unknown at the decoder. For the code above, this is easy, since whenever a 0 appears in the encoded stream, the corresponding symbol is a and the next state is 1, permitting correct decoding from then on. This problem, known as the synchronizing problem, is quite challenging even for memoryless sources.

Exercise 2.31: We know from (2.37) in the text that H(XY) = H(Y) + H(X | Y) for any random symbols X and Y. For any k-tuple X_1, ..., X_k of random symbols, we can view X_k as the symbol X above and view the (k−1)-tuple X_{k−1}, X_{k−2}, ..., X_1 as the symbol Y above, getting

H(X_k, X_{k−1}, ..., X_1) = H(X_k | X_{k−1}, ..., X_1) + H(X_{k−1}, ..., X_1).

Since this expresses the entropy of each k-tuple in terms of a (k−1)-tuple, we can iterate, getting

H(X_n, X_{n−1}, ..., X_1) = H(X_n | X_{n−1}, ..., X_1) + H(X_{n−1}, ..., X_1)
                          = H(X_n | X_{n−1}, ..., X_1) + H(X_{n−1} | X_{n−2}, ..., X_1) + H(X_{n−2}, ..., X_1)
                          = ··· = Σ_{k=2}^n H(X_k | X_{k−1}, ..., X_1) + H(X_1).


Exercise 2.32:

(a) We must show that H(S_2|S_1 S_0) = H(S_2|S_1). Viewing the pair of random symbols S_1 S_0 as a random symbol in its own right, the definition of conditional entropy is

H(S_2|S_1 S_0) = Σ_{s_1, s_0} Pr(S_1=s_1, S_0=s_0) H(S_2|S_1=s_1, S_0=s_0) = Σ_{s_1 s_0} Pr(s_1 s_0) H(S_2|s_1 s_0),   (9)

where we use the above abbreviations throughout for clarity. By the Markov property, Pr(S_2=s_2|s_1 s_0) = Pr(S_2=s_2|s_1) for all symbols s_0, s_1, s_2. Thus

H(S_2|s_1 s_0) = Σ_{s_2} −Pr(S_2=s_2|s_1 s_0) log Pr(S_2=s_2|s_1 s_0)
               = Σ_{s_2} −Pr(S_2=s_2|s_1) log Pr(S_2=s_2|s_1) = H(S_2|s_1).

Substituting this in (9), we get

H(S_2|S_1 S_0) = Σ_{s_1 s_0} Pr(s_1 s_0) H(S_2|s_1) = Σ_{s_1} Pr(s_1) H(S_2|s_1) = H(S_2|S_1).   (10)

(b) Using the result of Exercise 2.31,

H(S_0, S_1, ..., S_n) = Σ_{k=1}^n H(S_k | S_{k−1}, ..., S_0) + H(S_0).

Viewing S_0 as one symbol and the n-tuple S_1, ..., S_n as another,

H(S_0, ..., S_n) = H(S_1, ..., S_n | S_0) + H(S_0).

Combining these two equations,

H(S_1, ..., S_n | S_0) = Σ_{k=1}^n H(S_k | S_{k−1}, ..., S_0).   (11)

Applying the same argument as in part (a), we see that

H(S_k | S_{k−1}, ..., S_0) = H(S_k | S_{k−1}).

Substituting this into (11),

H(S_1, ..., S_n | S_0) = Σ_{k=1}^n H(S_k | S_{k−1}).

(c) If the chain starts in steady state, each successive state has the same steady-state pmf, so each of the terms above is the same and

H(S_1, ..., S_n|S_0) = nH(S_1|S_0).


(d) By definition of a Markov source, the state S_0 and the next source symbol X_1 uniquely determine the next state S_1 (and vice-versa). Also, given state S_1, the next symbol X_2 uniquely determines the next state S_2. Thus, Pr(x_1 x_2|s_0) = Pr(s_1 s_2|s_0), where x_1 x_2 are the sample values of X_1 X_2 in one-to-one correspondence with the sample values s_1 s_2 of S_1 S_2, all conditional on S_0 = s_0.

Hence the joint pmf of X_1 X_2 conditioned on S_0=s_0 is the same as the joint pmf of S_1 S_2 conditioned on S_0=s_0. The result follows.

(e) Combining the results of (c) and (d) verifies (2.40) in the text.

Exercise 2.33: Lempel-Ziv parsing of the given string can be done as follows:

Step 1: window 00011101, parsed block 001: match at u = 7 with n = 3.
Step 2: window 00011101001, parsed block 0101: match at u = 2 with n = 4.
Step 3: window 000111010010101, parsed block 100: match at u = 8 with n = 3.

The string is parsed in three steps; in each step, the symbols shown as the window are those already available to the encoder, and the parsed block follows them. The (n, u) pairs resulting from these steps are respectively (3,7), (4,2), and (3,8).

Using the unary-binary code for  n, which maps 3 → 011 and 4 → 00100, and a standard 3-bitmap for u, 1 ≤ u ≤ 8, the encoded sequence is 011, 111, 00100, 010, 011, 000 (transmitted withoutcommas).

Note that for small examples, as in this case, LZ77 may not be very efficient. In general, the algorithm requires much larger window sizes to compress efficiently.
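The unary-binary code used here for the match length n (and again in Exercise 2.34) is simple to implement; a sketch:

```python
def unary_binary(n):
    """floor(log2 n) zeroes followed by the binary expansion of n (n >= 1)."""
    b = bin(n)[2:]                  # binary expansion, which starts with a 1
    return "0" * (len(b) - 1) + b

print(unary_binary(3))              # 011
print(unary_binary(4))              # 00100
print(unary_binary(3976))           # 11 zeroes, then 111110001000
```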

Exercise 2.34:

(a) The initial substring is the first w = 1024 symbols, which is encoded directly as the string 0^1024. The window then contains the string 0^1024.

The second substring is the remaining string of 5000 − 1024 = 3976 zeroes, so n = 3976, u = 1. (Actually there is a minor ambiguity here: the longest match is for n = 3976, which is a match for any u, 1 ≤ u ≤ 1024.) n = 3976 is unary-binary encoded into ⌊log 3976⌋ = 11 zeroes followed by the 12-bit binary expansion 1^5 0^3 1 0^3. u = 1 is encoded into 0^9 1. The window then contains the string 0^1024.

The third substring is the single symbol 1 (since there is no match), corresponding to n = 1. We send a 1 to specify the symbol 1, followed by a 1 to encode n = 1. The window then contains the string 0^1023 1.

The fourth substring is the remaining string of 3999 ones, so n = 3999, u = 1. n = 3999 is encoded into ⌊log 3999⌋ = 11 zeroes followed by the 12-bit binary expansion 1^5 0^2 1^5. u = 1 is encoded into 0^9 1. The window then contains the string 1^1024.

The fifth substring is the single bit 0 (since there is no match), corresponding to n = 1. We send a 0 to specify the symbol 0, followed by a 1 to encode n = 1. The window then contains the string 1^1023 0.


The sixth (and last) substring is the remaining string of 999 zeroes, so n = 999, u = 1. n = 999 is encoded into ⌊log 999⌋ = 9 zeroes followed by the 10-bit binary expansion 1^5 0^2 1^3. u = 1 is encoded into 0^9 1.

Concatenating all the encodings described above, the overall encoded string is 0^1035 1^5 0^3 1 0^12 1^3 0^11 1^5 0^2 1^5 0^9 1 0 1 0^9 1^5 0^2 1^3 0^9 1.

(b) Adding up the above lengths, we get a total of 1024 + 23 + 10 + 2 + 23 + 10 + 2 + 19 + 10 = 1123 bits to encode the whole string, of which 1024 are for the initial uncoded string that fills the window.

(c) There are three differences between the case w = 8 and the case w = 1024. First, the initial window requires w = 8 bits rather than 1024 bits. Second, we require only 3 bits to encode u rather than 10 bits. Third, the first (n, u) pair is (4992, 1) rather than (3976, 1), which requires a 25-bit codeword for n. The remaining (n, u) pairs are identical. This results in a total of 8 + 25 + 3 + 1 + 1 + 23 + 3 + 1 + 1 + 19 + 3 = 88 bits to encode the string.

Note that a window size of  w  = 1 would actually be optimal for this source sequence.

(d) All the 5000 + 999 '0's (excluding the last symbol of the sequence) are followed by a '0' except one '0', which is followed by a '1'. Similarly, all except one of the 4000 '1's are followed by '1's. So a simple two-state Markov model is as described below; the symbol on each state transition denotes the source output during that transition, and the transition probability is written next to it.

[Two-state Markov model: from state 0, the source emits 0 with probability 5998/5999 (returning to state 0) and 1 with probability 1/5999 (moving to state 1); from state 1, it emits 1 with probability 3999/4000 (returning to state 1) and 0 with probability 1/4000 (moving to state 0).]

(e) First we find the steady-state probabilities of the states of this Markov chain by solving the three linear equations

q_0 = (5998/5999) q_0 + (1/4000) q_1
q_1 = (1/5999) q_0 + (3999/4000) q_1
1 = q_0 + q_1.

Solving gives q_0 = Pr(s=0) = 0.6 and q_1 = Pr(s=1) = 0.4.

H(X|S) = Pr(s=0) H(X|s=0) + Pr(s=1) H(X|s=1)
       = 0.6 [ (1/5999) log 5999 + (5998/5999) log(5999/5998) ]
       + 0.4 [ (1/4000) log 4000 + (3999/4000) log(4000/3999) ]
       = 0.0027.
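A numeric check of part (e) (sketch):

```python
import math

def Hb(p):                                    # binary entropy in bits
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

q0, q1 = 0.6, 0.4
print(q0 * Hb(1 / 5999) + q1 * Hb(1 / 4000))  # ~0.0027 bits/symbol
```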


Exercise 2.35:

(a) Since l_1 ≤ l_2 ≤ ··· ≤ l_j, there are j codewords whose lengths are at most l_j, and at most 2^{l_j} prefix-free nodes of length l_j or less. Thus l_j ≥ log j. If j is less than the alphabet size M, then there must also be at least one intermediate node of length l_j for the remaining codewords, so l_j ≥ ⌊log j⌋ + 1. There is one exception to what is to be shown, however: if j = M and j is a power of 2, it is possible for all codewords to have the same length log j = ⌊log j⌋.

(b) In the unary-binary code, l_j = 2⌊log j⌋ + 1, so the asymptotic efficiency is 2.

(c) As suggested in the hint, we can use a unary-binary code to encode the integer n = ⌊log j⌋ + 1 used for the unary part of the unary-binary code. One might call this a unary-binary-binary code. The length of codeword j is then ⌊log j⌋ + 2⌊log n⌋ + 1, where n = ⌊log j⌋ + 1. This is essentially log j + 2 log log j, so the asymptotic efficiency is 1.


Principles of Digital Communication
Solutions to Exercises – Chapter 3

Exercise 3.1:

(a) Subject to the justification in parts (b) and (c), we guess that the 3-bit quantizer that minimizes mean square distortion is the uniform quantizer with 8 equal-sized quantization intervals bounded by the endpoints (1/8){−8, −6, −4, −2, 0, +2, +4, +6, +8} and 8 equally-spaced representation points (1/8){−7, −5, −3, −1, +1, +3, +5, +7}.

(b) To verify that this uniform quantizer satisfies the Lloyd-Max necessary conditions, let b_j be the point separating R_j and R_{j+1}, and let a_j be the representation point for R_j. Then

(i) b_j = (a_j + a_{j+1})/2 for all j;

(ii) a_j = E[U | U ∈ R_j] for all j, since the conditional distribution over R_j is uniform and its mean is in the center of the interval R_j.

(c) The two necessary conditions for optimality in part (b) may be written as

b_j = (a_j + a_{j+1})/2   (12)
a_j = (b_{j−1} + b_j)/2.   (13)

By substituting (13) into (12), we get the relation

b_j − b_{j−1} = b_{j+1} − b_j.

This says that in any set of regions satisfying the necessary conditions, region R_j has the same size as R_{j+1} for all j; i.e., all regions R_j have the same size. This, together with equation (13), uniquely specifies the eight regions and eight representation points, so this is the only set of regions and representation points that satisfies the necessary conditions; i.e., this is the unique optimal quantizer.

Exercise 3.2:

(a) An M = 2 quantizer has two representation points a_1 and a_2 and two corresponding quantization intervals separated by the point b_1. Since the source distribution is uniform between 0 and 1, the following conditions must hold for optimality:

b_1 = (a_1 + a_2)/2;
a_1 = (0 + b_1)/2;
a_2 = (b_1 + 1)/2.

24

Page 25: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 25/110

Solving these equations yields a uniform 1-bit quantizer with  b1 = 1/2, a1  = 1/4, and  a2 = 3/4.As in Exercise 3.1, this solution is unique.

(b) The mean squared error for each symbol is

MSE = (U  − V (U ))2 =    120u −

 1

42

du +    1

12

u − 3

42

du = (1/2)2

12   =  1

48 .

More simply, the MSE in a quantization interval of size ∆ with a uniform probability density is∆2/12. Since ∆ = 1/2 here, the result follows.

(c) Since we have 1 quantization bit per symbol, we can use 2 quantization bits to achieve 4quantization regions for two symbols. The simple approach, for each n, is to use a scalar uniform2-bit quantizer for u2n−1 and to use this single 2-bit quantization for both u2n−1 and  u2n. In eachdimension, the mean squared error for each interval is ∆2/12, where the quantization interval is∆ = 1/4. Thus the mean square error for each symbol is MSE = 1/192.

As illustrated in part (d), this can also be viewed directly in two dimensions (u1, u2) where theprobability is uniformly distributed over the line segment  u2  =  u1, 0

 ≤ u1

 ≤ 1. The Lloyd-Max

conditions are then satisfied by 4 points at  {( 18 ,  1

8 ), ( 38 ,  3

8 )( 58 ,  5

8 ), ( 78 ,  7

8 )}   and the quantizationregions are formed by perpendicular bisectors between these points.

(d)    

0 10

1

Scalar quantizer with  M  = 2

    

    

      

    

    

  

    

  

    

  

      

Two-dimensional quantizer with  M  = 4

Note that the 1-bit scalar quantizer uses the same rule for each sample and does not makeuse of the memory of the source. The quantization regions in the upper left and bottom rightquadrants of the 1-bit scalar quantizer will never be used so the quantizer wastes 0.5 bits persymbol (1 bit per 2 symbols) specifying something that will never occur.

Exercise 3.3:

(a) Given a1 and  a2, b  the Lloyd-Max conditions assert that b  should be chosen half way betweenthem,  i.e., b  = (a1 + a2)/2. This insures that all points are mapped into the closest quantizationpoint. If the probability density is zero in some region around (a1 +  a2)/2, then it makes nodifference where  b  is chosen within this region, since those points can not affect the MSE.

(b) Note that  y(x)/Q(x) is the expected value of  U  conditional on  U 

 ≥ x, Thus, given  b, the

MSE choice for  a2   is  y(b)/Q(b). Similarly,  a1   is (E[U ] − y(b))/1 − Q(x). Using the symmetrycondition,  E[U ] = 0, so

a1  =  −y(b)

1 − Q(b)  a2  =

  y(b)

Q(b).   (14)

(c) Because of the symmetry,

Q(0) =

  ∞0

f (u) du =

  ∞0

f (−u) du =

   0

−∞f (u) du = 1 − Q(0).

25

Page 26: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 26/110

This implicity assumes that there is no impulse of probability density at the origin, since suchan impulse would cause the integrals to be ill-defined. Thus, with   b   = 0, (14) implies thata1 = −a2.

(d) Part (c) shows that for   b  = 0,  a1  = −a2   satisfies step 2 in the Lloyd-Max algorithm, andthen  b  = 0 = (a1 + a2)/2 then satisfies step 3.

The solution in part (d) for the density below is   b  = 0,   a1   = −2/3, and  a2   = 2/3. Anothersolution is a2  = 1,  a1 = −1/2 and b  = 1/3. The final solution is the mirror image of the second,namely  a1 = −1,  a2 = 1/2, and  b  = −1/3.

-1 0 1

         

1

3

1

3

1

3

f (u)

(f) The MSE for the first solution above (b   = 0) is 2/9. That for each of the other solutionsis 1/6. These latter two solutions are optimal. On reflection, choosing the separation point  b

in the middle of one of the probability pulses seems like a bad idea, but the main point of theproblem is that finding optimal solutions to MSE problems is often messy, despite the apparentsimplicity of the Lloyd-Max algorithm.

Exercise 3.4:

(a) Using the hint, we minimize

MSE(∆1, ∆2) + λf(∆1, ∆2) =  1

12

∆2

1f 1L1 + ∆22f 2L2

+ λ

L1

∆1+

  L2

∆2

over both ∆1  and ∆2. The function is convex over ∆1  and ∆2, so we simply take the derivative

with respect to each and set it equal to 0,  i.e.,

1

6∆1f 1L1 − λ

L1

∆21

= 0;  1

6∆2f 2L2 − λ

L2

∆22

= 0.

Rearranging,

6λ = ∆31f 1 = ∆3

2f 2,

which means that for each choice of  λ, ∆1f 1/31   = ∆2f 

1/32   .

(b) We see from part (a) that ∆1/∆2 = (f 2/f 1)1/3 is fixed independent of  M . Holding this ratiofixed, MSE is proportional to ∆2

1  and M  is proportional to 1/∆1  Thus M 2 MSE is independent

of ∆1  (for the fixed ratio ∆1/∆2).

M 2MSE =  1

12

f 1L1 +

 ∆22

∆21

f 2L2

L1 + L2

∆1

∆2

2

=  1

12

f 1L1 + f 

2/31   f 

1/32   L2

L1 + L2

f 1/32

f 1/31

2

=  1

12

1/31   L1 + f 

1/32   L2

3.

26

Page 27: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 27/110

(c) If the algorithm starts with  M 1  points uniformly spaced over the first region and  M 2  pointsuniformly spaced in the second region, then it is in equilibrium and never changes.

(d) If the algorithm starts with one additional point in the central region of zero probabilitydensity, and if that point is more than ∆1/2 away from region 1 and   δ 2/2 away from region2, then the central point is unused (with probability 1). Since the conditional mean over the

region mapping into that central point is not well defined, it is not clear what the algorithmwill do. If it views that conditional mean as being in the center of the region mapped into thatpoint, then the algorihm is in equilibrium. The point of parts (c) and (d) is to point out thatthe Lloyd-Max algorithm is not very good at finding a global optimum.

(e) The probability that the sample point lies in region  j   ( j  = 1, 2) is  f  jL j. The mean squareerror, using M  j  points in region  j  and conditional on lying in region  j , is  L2

 j /(12M 2 j ). Thus, theMSE with  M  j  points in region  j   is

MSE =  f 1L3

1

12M21

+  f 2L3

2

12(M2)2.

This can be minimized numerically over integer   M 1   subject to   M 1  + M 2   =   M . This wasminimized in part (b) without the integer constraint, and thus the solution here is slightlylarger than that there, except in the special cases where the non-integer solution happens to beinteger.

(f) With given ∆1  and ∆2, the probability of each point in region  j,  j  = 1, 2, is  f  j ∆ j   and thenumber of such points is  L j/∆ j  (assumed to be integer). Thus the entropy is

H (V ) =  L1

∆1(f 1∆1) ln

  1

f 1L1

+

  L2

∆2(f 2∆2) ln

  1

f 2L2

=   −L1f 1 ln(f 1L1) − L2f 2 ln(f 2L2).

(g) We use the same Lagrange multiplier approach as in part (a), now using the entropy  H (V )as the constraint.

MSE(∆1, ∆2) + λH(∆1, ∆2) =  1

12

∆2

1f 1L1 + ∆22f 2L2

− λf 1L1 ln(f 1∆1) − λf 2L2 ln(f 2∆2).

Setting the derivatives with respect to ∆1  and ∆2  equal to zero,

1

6∆1f 1L1 −  λf 1L1

∆1= 0;

  1

6∆2f 2L2 −  λf 2L2

∆2= 0.

This leads to 6λ = ∆21 = ∆2

2, so that ∆1 = ∆2. This is the same type of approximation as beforesince it ignores the constraint that  L1/∆1  and  L2/∆2  must be integers.

Exercise 3.5:

(a) Let f Z (z), be an arbitrary density that is non-zero only within [−A, A]. Let f U (z) =   12A   for

−A ≤ z ≤ A  be the uniform density within [−A, A]. For the uniform density,  h(U ) = log(2A).

27

Page 28: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 28/110

To compare  h(Z ) for an arbitrary density with  h(U ), note that  h(Z ) − log2A  is given by

h(Z ) − log2A   =

   A

−Af Z (z)log[f U (z)/f Z (z)] dz

= log(e)    A

−A

f Z (z)ln[f U (z)/f Z (z)] dz

≤   log(e)

   A

−A[f U (z) − f Z (z)] dz = 0,

where we have used the inequality ln x ≤ x − 1.

(b) The inequality ln x ≤ x− 1 is satisfied with equality only where x  = 1. Thus  h(Z ) <  log(2A)unless  f Z (z) = f U (z) everywhere (or more strictly, almost everywhere in the measure theoreticsense).

Exercise 3.6:

(a) The probability of the quantization region

 R is  A  = ∆( 1

2

 + x + ∆

2

 ). To simplify the algebraic

messiness, shift U   to  U  − x − ∆/2, which, conditional on R, lies in [−∆/2, ∆/2]. Let Y   denotethis shifted conditional variable. As shown below,  f Y  (y) =   1

A [y + (x+ 12 + ∆

2 )].

E[Y ] =

   ∆/2

−∆/2

y

A[y + (x +

 1

2 +

 ∆

2 )] dy =

   ∆/2

−∆/2

y2

A dy +

   ∆/2

−∆/2

y

A[x +

 1

2 +

 ∆

2 ] dy =

  ∆3

12A,

since, by symmetry, the final integral above is 0.

                

x

f U (u)12

 + x

−∆2

∆2

f Y  (y) =   1A

(y+x+ 12

+∆2

 )

y

Since  Y   is the shift of  U  conditioned on R,

E[U |R] = x + ∆

2  + E[Y ] = x +

 ∆

2  +

  ∆3

12A .

That is, the conditional mean is slightly larger than the center of the region R  because of theincreasing density in the region.

(b) Since the variance of a rv is invariant to shifts, MSE = σ2U |R

 =  σ2Y  . Also, note from symmetry

that ∆/2−∆/2 y3 dy = 0. Thus

E[Y 2] =

   ∆/2

−∆/2

y2

A

y + (x +

 1

2 +

 ∆

2 )

 dy  =

 (x +   12 +   ∆

2 )

A

∆3

12  =

 ∆2

12 .

MSE = σ2Y   =  E[Y 2] − (E[Y ])2 =

 ∆2

12 −

 ∆3

12A

2

.

28

Page 29: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 29/110

MSE −  ∆2

12  = −

 ∆3

12A

2

=  ∆4

144(x +   12  +   ∆

2 )2.

(c) The quantizer output  V   is a discrete random variable whose entropy  H[V ] is

H(V ) =

M  j=1

   j∆

( j−1)∆ −f U (u)log[f (u)∆]  du =    1

0 −f U (u)log[f (u)]  du − log∆

and the entropy of  h(U ) is by definition

h[U ] =

   1

−0−f U (u)log[f U (u)]  du.

Thus,

h[U ] − log∆ − H[V ] =

   1

0f U (u)log[f (u)/f U (u)]  du.

(d) Using the inequality ln x

≤x

−1,   1

0f U (u)log[f (u)/f U (u)]  du   ≤   log e

   1

0f U (u)[f (u)/f U (u) − 1]  du

= log e

   1

0f (u)  du −

   1

0f U (u)

= 0.

Thus, the difference  h[U ] − log∆ − H[V ] is non-positive (not non-negative).

(e) Approximating ln x  by (1+x) − (1+x)2/2 for  x  =  f (u)/f (u) and recognizing from part d)that the integral for the linear term is 0, we get

   1

0f U (u)log[f (u)/f U (u)]  du   ≈ −

1

2 log e   1

0

f (u)

f (u) − 12

du   (15)

=   −1

2 log e

   1

0

[f (u) − f U (u)]2

f U (u)  du.   (16)

Now f (u) varies by at most ∆ over any single region, and  f (u) lies between the minimum andmaximum f (u) in that region. Thus |f (u) − f (u)| ≤ ∆. Since  f (u) ≥ 1/2, the integrand aboveis at most 2∆2, so the right side of (16) is at most ∆ 2 log e.

Exercise 3.7:

(a) Note that   1u(ln u)2  is the derivative of 

−1/ ln u and thus integrates to 1 over the given interval.

(b)

h(U ) =

  ∞e

1

u(ln u)2[ln u + 2 ln(lnu)] du =

  ∞e

1

u ln u du +

  ∞e

2 ln(ln u)

u(ln u)2  du.

The first integrand above is the derivative of ln(ln u) and thus the integral is infinite. The secondintegrand is positive for large enough  u, and therefore  h(U ) is infinite.

(c) The hint establishes the result directly.

29

Page 30: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 30/110

Exercise 3.8:

(a) As suggested in the hint2, (and using common sense in any region where  f (x) = 0)

−D(f g) =

   f (x) ln

 g (x)

f (x) dx

≤    f (x) g(x)f (x)

 − 1 dx =    g(x) dx −    f (x) dx = 0.

Thus  D(f g) ≥ 0,

(b)

D(f φ) =

   f (x) ln

 f (x)

φ(x) dx

=   −h(f ) +

   f (x)

ln

√ 2πσ2 +

  x2

2σ2

 dx

=   −h(f ) +√ 

2πeσ2.

(c) Combining parts (a) and (b),  h(f ) ≤ √ 2πeσ2. Since  D(φφ) = 0, this inequality is satisfied

with equality for a Gaussian rv ∼ N (0, σ2).

Exercise 3.9:

(a) For the same reason as for sources with probability densities, each representation point  a j

must be chosen as the conditional mean of the set of symbols in R j. Specifically,

a j  =

i∈Rj

 pi ri

i∈Rj

 pi.

(b) The symbol  ri  has a squared error |ri − a j |2 if mapped into R j  and thus into  a j . Thus ri

must be mapped into the closest   a j   and thus the region R j   must contain all source symbolsthat are closer to  a j  than to any other representation point. The quantization intervals are notuniquely determined by this rule since R j   can end and R j+1  can begin at any point betweenthe largest source symbol closest to  a j  and the smallest source symbol closest to  a j+1.

(c) For  ri  midway between  a j   and  a j+1, the squared error is |ri − a j |2 = |ri − a j+1|2 no matterwhether ri  is mapped into  a j   or  a j+1.

(d) In order for the case of part (c) to achieve MMSE, it is necessary for  a j   and  a j+1   to eachbe the conditional mean of the set of points in the corresponding region. Now assume that a j

is the conditional mean of 

 R j  under the assumption that  ri   is part of 

 R j . Switching ri  to

 R j+1

will not change the MSE (as seen in part (c)), but it will change R j  and will thus change theconditional mean of  R j . Moving a j  to that new conditional mean will reduce the MSE. Thesame argument applies if  ri   is viewed as being in R j+1  or even if it is viewed as being partly inR j  and partly in R j+1.

2A useful feature of divergence is that it exists whether or not a density exists; it can be defined over anyquantization of the sample space and it increases as the quantization becomes finer, thus approaching a limit(which might be finite or infinite).

30

Page 31: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 31/110

Exercise 3.10:   Let a1 . . . , aM  be the representation points for a scalar quantizer satisfying theLloyd-Max conditions, and let  b1, . . . , bM −1  be the end points between the  M   regions. For thecorresponding 2D quantizer, the representation points are pairs, (aia j) and the lines separatingthe regions are perpendicular bisectors between those representation points. These perpendicularbisectors are either horizontal or vertical and if horizontal are placed at heights   b1, . . . , bM −1

and if vertical are placed at horizontal locations   b1, . . . , bM −1. Thus these regions satisfy the2D Lloyd-Max condition for the regions conditional on the representation points.

Since these regions are rectangles and the source distribution is independent between the twodimensions, the conditional mean within a region is a 2D vector whose components are the condi-tional means for the corresponding input component. Thus satisfaction of the scalar Lloyd-Maxcondition guarantees that the corresponding 2D quantizer satisfies the 2D Lloyd-Max condition.

For high rate quantizers, we know that hexagonal regions provide lower mean square distortionthan rectangular regions, so this is a good example of how the Lloyd-Max algorithm gets ‘stuck’in non-optimal quantization regions.

31

Page 32: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 32/110

Page 33: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 33/110

Exercise 4.2:

From (4.1) in the text, we have u(t) = ∞

k=−∞ uke2πikt/T  for t ∈ [−T /2, T /2]. Substituting this

into T /2−T /2 u(t)u∗(t)dt, we have

   T /2

−T /2 |u(t)|2

dt   =    T /2

−T /2

∞k=−∞ uke

2πikt/T ∞

=−∞ u∗ e−2πit/T 

dt

=∞

k=−∞

∞=−∞

uku∗

   T /2

−T /2e2πi(k−)t/T dt

=

∞k=−∞

∞=−∞

uku∗ T δ k,,

where δ k,  equals 1 if  k  =    and 0 otherwise. Thus,

   T /2

−T /2 |u(t)

|2dt =  T 

k=−∞ |uk

|2.

Exercise 4.3:

(a) Let  a1, a2, . . . ,  be an arbitrary listing of the integers (positive and negative). If they arelisted in numerical order, then a1 < a j   for each positive integer  j , and thus the integer a1 − 1 isnot in the list. This contradicts the assumption that some ordered list of the integers exists.

(b) Again, let  a1, a2, . . . ,  be a listing of the rationals in (0, 1), then  a1 >  0, so  a1/2 is a smallerrational in (0,1) than  a1.

(c) Now let a1, a2, . . . ,  be a listing of the rationals in [0, 1]. Then if the list is ordered,  a2  must

be positive, so if  a1  = 0, then  a2/2 cannot be on the list. Similarly if  a1 = 0, then 0 is not onthe list.

Exercise 4.4:

(a) Note that  sa(k) − sa(k − 1) =  ak ≥   0, so the sequence  sa(1), sa(2), . . . ,   is non-decreasing.A standard result in elementary analysis states that a bounded non-decreasing sequence musthave a limit. The limit is the least upper bound of the sequence  {sa(k); k ≥ 1}.

(b) Let  J k  = max{ j(a), j(2), . . . , j(k),   i.e.,  J k   is the largest index in  a j(1), . . . , a j(k). Then

k

=1

b   =

k

=1

a j()

 ≤

J k

 j=1

a j

  ≤  S a.

By the same argument as in part (a), k

=1 b  has a limit as  k → ∞  and the limit, say  S b   is atmost  S a.

(c) Using the inverse permutation to define the sequence {ak} from the sequence {bk}, the sameargument as in part (b) shows that  S a ≤ S b. Thus  S a  =  S b  and the limit is independent of theorder of summation.

33

Page 34: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 34/110

(d) The simplest example is the sequence {1, −1, 1, −1, . . . }. The partial sums here alternatebetween 1 and 0, so do not converge at all. Also, in a sequence taking two odd terms for eacheven term, the series goes to ∞. A more common (but complicated) example is the alternatingharmonic series. This converges to 0, but taking two odd terms for each even term, the seriesapproaches ∞.

Exercise 4.5:

(a) For E  = I 1 ∪ I 2, with the left end points satisfying a1 ≤ a2, there are three cases to consider.

•   a2  < b1. In this case, all points in  I 1  and all points in  I 2   lie between  a1   and max{b1, b2}.Conversely all points in (a1, max{b1, b2}) lie in either  I 1   or  I 2. Thus E   is a single intervalwhich might or might not include each end point.

•   a2 > b1. In this case,  I 1  and  I 2  are disjoint.

•   a2  = b1. If  I 1  is open on the right and  I 2  is open on the left, then  I 1  and  I 2   are separatedby the single point  a2 =  b1. Otherwise E   is a single interval.

(b) Let E k = I 1∪I 2∪···∪I k and let J k be the final interval in the separated interval representationof  E k. We have seen how to find J 2   from E 2 and note that the starting point of  J 2   is either  a1

or  a2. Assume that in general the starting point of  J k   is  a j  for some  j, 1 ≤  j ≤ k.

Assuming that the starting points are ordered  a1 ≤  a2 ≤ · · · , we see that  ak+1   is greater thanor equal to the starting point of  J k. Thus  J k ∪ I k+1  is either a single interval or two separatedintervals by the argument in part (a). Thus E k+1, in separated interval form, is E k, with   J kreplaced either by two separated intervals, the latter starting with  ak+1, or by a single intervalstarting with the same starting point as  J k. Either way the starting point of  J k+1  is a j  for some j, 1 ≤  j ≤ k+1, verifying the initial assumption by induction.

(c) Each interval  J k  created above starts with an interval starting point  a1, . . . ,  and ends with

an interval ending point b1, . . . , and therefore all the separated intervals start and end with suchpoints.

(d) Let  I 1 ∪ · · · ∪ I   be the union of disjoint intervals arising from the above algorithm and letI 1 ∪ · · · ∪ I i  be any other ordered union of separated intervals. Let  k  be the smallest integer forwhich I k = I k . Then the starting points or the ending points of these intervals are different, orone of the two intervals is open and one closed on one end. In all of these cases, there is at leastone point that is in one of the unions and not the other.

Exercise 4.6:

(a) If we assume that the intervals {I  j; 1 ≤  j < ∞} are ordered in terms of starting points, thenthe argument in Exercise 4.5 immediately shows that the set of separated intervals stays the

same as each new new interval  I k+1   is added except for the possible addition of a new intervalat the right or the expansion of the right most interval. However, with a countably infinite setof intervals, it is not necessarily possible to order the intervals in terms of starting points (e.g.,suppose the left points are the set of rationals in (0,1)). However, in the general case, in goingfrom Bk to Bk+1, a single interval  I k+1   is added to Bk. This can add a new separated interval,or extend one of the existing separated intervals, or combine two or more adjacent separatedintervals. In each of these cases, each of the separated intervals in Bk (including   I  j,k) eitherstays the same or is expanded. Thus  I  j,k ⊆ I  j,k+1.

34

Page 35: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 35/110

(b) Since I  j,k ⊆ I  j,k+1, the left end points of the sequence {I  j,k; k ≥ j} is a monotonic decreasingsequence and thus has a limit (including the possibility of  −∞). Similarly the right end pointsare monotonically increasing, and thus have a limit (possibly +∞). Thus limk→∞ I  j,k  exists asan interval I  j  that might be infinite on either end. Note now that any point in the interior of  I  jmust be in I  j,k  for some  k . The same is true for the left (right) end point of  I  j   if  I  j   is closed on

the left (right). Thus  I  j  must be in B  for each  j .(c) From Exercise 4.5, we know that for each  k ≥  1, the set of intervals {I 1,k, I 2,k, . . . , I  k,k}   isa separated set whose union is Bk. Thus, for each  , j ≤ k, either  I ,k  = I  j,k  or  I ,k  and  I  j,k  areseparated. If  I ,k  = I  j,k, then the fact that  I  j,k ⊆  I  j,k+1  ensures that  I ,k+1  = I  j,k+1, and thus,in the limit,  I   =  I  j   . If  I ,k   and  I  j,k  are separated, then, as explained in part (a), the additionof  I k+1   either maintains the separation or combines  I ,k  and I  j,k  into a single interval. Thus, ask   increases, either  I ,k  and  I  j,k  remain separated or become equal.

(d) The sequence {I  j ; j ≥ 1} is countable, and after removing repetitions it is still countable. Itis a separated sequence of intervals from (c). From (b), ∪∞ j=1 ⊆ B. Also, since B = ∪ jI  j ⊆ ∪ jI  j ,we see that B = ∪ jI  j .

(e) Let

 {I  j; j

 ≥1

} be the above sequence of separated intervals and let

 {I  j ; j

 ≥1

} be any other

sequence of separated intervals such that ∪ jI  j   = B. For each  j ≥  1, let  c j  be the center pointof  I  j . Since  c j   is in B,  c j ∈ I k   for some  k ≥ 1. Assume first that  I  j  is open on the left. Lettinga j   be the left end point of   I  j, the interval (a j, c j] must be contained in   I k . Since  a j   /∈ B,   a jmust be the left end point of  I k   and  I k   must be open on the left. Similarly, if  I  j   is closed onthe left,  a j   is the left end point of  I k   and  I k   is closed on the left. Using the same analysis onthe right end point of   I  j, we see that   I  j   =   I k . Thus the sequence {I  j ; j ≥  1}   contains eachinterval in {I  j; j ≥  1}. The same analysis applied to each interval in {I  j ; j ≥  1}  shows that{I  j; j ≥ 1} contains each interval in {I  j ; j ≥ 1}, and thus the two sequences are the same exceptfor possibly different orderings.

Exercise 4.7:

(a) and (b) For any finite unions of intervals E 1  and E 2, (4.87) in the text states that

µ(E 1) + µ(E 2) = µ(E 1 ∪ E 2) + µ(E 1 ∩ E 2) ≥ µ(E 1 ∪ E 2),

where the final inequality follows from the non-negativity of measure and is satisfied with equalityif  E 1  and E 2  are disjoint. For part (a), let  I 1  = E 1  and I 2 = E 2  and for part (b), let Bk = E 1  and I k+1 = E 2.

(c) For  k  = 2, part (a) shows that  µ(Bk) ≤ µ(I 1) +  µ(I 2). Using this as the initial step of theinduction and using part (b) for the inductive step shows that µ(Bk) ≤ k

 j=1 µ(I  j) with equalityin the disjoint case.

(d) First assume that   µ(B

) is finite (this is always the case for measure over the interval[−T /2, T /2]). Then since Bk is non-decreasing in  k ,

µ(B) = limk→∞

µ(Bk) ≤   limk→∞

k j=1

µ(I k).

Alternatively, if  µ(B) = ∞, then limk→∞k

 j=1 µ(I k) = ∞ also.

35

Page 36: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 36/110

Exercise 4.8:   Let Bn  = ∪∞ j=1I n,j. Then B = ∪n,jI n,j . The collection of intervals {I n,j ; 1 ≤ n ≤∞, 1 ≤ j ≤ ∞}  is a countable collection of intervals since the set of pairs of positive integers iscountable.

Exercise 4.9:

(a)

µ(B) = µo(B) ≤ µo(A) + µo(B ∩ A) = µ(A) + µo(B ∩ A),

where the first equality uses Theorem 4.9.1, the following inequality uses Lemma 4.9.1, and thefinal inequality uses the definition of measurable set.

(b) Since A ⊆ B, A ⊆ B  and  µ(A ∪ A) = T , it follows that  µ(B ∪ B) = T . From Lemma 4.9.2,

µ(B ∩ B) = µ(B) + µ(B) − µ(B ∪ B) = µ(B) + µ(B) − T.

(c)

µo(B ∩ A)   ≤   µo(B ∩ B) = µ(B ∩ B) = µ(B) + µ(B) − T ≤   µ(B) + µ(A) + δ − T   = µ(B) − µ(A) + δ.

where, in the first line, the inequality is the subset inequality for outer measure, the followingequality uses Theorem 4.9.1 (along with the fact that B ∩ B  is a countable union of intervals),and the next equality is part (c).

(d) Since  δ > 0 is arbitrary,  µo(B ∩ A) ≤ µ(B) − µ(A). Combining this with part (a) completesthe derivation.

Exercise 4.10:

(a) and (b). Each cover Bn   is a countable union of intervals and thus measurable. FromTheorem 4.9.4, the intersection D   = ∩nBn   is measurable. Every point in A   is in each coverBn  and thus in the intersection D. If we choose each Bn  to satisfy  µ(Bn) ≤ µo(A) + 2−n, thenµ(∩k

n=1Bn) ≤ µo(A) + 2−k, so  µo(A) = µ(D).

(c) Using Lemma 4.9.3,

µo(D ∩ A) = µo(D) + µo(A) − µo(D ∪ A) = µo(D) + µo(A) − T,

where the final equality comes from the fact that A ⊆ D  and thus D∪ A = [−T /2, T /2]. If  A ismeasurable, then  µo(A) = T  − µo(A), so  µo(D ∩ A) = 0. Finally, if  A   is measurable, D ∩ A) isalso, so the outer measure equals the measure.

Finally, if  µo(

D ∩ A) = 0, then the complement of 

 D ∩ A must have outer measure  T , so

 D ∩ Ais measurable. Since D  is also measurable, it follows that A must be measurable.

Exercise 4.11:

(a) Since {t :  u(t) < β } is the complement of  {t :  u(t) ≥ β }, so if one is measurable, the other isalso.

(b) The union of  {t :  u(t) < β }  and {t :  u(t) < β }   is {t :  α ≤ u(t) < β }, so if the first two setsare measurable, the third is also.

36

Page 37: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 37/110

(c) The following expresses {t :  u(t) ≤ β }  as a countable intersection of measurable sets.

{t :  u(t) ≤ β } = ∩∞n=1{t :  u(t) < β  + 1/n}.

(d) The following expresses {t :  u(t) < β }  as a countable union of measurable sets.

{t :  u(t) < β } = ∪∞n=1{t :  u(t) ≤ β − 1/n}.

Exercise 4.12:

(a) By combining parts (a) and (c) of Exercise 4.11, {t   :   u(t)   > β }   is measurable for all   β .Thus, {t   : −u(t)   < −β }   is measurable for all   β , so −u(t) is measurable. Next, for   β >   0,{t : |u(t)| < β } = {t :  u(t) < β } ∩ {t :  u(t) > −β }, which is measurable.

(b) {t :  u(t) < β } = {t :  g(u(t)) < g(β ), so if  u(t) is measurable, then  g(u(t) is also.

(c) Since exp(

·) is increasing, exp[u(t)] is measurable by part (b). Part (a) shows that

 |u(t)

|  is

measurable if  u(t) is. Both the squaring function and the log function are increasing for positivevalues, so u2(t) = |u(t)|2 and log(|u(t)|  are measurable.

Exercise 4.13:

(a) Let  y(t) =  u(t) +  v(t). We will show that {t   :  y(t)  < β ) is measurable for all real  β . Let > 0 be arbitrary and  k ∈ Z be arbitrary. Then, for any given  t,

(k − 1) ≤ u(t) < k  and  v(t) < β − k) =⇒   y(t) < β.

This means that the set of  t  for which the left side holds is included in the set of  t  for which theright side holds, so

{t : (k − 1) ≤ u(t) < k} ∩ {t :  v(t) < β − k)} ⊆ {t :  y(t) < β }.

This subset inequality holds for each integer  k  and thus must hold for the union over  k,k

{t : (k − 1) ≤ u(t) < k} ∩ {t :  v(t) < β − k)}

 ⊆ {t :  y(t) < β }.

Finally this must hold for all   > 0, so we choose a sequence 1/n for n ≥ 1, yieldingn≥1

k

{t : (k − 1)/n ≤ u(t) < k/n} ∩ {t :  v(t) < β − k/n)}

 ⊆ {t :  y(t) < β }.

The set on the left is a countable union of measurable sets and thus is measurable. It is alsoequal to {t :  y(t) < β }, since any  t  in this set also satisfies  y(t) < β − 1/n for sufficiently large n.

(b) This can be shown by an adaptation of the argument in (a). If  u(t) and  v(t) are positivefunctions, it can also be shown by observing that ln u(t) and ln v(t) are measurable. Thus thesum is measurable by part (a) and exp[ln u(t) + ln v(t)] is measurable.

Exercise 4.14:  The hint says it all.

37

Page 38: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 38/110

Exercise 4.15:   (a) Restrict attention to   t ∈   [−T /2, T /2] throughout. First we show thatvm(t) = inf ∞n=m un(t) is measurable for all  m ≥ 1. For any given  t, if  un(t) ≥ V   for all  n ≥ m,then V  is a lower bound to  un(t) over n ≥ m, and thus V  is greater than or equal to the greatestsuch lower bound,   i.e.,  V  ≥  vm(t). Similarly,  vm(t) ≥ V   implies that  un(t) ≥ V   for all  n ≥ m.Thus,

{t :  vm(t) ≥ V } =∞

n=m

{t :  un(t) ≥ V }.

Using Exercise 4.11, the measurability of   un   implies that {t   :   un(t) ≥   V }   is measurable foreach  n. The countable intersection above is therefore measurable, and thus, using the result of Exercise 4.11 again,  vm(t) is measurable for each  m.

Next, if  vm(t) ≥ V   then  vm(t) ≥ V   for all  m  > m. This means that  vm(t) is a non-decreasingfunction of  m  for each  t, and thus limm vm(t) exists for each  t. This also means that

{t : limm→∞ vm(t) ≥ V } =

m=1

 ∞n=m

{t :  un(t) ≥ V }

.

This is a countable union of measurable sets and is thus measurable, showing that lim inf  un(t)is measurable.

(b) If liminf n un(t) =  V 1   for a given   t, then limm vm(t) =  V 1, which implies that for the givent, the sequence {un(t); n ≥  1}  has a subsequence that approaches   V 1   as a limit. Similarly, if limsupn un(t) = V 2  for that  t, then the sequence {un(t), n ≥ 1}  has a subsequence approachingV 2. If  V 1  < V 2, then limn un(t) does not exist for that  t, since the sequence oscillates infinitelybetween V 1  and  V 2. If  V 1 =  V 2, the limit does exist and equals  V 1.

(c) Using the same argument as in part (a), with inf and sup interchanged,

{t : lim sup un(t)

≤V 

}=

m=1 ∞

n=m{t :  un(t)

≤V 

}is also measurable, and thus lim sup un(t) is measurable. It follows from this, with the help of Exercise 4.13 (a), that lim supn un(t) − liminf n un(t) is measurable. Using part (b), limn un(t)exists if and only if this difference equals 0. Thus the set of points on which limn un(t) exists ismeasurable and the function that is this limit when it exists and 0 otherwise is measurable.

Exercise 4.16:   As seen below,  un(t) is a rectangular pulse taking the value 2n from   12n+1   to

32n+1 . It follows that for any  t ≤ 0,  un(t) = 0 for all  n. For any fixed t > 0, we can visually seethat for n large enough,  un(t) = 0. Since un(t) is 0 for all t greater than   3

2n+1, then for any fixed

t > 0,  un(t) = 0 for all  n >  log23t − 1. Thus limn→∞ un(t) = 0 for all  t.

Since limn→∞ un(t) = 0 for all   t, it follows that    limn→∞ un(t)dt   = 0. On the other hand,  un(t)dt = 1 for all  n  so limn→∞

  un(t)dt = 1.

3/41/4 3/8

1/16 3/16

1/80

38

Page 39: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 39/110

Exercise 4.17:

(a) Since  u(t) is real valued, 

  u(t)dt

=

 

  u+(t)dt − 

  u−(t)dt

≤    u+(t)dt +    u−(t)dt

=

  u+(t) dt +

  u−(t) dt

=

   u+(t)dt +

   u−(t)dt =

   |u(t)|dt.

(b) As in the hint we select  α   such that  α 

 u(t)dt   is non-negative and real and |α|   = 1. Nowlet  αu(t) =  v(t) +  jw(t) where  v(t) and  w(t) are the real and imaginary part of  αu(t). Sinceα 

 u(t)dt is real, we have 

 w(t)dt = 0 and α 

 u(t)dt = 

 v(t)dt. Note also that |v(t)| ≤ |αu(t)|.Hence    u(t)dt

=α   u(t)dt

=   v(t)dt

   |v(t)| dt   (part a)

≤   |αu(t)| dt

=

   |α| |u(t)| dt

=

   |u(t)| dt.

Exercise 4.18:

(a) The meaning of   u(t) =   v(t) a.e. is that   µ{t   : |u(t) − v(t)|   >   0}   = 0. It follows that  |u(t) − v(t)|2dt = 0. Thus  u(t) and  v(t) are L2   equivalent.

(b) If  u(t) and v(t) are L2 equivalent, then  |u(t)−v(t)|2dt = 0. Now suppose that  µ{t : |u(t)−

v(t)|2 > }   is non-zero for some   >  0. Then  |u(t) − v(t)|2dt ≥ µ{t : |u(t) − v(t)|2 > } > 0

which contradicts the assumption that u(t) and  v(t) are L2   equivalent.

(c) The set {t : |u(t) − v(t)| > 0}  can be expressed as

{t : |u(t) − v(t)| > 0} = n≥1{t : |u(t) − v(t)| > 1/n}.

Since each term on the right has zero measure, the countable union also has zero measure. Thus{t : |u(t) − v(t)| > 0} has zero measure and  u(t) = v(t) a.e.

39

Page 40: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 40/110

Exercise 4.19:

(a) Multiplying both sides of  u(t) = ∞

 j=1 u jθk(t) by θ∗k(t) and integrating with respect to t, weget

   u(t)θ∗k(t)dt   =    ∞

 j=1

u jθ j(t)θ∗k(t)dt

=∞

 j=1

u j

   θ j (t)θ∗k(t)dt

=   ukAk.

Thus,

uk  =  1

Ak

   u(t)θ∗k(t)dt.

When expressing a function   u(t) as an orthogonal expansion, the coefficient associated witha particular orthogonal waveform is found by taking the projection of the function onto the

normalized basis. (This is just like determining a coefficient of a particular frequency of aFourier Series expansion.)

(b)   ∞−∞

|u(t)|2dt =

  ∞−∞

u(t)u∗(t)dt   =

  ∞−∞

 ∞k=1

ukθk(t)

∞ j=1

u∗ j θ∗ j (t)

dt

=∞

k=1

∞ j=1

uku∗ j

  ∞−∞

θk(t)θ∗ j (t) dt

=∞

k=1

|uk|2Ak.

(c)   ∞−∞

u(t)v∗(t) =

  ∞−∞

 ∞k=1

ukθk(t)

∞ j=1

v∗ j θ∗ j (t)

dt

=∞

k=1

∞ j=1

ukv∗ j

  ∞−∞

θk(t)θ∗ j (t) dt

=

k=1

ukv∗kAk.

Exercise 4.21:

(a) By expanding the magnitude squared within the given integral as a product of the functionand its complex conjugate, we get  u(t) −

nm=−n

k=−

uk,mθk,m(t)2 dt =

   |u(t)|2 dt −

nm=−n

k=−

T |uk,m|2.   (17)

40

Page 41: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 41/110

Since each increase in  n   (or similarly in   ) subtracts additional non-negative terms, the givenintegral is non-increasing in n and  .

(b) and (c) The set of terms  T |uk,m|2 for  k ∈  Z  and  m ∈  Z   is a countable set of non-negativeterms with a sum bounded by

  |u(t)|2 dt, which is finite since  u(t) is L2. Thus, using the resultof Exercise 4.4, the sum over this set of terms is independent of the ordering of the summation.

Any scheme for increasing n  and    in any order relative to each other in (17) is just an exampleof this more general ordering and must converge to the same quantity.

Since  um(t) = u(t)rect(t/T  − m) satisfies  |um(t)|2 dt =  T 

 k |uk,m|2 by Theorem 4.4.1 of the

text, it is clear that the limit of (17) as  n, → ∞ is 0, so the limit is the same for any ordering.

There is a subtlety above which is important to understand, but not so important as far asdeveloping the notation to avoid the subtlety. The easiest way to understand (17) is by under-standing that

  |um(t)|2 dt  =  T  

k |uk,m|2, which suggests taking the limit  k → ±∞  for eachvalue of  m   in (17). This does not correspond to a countable ordering of (k, m). This can bestraightened out with epsilons and deltas, but is better left to the imagination of the reader.

Exercise 4.22:

(a) First note that:

nm=−n

um(t) =

0   |t| > (n + 1/2)T 

2u(t)   t = (m + 1/2)T , |m| < n

u(t) otherwise.

 u(t) −

nm=−n

um(t)

2

dt =

   (−n−1/2)T 

−∞|u(t)|2 +

  ∞(n+1/2)T 

|u(t)|2 dt.

By the definition of an L2   function over an infinite time interval, each of the integrals on theright approach 0 with increasing  n.

(b) Let  um(t) =

 k=− uk,mθk,m(t). Note that

 nm=−n u

m(t) = 0 for |t| > (n + 1/2)T . We cannow write the given integral as:

 |t|>(n+1/2)T 

|u(t)|2 dt   +

   (n+1/2)T 

−(n+1/2)T 

u(t) −n

m=−n

um(t)

2

dt.   (18)

As in part (a), the first integral vanishes as  n → ∞.

(c) Since uk,m are the Fourier series coefficients of  um(t) we know um(t) = l.i.m→∞um(t). Hence,

for each  n, the second integral goes to zero as   → ∞. Thus, for any   > 0, we can choose  n  sothat the first term is less than  /2 and then choose     large enough that the second term is lessthan  /2. Thus the limit of (18) as  n, → ∞  is 0.

41

Page 42: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 42/110

Exercise 4.23:  The Fourier transform of the LHS of (4.40) is a function of   t, so its Fouriertransform is

F (u(t) ∗ v(t)) =

  ∞−∞

  ∞−∞

u(τ )v(t − τ )dτ 

e−2πiftdt

=   ∞−∞ u(τ ) 

 ∞

−∞ v(t − τ )e−2πift

dt  dτ 

=

  ∞−∞

u(τ )

  ∞−∞

v(r)e−2πif (t+r) dr

 dτ 

=

  ∞−∞

u(τ )e−2πifτ dτ 

  ∞−∞

v(r)e−2πifrdr

= u(f )v(f ).

Exercise 4.24:

(a)

 |t|>T 

u(t)e−2πift − u(t)e−2πi(f −δ)t dt   =  |t|>T 

u(t)e−2πift 1 − e2πiδt dt

=

 |t|>T 

|u(t)|1 − e2πiδt

dt

≤   2

 |t|>T 

|u(t)| dt   for all f > 0, δ > 0.

Since u(t) is L1, ∞−∞ |u(t)| dt   is finite. Thus, for T   large enough, we can make

 |t|>T  |u(t)| dt

as small as we wish. In particular, we can let  T  be sufficiently large that 2 |t|>T  |u(t)| dt is less

than  /2. The result follows.

(b) For all  f ,  |t|≤T 

u(t)e−2πift − u(t)e−2πi(f −δ)t dt =

 |t|≤T 

|u(t)|1 − e2πiδt

dt.

For the T  selected in part a), we can make1 − e2πiδt

arbitrarily small for all |t| ≤ T  by choosingδ  to be small enough. Also, since u(t) is L1,

 |t|≤T  |u(t)| dt   is finite. Thus, by choosing δ  small

enough, we can make |t|≤T  |u(t)| 1 − e2πiδt

dt < /2.

Exercise 4.25:

(a)

xA(f ) =  ∞−∞

xA(t)e−2πiftdt

=

 |t|>A

u(t)e−2πiftdt

=

  ∞−∞

u(t)e−2πiftdt − |t|≤A

u(t)e−2πiftdt

= u(f ) − uA(f ).

42

Page 43: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 43/110

(b) ∞−∞ |xA(t)|2dt =

 |t|>A |u(t)|2dt. Since

 ∞−∞ |u(t)|2dt is defined as limA→∞

 A−A |u(t)|2 dt and

this limit exists and is finite, |t|>A |u(t)|2dt must approach 0 as  A → ∞.

(c)

  ∞

−∞ |u(f )

− uA(f )

|2df    =  

 ∞

−∞ |xA(f )

|2df    using part a)

=

  ∞−∞

|xA(t)|2dt   by the energy equation.

In part b), we explained that limA→∞ ∞−∞ |xA(t)|2dt   = 0. Thus, l imA→∞

 ∞−∞ |u(f ) −

uA(f )|2dt = 0.

Exercise 4.26:  Exercise 4.11 shows that the sum of two measurable functions is measurable,so the question concerns the energy in  au(t) + bv(t). Note that for each   t, |au(t) + bv(t)|2 ≤2|a|2|u(t)|2 + 2|b|2|v(t)|2. Thus since

  |u(t)|2 dt < ∞   and

  |v(t)|2 dt < ∞, it follows that

  |au(t) + bv(t)|2 dt <

∞.

If  {t  :  u(t) ≤ β }   is a union of disjoint intervals, then {t   :  u(t − T ) ≤ β }   is that same union of intervals each shifted to the left by  T , and therefore it has the same measure. In the generalcase, any cover of {t :  u(t) ≤ β }, if shifted to the left by  T , is a cover of {t :  u(t−T ) ≤ β }. Thus,for all β , µ{t :  u(t) ≤ β } = µ{t :  u(t − T ) ≤ β }. Similarly if  {t :  u(t) ≤ β } is a union of intervals,then {t :  u(t/T ) ≤ β }   is that same set of intervals expanded by a factor of  T . This generalizesto arbitrary measurable sets as before. Thus  µ{t :  u(t) ≤ β } = (1/T )µ{t :  u(t/T ) ≤ β }.

Exercise 4.27:

(a) The Fourier series expansion does not converge pointwise, as shown in Example 4.2.1 of thetext. It does converge in

 L2, as stated in theorem 4.4.1.

(b) The Fourier transform does exist for all   f   (for an L2   time-limited function). There areseveral ways to see this. First,  u(t) must be L1  since it is both L2  and time limited, and thusthe Fourier transform exists pointwise. Second, after doing part (c), time/frequency duality canbe used to recognize that the sampling theorem applies to u(f ). Thus u(f ) must exist pointwise.Third, use part 1 of Plancherel. uA(f ) exists everywhere for each  A, but uA(f ) = u(f ) for allA ≥ T /2 since  u   is time limited to [−T /2, T /2]. Since uA(f ) exists pointwise, u(f ) does also.

(c) From the equation for uk  and the equation for u(f ), we see that ukT   = u(k/T ). Thus

u(f ) =

k

u(k/T )sinc(f T  − k).

We see that this is the time/frequency dual of the sampling theorem. Thus, using that theoremhere, we see that the sum converges for all f (and also that u(f ) is both continuous and bounded).

43

Page 44: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 44/110

Exercise 4.29:   The statement of the exercise contains a misprint — the transform u(f ) islimited to |f  ≤  1/2 (thus making the sampling theorem applicable) rather than the functionbeing time-limited. For the given sampling coefficients, we have

u(t) = k

u(k)sinc(t − k) =n

k=−n

(−1)ksinc(t − k)

u(n + 1

2) =

nk=−n

(−1)ksinc(n + 1

2 − k) =

nk=−n

(−1)k(−1)n−k

π[n − k +   12 ]

.   (19)

Since  n  is even, (−1)k(−1)n−k = (−1)n = 1. Substituting j   for  n − k, we then have

u(n + 1

2) =

2nk=0

1

π(k +   12 )

.   (20)

The approximation

 m2k=m1

1k+1/2 ≈  ln  m2+1

m1comes from approximating the sum by an integral

and is quite accurate for  m1  >> 0 To apply this approximation to (20), we must at least omitthe term k = 0 and this gives us the approximation

u(n + 1

2) ≈   2

π +

 1

π ln(2n + 1).

This goes to infinity logarithmically in   n   as   n → ∞. The approximation can be improvedby removing the first few terms from (20) before applying the approximation, but the termln(2n + 1) remains.

We can evaluate  u(n+m+ 12 ) and  u(n−m−1

2 ) by the same procedure as in (19). In particular,

u(n+m+1

2

) =n

k=−n

(

−1)ksinc(n+m+

1

2−k)

=n

k=−n

(−1)k(−1)n+m−k

π[n+m−k +   12 ]

=2n+m j=m

(−1)n+m

π[ j +   12 ]

.

.

u(n−m−1

2) =

nk=−n

(−1)k(−1)n−m−k

π[n−m−k −   12 ]

=2n−m

 j=−m

(−1)n−m

π[ j −   12 ]

.

Taking magnitudes,

u(n+m+ 12

) =2n+m j=m

1π[ j +   1

2 ]; u(n−m−1

2) =

2n

−m

 j=−m

1π[ j −   1

2 ].

All terms in the first expression above are positive, whereas those in the second expression arenegative for  j ≤ 0. We break this second expression into positive and negative terms:u(n−m−1

2)

=0

 j=−m

1

π[ j −   12 ]

+2n−m j=1

1

π[ j −   12 ]

=0

k=−m

1

π[k −   12 ]

+2n−m−1

 j=0

1

π[ j +   12 ]

.

44

Page 45: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 45/110

For each j,   0 ≤  j ≤ m, the term in the second sum above is the negative of the term in the firstsum with  j  = −k. Cancelling these terms out,

u(n−m−1

2)

=2n−m−1 j=m+1

1

π[ j +   12 ]

.

This is a sum of positive terms and is a subset of the positive terms in |u(n + m+ 12 |, establishing

that |u(n − m −   12 | ≤ |u(n + m +   1

2 )|. What is happening here is that for points inside [−n, n],the sinc functions from the samples on one side of the point cancel out the sinc functions fromthe samples on the other side.

The particular samples in this exercise have been chosen to illustrate that truncating the samplesof a bandlimited function and truncating the function can have very different effects. Herethe function with truncated samples oscillates wildly (at least logarithmically in  n), with theoscillations larger outside of the interval than inside. Thus most of the energy in the functionresides outside of the region where the samples are nonzero.

Exercise 4.31:

(a) Note that  g(t) =  p2(t) where  p(t) = sinc(W t). Thus g(f ) is the convolution of ˆ p(f ) withitself. Since ˆ p(f ) =   1

W  rect(   f W  ), we can convolve graphically to get the triangle function below.

    

    

    

  

1W 

−W W =   12T 

g(f )   g(t)

1

1W 

(b) Since  u(t) = k u(kT )sinc(2W t − k), it follows that  v(t) = k u(kT )sinc(2W t − k) ∗ g(t).

Letting  h(t) = sinc(t/T ) ∗ g(t), we see that h(f ) = T rect(T f )g(f ). Since rect(T f ) = 1 over therange where g(f ) is non-zero, h(f ) = T g(f ). Thus  h(t) = T g(t). It follows that

v(t) =

k

T u(kT )g(t − kT ).   (21)

(c) Note that  g(t) ≥  0 for all   t. This is the feature of  g(t) that makes it useful in generatingamplitude limited pulses. Thus, since u(kT ) ≥ 0 for each k, each term in the sum is non-negative,and  v(t) is non-negative.

(d) The obvious but incomplete way to see that 

k sinc(t/T  − k) = 1 is to observe that eachsample of the constant function 1 is 1, so this is just the sampling expansion of a constant.

Unfortunately,  u(t) = 1 is not L2, so the sampling theorem does not apply. The problem is morethan nit-picking, since, for example, the sampling expansion of a sequence of alternating 1’s and-1’s does not converge (as can be seen from Exercise 4.29). The desired result follows here fromnoting that both the sampling expansion and the constant function 1 are periodic in  T  and bothare L2  over one period. Taking the Fourier series over a period establishes the equality.

(e) To evaluate 

k g(t− kT ), consider (21) with each u(kT ) = 1. For this choice, it follows thatk g(t−kT ) = v(t)/T . To evaluate v(t) for this choice, note that u(t) = 1 and v(t) = u(t)∗g(t),

so that v(t) can be regarded as the output when the constant 1 is passed through the filter  g(t).

45

Page 46: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 46/110

The output is then constant also and equal to 

 g(t) dt   = g(0) =   1W  . Thus

 k g(t − kT ) =

1/T W   = 2.

(f) Note that  v(t) = 

k u(kT )T g(t − kT ) is non-decreasing, for each  t, in each sample  u(kT ).Thus  v(t) ≤

k T g(t − kT ), which as we have seen is simply 2T.

(h) Since  g  is real and non-negative and each

 |u(kT )

| ≤1,

|v(t)| ≤

k

|u(kT )|T g(t − kT ) ≤ 2T    for all   t.

We will find in Chapter 6 that   g(t) is not a very good modulation waveform at a sampleseparation  T , but it could be used at a sample separation 2T .

Exercise 4.32:

(a) From property 4,   p(t − 1) = 1 −  p(t) for   t = 1/2, and from the 0/1 property this meansthat either   p(t) or   p(t − 1) is 1, but the other must be zero. Thus   p(t) p(t−1) = 0. Thus

  p(t) p(t − 1) dt = 0, demonstrating orthogonality.

(b) For |k| > 1, the pulses are non-overlapping. Thus  p(t) p(t − k) = 0 for all  t. As seen in part(a)  p(t) p(t − k) = 0 for  k  = 1 for all  t = 1/2; the same holds for  k  = −1 for all  t = −1/2. Thus 

 p(t) p(t − k) dt = 0.

(c) As seen in part (b), p(t) p(t − k) = 0 for all t  and k  (other than  t  = ±1/2 for  k  = ±1). Thus p(t) p∗(t−k)e−2πimt = 0 a.e, and orthogonality again holds.

(d) Using the hint,

 p(t)e−2πimt + p(t−1)e−2πim(t−1) = [ p(t) + p(t−1)]e−2πimt = e−2πimt

for 0 ≤ t ≤ 1, t = 1/2. Thus

   1

−1 p

2

(t)e−2πimt

dt   =    1

−1 p(t)e−

2πimt

dt =    1

0 p(t)e−

2πimt

dt +    0

−1 p(t)e−

2πimt

dt

=

   1

0[ p(t)+ p(t − 1)]e−2πimt dt =

   1

0e−2πimt dt = 0.

(e) From Parseval, if  p(t) and  q (t) are orthogonal, then ˆ p(f ) and q (f ) are also orthogonal. TheFourier transform of  q (t) = p(t−k)e2πimt is q (f ) = ˆ p(f +m)e−2πikf  Thus, ˆ p(f ) is orthogonal toˆ p(f +m)e−2πikf  for all  m, k  other than  m =  k  = 0. Thus, substituting  f   for  t,  k   for −m  and  mfor −k  gives the desired result.

Exercise 4.33:  Consider the sequence of functions  vm(t) = rect(t − m) for  m ∈ Z+,  i.e., time

spaced rectangular pulses. For every  t, limm→∞ rect(t − m) = 0 so this sequence convergespointwise to 0. However,  |(rect(t − m) − rect(t − n)|2 dt = 2 for all  n = m, so L2   convergence

is impossible.

Exercise 4.34:   With  T  =   12W  ,

s(f ) =

m

u(f  + m

T  )rect(f T ) =

1m=−1

u(f  + 2W m)rect(  f 

2W  ),

46

Page 47: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 47/110

where the sum has only 3 terms because u(f ) is 0 for |f | > 3W .

Using the sampling theorem,  s(kT ) = v0(kT ) for all  k   if and only if s(f ) = v0(f ). This meansthat we are looking for an example where

u(f 

 −2W )rect(

  f 

2W 

 ) + u(f  + 2W )rect(  f 

2W 

 ) = 0.

One simple possibility is to choose u(f ) = rect(   f 2W  − 1) − rect(   f 

2W  + 1). In this case,   u(t) =2i sin(4πW t)sinc(2W t).

Exercise 4.36:

(a)

|u(t) − v(n)(t)|   =

u(t) −

nm=−n

vm(t)

=    u(f )e

2πift

df  −  |f |< 1

T  (n+1/2) u(f )e

2πift

df =

 |f |≥ 1

T  (n+1/2)

u(f )e2πiftdf 

 |f |≥ 1

T  (n+1/2)

|u(f )|df.   (22)

Since u(f ) is L1, limn→∞ |f |≥ 1

T  (n+1/2) |u(f )|df  = 0. From (22), limn→∞ |u(t) − v(n)(t)| = 0 and

u(t) = limn→∞ v(n)(t) for all  t.

(b) From (a), |u(t) − v(n)(t)| ≤  |f |≥ 1T 

 (n+1/2) |u(f )|df . Since u(f ) is L1, the right hand side can

be made arbitrarily small by choosing n large enough. For any   > 0, we can choose n sufficientlylarge that

|u(t) − v(n)(t)| ≤ /2 for all t.   (23)

Next, note that  vn(t) = n

m=n vm(t) and that this sum contains 2n + 1 terms. For each termm, we can choose  k0(m) so that, for all  t

|vm(t) −k0(m)

k=−k0(m)

vm(kT )sinc( t

T  − k)e2πimt/T | ≤  

4n + 2.   (24)

This is proven for   m   = 0 in appendix 2 of Chapter 5. The proof is essentially the same forarbitrary m. Now let  k0 = max k0(m) over −n ≤ m ≤ n and combine (24) over  m  to getvn(t) −n

m=−n

k0k=−k0

vm(kT )sinc( t

T  − k)e2πimt/T 

≤  

2.

Combining this with (22) completes the demonstration of pointwise convergence for all  t.

47

Page 48: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 48/110

Exercise 4.37:

(a)   |s(f )| df  =

   |

m

u(f  + m

T  )rect(f T )| df  ≤

  m

|u(f  + m

T  )rect(f T )| df  =

   |u(f )| df,

which shows that  s(f ) is L1   if u(f ) is.(b) The following sketch makes it clear that u(f ) is L1  and L2. In particular, 

  |u(f )| df   =

   |u(f )|2 df   = 2

k≥1

1

k2  < ∞.

10 2

1

u(f )

1/4   1/9

s(f )0 1/2

2

4

6

It can be seen from the sketch of s(f ) that s(f ) = 2 from   18   to   1

2  and from −12   to −1

8 , which isa set of measure 3/4. In general, for arbitrary integer  k > 0, it can be seen that s(f ) = 2k  from

12(k+1)2   to   1

2k2   and from −   12k2   to −   1

2(k+1)2 . Thus s(f ) = 2k   over a set of measure   2k+1k2(k+1)2 . It

follows that    |s(f )|2 df    = lim

n→∞

nk=1

(2k)2   2k + 1

k2(k + 1)2  = lim

n→∞

nk=1

4(2k + 1)

(k + 1)2

≥   limn→∞

nk=1

4(k + 1)

(k + 1)2  =

nk=1

4

k + 1  = ∞.

(c) Note that u(f ) = 1 for every positive integer value of  f , and thus (for positive  ) u(f )f 1+

approaches ∞. It is 0 for other arbitrarily large values of  f , and thus no limit exists.

Exercise 4.38:     ∞−∞

|u(t)|2dt = 2

1 +

  1

22 +

  1

32 + ...

.

This sum is finite so  u(t) is L2. Now we’ll show that

s(t) =

k

u(k)sinc(t − k) =

k

sinc(t − k)

is neither L1  nor L2. Taking the Fourier Transform of s(t),

s(f ) = k

rect(f )e−2πifk = rect(f )k

e−2πifk.

To show that  s(t) is not L1,  ∞−∞

|s(t)|dt   =

  ∞−∞

s(t)dt   since  s(t) ≥ 0 for all t

= s(0) =

k

1 = ∞.

48

Page 49: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 49/110

To show that  s(t) is not L2,  ∞−∞

|s(t)|2dt =

  ∞−∞

|

k

sinc(t − k)|2df  = ∞.

Since   u(k) is equal to 1 for every integer   k, k u2

(k) = ∞. The sampling theorem energyequation does not apply here  |u(t)|2dt = T 

k |u(kT )|2

 because u(f ) is not band-limited.

49

Page 50: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 50/110

Principles of Digital CommunicationSolutions to Exercises – Chapter 5

Exercise 5.1:   The first algorithm starts with a set of vectors S   = {v1, . . . , vm}   that spanV   but are dependent. A vector   vk ∈ S   is selected that is a linear combination of the othervectors in S .   vk  is removed from S , forming a reduced set S . Now S   still spans V   since eachv ∈ V   is a linear combination of vectors in S , and   vk   in that expansion can be replaced by itsrepresentation using the other vectors. If S  is independent, we are done, and if not, the previousstep is repeated with S   replacing S . Since the size of  S   is reduced by 1 on each such step, thealgorithm terminates with an independent spanning set,  i.e., a basis.

The second algorithm starts with an independent set S   = {v1, . . . , vm}   of vectors that donot span the space. An arbitrary nonzero vector   vm+1 ∈ V   is then selected that is not alinear combination of  S   (this is possible since S  does not span V ). It can be seen that S    ={v1, . . . , vm+1} is an independent set. If S   spans V , we are done, and if not, the previous step isrepeated with S   replacing S . With each repetition of this step, the independent set is increasedby 1 vector until it eventually spans V .It is not immediately clear that the second algorithm ever terminates. To prove this and alsoprove that all bases of a finite dimensional vector space have the same number of elements, wedescribe a third algorithm. Let S ind  =  v1, . . . , vm be an arbitrary set of independent vectors andlet S s p = {u1, . . . , un} be a finite spanning set for V  (which must exist by the finite dimensionalassumption). Then, for k  = 1, . . . m, successively add  vk   to

 S sp  and remove one of the original

vectors u j  of S sp so that the remaining set, say S sp is still a spanning set. This is always possiblesince the added element must be a linear combination of a spanning set, so the augmented set islinearly dependent. One of the original elements of  S sp  can be removed (while maintaining thespanning property) since the newly added vector is not a linear combination of the previouslyadded vectors. A contradiction occurs if  m > n,   i.e., if the independent set is larger than thespanning set, since no more than the  n original vectors in the spanning set can be removed.

We have just shown that every spanning set contains at least as many members as any inde-pendent set. Since every basis is both a spanning set and an independent set, this means thatevery basis contains the same number of elements, say  b. Since every independent set containsat most  b  elements, algorithm 2 must terminate as a basis when S  reaches  b  vectors.

Exercise 5.2:   The vector  0  can be added to any spanning set and the augmented set is still aspanning set (in fact any vector can be added to a spanning set and the augmented set is stillspanning). If a set S   = {v1, . . . , vn}  of vectors contains   0   (say   v1   =   0) then

  j α j v j  = 0 if 

α1 = 0 and  α j  = 0 for  j > 1. Thus S  is a dependent set of vectors.

50

Page 51: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 51/110

Exercise 5.3:  Let the n vectors that uniquely span a vector space V  be called  v1, v2, . . . , vn.We will prove that the  n  vectors are linearly independent using proof by contradiction. Assumev1, v2, . . . , vn  are linearly dependent. Then

 n j=1 α j v j  = 0 for some set of scalars  α1, α2,..,αn

where not all the  α js equal zero. Say  αk = 0. We can express  vk  as a linear combination of theother  n − 1 vectors {v j} j=k:

vk  =  j=k

−α j

αk v j .

Thus   vk   has two representations in terms of  {v1, . . . , vn}. One is that above, and the otheris   vk   =

  j β  j v j   where  β k  = 1 and  β  j  = 0 for   j =  k. Thus the representation is non-unique,

demonstrating the contradiction.

It follows that if  n   vectors uniquely span a vector space, they are also independent and thusform a basis. From Theorem 5.1.1, the dimension of  V   is then  n.

Exercise 5.4:

(a)

v, u = v1u1 + v2u2 =  u1v1 + u2v2 = u, v   Hermitian symmetry.

αv + β u, w   = (αv1 + βu1)w1 + (αv2 + βu2)w2

=   αv1w1 + αv2w2   +   βu1w1 + βu2w2

=   αv, w + β u, w   Hermitian bilinearity.

v, v = v21 + v2

2  > 0 if and only if v = 0.

(b) The length of  v = {v1, v2}  in the Euclidian plane is 

v21 + v2

2  = v, v = v.

(c) The distance from   v   to   u   is simply the length of the vector   v

−u, which from part (b) is

v − u.

(d) This is the standard plane geometry result for the cosine.

(e) v, u = v1u1 + 2v2u2  still satisfies the axioms for an inner product space, as can be verifiedimmediately by the same argument as in part (a). The corresponding length in this innerproduct space, however, is

 v2

1 + 2v22, which is different than the conventional length of plane

geometry. Usually when a vector space is called   R2, the conventional inner product space isintended. Exercise 5.5 explores these ‘unconventional’ inner product spaces in more depth.

Exercise 5.5:  First we check the Hermitian symmetry.

v, u   =  j c jv j u∗ j

u, v∗   =

 j

c ju jv∗ j

 =

 j

c∗ j v ju∗ j .

If any  c j , say ck  is non-real, then v, u = u, v∗  if, for example,  v and  u  are each equal to theunit vector  ek   . Thus Cn is not an inner product space for parts (d) and (e) (unless the complexcoefficients  c1, . . . cn  are all real).

51

Page 52: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 52/110

Next suppose   ck  = 0 for some component  k. Then ek, ek  = 0, violating the strict positivityaxiom. Thus  Cn is not a product space if any ck  = 0. This means that the condition in part (c)does not guarantee that  Cn is a product space.

If the   c j   are all real and non-zero, then both Hermitian symmetry and strict positivity aresatisified as shown above. The bilinearity follows by inspection. Thus  Cn is a product space

under the conditions in parts (a) and (b).

Exercise 5.6:

v + u2 =   v + u, v + u=   v, v + u + u, v + u   by axiom (b)

=   v, v + v, u + u, v + u, u   by axiom (b)

≤ |v, v > | + |v, u| + |u, v| + |u, u|≤ v2 + vu + uv + u2 = (v + v)2.

So 

v + u ≤

v

+

u

.

Exercise 5.8:

(a) By direct substitution of   u(t) = 

k,m uk,mθk,m(t) and   v∗(t) = 

k,m v∗k,mθ∗k,m(t) into theinner product definition

u, v   =

  ∞−∞

u(t)v∗(t) dt

=

  ∞−∞

k,m

uk,mθk,m(t)

k,m

v∗k,mθ∗k,m(t) dt

= k,m uk,m

k,m v∗k

,m   ∞−∞ θk,m(t)θ∗k

,m

(t) dt

=   T k,m

uk,mv∗k,m.

(b) For any real numbers  a  and  b, 0 ≤ (a + b)2 = a2 + 2ab + b2. It follows that  ab ≤   12 a2 +   1

2 b2.Applying this to |uk,m|  and |vk,m|, we see that

|uk,mv∗k,m| = |uk,m| |v∗k,m| ≤  1

2|uk,m|2 +

 1

2|vk,m|2.

Thus, using part (a),

|u , v | ≤ T k,m

|uk,mv∗k,m| ≤  T 2k,m

|uk,m|2 +  T 2k,m

|vk,m|2.

Since  u   and  v   are L2, the latter sums above are finite, so |u ,v |  is also finite.

(c) It is necessary for inner products in an inner-product space to be finite since, by definitionof a complex inner-product space, the inner product must be a complex number, and the setof complex numbers (just like the set of real numbers) does not include ∞. This seems like atechnicality, but it is central to the special properties held by finite energy functions.

52

Page 53: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 53/110

Exercise 5.9:

(a) For V   to be a vector subspace, it is necessary for   v   =   0   to be an element of  V , and thisis only possible in the special case where u1 = u2. Even in this case, however, V   is not avector space. This will be shown at the end of part (b). It will be seen in studying detection inChapter 8 that V  is an important set of vectors, subspace or not.

(b) V  can be rewritten as V  = {v : v − u12 = v − u22}. Expanding these energy differencesfor  k  = 1, 2,

v − uk2 =   v2 − v, uk − uk, v + uk2

=   v2 + uk2 − 2(v, uk).

It follows that  v ∈ V  if and only if 

v2 + u12 − 2(v, u1) = v2 + u22 − 2(v, u2).

Rearranging terms,  v ∈ V  if and only if 

(v, u2 − u1) = u22 − u12

2  .   (25)

Now to complete part (a), assume u22 = u12 (which is necessary for V  to be a vector space)and assume   u1 =   u2   to avoid the trivial case where V   is all of  L2. Now let   v   =   i(u2 − u1).Thus v, u2 − u1   is pure imaginary so that   v ∈ V . But   iv   is not in V   since iv, u2 − u1  =−u2 − u12 = 0. In a vector subspace, multiplication by a scalar (in this case  i) yields anotherelement of the subspace, so V   is not a subspace except in the trivial case where  u1 =  u2¿

(c) Substituting (u1  +  u2)/2 for   v, we see that v −  u1   = (u2 −  u2)   and v −  u2   =(−u2 + u2), so v − u1 = v − u2  and consequently  v ∈ V .(d) The geometric situation is more clear if the underlying class of functions is the class of real

L2   functions. In that case V  is a subspace whenever u1 = u2. If  u1 = u2, then V   is ahyperplane . In general, a hyperplane H  is defined in terms of a vector   u  and a subspace S   asH = {v :  v  =  u+s for some s ∈ S}. In R2 a hyperplane is a straight line, not necessarily throughthe origin, and in R3, a hyperplane is either a plane or a line, neither necessarily including theorigin. For complex L2, V   is not a hyperplane. Part of the reason for this exercise is to seethat real L2  and complex L2, while similar in may aspects, are very different in other aspects,especially those involving vector subspaces.

Exercise 5.12:

(a) To show that S ⊥ is a subspace of V , we need to show that for any v 1, v 2 ∈ S ⊥, αv 1+β v 2 ∈ S ⊥for all scalars α, β . If  v 1, v 2

 ∈ S ⊥, then for all  w 

 ∈ S , 

αv 1 + β v 2,w 

= αv 1,w 

+ β 

v 2,w 

=

0 + 0. Thus  αv 1 + β v 2 ∈ S ⊥  and S ⊥   is a subspace of  V .(b) By the Projection Theorem, for any  u  ∈ V , there is a unique vector   u |S  ∈ S   such thatu  − u |S , s = 0 for all  s ∈ S . So  u ⊥S   =  u  − u |S  ∈ S ⊥S  and we have a unique decompositionof  u   into  u  = u |S  + u ⊥S .

(c) Let  V   and  S   (where  S < V ) denote the dimensions of  V   and S   respectively. Start with aset of  V   independent vectors  s 1, s2 · · · sV  ∈ V . This set is chosen so that the first  S  of these i.e.s1, s2 · · · sS   are in S . The first  S  orthonormal vectors obtained by Gram-Schmidt procedure

53

Page 54: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 54/110

will be a basis for S . The next V  − S  orthonormal vectors obtained by the procedure will be abasis for S ⊥.

Exercise 5.13:  The function sinc(3t/2 has bandwidth 1/3, so it is also lowpass constrained to1/2. It can thus be represented by the sampling theorem at bandwidth 1/2.

sinc(3t/2) =

n

sinc(3n/2)sinc(t − n)

Exercise 5.14:

(a) Assume throughout part (a) that  m, n   are positive integers,  m > n. We will show, as case1, that if the left end,   am, of the pulse  gm(t) satisfies  am   < an, then  am + 2−m−1 < an,   i.e.,the pulses do not overlap at all. As case 2, we will show that if  am ∈  (an, , an + 2−n−1), thenam + 2−m−1 ∈   [an, an + 2−n−1],   i.e., the pulses overlap completely. This shows that all pulsesoverlap either completely or not at all.

Case 1: Let  dm  be the denominator of the rational number  am  (in reduced form). Thus (sinceandn   and   amdm  are integers), it follows that if   am   < an, then also   am  +   1

dndm≤   an. Since

dn ≤  dm ≤  m   for  m ≥  3, we have  am +   1m2 ≤  an   for  m ≥  3. Since 1/m2 >  2−m−1 for  m ≥  3,

it follows that  am + 2−m−1 < an. Thus, if  am  < an,  gm   and  gn  do not overlap for any  m >  3.Since  g2  does not overlap  g1  by inspection, there can be no partial overlap for any  am < an.

Case 2: Apologies! This is very tedious. Assume that   am ∈   (an, an  + 2−n−1). By the sameargument as above,

am ≥ an +  1

dndmand am +

  1

dmdn≤ an + 2−n−1 (26)

where dn  is the denominator of  an + 2−n

−1

. Combining these inequalities,1

dndm< 2−n−1.   (27)

We now separate case 2 into three subcases. First, from inspection of Figure 5.3 in the text,there are no partial overlaps for  m < 8. Next consider m ≥ 8 and  n ≤ 4. From the right side of (26), there can be no partial overlap if 

2−m−1 ≤   1

dmdncondition for no partialoverlap.   (28)

From direct evaluation, we see that  dn ≤  48 for  n ≤ 4. Now  dm2−m−1 is 5/512 for  m = 8 and

is decreasing for  m ≥ 8. Since 5/512 <  1/48, there is no partial overlap for  n ≤ 4, m ≥ 8.Next we consider the general case where  n ≥ 5. From (27), we now derive a general conditionon how small  m  can be for  m, n  pairs that satisfy the conditions of case 2. Since  m ≥  dm   form ≥ 3, we have

m > 2n+1

dn(29)

For  n ≥ 5, 2n+1/dn ≥ 2n + 2, so the general case reduces to  n ≥ 5 and  m ≥ 2n + 2.

54

Page 55: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 55/110

Next consider the condition for no partial overlap in (28). Since   dn ≤   2n+1dn ≤   2n+1n   anddm ≤ m, the following condition also implies no partial overlap:

m2−m−1 ≤  2−n−1

n  (30)

The left side of (30) is decreasing in  m, so if we can establish (30) for m  = 2n+2, it is establishedfor all  m ≥ 2n + 2. The left side, for  m  = 2n + 2 is (2n + 2)2−2n−3. Thus all that remains is toshow that (2n + 2)n ≤ 2n+2. This, however, is obvious for  n ≥ 5.

(b) First we clarify the definition of  hn(t). Note that hn(t) = ±gn(t), using the plus sign if thereare an even number of pulses  gi(t), i < t containing  gn(t) and the minus sign otherwise. Frompart (a), each of these pulses either contains  gn(t) or is disjoint from  gn(t). This means thatn

i=1 hi(t) is 1 wherever n

i=1 gi(t) is odd and 0 when it is even. Thus n

i=1 hi(t) takes on onlythe values 1 and 0 over (0  < t < 1) and  n ≥ 1.

(c) First note that for every rational number   an, m

i=1 hi(an) is constant for   m > n   sincegm(t) cannot contain   an   for   m > n   because of the containment feature in part (a). Thuslimn→∞

ni=1 hi(t) has a limit for all rational   t. Also, the measure of the set of points over

which mi=1   changes as   m   increases from   n   to ∞   is at most 2−n−1. This implies that the

measure of the set of points for which limn→∞n

i=1 hi(t) has no limit is zero. However, thisset of zero measure not only contains a countably infinite set of of points but an uncountablyinfinite set of points. The following ‘construction’ places a subset of these points at which nolimit exists into one to one correspondence with the set of (nonending) binary sequences.

Relabel  g1(t) and g2(t) as f 0(t) and f 1(t). Let f 00(t) and f 01t be the first and second subsequentpulses gi(t) and g j(t) that are contained in  f 0(t) and let f 10(t) and f 11(t) be the first and secondsubsequent pulses that are contained in  f 1(t). In the same way for any binary n-tuple  x , letf x 0(t) and  f x 1(t) be the first and second subsequent pulses  gi(t) and  g j (t) contained in  f x (t).This process defines a binary tree of disjoint intervals, with 2n disjoint intervals at level n. Each(non-ending) binary sequence then is a single point limit of the corresponding nested intervals.

Exercise 5.15:  Using the same notation as in the proof of Theorem 4.5.1,

u(n)(t) =n

m=−n

nk=−n

uk,mθk,m(t) u(n)(f ) =n

m=−n

nk=−n

uk,mψk,m(f ).

Since ψk,m(f ) is the Fourier transform of  θk,m(t) for each k, m, the coefficients uk,m  are the samein each expansion. In the same way,

v(n)(t) =n

m=−n

nk=−n

vk,mθk,m(t) v(n)(f ) =n

m=−n

nk=−n

vk,mψk,m(f ).

It is elementary, using the orthonormality of the  θk,m  and the orthonormality of the  ψk,m  to seethat for all  n > 0,

u (n),v (n) =n

m=−n

nk=−n

uk,mv∗k,m  = u (n), v (n).   (31)

Thus our problem is to show that this same relationship holds in the limit  n → ∞. We know(from Theorem 4.5.1) that l.i.m.n→∞u (n) = u , with the corresponding limits for  v (n), u (n), and

55

Page 56: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 56/110

v (n). Using the Schwarz inequality on the second line below, and Bessel’s inequality on thethird,

|u (n), v  − u (n),v (n)|   =   |u (n), v  − v (n)|≤ u (n)v  − v (n)≤ u v  − v 

(n)

.

Since limn→∞ v −v (n) = 0, we see that limn→∞ |u (n),v −u (n), v (n)| = 0. In the same way,limn→∞ |u (n),v −|u ,v | = 0. Combining these limits, and going through the same operationson the transform side,

limn→∞u (n), v (n) = u ,v    lim

n→∞u (n), v (n) = u , v .   (32)

Combining (31) and (32), we get Parsevals relation for L2  functions, u ,v  = u , v .

Exercise 5.16:

(a) Colloquially, lim|f |→∞ u(f )|f |1+

= 0 means that u(f )||f |1+  becomes and stays increas-

ingly small as |f | becomes large. More technically, it means that for any  δ > 0, there is an  A(δ )such that

u(f )||f |1+ ≤ δ  for all f  such that |f | ≥ A(δ ). Choosing δ  = 1 and A  =  A(1), we see

that |u(f )| ≤ |f |−1− for |f | ≥ A.

(b)   ∞−∞

|u(f )| df    =

 |f |>A

|u(f )| df  +

 |f |≤A

|u(f )| df 

≤   2

  ∞A

f −1− df  +

   A

−A|u(f )| df 

=

  2A−

  +    A

−A |u(f )| df.

Since u(f ) is L2, its truncated version to [−A, A] is also L1, so the second integral is finite,showing that u(f ) (untruncated) is also L1. In other words, one role of the    above is to makeu(f ) decreases quickly enough with increasing  f  to maintain the L1  property.

(c) Recall that s(n)(f ) = 

|m|≤n sm(f ) where sm(f ) = u(f  − m)rect†(f ). Assuming  A   to be

integer and  m  > A, |sm(f )| ≤ (m − 1)−1−. Thus for  f  ∈ (−1/2, 1/2]

|s(n)(f )| ≤|m|≤A

u(f  − m)

+

|m|>A

(|m| − 1)−1−

=

|m|≤A

u(f  − m)

+

m≥A

2m−1−.   (33)

The factor of 2 above was omitted by error from the exercise statement. Note that since the finalsum converges, this is independent of  n  and is thus an upper bound on |s(f )|. Now visualize the

2A + 1 terms in the first sum above as a vector, say  a . Let→1  be the vector of 2A + 1 ones, so

56

Page 57: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 57/110

that a ,→1 = 

ak. Applying the Schwarz inequality to this, |k ak| ≤ a  →1 . Substituting

this into (33),

|s(f )| ≤ 

(2A + 1)|m|≤A

|u(f  + m)|2 +

m≥A

2m−1−.   (34)

(d) Note that for any complex numbers  a  and   b, |a + b|2 ≤ |a + b|2 + |a − b|2 = 2|a|2 + 2|b|2.Applying this to (34),

|s(f )|2 ≤ (4A + 2)|m|≤A

|u(f  + m)|2 +

m≥A

2m−1−

2

.

Since s(f ) is nonzero only in [1/2, 1/2] we can demonstrate that s(f ) is L2  by showing that theintegral of  |s(f )|2 over [−1/2, 1/2] is finite. The integral of the first term above is 4A + 2 timesthe integral of  |u(f )|2 from −A − 1/2 to  A + 1/2 and is finite since u(f ) is L2. The integral of the second term is simply the second term itself, which is finite.

57

Page 58: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 58/110

Principles of Digital CommunicationSolutions to Exercises – Chapter 6

Exercise 6.1:   Let U k  be be a standard M-PAM random variable where the  M  points each haveprobability 1/M . Consider the analogy with a uniform M -level quantizer used on a uniformlydistributed rv  U  over the interval [−M d/2,Md/2].

0

R1   R2   R3   R4   R5   R6

a1   a2   a3   a4   a5   a6

  d

(M  = 6)

.

Let   Q   be the quantization error for the quantizer and   U k   be the quantization point. ThusU   =   U k  + Q. Observe that for each quantization point the quantization error is uniformlydistributed over [−d/2, d/2]. This means that  Q   is zero mean and statistically independent of the quantization point  U k. It follows that

E[U 2] =  E[(Q + U k)2 =  E[U 2k ] + E[Q2] =  E[U 2k ] + d2

12.

On the other hand, since  U   is uniformly distributed,  E[U 2] = (dM )2/12. It then follows that

E[U 2k ] =

 d2(M 2−

1)

12   .

Verifying the formula for M=4:

E S  = 2

d2

2+

3d2

2

4  =

 5

4d2

d2(M 2 − 1)

12  =

 5

4d2.

Verifying the formula for M=8:

E S  = 2d

22

+ 3d

2 2

+ 5d

2 2

+ 7d

2 2

8  = 21

4 d2

d2(M 2 − 1)

12  =

 21

4 d2.

58

Page 59: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 59/110

Exercise 6.2:

(a) We have  T   =  .002, so the energy in the pulse  p(t) is  T   =   .002, and thus the energy in thewaveform is

.002[115(3d/2)2 + 130(d/2)2 + 120(−d/2)2 + 135(−3d/2)2] = 5d2/4.

(b) The bandwidth is 250 H.

(c)

E[

   U 2(t) dt] =  E[.002

500k=1

U 2k ] =  E[U 2k ] = 5d2

4  .

The calculation of  E[U 2k ] can be done directly or by the formula of Exercise 6.1. The fact thatthis is the same as part (a) is an accident arising from the fact that exactly half the symbols inpart (a) had magnitude  d/2 and half had magnitude 3d/2.

(d) and (e) In this case, the probability of 00 is 0.45, that of 11 is 0.45 and the others are 0.05

each. Assuming that a1, . . . , a4  are mapped into −3d/2, −d/2, d/2, 3d/2 as in Figure 6.2, one of the more probable symbols is mapped into a low energy signal and one to a high energy signal,so

E[U 2k ] =  E[

   U 2(t) dt] =

 5d2

4  ,

the same as in parts (a) and (c).

(f) Using the mapping (00 →   a2), (01 →   a1), (10 →   a4), (11 →   a3), the probable signals aremapped into the low energy signals, giving an expected energy of 

0.9(d/2)2 + 0.1(3d/2)2 = 0.9d2.

The point of the problem is to observe that energy is saved by mapping more probable symbolsinto lower energy signals.

Exercise 6.3:

(a) Since the received signal is decoded to the closest PAM signal, the intervals decoded to eachsignal are indicated below.

0

R1   R2   R3   R4

a1   a2   a3   a4−3d/2   −d/2   d/2 3d/2

  d

.

Thus if  U k  = a1   is transmitted, an error occurs if  Z k ≥  d/2. The probability of this is  Q(d/2)where

Q(x) =

  ∞x

1√ 2π

exp(−z2/2).

If   U k   =   a2   is transmitted, an error occurs if either   Z k ≥   d/2 or   Z k   < −d/2, so, using thesymmetry of the Gaussian density, the probability of an error in this case is 2 Q(d/2). In the

59

Page 60: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 60/110

same way, the error probability is 2Q(d/2) for   a3   and   Q(d/2) for   a4. Thus the overall errorprobability is (3/2)Q(d/2).

(b) Now suppose the third point is moved to  d/2 + . This moves the decision boundary betweenR3  and R4  by /2 and similarly moves the decision boundary between R2  and R3  by /2. Theerror probability then becomes

P e() = 1

2

Q(d/2) + Q(

d +

2  ) + Q(

d −

2  )

.

dP (e)

d  =

 1

4

  1√ 

2πexp(−(d + )2/2) −   1√ 

2πexp(−(d − )2/2)

.

This is equal to 0 at    = 0, as can be seen by symmetry without actually taking the derivitive.

(c) With the third signal point at  d/2 + , the signal energy is

E S  = 1

4

d

22

+

d +

2 2

+ 2

3d

2 2

.

The derivitive of this with respect to    is (d + )/8.

(d) This means that to first order in   , the energy can be reduced by reducing   a3   withoutchanging P e. Thus moving the two inner points slightly inward provides better energy efficiencyfor 4-PAM. This is quite counter-intuitive. The difference between optimizing the points in4-PAM and using standard PAM is not very significant, however. At 10 dB signal to noiseratio, the optimal placement of points (which requires considerably more computation) makesthe ratio of outer points to inner points 3.15 instead of 3, but it reduces error probability byless than 1%.

Exercise 6.4:

(a) If for each  j ,   ∞−∞

u(t)d j (t) dt   =

  ∞−∞

∞k=∞

uk p(t−kT )d j (t) dt

=∞

k=∞uk

  ∞−∞

 p(t−kT )d j(t) dt =  u j ,

then it must be that ∞−∞ p(t−kT )d j(t) dt =  p(t − kT ), d j (t)  has the value one for  k  =  j   and

the value zero for all k = j . That is,  d j (t) must be orthogonal to  p(t − kT ) for all  k = j .

(b) Since

  p(t

−kT ), d0(t)

 = 1 for  k  = 0 and equals zero for  k

 = 0, it follows by shifting each

function by  jT   that  p(t − (k − j)T ), d0(t)  equals 1 for  j  = k  and 0 for  j =  k . It follows thatd j(t) = d0(t − jT ).

(c) In this exercise, to avoid ISI (intersymbol interference), we pass   u(t) through a bank of filters  d0(−t), d1(−t) . . . d j(−t) . . . , and the output of each filter at time  t  = 0 is  u0, u1 . . . u j . . .respectively. To see this, note that the output of the  j -th filter in the filter bank is

r j(t) =∞

k=∞uk

  ∞−∞

 p(τ −kT )d j (−t + τ ) dτ.

60

Page 61: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 61/110

At time  t  = 0,

r j(0) =∞

k=∞uk

  ∞−∞

 p(τ −kT )d j(τ ) dτ  = u j .

Thus, for every  j , to retrieve  u j   from u(t), we filter u(t) through  d j(−t) and look at the outputat  t  = 0.

However, from part (b), since  d j(t) = d0(t − jT ) (the  j -th filter is just the first filter delayed by jT ). Rather than processing in parallel through a filter bank and looking at the value at t  = 0,we can process serially by filtering  u(t) through  d0(−t) and looking at the output every  T . Toverify this, note that the output after filtering  u(t) through  d0(−t) is

r(t) =∞

k=∞uk

  ∞−∞

 p(τ −kT )d0(−t + τ ) dτ,

and so for every  j,

r( jT ) =∞

k=∞

uk   ∞

−∞

 p(τ −kT )d0(τ  − jT ) dτ  = u j.

Filtering the received signal through  d0(−t) and looking at the values at  jT   for every  j   is thesame operation as filtering the signal through q (t) and then sampling at jT . Thus, q (t) = d0(−t).

Exercise 6.5:

(a)  g(0) = v(0)sinc(0) = 1 and  g(kT ) = v(kT )sinc(kT ) = v(kT ) · 0 = 0 for all non-zero integersk. Thus, g(t) is ideal Nyquist with interval T.

(b) Multiplication in the time domain is equivalent to convolution in the frequency domain.

g(f ) = v(f ) ∗ T rect(f T ) = T   ∞−∞ v(s)rect((f  − s)T )ds.

(c) The main idea in this exercise is important when v(f ) is a narrowband waveform, so weillustrate the idea with the following example where the convolution can be done graphically byinspection.

−1−α

2T 

−1+α2T 

1−α2T 

1+α

2T 

g(f ) = v(f ) ∗ T rect(T f )

00  

    

     

    

0

T  rect(T f )

T /α

v(f )=(T /α)rect(Tf/α)

−  α

2T 

α

2T 

Note that the bandwidth of v(f ) above is a fraction  α  of the bandwidth of  T rect(T F ), and thatα  is then the rolloff of g(f ).

More generally, if v(f ) is any function nonzero only in [−α/2T,α/2T ] where  α <  1, then theleading edge of g(f ) (i.e., in the range [−(1 + α)/2T, −(1 − α)/2T ]) is given by

g(f ) = T 

   f +1/2T 

−α/2T v(s) ds.

61

Page 62: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 62/110

That is, g(f ) has a leading edge which is the integral f ∞ v(s) ds, shifted to the left by 1/2T . In

the same way, it has a trailing edge, in the range [(1 − α)/2T, (1 + α)/2T ], that satisfies

g(f ) = T  − T 

   f −1/2T 

−α/2T v(s) ds.

It can be seen (with some thought) that this satisfies the band edge symmetry version of theNyquist condition.

In a moment, we demonstrate a more equation-friendly way of demonstrating the Nyquist cri-terion for g(f ) that works whether or not  v(t) is bandlimited. First, however, we discuss whatthe above result means. Choosing a good Nyquist pulse g(t) is a tradeoff between small rolloff α   and rapid time decay. We have seen that the time decay can be controlled by the choiceof  v(t), subject to a bandwidth constraint  α <   1. We saw that g(f ) is a rectangular pulse inthe middle with leading and trailing edges equal to the integral of v(f ). Thus the derivitive of the leading edge of g(f ) determines v(f ), and the inverse transform determines the time decayof  g(t). Focusing on v(f ) instead of g(f ) has the added advantage that v(f ) can be arbitrarysubject to the constraint α  - it does not have to satisfy the Nyquist criterion itself.

We now give an alternate derivation that g(f ) satisfies the Nyquist criterion:

l.i.m.

m

g(f  + m/T )rect(f T ) = T rect(f T ).   (35)

An elegant way to visualize this sum is to convolve g(f ) with a sequence of  T  spaced unit impulses(which performs the summation) and to truncate the result (i.e., multiply by rect(f T )). Sinceconvolution is associative and commutative, we can first convolve the string of impulses withT rect(f T ) and convolve the result with v(f ). The convolution of  T rect(f T ) with the impulsestring is simply the constant function  T . Convolving this with v(f ) simply results in anotherconstant function equal to T 

 ∞−∞ v(f ) df . Since v(0) = 1, that integral is T , and after truncation,

we get  T rect(f T ).

(d) As we have already seen, if  v(t) is limited to Bb, then  g (t) is limited to Bb + 1/2T .

Exercise 6.6:

(a)   g(t) must be ideal Nyquist,   i.e.,   g(0) = 1 and  g(kT ) = 0 for all non-zero integer   k. Theexistence of the channel filter does not change the requirement for the overall cascade of filters.The Nyquist criterion is given in the previous problem as Eq. (35).

(b) It is possible, as shown below. There is no ISI if the Nyquist criterion 

m g(f  + 2m) =   12   for

|f | ≤ 1 is satisfied. Since g(f ) = ˆ p(f )h(f )q (f ), we know that g(f ) is zero where ever h(f ) = 0.In particular, g(f ) must be 0 for |f | > 5/4 (and thus for f  ≥ 2). Thus we can use the band edgesymmetry condition, g(f ) + g(2

−f ) = 1/2 over 0

 ≤ f 

 ≤  1. Since g(f ) = 0 for 3/4  < f 

 ≤ 1,

it is necessary that g(f ) = 1/2 for 1   < f  ≤   5/4. Similarly, since g(f ) = 0 for   f >   5/4, wemust satisfy g(f ) = 1/2 for |f |  <  3/4. Thus, to satisfy the Nyquist criterion, g(f ) is uniquelyspecified as below.

0   34

54

g(f )0.5

62

Page 63: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 63/110

In the regions where g(f ) = 1/2, we must choose q (f ) = 1/[2ˆ p(f )h(f )]. Elsewhere g(f ) = 0because h(f ) = 0, and thus q (f ) is arbitrary. More specifically, we must choose q (f ) to satisfy

q (f ) =

0.5,   |f | ≤ 0.5;1

3−2|f | ,   0.5 < |f | ≤ 0.751

3−2|f |,   1

≤ |f | ≤

5/4

It makes no difference what q (f ) is elsewhere as it will be multiplied by zero there.

(c) Since h(f ) = 0 for  f > 3/4, it is necessary that g(f ) = 0 for |f | > 3/4. Thus, for all integersm, g(f  + 2m) is 0 for 3/4 < f < 1 and the Nyquist criterion cannot be met.

(d) If for some frequency  f , ˆ p(f )h(f ) = 0, it is possible for g(f ) to have an arbitrary value bychoosing q (f ) appropriately. On the other hand, if ˆ p(f )h(f ) = 0 for some   f , then g(f ) = 0.Thus, to avoid ISI, it is necessary that for each 0 ≤ f  ≤ 1/(2T ), there is some integer m such thath(f +m/T )ˆ p(f +m/T ) = 0. Equivalently, it is necessary that

 m h(f +m/T )ˆ p(f +m/T ) = 0 for

all  f .

There is one peculiarity here that you were not expected to deal with. If ˆ p(f )h(f ) goes through

zero at f 0  with some given slope, and that is the only  f  that can be used to satisfy the Nyquistcriterion, then even if we ignore the point   f 0, the response q (f ) would approach infinity fastenough in the vicinity of  f 0   that q (f ) would not be L2.

This overall problem shows that under ordinary conditions (i.e.non-zero filter transforms), thereis no problem in choosing q (f ) to avoid intersymbol interference. Later, when noise is taken intoaccount, it will be seen that it is undesirable for q (f ) be very large where ˆ p(f ) is small, sincethis amplifies the noise in frequency regions where there is very little signal.

Exercise 6.7:

(a) From property 4 of the problem statement,  p(t − 1) = 1 − p(t), and from the 0/1 property

this means that either  p(t) or  p(t − 1) is 1, but not both. Thus  p(t) p(t−1) = 0. It then followsthat 

 p(t) p(t − 1) dt = 0, demonstrating orthogonality.

(b) For |k| > 1, the pulses are non-overlapping. Thus  p(t) p(t − k) = 0 for all t. As seen in part(a)  p(t) p(t − k) = 0 for  k  = 1 for all  t = 1/2; the same holds for  k  = −1 for all  t = −1/2. Thus 

 p(t) p(t − k) dt = 0.

(c) As seen in part (b),  p(t) p(t − k) = 0 for all t  and  k  (other than the isolated points  t  = ±1/2for  k  = ±1). Thus  p(t) p∗(t−k)e−i2πmt = 0 a.e., and orthogonality again holds.

(d) Using the hint,

 p(t)e−2πimt + p(t−1)e−2πim(t−1) = [ p(t) + p(t−1)]e−2πimt = e−2πimt

for 0 ≤ t ≤ 1, t = 1/2. Thus   1

−1 p2(t)e−2πimt dt =

   1

0[ p(t)+ p(t − 1)]e−2πimt dt =

   1

0e−2πimt dt = 0,

where we first used  p2(t) = p(t) and then converted the integral from −1 to +1 into an integralfrom 0 to 1.

(e) Note that if   p(t) and   q (t) are orthogonal, then ˆ p(f ) and q (f ) are also orthogonal (thisfollows from Parseval, who says that

  p(t)q ∗(t) dt =

   ˆ p(f )q ∗(f ) df ). The Fourier transform of 

63

Page 64: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 64/110

q (t) = p(t−k)e2πimt is q (f ) = ˆ p(f +m)e−2πikf  Thus, ˆ p(f ) is orthogonal to ˆ p(f +m)e−2πikf  for allm, k other than m  =  k  = 0. Thus, substituting f   for t, k  for −m and  m  for −k gives the desiredresult.

Exercise 6.8:

(a) With  α  = 1, the flat part of g(f ) disappears. Using T  = 1 and using the familiar formulacos2 x = (1 + cos 2x)/2, g1(f ) becomes

g1(f ) = 1

2

1 + cos(

πf 

2  )

rect(

2).

Writing cos x  = (1/2)[eix + e−ix] and using the frequency shift rule for Fourier transforms, weget

g1(t) = sinc(2t) + 1

2sinc(2t + 1) +

 1

2sinc(2t − 1)

=

  sin(2πt)

2πt   +

 1

2

sin(π(2t + 1))

π(2t + 1)   +

 1

2

sin(π(2t

−1))

π(2t − 1)

=  sin(2πt)

2πt  −  1

2

sin(2πt)

π(2t + 1) − 1

2

sin(2πt)

π(2t − 1)

=  sin(2πt)

1

t −   1

2t + 1 −   1

2t − 1

 =

  sin(2πt)

2πt(1 − 4t2)

=  sin(πt) cos(πt)

πt(1 − 4t2)  =

  sinc(t) cos(πt)

(1 − 4t2)  .

This agrees with (6.18) in the text for  α  = 1, T  = 1. Note that the denominator is 0 at  t  = ±0.5.The numerator is also 0, and it can be seen from the first equation above that the limiting valueas t

→ ±0.5 is 1/2. Note also that this approaches 0 with increasing  t  as 1/t3, much faster than

sinc(t).

(b) It is necessary to use the result of Exercise 6.6 here. As shown there, the inverse transformof a real symmetric waveform   gα(f ) that satisfies the Nyquist criterion for   T   = 1 and has arolloff of  α ≤ 1 is equal to sinc(t)v(t). Here v(t) is lowpass limited to α/2 and its transform v(f )is given by the following:

v(f  + 1/2) = dg(f )

df   for

 −(1 + α)

2  < f <

 (1 − α)

2  .

That is, we take the derivitive of the leading edge of g(f ), from −(1 + α)/2 to −(1 − α)/2 andshift by 1/2 to get v(f ). Using the middle expression in (6.17) of the text, and using the fact

that cos

2

(x) = (1 + cos 2x)/2,

v(f  + 1/2) = 1

2

d

df 

1 + cos

π(−f  − (1 − α)/2)

α

for  f  in the interval (−(1 + α)/2, −(1 − α)/2). Shifting by letting  s  =  f  +   12 ,

v(s) = 1

2

d

ds cos

−πs

α  −  π

2

 =

 1

2

d

ds sin

πs

α

 =

  π

2α cos

πs

α

64

Page 65: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 65/110

for  s ∈ (−α/2, α/2). Multiplying this by rect(s/α) gives us an expression for v(s) everywhere.Using cos x =   1

2 (eix + e−ix) allows us to take the inverse transform of v(s), getting

v(t) =  π

4 [sinc(αt + 1/2) + sinc(αt − 1/2)]

=  π

4 sin(παt + π/2)

παt + π/2  +

 sin(παt − π/2)

παt − π/2 .

Using the identity sin(x + π/2) = cos x  again, this becomes

v(t) = 1

4

cos(παt)

αt + 1/2 − cos(παt)

αt − 1/2

 =

  cos(παt)

1 − 4α2t2.

Since  g(t) = sinc(t/a)v(t) the above result for  v(t) corresponds with (6.18) for  T   = 1.

(c) The result for arbitrary  T   follows simply by scaling.

Exercise 6.9:

(a) The figure is incorrectly drawn in the exercise statement and should be as follows:1

12

− 12

− 34

− 32

− 74

  − 14

  0   14

34

12

74

32

In folding these pulses together to check the Nyquist criterion, note that each pulse on thepositive side of the figure folds onto the interval from −1/2 to −1/4, and each pulse of the leftfolds onto 1/4 to 1/2, Since there are  k  of them, each of height 1/k, they add up to satisfy theNyquist criterion.

(b) In the limit  k → ∞, the height of each pulse goes to 0, so the pointwise limit is simply themiddle pulse. Since there are 2k  pulses, each of energy 1/(4k2), the energy difference betweenthat pointwise limit and gk(f ) is 1/(2k), which goes to 0 with  k. Thus the pointwise limit andthe L2   limit both converge to a function that does not satisfy the Nyquist criterion for  T   = 1and is not remotely close to a function satisfying the Nyquist condition. Note also that onecould start with any central pulse and construct a similar example such that the limit satisfiesthe Nyquist criterion.

Exercise 6.10:

(a) Since q ∗(f ) = ˆ p(f ), we see that g(f ) = ˆ p(f )q (f ) = |ˆ p(f )|2 so that g(f ) is real. Also,since   q ∗(−t) ↔   q ∗(f ), we see that   q ∗(−t) =   p(t). Thus, if   p(t) is real,   q (t) is also real andg(t) = p(t) ∗ q (t) is real. Since q ∗(f ) = ˆ p(f ), we see that  p(t) = q ∗(−t). Since  p(t) is real,  q (t) is

also real, and thus  g (t) = p(t) ∗ q (t) is real. Thus

g(−f ) =

   g(t)e2πift dt =

   g(t)e−2πift dt

∗= g∗(f ) = g(f ).

(b) Note that g(f ) as given in the hint is real and can be expressed as g(f ) = rect(f  − 1/2).Choosing ˆ p(f ) =

 g(f ), we see that ˆ p(f ) = rect(f  − 1/2), so  p(t) =  e−2πiftsinc(t), which is

clearly not real. More generally, since g(f ) = g(−f ), no choice of   p(t) satisfying the givenconditions can be real.

65

Page 66: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 66/110

Exercise 6.11:

(a) Note thatxk(t) = 2{exp(2πi(f k + f c)t)} = cos[2π(f k + f c)t].

The cosine function is even, and thus   x1(t) =   x2(t) if   f 1 + f c   = −f 2 − f c. This is the onlypossibility for equality unless   f 1   =   f 2. Thus, the only   f 2

 =   f 1   for which   x1(t) =   x2(t) is

f 2 = −2f c − f 1. Since f 1 > −f c, this requires f 2 < −f c, which is why this situation cannot arisewhen  f k ∈ [−f c, f c) for each  k .

(b) For any u1(f ), one can find a function u2(f ) by the transformation   f 2   = −2f c − f 1   in(a). Thus without the knowledge that  u1(t) is lowpass limited to some  B < f c, the ambiguousfrequency components in  u1(t) cannot be differentiated from those of  u2(t) by observing  x(t).If  u(t) is known only to be bandlimited to some band  B  greater than  f c, then the frequenicesbetween −B  and  B − 2f c  are ambiguous.

An easy way to see the problem here is to visualize u(f ) both moved up by  f c  and down by  f c.The bands overlap if  B > f c  and the overlapped portion can not be retrieved without additionalknowledge about u(t).

(c) The ambiguity is obvious by repeating the argument in (a). Now, since  y(t) has some nonzerobandwidth, ambiguity might be possible in other ways also. We have already seen, however,that if  u(t) has a bandwidth less than  f c, then  u(t) can be uniquely retrieved from  x(t) in theabsence of noise.

(d) For u(t) real, x(t) = 2u(t)cos(2πf ct), so u(t) can be retrieved by dividing x(t) by 2 cos(2πf ct)except at those points of measure 0 where the cosine function is 0. This is not a reasonableapproach, especially in the presence of noise, but it points out that the PAM case is essentiallydifferent from the QAM case

(e) Since   u∗(t) exp(2πif ct) has energy at positive frequencies, the use of a Hilbert filter doesnot have an output equal to  u(t) exp(2πif ct), and thus  u(t) does not result from shifting thisoutput down by   f c. In the same way, the bands at 2f c   and

 −2f c   that result from DSB-QC

demodulation mix with those at 0 frequency, so cannot be removed by an ordinary LTI filter.For QAM, this problem is to be expected since  u(t) cannot be uniquely generated by any meansat all.

For PAM it is surprising, since it says that these methods are not general. Since all time-limitedwaveforms are unbounded in frequency, it says that there is a fundamental theoretical problemwith the standard methods of demodulation. This is not a problem in practice, since  f c is usuallyso much larger than the nominal bandwidth of  u(t) that this problem is of no significance.

Exercise 6.12:

(a)    φ j(t)φ∗ (t) dt =

   θ j (t)θ∗ (t)e−2πif ct+2πif ct =

   θ j(t)θ∗ (t) dt =  δ  j .

(b)

φ2(t) =   θ2(t)e2πif ct = θ1(t − T )e2πif ct

=   φ1(t − T )e−2πif c(t−T )e2πif ct = φ1(t − T )e2πif cT .

66

Page 67: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 67/110

If  f cT  is an integer, this is  φ1(t − T ). In other words, baseband waveforms that are orthonormalto their time shifts preserve this property at bandpass if  f cT   is an integer.

Exercise 6.13:

(a) Since  u(t) is real,  φ1(t) =

{u(t)e2πif ct

}= u(t) cos(2πf ct), and since  v(t) is pure imaginary,

φ2(t) = {v(t)e2πif ct} = [iv(t)] sin(2πif ct). Note that [iv(t)] is real. Thus we must show that   u(t) cos(2πf ct)[iv(t)] sin(2πf ct) dt =

   u(t)[iv(t)] sin(4πf ct) dt = 0.

Since u(t) and v(t) are lowpass limited to  B/2, their product (which corresponds to convolutionin the frequency domain) is lowpass limited to   B <   2f c. Rewriting the sin(4πf ct) above interms of complex exponentials, and recognizing the resulting integral as the Fourier transformof  u(t)[iv(t)] at ±2f c, we see that the above integral is indeed zero.

(b) Almost anything works here, and a simple choice is  u(t) = [iv(t)] = rect(8f ct − 1/2).

Exercise 6.15:

(a)   ∞−∞

√ 2 p(t − jT )cos(2πf ct)

√ 2 p∗(t − kT ) cos(2πf ct)dt

=

  ∞−∞

 p(t − jT ) p∗(t − kT )[1 + cos(4πf ct)]dt

=

  ∞−∞

 p(t − jT ) p∗(t − kT )dt +

  ∞−∞

 p(t − jT ) p∗(t − kT )cos(4πf ct)dt

=   δ  jk  + 1

2   ∞

−∞

 p(t − jT ) p∗(t − kT ) e4πif ct + e−4πif ct

dt.

The remaining task is to show that the integral above is 0. Let g jk (t) =  p(t − jT )g∗(t − kT ).Note that g jk (f ) is the convolution of the transform of  p(t − jT ) and that of  p∗(t − kT ). Since p  is lowpass limited to  f c,  g jk   is lowpass limited to 2f c, and thus the integral (which calculatesthe Fourier transform of  g jk  at 2f c  and −2f c) is zero.

(b) Similar to part (a) we get,  ∞−∞

√ 2 p(t − jT )sin(2πf ct)

√ 2 p∗(t − kT ) sin(2πf ct)dt

=

  ∞

−∞

 p(t − jT ) p∗(t − kT )[1 − cos(4πf ct)]dt

=  ∞−∞

 p(t − jT ) p∗(t − kT )dt −   ∞−∞

 p(t − jT ) p∗(t − kT )cos(4πf ct)dt

=   δ  jk − 1

2

  ∞−∞

g jk

e4πif ct + e−4πif ct

dt.

Again, the integral is 0 and orthonormality is proved.

67

Page 68: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 68/110

Now for any  j, k   ∞−∞

√ 2 p(t − jT ) sin(2πf ct)

√ 2 p∗(t − kT ) cos(2πf ct)dt

=   ∞

−∞ p(t − jT ) p∗(t − kT ) sin(4πf ct)dt

=  1

2i

  ∞−∞

g jk

e4πif ct − e−4πif ct

dt,

which is zero as before. Thus all sine terms are orthonormal to all cosine terms.

Exercise 6.16:   Let  ψk(t) = θk(t)e2πif ct. Since   ψk(t)ψ∗

 j ((t) dt =

   θk(t)e2πif ctθ∗ j (t)e−2πif ctdt =  δ kj ,

the set {ψk(t)}  is an orthonormal set. The set {ψ∗k(t)}   is also an orthonormal set by the same

reason. Also, since each ψk(t) is bandlimited to [f c−B/2, f c+B/2] and each ψ∗k(t) is bandlimitedto [−f c − B/2, −f c + B/2], the frequency bands do not overlap, so by Parsival’s relation, eachψk(t) is orthonormal to each  ψ∗

 j (t). This is where the constraint  B /2 < f c  has been used.

Next note that the sets of functions  ψk,1(t) = {2ψk(t)} and  ψk,2(t) = {2ψk(t)} are given by

ψk,1(t) = ψk(t) + ψ∗k(t) and   iψk,2(t) = ψk(t) − ψ∗

k(t).

It follows that the set {ψk,1(t)}  is an orthogonal set, each of energy 2, and the set  {ψk,2(t)}   isan orthogonal set, each of energy 2. By the same reason, for each  k , j  with  k = j,  ψk,1  and  ψ j,2

are orthogonal. Finally, for each  k , and for each   = {1, 2},

   ψk,ψk, dt =    |ψk(t)|2

+ |ψ∗k(t)|2

dt = 2.

Exercise 6.17:

(a) This expression is given in (6.25) of the text.

(b) Note that the hypothesized expression for  x(t) is

2|u(t)| cos[2πf ct + φ(t)] = 2|u(t)| cos[φ(t)] cos(2πf ct) − 2|u(t)| sin[φ(t)] sin(2πf ct).

Since  u(t) = |u(t)|eiφ(t),

{u(t)} = |u(t)| cos[φ(t)] and   {u(t) = |u(t)| sin[φ(t)],

so the hypothesized expression agrees with (6.25). Assuming that  φ(t) varies slowly with respectto  f ct,  x(t) varies between 2|u(t)| and −2|u(t)|, touching ±u(t) each once per cycle.

(c) Since | exp(2πf t)|  = 1, for any real  f , |u(t)|  = |u(t)|  = |x+(t)|. Thus this envelope varieswith the baseband waveform and is defined independent of the carrier. The phase modulation(as well as  x(t)) does depend on the carrier.

68

Page 69: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 69/110

(d) Since 2πf ct + φ(t) = 2πf ct + φ(t),  φ(t) and  φ(t) are related by

φ(t) = φ(t) + (f c − f c)t.

(e) There are two reasonable approaches to this. First,

x2(t) = 4|u(t)|2 cos2[2πf ct + φ(t)] = 2|u(t)|2 + 2|u(t)|2 cos[4πf ct + 2φ(t)].

Filtering out the term at 2f c, we are left with 2|u(t)|2. The filtering has the effect of forming ashort term average. The trouble with this approach is that it is not quite obvious that all of thehigh frequency term get filtered out. The other approach is more tedious and involves squaringx(t) using (6.25). After numerous trigonometric identities left to the imagination of the reader,the same result as above is derived.

69

Page 70: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 70/110

Page 71: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 71/110

Exercise 7.2:

(a) Let Z  = X  + Y . Since X   and Y  are independent, the density of  Z  is the convolution of theX   and  Y  densities. To make the principles stand out, assume  σ 2

X  = σ2Y   = 1.

f Z (z) =   f X (z)

∗f Y  (z) =  

 ∞

−∞f X (x)f Y  (z

−x) dx

=

  ∞−∞

1√ 2π

e−x2/2   1√ 2π

e−(z−x)2/2 dx

=  1

  ∞−∞

e−(x2−xz+ z2

2  ) dx

=  1

  ∞−∞

e−(x2−xz+ z2

4  )− z2

4   dx

=  1

2√ 

πe−z2/4

  ∞−∞

1√ π

e−(x− z2

)2 dx

=  1

2√ 

πe−z2/4 ,

since the last integral integrates a Gaussian pdf with mean  z/2 and variance 1/2, which evaluatesto 1. As expected,  Z  is Gaussian with zero mean and variance 2.

The ‘trick’ used here in the fourth equation above is called   completing the square . The idea isto take a quadratic expression such as  x2 + αz  +  βz2 and to add and subtract  α2z2/4. Thenx2 + αxz + αz2/4 is (x + αz/2)2, which leads to a Gaussian form that can be integrated.

Repeating the same steps for arbitrary  σ2X   and  σ2

Y  , we get the Gaussian density with mean 0and variance  σ2

X  + σ2Y  .

(b) and (c) We can also find the density of the sum of independent rvs by taking the productof the Fourier transforms of the densities and then taking the inverse transform. Since  e−πt2 ↔

e−πf 2

are a Fourier transform pair, the scaling property leads to1√ 

2πσ2exp(−  x2

2σ2) ↔ exp(−π(2πσ2)f 2) = exp[−2π2σ2θ2].   (36)

Since convolution for densities corresponds to multiplication for their transforms, the transformof  Z  = X  + Y   is given by

f Z (θ) =  f X (θ) f Y  (θ) = exp−2π2θ2(σ2

X  + σ2Y  )

.

Recognizing this, with  σ2 = σ2X  + σ2

Y   , as the same transform as in (36), the density of  Z   is theinverse transform

f Z (z) =   1 2π(σ2

X  + σ2Y  )

exp   −z2

2(σ2X  + σ2

Y  ) .   (37)

(d) Note that  αkW k  is a zero-mean Gaussian rv with variance  α2k. Thus for  n = 2, the density

of  V   is given by (37) as

f V  (v) =  1 2π(α2

1 + α22)

exp

  v2

2(α21 + α2

2)

.

71

Page 72: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 72/110

The general formula, for arbitrary n, follows by iteration, viewing the sum of  n  variables as thesum of  n − 1 variables (which is Gaussian) plus one new Gaussian rv. Thus

f V  (v) =  1

 2π(

α2

k)exp

  −v2

2(

nk=1 α2

k)

.

Exercise 7.3:

(a) Note that  f X   is twice the density of a normal N (0, 1) rv for  x ≥ 0 and thus that  X   is themagnitude of a normal rv. Multiplying by U   simply converts  X   into a N (0, 1) rv  Y 1   that canbe positive or negative. The mean and variance of  Y 1  are then 0 and 1 respectively.

(b) Let  Z  be independent of  X  with the same density as  X  and let  Y 2  = U Z . Then  Y 2   is alsoa normal Gaussian rv. Note that Y 1  and  Y 2  are each nonnegative if  U  = 1 and each negative if U  = −1. Furthermore, given U  = 1 , Y 1 and Y 2  are equal to  X  and Z  respectively and thus havean iid Gaussian density conditional on being in the first quadrant. Given  U   = −1,  Y 1   and  Y 2are each negative with the density of 

 −X   and

 −Z . The unconditional density is then positive

only in the first and third quadrant, with the conditional density in each quadrant multipliedby 1/2.

Note that Y 1 and Y 2  are individually Gaussian, but clearly not jointly Gaussian, since their jointdensity is 0 in the second and fourth quadrant, and thus the contours of equi-probability densityare not the ellipses of Figure 7.2.

(c)

E[Y 1Y 2] =  E[UX UZ ] =  E[U 2]E[X ]E[Z ] = (E[X ])2,

where we used the independence of  U, X , and Z  and also used the fact that  U 2 is deterministicwith value 1. Now,

E[X ] =

  ∞0

x

 2

π exp

−x2

2

 dx  =

 2

π,

where we have integrated by using the fact that  x dx =  d(x2/2). Combining the above equations,E[Y 1Y 2] = 2/π. For jointly Gaussian rvs, the mean and covariance specify the joint density (givenby (7.20) in the text for 2 rvs). Since this density is different from that of  Y 1 and  Y 2, this providesa very detailed proof that Y 1  and  Y 2  are not jointly Gaussian.

(d) In order for the joint probability of two individually Gaussian rvs  V 1, V 2  to be concentratedon the diagonal axes,   v2   =  v1   and  v2   = −v1, we arrange that  v2   = ±v1   for each joint samplevalue. To achieve this, let X  be the same as above and let  U 1   and  U 2   be iid binary rvs, each

taking on the values +1 and -1 with equal probability. Then we let  V 1  = U 1X   and  V 2  = U 2X .That is  V 1  and  V 2  are individually Gaussian but have identical magnitudes. Thus their samplevalues lie on the diagonal axes and they are not jointly Gaussian.

Exercise 7.4:   Since  X  = max(W 1, W 2) and  Y  = min(W 1, W 2), we see that  X  ≥ Y . Thus eachpossible sample point (x, y) lies beneath the 45◦   line, as seen in the figure below.

Both  W 1 = x, W 2 = y  and  W 1 = y, W 2  =  x  map into that same point. Thus the mapping foldsthe region on the upper left of the 45◦  line onto the region on the lower right of that line.

72

Page 73: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 73/110

w1

w2

    

    

    

    

      

   x

y

    

    

    

    

  

 

 (b) From the geometry above, we see that  X, Y  has zero density to the upper left of the 45◦   lineand twice the Gaussian density to the lower right. Thus

f XY  (x, y) =

  22π exp(−x2−y2

2   ) for   x ≥ y0 for   x < y.

(c) Since S  =  X + Y   = W 1 + W 2, S  is the sum of two normal N (0, 1) rvs, so S  ∼ N (0, 2). Thus,

f S (s) =   1√ 4π

exp(s2

4 ).

(d) Note that  X  − Y   is the larger of  W 1 − W 2  and  W 2 − W 1,   i.e.,  X  − Y   = |W 1 − W 2|. SinceW 1 − W 2 ∼ N (0, 2),

f D(d) =

  2√ 

4π exp(−d2

4   ) for   d ≥ 0

0 for   d < 0.

(e) We have seen that   S   =  W 1 +  W 2   and  D   = |W 1 − W 2|. Let  Z   =  W 1 − W 2. Then S   andZ   are zero-mean jointly Gaussian. Since they are uncorrelated, they are also are statistically

independent. It follows that S  and  D  = |Z | are statistically independent. Since  U   is statisticallyindependent of both S  and D , U D  is statistically independent of  S . But also  U D  is a Gaussianrv since it is the magnitude of a Gaussian rv times an equiprobable and independent choice of sign. Since S  and  U D are each Gaussian and statistically independent, they are jointly Gaussian.

Exercise 7.5:  This exercise is needed to to justify that {V (τ ); τ  ∈   R}   in (7.34) of the textis L2  with probability 1, but contains nothing explicit about random processes. Let  g(τ ) = 

 φ(t)h(τ  −  t) dt. Since convolution in the time domain corresponds to multiplication in the

frequency domain, g(f ) =  φ(f )h(f ). Since  h  is L1, |h(f )| ≤   |h(t)| dt  .= h◦ < ∞. Thus

   |g(τ )|2 dτ  =    |g(f )

|2 df  =    |h(f )

|2

|φ(f )

|2 df 

 ≤(h◦)2    |φ(f )

|2 df  = (h◦)2.

To see that the assumption that  h  is L1 (or some similar condition) is necessary, let  φ(f ) = h(f )each be  f −1/4 for 0 < |f | ≤ 1 and each be zero elsewhere. Then

 10 |φ(f )|2 df  = 2. Thus  h  and

θ  are L2, but h is unbounded and hence h  is not L1. Also, 1

0 |φ(f )h(f )|2 df  = 1

0 (1/f ) df  = ∞.What this problem shows in general is that the convolution of two L2 functions is not necessarilyL2, but that the additional condition that one of the functions is L1  is enough to guarantee thatthe convolution is L2.

73

Page 74: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 74/110

Exercise 7.6:

(a) We generalize the process of (7.30) in the text to assume that   Z 1, . . . , Z  n  are zero-mean jointly Gaussian rvs with arbitrary finite3 covariance. Thus we can express Z   = (Z 1, . . . , Z  n)T

as Z   =  AW   where W 1, . . . , W  n are iid, N (0, 1). Thus U   = (Z (t1), . . . , Z  (t))T can be expressedas  U   =   BZ   =   BAW   and it follows that   Z (t1), . . . , Z  (t) are jointly Gaussian for all     and

t1, . . . , t. The energy in a sample function of  Z (t), say  z(t) = nk=1 zkφk(t) where  z1, . . . , zn

are the sample values of  Z 1, . . . , Z  n isn

k=1 z2k. Note that this depends only on the orthogonality

of the φk  and has nothing to do with the correlation of the  Z n. This is finite with probability 1since it is a finite sum of sample values that are each finite with probability 1.

(b) Note that  KZ (t, τ ) = 

k,m E[Z kZ m]φk(t)φm(τ ). Since this process is real, the absolute value

sign on  K 2Z (t, τ ) can be omitted (although the result, using absolute value signs, applies also tocomplex processes). Thus  

  |KZ (t, τ )|2 dtdτ    =

   k,m,k,m

E[Z kZ m]E[Z kZ m]φk(t)φm(τ )φk(t)φm(τ ) dtdτ 

= k,m

(E[Z kZ m])2,

where we have integrated over  t, using the orthonormality of the φk, and then integrated over  τ in the same way. Since the summation is over 1 ≤ k ≤ n  and 1 ≤ m ≤ n, the number of termsis finite, so the sum must be finite.

Exercise 7.7:

(a) For each   t ∈ ,  Z (t) is a finite sum of zero-mean independent Gaussian rvs {φk(t)Z k; 1 ≤k ≤ n}  and is thus itself zero-mean Gaussian.

var(Z (t)) =   E[Z 2(t)]

=   E  n

k=1φk(t)Z k

  ni=1

φi(t)Z i=

ni=1

nk=1

φi(t)φk(t)E[Z iZ k]

=n

k=1

φ2k(t)σ2

k,

where  E[Z iZ k] = σ2kδ ik  since the  Z k  are zero mean and independent.

(b) Let  Y   = (Z (t1), Z (t2) · · · Z (t))T and let  Z   = (Z 1, . . . , Z  n)T be the underlying Gaussianrvs defining the process. We can write  Y   =  BZ , where   B   is an   × n   matrix whose (m, k)th

entry is  B

(m, k) =   φk(tm). Since   Z 1, . . . , Z  n  are independent, Y 

  is jointly Gaussian. Thus{Z (t1), Z (t2) · · ·  , Z (t)}  are jointly Gaussian. By definition then, Z (t) is a Gaussian process.

(c) Note that Z ( j)(t) − Z (n)(t) =  j

k=n+1 Z kφk(t). By the same analysis as in part (a),

var(Z ( j)(t) − Z (n)(t)) =

 jk=n+1

φ2k(t)σ2

k.

3The elements  ak,m  in the matrix  A  used to define a jointly Gaussian vector are defined to be real numbers,which by definition does not include  ±  infinity.

74

Page 75: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 75/110

Since |φk(t)| < A  for all  k  and  t, we have

var(Z ( j)(t) − Z (n)(t)) ≤ j

k=n+1

σ2kA2.

Because ∞k=1 σ2k   < ∞, ∞k=n+1 σ2k   converges to 0 as   n → ∞. Thus, var(Z ( j)(t) − Z (n)(t))approaches 0 for all   j   as   n → ∞,   i.e., for any   δ >  0, there is an   nδ   such that var(Z ( j)(t) −Z (n)(t)) ≤ δ  for all  j, n ≥ nδ. By the Chebyshev inequality, for any   > 0,

Pr[Z ( j)(t) − Z (n)(t) ≥ ]  ≤   δ/2.

Choosing  2 =√ 

δ , and letting δ  go to 0 (with  nδ → ∞), we see that

limn,j→∞

Pr[Z ( j)(t) − Z (n)(t) >  0] = 0.

Thus we have a sequence of Gaussian random variables that, in the limit, take on the samesample values with probability 1. This is a strong (but not completely rigorous) justification forsaying that there is a limiting rv that is Gaussian.

(d) Since the set of functions {φk(t)} are orthonormal,

z2 =

  ∞−∞

(

k

zkφk(t))(

i

ziφi(t))dt

=i, k

zkziφk(t), φi(t)

=

k

z2k.

Thus the expected energy in the process is E

[{Z (t)}2

] = kE

[Z 2

k ] = k σ2

k

(e) Using the Markov inequality,

Pr{{Z (t)}2 > α} ≤

k

σ2k/α.

Since 

k σ2k  < ∞, limα→∞ Pr{{Z (t)}2 > α} = 0. This says that sample functions of  Z (t) are

L2  with probability 1.

Exercise 7.8:

(a) Let t1, t2, . . . , tn be an arbitrary set of epochs. We will show that Z (t1), . . . , Z  (tn) are jointly

Gaussian. Each of these rvs are linear combinations of the iid Gaussian set {Z k; k ∈  Z}, and,more specifically are linear combinations of a finite subset of  {Z k; k ∈ Z}. Thus they are jointlyGaussian and {Z (t); t ∈ R}  is a Gaussian random process.

It is hard to take the need for a finite set above very seriously, but for the mathematically pureof heart, it follows from the fact that when  t   is not an integer plus 1/2,  Z (t) is equal to  Z k   fork  equal to the integer part of  t + 1/2. In the special case where  t  is an integer plus 1/2,  Z (t) isthe sum of  Z t−1/2  and  Z t+1/2.

75

Page 76: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 76/110

(b) The covariance of  {Z (t); t ∈ R}  (neglecting the unimportant case where  t  or  τ  are equal toan integer plus 1/2) is:

KZ (t, τ ) =   E[Z (t)Z (τ )]

=   E[Z t+0.5Z τ +0.5]

=   1  ,   t + 0.5 = τ  + 0.50  ,   otherwise.

(c) Since   KZ (t, τ ) is a function of both   t   and   τ    rather than just   t −   τ    (for example,0 =  KZ (4/3, 2/3) =  KZ (2/3, 0) = 1 ), the process is not WSS. Hence it is not stationary.

(d) Let  V (0) = V 0  and  V (0.5) = V 1. We see that  V 0  and  V 1  will be independent if 0  < φ ≤ 0.5and  V 0  =  V 1   if 0.5 < φ ≤ 1. Hence,

f V 1|V 0,Φ(v1|v0, φ) =

 N (0, 1)  ,   0 ≤ φ ≤ 0.5δ (v1 − v0)  ,   otherwise

Recognizing that Pr(0 ≤ Φ ≤ 0.5) = 1/2, the conditioning on Φ can be eliminated to get

f V 1|V 0(v1|v0) = 1

2

1√ 2π

exp

−v2

1

2

+

 1

2δ (v1 − v0).

The main observation to be made from this is that  V 0  and  V 1  cannot be jointly Gaussian sincethis conditional distribution is not Gaussian.

(e) Ignore the zero-probability event that   φ  = 1/2. Note that, given Φ  <  1/2,   V 0   =  Z 0   and,given Φ >  1/2,  V 0  =  Z −1. Since Φ is independent of  Z 0  and  Z −1  and  Z 0  and  Z −1  are iid, thismeans that  V 0   is N (0, 1) and, by the same argument,  V 1   is N (0, 1). Thus  V (0) and  V (0.5) areindividually Gaussian but not jointly Gaussian. This is a less artificial example than those inExercises 7.3 and 7.4 of Gaussian rvs that are not jointly Gaussian. The same argument appliesto  Z (t), Z (τ ) for any  t, τ   for which |t − τ | <  1, so {V (t); t ∈ R}   is not a Gaussian process eventhough  V (t) is a Gaussian rv for each  t.

(f) Consider the event that  V (t) and  V (τ ) are both given by the same  Z i. This is impossiblefor |t − τ |   >   1, so consider the case   t ≤   τ < t + 1. Let   V (t) =   Z i   (where   i   is such that−1

2  < t − i − Φ ≤   12 . Since Φ is uniform in [0, 1], Pr[V (τ ) = Z i] = 1 − (τ  − t). For t − 1 < τ  ≤ t,

Pr[V (τ ) = Z i] = 1 − (t − τ ). Given the event that  V (t) and V (τ ) are both given by the same  Z ithe conditional expected value of  V (t)V (τ ) is 1, and otherwise it is 0. Thus

E[V (t)V (τ )] =

  1 − |t − τ |  ,   |t − τ | ≤ 10  ,   otherwise.

(g) Since the covariance function is a function of  |t−τ | only, the process is WSS. For stationarity,we need to show that for all integers  n, for all sets of epochs t1, . . . , tn ∈ R and for all shifts τ  ∈ R,the joint distribution of  V (t1), . . . , V  (tn) is the same as the one of  V (t1 + τ ), . . . , V  (tn + τ ). Letus consider  n = 2. As in parts (d)-(e), one can show that the joint distribution of  V (t1), V (t2)depends only on whether  t1 − t2  is smaller than the value of  φ  or not; thus shifting both epochsby the same amount does not modify the joint distribution. For arbitrary n, one can observethat only the spacing between the epochs matters, hence a common shift does not affect the joint distribution and the process is stationary.

76

Page 77: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 77/110

Exercise 7.9:

(a) Each sample function   v(t) of the random process   V (t) is baseband limited to 1/2T . Wecan view

  v(t)sinc(t/T ) dt  as the output at time 0 of the filter  h(t) = sinc(t/T ) for input  v(t).

This filter is an ideal lowpass filter with bandwidth 1/2T ,   i.e.,   h(f ) =  T rect(f T ). Since  v(t)is contained in the baseband bandwidth of the filter, the filter output is a scaled version of the

input,  i.e., it is equal to  T v(t). Thus the output at  t  = 0 is  T v(t) = T v0  where  v0  is the samplevalue of  V 0. Thus the linear functional is  T V 0, which is N (0, T 2σ2).

(b) The linear functional 

 V (t)sinc( αtT  ) dt is similarly the output at time 0 of the ideal lowpass

filter  h(t) = sinc( αtT  ). For  α >  1, this filter has a greater bandwidth than the input and thus

passes V (t) with scaling  T /α. Thus the linear functional is  T V 0/α, which is N (0, T 2σ2/α2).

(c) As shown in Theorem 7.5.2, the process V (t) is a stationary Gaussian process with covarianceKV  (t − τ ) = σ2sinc( t−τ 

T   ). Since  Y   is simply a scaled version of  V  ,

KY  (t − τ ) = T 2σ2

α2  sinc(

t − τ 

T   ).

(d) Since  Y   is stationary,  Y (τ ) has the same distribution as  Y (0), which is is N (0, T 2

σ2

/α2

).(e) The spectral density of  {Y (t)}  is the Fourier transform of the covariance function found in

part (c), and is thus   T 2σ2

α2   rect(T f ).

(f) Since  Y (t) = (T /α)V (t), we can express  Y (t) as

Y (t) =

k

Y ksinc

t − kT 

,

where for each  k,  Y k  = T V k/α. thus the  Y k  are iid and N (0, σ2T 2/α2).

(g) The result of passing the WSS Gaussian process   V (t) through a linear filter is a WSSGaussian process  Y (t). The spectral density  S Y  (f ) is then given by  S V  (f )

|h(f )

|2. Thus

S Y  (f ) = T 2σ2

α2  rect(Tf/α)   ↔   KY  (τ ) =

 T 2σ2

α2sinc(ατ/T ).

Thus, for each  τ ,  Y (τ ) ∼ N (0, T 2σ2/α2.

(h) Note that  KY  is the covariance function of a sinc process, and the additional fact that {Y (t)}is Gaussian completely specifies the process. Thus, using Theorem 7.5.2 and (7.44), the processcan be represented as

Y (t) =

k

Y ksinc

Y (

t − kT/α

T /α

,

where the  Y k  are iid N (0, σ2T 2/α2)). Note that in this case,  V (t) has been filtered to a portionof the original bandwidth and any given  Y k   is not a scaled version of  V (kT/α).

Exercise 7.10:

(a) Since   X ∗−k   =   X k   for all   k, we see that   E[X kX ∗−k] =   E[X 2k ], Thus we are to show thatE[X 2k ] = 0 implies the given relations.

E[X 2k ] =  E[((X k) + i(X k))2] =  E[(X k)2] − E[(X k)2] + 2iE[(X k)(X k)]

77

Page 78: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 78/110

Since both the real and imaginary parts of this must be 0,

E[(X k)2] =  E[(X k)2] and   E[(Xk)(Xk)] = 0.

Note: If  X k  is Gaussian, this shows that it is circularly symmetric.

(b) The statement of this part of the exercise is somewhat mangled. The intent (which is clear

in Section 7.11.3, where this result is required) is to show that if   E[X kX ∗m] = 0   for all  m =  k ,then   E[(X k)(X m)],   E[(X k)(X m)], and   E[(X k)(X m)] are all zero for all  m = ±k. Toshow this, note that 2i(X m) = [X m − X ∗m]. Thus, for  m = ±k,

0 =  E[X kX ∗m] − E[X kX ∗−m] =  E[X k(X ∗m − X m)] = 2iE[X k(X m)].

The real and imaginary part of this show that  E[(X k)(X m)] = 0 and   E[(X k)(X m)] = 0.The same argument, using 2(X m) = X m + X ∗m   shows that  E(X k)(X m)] = 0.

Exercise 7.11:  The rvs  V 1  and V 2  are defined by

V  j  =    g j (t)Z (t) dt.

Since g j (t) and {Z (t)} are real, it follows that the V  j’s are real rvs. Hence, the integral in (7.58),i.e.,  E[V 1V 2] is real. However, the integrand, g1(f )S Z (f )g∗2(f ) need not be real. To see this, pickany  real g1(t) and set

g2(t) = g1(−t + T ).

Then,

g1(f )S Z (f )g∗2(f ) = g1(f )S Z (f )g1(−f )e−i2πf T  =|g1(f )|2S Z (f )

e−i2πf T .

Since the bracketed expression is real for all  f , the overall expression is not purely real. Note

that we have used the Hermitian symmetry property of transforms of real waveforms,g1(−f ) = g∗1(f ).

in constructing the example above.

Exercise 7.12:

(a) We view  Y  as a linear functional of  {Z (t)}  by expressing it as

Y   =

   T 

0Z (t) dt =

  ∞−∞

Z (t)g(t) dt.

where

g(t) =

1 if  t ∈ [0, T ]

0 otherwise.

Since Y  is a linear functional of a Gaussian process, it is a Gaussian rv. Hence, we only need tofind its mean and variance. Since {Z (t)}   is zero-mean,

E[Y ] =  E

  ∞−∞

Z (t)g(t) dt

 =

  ∞−∞

E[Z (t)]g(t) dt = 0.

78

Page 79: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 79/110

E[Y 2] =  E

   T 

0Z (t)g(t) dt

   T 

0Z (τ )g(τ ) dτ 

=

   T 

0

   T 

0E[Z (t)Z (τ )]g(t)g(τ ) dτ dt

=    T 

0   T 

0

N 0

2   δ (τ  − t)g(t)g(τ ) dτ dt

= N 0

2

  ∞−∞

g2(τ ) dτ.

For  g (t) in this problem, ∞−∞ g2(τ )dτ  = T   so  E[Y 2] =   N 0T 

2   .

(b) The ideal filter h(t), normalized to unit energy, is given by

h(t) =√ 

2W sinc(2W t)   ↔   h(f ) =  1√ 

2W rect(

  f 

2W  ).

The input process {Z (t)}  is WGN of spectral density N 0/2, as interpreted in the text. Thus the

output is a stationary Gaussian process with the spectral density

S Y  (f ) = S z(f )|h(f )|2 = N 0

2

1

2W  rect(

  f 

2W  ).

The covariance function is then given by

KY  (τ ) = N 0

2  sinc(2W τ ).

The covariance matrix for  Y 1  =  Y (0) and  Y 2 =  Y (   14W  ) is then

K = N 0

2   1 2/π

2/π   1 .

Using (7.20), the resulting joint probability density for  Y 1, Y 2   is

f Y 1Y 2(y1y2) =  1

πN 0 

1 − (2/π)2 exp

−y21 + (4/π)y1y2 − y2

2

N 0(1 − (2/π)2)

.

(c)

V   =

  ∞0

e−tZ (t) dt =

  ∞−∞

g(t)Z (t) dt,

where

g(t) =

e−t for t ≥ 0

0 otherwise.

Thus,  V  is a zero-mean Gaussian rv with variance,

E[V 2] = Z 0

2

  ∞−∞

g2(t) dt = Z 0

4  ,

79

Page 80: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 80/110

i.e.   V  ∼ N (0, Z 0/4).

Exercise 7.13:

(a) It is shown in Section 7.7 of the text that all  W k  are iid and N (0, N 0/2).

(b) The expansion k W 

k(t) is simply the sinc expansion of WGN over the band  B .

Exercise 7.14:

(a) Following the hint, let  Z 1  be CN (0, 1) and let  Z 2 = U Z 1  where  U  is independent of  Z 1  andis ±1 with equal probability. Then since {Z 1} and {Z 1} are iid and N (0, 1/2), it follows that{Z 2} = U {Z 1} and {Z 2} = U {Z 1} are also iid and Gaussian and thus CN (0, 1). However{Z 1} and {Z 2} are not jointly Gaussian and {Z 1} and {Z 2} are not jointly Gaussian (seeExercise 7.3 (d)), and thus  Z 1  and  Z 2  are not jointly Gaussian. However, since  Z 1  and  Z 2  arecircularly symmetric,  Z   and  eiθZ  have the same joint distribution for all  θ .

(b) We are given that {Z }  and {Z }  are Gaussian and that for each choice of  φ,   eiφZ   hasthe same distribution as   Z . Thus

 {eiφZ 

}  = cos(φ)

{Z 

} − sin(φ)

{Z 

} must be Gaussian

also. Scaling this,   α cos(φ){Z } − α sin(φ){Z }   is also Gaussian for all   α   and all   φ. Thusα1{Z } + α2{Z } is Gaussian for all  α1 and α2, which implies that {Z } and {Z } are jointlyGaussian. This, combined with the circular symmetry, implies that Z   is circularly symmetricGaussian.

80

Page 81: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 81/110

Principles of Digital CommunicationSolutions to Exercises – Chapter 8

Exercise 8.1:

(a) Conditional on observation  v , the probability that hypothesis  i  is correct is pU |V  (i|v). Thus

for decision  U  = 0, the cost is  C 00   if the decision is correct (an event of probability  pU |V  (0|v))and  C 10  if the decision is incorrect (an event of probability  pU |V  (1|v)). Thus the expected cost

for decision  U  = 0 is

C 00 pU 

|V  (0

|v) + C 10 pU 

|V  (1

|v).

Combining this with the corresponding result for decision 1,

U mincost = arg min j

C 0 j pU |V  (0|v) + C 1 j pU |V  (1|v).

(b) We have

C 01 pU |V  (0|v) + C 11 pU |V  (1|v)  ≥U =0

<U =1

C 00 pU |V  (0|v) + C 10 pU |V  (1|v)

(C 01 − C 00) pU |V  (0|v)  ≥U =0

< ˜U =1

(C 10 − C 11) pU |V  (1|v),

and using

 pU |V  ( j|v) = p jf V |U (v| j)

f V  (v)  , j = 0, 1,

we get the desired threshold test.

(c) The MAP threshold test takes the form

Λ(v) =f V |U (v|0)

f V |U (v|1)≥U =0

<U =1

 p1

 p0,

so only the RHS of the minimum cost and the MAP tests are different. We can interpret the

RHS of the cost detection problem as follows: the relative cost of an error (i.e., the differencein costs between the two hypotheses) given that 1 is correct is given by  C 10 − C 11. This relativecost is then weighted by the a priori probability  p1 for the same reason as in the MAP case. Theimportant thing to observe here, however, is that both tests are threshold tests on the likelihoodratio. Thus the receiver structure in both cases computes the likelihood ratio (or LLR) and thenmakes the decision according to a threhold. The calculation of the likelihood ratio is usually themajor task to be accomplished. Note also that the MAP test is the same as the minimum costtest when the relative cost is the same for each hypothesis.

81

Page 82: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 82/110

Exercise 8.2:  Assume that  a > 0. The following diagram then indicates the observation in theabsence of noise.

 a

a exp(iπ/4)

a exp(−iπ/4)         The three dots on the rightindicate the possible centerpoints for  V   when  U  = 0.

The three dots on the leftindicate the possible centerpoints for  V   when  U  = 1.

The intuitively optimal decision rule with noise is

V 1≥U =0<U =1

0.   (38)

A simple and elegant way to demonstrate its optimality is to note that, given any possible θ, theML decision rule is simply the minimum distance rule between the two possible points (sincethe noise is Gaussian). Thus, for each θ, (38) is the ML rule given  θ,  i.e., the optimal rule if the

receiver observes  θ. The minimum error probability when θ   is an observable can be no greaterthan the minimum when  θ  is not observed (since the detector could always ignore θ). Since thistest does not require  θ, however, it must be optimal whether or not  θ   is observed.

Exercise 8.3:   Conditional on   U , the   V i’s are independent since the   X i’s and   Z i’s are inde-pendent. Under hypothesis  U   = 0,  V 1   and  V 2  are i.i.d.  N (0, 1 +  σ2) (because the sum of twoindependent Gaussian random variables forms a Gaussian random variable whose variance isthe sum of the two individual variances) and  V 3   and  V 4  are i.i.d.  N (0, σ2). Under hypothesisU  = 1,  V 1  and  V 2  are i.i.d. N (0, σ2) and  V 3  and  V 4  are i.i.d. N (0, 1 + σ2).

Note that by symmetry, the probability of error conditioned on  U  = 0 is same as that conditionedon U  = 1. Hence the average probability of error is same as the probability of error conditioned

on  U  = 0.(a), (b) Since the   V i’s are independent of each other under either hypothesis, we havef V |U (v |u) = f V 1|U (v1|u)f V 2|U (v2|u)f V 3|U (v3|u)f V 4|U (v4|u). Thus,

LLR(v ) = ln

exp

−   v21

1+σ2 −  v221+σ2 −

  v23σ2 −

  v24σ2

exp

− v21

σ2 −  v22σ2 −

  v231+σ2 −

  v241+σ2

= (v21 + v2

2 − v23 − v2

4) ·

 1

σ2 −   1

1 + σ2

=

  E a − E bσ2(1 + σ2)

.

This shows that the log-likelihood ratio depends only on the difference between E a and E b. Hence,the pair (E a, E b) is a sufficient statistic for this problem. Actually the difference E a − E b   is alsoa sufficient statistic.

(c) For ML detection, the LLR threshold is 0, i.e. the decision is  U  = 0 if  LLR(v ) is greaterthan or equal to zero and  U  = 1 if  LLR(v ) is less than zero. Thus the ML detection rule reduces

to E a ≥U =0<U =1

E b. The threshold would shift to ln(Pr(U =1)Pr(U =0) ) for MAP detection.

82

Page 83: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 83/110

(d) As described before, we only need to find the error probability conditioned on   U   = 0.Conditioning on  U  = 0 throughout below, the ML detector will make an error if  E a  < E b. Here(as shown in Exercise 7.1), E a  is an exponentially distributed random variable of mean 2 + 2σ2

and E b  is an independent exponential rv of mean 2σ2,

f E a(x) =

  1

2 + 2σ2 exp(   −x

2 + 2σ2 );   f E b(x) =

  1

2σ2 exp( −x

2σ2 ).

Thus, conditional on E a =  x  (as well as  U  = 0), an error is made if  E b > x.

Pr(E b  > x) =

  ∞x

1

2σ2 exp(

−x

2σ2) dx   = exp(

−x

2σ2).

The overall error probability is then

Pr(e) =

  ∞0

f E b(x) exp(−x

2σ2) dx

=   ∞

0

1

2 + 2σ2

 exp(  −x

2 + 2σ2

)exp(−x

2σ2

) dx

=  1

2 + 1/σ2.

We can make a quick sanity check of this answer by checking that it equals 0 for  σ2 = 0 andequals 1/2 for  σ2 = ∞. In the next chapter, it will be seen that this is the probability of errorfor binary signaling in flat Rayleigh fading.

Exercise 8.5:   Expanding y (t) and  b(t) in terms of the orthogonal functions {φk,j (t)},

   y(t)b(t) dt   =  k,j

yk,j ψk,j (t)k,j bk,jψk,j(t))  dt

=k,j

yk,j bk,j

   [ψk,j (t)]2 dt

= 2k,j

yk,j bk,j ,

where we first used the orthogonality of the functions  ψk,j (t) and next the fact that they eachhave energy 2. Dividing both sides by 2, we get (8.36) of the text.

Exercise 8.6:

(a)

Q(x) =  1√ 

  ∞x

e−z2/2 dz

=  1√ 

  ∞0

e−(x+y)2/2 dy   where  y  =  z − x

=  e−x2/2

√ 2π

  ∞0

e−y2/2−xy dy.

83

Page 84: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 84/110

(b) The upper bound, exp(−y2/2) ≤   1 is trivial, since exp v ≤   1 whenever   v ≤   0. For thelower bound, 1 − y2/2 ≤  exp(−y2/2) is equivalent (by taking the logarithm of both sides) toln(1 − y2/2) ≤ −y2/2, which is the familiar log inequality we have used many times.

(c) For the upper bound, use the upper bound of part (b),

Q(x) =  e−x2/2

√ 2π

  ∞0

e−y2/2−xy dy ≤  e−x2/2

√ 2π

  ∞0

e−xy dy =  e−x2/2

x√ 

2π.

For the lower bound, use the lower bound of part (b) and then substitute  z   for xy.

Q(x) =  e−x2/2

√ 2π

  ∞0

e−y2/2−xy dy

≥   e−x2/2

√ 2π

  ∞0

e−xy dy  −   e−x2/2

√ 2π

  ∞0

y2

2 e−xy dy

=  e−x2/2

x√ 

2π−   e−x2/2

√ 2π

 1

x3   ∞

0

z2

2 e−z dz =

  1

x 1 −   1

x2 e−x2/2

√ 2π

.

Thus, 1 −   1

x2

  1

x√ 

2πe−x2/2 ≤ Q(x) ≤   1

x√ 

2πe−x2/2 for  x > 0.   (39)

Exercise 8.7:

(a) We are to show that  Q(γ  + η) ≤ Q(γ ) exp[−ηγ − η2/2] for  γ  ≥ 0 and  η ≥ 0. Using the hint,

Q(γ  + η) =

  1

√ 2π   ∞γ +η exp(−x2

2 ) dx =

  1

√ 2π   ∞γ  exp(−(y + η)2

2   ) dy

=  1√ 

  ∞γ 

exp(−y2

2 − ηy −  η2

2 ) dy

≤   1√ 2π

  ∞γ 

exp(−y2

2 − ηγ −  η 2

2 ) dy

= exp[−ηγ −  η2

2 ]  Q(γ ),

where, in the third step, we used the fact that  y ≥ γ  over the range of the integration.

(b) Setting  γ  = 0 and recognizing that  Q(0) = 1/2,

Q(η) ≤  1

2  exp[−η2

2 ].

This is tighter than the standard upper bound of (8.98) when 0 < η < 

2/π.

(c) Part (a) can be rewritten by adding and subtracting  γ 2/2 inside the exponent. Then

Q(γ  + η) ≤ exp

−(η + γ )2

2  +

 γ 2

2

  Q(γ ).

84

Page 85: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 85/110

Substituting w  for  γ  + η  yields the required result.

Exercise 8.8:

(a) An  M -dimensional orthogonal signal set has  M  signal points and can transmit log2 M   bitsper  M -dimensions. Hence,

ρ = 2log2 M 

M   bits per 2 dimensions.

The energy per symbol is  E  and there are log2 M  bits per symbol. Hence, the energy per bit is,

E b  =  E 

log2 M .

(b) The squared distance between two orthogonal signal points  a  j   and  a k   is given by,

||a  j − a k||2 = a  j − a k,a  j − a k=

a  j,a  j

+

a k,a k

−2

a  j ,a k

= 2E − 2Eδ  jk

=

2E    if  j = k,

0 otherwise

Clearly, each point is equidistant from every other point and this distance is√ 

2E . Hence,

d2min(A) = 2E.

Also every signal point has  M  − 1 nearest neighbors.

(c) The ML rule chooses the  a i   that minimizes y − a i2. This is equivalent to choosing the  a ithat maximizes

 y ,a i

. This can be easily derived from the fact that

 y 

 −a i

2

≥ y 

 −a  j

2

⇔y ,a i ≤ y ,a  j. The ML rule is to project   y   on each of the signals and choose one withthe largest projection. In a coordinate system where each signal waveform is collinear with acoordinate, this simply means choosing the hypothesis with the largest received coordinate.

Exercise 8.9:

(a) Given that  a m   is transmitted, an error occurs if the event a  j −Y   < a m −Y   occursfor any  j =  m, 1 ≤ j ≤  M ,   i.e., an error occurs if the union of these events occurs. We ignorethe case of equality in these distance inequalitites since it has 0 probability. Let  A j,m   denotethe event a  j −Y  < a m −Y . Thus, using the union bound,

Pr(e|U =a m) = Pr j=m

A j,m|U =a m ≤  j=m

Pr(A j,m|U =a m).

Next, recall that for Gaussian noise with variance  N 0/2 per dimension, the pairwise error prob-ability is given by

Pr(A j,m|U =a m) = Q

 

a  j − a m2

2N 0

 =  Q

  E 

N 0

,

85

Page 86: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 86/110

Page 87: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 87/110

w0 =  γ 1. Using part (b), we then have

Pr

M −1m=1

(W m ≥ w0|A =  a 0)

  ≥   y(w0) − y2(w0)/2

≥   y(w0)/2 for   w0 > γ 1

1/2 for   w0 =  γ 1

The upper part of the second inequality is valid because, if  w0   > γ 1, then   y(w0) is less than1 and  y2 < y. The lower part of the second inequality follows because y(γ 1) = 1. The lowerbound of 1/2 is also valid for all  w0  < γ 1  because the probability on the left of the equation isincreasing with decreasing  w0.

Note that these lower bounds differ by a factor of 2 from the corresponding union bound (and thefact that probabilities are at most 1). This might seem very loose, but since y(w0) is exponentialin  w0  over the range of interest, a factor of 2 is not very significant.

(d) Using part (c) in part (a),

Pr(e) =  ∞−∞

f W 0|A(w0|a 0) PrM −1

m=1

(W m ≥ w0|A =  a 0)

 dw0

≥   γ 1

−∞

f W 0|A(w0|a 0)

2  dw0 +

  ∞γ 1

f W 0|A(w0|a 0)(M  − 1)Q(w0)

2  dw0   (41)

≥   1

2Q(α − γ 1),   (42)

where, in the last step, the first integral in (41) is evaluated using the fact that  W 0, conditionalon  A =  a 0, is N (α, 1). The second integral in (41) has been lower bounded by 0.

(e) From the upper and lower bounds to   Q(x) in Exercise 8.6, we see that   Q(x)   ≈1

√ 2πx exp(−x2

/2) for large  x. The coefficient 1/√ 2πx  here is unimportant in the sense that

limx→∞

ln[Q(x)]

−x2/2  = 1.

This can be verified in greater detail by taking the log of each term in (39). Next, substitutingγ 1   for  x  and noting that limM →∞ γ 1 = ∞, this becomes

limM →∞

ln(M  − 1)

γ 21 /2

 = 1.

Note that the limit is not affected by replacing  M  − 1 by  M . Taking the square root of the

resulting term in brackets, we get limM →∞ γ 1/γ  = 1.Associating  γ  with γ 1, the upper bound to error probability in (8.57) of the text is substantiallythe same as the lower bound in part (d) (again in the sense of ignoring the coefficient in thebound to the  Q   function). The result that Pr(e) ≥ 1/4 for  γ 1  = α   is immediate from (42), andthe result for  γ 1 > α  follows from the monotonicity of  Q.

(f) The problem statement was not very precise here. What was intended was to note that thelower bound here is close to the upper bound in (8.57), but not close to the upper bound in(8.59). The intent was to strengthen the lower bound in the case corresponding to (8.59), which

87

Page 88: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 88/110

Page 89: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 89/110

(c) We first express v ,a  (the inner product in  Cn), in terms of real and imaginary parts.

v ,a    = [v , a  + v , a ] +   i[v , a − v , a ]

=   D(v ), D(a )  +   iD(v ), D(ia ).   (43)

Using this,

v |a  = v ,a a 

a 2  =

 D(v ), D(a )a   + D(v ), D(ia )ia 

a 2  .

Since the inner products above are real and since a  = D(a , this can be converted to  R2n as

D(v |a ) = D(v ), D(a )D(a ) + D(v ), D(ia )D(ia )

D(a )2  .   (44)

This is the projection of  D(v ) onto the space spanned by  D (a ) and  D(ia ). It would be con-structive for the reader to trace through what this means in the special (but typical) case wherethe components of  a  are all real.

(d) Note from (43) that v ,a  = D(v ), D(a ). Thus

D

[v ,a ]a 

a 2

 =

 D(v ), D(a )D(a )

D(a )2  .

From (44), this is the projection of  D(v |a ) onto  D (a ).

Exercise 8.13:

(a) Denote the complex noise by  Z   and assume4 that  Z  ∼ CN (0, N 0). The noise in the realdimension is then independent of that in the imaginary dimension. The sketch indicates the 4points of 4-QAM and the corresponding Voronoi regions for ML detection. An error occurs if noise causes the received point to lie in a different region than the transmitted point. Note that

due to symmetry, the probability of error given that any particular signal is sent will be sameas that for any other signal, so we assume that a + ia  is sent.

 

a + ia

a − ia

−a + ia

−a − ia

   3   4 2   1

ML decoding will choose the signal closest to the received complex value, i.e., if a point in region

1 is received,  a + ia  is decoded, if a point in region 2 is received, −a + ia   is decoded, etc. Thuswhen  a + ia   is sent, an error occurs if the noise in the real direction is more negative than  −aor if the noise in the imaginary direction is more negative than  −a. Each of these events hasprobability Q(a/

 N 0/2) which we denote as  P 1. The probability of error is the union of these

events. Since these events are independent,

Pr(e) = 2P 1 − P 21 .

4This assumption was intended in the problem statement, but not made explicit.

89

Page 90: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 90/110

That is, conditional on sending  a + ia, Pr(e) is the probability  P 1  of crossing the vertical axis(thus landing in regions 2 or 3) plus the probability P 1 of crossing the horizontal axis (landing inregions 3 or 4) less the probability of crossing both axes (landing in region 3), which was doublecounted before.

(b) Viewing this as 2 PAM systems, the probability of error on the real part is  P 1 =  Q(a/ N 0/2)

and the probability of error on the imaginary part is  P 1(c) In part (b), we looked at the error in each 2-PAM system, whereas in part (a), an error in4-QAM error happens when either of the two parallel PAM systems are in error.

(d) A QAM error occurs when either PAM system above makes an error,   i.e., as the union of the two independent error events. This is exactly what we calculated in part (a).

Exercise 8.14:   This should be done after doing Exercise 8.13, since it concerns two ways of mapping from 2 binary digits to 4-QAM.

 

00

11

01

10

  Mapping 1

 

00

10

01

11

  Mapping 2

First consider Mapping 2. Here the first binary digit is the imaginary component of the signaland the second binary digit is the real component. Thus, using the parallel PAM interpretationof Exercise 8.13, a PAM error on the first bit occurs when the imaginary component of thenoise (in the proper direction) exceeds 1 and an error on the second bit occurs when the realcomponent exceeds 1. Thus the error probability for each bit is  P 1 =  Q(

 2/N 0).

Mapping 1 is more complicated and also rather foolish, since whenever the threhsold is crossed in

the imaginary direction, it typically causes two bit errors rather than one. To be more precise,note that an error occurs on bit 1 if and only if the noise in the real direction crosses thethreshold. Thus P 1  is the error probability for the first of of two bits. However, an error is madeon the second of the two bits whenever the noise in either the real or imaginary direction (butnot both) crosses the threshold. Thus the probability of error for the second bit is 2P 1 − 2P 21 .

The point of the problem is that when bit error is important (as opposed to block error or QAMerror), then the mapping can be important. This is not a big deal since bit error probability initself is not usually the important issue, but it can often cause confusion.

Exercise 8.15:  Theorem 8.4.1, generalized to MAP detection is as follows:

Let   U (t) = nk=1 U k p(t − kT ) be a QAM (or PAM) baseband input to a WGN channel and

assume that { p(t−kT ; 1 ≤ k ≤ n} is an orthonormal sequence. Assume that  U   = (U 1, . . . , U  n)T

is an  n-vector of iid random symbols, each with the pmf  p(0), . . . , p(M  − 1). Then the  M n-aryMAP decision on U   = (U 1, . . . , U  n)T is equivalent to making separate  M -ary MAP decisions oneach  U k, 1 ≤ k ≤ n, where the decision on each  U k  can be based either on the observation  v(t)or the observation of  vk.

To see why this is true, view the detection as considering all binary MAP detections betweeneach pair  u ,u    of  M n-ary possible inputs and choose the MAP decision over all. From (8.42)

90

Page 91: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 91/110

and (8.43) of the text,

LLRu ,u (v ) =n

k=1

−(vk − uk)2 + (vk + uk)2

N 0.

The MAP test is to compare this with the MAP threshold,

ln

PrU (u 

)PrU (u )

 =

nk=1

ln p(uk)

 p(uk).

It can be seen that this comparison is equivalent to  n  single letter comparisons. The sequenceu  that satisfies these tests for all  u   is the sequence for which each single letter test is satisfied.

Exercise 8.16:

(a) Let the energy of the signals  s0(t) and  s1(t) be  E s0   and  E s1   respectively.

E s0  =    T 

0 (s0(t))2

dt

=

   T 

0

2E 

1 + cos(4πf 0t)

2  dt

= E  + E 

   T 

0cos(4πf 0t)  dt

= E 

1 +

 sin(4πf 0T )

4πf 0T 

≈ E,

where we have used  f 0T   1. Similarly,  E s1 ≈ E .

(b) The condition   s0(0) =   s0(T ) requires that   s0(t) has an integer number of cycles, so thatf 0   =   n/T   for some nonnegative integer   n. Similarly,   s1(t) must have an integer number of cycles, so  f 1   =  m/T   for some nonnegative integer  m. Since  s0(0) =  s1(0) =

 E/T , we then

have  s0(0) = s0(T ) = s1(0) = s1(T ).

To test for orthogonality under the above circumstances,   T 

0s0(t)s1(t)  dt   =

   T 

0

2E 

T   cos(

2πnt

T   )cos(

2πmt

T   )  dt

=    T 

0

T  cos(2π(n+m)t

T   ) + cos(

2π(n−m)t

T   )   dt

= 0 for   n = m.

Thus all the conditions are satisfied if  n  and m  are unequal and nonnegative integers.

When such a system is used for communication, using  s0(t) for 0 and  s1(t) for 1, and sendingsuccesive binary digits in successive intervals of duration  T , the waveform made up of successivesignals has no discontinuities and no discontinuities in slope. When, in addition,  m  =  n + 1,the frequency band used by the communication is little more than 1/T . The system (with

91

Page 92: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 92/110

the added condition   m   =   n + 1) is called minimum shift keying because this is the smallestshift in frequency between successive truncated sinusoids that maintains the above smoothnessconditions.

(c) We first convert the waveform problem to a vector problem. The simplest orthonormalbasis starts with the normalized versions of the two signal waveforms,  φ0(t) =  1/E s0(t) and

φ1(t) =  1/E s1(t). The rest of the orthonormal expansion is of no concern, since the noise inthose dimensions is independent of that in  φ0(t) and φ1(t) and is also independent of the signals.

Thus we have the 2-dimensional vector Gaussian noise problem where the signal constellationis binary, with  a 0  = (

√ E, 0) and  a 1  = (0,

√ E ). The noise (in the two relevant dimensions) is

Z   = (Z 0, Z 1) where Z 0, Z 1  are iid and N (0, 

N 0/2). The output  Y , conditional on  U  = a 0, isY   = (Y 0, Y 1) = (

√ E  + Z 0, Z 1) and  Y   conditional on  U  = a 1   is  Y   = (Y 0, Y 1) = (Z 0,

√ E  + Z 1).

Since the signals are equiprobable, the minimum error probability detection is ML, which isminimum distance (see diagram below).

 a 1

a 0

U  = a 0U  = a 1

    

    

 The decision can be characterized as choose  U  = a 0   if  y0 ≥ y1  and choose  U  = a 1  otherwise.

A block diagram of a detector that does this is given below.

Received waveformY (t)

 X

 X

 

 

φ0(t)

φ1(t)

 T 0

 T 0

      

      

compare    U 

(d) The distance between the signal points is√ 

2E , so the noise in the direction between thesignals must exceed

 E/2 to cause an error. Thus Pr(e) = Q(

 E/N 0).

Exercise 8.17:

(a) For  U   =  a,  V k   =  agk +  Z k, so  Y   =  a

k gkq k +  σ2

q 2k   =  ag , q  + σ2q 2. Thus, givenU   =  a,  Y  ∼ N (ag ,q , σ2q 2). Similarly, given  U   = −a,  Y  ∼ N (−ag ,q , σ2q 2). This is just a 1-dimensional Gaussian detection problem with antipodal signals. Thus, the ML detectordecides  U  = a   if  y > 0 and  U  = −a otherwise.

(b) and (c) Using the analysis in Section 8.3.1 of the text, the probability of error is given by

Pr(e) = Q

ag , q ||q ||σ

 =  Q

aβ ||g ||

σ

  where  β  =

  g , q g q  .

(d) Scaling of  q  will not change the error probability since we are scaling the signal and noiseby the same proportion. Algebraically, scaling  q  will not change  β .

(e) The minimum error probability occurs when   β   is maximized over   q . By the Schwarz in-equality, |β | ≤  1 with equality if  q   =  cg   for any   c >   0. Thus the minimum error probability

92

Page 93: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 93/110

occurs when  β  = 1. This is obtained by choosing q   =  cg   for any  c >  0 and is called maximalratio combining.

(f) No. Mathematically, this is the problem of antipodal vectors in WGN analyzed in Section8.3.3 of the text, and as shown there, the ML detector is a threshold test on a linear combinationof the received vector components. The exercise here both shows that the same analysis applies

to multiple antennas and illustrates another approach to solving the problem.(g) and (h) If we first divide   V k   by   σk   so that the noise variances are equal to 1, and thenmultiply by q k, where q k  = q kσk  we have an equivalent system

V k  = U gk +  Z k,

where  V k   =   V kσk

, gk   =   gkσk

,  Z k   =   Z kσk

∼ N (0, 1), and q k   =  q kσk. Using the probability of errorexpression from part (b), with  q ,  g , and  σ  replaced by q ,  g , and 1 respectively, we get

Pr(e) = Q

ag , q ||q ||

 =  Q (aβ ||g ||) .

Choosing  β  = 1 as before, Pr(e) is minimized over choices of  q  by β  = 1 and Pr(e) = Q(ag ).

Exercise 8.18:

(a) The two codewords 00 and 01 are mapped into the signals (a, a) and (a, -a). In  R2, thisappears as

 (a, a)

(a, −a)

    

    

 

Thus the first bit contributes nothing to the distance between the two signals, but achieves

orthogonality using only ±a  as signal values. As mentioned in Section 8.6.1 of the text, this isa trivial example of binary orthogonal codewords (binary sequences that differ from each otherin half their positions).

(b) Any row  u  in the first half of  H b+1 can be represented as (u 1,u 1) where u 1 ∈ H b  is repeatedas the first and second 2b entries of  u . Similarly any other row  u   in the first half of  H b+1   canbe represented as (u 1,u 1). The mod 2 sum of these two rows is thus

u  ⊕ u   = (u 1,u 1) ⊕ (u 1,u 1) = (u 1 ⊕ u 1,u 1 ⊕ u 1).

Since the mod 2 sum of any two rows of  H b  is another row of  H b,  u 1 ⊕ u 1 =  u 1  is a row of  H b.Thus  u  ⊕ u   = (u 1,u 1) is a row in the first half of  H b+1.

Any row in the second half of  H b+1   can be represented as (u 1,u 1⊕ →1 ) where →1  is a vector of 2b ones and  u1  is a row of  H b. Letting  u   be another vector in the second half of  H b+1  with thesame form of representation,

u  ⊕ u   = (u 1,u 1⊕→1 ) ⊕ (u 1,u 1⊕

→1 ) = (u 1 ⊕ u 1,u 1 ⊕ u 1),

where we have used the fact that→1 ⊕  →

1   is the zero vector. Thus  u  ⊕ u    is a row in the firsthalf of  H b+1.

93

Page 94: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 94/110

Finally if  u  = (u 1,u 1) and  u  = (u 1,u 1⊕→1 ), then

u  ⊕ u   = (u 1,u 1) ⊕ (u 1,u 1⊕→1 ) = (u 1 ⊕ u 1,u 1 ⊕ u 1⊕

→1 )),

so that  u  ⊕ u   is a row in the second half of  H b+1.

Since   H 1  clearly has the property that each mod 2 sum of rows is another row, it follows byinduction that H b  has this same property for all  b ≥ 1.

Exercise 8.19:  As given in the hint, r

 j=0

m j

 is the number of binary  m-tuples with at most

r  ones. In this formula, as in most other formulas using factorials, 0! must be defined to be 1,and thus

m0

 = 1. Each  m-tuple with at most  r  ones that ends in a one has at most  r − 1 ones

in the first  m − 1 positions. There are r−1

 j=0

m−1 j

 such  m − 1 tuples with at most  r − 1 ones,

so this is the number of binary m-tuples ending in one. The number ending in 0 is similarly thenumber of binary  m − 1 tuples containing r  or fewer ones. Thus

r

 j=0

m

 j  =r−1

 j=0

m−1

 j +r

 j=0

m−1

 j .   (45)

(b) Starting with   m   = 1, the code RM(0,1) consists of two codewords, 00 and 11, so   k(0, 1)(which is the base 2 log of the number of codewords) is 1. Similarly, RM(1,1) consists of fourcodewords so  k(1, 1) = 2. Since

10

 = 1 and

11

 = 1, the formula

k(r, m) =

r j=0

m

 j

  (46)

is satisfied for  m  = 1, forming the basis for the induction. Next, for any m ≥  2, assume that(46) is satisfied for  m  =  m

−1 and each  r, 0

 ≤ r

 ≤ m. For 0 < r < m, each codeword  x   has

the form  x   = (u ,u  ⊕ v ) where  u  ∈  RM(r, m) and  v  ∈  RM(r − 1, m). Since each choice of  u and  v   leads to a distinct codeword  x , the number of codewords in RM(r, m) is the product of the number in RM(r, m) and that in RM(r − 1, m). Taking logs to get the number of binaryinformation digits, RM(r, m) = RM(r, m − 1) + RM(r − 1, m − 1). Using (45),  k(r, m) satisfies(46). Finally,  k(0, m) = 1 and  k(m, m) = m, also satisfying (46).

Exercise 8.20:

(a) First note that RM(0,1) has two rows, 00 and 11, whereas RM(11) has four rows, 00, 01, 10,11. Thus RM(0, 1) ⊂ RM(1, 1), setting up the basis for the induction.

For any m ≥ 2, let m  =  m−1 and assume that RM(r−1, m) ⊂ RM(r, m) for all r, 0 < r ≤ m.

Any   x  ∈   RM(r − 1, m) for 0   < r − 1   < m  has the representation   x   = (u ,u  ⊕  v ) for someu  ∈ RM(r − 1, m) and  v  ∈ RM(r − 2, m). By assumption,  u  and  v  also satisfy  u  ∈ RM(r, m)and  v  ∈ RM(r − 1, m). It then follows that  x  ∈ RM(r, m).

The above argument applies for 2 ≤  r ≤  m   but does not cover either   r  = 1 or   r   =   m. Forr  = 1, note that RM(0, m) consists of the all zero row and the all one row. This is included inRM(1, m) by choosing  v  to be all zeros. For  r  =  m, note that RM(m, m) contains all binary 2m

tuples, and thus contains all codes of length 2m. The code RM(m − 1, m) does not contain allbinary 2m tuples, so the containment is strict.

94

Page 95: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 95/110

(b) Following the hints, first consider  x   = (u ,u ) where  u  = 0 and  u  ∈  RM(r, m − 1). Thenx  has twice the weight of  u , so its weight is at least 2m−r. Next, suppose  x   = (0, v ) wherev  ∈ RM(r − 1, m − 1) and  v  = 0. Then  x  has the same weight as  v , which is at least 2m−r.

Next consider  x  = (u ,u  ⊕ v ) with  u  = 0 and  u  = v . In this case, the last half of  x   is all zero,but the first half has weight at least 2m−r because  u  = v  ∈ RM(r − 1, m − 1). Finally, consider

x   = (u ,u ⊕v ) where 0 = u  = v  = 0. From part (a),  v  ∈ RM(r, m−1), so u ⊕v  ∈ RM(r, m−1).This means that the first half and the second half of  x   each contain at least 2m−r−1 ones, sothat  x  contains at least 2m−r ones.

(c) For  m  = 1,  dmin(r, m) = 2m−r for both  r  = 0 and  r   = 1. Part (b) provides the inductivestep for 0 < r < m  and we have already seen that  dmin(0, m) = 2m and  dmin(m, m) = 20 = 1.

95

Page 96: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 96/110

Page 97: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 97/110

Page 98: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 98/110

For sufficiently large  r,   b/r   c/f ). For example, if  hs  = 10m,  hr  = 1m, and  r  = 1km, thenb/r = 0.02, which is much smaller than the wavelength  c/f   at  f  = 1gH. With this assumption,the second exponent above is close to zero and can be approximated by its first order expansion,

E r(f, t) ≈ αe2πi[f t−f r1/c]

 1

r1−   1

r2(1 − 2πifb/rc) .

Note that 1/r1 − 1/r2   is approximately  b/r3, so it can be approximated by zero. Thus

E r(f, t) ≈

αe2πi[f t−f r1/c] 2πifb

r2rc

≈ −2παfb

cr2  sin[2π(f t − f r1/c)],   (48)

since  r2 ≈ r   for r  1. Thus,  E r ≈ β/r2 where β  = −(2πα/c)sin[2π(f t − f r1/c)].

The main point here is that the difference between r1 and r2 gets so small with increasing r  thatthere is no Doppler spread, but simply a cancellation of path responses which causes the 1/r2

decay with r. Viewed another way, the ground plane is gradually absorbing the radiated power.The student should now go back and look at the various approximations, observing that termsin 1/r3 were set to zero, and terms in 1/r2 were kept.

Exercise 9.4:   When the channel is modeled by the multipath model of (9.14),   h(f, t) =J  j=1 β  j(t)e−2πf τ j(t) and the attenuation on each path is permitted to be time-varying. The

time-varying impulse response is then

h(τ, t) =

J  j=1

β  j(t)δ (τ  − τ  j (t)).

Note that if the attenuation  β  j (t) were also a function of  f , then this could not be done, andthe inverse transform would depend on the detailed form of  β (f, t). Since it is only a functionof   t, and   t   is fixed in the above inverse transform, the dependence on   t  makes no difference.Substituting the above equation into the convolution equation for LTV filters,

y(t) = ∞−∞ x(t − τ )h(τ, t)dτ 

= J 

 j=1 β  j (t)x(t − τ  j(t)).

Exercise 9.5:

(a) Since there is only one path and it has Doppler shift  D1, the Doppler Spread D   is 0 and∆ = D1.

ˆh(f, t) = e

2πi

D1t

.ψ(f, t) = e−2πit∆h(f, t) = 1.

The envelope of the output when the input is cos(2πf t) is

|h(f, t)| = |ψ(f, t)| = 1.

Note that in this case, there is Doppler shift but no fading, and the envelope, which is one,captures this lack of fading.

98

Page 99: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 99/110

(b) Hereh(f, t) = e2πiD1t + 1

and there are two paths: one with Doppler shift D1  and the other with zero Doppler shift. ThusD = D1  and ∆ = D1/2.

ψ(f, t) =   e−2πit∆

h(f, t)=   e−2πitD1/2(e2πiD1t + 1)

=   eπiD1t + e−πiD1t

= 2 c os(πD1t).

The envelope of the output when the input is cos(2πf t) is

|h(f, t)| = |ψ(f, t)| = 2| cos(πD1t)|.

Note that the fading here occurs at a frequency  D1/2 Note that this is the same as if there weretwo paths with Doppler shifts  D1/2 and −D1/2. In other words, the fading frequency depends

on the Doppler spread rather than the individual shifts.

Exercise 9.6:

(a) The envelope of  [yf (t)] is defined to be |h(f, t)|. Then

|yf (t)| = |e2πifth(f, t)| = |e2πift| · |h(f, t)| = |h(f, t)|.

This is true no matter what  f   is, but the definition of  |h(f, t)|  as an envelope only correspondsto our intuitive idea of envelope when  f   is large.

(b)

([yf (t)])2 =   |h(f, t)|2 cos2(2πf t +∠h(f, t))

=  1

2|h(f, t)|2(1 + cos(4πf t + 2∠h(f, t))).

The result of lowpass filtering this power waveform is

1

2|h(f, t)|2.

With the assumption of large  f , the angle of  h(f, t) is approximately constant over a cycle, sothe short-time time-average of the power is   1

2 |h(f, t)|2. Thus the output of lowpass filtering thepower waveform can be interpreted as the short-time time-average of the power.

The square root of this time-average is

|h(f, t)|√ 2

,

which is just a scaled version of the envelope of  [yf (t)].

Thus, over short time scales, we can find the envelope of  [yf (t)] by squaring [yf (t)], lowpassfiltering, taking the square root, and multiplying by

√ 2.

99

Page 100: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 100/110

Page 101: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 101/110

In the Central Limit Theorem (CLT) one adds  N  iid rvs Y 1, . . . , Y  n  of given mean and variance,divides by

√ N   to normalize, and then goes to the limit  N  → ∞. Here we are adding  N   iid

rvs (X n  = θnφn) but normalizing by changing the distribution of each  X n   as  N   changes. Thismight seem like a small difference, but in the CLT case, a typical normalized sample sequencey1/

√ N , . . . , yN /

√ N   is a sequence of many small numbers; here a sample sequence  x1, . . . , xN 

is a sequence of numbers, almost all of which are 0, with a small subset, not growing with  N , of ±1’s.

(c) As mentioned above,  G0,0   is the sum of a small number of  ±1’s, so it will have an integerdistribution where the integer will be small with high probability. To work this out analytically,

let   G0,0   =   V (1)

N    + V (−1)

N    where   V (1)

N    is the number of values of   n, 1 ≤   n ≤   N , for which of 

θnφn  = 1. Similarly,  V (−1)

N    is the number of values of  n   for which  θnφn  = −1. Since θnφn   is 1with probability 1/N , V (1) has the binomial pmf,

Pr(V (1)

N    = k) =  N !

k!(N  − k)!(1/N )k(1 − 1/N )N −k.

Note that N !/(N 

 −k)! = N (N 

 −1)

· · ·(N 

 −k + 1). Thus, as  N 

 → ∞ for any fixed  k ,

limN →∞

N !

(N  − k)! N k → 1.

Also (1 − 1/N )N −k = exp[(N  − k)ln(1 − 1/N ). Thus for any fixed  k ,

limN →∞

(1 − 1/N )N −k = e−1.

Putting these relations together,

limN →∞

Pr(V (1)

N    = k) = Pr(V (1) = k) = e−1

k!  .

In other words, the limiting rv,   V (1), is a Poisson rv with mean 1. By the same argument,V (−1), the limiting number of occurrences of  θnφn  = −1, is a Poisson rv of mean 1. The limitingrvs   V (0) and   V (−1) are independent.5 Finally, the limiting rv   G0,0   is equal to   V (1) − V (−1).Convolving the pmf’s,

Pr(G0,0 =  ) =∞

k=0

e−1

(k+l)! · e

−1

k!  for  ≥ 0.

This goes to 0 rapidly with increasing  . The pdf is symmetric around 0, so the same formulaapplies for  G0,0  = −.

Exercise 9.10:

(a) Note that the function  g(τ, t) defined in the problem is the same as gW (τ, t) defined in (9.42)of the text. That is, it is the baseband time-varying impulse response  g(τ, t) (given as a sum of impulses in the simplified multipath model) which is then low pass filtered to W /2. Hence we

5This is mathematically subtle since  V  (1)N    and  V  (−1)N    are dependent. A cleaner mathematical approach wouldbe to find the pmf of the number of values of  n  that are either  ±1 and then to find the conditional number thatare +1 and  −1. The answer is the same.

101

Page 102: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 102/110

use the notation  gW (τ, t) in the rest of the problem. Starting with the definition of the outputv(t),

v(t) =

   W/2

−W/2u(f )h(f  + f c, t)e2πi(f −∆)t df 

=    W/2

−W/2u(f )g(f, t)e2πift df 

=

   W/2

−W/2u(f )

   gW (τ, t)e−2πifτ  dτ 

e2πift df 

=

   gW (τ, t)

   W/2

−W/2u(f )e2πif (t−τ ) df 

 dτ 

=

   gW (τ, t)u(t − τ ) dτ.

We used the same argument here as in the bandpass time-varying impulse response in the text.

The argument is valid for any W /2 greater than or equal to the baseband input bandwidth.(b) This is derived in (9.43) of the text, and is  gk,m  =  T gW (kT,mT ) with  T  = 1/W . That is,

gk,m  =  1

W  gW ( k

W , m

W ).

Note that   gk,m   is implicitly a function of the bandwidth W   used to define the discrete-timemodel. What this equation says is that the taps in the tapped delay line model are the samples(scaled by 1/W ) of the analog time-varying impulse response, filtered to bandwidth W /2.

As a simple example, if  g(τ, t) =  δ (τ ) (i.e., there is only one path, and it is not fading), thengW (τ, t) = W sinc(W τ ). As W   increases, the sinc function increases in amplitude and shrinksin width. The tapped model is independent of 

 W  and satisfies  gk,m  = 1 for  k  = 0, m  = 0 and

gk,m  = 0 otherwise.

(c) Using the result in (b) for the random tap gains,   Gk,m   = (1/W )GW (k/W , m/W ), we seethat

R( k

W ,  n

W ) =  1

W E

GW ( k

W , 0)G∗W (

 k

W ,  n

W )

=   W E [GW (k, 0)G∗W (k, m)]

=   W R(k, n).

(d) We can express  R(τ, 0)dτ   as

  R(τ, 0) dτ  =   1W    E[|GW (τ, 0)|2] dτ.

The second integral is the expected energy in the channel response as filtered to  W /2. Tounderstand the scaling, suppose the fading is flat over the bandwidths W   of interest,   i.e., thatG(τ, 0) is an impulse at 0 times a random constant of second moment  σ2. Then GW (τ, 0) is asinc function of random amplitude and

  E [Gw(τ, o)|2dτ  = σ2W . Thus

  R(τ, 0) = σ2, which isthe mean square amplification of the input. This is independent of  W . Perhaps the best lessonhere is that it is easier to work in the discrete tap domain.

102

Page 103: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 103/110

Exercise 9.11:

(a) From equation (9.59) of the text,

Pr[e|(|G| = g)] = 1

2 exp

−   a2g2

2W N 0

.

Since |G| has the Rayleigh density

f |G|(g) = 2ge−g2,   (49)

the probability of error is given by

Pr(e) =

  ∞0

Pr[e|(|G| = g)] f |G|(g) dg

=

  ∞0

g exp

−g2

1 +

  a2

2W N 0

 dg

= 2 +

  a2

W N 0−1

=

  1

2 + E b/N 0

which is the same result as that derived in equation (9.56).

(b) We want to find   E[(a/|G|)2]. Since (a/|G|)2 goes to infinity quadratically as |G| →  0 andthe probability density of  |G| goes to 0 linearly as |G| → 0, we expect that this expected energymight be infinite. To show this,

E[(a/|G|)2] =

  ∞0

a2

g2f |G|g) dg =

  ∞0

a2

g2 2ge−g2 dg

≥ 

  1

0

2a2

g  e−1 dg   = ∞.

Exercise 9.12:

(a) The complex Gaussian rvs  Z 0,Z 1 are independent, which means that the real and imaginaryparts of Z 0 are independent of the real and imaginary parts of Z 1. Since Z 0 and Z 1 are circularlysymmetric, the real and imaginary parts of each separately are independent, so  Z 0,re, Z 0,im, Z 1,re,and Z 1,im are all independent and thus jointly Gaussian. They can also be seen to be identicallydistributed.

It can be shown by tedious calculation with the above rvs that   Z 0   and   Z 1   are also iid andcircularly symmetric, but it is more insightful to use the fact (see Section 7.8.1 of the text) that

(Z 0, Z 1)T

is a linear transformation (over  C) of (Z 0,Z 1)T

, and thus is itself jointly circularly-symmetric   6 Gaussian vector and thus specified by its (complex) covariance function. We seethat  E[Z 02] = N 0,  E[Z 0Z 1] = 0, and  E[Z 12] = N 0. Thus  Z 0  and  Z 1  are independent andhave the joint density given by (7.74).

6A complex Gaussian rv  Z  is circularly symmetric if  eiφZ  has the same distribution as  Z   for all phases  φ. It isnot enough for each component of  Z   to be circularly symmetric. For example, if  Z 1  ∼ CN (0, 1),(Z 2) = (Z 1),and  (Z 2) is iid with  (Z 1),(Z 1), then  Z 1  and  Z 2  are jointly Gaussian and each are circularly symmetric, butthey are not jointly circularly symmetric.

103

Page 104: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 104/110

(b) Under U  = (a, a),  V 0 =  aG + Z 0  and  V 1 =  aG + Z 1  Hence,

V 0  = V 0 + V 1√ 

2=

√ 2aG +

 Z 0 + Z 1√ 2

=√ 

2aG + Z 0

V 1  = V 1 − V 0√ 

2

= Z 1 − Z 0√ 

2

= Z 1

Under U  = (a, −a),  V 0  =  aG + Z 0  and  V 1 = −aG + Z 1  Hence,

V 0  = V 0 + V 1√ 

2=

 Z 0 + Z 1√ 2

= Z 0

V 1  = V 1 − V 0√ 

2= −√ 

2aG + Z 1 − Z 0√ 

2= −√ 

2aG + Z 1

At this point, the problem has been transformed to the flat fading problem treated in Section9.6.1. There are two minor variations. First the  a  in the text corresponds to

√ 2a  here and the

noise variance is scaled differently,   N 0/W   there and  N 0   here. Second, the signal term√ 

2aGappears here with opposite signs in the two hypotheses. This makes no difference since G   iscircularly symmetric and appears only once in each hypothesis (the skeptical student shouldlook at the LLR in part (c) to verify this).

(c) The log likelihood ratio and decision rule are the same as in section 9.6.1 where a  is replacedby

√ 2a,  v0  is replaced by  v 0  and  v1  is replaced by  v 1. The log likelihood ratio is

LLR(v0, v1) =  [|v0|2 − |v1|2]a2

(a2 + W N 0)(W N 0)

and the decision rule is to decode  H  = 0 if  |v0|2 ≥ |v1|2 and  H  = 1 otherwise.

(d) The original Pr(e) in (9.56) of the text is given by,

Pr{e} =

2 + E bN 0

−1.

This is also valid here, taking  E b  as 2a2. It is not surprising that the answers are the same, sincefrom a fundamental signal space view, the problems are the same, involving orthogonal (in  C)signals in white noise.

(e) The pair (V 0, V 1) is clearly a function of (V 0, V 1 ) since,V 0V 1

 =

  1√ 2

1   −11 1

V 0V 1

and the transformation matrix is invertible. This is important because it shows that ( V 0, V 1) isa sufficient statistic for optimal detection.

The approach in this exercise is more general than it appears. Any two orthogonal signalhypotheses can be transformed by such a transformation to a basis using one signal as one basisvector and the other signal as the other basis vector. It is required, in reducing the problem tothe flat fading case here that the fading is actually the same over all degrees of freedom.

104

Page 105: Principles of Digital Communication_Solutions

8/18/2019 Principles of Digital Communication_Solutions

http://slidepdf.com/reader/full/principles-of-digital-communicationsolutions 105/110

Exercise 9.13:

(a) Since X 0, X 1, X 2, X 3  are independent and the first two have densities  αe−αx and the secondtwo have densities  β e−βx under u 0,

f X |U (x  | u 0) =   α2β 2 exp[−α(x0 + x1) − β (x2 + x3)]

f X |U (x  | u 1) =   α2β 2 exp[−β (x0 + x1) − α(x2 + x3)].

(b) Taking the log of the ratio,

LLR(x ) = (β − α)(x0 + x1) − (β − α)(x2 + x3)

= (β − α)(x0+x1−x2−x3).

(c) Convolving the density for  X 0   (conditional on  u 0) with the conditional density for  X 1,

f Y 0|U (y0 | u 0) = α2y0e−αy0 .

Similarly,

f Y 1|U (y1 | u 0) = β 2y1e−βy1.

(d) Given  U   =  u 0, an error occurs if  Y 1 ≥  Y 0. This is somewhat tedious, but probably thesimplest approach is to first find Pr(e) conditional on  u 0 and  Y 0 = y0  and then multiply by theconditional density of  y0  and integrate the answer over  y0.

Pr(e | U   = u 0, Y 0 =  y0) =

  ∞y0

β 2y1e−βy1 dy1 = (1 + βy0)e−βy0.

Performing the final tedious integration,

Pr(e) = Pr(e | U   = u 0) = α3 + 3α2β 

(α + β )3  =

  1 + 3β/α

(1 + β/α)3.

Since β /α = 1 + E b/2N 0, this yields the final expression for Pr(e). The next exercise generalizesthis and derives the result in a more insightful way.

(e) Technically, we never used any assumption about this correlation. The reason is the sameas with the flat fading case.   G0,0  and  G1,0  affect the result under one hypothesis and  G0,2  andG1,2  under the other, but they never enter anything together.

Exercise 9.14:

(a) Under hypothesis H = 1, we see that Vm = Zm for 0 ≤ m < L and Vm = √(Eb) Gm−L,m + Zm for L ≤ m < 2L. Thus (under H = 1), Vm is circularly symmetric complex Gaussian with variance N0/2 per real and imaginary part for 0 ≤ m < L, and with variance Eb/(2L) + N0/2 per part for L ≤ m < 2L. That is, given H = 1, Vm ~ CN(0, N0) for 0 ≤ m < L and Vm ~ CN(0, Eb/L + N0) for L ≤ m < 2L. In the same way, conditional on H = 0, Vm ~ CN(0, Eb/L + N0) for 0 ≤ m < L and Vm ~ CN(0, N0) for L ≤ m < 2L. Conditional on either hypothesis, the random variables V0, ..., V_{2L−1} are statistically independent.
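These conditional statistics are easy to verify numerically. Here is a minimal sketch, assuming i.i.d. taps Gk,m ~ CN(0, 1/L) (the normalization that makes the variances come out as stated) and illustrative values of L, Eb, and N0:

import numpy as np

rng = np.random.default_rng(2)
L, Eb, N0, n = 4, 8.0, 1.0, 200_000

def cn(var, size):
    # CN(0, var): circularly symmetric, var/2 per real and imaginary part.
    return rng.normal(0, np.sqrt(var/2), size) + 1j*rng.normal(0, np.sqrt(var/2), size)

# Under H = 1: the first L outputs are noise only, the last L carry the faded signal.
V_first = cn(N0, (n, L))                                   # V_m for 0 <= m < L
V_last  = np.sqrt(Eb)*cn(1.0/L, (n, L)) + cn(N0, (n, L))   # V_m for L <= m < 2L

print(np.mean(np.abs(V_first)**2))   # ~ N0 = 1.0
print(np.mean(np.abs(V_last)**2))    # ~ Eb/L + N0 = 3.0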


in the problem statement). Thus the receiver now makes 2L − 1 hard decisions, each in error with probability p and, given the hypothesis, each independent of the others. Using these 2L − 1 hard decisions to make a final decision, the ML rule for the final decision is majority rule on the 2L − 1 local hard decisions. This means that the final decision is in error if L or more of the local decisions are in error. Since p is the probability of a local decision error, (50) gives the probability of a final decision error. This is the same as the error probability with Lth-order diversity making an optimal decision on the raw received sequence. We note that in situations such as that assumed here, where diversity is gained by dissipating available power, there is an additional 3 dB power advantage in using 'soft decisions' beyond that exhibited here.
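Assuming (50) is the usual binomial tail for majority rule, as the surrounding sentences indicate, the final error probability is straightforward to compute. A short sketch with an illustrative local error probability p:

from math import comb

def majority_error(p, L):
    # The final decision errs when L or more of the 2L-1 independent local
    # decisions (each wrong with probability p) are in error.
    return sum(comb(2*L - 1, i) * p**i * (1 - p)**(2*L - 1 - i)
               for i in range(L, 2*L))

for L in (1, 2, 4):
    print(L, majority_error(0.1, L))   # 0.1, 0.028, 0.0027

Note how the final error probability falls rapidly with L even though each local decision is equally unreliable.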

Exercise 9.15:

(a) The interpretations required in this exercise are not as trivial as they appear, but provide added insight into the baseband equivalent model. The delay on path 2 is

τ2(t) = r2(t)/c = (r0 + Δr + vt)/c.

Multiplying by f + fc,

(f + fc)τ2(t) = f r0/c + f Δr/c + f vt/c + fc r0/c + fc Δr/c + fc vt/c.   (53)

The first path, at distance r0, is assumed to have response 1 at frequency fc + f and time 0. The difference between the two paths, at distance r0, is accounted for by the phase φ of the antenna pattern, which is implicitly assumed in (9.89) to be constant over the range of t and f of interest. Thus the sum of the first and fourth terms in (53) must be 0. The fifth term involves neither f nor t, and thus can be incorporated with the fixed phase term φ, making ψ = φ + fc Δr/c. Finally, the third term above contains the product f t, which is negligible compared to the remaining terms for small f and t. Perhaps more to the point, this term is 0 when looking at variations solely with t or solely with f, which are the issues in finding D and L. Substituting the remaining terms of (53) into (9.89) of the problem statement, we get (9.90).

(b) Note that Δr/c is the delay difference between the two paths at time 0, and is thus what has been defined as the multipath spread L. The multipath spread varies with t, but this change is usually not significant over the time intervals involved in signal detection; in any case, if one wants to view L as a function of t, one cannot ignore the third term in (53). The same issue determines D as fc v/c.

(c) The smallest t > 0 at which g(0, t) = 2 is the smallest t > 0 at which fc vt/c = 1, which is t = c/(fc v), i.e., t = 1/D. It appears reasonable to define coherence time this way, since it is the period of the fading in this case, and some authors define coherence time this way. There are two problems with this definition. First, receivers would recover the carrier frequency as halfway between fc and fc + f/2, leading to a different system function (see Section 9.3.2), and second, coherence time is used to represent the time over which the system function is relatively unchanging. Thus Tcoh is defined in the text for this case as c/(2 fc v).

(d) In the same way as in part (c), the smallest f > 0 at which g(f, 0) = 2 is f = c/Δr. It is again plausible to define this as the coherence frequency Fcoh, but as above it is more reasonable, and in conformance with the text, to define Fcoh in this case as c/(2Δr).
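To get a feel for the magnitudes involved, here is a quick numeric illustration; the carrier frequency, speed, and path-length difference are assumed values, not taken from the exercise:

c = 3.0e8                       # speed of light, m/s
fc, v, dr = 1.0e9, 30.0, 300.0  # assumed: 1 GHz carrier, 30 m/s, 300 m path difference

D     = fc*v/c                  # Doppler spread: 100 Hz
T_coh = c/(2*fc*v)              # coherence time as defined in the text: 5 ms (= 1/(2D))
L_mp  = dr/c                    # multipath spread: 1 microsecond
F_coh = c/(2*dr)                # coherence frequency: 500 kHz (= 1/(2 L_mp))

print(D, T_coh, L_mp, F_coh)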


Exercise 9.16:  See the solution to Exercise 8.10, which is virtually identical.

Exercise 9.17:

(a) Note that the kth term of u ∗ u† is

(u ∗ u†)k = Σℓ uℓ u*ℓ+k = 2a²n δk.

We assume from the context that n is the length of the sequence, and from the factor of 2 that this is an ideal 4-QAM PN sequence. Since ‖u‖² is the center term of u ∗ u†, i.e., (u ∗ u†)0, it follows that ‖u‖² = 2a²n. Similarly, ‖b‖² is the center term (b ∗ b†)0. Using the commutativity and associativity of convolution,

b ∗ b† = u ∗ g ∗ u† ∗ g† = g ∗ (u ∗ u†) ∗ g† = 2a²n (g ∗ g†).

Finally, since ‖g‖² is the center term of g ∗ g†, i.e., (g ∗ g†)0,

‖b‖² = 2a²n ‖g‖² = ‖u‖² ‖g‖².
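The convolution identity used above, b ∗ b† = (u ∗ u†) ∗ (g ∗ g†), holds for any sequences, not just ideal PN sequences, and is easy to check numerically. A minimal numpy sketch; the random u below is an arbitrary 4-QAM sequence, so only the algebraic identity is being verified, not the ideal autocorrelation property:

import numpy as np

rng = np.random.default_rng(3)
a, n = 1.0, 8
u = a*(rng.choice([-1, 1], n) + 1j*rng.choice([-1, 1], n))  # 4-QAM symbols +-a +-ja
g = rng.normal(size=3) + 1j*rng.normal(size=3)              # arbitrary channel taps
b = np.convolve(u, g)

def corr(x):
    # x convolved with its conjugate time-reversal; the center term is ||x||^2.
    return np.convolve(x, np.conj(x[::-1]))

lhs = corr(b)
rhs = np.convolve(corr(u), corr(g))   # (u * u-dagger) * (g * g-dagger)
print(np.allclose(lhs, rhs))          # True: convolution commutes and associates
print(np.isclose(corr(b)[len(b) - 1].real, np.sum(np.abs(b)**2)))  # center = ||b||^2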

(b) If u0 and u1 are ideal PN sequences as given in part (a), then ‖u0‖² = ‖u1‖² = 2a²n. Using part (a), then,

‖b0‖² = ‖u0‖² ‖g‖² = ‖u1‖² ‖g‖² = ‖b1‖².

Exercise 9.18:

(a) This is equivalent to the conventional problem of antipodal waveforms in WGN. The waveforms (after passing through the channel) are ±v(t) where v(t) = sinc(t) − sinc(t − ε). ML detection is accomplished by passing the received signal V(t) = ±v(t) + Z(t) through a filter matched to v(t) and sampling, i.e., by forming the inner product ⟨V, v⟩ = ∫ V(t) v(t) dt and deciding Ĥ = 0 if the inner product is positive and Ĥ = 1 otherwise.

(b) The probability of error is the conventional antipodal-waveform-in-WGN result,

Pr(e) = Q( ‖v‖ / √(N0/2) ) = Q( √(2E/N0) ),

where E = ‖v‖² is the energy of v(t).

(c) Using the hint, v(t) ≈ ε d(sinc(t))/dt, so v̂(f) ≈ 2πifε rect(f). We then have

E = ∫ |v(t)|² dt = ∫ |v̂(f)|² df ≈ 4π²ε² ∫_{−1/2}^{1/2} f² df = π²ε²/3.

Substituting this into part (b),

Pr(e) = Q( πε √(2/(3N0)) ).
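The small-ε energy approximation is easy to check numerically. A quick sketch, with ε = 0.05 as an assumed illustrative value; the exact energy 2(1 − sinc(ε)) follows from |v̂(f)|² = 2(1 − cos 2πfε) on |f| < 1/2:

import numpy as np

eps = 0.05
t = np.linspace(-500, 500, 2_000_001)    # wide range: the sinc tails decay slowly
v = np.sinc(t) - np.sinc(t - eps)        # numpy's sinc(t) = sin(pi t)/(pi t)

E_numeric = np.sum(v**2)*(t[1] - t[0])   # brute-force Riemann sum of the energy
E_exact   = 2*(1 - np.sinc(eps))         # from the frequency-domain integral
E_approx  = (np.pi*eps)**2/3             # the small-eps approximation above

print(E_numeric, E_exact, E_approx)      # all close to 0.00822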


(d) The multipath spread is ε, which is considered very small relative to the signaling interval, so that the discrete channel is modeled with a single tap having the value g = 1 − sinc(−ε). The input is ±1. Thus the real part of the discrete-time observation at the receiver is v = ±[1 − sinc(ε)] + Z(0) where Z(0) ~ N(0, N0/2). This is the standard problem of antipodal one-dimensional signals in Gaussian noise. Thus

Pr(e) = Q( √(2[1 − sinc(ε)]²/N0) ).

If we approximate 1 − sinc(ε) for ε small, we see by sketching the function that it is 0 at ε = 0 and quadratic in ε. Analytically, we can expand sinc(ε) in a power series to get

sinc(ε) = sin(πε)/(πε) = (1/(πε))[πε − (πε)³/6 + ···] = 1 − (πε)²/6 + ··· .

Thus 1 − sinc(ε) ≈ (πε)²/6, so

Pr(e) = Q( ((πε)²/6) √(2/N0) ).

(e) For δ small, Q(δ) ≈ 1/2 − δ/√(2π). Thus, with the analog technique, Pr(e) approaches 1/2 linearly in ε, and with the digital technique it approaches 1/2 quadratically in ε. Both show bad fading (since the fading is actually bad), but the discrete technique is considerably worse.
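The linear-versus-quadratic behavior is easy to see numerically. A short sketch comparing the two error probabilities as ε shrinks, with N0 = 1 assumed for illustration:

import math

def Q(x):
    # Gaussian tail: Q(x) = 0.5 erfc(x/sqrt(2)).
    return 0.5*math.erfc(x/math.sqrt(2))

N0 = 1.0
for eps in (0.2, 0.1, 0.05, 0.025):
    p_analog  = Q(math.pi*eps*math.sqrt(2/(3*N0)))       # part (c)
    p_digital = Q((math.pi*eps)**2/6*math.sqrt(2/N0))    # part (d)
    print(eps, round(0.5 - p_analog, 5), round(0.5 - p_digital, 5))

Each halving of ε halves the analog gap 1/2 − Pr(e) but quarters the digital gap, so the single-tap discrete model spends far more of the range near Pr(e) = 1/2.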

(f) The problem with the discrete approach is that when two terms almost cancel out at the center tap, the other taps can be more significant. It is not hard to see that the other taps are linear rather than quadratic in ε. In fact, if we had sampled at time ε/2, the output of the single tap would have been exactly 0, and all of the contribution of u(t) would have been represented at the other taps. In other words, the approximation of using only a single tap to approximate the channel when the time spread is small need not work well in a deep fade. The discrete-time model is not inherently faulty, but the assumption that taps outside the multipath spread can be safely ignored is faulty.