A general lower bound on the decoding complexity of message-passing decoding

Pulkit Grover and Anant Sahai
Department of Electrical Engineering and Computer Sciences, UC Berkeley
pulkit,[email protected]
Abstract
We find lower bounds on complexity-performance tradeoffs for the message-passing decoding model,
where the complexity is measured by the size of the largest neighborhood, and the performance by the
rate and the average bit-error probability. These lower bounds show that the maximum neighborhood size
increases unboundedly as the error probability approaches zero, and as the rate approaches capacity. No
assumption is made on the code structure or on the particular implementation of message-passing decoding:
the obtained bounds are based purely on the limitations of the message-passing decoding model. In
contrast with most of the existing results that hold on average over an ensemble, these bounds hold
for any chosen code. The bounds are derived using an application of the sphere-packing concept to
neighborhood sizes of message passing decoding, instead of the usual blocklength. Next, assuming that
the code is a Low-Density Parity-Check code, we show that similar lower bounds hold as the rate
approaches Gallager’s upper bound. Results with slightly weaker scaling are also derived for average
neighborhood size for any code.
I. INTRODUCTION
It is desirable for a channel code to operate at rates close to the channel capacity, at low error probability,
and to have low encoding and decoding complexity. To what extent are these goals consistent with each
other? Historically, the discovery of any class of codes and/or encoding/decoding algorithm has been followed by an analysis of the performance-complexity tradeoffs for the code/algorithm. These
tradeoffs treat proximity to capacity and error probability as the performance of the code, and measure
the complexity of encoding/decoding as the number of computations performed in a computation model,
or as a function of a code parameter.
October 29, 2010 DRAFT
Shannon (1959) [1] (for AWGN channels) and Shannon, Gallager and Berlekamp (1967) [2] (for
arbitrary discrete memoryless channels) derived outer bounds on the performance complexity tradeoffs
for block-codes. Since the decoding algorithm can be different for different sub-families of block-codes,
the measure of complexity was taken to be the block-length. The obtained results show that the error
probability Pe for a rate-R code operating close to the channel capacity C behaves as Pe ≈ e^{−m·Er(R)}, where m is the block-length and Er(R) = Θ((C − R)²) is the random-coding error exponent. The
block-length thus diverges to infinity as the error probability converges to zero, and as the rate approaches
capacity. To obtain decoding complexity in the Turing-machine model for a specific code and decoding
algorithm, one only needs to understand the complexity as a function of the block-length. Inner bounds
(achievability) based on random coding and expurgation of “bad” codewords were derived by Gallager
in 1965 [3]. These bounds show that the outer bounds on the exponents are tight at high rates. Recent
works by Fossorier [4], Wiechman and Sason [5], and Polyanskiy, Poor and Verdú [6] tighten the inner
and outer bounds at finite block-lengths.
Convolutional codes, discovered by Elias in 1955 [7], called for a similar analysis. With termination
at finite lengths, these codes can be thought of as block-codes, but the performance of these codes is
limited by the constraint-length (or more generally, the length of the memory at the encoder), which is
typically much smaller than the block-length. The results for block-codes were thus no longer useful for
these codes. Since the codes are based on dynamical systems, it is meaningful to use the memory-length at the encoder as a measure of complexity. The inner bound is obtained using finite-constraint-length linear convolutional codes. The outer bound only depends on the encoding memory-length [8],
[9], and shows that the memory-length must diverge to infinity in a manner similar to that for block-
codes. Since the complexity of Viterbi decoding [8] increases exponentially in the constraint length, the
result characterizes the performance-complexity tradeoff for Viterbi decoding. A separate analysis was
performed for the sequential decoding model. Jacobs and Berlekamp [10] showed that the (random) number
of guesses required in sequential decoding is lower-bounded by a Pareto random variable. Successively
smaller moments of the number of guesses diverge as the rate approaches capacity — notably, there
is a computational cutoff rate above which the average number of guesses is infinite. These lower bounds
depend only on the operating rate and the channel capacity. Matching inner bounds are obtained using
random linear convolutional codes and specific sequential decoding algorithms [11], [12]. The tightness
of the inner and outer bounds was shown by Arikan in his thesis [13].
A clear pattern thus emerges in such analyses — outer bounds are obtained by abstracting away
from the code structure, allowing the code to be arbitrary apart from keeping the complexity-measure
constant. Inner bounds possibly show the tightness of the outer bounds in an asymptotic sense. Tightness
ensures that the limitations of the code family or the decoding model are understood completely, allowing
comparison with other families and newer constructions that defy the bounds.
The last two decades have seen the re-emergence of sparse-graph codes, such as the Low-Density Parity-Check (LDPC) codes, and associated message-passing based decoding algorithms [14]. LDPC codes,
along with some message-passing based decoding algorithms were developed by Gallager in his the-
sis [15]. The ideas have since been extended to many other classes of codes (e.g. irregular LDPC
codes, Irregular Repeat-Accumulate (IRA) codes, Accumulate-Repeat-Accumulate (ARA) codes) and
message-passing based decoding algorithms (e.g. belief-propagation (BP) decoding [16]). To understand
the limitations of these new code classes and decoding algorithms, a similar analysis as for other code
families and decoding models is needed here.
The issue was explored first by Gallager himself in his thesis, where he derived upper bounds on
the rate for reliable communication as a function of the number of ones in each row of the parity-check
matrix of a regular LDPC code [15, Pg. 38]. This has been interpreted as a result on graphical complexity
of the LDPC code, where graphical complexity is the average number of edges per information bit in
the graphical representation [17] of these codes. The result generalizes to irregular-LDPC codes [18], but
does not extend to many other code families of interest. In fact, Pfister, Sason, and Urbanke [19] provide
a BEC-capacity-achieving sequence of non-systematic IRA codes that has bounded graphical complexity
(under belief-propagation decoding). Pfister and Sason further extended it in [20] by giving systematic
constructions for the BEC with similar properties. Hsu and Anastasopoulos extended the result in [21]
to codes for noisy channels with bounded graphical complexity (under ML decoding). In conclusion,
the tradeoff between the graphical complexity and the performance is trivial in that there are bounded
graphical complexity constructions that approach capacity and attain arbitrarily low error probability, at
least over the BEC.
While graphical complexity can be interpreted abstractly as a measure of code complexity itself,
more concretely, in message-passing decoding, it is a measure of complexity per-iteration. This naturally
suggests another measure of complexity for the message-passing decoding model — the number of iterations.
The issue was first addressed by Khandekar and McEliece, who conjectured (based on EXIT chart arguments) the scaling of the number of iterations with error probability and the gap from capacity.
In [22, Pg. 69-72] and [23], they conjecture that for all sparse-graph codes, the number of iterations must scale either multiplicatively as Ω((1/gap) log2(1/〈Pe〉)), or additively as Ω(1/gap + log2(1/〈Pe〉)), in the vicinity of capacity and low error probability. This conjecture comes from an EXIT-chart based graphical
argument concerning the message-passing decoding of sparse-graph codes over the BEC. The intuition
is that the bound should also hold for general memoryless channels, since the BEC is the channel with
the simplest decoding.
In [24], Sason considers specific families of codes: the LDPC codes, the Accumulate-Repeat-Accumulate
(ARA) codes, and the Irregular-Repeat Accumulate (IRA) codes. He demonstrates that in the presence
of degree-2 nodes in the graph, the number of iterations scales as Ω(1/gap) for the BEC in the limit of low
error probability, partially resolving the Khandekar-McEliece conjecture (the dependence on the error
probability is not addressed, nor is the issue settled for other classes of sparse-graph codes). If, however,
the fraction of degree-2 nodes for these codes converges to zero, then the bounds in [24] become trivial.
Sason notes that all the known traditionally capacity-achieving sequences of these code families have a
non-zero fraction of degree-2 nodes.
The approach in [24], however, does not abstract away from the code structure to evaluate the limits
of the message passing algorithm itself. Such a study is necessitated by the discovery of ever newer classes of sparse-graph-based codes that can be decoded using such algorithms. Further, it has been suggested
that even some classes of codes that are not based on sparse-graphs (e.g., Reed-Solomon codes [25], and
the promising new class of polarization codes [26]) can also be decoded using message-passing based
algorithms.
Since all these codes are a sub-class of block-codes, one might hope that the results for block coding
are applicable here. However, for the low-complexity decoding algorithms proposed by Gallager (now
known as Gallager A and Gallager B algorithms), each message bit need not make use of the entire block-
length to decode its value. Instead, each bit decodes its value by passing messages over the code-graph,
utilizing only a small subset of the received symbols that lie in its graphical neighborhood. The size of
this decoding neighborhood is thus a natural measure of complexity for message-passing decoding.
In this paper, we provide general bounds on the decoding complexity of message-passing decoding.
Instead of analyzing the number of iterations, we consider the related measure of neighborhood size (see Table I for a quick comparison of results in the area). The two measures are related by the maximum connectivity at the decoder, an issue that is further explored in [27], [28][waterslide].
The approach here is based on that in [29], but the results obtained here are more general and stronger.
It is desirable for a channel code to operate at rates close to the channel capacity, at low error
probability, and to have low encoding and decoding complexity. A natural question is whether these goals
are consistent with each other. Historically, the discovery of any class of codes and/or encoding/decoding algorithm is followed by an analysis of the performance-complexity tradeoffs for the code/algorithm. The
approaches for this analysis can be classified into four different types:
1) Fixing the code (or the code family) and a particular decoding algorithm.
2) Fixing the code (or the code family) but not the decoding algorithm.
3) Fixing only the decoding algorithm, and allowing the code to be arbitrary.
4) Allowing the code and the decoding algorithm to be arbitrary.
In each of these approaches, the problem is investigated from two complementary directions — the
achievability and the converse. The toolset for the two directions is often quite different. Providing
a good inner bound (the achievability) requires developing a good code and a suitably inexpensive
encoding/decoding algorithm. One only needs to ensure that the code family and/or the decoding algorithm
lie in the set of interest. In contrast, the outer bound (the converse) may allow for many possible codes
or decoding algorithms (or both).
Approach 1 yields the strongest bounds, but these are of limited applicability since both the code family and
the decoding algorithm are fixed. Approaches 2 and 3, in contrast, offer insights on the limitations of the
code family or the decoding algorithm in isolation, and are thus more general. The problem formulation
of approach 4 is vague unless we adhere to a computation model to measure the complexity.
The same problem afflicts approach 2. However, for a specific code family, the issue of algorithmic
complexity can be side-stepped by using a code parameter (e.g. block-length) as a measure of complexity.
For any choice of encoding/decoding algorithm, the complexity can then be calculated as a function of
this measure. The analysis of performance-complexity tradeoffs for block-codes is a good example of this
approach. Random-coding and expurgated error exponents [30] provide achievable tradeoffs (that serve
as the inner bounds) between the complexity (measured in block-length) and the error probability. On the
other hand, sphere-packing and straight-line bound converses provide the outer bounds [30]. The analysis
is not without its limitations. For example, it does not provide any insight for decoding algorithms that
do not use the entire block for decoding each symbol — a point we shall revisit later.
Approach 2 has also been followed for codes defined using dynamical systems, where the length of the memory at the encoder is used as the measure of complexity. The inner bound is obtained using finite-constraint-length linear convolutional codes. The outer bound only requires the memory at the encoder to obtain
the decoding complexity as a function of this memory length [8], [9].
In Approach 3, without restricting attention to any particular code family, a class of encoding/decoding
algorithms is considered. Explicit complexity results in the number of computations required at the
encoder/decoder are then obtained. For example, this approach was used to analyze the complexity-
performance tradeoffs for sequential decoding. Jacobs and Berlekamp [10] showed that the (random)
number of guesses required in sequential decoding is lower-bounded by a Pareto random variable,
regardless of the choice of the code. Successively smaller moments of the number of guesses diverge
as the rate approaches capacity — notably, there is a computational cutoff rate for sequential decoding
corresponding to when the average number of computations must diverge. Matching inner bounds are
obtained using random linear convolutional codes and specific sequential decoding algorithms [11], [12].
The past few years have seen the emergence of sparse-graph codes and associated message-passing based
iterative decoding algorithms [14]. Thus the family of sparse-graph codes and the message-passing
decoding algorithm demand a similar understanding as we have for other classes of codes and decoding
algorithms. Message-passing based iterative decoding algorithms decode each bit using a “neighborhood”
of channel outputs that is specific to the bit. The size of this neighborhood can be much smaller than
the block-length (indeed, much of the analysis assumes that the block-length is infinite). Thus the results
from block-codes, while valid, do not offer insights into performance-complexity tradeoffs for decoding
complexity under message-passing decoding.
Following approach 2, analysis has been performed for sub-families of sparse-graph codes by measuring
the “graphical complexity” of these codes, which is the average number of edges per-bit in the Tanner
graph representation of these codes [14]. Graphical complexity serves as a measure of the complexity
per-iteration in decoding of these codes, but also as the implementation complexity for the encoder and
the decoder. For regular LDPC codes, Gallager^1 provided outer bounds on the graphical complexity for the BSC, showing that the graphical complexity must diverge to infinity in order for these codes to approach capacity [15, Pg. 38]. The result has subsequently been generalized to irregular codes over any
binary-input symmetric memoryless channel [31], [32] and even to some classes of Markov channels [33].
These results show that the graphical complexity of LDPC codes must go to infinity as Ω(ln(1/gap)) in the limit of low error probability. Here we use the Ω notation to denote lower-bounds in the order sense of [34]. Observe that these results hold for any decoding algorithm.
Mere puncturing of the code, however, leads to a dense parity-check matrix, and hence the results for
1The graphical interpretation of these codes, and hence of Gallager’s results, came only much later (in 1981) and is due to
Tanner [17].
LDPC codes are not directly applicable. This observation naturally suggests that bounds on graphical
complexity of LDPC codes can be beaten. In fact, Pfister, Sason, and Urbanke demonstrate in [19] a
BEC-capacity-approaching family of non-systematic codes whose graphical models contain a bounded
average number of edges per information bit (under belief-propagation decoding). Pfister and Sason
further extended it in [20] by giving systematic constructions for the BEC with similar properties. Hsu and
Anastasopoulos extended the result in [21] to codes for noisy channels with bounded graphical complexity
(under ML decoding). In conclusion, the complexity-performance tradeoff for graphical complexity is
trivial in that the complexity is bounded regardless of the desired rate.
The issue has been considered by looking at the number of iterations, primarily using Approach 1 (i.e.
by restricting the code family as well as the decoding algorithm). In [22, Pg. 69-72] and [23], Khandekar
and McEliece conjectured that for all sparse-graph codes, the number of iterations must scale either
multiplicatively as Ω((1/gap) log2(1/〈Pe〉)), or additively as Ω(1/gap + log2(1/〈Pe〉)), in the vicinity of capacity
and low error probability. This conjecture comes from an EXIT-chart based graphical argument concerning
the message-passing decoding of sparse-graph codes over the BEC. The intuition is that the bound should
also hold for general memoryless channels, since the BEC is the channel with the simplest decoding.
In [24], Sason considers specific families of codes: the LDPC codes, the Accumulate-Repeat-Accumulate
(ARA) codes, and the Irregular-Repeat Accumulate (IRA) codes. He demonstrates that in the presence of degree-2 nodes in the graph, the number of iterations scales as Ω(1/gap) for the BEC in the limit of low
error probability, partially resolving the Khandekar-McEliece conjecture (the dependence on the error
probability is not addressed, nor is the issue settled for other classes of sparse-graph codes). If, however,
the fraction of degree-2 nodes for these codes converges to zero, then the bounds in [24] become trivial.
Sason notes that all the known traditionally capacity-achieving sequences of these code families have a
non-zero fraction of degree-2 nodes.
Neither of these two approaches abstracts away from the code structure to evaluate the limits of
the message passing algorithm itself, which is necessitated by the discovery that many new classes of
codes based on sparse-graphs can be decoded using such algorithms. Further, it has been suggested that
even some classes of codes that are not based on sparse-graphs (e.g., Reed-Solomon codes [25], and
the promising new class of polarization codes [26]) can also be decoded using message-passing based
algorithms. In this paper, we use Approach 3 to provide general bounds on the decoding complexity of message-passing decoding. Instead of analyzing the number of iterations, we consider the related measure of neighborhood size (see Table I for a quick comparison of results in the area). The two measures are related by the maximum connectivity at the decoder, an issue that is further
explored in [27], [28][waterslide]. The approach here is based on that in [29], but the results obtained
here are more general and stronger.
This paper is organized as follows. We introduce the notation and the problem in Section II. Section III
provides the lower bounds on the maximum neighborhood size as a function of the desired bit-error
probability and the rate. Tighter bounds are provided for LDPC codes of given average check degree,
and also for sparse-graph codes with given threshold behavior. In Section III-D, slightly weaker results
are derived for the average neighborhood size, instead of the maximum neighborhood size. We conclude
in Section IV.
Lower bounds on complexity of algorithms in circuit implementations have yielded [35] lower bounds
on energy consumption in circuits. In a companion paper [waterslide], we explore parallel implications
on decoding energy for message-passing decoding.
TABLE I
COMPARISON OF VARIOUS RESULTS ON COMPLEXITY-PERFORMANCE TRADEOFFS

Reference               | Codes                       | Decoding algorithm           | Channel              | Lower bound on complexity
Gallager [15]           | regular LDPC                | ML (and hence all)           | BSC                  | code density = Ω(log(1/gap))
Burshtein et al. [31],  | LDPC (including irregular)  | ML (and hence all)           | symmetric memoryless | code density = Ω(log(1/gap))
Sason et al. [32]       |                             |                              |                      |
Khandekar et al. [23],  | LDPC, IRA, ARA              | belief-propagation decoding  | BEC                  | iterations = Ω(1/gap)
Sason et al. [24]       |                             |                              |                      |
This paper              | all (including non-linear)  | any message-passing decoding | BSC or AWGN          | nbd size = Ω(log2(1/〈Pe〉)/gap²)
II. NOTATIONS, DEFINITIONS AND PROBLEM STATEMENT
In the following, we introduce the notation and describe our model of the decoder. Consider a point-
to-point communication link. An information sequence B_1^k is encoded into one of 2^{mR} codewords X_1^m, using a
possibly randomized encoder. The information sequences are assumed to consist of iid fair coin tosses
and hence the rate of the code is R = k/m. Following tradition, both k and m are considered to be very
large.
Two channel models are considered: the BSC and the power-constrained AWGN channel. The observed
channel output is denoted by Y_1^m in either case. For the BSC, the underlying channel crossover probability
is denoted by p. In our derivations we often need a “test” channel, denoted by G, that models a deviant
channel behavior. For the BSC, the crossover probability of the test channel is denoted by g. For the AWGN channel, the average received power is denoted by P_T, and the noise variance by σ_0², so the received SNR is P_T/σ_0². The noise variance of the AWGN test channel is denoted by σ_G².
We do not impose any a priori structure on the code itself. The decoding algorithm estimates the value
of each bit from a subset of channel outputs that is decided a priori for each bit. We refer to this subset of channel outputs as the decoding “neighborhood” of the i-th bit. This neighborhood can be generated, for example, using iterative decoding algorithms. We use 〈Pe,i〉 to denote the average probability of bit error on the i-th message bit (averaged over the channel realizations, the messages, the encoding, and the decoding) at the end of the decoding. Similarly, 〈Pe〉 = (1/k) Σ_i 〈Pe,i〉 is used to denote the overall average probability of bit error after l iterations. Appropriate subscripts are used to denote the test channel; for example, while 〈Pe〉 is used to denote the average bit-error probability for the true channel, 〈Pe〉_G denotes the average bit-error probability under a test channel G.
III. LOWER BOUNDS ON THE REQUIRED MAXIMUM NEIGHBORHOOD SIZE
In this section, lower bounds are derived on the required maximum neighborhood size in message-
passing decoding as a function of the gap from capacity and the desired error probability. These bounds
reveal that the maximum neighborhood size must grow unboundedly with improving system performance — as the rate approaches capacity and the error probability converges to zero.
A. Lower bounds on the probability of error as a function of the maximum neighborhood size
The main bounds are given by theorems that capture a local sphere-packing effect. These bounds are
closely related^2 to the local sphere-packing bounds used in the context of streaming codes with bounded
bit-delay [36], [37]. These sphere-packing bounds can be turned around to give a family of lower bounds
on the neighborhood size n as a function of 〈Pe〉. This family is indexed by the choice of a hypothetical
channel G and the bounds can be optimized numerically for any desired set of parameters.
Theorem 1: Consider a BSC with crossover probability p < 1/2. Let n be the maximum size of the decoding neighborhood of any individual bit. The following lower bound holds on the average probability of bit error:
\[
\langle P_e \rangle \geq \sup_{C^{-1}(R) < g \leq \frac{1}{2}} \frac{h_b^{-1}(\delta(G))}{2}\, 2^{-nD(g\|p)} \left( \frac{p(1-g)}{g(1-p)} \right)^{\epsilon\sqrt{n}}, \tag{1}
\]
2The main technical difference between the results here and those of [36] is that the bounds here are nonasymptotic and are
applicable to average probability of bit-error rather than the maximum probability of error.
where hb(·) is the usual binary entropy function, D(g‖p) = g log2(g/p) + (1−g) log2((1−g)/(1−p)) is the usual KL-divergence, and
\[
\delta(G) = 1 - \frac{C(G)}{R}, \quad \text{where } C(G) = 1 - h_b(g), \tag{2}
\]
\[
\epsilon = \sqrt{\frac{1}{K(g)} \log_2\left( \frac{2}{h_b^{-1}(\delta(G))} \right)}, \tag{3}
\]
\[
\text{where } K(g) = \inf_{0 < \eta < 1-g} \frac{D(g+\eta\|g)}{\eta^2}. \tag{4}
\]
Proof: See Appendix A.
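Since the family of bounds in (1) is indexed by the test channel g and is intended to be optimized numerically, it can be evaluated directly in a few lines. The sketch below is our own illustration (the function names are ours; the infimum in (4) and the supremum over g are approximated by coarse grid searches):

```python
import math

def hb(x):
    """Binary entropy function (bits)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def hb_inv(y):
    """Inverse of hb restricted to [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = (lo + hi) / 2
        if hb(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def kl(g, p):
    """KL divergence D(g||p) in bits between Bernoulli(g) and Bernoulli(p)."""
    return g * math.log2(g / p) + (1 - g) * math.log2((1 - g) / (1 - p))

def K(g, steps=1000):
    """K(g) = inf over 0 < eta < 1-g of D(g+eta||g)/eta^2, on a grid."""
    best = float("inf")
    for i in range(1, steps):
        eta = (1 - g) * i / steps
        best = min(best, kl(g + eta, g) / eta ** 2)
    return best

def theorem1_bound(p, R, n, steps=200):
    """Numerically maximize the right-hand side of (1) over test channels
    g in (C^{-1}(R), 1/2) for BSC(p), rate R, neighborhood size n."""
    g_min = hb_inv(1 - R)  # C^{-1}(R): solves 1 - hb(g) = R for the BSC
    best = 0.0
    for i in range(1, steps):
        g = g_min + (0.5 - g_min) * i / steps
        delta = 1 - (1 - hb(g)) / R          # delta(G) of (2)
        if delta <= 0:
            continue
        d_inv = hb_inv(delta)
        eps = math.sqrt(math.log2(2 / d_inv) / K(g))   # (3) with (4)
        val = (d_inv / 2) * 2 ** (-n * kl(g, p)) \
            * ((p * (1 - g)) / (g * (1 - p))) ** (eps * math.sqrt(n))
        best = max(best, val)
    return best
```

As expected from the theorem, the bound decays as the neighborhood size n grows, so a target 〈Pe〉 translates into a minimum required n.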
For regular-LDPC codes, in [15, Pg. 38], Gallager provided an upper bound on the rate that is valid for ML decoding (and hence any decoding algorithm). The bound has since been extended to irregular-LDPC codes as well [32], and is given by R < 1 − hb(p)/hb(p_d), where p_d = (1 − (1−2p)^d)/2 and d is the average check node degree of the LDPC code. Because this bound is strictly smaller than the channel capacity, it is a natural question whether this bound acts as the capacity for LDPC codes in Theorem 1. This intuition is formalized in the next theorem. Further, specializing the decoding algorithm to belief-propagation decoding [16] (and related algorithms, such as Gallager A and Gallager B [16]) and random ensembles of sparse-graph codes [14], tighter bounds can be derived, which are also provided in the next theorem.
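As a quick numerical illustration of the preceding discussion, the following sketch (ours; the helper name is hypothetical) evaluates Gallager's rate bound and confirms that it sits strictly below the BSC capacity 1 − hb(p), approaching it as the check degree grows:

```python
import math

def hb(x):
    """Binary entropy (bits)."""
    return 0.0 if x <= 0 or x >= 1 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def gallager_rate_bound(p, d):
    """Gallager's upper bound on the rate of an LDPC code with average
    check-node degree d over BSC(p): R < 1 - hb(p)/hb(p_d),
    where p_d = (1 - (1 - 2p)^d) / 2."""
    p_d = (1 - (1 - 2 * p) ** d) / 2
    return 1 - hb(p) / hb(p_d)

# For p = 0.05 the bound stays strictly below capacity 1 - hb(p) for
# every finite d, and climbs toward capacity as d grows.
p = 0.05
bounds = {d: gallager_rate_bound(p, d) for d in (3, 6, 12, 24)}
```

This is the sense in which the Gallager bound can play the role of capacity for LDPC codes in the theorem that follows.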
Theorem 2: For the setting as in Theorem 1, for LDPC codes of average check node degree d,
\[
\langle P_e \rangle \geq \sup_{g:\, 1 - \frac{h_b(g)}{h_b(g_d)} < R} \frac{h_b^{-1}(\delta_2(d,G))}{2}\, 2^{-nD(g\|p)} \left( \frac{p(1-g)}{g(1-p)} \right)^{\epsilon_2\sqrt{n}}, \tag{5}
\]
where hb(·) and D(g‖p) are as in Theorem 1, g_d = (1 − (1−2g)^d)/2, and
\[
\delta_2(d,G) = h_b(g_d) - \frac{h_b(g_d) - h_b(g)}{R}, \tag{6}
\]
\[
\epsilon_2 = \sqrt{\frac{1}{K(g)} \log_2\left( \frac{2}{h_b^{-1}(\delta_2(d,G))} \right)}, \tag{7}
\]
for K(g) as defined in Theorem 1. Further, for the setting as in Theorem 1, for iteratively decoded sparse-graph codes with threshold parameter p∗, and δ3 as the minimum average bit-error probability above threshold,
\[
\langle P_e \rangle \geq \sup_{g > p^*} \frac{h_b^{-1}(\delta_3)}{2}\, 2^{-nD(g\|p)} \left( \frac{p(1-g)}{g(1-p)} \right)^{\epsilon_3\sqrt{n}}, \tag{8}
\]
where
\[
\epsilon_3 = \sqrt{\frac{1}{K(g)} \log_2\left( \frac{2}{h_b^{-1}(\delta_3)} \right)}. \tag{9}
\]
Proof: The proofs are similar to that of Theorem 1, and are relegated to Appendix B. The first
involves proving a version of Fano’s inequality using Gallager’s upper bound. The second relies on the
observation that for Belief-Propagation and associated algorithms, there exists a non-zero lower bound
on the error probability regardless of the number of iterations.
The second part of Theorem 2 holds for the decoding algorithms that operate on the code graph
itself for codes that are generated randomly, as specified by the “socket construction” of [16]. Since the
threshold can depend on scheduling of message-passing, we assume that the threshold is calculated for
the same scheduling of message-passing as is implemented to obtain the neighborhoods. We note here
that δ3 can be zero for some sparse-graph codes, such as the LDGM codes, and the bound is trivial for
such codes.
Theorem 3: For the AWGN channel and the decoder model in Section II, let n be the maximum size of the decoding neighborhood of any individual message bit. The following lower bound holds on the average probability of bit error:
\[
\langle P_e \rangle \geq \sup_{\sigma_G^2:\, C(G) < R} \frac{h_b^{-1}(\delta(G))}{2} \exp\left( -nD(\sigma_G^2\|\sigma_0^2) - \sqrt{n}\left( \frac{3}{2} + 2\ln\left( \frac{2}{h_b^{-1}(\delta(G))} \right) \right)\left( \frac{\sigma_G^2}{\sigma_0^2} - 1 \right) \right), \tag{10}
\]
where δ(G) = 1 − C(G)/R, the capacity C(G) = (1/2) log2(1 + P_T/σ_G²), and the KL divergence D(σ_G²‖σ_0²) = (1/2)(σ_G²/σ_0² − 1 − ln(σ_G²/σ_0²)).
The following lower bound also holds on the average probability of bit error:
\[
\langle P_e \rangle \geq \sup_{\sigma_G^2 > \sigma_0^2\mu(n):\, C(G) < R} \frac{h_b^{-1}(\delta(G))}{2} \exp\left( -nD(\sigma_G^2\|\sigma_0^2) - \frac{1}{2}\,\phi\!\left(n, h_b^{-1}(\delta(G))\right)\left( \frac{\sigma_G^2}{\sigma_0^2} - 1 \right) \right), \tag{11}
\]
where
\[
\mu(n) = \frac{1}{2}\left( 1 + \frac{1}{T(n) + 1} + \frac{4T(n) + 2}{n\,T(n)(1 + T(n))} \right), \tag{12}
\]
\[
T(n) = -W_L\!\left( -\exp(-1)\,(1/4)^{1/n} \right), \tag{13}
\]
where W_L(x) solves
\[
x = W_L(x)\exp(W_L(x)) \tag{14}
\]
while satisfying W_L(x) ≤ −1 for all x ∈ [−exp(−1), 0], and
\[
\phi(n, y) = -n\left( W_L\!\left( -\exp(-1)\left(\frac{y}{2}\right)^{\frac{2}{n}} \right) + 1 \right). \tag{15}
\]
W_L(x) is the transcendental Lambert W function [38], defined implicitly by the relation (14) above.
Proof: See Appendix C.
The expression (11) is better for plotting bounds when we expect n to be moderate while (10) is more
easily amenable to asymptotic analysis as n gets large.
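When plotting (11), the quantities T(n), μ(n), and φ(n, y) of (12)-(15) must be computed numerically. Below is a sketch (ours, not from the paper): since w·exp(w) is monotone on (−∞, −1], the W_L branch can be found by bisection without any special-function library.

```python
import math

def lambert_w_l(x):
    """The W_L branch used in (13)-(15): the real solution of
    w*exp(w) = x with w <= -1, for x in [-1/e, 0). Found by bisection,
    since w*exp(w) is monotone (decreasing) on (-inf, -1]."""
    assert -math.exp(-1) - 1e-12 <= x < 0
    hi = -1.0
    lo = -2.0
    while lo * math.exp(lo) < x:   # extend bracket until it contains the root
        lo *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid * math.exp(mid) > x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def T(n):
    """T(n) of (13)."""
    return -lambert_w_l(-math.exp(-1) * 0.25 ** (1.0 / n))

def mu(n):
    """mu(n) of (12): lower limit on sigma_G^2/sigma_0^2 in (11)."""
    t = T(n)
    return 0.5 * (1 + 1 / (t + 1) + (4 * t + 2) / (n * t * (1 + t)))

def phi(n, y):
    """phi(n, y) of (15)."""
    return -n * (lambert_w_l(-math.exp(-1) * (y / 2) ** (2.0 / n)) + 1)
```

Any standard Lambert-W routine restricted to the lower real branch would serve equally well.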
B. ‘Gap’ to capacity
For finite bit-error probability, it is possible to communicate at rates above the channel capacity. Thus the notion of gap from capacity needs delicate handling. In the following, we show that the appropriate definition of gap is C/(1 − hb(〈Pe〉)) − R.
Before transmission, the k bits could be lossily compressed using a source code to ≈ (1− hb(〈Pe〉))k
bits. The channel code could then be used to protect these bits, and the resulting codeword transmitted
over the channel. The decoder can decode jointly the source and the channel code. This scheme is optimal
by the lossy source-channel separation theorem. Therefore, for fixed 〈Pe〉, the maximum achievable rate is C/(1 − hb(〈Pe〉)).
The appropriate definition of the total gap is, therefore, C/(1 − hb(〈Pe〉)) − R. This can be broken down as a sum of two ‘gap’s:
\[
\frac{C}{1 - h_b(\langle P_e \rangle)} - R = \left( \frac{C}{1 - h_b(\langle P_e \rangle)} - C \right) + (C - R). \tag{16}
\]
The first term goes to zero as 〈Pe〉 → 0, and the second term is the intuitive idea of gap to capacity.
The traditional approach of error exponents is to study the behavior for a fixed gap while allowing 〈Pe〉 → 0. Considering the error exponent as a function of the gap reveals something about how difficult it is to
approach capacity.
The other natural path is to fix 〈Pe〉 > 0 and let R → C. It turns out that the bounds of Theorems 1 and 3 do not give very interesting results in such a case. We need 〈Pe〉 → 0 alongside R → C. To capture the intuitive idea of gap, which is just the second term in (16), we want to be able to assume that the effect of the second term dominates the first. This way, we can argue that the decoding complexity increases to infinity as gap → 0 and not just because 〈Pe〉 → 0. For this, it suffices to consider 〈Pe〉 = gap^β. As long as β > 1, the 2 log_α(1/gap) scaling on iterations holds, but it slows down to 2β log_α(1/gap) for 0 < β < 1. When β is small, the average probability of bit error is dropping so slowly with gap that the dominant term is actually the (C/(1 − hb(〈Pe〉)) − C) term in (16).
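The decomposition (16) is easy to evaluate for concrete numbers; the following sketch (ours; the function name is illustrative) splits the total gap into its two terms:

```python
import math

def hb(x):
    """Binary entropy (bits)."""
    return 0.0 if x <= 0 or x >= 1 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def gap_decomposition(C, R, pe):
    """Split the total gap C/(1 - hb(<Pe>)) - R of (16) into the term
    contributed by lossy pre-compression and the intuitive gap C - R."""
    total = C / (1 - hb(pe)) - R
    lossy_term = C / (1 - hb(pe)) - C
    return total, lossy_term, C - R
```

For 〈Pe〉 = gap^β with small β, the lossy-compression term dominates the intuitive gap C − R, matching the discussion above.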
C. Lower bounds on the required maximum neighborhood size as the rate approaches capacity
1) A simple lower bound: Given a crossover probability p, there exists a semi-trivial bound on the neighborhood size that depends only on 〈Pe〉. Since there is at least one configuration of the neighborhood that will decode to an incorrect value for this bit, it is clear that
\[
\langle P_e \rangle \geq p^n. \tag{17}
\]
However, this bound does not have any dependence on the rate and so does not capture the fact that the
complexity should increase as the gap shrinks.
A similar bound can be derived for general channels. The above bound implicitly allows for repetition
coding in that it assumes that each codeword symbol encodes information contained only for the particular
bit. This concept can be extended to general channels. If the entire neighborhood of Bi were to encode
only the information for Bi, this scheme has a corresponding rate of 1n for a neighborhood of size n.
For large n, we can use the sphere-packing bound to obtain the following lower bound on the bit-error
probability
〈Pe〉 & e−nEsp(1
n) ≥ e−nEsp(0). (18)
The bound is clearly loose, because it requires the entire neighborhood to encode just one bit.
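Inverting (17) gives an explicit, if weak, lower bound on the neighborhood size; a minimal sketch (ours, not the paper's):

```python
import math

def min_neighborhood_size(p, pe_target):
    """Smallest n consistent with the bound <Pe> >= p^n for a BSC(p)."""
    # p^n <= pe_target  <=>  n >= ln(1/pe_target) / ln(1/p)
    return math.ceil(math.log(1.0 / pe_target) / math.log(1.0 / p))

# Example: BSC(0.05) and a target average bit-error probability of 1e-6.
n_min = min_neighborhood_size(0.05, 1e-6)
assert n_min == 5                                 # 0.05^5 ~ 3.1e-7 <= 1e-6
assert 0.05 ** n_min <= 1e-6 < 0.05 ** (n_min - 1)
```

As the text notes, this bound does not grow as the rate approaches capacity; the rate-dependent bounds below are much stronger.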
2) Lower bound on n as the rate approaches capacity and 〈Pe〉 approaches 0: Section III-A suggests that the dominant terms governing the number of iterations as we approach capacity are log_α ln(1/〈Pe〉) + 2 log_α(1/gap). However, the argument of the previous section requires that 〈Pe〉 → 0 for a finite nonzero gap. The form of the expression, however, suggests that unless 〈Pe〉 goes to zero very rapidly, the dominant term could be the 2 log_α(1/gap). To verify this, in this section we consider a joint path to capacity with 〈Pe〉 = gap^β.

Theorem 4: For the problem as stated in Section II, the following lower bounds hold on the required neighborhood size n for 〈Pe〉 = gap^β and gap → 0. For the BSC and the AWGN channel,
• for β < 1, n = Ω( 1/gap^{2β−ν} ),
• for β ≥ 1, n = Ω( 1/gap^{2−ν} ),
for any ν > 0. The same scaling laws hold for LDPC codes of average degree d used over the BSC, with gap now being the gap from Gallager's upper bound C = 1 − hb(p)/hb(pd), where pd is defined in Theorem 2. For sparse-graph codes under iterative decoding with threshold p∗ and δ3 the minimum average bit-error probability above threshold, the scaling is n = Ω( 1/gap_p^{2−ν} ) for all β ≥ 0, where gap_p = p∗ − p is the gap of the crossover probability from the channel threshold.
Proof: We give the proof here for the case of the BSC, with some details relegated to the Appendix. The AWGN case follows analogously, with some small modifications that are detailed in Appendix E.

Let the code for the given BSC(p) have rate R. Consider BSC test channels G chosen so that C(G) < R, where C(·) maps a BSC to its capacity in bits per channel use. Taking log2(·) on both sides of (1) (for a fixed g),

log2(〈Pe〉) ≥ log2( hb^{-1}(δ(G)) ) − 1 − nD(g||p) − ε√n log2( g(1−p)/(p(1−g)) ). (19)

Rewriting (19),

nD(g||p) + ε√n log2( g(1−p)/(p(1−g)) ) + log2(〈Pe〉) − log2( hb^{-1}(δ(G)) ) + 1 ≥ 0. (20)

This inequality is quadratic in √n. The LHS potentially has two roots. If the roots are not real, then the expression is always positive, and we get only the trivial lower bound √n ≥ 0. Therefore, the cases of interest are when the two roots are real. The larger of the two roots is then a lower bound on √n.

Denoting the coefficient of n by a = D(g||p), that of √n by b = ε log2( g(1−p)/(p(1−g)) ), and the constant terms by c = log2(〈Pe〉) − log2( hb^{-1}(δ(G)) ) + 1 in (20), the quadratic formula then reveals

√n ≥ ( −b + √(b² − 4ac) ) / (2a). (21)

Since the lower bound holds for all g satisfying C(G) < R = C − gap, we substitute g* = p + gap^r, for some r < 1 and small gap. This choice is motivated by examining Fig. 1. The constraint r < 1 is imposed because it ensures C(g*) < R for small enough gap.
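Before specializing g, the root in (21) can be evaluated numerically for concrete channel parameters. The sketch below is our own illustration, not the paper's code: hb^{-1} is computed by bisection, and K(g) from (4) is approximated by a grid minimization (both are assumptions of this sketch).

```python
import math

def hb(x):
    """Binary entropy in bits."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def hb_inv(y):
    """Inverse of hb on [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if hb(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dkl(g, p):
    """KL divergence D(g||p) between Bernoulli(g) and Bernoulli(p), in bits."""
    return g * math.log2(g / p) + (1 - g) * math.log2((1 - g) / (1 - p))

def K(g):
    """Grid approximation of K(g) = inf_{0 < eta <= 1-g} D(g+eta||g)/eta^2."""
    vals = [dkl(g + (1 - g) * i / 2000.0, g) / ((1 - g) * i / 2000.0) ** 2
            for i in range(1, 2000)]
    vals.append(math.log2(1.0 / g) / (1 - g) ** 2)   # endpoint eta = 1 - g
    return min(vals)

def n_lower_bound(p, g, rate, pe):
    """Lower bound on the neighborhood size n from the quadratic (20)-(21)."""
    delta = 1 - (1 - hb(g)) / rate                 # delta(G) = 1 - C(g)/R
    x = hb_inv(delta)
    eps = math.sqrt(math.log2(2.0 / x) / K(g))     # eps(x) from (40)
    a = dkl(g, p)                                  # coefficient of n
    b = eps * math.log2(g * (1 - p) / (p * (1 - g)))   # coefficient of sqrt(n)
    c = math.log2(pe) - math.log2(x) + 1.0         # constant term
    disc = b * b - 4 * a * c
    if disc < 0:
        return 0.0                                 # only the trivial bound
    root = (-b + math.sqrt(disc)) / (2 * a)        # larger root: bound on sqrt(n)
    return max(root, 0.0) ** 2

# Example: BSC(0.1) (capacity ~0.531), a rate-0.5 code, test channel g = 0.13.
n_weak = n_lower_bound(0.1, 0.13, 0.5, 1e-4)
n_strong = n_lower_bound(0.1, 0.13, 0.5, 1e-8)
assert n_weak > 0 and n_strong > n_weak   # the bound grows as <Pe> shrinks
```

Tightening the target 〈Pe〉 from 1e-4 to 1e-8 visibly increases the bound, consistent with the asymptotics derived below.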
Lemma 1: In the limit gap → 0, for g* = p + gap^r to satisfy C(g*) < R, it suffices that r be less than 1.

Proof:

C(g*) = C(p + gap^r)
= C(p) + gap^r × C′(p) + o(gap^r)
≤ C(p) − gap = R,

for small enough gap and r < 1. The final inequality holds since C(p) is a monotonically decreasing concave-∩ function of p for a BSC with p < 1/2, and since gap^r dominates gap (indeed, any linear function of gap) when gap is small enough.
In steps, we now Taylor-expand the terms on the LHS of (20) about g = p.
[Figure 1: goptvsgap.pdf]

Fig. 1. The behavior of g*, the optimizing value of g for the bound in Theorem 1, with gap. We plot log(g_opt − p) vs log(gap). The resulting straight lines inspired the substitution g* = p + gap^r.
Lemma 2 (Bounds on hb(·) and hb^{-1}(·), from [39]): For all d > 1, and for all x ∈ [0, 1/2] and y ∈ [0, 1],

hb(x) ≥ 2x, (22)

hb(x) ≤ 2x^{1−1/d} d/ln(2), (23)

hb^{-1}(y) ≥ ( y ln(2)/(2d) )^{d/(d−1)}, (24)

hb^{-1}(y) ≤ y/2. (25)
Proof: See Appendix D-A.
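Since Lemma 2 is used repeatedly in what follows, here is a quick numerical spot-check (ours) of (22)–(25) on a grid. The checks for (24) and (25) are rephrased through hb itself to avoid computing hb^{-1}: since hb is increasing on [0, 1/2], hb^{-1}(y) ≥ z is equivalent to hb(z) ≤ y.

```python
import math

def hb(x):
    """Binary entropy in bits."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

ln2 = math.log(2)
for d in (2, 4, 10):
    for i in range(1, 500):
        x = i / 1000.0                                    # grid over (0, 1/2)
        assert hb(x) >= 2 * x                             # (22)
        assert hb(x) <= 2 * x ** (1 - 1.0 / d) * d / ln2  # (23)
    for j in range(1, 100):
        y = j / 100.0                                     # grid over (0, 1)
        z = (y * ln2 / (2 * d)) ** (d / (d - 1.0))
        assert z >= 0.5 or hb(z) <= y                     # (24): hb_inv(y) >= z
        assert hb(y / 2.0) >= y                           # (25): hb_inv(y) <= y/2
```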
Lemma 3:

(d/(d−1)) r log2(gap) − 1 + K1 + o(1) ≤ log2( hb^{-1}(δ(g*)) ) ≤ r log2(gap) − 1 + K2 + o(1), (26)

where K1 = (d/(d−1)) ( log2( hb′(p)/C(p) ) + log2( ln(2)/d ) ) with d > 1 arbitrary, and K2 = log2( hb′(p)/C(p) ).
Proof: See Appendix D-B.
Lemma 4:

D(g*||p) = [ gap^{2r} / (2p(1−p) ln(2)) ] (1 + o(1)). (27)
Proof: See Appendix D-C.
Lemma 5:

log2( g*(1−p)/(p(1−g*)) ) = [ gap^r / (p(1−p) ln(2)) ] (1 + o(1)).
Proof: See Appendix D-D.
Lemma 6:

√( r/K(p) ) √( log2(1/gap) ) (1 + o(1)) ≤ ε ≤ √( rd/((d−1)K(p)) ) √( log2(1/gap) ) (1 + o(1)),

where K(p) is from (4).
Proof: See Appendix D-E.
If c < 0, then the bound (21) is guaranteed to be positive. For 〈Pe〉 = gap^β, the condition c < 0 is equivalent to

β log2(gap) − log2( hb^{-1}(δ(g*)) ) + 1 < 0. (28)

Since we want (28) to be satisfied for all small enough values of gap, we can use the approximations in Lemmas 3–6 and ignore constants to immediately arrive at the following sufficient condition:

β log2(gap) − (d/(d−1)) r log2(gap) < 0, i.e., r < β(d−1)/d,

where d can be made arbitrarily large. Now, using the approximations in Lemma 3 and Lemma 5, and substituting them into (21), we can evaluate the solution of the quadratic equation. As shown in Appendix D-F, this gives the following lower bound on n:

n ≥ Ω( log2(1/gap) / gap^{2r} ), (29)

for any r < min{β, 1}. The first part of Theorem 4 follows.
For LDPC codes with bounded average degree, we only need to verify that δ2(G) in Theorem 2 behaves like δ(G) as gap → 0 (note that the definition of gap for LDPC codes is taken to be the gap from Gallager's upper bound rather than the channel capacity). This can be seen easily as follows:
δ2(d,G) = hb(gd) − (hb(gd) − hb(g))/R
= hb(gd) [ 1 − (1 − hb(g)/hb(gd))/R ]
= hb(gd) ( 1 − C(g)/(C(p) − gap) ), (30)

where, in this calculation, C(g) = 1 − hb(g)/hb(gd) denotes Gallager's upper bound at crossover probability g, and R = C(p) − gap.
Now, for g* = p + gap^r with r < 1,

C(g*) = C(p + gap^r) = C(p) + gap^r C′(p) + o(gap^r),

where C′(p) < 0. Thus,

C(g*)/(C(p) − gap) = [ C(p) + gap^r C′(p) + o(gap^r) ] / [ C(p)(1 − gap/C(p)) ]
= ( 1 + gap^r C′(p)/C(p) + o(gap^r) ) ( 1 + gap/C(p) + o(gap) )
= 1 + gap^r C′(p)/C(p) + o(gap^r), (31)
since r < 1. Expanding gd,

gd = [ 1 − (1 − 2p − 2gap^r)^d ] / 2
= [ 1 − (1 − 2p)^d (1 − 2gap^r/(1 − 2p))^d ] / 2
= [ 1 − (1 − 2p)^d (1 − 2d gap^r/(1 − 2p) + o(gap^r)) ] / 2
= pd + (1 − 2p)^{d−1} d gap^r + o(gap^r).

Thus,

hb(gd) = hb(pd) + hb′(pd)(1 − 2p)^{d−1} d gap^r + o(gap^r), (32)

which approaches hb(pd) as gap → 0. Plugging (32) and (31) into (30), it is clear that δ2(d,G) also behaves like δ(G) (i.e., it is Θ(gap^r)).
For sparse-graph codes under iterative decoding, choose gap_p = p∗ − p, the gap from the threshold. Again, g = p + gap_p^r ensures that g > p∗ for small enough gap_p. Taking log2(·) on both sides of (8) for the chosen g,

nD(g||p) + ε3 √n log2( g(1−p)/(p(1−g)) ) + log2(〈Pe〉) − log2( hb^{-1}(δ3) ) + 1 ≥ 0. (33)

Observe that δ3 is a constant that does not depend on the desired 〈Pe〉. For small enough 〈Pe〉 (regardless of gap_p), the constant term c = log2(〈Pe〉) − log2( hb^{-1}(δ3) ) + 1 in the quadratic is negative. Thus the neighborhood size scales as Ω( 1/gap_p^{2−ν} ) for all ν > 0 and all β as gap_p → 0 (i.e., even for a constant 〈Pe〉 satisfying c < 0, the scaling holds as gap_p → 0).
The lower bound on the neighborhood size n can immediately be converted into a lower bound on the minimum number of computational iterations by simply taking log_α(·). Note that this is not a comment about the degree of a potential sparse graph that defines the code. This is just about the maximum degree of the decoder's computational nodes, and is a bound on the number of computational iterations required to hit the desired average probability of error.
The lower bounds are plotted in Fig. 2 for various values of β, and reveal a log(1/gap) scaling for the required number of iterations when the decoder has bounded degree for message passing. This is much larger than the trivial lower bound of log log(1/gap), but is much smaller than the Khandekar–McEliece conjectured 1/gap or (1/gap) log2(1/gap) scaling for the number of iterations required to traverse such paths toward certainty at capacity.
D. Lower bounds on the average neighborhood size
A lower bound on the maximum neighborhood size may be too weak in certain situations. For example, it does not rule out decoders in which just one bit has a large neighborhood while all the other bits have neighborhoods of size 1! In this section we derive lower bounds on the average neighborhood size, which show that while the average neighborhood size may scale more slowly than the maximum neighborhood size, it is still unbounded as gap → 0 and 〈Pe〉 → 0. The main lower bound and its scaling for the BSC are stated in the following theorem. To avoid repetition, we do not include the lower bound for the AWGN channel, since it can be derived analogously.
Theorem 5: For a BSC with crossover probability p < 1/2, the following lower bound holds on the average probability of bit error for a given average neighborhood size n = (1/k) Σ_{i=1}^k ni:

〈Pe〉 ≥ sup_{C^{-1}(R)<g≤1/2} [ (hb^{-1}(δ(G)))² / 8 ] · 2^{ −(4n/hb^{-1}(δ(G))) D(g||p) } · ( p(1−g)/(g(1−p)) )^{ ε( hb^{-1}(δ(G))/2 ) √( 4n/hb^{-1}(δ(G)) ) }, (34)

for neighborhood sizes ni fixed in advance. The definitions of ε(·) and D(·||·) are as in Theorem 1. The following expressions lower bound the asymptotic scaling of n for 〈Pe〉 = gap^β and gap → 0:
• For β < 2, n = Ω( 1/gap^{β/2−ν} );
[Figure 2: variousbeta.pdf]

Fig. 2. Lower bounds on the neighborhood size vs the gap to capacity for 〈Pe〉 = gap^β, for various values of β. The curve titled "balanced gaps" (see (16)) shows the behavior for C/(1 − hb(〈Pe〉)) − C = C − R. The curves are plotted by brute-force optimization of (1), but reveal slopes that are as predicted in Theorem 4.
• For β ≥ 2, n = Ω( 1/gap^{1−ν} ),
for any ν > 0.
Proof: See Appendix F.
IV. DISCUSSIONS AND CONCLUSIONS
We gave lower bounds on the required neighborhood size for a specified code performance, without making any assumptions on the code structure. Decoding based on a fixed neighborhood size may seem *********************** TO BE COMPLETED ************* The bounds are order optimal: consider a random block-code with neighborhood size equal to the block-length. Then clearly, n ≤ log2(1/〈Pe〉) / Er(R), where Er(R) is the random coding error exponent [30]. The main implication of the proofs above is then that allowing for different neighborhood sizes, instead of using the entire block, does not improve the behavior of the error exponent at high rates.
For decoding algorithms that operate on regular graphs (e.g. regular LDPC codes), these bounds can
be used to obtain lower bounds on the required number of iterations to achieve a certain performance.
The scaling of the bound with the error probability and gap simplifies to

l ≳ [ 1/log2(α − 1) ] · log2( log2(1/〈Pe〉) / gap² ). (35)
As a function of 〈Pe〉, there exist codes that attain this double-logarithmic scaling. In particular, regular LDPC codes attain this behavior. However, it is well known that regular LDPC codes do not approach the channel capacity for any memoryless channel under belief-propagation decoding, and thus they operate at a substantial gap from capacity. In contrast, there exist irregular LDPC codes and other families of codes (e.g., IRA codes, ARA codes) that achieve capacity, at least for the BEC. These codes, however, require the number of iterations to scale as log2(1/〈Pe〉), and thus require a large number of iterations at low error probability. Interestingly, recent advances by Lentmaier et al. suggest that there exist codes that might approach capacity with the number of iterations scaling double-logarithmically in 1/〈Pe〉.
If the objective is to minimize the energy used per bit (under a rate constraint), then at short distances a small gap from capacity may be undesirable. This direction has been explored in [27], [28], where we account for the decoding energy as well. Using a VLSI model of decoding similar to the one used by Thompson in [35], [40], we show that the transmit energy needs to be at a certain gap from that predicted by Shannon theory in order for the decoding complexity (and hence the decoding energy) to be small. Since there is little advantage in approaching capacity in such situations, regular LDPCs may offer a class of codes that is well suited for short-distance communication.
Technical tools developed in this paper advance those in [36] to neighborhood sizes (instead of delay) and to average bit-error probability (instead of block-error probability) in non-asymptotic cases. The results on the scaling of the average neighborhood size also yield new results for average delay, since decoding with bounded delay is a special case of decoding with bounded neighborhood sizes. These tools are then further advanced in [41] to obtain bounds on distortion for Witsenhausen's counterexample and its vector extensions in distributed control.
APPENDIX A
PROOF OF THEOREM 1: LOWER BOUND ON 〈Pe〉 FOR THE BSC
The derivation of the lower bound can be divided into the following steps: we first show that the average probability of error for any code must be significant if the (test) channel were such that its capacity fell below the rate. Then, a mapping is given that maps the probability of an individual error event under the test channel to a lower bound on its probability under the true channel. This mapping is shown to be convex-∪ in the probability of error. The convexity allows us to obtain a lower bound on the average probability of error under the true channel.
Proof of Theorem 1: Suppose we run the given encoder and decoder over a test channel G instead.
Lemma 7 (Lower bound on 〈Pe〉 under test channel G): If a rate-R code is used over a channel G with C(G) < R, then the average probability of bit error satisfies

〈Pe〉G ≥ hb^{-1}(δ(G)), (36)

where δ(G) = 1 − C(G)/R. This holds for any channel model G, not just BSCs.
Proof: See Appendix A-A.
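As a quick sanity check of Lemma 7 (our own illustration, not an argument from the paper), consider a rate-1/5 repetition code with majority decoding over a BSC(g) whose capacity lies below the rate. The decoder's exact bit-error probability indeed dominates hb^{-1}(δ(G)):

```python
import math

def hb(x):
    """Binary entropy in bits."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def hb_inv(y):
    """Inverse of hb on [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if hb(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def majority_error(n, g):
    """Exact error probability of an n-fold repetition code with majority
    decoding over BSC(g), for odd n."""
    return sum(math.comb(n, k) * g ** k * (1 - g) ** (n - k)
               for k in range((n + 1) // 2, n + 1))

n, g = 5, 0.45                      # rate R = 1/5, test channel BSC(0.45)
R = 1.0 / n
cap = 1 - hb(g)                     # C(G) ~ 0.007 < R, so Lemma 7 applies
assert cap < R
fano_bound = hb_inv(1 - cap / R)    # right side of (36)
assert majority_error(n, g) >= fano_bound
```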
Let b_1^k denote the entire message, and let x_1^m be the corresponding codeword. Let the common randomness available to the encoder and decoder be denoted by the random variable U, and its realizations by u.

Consider the i-th message bit Bi. Its decoding is performed by observing a particular decoding neighborhood³ of channel outputs y_{nbd,i}^n. The corresponding channel inputs are denoted by x_{nbd,i}^n, and the relevant channel noise by z_{nbd,i}^n = x_{nbd,i}^n ⊕ y_{nbd,i}^n, where ⊕ denotes modulo-2 addition. The decoder simply checks whether the observed y_{nbd,i}^n ∈ D_{y,i}(0, u) to decode to B̂i = 0, or whether y_{nbd,i}^n ∈ D_{y,i}(1, u) to decode to B̂i = 1.

For given x_{nbd,i}^n, the error event is equivalent to z_{nbd,i}^n falling in a decoding region D_{z,i}(x_{nbd,i}^n, b_1^k, u) = D_{y,i}(1 ⊕ bi, u) ⊕ x_{nbd,i}^n. Thus, by the linearity of expectations, (36) can be rewritten as

(1/k) Σ_i (1/2^k) Σ_{b_1^k} Σ_u Pr(U = u) Pr_G( Z_{nbd,i}^n ∈ D_{z,i}(x_{nbd,i}^n(b_1^k, u), b_1^k, u) ) ≥ hb^{-1}(δ(G)). (37)
The following lemma gives a lower bound on the probability of an event under the channel BSC(p), given a lower bound on its probability under the channel G.

Lemma 8: Let A be a set of BSC(p) channel-noise realizations z_1^n such that Pr_G(A) = δ. Then

Pr(A) ≥ fG(δ), (38)
3For any given decoder implementation, the size of the decoding neighborhood might be different for different bits i. However,
for notational clarity, we assume that all the neighborhoods are of the same size n corresponding to the largest possible
neighborhood size. This can be assumed without loss of generality since smaller decoding neighborhoods can be supplemented
with additional channel outputs that are ignored by the decoder.
where

fG(x) = (x/2) · 2^{−nD(g||p)} · ( p(1−g)/(g(1−p)) )^{ε(x)√n} (39)

is a convex-∪ increasing function of x, and

ε(x) = √( (1/K(g)) log2(2/x) ). (40)
Proof: See Appendix A-B.
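The monotonicity and convexity claimed in Lemma 8 are easy to check numerically. The following sketch is our own illustration; K(g) is approximated on a grid, which slightly perturbs fG but not its qualitative shape:

```python
import math

def dkl(g, p):
    """KL divergence D(g||p) between Bernoulli(g) and Bernoulli(p), in bits."""
    return g * math.log2(g / p) + (1 - g) * math.log2((1 - g) / (1 - p))

def K(g):
    """Grid approximation of K(g) = inf_{0 < eta <= 1-g} D(g+eta||g)/eta^2."""
    vals = [dkl(g + (1 - g) * i / 2000.0, g) / ((1 - g) * i / 2000.0) ** 2
            for i in range(1, 2000)]
    vals.append(math.log2(1.0 / g) / (1 - g) ** 2)
    return min(vals)

def f_G(x, n, g, p, Kg):
    """The mapping (39): a lower bound under BSC(p) on the probability of an
    event that has probability x under BSC(g)."""
    eps = math.sqrt(math.log2(2.0 / x) / Kg)
    return (x / 2.0) * 2.0 ** (-n * dkl(g, p)) \
        * (p * (1 - g) / (g * (1 - p))) ** (eps * math.sqrt(n))

g, p, n = 0.2, 0.1, 50
Kg = K(g)
xs = [i / 1000.0 for i in range(1, 1000)]
vals = [f_G(x, n, g, p, Kg) for x in xs]
# Monotonically increasing ...
assert all(b > a for a, b in zip(vals, vals[1:]))
# ... and convex-cup: each point lies below the average of its neighbors.
assert all(vals[i] <= 0.5 * (vals[i - 1] + vals[i + 1]) * (1 + 1e-9)
           for i in range(1, len(xs) - 1))
```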
Applying Lemma 8 in the style of (37) tells us that:

〈Pe〉 = (1/k) Σ_i (1/2^k) Σ_{b_1^k} Σ_u Pr(U = u) Pr( Z_{nbd,i}^n ∈ D_{z,i}(x_{nbd,i}^n(b_1^k, u), b_1^k, u) )
≥ (1/k) Σ_i (1/2^k) Σ_{b_1^k} Σ_u Pr(U = u) fG( Pr_G( Z_{nbd,i}^n ∈ D_{z,i}(x_{nbd,i}^n(b_1^k, u), b_1^k, u) ) ). (41)

Since the increasing function fG(·) is also convex-∪, (41) and (37) imply that

〈Pe〉 ≥ fG( (1/k) Σ_i (1/2^k) Σ_{b_1^k} Σ_u Pr(U = u) Pr_G( Z_{nbd,i}^n ∈ D_{z,i}(x_{nbd,i}^n(b_1^k, u), b_1^k, u) ) ) ≥ fG( hb^{-1}(δ(G)) ).

This proves Theorem 1.
At the cost of slightly more complicated notation, by following the techniques in [36], similar results can be proved for decoding over any discrete memoryless channel by using Hoeffding's inequality in place of the Chernoff bounds used here in the proof of Lemma 8. In place of the KL-divergence term D(g||p), for a general DMC the arguments give rise to a term max_x D(G_x||P_x) that picks out the channel input letter that maximizes the divergence between the two channels' outputs. For output-symmetric channels, the combination of these terms and the outer maximization over channels G with capacity less than R means that the divergence term behaves like the standard sphere-packing bound when n is large. When the channel is not output-symmetric (in the sense of [30]), the resulting divergence term behaves like the Haroutunian bound for fixed-blocklength coding over DMCs with feedback [42].
A. Proof of Lemma 7: A lower bound on 〈Pe〉G.
The lemma can be proved using tools from rate-distortion theory. We give an alternative proof that is more closely related to Fano's inequality.
Proof:
H(B_1^k) − H(B_1^k|Y_1^m) = I(B_1^k; Y_1^m) ≤ I(X_1^m; Y_1^m) ≤ mC(G).

Since the Ber(1/2) message bits are iid, H(B_1^k) = k. Therefore,

(1/k) H(B_1^k|Y_1^m) ≥ 1 − C(G)/R. (42)

Suppose the message bit sequence is decoded to be B̂_1^k. Denote the error sequence by B̃_1^k. Then,

B̃_1^k = B_1^k ⊕ B̂_1^k, (43)

where the addition ⊕ is modulo 2. The only complication is the possible randomization of both the encoder and decoder. However, note that even with randomization, the true message B_1^k is independent of B̂_1^k conditioned on Y_1^m. Thus,

H(B̃_1^k|Y_1^m) = H(B_1^k ⊕ B̂_1^k|Y_1^m)
= H(B_1^k ⊕ B̂_1^k|Y_1^m) + I(B_1^k; B̂_1^k|Y_1^m)
= H(B_1^k ⊕ B̂_1^k|Y_1^m) − H(B_1^k|Y_1^m, B̂_1^k) + H(B_1^k|Y_1^m)
= H(B_1^k ⊕ B̂_1^k|Y_1^m) − H(B_1^k ⊕ B̂_1^k|Y_1^m, B̂_1^k) + H(B_1^k|Y_1^m)
= I(B_1^k ⊕ B̂_1^k; B̂_1^k|Y_1^m) + H(B_1^k|Y_1^m)
≥ H(B_1^k|Y_1^m)
≥ k (1 − C(G)/R).

Here, the second equality holds because the added conditional mutual information is zero by the independence noted above, and the fourth holds because, conditioned on B̂_1^k, the map B_1^k ↦ B_1^k ⊕ B̂_1^k is a bijection. By the chain rule and subadditivity of entropy, this implies

(1/k) Σ_{i=1}^k H(B̃_i|Y_1^m) ≥ 1 − C(G)/R. (44)

Since conditioning reduces entropy, H(B̃_i) ≥ H(B̃_i|Y_1^m). Therefore,

(1/k) Σ_{i=1}^k H(B̃_i) ≥ 1 − C(G)/R. (45)

Since the B̃_i are binary random variables, H(B̃_i) = hb(〈Pe,i〉G), where hb(·) is the binary entropy function. Since hb(·) is a concave-∩ function, hb^{-1}(·) is convex-∪ when restricted to output values in [0, 1/2]. Thus, (45) together with Jensen's inequality implies the desired result (36).
B. Proof of Lemma 8: a lower bound on 〈Pe,i〉 as a function of 〈Pe,i〉G.
Proof: First, consider a strongly G-typical set of z_{nbd,i}^n, given by

T_{ε,G} = { z_1^n : Σ_{i=1}^n zi − ng ≤ ε√n }. (46)

In words, T_{ε,G} is the set of noise sequences with weight smaller than ng + ε√n. The probability of an event A can be bounded using

δ = Pr_G(Z_1^n ∈ A)
= Pr_G(Z_1^n ∈ A ∩ T_{ε,G}) + Pr_G(Z_1^n ∈ A ∩ T_{ε,G}^c)
≤ Pr_G(Z_1^n ∈ A ∩ T_{ε,G}) + Pr_G(Z_1^n ∈ T_{ε,G}^c).

Consequently,

Pr_G(Z_1^n ∈ A ∩ T_{ε,G}) ≥ δ − Pr_G(T_{ε,G}^c). (47)
We now need the following lemma.

Lemma 9: The probability of the atypical set of Bernoulli-g channel noise Zi is bounded above by

Pr_G( (Σ_{i=1}^n Zi − ng)/√n > ε ) ≤ 2^{−K(g)ε²}, (48)

where K(g) = inf_{0<η≤1−g} D(g+η||g)/η².
Proof: See Appendix A-C.
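Lemma 9 can be spot-checked against exact binomial tail probabilities. In the sketch below (ours), K(g) is approximated by a grid minimization; since the grid minimum can only overestimate the infimum, the asserted inequality is, if anything, stronger than the lemma's:

```python
import math

def dkl(g, p):
    """KL divergence D(g||p) between Bernoulli(g) and Bernoulli(p), in bits."""
    return g * math.log2(g / p) + (1 - g) * math.log2((1 - g) / (1 - p))

def K(g):
    """Grid approximation of K(g) = inf_{0 < eta <= 1-g} D(g+eta||g)/eta^2."""
    vals = [dkl(g + (1 - g) * i / 2000.0, g) / ((1 - g) * i / 2000.0) ** 2
            for i in range(1, 2000)]
    vals.append(math.log2(1.0 / g) / (1 - g) ** 2)   # endpoint eta = 1 - g
    return min(vals)

def binom_tail_above(n, g, t):
    """P(X > t) for X ~ Binomial(n, g), computed exactly."""
    return sum(math.comb(n, k) * g ** k * (1 - g) ** (n - k)
               for k in range(math.floor(t) + 1, n + 1))

g, n = 0.2, 200
Kg = K(g)
for eps in (0.5, 1.0, 2.0):
    exact = binom_tail_above(n, g, n * g + eps * math.sqrt(n))
    assert exact <= 2.0 ** (-Kg * eps ** 2)     # the bound (48)
```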
Choose ε such that

2^{−K(g)ε²} = δ/2, i.e., ε² = (1/K(g)) log2(2/δ). (49)

Thus (47) becomes

Pr_G(Z_1^n ∈ A ∩ T_{ε,G}) ≥ δ/2. (50)

Let n_{z_1^n} denote the number of ones in z_1^n. Then,

Pr_G(Z_1^n = z_1^n) = g^{n_{z_1^n}} (1−g)^{n−n_{z_1^n}}. (51)
This allows us to lower bound the probability of A under the underlying channel as follows:

Pr(Z_1^n ∈ A) ≥ Pr(Z_1^n ∈ A ∩ T_{ε,G})
= Σ_{z_1^n ∈ A∩T_{ε,G}} [ Pr(z_1^n)/Pr_G(z_1^n) ] Pr_G(z_1^n)
= Σ_{z_1^n ∈ A∩T_{ε,G}} [ p^{n_{z_1^n}}(1−p)^{n−n_{z_1^n}} / ( g^{n_{z_1^n}}(1−g)^{n−n_{z_1^n}} ) ] Pr_G(z_1^n)
≥ [ (1−p)^n/(1−g)^n ] ( p(1−g)/(g(1−p)) )^{ng+ε√n} Σ_{z_1^n ∈ A∩T_{ε,G}} Pr_G(z_1^n)
= [ (1−p)^n/(1−g)^n ] ( p(1−g)/(g(1−p)) )^{ng+ε√n} Pr_G(A ∩ T_{ε,G})
≥ (δ/2) · 2^{−nD(g||p)} ( p(1−g)/(g(1−p)) )^{ε√n}.

This results in the desired expression:

fG(x) = (x/2) · 2^{−nD(g||p)} ( p(1−g)/(g(1−p)) )^{ε(x)√n}, (52)
where ε(x) = √( (1/K(g)) log2(2/x) ). To see the convexity-∪ of fG(x), it is useful to apply some substitutions. Let c1 = 2^{−nD(g||p)}/2 > 0 and let ξ = √( n/(K(g) ln 2) ) · ln( p(1−g)/(g(1−p)) ). Notice that ξ < 0, since the term inside the ln is less than 1. Then fG(x) = c1 x exp( ξ √(ln 2 − ln x) ).

Differentiating fG(x) once results in

fG′(x) = c1 exp( ξ √(ln(2) + ln(1/x)) ) ( 1 − ξ/( 2√(ln(2) + ln(1/x)) ) ). (53)

By inspection, fG′(x) > 0 for all 0 < x < 1, and thus fG(x) is a monotonically increasing function.

Differentiating fG(x) twice with respect to x gives

fG″(x) = [ −ξ c1 exp( ξ √(ln(2) + ln(1/x)) ) / ( 2x √(ln(2) + ln(1/x)) ) ] · ( 1 + 1/( 2(ln(2) + ln(1/x)) ) − ξ/( 2√(ln(2) + ln(1/x)) ) ). (54)

Since ξ < 0, it is evident that all the terms in (54) are strictly positive. Therefore, fG(·) is convex-∪.
C. Proof of Lemma 9: Bernoulli Chernoff bound
Proof: Recall that the Zi are iid Bernoulli random variables with mean g ≤ 1/2. Then

Pr( Σ_i(Zi − g)/√n ≥ ε ) = Pr( Σ_i(Zi − g)/n ≥ ε̄ ), (55)

where ε̄ = ε/√n, so that n = ε²/ε̄². Therefore, by a Chernoff bound,

Pr( Σ_i(Zi − g)/√n ≥ ε ) ≤ [ ((1−g) + g exp(s)) × exp(−s(g + ε̄)) ]^n for all s ≥ 0. (56)

Choose s satisfying

exp(−s) = [ g/(1−g) ] × ( 1/(g + ε̄) − 1 ). (57)

It is safe to assume that g + ε̄ ≤ 1 since otherwise the relevant probability is 0 and any bound will work. Substituting (57) into (56) gives

Pr( Σ_i(Zi − g)/√n ≥ ε ) ≤ 2^{ −( D(g+ε̄||g)/ε̄² ) ε² }.

This bound holds under the constraint ε²/ε̄² = n. To obtain a bound that holds uniformly for all n, we fix ε and take the supremum over all possible values of ε̄:

Pr( Σ_i(Zi − g)/√n ≥ ε ) ≤ sup_{0<ε̄≤1−g} exp( −ln(2) ( D(g+ε̄||g)/ε̄² ) ε² )
≤ exp( −ln(2) ε² inf_{0<ε̄≤1−g} D(g+ε̄||g)/ε̄² ),

giving us the desired bound.
APPENDIX B
PROOF OF THEOREM 2
We use Gallager’s technique from [15, Pg. 38] to upper bound the mutual information I(Xm1 ,Y
m1 )
under channel behavior G.
I(Xm1 ,Y
m1 ) = H(Ym
1 )−H(Ym1 |Xm
1 )
(a)
≤ mR+m(1−R)hb(gd)−mH(Yi|Xi)
= mR+m(1−R)hb(gd)−mhb(g). (58)
The crucial part of Gallager’s proof is used to derive (a) above, and is detailed in the following. The
output vector Ym1 can alternatively be specified by providing Y at any mR linearly independent positions
in the code (say YmR1 , and the syndrome vector S
m(1−R)1 . Now, using the chain rule and the fact that
conditioning reduces entropy,
H(Ym1 ) ≤ H(YmR
1 ) +H(Sm(1−R)1 ). (59)
The entropy of YmR1 ≤ mR, giving the first term in (58) (a) above. The entropy of the syndrome vector
Sm(1−R)1 can be upper bounded by
∑m(1−R)i=1 H(Si). Probability of Si being 1 is the probability that
there is an odd number of errors among the d output values that form the check set. Thus Pr(Si = 1) = gd, and the second term in (a) follows. Now (42) changes to

(1/k) H(B_1^k|Y_1^m) ≥ 1 − [ mR + m(1−R) hb(gd) − m hb(g) ]/k
= hb(gd) − ( hb(gd) − hb(g) )/R.
The rest of the proof is the same as that for Theorem 1.
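The syndrome-bit probability Pr(Si = 1) = gd used above is the probability of an odd number of flips in a degree-d check set, with the closed form (1 − (1−2g)^d)/2; a quick check against the direct sum over odd-weight patterns (our sketch):

```python
import math

def p_odd_closed(d, g):
    """Closed form for the probability of an odd number of flips among d
    iid Bernoulli(g) noise bits: (1 - (1-2g)^d)/2."""
    return (1.0 - (1.0 - 2.0 * g) ** d) / 2.0

def p_odd_direct(d, g):
    """Direct sum over the odd-weight error patterns in a degree-d check set."""
    return sum(math.comb(d, k) * g ** k * (1 - g) ** (d - k)
               for k in range(1, d + 1, 2))

for d in (3, 6, 17):
    for g in (0.05, 0.11, 0.3):
        assert abs(p_odd_closed(d, g) - p_odd_direct(d, g)) < 1e-12
```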
For sparse-graph codes under message-passing decoding with a known threshold under the chosen scheduling algorithm, if there exists a δ3 > 0 such that 〈Pe〉G ≥ δ3 for any channel parameter above the threshold, then the same proof works by using 〈Pe〉G ≥ δ3 in place of (36).
APPENDIX C
PROOF OF THEOREM 3: LOWER BOUND ON 〈Pe〉 FOR AWGN CHANNELS
The AWGN case can be proved using an argument almost identical to the BSC case. Once again, the focus is on the channel noise Z in the decoding neighborhoods [43]. Notice that Lemma 7 already applies to this channel, even if the power constraint only has to hold on average over all codebooks and messages. Thus, all that is required is a counterpart to Lemma 8 giving a convex-∪ mapping from the probability of a set of channel-noise realizations under a Gaussian channel with noise variance σ_G² back to their probability under the original channel with noise variance σ_0².
Lemma 10: Let A be a set of Gaussian channel-noise realizations z_1^n such that Pr_G(A) = δ. Then

Pr(A) ≥ fG(δ), (60)

where

fG(δ) = (δ/2) exp( −nD(σ_G²||σ_0²) − √n (3/2 + 2 ln(2/δ)) (σ_G²/σ_0² − 1) ). (61)

Furthermore, fG(δ) is a convex-∪ increasing function of δ for all values of σ_G² ≥ σ_0².

In addition, the following bound is also convex-∪ whenever σ_G² > σ_0² μ(n), with μ(n) as defined in (12):

fL(δ) = (δ/2) exp( −nD(σ_G²||σ_0²) − (1/2) φ(n, δ) (σ_G²/σ_0² − 1) ), (62)

where φ(n, δ) is as defined in (15).
Proof: See Appendix C-A.
With Lemma 10 playing the role of Lemma 8, the proof for Theorem 3 proceeds identically to that
of Theorem 1.
It should be clear that similar arguments can be used to prove similar results for any additive-noise model of a continuous-output communication channel. However, we do not believe that this will result in the best possible bounds. Indeed, even the bounds for the AWGN case seem suboptimal, because we are ignoring the possibility of a large deviation in the noise that happens to be locally aligned with the codeword itself.
A. Proof of Lemma 10: a lower bound on 〈Pe,i〉 as a function of 〈Pe,i〉G
Proof: Consider the length-n set of G-typical additive noise given by

T_{ε,G} = { z_1^n : ( ||z_1^n||² − nσ_G² )/n ≤ ε }. (63)

With this definition, (47) continues to hold in the Gaussian case.
There are two different Gaussian counterparts to Lemma 9. They are both expressed in the following
lemma.
Lemma 11: For Gaussian noise Zi with variance σ_G²,

Pr( (1/n) Σ_{i=1}^n Zi²/σ_G² > 1 + ε/σ_G² ) ≤ [ (1 + ε/σ_G²) exp(−ε/σ_G²) ]^{n/2}. (64)

Furthermore,

Pr( (1/n) Σ_{i=1}^n Zi²/σ_G² > 1 + ε/σ_G² ) ≤ exp( −√n ε/(4σ_G²) ) (65)

for all ε ≥ 3σ_G²/√n.
Proof: See Appendix C-B.
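The first bound of Lemma 11 can be spot-checked by Monte Carlo simulation (our sketch; we take σ_G = 1 without loss of generality, and the trial count is an arbitrary choice):

```python
import math
import random

random.seed(0)

def chi2_tail_mc(n, thresh, trials=10000):
    """Monte Carlo estimate of P((1/n) * sum(Z_i^2) > thresh), Z_i ~ N(0,1)."""
    hits = 0
    for _ in range(trials):
        s = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
        hits += (s / n > thresh)
    return hits / trials

# With sigma_G = 1, eps/sigma_G^2 = eps; check (64) at eps = 0.4, n = 50.
n, eps = 50, 0.4
bound = ((1 + eps) * math.exp(-eps)) ** (n / 2)   # right side of (64), ~0.204
empirical = chi2_tail_mc(n, 1 + eps)              # true tail is ~0.03
assert empirical <= bound
```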
To have Pr_G(T_{ε,G}^c) ≤ δ/2, it suffices to pick any ε(δ, n) large enough. So

Pr(A) ≥ ∫_{z_1^n ∈ A∩T_{ε,G}} f_P(z_1^n) dz_1^n = ∫_{z_1^n ∈ A∩T_{ε,G}} [ f_P(z_1^n)/f_G(z_1^n) ] f_G(z_1^n) dz_1^n, (66)

where f_P and f_G denote the noise densities under the true channel and under G, respectively.
Consider the ratio of the two pdfs for z_1^n ∈ T_{ε,G}:

f_P(z_1^n)/f_G(z_1^n) = ( σ_G²/σ_0² )^{n/2} exp( −||z_1^n||² ( 1/(2σ_0²) − 1/(2σ_G²) ) )
≥ exp( −( nσ_G² + nε(δ, n) ) ( 1/(2σ_0²) − 1/(2σ_G²) ) + n ln(σ_G/σ_0) )
= exp( −( ε(δ, n) n/(2σ_G²) ) ( σ_G²/σ_0² − 1 ) − nD(σ_G²||σ_0²) ), (67)
where D(σ_G²||σ_0²) is the KL-divergence between two Gaussian distributions of variances σ_G² and σ_0², respectively. Substitute (67) back into (66) to get

Pr(A) ≥ exp( −( ε(δ, n) n/(2σ_G²) )( σ_G²/σ_0² − 1 ) − nD(σ_G²||σ_0²) ) ∫_{z_1^n ∈ A∩T_{ε,G}} f_G(z_1^n) dz_1^n
≥ (δ/2) exp( −nD(σ_G²||σ_0²) − ( ε(δ, n) n/(2σ_G²) )( σ_G²/σ_0² − 1 ) ). (68)

At this point, it is necessary to make a choice of ε(δ, n). If we are interested in studying the asymptotics as n gets large, we can use (65). This reveals that it is sufficient to choose ε ≥ σ_G² max{ 3/√n, 4 ln(2/δ)/√n }. A safe bet is ε = σ_G² (3 + 4 ln(2/δ))/√n, or nε(δ, n) = √n (3 + 4 ln(2/δ)) σ_G². Thus (50) holds as well with this choice of ε(δ, n).
Substituting into (68) gives

Pr(A) ≥ (δ/2) exp( −nD(σ_G²||σ_0²) − √n (3/2 + 2 ln(2/δ)) (σ_G²/σ_0² − 1) ).

This establishes the desired fG(·) function from (61). To see that this function fG(δ) is convex-∪ and increasing in δ, define

c1 = exp( −nD(σ_G²||σ_0²) − √n (3/2 + 2 ln(2)) (σ_G²/σ_0² − 1) − ln(2) )

and ξ = 2√n (σ_G²/σ_0² − 1) > 0. Then fG(δ) = c1 δ exp(ξ ln(δ)) = c1 δ^{1+ξ}, which is clearly monotonically increasing and convex-∪ by inspection.
Attempting to use (64) is a little more involved. Let ε̄ = ε/σ_G² for notational convenience. Then we must solve (1 + ε̄) exp(−ε̄) = (δ/2)^{2/n}. Substitute u = 1 + ε̄ to get u exp(−u + 1) = (δ/2)^{2/n}. This immediately simplifies to −u exp(−u) = −exp(−1)(δ/2)^{2/n}. At this point, we can immediately verify that (δ/2)^{2/n} ∈ [0, 1], and hence, by the definition of the Lambert W function in [38], u = −W_L( −exp(−1)(δ/2)^{2/n} ). Thus

ε̄(δ, n) = −W_L( −exp(−1)(δ/2)^{2/n} ) − 1. (69)
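Assuming W_L denotes the lower real branch of the Lambert W function (as the use in [38] suggests), ε̄(δ, n) can be computed without any special-function library by bisecting the defining equation directly; a sketch (ours):

```python
import math

def eps_bar(delta, n):
    """Solve (1 + e) * exp(-e) = (delta/2)**(2/n) for e > 0 by bisection;
    this is the quantity -W_L(-exp(-1)*(delta/2)**(2/n)) - 1 of (69)."""
    target = (delta / 2.0) ** (2.0 / n)
    lo, hi = 0.0, 1.0
    while (1 + hi) * math.exp(-hi) > target:   # grow until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (1 + mid) * math.exp(-mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

e = eps_bar(0.1, 100)
# The returned value satisfies the defining equation ...
assert abs((1 + e) * math.exp(-e) - 0.05 ** 0.02) < 1e-9
# ... and, as used below, eps_bar is decreasing in delta.
assert eps_bar(0.01, 100) > eps_bar(0.1, 100)
```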
Substituting this into (68) immediately gives the desired expression (62). All that remains is to verify the convexity-∪. Let v = (1/2)(σ_G²/σ_0² − 1). As above, fL(δ) = δ c2 exp(−nv ε̄(δ, n)). The derivatives can be taken using very tedious manipulations involving the relationship W_L′(x) = W_L(x)/( x(1 + W_L(x)) ) from [38], and can be verified using computer-aided symbolic calculation. In our case, −ε̄(δ, n) = W_L(x) + 1 with x = −exp(−1)(δ/2)^{2/n}, and so this allows the expressions to be simplified:

fL′(δ) = c2 exp(−nv ε̄) ( 2v + 1 + 2v/ε̄ ). (70)

Notice that all the terms above are positive, and so the first derivative is always positive and the function is increasing in δ. Taking another derivative gives

fL″(δ) = [ c2 2v (1 + ε̄) exp(−nv ε̄) / (δ ε̄) ] ( 1 + 4v + 4v/ε̄ − 4/(n ε̄) − 2/(n ε̄²) ). (71)
Recall from (69) and the properties of the Lambert W_L function that ε̄ is a monotonically decreasing function of δ that is +∞ at δ = 0 and goes down to 0 at δ = 2. Look at the term in brackets above and multiply it by the positive quantity n ε̄². This gives the quadratic expression

(4v + 1) n ε̄² + 4(vn − 1) ε̄ − 2. (72)

This expression (72) is clearly convex-∪ in ε̄ and negative at ε̄ = 0. Thus it must have a single zero-crossing for positive ε̄ and be strictly increasing there. This also means that the quadratic expression is implicitly a strictly decreasing function of δ. It thus suffices to just check the quadratic expression at δ = 1 and make sure that it is non-negative. Evaluating (69) at δ = 1 gives ε̄(1, n) = T(n), where T(n) is defined in (13). It is also clear that (72) is a strictly increasing linear function of v, and so we can find the minimum value of v above which (72) is guaranteed to be non-negative. This will guarantee that the function fL is convex-∪. The condition turns out to be v ≥ (2 + 4T − nT²)/( 4nT(T + 1) ), and hence σ_G² = σ_0²(2v + 1) ≥ (σ_0²/2)( 1 + 1/(T+1) + (4T+2)/( nT(T+1) ) ). This matches up with (12), and hence the lemma is proved.
B. Proof of Lemma 11: Chernoff bound for Gaussian noise
Proof: The sum Σ_{i=1}^n Zi²/σ_G² is a standard χ² random variable with n degrees of freedom. Thus

Pr( (1/n) Σ_{i=1}^n Zi²/σ_G² > 1 + ε/σ_G² )
(a)≤ inf_{0<s<1/2} ( exp( −s(1 + ε/σ_G²) ) / √(1 − 2s) )^n
(b)≤ ( √(1 + ε/σ_G²) exp( −ε/(2σ_G²) ) )^n (73)
= ( (1 + ε/σ_G²) exp( −ε/σ_G² ) )^{n/2}, (74)

where (a) follows using standard moment generating functions for χ² random variables and Chernoff bounding arguments, and (b) results from the substitution s = ε/( 2(σ_G² + ε) ). This establishes (64).
For tractability, the goal is to replace (73) with an exponential of an affine function of ε/σ_G². For notational convenience, let ε̄ = ε/σ_G². The idea is to bound the polynomial term √(1 + ε̄) by an exponential as long as ε̄ > ε̄*.

Let ε̄* = 3/√n and let K = 1/2 − 1/(4√n). Then it is clear that

√(1 + ε̄) ≤ exp(K ε̄) (75)
as long as ε̄ ≥ ε̄*. First, notice that the two sides agree at ε̄ = 0, and that the slope of the concave-∩ function √(1 + ε̄) there is 1/2, while the slope of the convex-∪ function exp(K ε̄) at 0 is K < 1/2. This means that exp(K ε̄) starts out below √(1 + ε̄). However, it has crossed to the other side by ε̄ = ε̄*. This can be verified by taking the logs of both sides of (75) and multiplying them both by 2. Consider the LHS evaluated at ε̄*, and upper-bound it by a third-order power-series expansion:

ln(1 + 3/√n) ≤ 3/√n − 9/(2n) + 9/n^{3/2}.

Meanwhile, the RHS of (75) can be dealt with exactly:

2K ε̄* = (1 − 1/(2√n)) (3/√n) = 3/√n − 3/(2n).

For n ≥ 9, the above immediately establishes (75), since then 9/(2n) − 3/(2n) = 3/n ≥ 9/n^{3/2}. The cases n = 1, 2, ..., 8 can be verified by direct computation.
Using (75), for ε̄ > ε̄* we have:

Pr(T_{ε,G}^c) ≤ ( exp(K ε̄) exp(−ε̄/2) )^n = exp( −(√n/4) ε̄ ). (76)
APPENDIX D
APPROXIMATION ANALYSIS FOR THE BSC
A. Lemma 2
Proof: (22) and (25) are immediate from the concave-∩ nature of the binary entropy function and its values at 0 and 1/2. For (23),

hb(x) = x log2(1/x) + (1 − x) log2(1/(1 − x))
(a)≤ 2x log2(1/x) = 2x ln(1/x)/ln(2)
(b)≤ 2x d ( 1/x^{1/d} − 1 )/ln(2) for all d > 1
≤ 2x^{1−1/d} d/ln(2).

Inequality (a) follows from the fact that x^x < (1−x)^{1−x} for x ∈ (0, 1/2). For inequality (b), observe that ln(x) ≤ x − 1. This implies ln(x^{1/d}) ≤ x^{1/d} − 1, and therefore ln(x) ≤ d(x^{1/d} − 1) for all x > 0, since 1/d ≤ 1 for d ≥ 1.
The bound (24) on hb^{-1}(·) follows immediately by identical arguments.
B. Lemma 3
Proof: First, we investigate the small-gap asymptotics of δ(g*), where g* = p + gap^r and r < 1:

δ(g*) = 1 − C(g*)/R
= 1 − C(p + gap^r)/( C(p) − gap )
= 1 − [ C(p) − gap^r hb′(p) + o(gap^r) ] / [ C(p)(1 − gap/C(p)) ]
= 1 − ( 1 − (hb′(p)/C(p)) gap^r + o(gap^r) ) × ( 1 + gap/C(p) + o(gap) )
= (hb′(p)/C(p)) gap^r + o(gap^r). (77)

Plugging (77) into (25) and using Lemma 2 gives

log2( hb^{-1}(δ(g*)) ) ≤ log2( (hb′(p)/(2C(p))) gap^r + o(gap^r) ) (78)
= log2( hb′(p)/(2C(p)) ) + r log2(gap) + o(1) (79)
= r log2(gap) − 1 + log2( hb′(p)/C(p) ) + o(1), (80)

and this establishes the upper half of (26).
and this establishes the upper half of (26).
To see the lower half, we use (24):

log2( hb^{-1}(δ(g*)) ) ≥ (d/(d−1)) ( log2(δ(g*)) + log2( ln 2/(2d) ) )
= (d/(d−1)) ( log2( (hb′(p)/C(p)) gap^r + o(gap^r) ) + log2( ln 2/(2d) ) )
= (d/(d−1)) ( r log2(gap) + log2( hb′(p)/C(p) ) + o(1) + log2( ln 2/(2d) ) )
= (d/(d−1)) r log2(gap) − 1 + K1 + o(1),

where K1 = (d/(d−1)) ( log2( hb′(p)/C(p) ) + log2( ln(2)/d ) ) and d > 1 is arbitrary.
C. Lemma 4
Proof:
\begin{align*}
D(g^*\|p) &= D(p + \mathrm{gap}^r\|p) \\
&= 0 + 0\times\mathrm{gap}^r + \frac{1}{2}\,\frac{\mathrm{gap}^{2r}}{p(1-p)\ln(2)} + o(\mathrm{gap}^{2r}),
\end{align*}
since $D(p\|p) = 0$ and the first derivative is also zero. Simple calculus shows that the second derivative of $D(p + x\|p)$ with respect to $x$ is $\frac{\log_2(e)}{(p+x)(1-p-x)}$.
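The second-order expansion in Lemma 4 can be confirmed numerically (a sketch with illustrative values $p = 0.1$, $x = 10^{-4}$):

```python
import math

# Check Lemma 4: for small x,
#   D(p + x || p)  ~  x**2 / (2 * p * (1 - p) * ln(2))   (in bits),
# since the zeroth- and first-order terms of the expansion vanish.
def D(a, p):
    return a * math.log2(a / p) + (1 - a) * math.log2((1 - a) / (1 - p))

p, x = 0.1, 1e-4
exact = D(p + x, p)
approx = x ** 2 / (2 * p * (1 - p) * math.log(2))
print(exact / approx)  # close to 1
```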
D. Lemma 5
Proof:
\begin{align*}
\log_2\left(\frac{g^*(1-p)}{p(1-g^*)}\right) &= \log_2\left(\frac{1-p}{p}\right) + \log_2\left(\frac{g^*}{1-g^*}\right) \\
&= \log_2\left(\frac{1-p}{p}\right) + \log_2(g^*) - \log_2(1-g^*) \\
&= \log_2\left(\frac{1-p}{p}\right) + \log_2(p + \mathrm{gap}^r) - \log_2(1 - p - \mathrm{gap}^r) \\
&= \log_2\left(\frac{1-p}{p}\right) + \log_2(p) + \log_2\left(1 + \frac{\mathrm{gap}^r}{p}\right) - \log_2(1-p) - \log_2\left(1 - \frac{\mathrm{gap}^r}{1-p}\right) \\
&= \frac{\mathrm{gap}^r}{p\ln(2)} + \frac{\mathrm{gap}^r}{(1-p)\ln(2)} + o(\mathrm{gap}^r) \\
&= \frac{\mathrm{gap}^r}{p(1-p)\ln(2)} + o(\mathrm{gap}^r) \\
&= \frac{\mathrm{gap}^r}{p(1-p)\ln(2)}(1 + o(1)).
\end{align*}
E. Lemma 6
Proof: Expand (3):
\begin{align*}
\varepsilon &= \sqrt{\frac{1}{K(p+\mathrm{gap}^r)}}\sqrt{\log_2\left(\frac{2}{h_b^{-1}(\delta(G))}\right)} \\
&= \sqrt{\frac{1}{\ln(2)K(p+\mathrm{gap}^r)}}\sqrt{\ln(2) - \ln\left(h_b^{-1}(\delta(G))\right)} \\
&\geq \sqrt{\frac{1}{\ln(2)K(p+\mathrm{gap}^r)}}\sqrt{\ln(2) - r\ln(\mathrm{gap}) + \ln(2) - K_2\ln(2) + o(1)} \\
&= \sqrt{\frac{1}{\ln(2)K(p+\mathrm{gap}^r)}}\sqrt{r\ln\left(\frac{1}{\mathrm{gap}}\right) + (2-K_2)\ln(2) + o(1)} \\
&= \sqrt{\frac{1}{\ln(2)K(p+\mathrm{gap}^r)}}\sqrt{r\ln\left(\frac{1}{\mathrm{gap}}\right)}\,(1+o(1)),
\end{align*}
and similarly
\begin{align*}
\varepsilon &= \sqrt{\frac{1}{\ln(2)K(p+\mathrm{gap}^r)}}\sqrt{\ln(2) - \ln\left(h_b^{-1}(\delta(G))\right)} \\
&\leq \sqrt{\frac{1}{\ln(2)K(p+\mathrm{gap}^r)}}\sqrt{(2-K_2)\ln(2) + \frac{d}{d-1}r\ln\left(\frac{1}{\mathrm{gap}}\right) + o(1)} \\
&= \sqrt{\frac{rd}{\ln(2)(d-1)K(p+\mathrm{gap}^r)}}\sqrt{\ln\left(\frac{1}{\mathrm{gap}}\right)}\,(1+o(1)).
\end{align*}
All that remains is to show that $K(p + \mathrm{gap}^r)$ converges to $K(p)$ as gap → 0. Examine (4). The continuity of $D(g+\eta\|g)/\eta^2$ is clear in the interior $\eta \in (0, 1-g)$ and for $g \in (0, \frac{1}{2})$, so only the two boundaries need to be checked. By the Taylor expansion of $D(g+\eta\|g)$ done in the proof of Lemma 4, $\lim_{\eta\to 0} D(g+\eta\|g)/\eta^2 = \frac{1}{2g(1-g)\ln 2}$. Similarly, $\lim_{\eta\to 1-g} D(g+\eta\|g)/\eta^2 = D(1\|g)/(1-g)^2$, where $D(1\|g) = \log_2\left(\frac{1}{1-g}\right)$ is finite. Since $K$ is a minimization of a continuous function over a compact set, it is itself continuous, and thus $\lim_{\mathrm{gap}\to 0}K(p+\mathrm{gap}^r) = K(p)$. Converting from natural logarithms to base 2 completes the proof.
F. Approximating the solution to the quadratic formula
In (21), for $g = g^* = p + \mathrm{gap}^r$,
\begin{align*}
a &= D(g^*\|p), \\
b &= \varepsilon\log_2\left(\frac{g^*(1-p)}{p(1-g^*)}\right), \\
c &= \log_2(\langle P_e\rangle) - \log_2\left(h_b^{-1}(\delta(g^*))\right) + 1.
\end{align*}
The first term, $a$, is approximated by Lemma 4, so
\[ a = \mathrm{gap}^{2r}\left(\frac{1}{2p(1-p)\ln(2)} + o(1)\right). \tag{81} \]
Applying Lemma 5 and Lemma 6 reveals
\begin{align}
b &\leq \sqrt{\frac{rd}{(d-1)K(p)}}\sqrt{\log_2\left(\frac{1}{\mathrm{gap}}\right)}\,\frac{\mathrm{gap}^r}{p(1-p)\ln(2)}(1+o(1)) \nonumber \\
&= \frac{1}{p(1-p)\ln(2)}\sqrt{\frac{rd}{(d-1)K(p)}}\sqrt{\mathrm{gap}^{2r}\log_2\left(\frac{1}{\mathrm{gap}}\right)}\,(1+o(1)), \tag{82} \\
b &\geq \frac{1}{p(1-p)\ln(2)}\sqrt{\frac{r}{K(p)}}\sqrt{\mathrm{gap}^{2r}\log_2\left(\frac{1}{\mathrm{gap}}\right)}\,(1+o(1)). \tag{83}
\end{align}
The third term, $c$, can be bounded similarly using Lemma 3 as follows:
\begin{align}
c &= \beta\log_2(\mathrm{gap}) - \log_2\left(h_b^{-1}(\delta(g^*))\right) + 1 \nonumber \\
&\leq \left(\frac{d}{d-1}r - \beta\right)\log_2\left(\frac{1}{\mathrm{gap}}\right) + K_3 + o(1). \tag{84} \\
\text{Also } c &\geq (r - \beta)\log_2\left(\frac{1}{\mathrm{gap}}\right) + K_4 + o(1) \tag{85}
\end{align}
for a pair of constants $K_3, K_4$. Thus, for gap small enough and $r < \frac{\beta(d-1)}{d}$, we know that $c < 0$.
The lower bound on $\sqrt{n}$ is thus
\[ \sqrt{n} \geq \frac{\sqrt{b^2 - 4ac} - b}{2a} = \frac{b}{2a}\left(\sqrt{1 - \frac{4ac}{b^2}} - 1\right). \tag{86} \]
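The algebraic identity in (86), equating the two forms of the positive root when $a > 0$ and $c < 0$, can be spot-checked with arbitrary coefficients (a sketch; the values are illustrative):

```python
import math

# Check (86): for a > 0 and c < 0, the positive root of a*x**2 + b*x + c = 0 is
#   (sqrt(b**2 - 4*a*c) - b) / (2*a) = (b/(2*a)) * (sqrt(1 - 4*a*c/b**2) - 1),
# so any x >= 0 with a*x**2 + b*x + c >= 0 must satisfy x >= that root.
a, b, c = 2.0, 3.0, -5.0
root1 = (math.sqrt(b * b - 4 * a * c) - b) / (2 * a)
root2 = (b / (2 * a)) * (math.sqrt(1 - 4 * a * c / b ** 2) - 1)
print(abs(root1 - root2) < 1e-12)  # True: both forms agree
```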
Plugging in the bounds (81) and (83) reveals that
\[ \frac{b}{2a} \geq \sqrt{\log_2\left(\frac{1}{\mathrm{gap}}\right)}\,\frac{1}{\mathrm{gap}^r}\sqrt{\frac{r}{K(p)}}\,(1+o(1)). \tag{87} \]
Similarly, using (81), (83) and (84), we get
\begin{align}
\frac{4ac}{b^2} &\leq \frac{4\,\mathrm{gap}^{2r}\left(\frac{1}{p(1-p)\ln(2)}\right)\times\left(\left(\frac{d}{d-1}r - \beta\right)\log_2\left(\frac{1}{\mathrm{gap}}\right) + K_3\right)(1+o(1))}{\left(\frac{1}{p(1-p)\ln(2)}\right)^2\frac{r}{K(p)}\,\mathrm{gap}^{2r}\log_2\left(\frac{1}{\mathrm{gap}}\right)(1+o(1))} \nonumber \\
&= 4p(1-p)K(p)\ln(2)\left(\frac{d}{d-1} - \frac{\beta}{r}\right) + o(1). \tag{88}
\end{align}
This tends to a negative constant since $r < \frac{\beta(d-1)}{d}$.
Plugging (87) and (88) into (86) gives:
\begin{align}
n &\geq \frac{r}{K(p)}\,\frac{\log_2\left(\frac{1}{\mathrm{gap}}\right)}{\mathrm{gap}^{2r}}\,(1+o(1))\left(\sqrt{1 + 4p(1-p)\ln(2)K(p)\left(\frac{\beta}{r} - \frac{d}{d-1}\right) + o(1)} - 1\right)^2 \nonumber \\
&= \frac{\log_2\left(\frac{1}{\mathrm{gap}}\right)}{\mathrm{gap}^{2r}}\,\frac{1}{K(p)}\left(\sqrt{r + 4p(1-p)\ln(2)K(p)\left(\beta - \frac{rd}{d-1}\right)} - \sqrt{r}\right)^2(1+o(1)) \nonumber \\
&= \Omega\left(\frac{\log_2(1/\mathrm{gap})}{\mathrm{gap}^{2r}}\right) \tag{89}
\end{align}
for all $r < \min\left\{\frac{\beta(d-1)}{d}, 1\right\}$. By taking $d$ arbitrarily large, we arrive at Theorem 4 for the BSC (the term in the numerator can be absorbed into the ν for small enough gap).
APPENDIX E
APPROXIMATION ANALYSIS FOR THE AWGN CHANNEL
Taking logs on both sides of (10) for a fixed test channel $G$,
\[ \ln(\langle P_e\rangle) \geq \ln\left(h_b^{-1}(\delta(G))\right) - \ln(2) - nD(\sigma_G^2\|\sigma_0^2) - \sqrt{n}\left(\frac{3}{2} + 2\ln 2 - 2\ln\left(h_b^{-1}(\delta(G))\right)\right)\left(\frac{\sigma_G^2}{\sigma_0^2} - 1\right). \tag{90} \]
Rewriting this in the standard quadratic form using
\begin{align}
a &= D(\sigma_{G^*}^2\|\sigma_0^2), \tag{91} \\
b &= \left(\frac{3}{2} + 2\ln 2 - 2\ln\left(h_b^{-1}(\delta(G))\right)\right)\left(\frac{\sigma_G^2}{\sigma_0^2} - 1\right), \tag{92} \\
c &= \ln(\langle P_e\rangle) - \ln\left(h_b^{-1}(\delta(G))\right) + \ln(2), \tag{93}
\end{align}
it suffices to show that the terms exhibit behavior as gap → 0 similar to their BSC counterparts. For Taylor approximations, we use the channel $G^*$, with corresponding noise variance $\sigma_{G^*}^2 = \sigma_0^2 + \zeta$, where
\[ \zeta = \mathrm{gap}^r\left(\frac{2\sigma_0^2(P_T + \sigma_0^2)}{P_T}\right). \tag{94} \]
Lemma 12: For small enough gap, for ζ as in (94), if $r < 1$ then $C(G^*) < R$.
Proof: Since $C - \mathrm{gap} = R > C(G^*)$, we must satisfy
\[ \mathrm{gap} \leq \frac{1}{2}\log_2\left(1 + \frac{P_T}{\sigma_0^2}\right) - \frac{1}{2}\log_2\left(1 + \frac{P_T}{\sigma_0^2 + \zeta}\right). \]
So the goal is to lower-bound the RHS above to show that the choice (94) guarantees that it is bigger than the gap. The RHS equals
\begin{align}
&\frac{1}{2}\left(\log_2\left(1 + \frac{\zeta}{\sigma_0^2}\right) - \log_2\left(1 + \frac{\zeta}{\sigma_0^2 + P_T}\right)\right) \nonumber \\
&= \frac{1}{2}\left(\log_2\left(1 + 2\mathrm{gap}^r\left(1 + \frac{\sigma_0^2}{P_T}\right)\right) - \log_2\left(1 + 2\mathrm{gap}^r\frac{\sigma_0^2}{P_T}\right)\right) \nonumber \\
&\geq \frac{1}{2}\left(\frac{c_s}{\ln(2)}\,2\mathrm{gap}^r\left(1 + \frac{\sigma_0^2}{P_T}\right) - \frac{1}{\ln(2)}\,2\mathrm{gap}^r\frac{\sigma_0^2}{P_T}\right) \nonumber \\
&= \mathrm{gap}^r\,\frac{1}{\ln(2)}\left(c_s - (1-c_s)\frac{\sigma_0^2}{P_T}\right). \tag{95}
\end{align}
For small enough gap, this is a valid lower bound as long as $c_s < 1$. Choose $c_s$ so that $\frac{\sigma_0^2}{P_T + \sigma_0^2} < c_s < 1$, which makes the constant in (95) positive. Thus, for ζ as in (94), the RHS is lower-bounded by $\mathrm{gap}^r$ times a positive constant, and having $r < 1$ suffices for small enough gap, since $\mathrm{gap}^r/\mathrm{gap} = \mathrm{gap}^{r-1} \to \infty$ as gap → 0.
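Lemma 12 is easy to confirm numerically; a sketch with the illustrative values $P_T = \sigma_0^2 = 1$ and $r = \frac{1}{2}$:

```python
import math

# Check Lemma 12: with zeta = gap**r * 2*s0*(PT + s0)/PT  (s0 stands for sigma_0^2)
# and r < 1, the capacity drop C(s0) - C(s0 + zeta) exceeds gap for small gap,
# so that C(G*) < R = C - gap.
def C(PT, s2):
    return 0.5 * math.log2(1 + PT / s2)

PT, s0, r = 1.0, 1.0, 0.5
for gap in (1e-4, 1e-6, 1e-8):
    zeta = gap ** r * 2 * s0 * (PT + s0) / PT
    assert C(PT, s0) - C(PT, s0 + zeta) > gap
print("C(G*) < R for all tested gaps")
```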
In the next lemma, we perform the approximation analysis for the terms in (91), (92) and (93).
Lemma 13: Assume that $\sigma_{G^*}^2 = \sigma_0^2 + \zeta$, where ζ is defined in (94). Then:
(a)
\[ \frac{\sigma_{G^*}^2}{\sigma_0^2} - 1 = \mathrm{gap}^r\left(\frac{2(P_T + \sigma_0^2)}{P_T}\right). \tag{96} \]
(b)
\[ \ln(\delta(G^*)) = r\ln(\mathrm{gap}) + o(1) - \ln(C). \tag{97} \]
(c)
\[ \ln(h_b^{-1}(\delta(G^*))) \geq \frac{d}{d-1}r\ln(\mathrm{gap}) + c_2, \tag{98} \]
for some constant $c_2$ that is a function of $d$, and
\[ \ln(h_b^{-1}(\delta(G^*))) \leq r\ln(\mathrm{gap}) + c_3, \tag{99} \]
for some constant $c_3$.
(d)
\[ D(\sigma_{G^*}^2\|\sigma_0^2) = \frac{(P_T + \sigma_0^2)^2}{P_T^2}\,\mathrm{gap}^{2r}(1 + o(1)). \tag{100} \]
Proof: (a) immediately follows from the definitions and (94).
(b) We start by simplifying $\delta(G^*)$:
\begin{align*}
\delta(G^*) &= 1 - \frac{C(G^*)}{R} = \frac{C - \mathrm{gap} - \frac{1}{2}\log_2\left(1 + \frac{P_T}{\sigma_{G^*}^2}\right)}{C - \mathrm{gap}} \\
&= \frac{\frac{1}{2}\log_2\left(1 + \frac{P_T}{\sigma_0^2}\right) - \frac{1}{2}\log_2\left(1 + \frac{P_T}{\sigma_0^2 + \zeta}\right) - \mathrm{gap}}{C - \mathrm{gap}} \\
&= \frac{\frac{1}{2}\log_2\left(\left(\frac{\sigma_0^2 + P_T}{\sigma_0^2}\right)\left(\frac{\sigma_0^2 + \zeta}{P_T + \sigma_0^2 + \zeta}\right)\right) - \mathrm{gap}}{C - \mathrm{gap}} \\
&= \frac{\frac{1}{2}\log_2\left(1 + \frac{\zeta}{\sigma_0^2}\right) - \frac{1}{2}\log_2\left(1 + \frac{\zeta}{P_T + \sigma_0^2}\right) - \mathrm{gap}}{C - \mathrm{gap}} \\
&= \frac{\frac{1}{2}\frac{\zeta}{\sigma_0^2} - \frac{1}{2}\frac{\zeta}{P_T + \sigma_0^2} + o(\zeta) - \mathrm{gap}}{C - \mathrm{gap}} \\
&= \frac{\frac{1}{2}\left(\frac{\zeta P_T}{\sigma_0^2(P_T + \sigma_0^2)} + o(\zeta)\right) - \mathrm{gap}}{C - \mathrm{gap}} \\
&= \frac{1}{C}\left(\frac{1}{2}\left(\mathrm{gap}^r\,\frac{2\sigma_0^2(P_T + \sigma_0^2)}{P_T}\cdot\frac{P_T}{\sigma_0^2(P_T + \sigma_0^2)} + o(\mathrm{gap}^r)\right) - \mathrm{gap}\right)\left(1 + \frac{\mathrm{gap}}{C} + o(\mathrm{gap})\right) \\
&= \frac{\mathrm{gap}^r}{C}(1 + o(1)).
\end{align*}
Taking ln(·) on both sides, the result is evident.
(c) follows from (b) and Lemma 2.
(d) comes from the definition of $D(\sigma_{G^*}^2\|\sigma_0^2)$, followed immediately by the expansion $\ln(\sigma_{G^*}^2/\sigma_0^2) = \ln(1 + \zeta/\sigma_0^2) = \frac{\zeta}{\sigma_0^2} - \frac{1}{2}\left(\frac{\zeta}{\sigma_0^2}\right)^2 + o(\mathrm{gap}^{2r})$. All the constant and first-order (in $\mathrm{gap}^r$) terms cancel since $\frac{\sigma_{G^*}^2}{\sigma_0^2} = 1 + \frac{\zeta}{\sigma_0^2}$. This gives the result immediately.
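Part (d) can be checked numerically using the standard Gaussian divergence $D(\sigma_1^2\|\sigma_0^2) = \frac{1}{2}\left(\frac{\sigma_1^2}{\sigma_0^2} - 1 - \ln\frac{\sigma_1^2}{\sigma_0^2}\right)$ in nats, consistent with the natural logarithms in (90); the parameter values below are illustrative:

```python
import math

# Check Lemma 13(d): with sigma_G*^2 = s0 + zeta and zeta from (94),
#   D(sigma_G*^2 || s0) = 0.5*(zeta/s0 - ln(1 + zeta/s0))   (nats)
#                       ~ ((PT + s0)**2 / PT**2) * gap**(2*r).
PT, s0, r, gap = 1.0, 0.5, 0.5, 1e-8
zeta = gap ** r * 2 * s0 * (PT + s0) / PT
exact = 0.5 * (zeta / s0 - math.log(1 + zeta / s0))
approx = (PT + s0) ** 2 / PT ** 2 * gap ** (2 * r)
print(exact / approx)  # close to 1
```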
Now, we can use Lemma 13 to approximate (91), (92) and (93):
\begin{align}
a &= \frac{(P_T + \sigma_0^2)^2}{P_T^2}\,\mathrm{gap}^{2r}(1 + o(1)), \tag{101} \\
b &= \left(\frac{3}{2} + 2\ln 2 - 2\ln(h_b^{-1}(\delta(G)))\right)\mathrm{gap}^r\,\frac{2(P_T + \sigma_0^2)}{P_T} \nonumber \\
&\leq \frac{2d(P_T + \sigma_0^2)}{(d-1)P_T}\,r\ln\left(\frac{1}{\mathrm{gap}}\right)\mathrm{gap}^r(1 + o(1)), \tag{102} \\
b &\geq \frac{2(P_T + \sigma_0^2)}{P_T}\,r\ln\left(\frac{1}{\mathrm{gap}}\right)\mathrm{gap}^r(1 + o(1)), \tag{103} \\
c &\leq \left(\frac{d}{d-1}r - \beta\right)\ln\left(\frac{1}{\mathrm{gap}}\right)(1 + o(1)), \tag{104} \\
c &\geq (r - \beta)\ln\left(\frac{1}{\mathrm{gap}}\right)(1 + o(1)). \tag{105}
\end{align}
Therefore, in parallel to (87), we have for the AWGN bound
\[ \frac{b}{2a} \geq \frac{rP_T}{(P_T + \sigma_0^2)}\,\frac{\ln\left(\frac{1}{\mathrm{gap}}\right)}{\mathrm{gap}^r}\,(1 + o(1)). \tag{106} \]
Similarly, in parallel to (88), we have for the AWGN bound
\[ \frac{4ac}{b^2} \leq (1 + o(1))\,\frac{1}{r^2}\left(\frac{d}{d-1}r - \beta\right)\frac{1}{\ln\left(\frac{1}{\mathrm{gap}}\right)}. \]
This is negative as long as $r < \frac{\beta(d-1)}{d}$, and so for every $c_s < \frac{1}{2}$, for small enough gap, we know that
\[ \sqrt{1 - \frac{4ac}{b^2}} - 1 \geq c_s\,\frac{1}{r^2}\left(\beta - \frac{d}{d-1}r\right)\frac{1}{\ln\left(\frac{1}{\mathrm{gap}}\right)}\,(1 + o(1)). \]
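The elementary fact used in the last step, that $\sqrt{1+t} - 1 \geq c_s t$ for any $c_s < \frac{1}{2}$ once $t > 0$ is small enough (here $t$ plays the role of $-4ac/b^2$), is easy to confirm numerically (a sketch with $c_s = 0.45$):

```python
import math

# For c_s < 1/2 and small enough t > 0, sqrt(1 + t) - 1 >= c_s * t,
# since sqrt(1 + t) - 1 = t/2 - t**2/8 + ... .
cs = 0.45
for t in (1e-1, 1e-3, 1e-6):
    assert math.sqrt(1 + t) - 1 >= cs * t
print("sqrt(1 + t) - 1 >= c_s * t verified")
```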
Combining this with (106) gives the bound:
\begin{align}
n &\geq (1 + o(1))\left(c_s\,\frac{1}{r^2}\left(\beta - \frac{d}{d-1}r\right)\frac{1}{\ln\left(\frac{1}{\mathrm{gap}}\right)}\cdot\frac{rP_T}{P_T + \sigma_0^2}\,\frac{\ln\left(\frac{1}{\mathrm{gap}}\right)}{\mathrm{gap}^r}\right)^2 \tag{107} \\
&= (1 + o(1))\left(c_s\,\frac{P_T}{r(P_T + \sigma_0^2)}\left(\beta - \frac{d}{d-1}r\right)\left(\frac{1}{\mathrm{gap}^r}\right)\right)^2. \tag{108}
\end{align}
Since this holds for all $0 < c_s < \frac{1}{2}$ and all $r < \min\left\{1, \frac{\beta(d-1)}{d}\right\}$ for all $d > 1$, Theorem 4 for AWGN channels follows.
APPENDIX F
LOWER BOUND ON AVERAGE NEIGHBORHOOD SIZE
Proof of Theorem 5: From Lemma 8, for any bit $i$ of neighborhood size $n_i$,
\[ \langle P_{e,i}\rangle \geq \langle P_{e,i}\rangle_G\, 2^{-n_i D(g\|p)}\left(\frac{p(1-g)}{g(1-p)}\right)^{\varepsilon(\langle P_{e,i}\rangle_G)\sqrt{n_i}}. \tag{109} \]
Now, the average neighborhood size is $\bar{n} = \frac{1}{k}\sum_{i=1}^k n_i$. Applying Markov's inequality to the counting measure, the fraction of bits that have neighborhood size larger than $\eta\bar{n}$ for some $\eta$ is at most $\frac{\bar{n}}{\eta\bar{n}} = \frac{1}{\eta}$. Thus there exists a set $S$, with $|S| \geq k\left(1 - \frac{1}{\eta}\right)$, such that $n_i \leq \eta\bar{n}$ for all $i \in S$. Thus,
\[ \langle P_{e,i}\rangle \geq \langle P_{e,i}\rangle_G\, 2^{-\eta\bar{n}D(g\|p)}\left(\frac{p(1-g)}{g(1-p)}\right)^{\varepsilon(\langle P_{e,i}\rangle_G)\sqrt{\eta\bar{n}}} \tag{110} \]
for all $i \in S$. Also notice that since $\frac{1}{k}\sum_{i=1}^k \langle P_{e,i}\rangle_G \geq h_b^{-1}(\delta(G))$, there exists a set $T$ of size $|T| \geq k\,\frac{h_b^{-1}(\delta(G))}{2}$ such that $\langle P_{e,i}\rangle_G \geq \frac{h_b^{-1}(\delta(G))}{2}$ for all $i \in T$ (this follows readily from the observation that $\langle P_{e,i}\rangle_G \leq 1$). Choose $\eta$ satisfying
\[ k\left(1 - \frac{1}{\eta}\right) = k\left(1 - \frac{h_b^{-1}(\delta(G))}{4}\right), \tag{111} \]
i.e., $\eta = \frac{4}{h_b^{-1}(\delta(G))}$. Then,
\begin{align*}
|S\cup T| + |S\cap T| = |S| + |T| &\geq k\left(1 - \frac{1}{\eta}\right) + k\,\frac{h_b^{-1}(\delta(G))}{2} \\
&= k\left(1 - \frac{h_b^{-1}(\delta(G))}{4}\right) + k\,\frac{h_b^{-1}(\delta(G))}{2} \\
&= k\left(1 + \frac{h_b^{-1}(\delta(G))}{4}\right).
\end{align*}
Thus,
\[ |S\cap T| \geq |S| + |T| - |S\cup T| \geq k\,\frac{h_b^{-1}(\delta(G))}{4}, \tag{112} \]
since |S ∪ T | ≤ k. Now,
〈Pe〉 =1
k
k∑i=1
〈Pe,i〉
≥ 1
k
∑i∈S∩T
〈Pe,i〉G2−niD(g||p)(p(1− g)
g(1− p)
)ε(〈Pe,i〉G)√ni
≥ 1
k
∑i∈S∩T
〈Pe,i〉G2−ηnD(g||p)(p(1− g)
g(1− p)
)ε(〈Pe,i〉G)√ηn
(a)
≥ |S ∩ T |k
h−1b (δ(G))
22−ηnD(g||p)
(p(1− g)
g(1− p)
)ε(h−1b
(δ(G))
2)√ηn
(b)
≥(h−1b (δ(G))
)28
2− 4n
h−1b
(δ(G))D(g||p)
(p(1− g)
g(1− p)
)ε(h−1b
(δ(G))
2
)√4n
h−1b
(δ(G))
.
where (a) follows from the monotonicity of fG(·) in Lemma 8, and the observation that 〈Pe,i〉G ≥h−1b (δ(G))
2 for all i ∈ T , and hence for all i ∈ S ∪ T . Inequality (b) follows from (112).
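The counting facts used above, the Markov bound defining $S$, the averaging bound defining $T$, and the inclusion-exclusion step (112), can be sanity-checked numerically. In this sketch, $m$ plays the role of $h_b^{-1}(\delta(G))$, and the random values are stand-ins for the per-bit quantities:

```python
import random

# (i)  If values in [0,1] have average >= m, at least k*m/2 of them are >= m/2 (set T).
# (ii) Markov on the counting measure: at least k*(1 - 1/eta) indices have
#      n_i <= eta * n_bar (set S).
# (iii) Inclusion-exclusion: |S ∩ T| >= |S| + |T| - k.
random.seed(0)
k = 10000
vals = [random.random() for _ in range(k)]        # stand-ins for <P_e,i>_G in [0,1]
m = sum(vals) / k                                 # average plays the role of h_b^{-1}
T = {i for i in range(k) if vals[i] >= m / 2}
assert len(T) >= k * m / 2                        # fact (i)

ns = [random.expovariate(1.0) for _ in range(k)]  # stand-ins for neighborhood sizes
n_bar = sum(ns) / k
eta = 4 / m                                       # matches eta = 4 / h_b^{-1}(delta(G))
S = {i for i in range(k) if ns[i] <= eta * n_bar}
assert len(S) >= k * (1 - 1 / eta)                # fact (ii), Markov's inequality
assert len(S & T) >= len(S) + len(T) - k          # fact (iii), inclusion-exclusion
print("counting bounds verified")
```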
For the approximation analysis, we take $\log_2(\cdot)$ of both sides of the lower bound:
\[ \log_2(\langle P_e\rangle) \geq 2\log_2\left(h_b^{-1}(\delta(G))\right) - 3 - \frac{4\bar{n}}{h_b^{-1}(\delta(G))}D(g\|p) - \varepsilon\left(\frac{h_b^{-1}(\delta(G))}{2}\right)\sqrt{\frac{4\bar{n}}{h_b^{-1}(\delta(G))}}\log_2\left(\frac{g(1-p)}{p(1-g)}\right). \tag{113} \]
Choosing $\langle P_e\rangle = \mathrm{gap}^\beta$,
\[ \frac{4\bar{n}}{h_b^{-1}(\delta(G))}D(g\|p) + \varepsilon\left(\frac{h_b^{-1}(\delta(G))}{2}\right)\sqrt{\frac{4\bar{n}}{h_b^{-1}(\delta(G))}}\log_2\left(\frac{g(1-p)}{p(1-g)}\right) + \beta\log_2(\mathrm{gap}) - 2\log_2\left(h_b^{-1}(\delta(G))\right) + 3 \geq 0. \tag{114} \]
In this quadratic, $a = D(g^*\|p)$, $b = \varepsilon\log_2\left(\frac{g(1-p)}{p(1-g)}\right)$, and $c = \beta\log_2(\mathrm{gap}) - 2\log_2\left(h_b^{-1}(\delta(G))\right) + 3$, while the variable is $\sqrt{\frac{4\bar{n}}{h_b^{-1}(\delta(G))}}$. Again, the constant 3 can be ignored at small gap.
\begin{align}
c &= \beta\log_2(\mathrm{gap}) - 2\log_2\left(h_b^{-1}(\delta(g^*))\right) + 3 \nonumber \\
&\leq \left(\frac{2d}{d-1}r - \beta\right)\log_2\left(\frac{1}{\mathrm{gap}}\right) + K_3 + o(1). \tag{115} \\
\text{Also } c &\geq (2r - \beta)\log_2\left(\frac{1}{\mathrm{gap}}\right) + K_4 + o(1) \tag{116}
\end{align}
for a pair of constants $K_3, K_4$. Thus, for gap small enough and $r < \frac{\beta(d-1)}{2d}$, we know that $c < 0$. Following the analysis in Theorem 4, we obtain
\[ \frac{4\bar{n}}{h_b^{-1}(\delta(G))} \geq \Omega\left(\frac{\log_2(1/\mathrm{gap})}{\mathrm{gap}^{2r}}\right) \tag{117} \]
for all $r < \min\left\{\frac{\beta(d-1)}{2d}, 1\right\}$. From Lemma 3, $\log_2\left(h_b^{-1}(\delta(G))\right) \geq \frac{d}{d-1}r\log_2(\mathrm{gap}) - 1 + K_1 + o(1)$. Thus, by taking $d$ arbitrarily large, we arrive at the approximation result.
REFERENCES
[1] C. Shannon, R. Gallager, and E. Berlekamp, “Lower bounds to error probability for coding on discrete memoryless channels.
I,” Information and Control, vol. 10, no. 1, pp. 65–103, 1967.
[2] C. E. Shannon, R. G. Gallager, and E. R. Berlekamp, “Lower bounds to error probability for coding on discrete memoryless channels. II,” Information and Control, vol. 10, no. 5, pp. 522–552, 1967.
[3] R. Gallager, “A simple derivation of the coding theorem and some applications,” IEEE Trans. Inform. Theory, vol. 11, no. 1, pp. 3–18, Jan. 1965.
[4] A. Valembois and M. Fossorier, “Sphere-packing bounds revisited for moderate block lengths,” IEEE Trans. Inform. Theory,
vol. 50, no. 12, pp. 2998–3014, 2004.
[5] G. Wiechman and I. Sason, “An improved sphere-packing bound for finite-length codes over symmetric memoryless
channels,” IEEE Trans. Inform. Theory, vol. 54, no. 5, pp. 1962–1990, 2008.
[6] Y. Polyanskiy, H. Poor, and S. Verdu, “Dispersion of Gaussian channels,” in IEEE International Symposium on Information
Theory (ISIT). IEEE, 2009, pp. 2204–2208.
[7] P. Elias, “Coding for noisy channels,” IRE National Convention Record, vol. 4, pp. 37–46, 1955.
[8] A. Viterbi, “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm,” IEEE Trans. Inform.
Theory, vol. 13, no. 2, pp. 260–269, Apr. 1967.
[9] G. D. Forney, “Convolutional codes II. Maximum-likelihood decoding,” Information and Control, vol. 25, no. 3, pp. 222–266, Jul. 1974.
[10] I. M. Jacobs and E. R. Berlekamp, “A lower bound to the distribution of computation for sequential decoding,” IEEE
Trans. Inform. Theory, vol. 13, no. 2, pp. 167–174, Apr. 1967.
[11] G. D. Forney, “Convolutional codes III. Sequential decoding,” Information and Control, vol. 25, no. 3, pp. 267–297, Jul. 1974.
[12] F. Jelinek, “Upper bounds on sequential decoding performance parameters,” IEEE Trans. Inform. Theory, vol. 20, no. 2,
pp. 227–239, Mar. 1974.
[13] E. Arikan, “Sequential decoding for multiple access channels,” Ph.D. dissertation, Massachusetts Institute of Technology,
Cambridge, MA, 1985.
[14] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge University Press, 2007.
[15] R. G. Gallager, “Low-density parity-check codes,” Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge,
MA, 1960.
[16] T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,”
IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 599–618, Feb. 2001.
[17] R. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inform. Theory, vol. 27, no. 5, pp. 533–547, Sep.
1981.
[18] I. Sason and R. L. Urbanke, “Parity-check density versus performance of binary linear block codes: New bounds and
applications,” IEEE Trans. Inform. Theory, vol. 53, no. 2, pp. 550–579, 2007.
[19] H. D. Pfister, I. Sason, and R. Urbanke, “Capacity-achieving ensembles for the binary erasure channel with bounded
complexity,” IEEE Trans. Inform. Theory, vol. 51, no. 7, pp. 2352–2379, Jul. 2005.
[20] H. D. Pfister and I. Sason, “Accumulate-repeat-accumulate codes: Capacity-achieving ensembles of systematic codes for the erasure channel with bounded complexity,” IEEE Trans. Inform. Theory, vol. 53, no. 6, pp. 2088–2115, Jun. 2007.
[21] C.-H. Hsu and A. Anastasopoulos, “Capacity-achieving codes for noisy channels with bounded graphical complexity and
maximum likelihood decoding,” submitted to IEEE Transactions on Information Theory, 2006.
[22] A. Khandekar, “Graph-based codes and iterative decoding,” Ph.D. dissertation, California Institute of Technology, Pasadena,
CA, 2002.
[23] A. Khandekar and R. McEliece, “On the complexity of reliable communication on the erasure channel,” in IEEE
International Symposium on Information Theory, 2001.
[24] I. Sason, “Bounds on the number of iterations for turbo-like ensembles over the binary erasure channel,” submitted to
IEEE Transactions on Information Theory, 2007.
[25] J. Jiang and K. Narayanan, “Iterative soft decision decoding of Reed Solomon codes based on adaptive parity check
matrices,” in Proceedings of the 2004 IEEE Symposium on Information Theory, Chicago, Illinois, Jun. 2004, p. 261.
[26] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” Jul. 2008. [Online]. Available: http://arxiv.org/abs/0807.3917
[27] P. Grover and A. Sahai, “Green codes: Energy-efficient short-range communication,” in Proceedings of the 2008 IEEE
Symposium on Information Theory, Toronto, Canada, Jul. 2008.
[28] ——, “Time-division multiplexing for green broadcasting,” in Proceedings of the 2009 IEEE Symposium on Information
Theory, Seoul, South Korea, Jul. 2009.
[29] P. Grover, “Bounds on the tradeoff between rate and complexity for sparse-graph codes,” in 2007 IEEE Information Theory
Workshop (ITW), Lake Tahoe, CA, 2007.
[30] R. G. Gallager, Information Theory and Reliable Communication. New York, NY: John Wiley & Sons, 1968.
[31] D. Burshtein, M. Krivelevich, S. Litsyn, and G. Miller, “Upper bounds on the rate of LDPC codes,” IEEE Trans. Inform.
Theory, vol. 48, Sep. 2002.
[32] I. Sason and R. Urbanke, “Parity-check density versus performance of binary linear block codes over memoryless symmetric
channels,” IEEE Trans. Inform. Theory, vol. 49, no. 7, pp. 1611–1635, Jul. 2003.
[33] P. Grover and A. K. Chaturvedi, “Upper bounds on the rate of LDPC codes for a class of finite-state Markov channels,”
IEEE Trans. Inform. Theory, vol. 53, pp. 794–804, Feb. 2007.
[34] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, 1989.
[35] C. D. Thompson, “Area-time complexity for VLSI,” in STOC ’79: Proceedings of the eleventh annual ACM symposium
on Theory of computing. New York, NY, USA: ACM, 1979, pp. 81–88.
[36] A. Sahai, “Why do block length and delay behave differently if feedback is present?” IEEE Trans. Inform. Theory, vol. 54, no. 5, pp. 1860–1886, May 2008. [Online]. Available: http://arxiv.org/abs/cs/0610138
[37] M. S. Pinsker, “Bounds on the probability and of the number of correctable errors for nonblock codes,” Problemy Peredachi
Informatsii, vol. 3, no. 4, pp. 44–55, Oct./Dec. 1967.
[38] R. M. Corless, G. H. Gonnet, D. E. G. Hare, and D. E. Knuth, “On the Lambert W function,” Advances in Computational
Mathematics, vol. 5, pp. 329–359, 1996.
[39] H. Palaiyanur, personal communication, 2007.
[40] C. D. Thompson, “A complexity theory for VLSI,” Ph.D. dissertation, Carnegie Mellon University, Pittsburgh, PA, USA, 1980.
[41] P. Grover, A. Sahai, and S. Y. Park, “The finite-dimensional Witsenhausen counterexample,” in Proceedings of the 7th International Symposium on Modeling and Optimization in Ad-Hoc and Wireless Networks (WiOpt), Workshop on Control over Communication Channels (ConCom), Seoul, Korea, Jun. 2009.
[42] E. A. Haroutunian, “Lower bound for error probability in channels with feedback,” Problemy Peredachi Informatsii, vol. 13,
no. 2, pp. 36–44, 1977.
[43] C. Chang, personal communication, Nov. 2007.