A general lower bound on the decoding complexity of message-passing decoding

Pulkit Grover and Anant Sahai
Department of Electrical Engineering and Computer Sciences, UC Berkeley
pulkit,[email protected]
Abstract
We find lower bounds on complexity-performance tradeoffs for the message-passing decoding model,
where the complexity is measured by the size of the largest neighborhood, and the performance by the
rate and the average bit-error probability. These lower bounds show that the maximum neighborhood size
increases unboundedly as the error probability approaches zero, and as the rate approaches capacity. No
assumption is made on the code structure or on the particular implementation of message-passing decoding:
the obtained bounds are based purely on the limitations of the message-passing decoding model. In
contrast with most of the existing results that hold on average over an ensemble, these bounds hold
for any chosen code. The bounds are derived using an application of the sphere-packing concept to
neighborhood sizes of message passing decoding, instead of the usual blocklength. Next, assuming that
the code is a Low-Density Parity-Check code, we show that similar lower bounds hold as the rate
approaches Gallager’s upper bound. Results with slightly weaker scaling are also derived for average
neighborhood size for any code.
I. INTRODUCTION
It is desirable for a channel code to operate at rates close to the channel capacity, at low error probability,
and to have low encoding and decoding complexity. To what extent are these goals consistent with each
other? Historically, the discovery of any class of codes and/or encoding/decoding algorithm has been followed by an analysis of the performance-complexity tradeoffs for the code/algorithm. These
tradeoffs treat proximity to capacity and error probability as the performance of the code, and measure
the complexity of encoding/decoding as the number of computations performed in a computation model,
or as a function of a code parameter.
October 29, 2010 DRAFT
Shannon (1959) [1] (for AWGN channels) and Shannon, Gallager and Berlekamp (1967) [2] (for
arbitrary discrete memoryless channels) derived outer bounds on the performance complexity tradeoffs
for block-codes. Since the decoding algorithm can be different for different sub-families of block-codes,
the measure of complexity was taken to be the block-length. The obtained results show that the error
probability Pe for a rate-R code operating close to the channel capacity C behaves as Pe ≈ e^{−m·Er(R)}, where m is the block-length and Er(R) = Θ((C − R)²) is the random-coding error exponent. The
block-length thus diverges to infinity as the error probability converges to zero, and as the rate approaches
capacity. To obtain decoding complexity in the Turing-machine model for a specific code and decoding
algorithm, one only needs to understand the complexity as a function of the block-length. Inner bounds
(achievability) based on random coding and expurgation of “bad” codewords were derived by Gallager
in 1965 [3]. These bounds show that the outer bounds on the exponents are tight at high rates. Recent
works by Fossorier [4], Wiechman and Sason [5], and Polyanskiy, Poor and Verdú [6] tighten the inner
and outer bounds at finite block-lengths.
Convolutional codes, discovered by Elias in 1955 [7], called for a similar analysis. With termination
at finite lengths, these codes can be thought of as block-codes, but the performance of these codes is
limited by the constraint-length (or more generally, the length of the memory at the encoder), which is
typically much smaller than the block-length. The results for block-codes were thus no longer useful for
these codes. Since the codes are based on dynamical systems, it is meaningful to use the memory-length at the encoder as a measure of complexity. The inner bound is obtained using finite-constraint-length linear convolutional codes. The outer bound only depends on the encoding memory-length [8],
[9], and shows that the memory-length must diverge to infinity in a manner similar to that for block-
codes. Since the complexity of Viterbi decoding [8] increases exponentially in the constraint length, the
result characterizes the performance-complexity tradeoff for Viterbi decoding. A separate analysis was
performed for the sequential decoding model. Jacobs and Berlekamp [10] showed that the (random) number
of guesses required in sequential decoding is lower-bounded by a Pareto random variable. Successively
smaller moments of the number of guesses diverge as the rate approaches capacity — notably, there
is a computational cutoff rate above which the average number of guesses is infinite. These lower bounds
depend only on the operating rate and the channel capacity. Matching inner bounds are obtained using
random linear convolutional codes and specific sequential decoding algorithms [11], [12]. The tightness
of the inner and outer bounds was shown by Arikan in his thesis [13].
A clear pattern thus emerges in such analyses — outer bounds are obtained by abstracting away
from the code structure, allowing the code to be arbitrary apart from keeping the complexity-measure
constant. Inner bounds possibly show the tightness of the outer bounds in an asymptotic sense. Tightness
ensures that the limitations of the code family or the decoding model are understood completely, allowing
comparison with other families and newer constructions that defy the bounds.
The last two decades have seen the re-emergence of sparse-graph codes, such as the Low-Density Parity-Check (LDPC) codes, and associated message-passing based decoding algorithms [14]. LDPC codes,
along with some message-passing based decoding algorithms were developed by Gallager in his the-
sis [15]. The ideas have since been extended to many other classes of codes (e.g. irregular LDPC
codes, Irregular Repeat-Accumulate (IRA) codes, Accumulate-Repeat-Accumulate (ARA) codes) and
message-passing based decoding algorithms (e.g. belief-propagation (BP) decoding [16]). To understand
the limitations of these new code classes and decoding algorithms, a similar analysis as for other code
families and decoding models is needed here.
The issue was explored first by Gallager himself in his thesis, where he derived upper bounds on
the rate for reliable communication as a function of the number of ones in each row of the parity-check
matrix of a regular LDPC code [15, Pg. 38]. This has been interpreted as a result on graphical complexity
of the LDPC code, where graphical complexity is the average number of edges per information bit in
the graphical representation [17] of these codes. The result generalizes to irregular-LDPC codes [18], but
does not extend to many other code families of interest. In fact, Pfister, Sason, and Urbanke [19] provide
a BEC-capacity-achieving sequence of non-systematic IRA codes that has bounded graphical complexity
(under belief-propagation decoding). Pfister and Sason further extended it in [20] by giving systematic
constructions for the BEC with similar properties. Hsu and Anastasopoulos extended the result in [21]
to codes for noisy channels with bounded graphical complexity (under ML decoding). In conclusion,
the tradeoff between the graphical complexity and the performance is trivial in that there are bounded
graphical complexity constructions that approach capacity and attain arbitrarily low error probability, at
least over the BEC.
While graphical complexity can be interpreted abstractly as a measure of code complexity itself,
more concretely, in message-passing decoding, it is a measure of complexity per-iteration. This naturally
suggests another measure of complexity for the message-passing decoding model — the number of iterations.
The issue was first addressed by Khandekar and McEliece, who conjectured (based on EXIT chart arguments) the scaling of the number of iterations with error probability and the gap from capacity.
In [22, Pg. 69-72] and [23], they conjecture that for all sparse-graph codes, the number of iterations must scale either multiplicatively as Ω((1/gap) log2(1/〈Pe〉)), or additively as Ω(1/gap + log2(1/〈Pe〉)), in the vicinity of capacity and low error probability. This conjecture comes from an EXIT-chart based graphical
argument concerning the message-passing decoding of sparse-graph codes over the BEC. The intuition
is that the bound should also hold for general memoryless channels, since the BEC is the channel with
the simplest decoding.
In [24], Sason considers specific families of codes: the LDPC codes, the Accumulate-Repeat-Accumulate
(ARA) codes, and the Irregular-Repeat Accumulate (IRA) codes. He demonstrates that in the presence
of degree-2 nodes in the graph, the number of iterations scales as Ω(1/gap) for the BEC in the limit of low
error probability, partially resolving the Khandekar-McEliece conjecture (the dependence on the error
probability is not addressed, nor is the issue settled for other classes of sparse-graph codes). If, however,
the fraction of degree-2 nodes for these codes converges to zero, then the bounds in [24] become trivial.
Sason notes that all the known traditionally capacity-achieving sequences of these code families have a
non-zero fraction of degree-2 nodes.
The approach in [24], however, does not abstract away from the code structure to evaluate the limits
of the message passing algorithm itself. Such a study is necessitated by the discovery of ever newer classes of sparse-graph-based codes that can be decoded using such algorithms. Further, it has been suggested
that even some classes of codes that are not based on sparse-graphs (e.g., Reed-Solomon codes [25], and
the promising new class of polarization codes [26]) can also be decoded using message-passing based
algorithms.
Since all these codes are a sub-class of block-codes, one might hope that the results for block coding
are applicable here. However, for the low-complexity decoding algorithms proposed by Gallager (now
known as Gallager A and Gallager B algorithms), each message bit need not make use of the entire block-
length to decode its value. Instead, each bit decodes its value by passing messages over the code-graph,
utilizing only a small subset of the received symbols that lie in its graphical neighborhood. The size of
this decoding neighborhood is thus a natural measure of complexity for message-passing decoding.
In this paper, we provide general bounds on the decoding complexity of message-passing decoding.
Instead of analyzing the number of iterations, we consider the related measure of neighborhood size (see Table I for a quick comparison of results in the area). The two measures are related by the maximum connectivity at the decoder, an issue that is further explored in [27], [28][waterslide].
The approach here is based on that in [29], but the results obtained here are more general and stronger.
It is desirable for a channel code to operate at rates close to the channel capacity, at low error
probability, and to have low encoding and decoding complexity. A natural question is whether these goals
are consistent with each other. Historically, the discovery of any class of codes and/or encoding/decoding algorithm is followed by an analysis of the performance-complexity tradeoffs for the code/algorithm. The
approaches for this analysis can be classified into four different types:
1) Fixing the code (or the code family) and a particular decoding algorithm.
2) Fixing the code (or the code family) but not the decoding algorithm.
3) Fixing only the decoding algorithm, and allowing the code to be arbitrary.
4) Allowing the code and the decoding algorithm to be arbitrary.
In each of these approaches, the problem is investigated from two complementary directions — the
achievability and the converse. The toolset for the two directions is often quite different. Providing
a good inner bound (the achievability) requires developing a good code and a suitably inexpensive
encoding/decoding algorithm. One only needs to ensure that the code family and/or the decoding algorithm
lie in the set of interest. In contrast, the outer bound (the converse) may allow for many possible codes
or decoding algorithms (or both).
Approach 1 yields the strongest bounds, but these are of limited applicability since both the code family and
the decoding algorithm are fixed. Approaches 2 and 3, in contrast, offer insights on the limitations of the
code family or the decoding algorithm in isolation, and are thus more general. The problem formulation
of approach 4 is vague unless we adhere to a computation model to measure the complexity.
The same problem afflicts approach 2. However, for a specific code family, the issue of algorithmic
complexity can be side-stepped by using a code parameter (e.g. block-length) as a measure of complexity.
For any choice of encoding/decoding algorithm, the complexity can then be calculated as a function of
this measure. The analysis of performance-complexity tradeoffs for block-codes is a good example of this
approach. Random-coding and expurgated error exponents [30] provide achievable tradeoffs (that serve
as the inner bounds) between the complexity (measured in block-length) and the error probability. On the
other hand, sphere-packing and straight-line bound converses provide the outer bounds [30]. The analysis
is not without its limitations. For example, it does not provide any insight for decoding algorithms that
do not use the entire block for decoding each symbol — a point we shall revisit later.
Approach 2 has also been followed for codes defined using dynamical systems, where the length of the memory at the encoder is used as the measure of complexity. The inner bound is obtained using finite-constraint-length linear convolutional codes. The outer bound only requires the memory at the encoder to obtain
the decoding complexity as a function of this memory length [8], [9].
In Approach 3, without restricting attention to any particular code family, a class of encoding/decoding
algorithms is considered. Explicit complexity results in the number of computations required at the
encoder/decoder are then obtained. For example, this approach was used to analyze the complexity-
performance tradeoffs for sequential decoding. Jacobs and Berlekamp [10] showed that the (random)
number of guesses required in sequential decoding is lower-bounded by a Pareto random variable,
regardless of the choice of the code. Successively smaller moments of the number of guesses diverge
as the rate approaches capacity — notably, there is a computational cutoff rate for sequential decoding
corresponding to when the average number of computations must diverge. Matching inner bounds are
obtained using random linear convolutional codes and specific sequential decoding algorithms [11], [12].
The past few years have seen the emergence of sparse-graph codes and associated message-passing based
iterative decoding algorithms [14]. Thus the family of sparse-graph codes and the message-passing
decoding algorithm demand a similar understanding as we have for other classes of codes and decoding
algorithms. Message-passing based iterative decoding algorithms decode each bit using a “neighborhood”
of channel outputs that is specific to the bit. The size of this neighborhood can be much smaller than
the block-length (indeed, much of the analysis assumes that the block-length is infinite). Thus the results
from block-codes, while valid, do not offer insights into performance-complexity tradeoffs for decoding
complexity under message-passing decoding.
Following approach 2, analysis has been performed for sub-families of sparse-graph codes by measuring
the “graphical complexity” of these codes, which is the average number of edges per-bit in the Tanner
graph representation of these codes [14]. Graphical complexity serves as a measure of the complexity
per-iteration in decoding of these codes, but also as the implementation complexity for the encoder and
the decoder. For regular LDPC codes, Gallager^1 provided outer bounds on the graphical complexity for the BSC, showing that the graphical complexity must diverge to infinity in order for these codes to approach capacity [15, Pg. 38]. The result has subsequently been generalized to irregular codes over any
binary-input symmetric memoryless channel [31], [32] and even to some classes of Markov channels [33].
These results show that the graphical complexity of LDPC codes must go to infinity as Ω(ln(1/gap)) in the limit of low error probability. Here we use the Ω notation to denote lower-bounds in the order sense of [34]. Observe that these results hold for any decoding algorithm.
Mere puncturing of the code, however, leads to a dense parity-check matrix, and hence the results for
1The graphical interpretation of these codes, and hence of Gallager’s results, came only much later (in 1981) and is due to
Tanner [17].
LDPC codes are not directly applicable. This observation naturally suggests that bounds on graphical
complexity of LDPC codes can be beaten. In fact, Pfister, Sason, and Urbanke demonstrate in [19] a
BEC-capacity-approaching family of non-systematic codes whose graphical models contain a bounded
average number of edges per information bit (under belief-propagation decoding). Pfister and Sason
further extended it in [20] by giving systematic constructions for the BEC with similar properties. Hsu and
Anastasopoulos extended the result in [21] to codes for noisy channels with bounded graphical complexity
(under ML decoding). In conclusion, the complexity-performance tradeoff for graphical complexity is
trivial in that the complexity is bounded regardless of the desired rate.
The issue has been considered by looking at the number of iterations, primarily using Approach 1 (i.e.
by restricting the code family as well as the decoding algorithm). In [22, Pg. 69-72] and [23], Khandekar
and McEliece conjectured that for all sparse-graph codes, the number of iterations must scale either
multiplicatively as Ω((1/gap) log2(1/〈Pe〉)), or additively as Ω(1/gap + log2(1/〈Pe〉)), in the vicinity of capacity
and low error probability. This conjecture comes from an EXIT-chart based graphical argument concerning
the message-passing decoding of sparse-graph codes over the BEC. The intuition is that the bound should
also hold for general memoryless channels, since the BEC is the channel with the simplest decoding.
In [24], Sason considers specific families of codes: the LDPC codes, the Accumulate-Repeat-Accumulate
(ARA) codes, and the Irregular-Repeat Accumulate (IRA) codes. He demonstrates that in the presence of degree-2 nodes in the graph, the number of iterations scales as Ω(1/gap) for the BEC in the limit of low
error probability, partially resolving the Khandekar-McEliece conjecture (the dependence on the error
probability is not addressed, nor is the issue settled for other classes of sparse-graph codes). If, however,
the fraction of degree-2 nodes for these codes converges to zero, then the bounds in [24] become trivial.
Sason notes that all the known traditionally capacity-achieving sequences of these code families have a
non-zero fraction of degree-2 nodes.
Neither of these two approaches abstracts away from the code structure to evaluate the limits of
the message passing algorithm itself, which is necessitated by the discovery that many new classes of
codes based on sparse-graphs can be decoded using such algorithms. Further, it has been suggested that
even some classes of codes that are not based on sparse-graphs (e.g., Reed-Solomon codes [25], and
the promising new class of polarization codes [26]) can also be decoded using message-passing based
algorithms. In this paper, we use Approach 3 to provide general bounds on the decoding complexity of message-passing decoding. Instead of analyzing the number of iterations, we consider the related measure of neighborhood size (see Table I for a quick comparison of results in the area). The two measures are related by the maximum connectivity at the decoder, an issue that is further
explored in [27], [28][waterslide]. The approach here is based on that in [29], but the results obtained
here are more general and stronger.
This paper is organized as follows. We introduce the notation and the problem in Section II. Section III
provides the lower bounds on the maximum neighborhood size as a function of the desired bit-error
probability and the rate. Tighter bounds are provided for LDPC codes of given average check degree,
and also for sparse-graph codes with given threshold behavior. In Section III-D, slightly weaker results
are derived for the average neighborhood size, instead of the maximum neighborhood size. We conclude
in Section IV.
Lower bounds on complexity of algorithms in circuit implementations have yielded [35] lower bounds
on energy consumption in circuits. In a companion paper [waterslide], we explore parallel implications
on decoding energy for message-passing decoding.
TABLE I
COMPARISON OF VARIOUS RESULTS ON COMPLEXITY-PERFORMANCE TRADEOFFS

Reference               | Codes                       | Decoding algorithm           | Channel              | Lower bound on complexity
Gallager [15]           | regular LDPC                | ML (and hence all)           | BSC                  | code density = Ω(log(1/gap))
Burshtein et al. [31],  | LDPC (including irregular)  | ML (and hence all)           | symmetric memoryless | code density = Ω(log(1/gap))
Sason et al. [32]       |                             |                              |                      |
Khandekar et al. [23],  | LDPC, IRA, ARA              | belief-propagation decoding  | BEC                  | iterations = Ω(1/gap)
Sason et al. [24]       |                             |                              |                      |
This paper              | all (including non-linear)  | any message-passing decoding | BSC or AWGN          | nbd size = Ω(log2(1/〈Pe〉)/gap²)
II. NOTATIONS, DEFINITIONS AND PROBLEM STATEMENT
In the following, we introduce the notation and describe our model of the decoder. Consider a point-
to-point communication link. An information sequence B_1^k is encoded into one of 2^{mR} codewords X_1^m, using a
possibly randomized encoder. The information sequences are assumed to consist of iid fair coin tosses
and hence the rate of the code is R = k/m. Following tradition, both k and m are considered to be very
large.
Two channel models are considered: the BSC and the power-constrained AWGN channel. The observed
channel output is denoted by Y_1^m in either case. For the BSC, the underlying channel crossover probability
is denoted by p. In our derivations we often need a “test” channel, denoted by G, that models a deviant
channel behavior. For the BSC, the crossover probability of the test channel is denoted by g. For the AWGN channel, the average received power is denoted by P_T, and the noise variance by σ_0², so the received SNR is P_T/σ_0². The noise variance of the AWGN test channel is denoted by σ_G².
We do not impose any a priori structure on the code itself. The decoding algorithm estimates the value
of each bit from a subset of channel outputs that is decided a priori for each bit. We refer to this subset of channel outputs as the decoding “neighborhood” of the i-th bit. This neighborhood can be generated, for example, using iterative decoding algorithms. We use 〈Pe,i〉 to denote the average probability of bit error on the i-th message bit (averaged over the channel realizations, the messages, the encoding, and the decoding) at the end of the decoding. Similarly, 〈Pe〉 = (1/k) Σ_i 〈Pe,i〉 is used to denote the overall average probability of bit error after l iterations. Appropriate subscripts are used to denote the test channel; for example, while 〈Pe〉 is used to denote the average bit-error probability for the true channel, 〈Pe〉_G denotes the average bit-error probability under a test channel G.
III. LOWER BOUNDS ON THE REQUIRED MAXIMUM NEIGHBORHOOD SIZE
In this section, lower bounds are derived on the required maximum neighborhood size in message-
passing decoding as a function of the gap from capacity and the desired error probability. These bounds
reveal that the maximum neighborhood size must grow unboundedly with improving system performance — as the rate approaches capacity and the error probability converges to zero.
A. Lower bounds on the probability of error as a function of the maximum neighborhood size
The main bounds are given by theorems that capture a local sphere-packing effect. These bounds are
closely related^2 to the local sphere-packing bounds used in the context of streaming codes with bounded
bit-delay [36], [37]. These sphere-packing bounds can be turned around to give a family of lower bounds
on the neighborhood size n as a function of 〈Pe〉. This family is indexed by the choice of a hypothetical
channel G and the bounds can be optimized numerically for any desired set of parameters.
Theorem 1: Consider a BSC with crossover probability p < 1/2. Let n be the maximum size of the decoding neighborhood of any individual bit. The following lower bound holds on the average probability of bit error:
\[
\langle P_e \rangle \geq \sup_{C^{-1}(R) < g \leq \frac{1}{2}} \frac{h_b^{-1}(\delta(G))}{2}\, 2^{-nD(g\|p)} \left( \frac{p(1-g)}{g(1-p)} \right)^{\epsilon\sqrt{n}}, \tag{1}
\]
2The main technical difference between the results here and those of [36] is that the bounds here are nonasymptotic and are
applicable to average probability of bit-error rather than the maximum probability of error.
where hb(·) is the usual binary entropy function, D(g‖p) = g log2(g/p) + (1−g) log2((1−g)/(1−p)) is the usual KL-divergence, and
\[
\delta(G) = 1 - \frac{C(G)}{R}, \quad \text{where } C(G) = 1 - h_b(g), \tag{2}
\]
\[
\epsilon = \sqrt{\frac{1}{K(g)} \log_2\left( \frac{2}{h_b^{-1}(\delta(G))} \right)}, \tag{3}
\]
\[
\text{where } K(g) = \inf_{0 < \eta < 1-g} \frac{D(g+\eta\|g)}{\eta^2}. \tag{4}
\]
Proof: See Appendix A.
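Since the family of bounds in (1) is indexed by the test channel g and is intended to be optimized numerically, it can be evaluated directly in a few lines. The sketch below is our own illustration (the function names are ours; the infimum in (4) and the supremum over g are approximated by coarse grid searches):

```python
import math

def hb(x):
    """Binary entropy function (bits)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def hb_inv(y):
    """Inverse of hb restricted to [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = (lo + hi) / 2
        if hb(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def kl(g, p):
    """KL divergence D(g||p) in bits between Bernoulli(g) and Bernoulli(p)."""
    return g * math.log2(g / p) + (1 - g) * math.log2((1 - g) / (1 - p))

def K(g, steps=1000):
    """K(g) = inf over 0 < eta < 1-g of D(g+eta||g)/eta^2, on a grid."""
    best = float("inf")
    for i in range(1, steps):
        eta = (1 - g) * i / steps
        best = min(best, kl(g + eta, g) / eta ** 2)
    return best

def theorem1_bound(p, R, n, steps=200):
    """Numerically maximize the right-hand side of (1) over test channels
    g in (C^{-1}(R), 1/2) for BSC(p), rate R, neighborhood size n."""
    g_min = hb_inv(1 - R)  # C^{-1}(R): solves 1 - hb(g) = R for the BSC
    best = 0.0
    for i in range(1, steps):
        g = g_min + (0.5 - g_min) * i / steps
        delta = 1 - (1 - hb(g)) / R          # delta(G) of (2)
        if delta <= 0:
            continue
        d_inv = hb_inv(delta)
        eps = math.sqrt(math.log2(2 / d_inv) / K(g))   # (3) with (4)
        val = (d_inv / 2) * 2 ** (-n * kl(g, p)) \
            * ((p * (1 - g)) / (g * (1 - p))) ** (eps * math.sqrt(n))
        best = max(best, val)
    return best
```

As expected from the theorem, the bound decays as the neighborhood size n grows, so a target 〈Pe〉 translates into a minimum required n.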
For regular-LDPC codes, in [15, Pg. 38], Gallager provided an upper bound on the rate that is valid for ML decoding (and hence any decoding algorithm). The bound has since been extended to irregular-LDPC codes as well [32], and is given by R < 1 − hb(p)/hb(p_d), where p_d = (1 − (1−2p)^d)/2 and d is the average check node degree of the LDPC code. Because this bound is strictly smaller than the channel capacity, it is a natural question whether this bound acts as the capacity for LDPC codes in Theorem 1. This intuition is formalized in the next theorem. Further, specializing the decoding algorithm to belief-propagation decoding [16] (and related algorithms, such as Gallager A and Gallager B [16]) and random ensembles of sparse-graph codes [14], tighter bounds can be derived, which are also provided in the next theorem.
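As a quick numerical illustration of the preceding discussion, the following sketch (ours; the helper name is hypothetical) evaluates Gallager's rate bound and confirms that it sits strictly below the BSC capacity 1 − hb(p), approaching it as the check degree grows:

```python
import math

def hb(x):
    """Binary entropy (bits)."""
    return 0.0 if x <= 0 or x >= 1 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def gallager_rate_bound(p, d):
    """Gallager's upper bound on the rate of an LDPC code with average
    check-node degree d over BSC(p): R < 1 - hb(p)/hb(p_d),
    where p_d = (1 - (1 - 2p)^d) / 2."""
    p_d = (1 - (1 - 2 * p) ** d) / 2
    return 1 - hb(p) / hb(p_d)

# For p = 0.05 the bound stays strictly below capacity 1 - hb(p) for
# every finite d, and climbs toward capacity as d grows.
p = 0.05
bounds = {d: gallager_rate_bound(p, d) for d in (3, 6, 12, 24)}
```

This is the sense in which the Gallager bound can play the role of capacity for LDPC codes in the theorem that follows.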
Theorem 2: For the setting as in Theorem 1, for LDPC codes of average check node degree d,
\[
\langle P_e \rangle \geq \sup_{g:\, 1 - \frac{h_b(g)}{h_b(g_d)} < R} \frac{h_b^{-1}(\delta_2(d,G))}{2}\, 2^{-nD(g\|p)} \left( \frac{p(1-g)}{g(1-p)} \right)^{\epsilon_2\sqrt{n}}, \tag{5}
\]
where hb(·) and D(g‖p) are as in Theorem 1, g_d = (1 − (1−2g)^d)/2, and
\[
\delta_2(d,G) = h_b(g_d) - \frac{h_b(g_d) - h_b(g)}{R}, \tag{6}
\]
\[
\epsilon_2 = \sqrt{\frac{1}{K(g)} \log_2\left( \frac{2}{h_b^{-1}(\delta_2(d,G))} \right)}, \tag{7}
\]
for K(g) as defined in Theorem 1. Further, for the setting as in Theorem 1, for iteratively decoded sparse-graph codes with threshold parameter p∗, and δ3 as the minimum average bit-error probability above threshold,
\[
\langle P_e \rangle \geq \sup_{g > p^*} \frac{h_b^{-1}(\delta_3)}{2}\, 2^{-nD(g\|p)} \left( \frac{p(1-g)}{g(1-p)} \right)^{\epsilon_3\sqrt{n}}, \tag{8}
\]
where
\[
\epsilon_3 = \sqrt{\frac{1}{K(g)} \log_2\left( \frac{2}{h_b^{-1}(\delta_3)} \right)}. \tag{9}
\]
Proof: The proofs are similar to that of Theorem 1, and are relegated to Appendix B. The first
involves proving a version of Fano’s inequality using Gallager’s upper bound. The second relies on the
observation that for Belief-Propagation and associated algorithms, there exists a non-zero lower bound
on the error probability regardless of the number of iterations.
The second part of Theorem 2 holds for the decoding algorithms that operate on the code graph
itself for codes that are generated randomly, as specified by the “socket construction” of [16]. Since the
threshold can depend on scheduling of message-passing, we assume that the threshold is calculated for
the same scheduling of message-passing as is implemented to obtain the neighborhoods. We note here
that δ3 can be zero for some sparse-graph codes, such as the LDGM codes, and the bound is trivial for
such codes.
Theorem 3: For the AWGN channel and the decoder model in Section II, let n be the maximum size of the decoding neighborhood of any individual message bit. The following lower bound holds on the average probability of bit error:
\[
\langle P_e \rangle \geq \sup_{\sigma_G^2:\, C(G) < R} \frac{h_b^{-1}(\delta(G))}{2} \exp\left( -nD(\sigma_G^2\|\sigma_0^2) - \sqrt{n}\left( \frac{3}{2} + 2\ln\left( \frac{2}{h_b^{-1}(\delta(G))} \right) \right)\left( \frac{\sigma_G^2}{\sigma_0^2} - 1 \right) \right), \tag{10}
\]
where δ(G) = 1 − C(G)/R, the capacity C(G) = (1/2) log2(1 + P_T/σ_G²), and the KL divergence D(σ_G²‖σ_0²) = (1/2)(σ_G²/σ_0² − 1 − ln(σ_G²/σ_0²)).
The following lower bound also holds on the average probability of bit error:
\[
\langle P_e \rangle \geq \sup_{\sigma_G^2 > \sigma_0^2\mu(n):\, C(G) < R} \frac{h_b^{-1}(\delta(G))}{2} \exp\left( -nD(\sigma_G^2\|\sigma_0^2) - \frac{1}{2}\,\phi\!\left(n, h_b^{-1}(\delta(G))\right)\left( \frac{\sigma_G^2}{\sigma_0^2} - 1 \right) \right), \tag{11}
\]
where
\[
\mu(n) = \frac{1}{2}\left( 1 + \frac{1}{T(n) + 1} + \frac{4T(n) + 2}{n\,T(n)(1 + T(n))} \right), \tag{12}
\]
\[
T(n) = -W_L\!\left( -\exp(-1)\,(1/4)^{1/n} \right), \tag{13}
\]
where W_L(x) solves
\[
x = W_L(x)\exp(W_L(x)) \tag{14}
\]
while satisfying W_L(x) ≤ −1 for all x ∈ [−exp(−1), 0], and
\[
\phi(n, y) = -n\left( W_L\!\left( -\exp(-1)\left(\frac{y}{2}\right)^{\frac{2}{n}} \right) + 1 \right). \tag{15}
\]
W_L(x) is the transcendental Lambert W function [38], defined implicitly by the relation (14) above.
Proof: See Appendix C.
The expression (11) is better for plotting bounds when we expect n to be moderate while (10) is more
easily amenable to asymptotic analysis as n gets large.
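When plotting (11), the quantities T(n), μ(n), and φ(n, y) of (12)-(15) must be computed numerically. Below is a sketch (ours, not from the paper): since w·exp(w) is monotone on (−∞, −1], the W_L branch can be found by bisection without any special-function library.

```python
import math

def lambert_w_l(x):
    """The W_L branch used in (13)-(15): the real solution of
    w*exp(w) = x with w <= -1, for x in [-1/e, 0). Found by bisection,
    since w*exp(w) is monotone (decreasing) on (-inf, -1]."""
    assert -math.exp(-1) - 1e-12 <= x < 0
    hi = -1.0
    lo = -2.0
    while lo * math.exp(lo) < x:   # extend bracket until it contains the root
        lo *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid * math.exp(mid) > x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def T(n):
    """T(n) of (13)."""
    return -lambert_w_l(-math.exp(-1) * 0.25 ** (1.0 / n))

def mu(n):
    """mu(n) of (12): lower limit on sigma_G^2/sigma_0^2 in (11)."""
    t = T(n)
    return 0.5 * (1 + 1 / (t + 1) + (4 * t + 2) / (n * t * (1 + t)))

def phi(n, y):
    """phi(n, y) of (15)."""
    return -n * (lambert_w_l(-math.exp(-1) * (y / 2) ** (2.0 / n)) + 1)
```

Any standard Lambert-W routine restricted to the lower real branch would serve equally well.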
B. ‘Gap’ to capacity
For finite bit-error probability, it is possible to communicate at rates above the channel capacity. Thus the notion of gap from capacity needs delicate handling. In the following, we show that the appropriate definition of gap is C/(1 − hb(〈Pe〉)) − R.
Before transmission, the k bits could be lossily compressed using a source code to ≈ (1− hb(〈Pe〉))k
bits. The channel code could then be used to protect these bits, and the resulting codeword transmitted
over the channel. The decoder can decode jointly the source and the channel code. This scheme is optimal
by the lossy source-channel separation theorem. Therefore, for fixed 〈Pe〉, the maximum achievable rate is C/(1 − hb(〈Pe〉)).
The appropriate definition of the total gap is, therefore, C/(1 − hb(〈Pe〉)) − R. This can be broken down as a sum of two ‘gap’s:
\[
\frac{C}{1 - h_b(\langle P_e \rangle)} - R = \left( \frac{C}{1 - h_b(\langle P_e \rangle)} - C \right) + (C - R). \tag{16}
\]
The first term goes to zero as 〈Pe〉 → 0, and the second term is the intuitive idea of gap to capacity.
The traditional approach of error exponents is to study the behavior for a fixed gap while allowing 〈Pe〉 → 0. Considering the error exponent as a function of the gap reveals something about how difficult it is to
approach capacity.
The other natural path is to fix 〈Pe〉 > 0 and let R → C. It turns out that the bounds of Theorems 1 and 3 do not give very interesting results in such a case. We need 〈Pe〉 → 0 alongside R → C. To capture the intuitive idea of gap, which is just the second term in (16), we want to be able to assume that the effect of the second term dominates the first. This way, we can argue that the decoding complexity increases to infinity as gap → 0 and not just because 〈Pe〉 → 0. For this, it suffices to consider 〈Pe〉 = gap^β. As long as β > 1, the 2 log_α(1/gap) scaling on iterations holds, but it slows down to 2β log_α(1/gap) for 0 < β < 1. When β is small, the average probability of bit error is dropping so slowly with gap that the dominant term is actually the (C/(1 − hb(〈Pe〉)) − C) term in (16).
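The decomposition (16) is easy to evaluate for concrete numbers; the following sketch (ours; the function name is illustrative) splits the total gap into its two terms:

```python
import math

def hb(x):
    """Binary entropy (bits)."""
    return 0.0 if x <= 0 or x >= 1 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def gap_decomposition(C, R, pe):
    """Split the total gap C/(1 - hb(<Pe>)) - R of (16) into the term
    contributed by lossy pre-compression and the intuitive gap C - R."""
    total = C / (1 - hb(pe)) - R
    lossy_term = C / (1 - hb(pe)) - C
    return total, lossy_term, C - R
```

For 〈Pe〉 = gap^β with small β, the lossy-compression term dominates the intuitive gap C − R, matching the discussion above.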
C. Lower bounds on the required maximum neighborhood size as the rate approaches capacity
1) A simple lower bound: Given a crossover probability p, there exists a semi-trivial bound on the neighborhood size that depends only on 〈Pe〉. Since there is at least one configuration of the neighborhood that will decode to an incorrect value for this bit, it is clear that
\[
\langle P_e \rangle \geq p^n. \tag{17}
\]
However, this bound does not have any dependence on the rate and so does not capture the fact that the
complexity should increase as the gap shrinks.
A similar bound can be derived for general channels. The above bound implicitly allows for repetition
coding in that it assumes that each codeword symbol encodes information contained only for the particular
bit. This concept can be extended to general channels. If the entire neighborhood of Bi were to encode
only the information for Bi, this scheme has a corresponding rate of 1n for a neighborhood of size n.
For large n, we can use the sphere-packing bound to obtain the following lower bound on the bit-error
probability
〈Pe〉 & e−nEsp(1
n) ≥ e−nEsp(0). (18)
The bound is clearly loose, because it requires the entire neighborhood to encode just one bit.
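Inverting (17) gives an explicit, if weak, lower bound on the neighborhood size; a minimal sketch (ours, not the paper's):

```python
import math

def min_neighborhood_size(p, pe_target):
    """Smallest n consistent with the bound <Pe> >= p^n for a BSC(p)."""
    # p^n <= pe_target  <=>  n >= ln(1/pe_target) / ln(1/p)
    return math.ceil(math.log(1.0 / pe_target) / math.log(1.0 / p))

# Example: BSC(0.05) and a target average bit-error probability of 1e-6.
n_min = min_neighborhood_size(0.05, 1e-6)
assert n_min == 5                                 # 0.05^5 ~ 3.1e-7 <= 1e-6
assert 0.05 ** n_min <= 1e-6 < 0.05 ** (n_min - 1)
```

As the text notes, this bound does not grow as the rate approaches capacity; the rate-dependent bounds below are much stronger.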
2) Lower bound on n as the rate approaches capacity and 〈Pe〉 approaches 0: Section III-A suggests that the dominant terms governing the number of iterations as we approach capacity are log_α ln(1/〈Pe〉) + 2 log_α(1/gap). However, the argument of the previous section requires that 〈Pe〉 → 0 for a finite nonzero gap. The form of the expression, however, suggests that unless 〈Pe〉 goes to zero very rapidly, the dominant term could be the 2 log_α(1/gap). To verify this, in this section we consider a joint path to capacity with 〈Pe〉 = gap^β.

Theorem 4: For the problem as stated in Section II, the following lower bounds hold on the required neighborhood size n for 〈Pe〉 = gap^β and gap → 0. For the BSC and the AWGN channel,
• for β < 1, n = Ω( 1/gap^{2β−ν} ),
• for β ≥ 1, n = Ω( 1/gap^{2−ν} ),
for any ν > 0. The same scaling laws hold for LDPC codes of average degree d used over the BSC, with gap now being the gap from Gallager's upper bound C = 1 − hb(p)/hb(pd), where pd is defined in Theorem 2. For sparse-graph codes under iterative decoding with threshold p∗ and δ3 the minimum average bit-error probability above threshold, the scaling is n = Ω( 1/gap_p^{2−ν} ) for all β ≥ 0, where gap_p = p∗ − p is the gap of the crossover probability from the channel threshold.
Proof: We give the proof here for the case of the BSC, with some details relegated to the Appendix. The AWGN case follows analogously, with some small modifications that are detailed in Appendix E.

Let the code for the given BSC(p) have rate R. Consider BSC test channels G chosen so that C(G) < R, where C(·) maps a BSC to its capacity in bits per channel use. Taking log2(·) on both sides of (1) (for a fixed g),

log2(〈Pe〉) ≥ log2( hb^{-1}(δ(G)) ) − 1 − nD(g||p) − ε√n log2( g(1−p)/(p(1−g)) ). (19)

Rewriting (19),

nD(g||p) + ε√n log2( g(1−p)/(p(1−g)) ) + log2(〈Pe〉) − log2( hb^{-1}(δ(G)) ) + 1 ≥ 0. (20)

This inequality is quadratic in √n. The LHS potentially has two roots. If the roots are not real, then the expression is always positive, and we get only the trivial lower bound √n ≥ 0. Therefore, the cases of interest are when the two roots are real. The larger of the two roots is then a lower bound on √n.

Denoting the coefficient of n by a = D(g||p), that of √n by b = ε log2( g(1−p)/(p(1−g)) ), and the constant terms by c = log2(〈Pe〉) − log2( hb^{-1}(δ(G)) ) + 1 in (20), the quadratic formula then reveals

√n ≥ ( −b + √(b² − 4ac) ) / (2a). (21)

Since the lower bound holds for all g satisfying C(G) < R = C − gap, we substitute g* = p + gap^r, for some r < 1 and small gap. This choice is motivated by examining Fig. 1. The constraint r < 1 is imposed because it ensures C(g*) < R for small enough gap.
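Before specializing g, the root in (21) can be evaluated numerically for concrete channel parameters. The sketch below is our own illustration, not the paper's code: hb^{-1} is computed by bisection, and K(g) from (4) is approximated by a grid minimization (both are assumptions of this sketch).

```python
import math

def hb(x):
    """Binary entropy in bits."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def hb_inv(y):
    """Inverse of hb on [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if hb(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dkl(g, p):
    """KL divergence D(g||p) between Bernoulli(g) and Bernoulli(p), in bits."""
    return g * math.log2(g / p) + (1 - g) * math.log2((1 - g) / (1 - p))

def K(g):
    """Grid approximation of K(g) = inf_{0 < eta <= 1-g} D(g+eta||g)/eta^2."""
    vals = [dkl(g + (1 - g) * i / 2000.0, g) / ((1 - g) * i / 2000.0) ** 2
            for i in range(1, 2000)]
    vals.append(math.log2(1.0 / g) / (1 - g) ** 2)   # endpoint eta = 1 - g
    return min(vals)

def n_lower_bound(p, g, rate, pe):
    """Lower bound on the neighborhood size n from the quadratic (20)-(21)."""
    delta = 1 - (1 - hb(g)) / rate                 # delta(G) = 1 - C(g)/R
    x = hb_inv(delta)
    eps = math.sqrt(math.log2(2.0 / x) / K(g))     # eps(x) from (40)
    a = dkl(g, p)                                  # coefficient of n
    b = eps * math.log2(g * (1 - p) / (p * (1 - g)))   # coefficient of sqrt(n)
    c = math.log2(pe) - math.log2(x) + 1.0         # constant term
    disc = b * b - 4 * a * c
    if disc < 0:
        return 0.0                                 # only the trivial bound
    root = (-b + math.sqrt(disc)) / (2 * a)        # larger root: bound on sqrt(n)
    return max(root, 0.0) ** 2

# Example: BSC(0.1) (capacity ~0.531), a rate-0.5 code, test channel g = 0.13.
n_weak = n_lower_bound(0.1, 0.13, 0.5, 1e-4)
n_strong = n_lower_bound(0.1, 0.13, 0.5, 1e-8)
assert n_weak > 0 and n_strong > n_weak   # the bound grows as <Pe> shrinks
```

Tightening the target 〈Pe〉 from 1e-4 to 1e-8 visibly increases the bound, consistent with the asymptotics derived below.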
Lemma 1: In the limit gap → 0, for g* = p + gap^r to satisfy C(g*) < R, it suffices that r be less than 1.

Proof:

C(g*) = C(p + gap^r)
= C(p) + gap^r × C′(p) + o(gap^r)
≤ C(p) − gap = R,

for small enough gap and r < 1. The final inequality holds since C(p) is a monotonically decreasing concave-∩ function of p for a BSC with p < 1/2, and since gap^r dominates gap (indeed, any linear function of gap) when gap is small enough.
In steps, we now Taylor-expand the terms on the LHS of (20) about g = p.
[Figure 1: goptvsgap.pdf]

Fig. 1. The behavior of g*, the optimizing value of g for the bound in Theorem 1, with gap. We plot log(g_opt − p) vs log(gap). The resulting straight lines inspired the substitution g* = p + gap^r.
Lemma 2 (Bounds on hb(·) and hb^{-1}(·), from [39]): For all d > 1, and for all x ∈ [0, 1/2] and y ∈ [0, 1],

hb(x) ≥ 2x, (22)

hb(x) ≤ 2x^{1−1/d} d/ln(2), (23)

hb^{-1}(y) ≥ ( y ln(2)/(2d) )^{d/(d−1)}, (24)

hb^{-1}(y) ≤ y/2. (25)
Proof: See Appendix D-A.
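Since Lemma 2 is used repeatedly in what follows, here is a quick numerical spot-check (ours) of (22)–(25) on a grid. The checks for (24) and (25) are rephrased through hb itself to avoid computing hb^{-1}: since hb is increasing on [0, 1/2], hb^{-1}(y) ≥ z is equivalent to hb(z) ≤ y.

```python
import math

def hb(x):
    """Binary entropy in bits."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

ln2 = math.log(2)
for d in (2, 4, 10):
    for i in range(1, 500):
        x = i / 1000.0                                    # grid over (0, 1/2)
        assert hb(x) >= 2 * x                             # (22)
        assert hb(x) <= 2 * x ** (1 - 1.0 / d) * d / ln2  # (23)
    for j in range(1, 100):
        y = j / 100.0                                     # grid over (0, 1)
        z = (y * ln2 / (2 * d)) ** (d / (d - 1.0))
        assert z >= 0.5 or hb(z) <= y                     # (24): hb_inv(y) >= z
        assert hb(y / 2.0) >= y                           # (25): hb_inv(y) <= y/2
```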
Lemma 3:

(d/(d−1)) r log2(gap) − 1 + K1 + o(1) ≤ log2( hb^{-1}(δ(g*)) ) ≤ r log2(gap) − 1 + K2 + o(1), (26)

where K1 = (d/(d−1)) ( log2( hb′(p)/C(p) ) + log2( ln(2)/d ) ) with d > 1 arbitrary, and K2 = log2( hb′(p)/C(p) ).
Proof: See Appendix D-B.
Lemma 4:

D(g*||p) = [ gap^{2r} / (2p(1−p) ln(2)) ] (1 + o(1)). (27)
Proof: See Appendix D-C.
Lemma 5:

log2( g*(1−p)/(p(1−g*)) ) = [ gap^r / (p(1−p) ln(2)) ] (1 + o(1)).
Proof: See Appendix D-D.
Lemma 6:

√( r/K(p) ) √( log2(1/gap) ) (1 + o(1)) ≤ ε ≤ √( rd/((d−1)K(p)) ) √( log2(1/gap) ) (1 + o(1)),

where K(p) is from (4).
Proof: See Appendix D-E.
If c < 0, then the bound (21) is guaranteed to be positive. For 〈Pe〉 = gap^β, the condition c < 0 is equivalent to

β log2(gap) − log2( hb^{-1}(δ(g*)) ) + 1 < 0. (28)

Since we want (28) to be satisfied for all small enough values of gap, we can use the approximations in Lemmas 3–6 and ignore constants to immediately arrive at the following sufficient condition:

β log2(gap) − (d/(d−1)) r log2(gap) < 0, i.e., r < β(d−1)/d,

where d can be made arbitrarily large. Now, using the approximations in Lemma 3 and Lemma 5, and substituting them into (21), we can evaluate the solution of the quadratic equation. As shown in Appendix D-F, this gives the following lower bound on n:

n ≥ Ω( log2(1/gap) / gap^{2r} ), (29)

for any r < min{β, 1}. The first part of Theorem 4 follows.
For LDPC codes with bounded average degree, we only need to verify that δ2(G) in Theorem 2 behaves like δ(G) as gap → 0 (note that the definition of gap for LDPC codes is taken to be the gap from Gallager's upper bound rather than the channel capacity). This can be seen easily as follows:
δ2(d,G) = hb(gd) − (hb(gd) − hb(g))/R
= hb(gd) [ 1 − (1 − hb(g)/hb(gd))/R ]
= hb(gd) ( 1 − C(g)/(C(p) − gap) ), (30)

where, in this calculation, C(g) = 1 − hb(g)/hb(gd) denotes Gallager's upper bound at crossover probability g, and R = C(p) − gap.
Now, for g* = p + gap^r with r < 1,

C(g*) = C(p + gap^r) = C(p) + gap^r C′(p) + o(gap^r),

where C′(p) < 0. Thus,

C(g*)/(C(p) − gap) = [ C(p) + gap^r C′(p) + o(gap^r) ] / [ C(p)(1 − gap/C(p)) ]
= ( 1 + gap^r C′(p)/C(p) + o(gap^r) ) ( 1 + gap/C(p) + o(gap) )
= 1 + gap^r C′(p)/C(p) + o(gap^r), (31)
since r < 1. Expanding gd,

gd = [ 1 − (1 − 2p − 2gap^r)^d ] / 2
= [ 1 − (1 − 2p)^d (1 − 2gap^r/(1 − 2p))^d ] / 2
= [ 1 − (1 − 2p)^d (1 − 2d gap^r/(1 − 2p) + o(gap^r)) ] / 2
= pd + (1 − 2p)^{d−1} d gap^r + o(gap^r).

Thus,

hb(gd) = hb(pd) + hb′(pd)(1 − 2p)^{d−1} d gap^r + o(gap^r), (32)

which approaches hb(pd) as gap → 0. Plugging (32) and (31) into (30), it is clear that δ2(d,G) also behaves like δ(G) (i.e., it is Θ(gap^r)).
For sparse-graph codes under iterative decoding, choose gap_p = p∗ − p, the gap from the threshold. Again, g = p + gap_p^r ensures that g > p∗ for small enough gap_p. Taking log2(·) on both sides of (8) for the chosen g,

nD(g||p) + ε3 √n log2( g(1−p)/(p(1−g)) ) + log2(〈Pe〉) − log2( hb^{-1}(δ3) ) + 1 ≥ 0. (33)

Observe that δ3 is a constant that does not depend on the desired 〈Pe〉. For small enough 〈Pe〉 (regardless of gap_p), the constant term c = log2(〈Pe〉) − log2( hb^{-1}(δ3) ) + 1 in the quadratic is negative. Thus the neighborhood size scales as Ω( 1/gap_p^{2−ν} ) for all ν > 0 and all β as gap_p → 0 (i.e., even for a constant 〈Pe〉 satisfying c < 0, the scaling holds as gap_p → 0).
The lower bound on the neighborhood size n can immediately be converted into a lower bound on the minimum number of computational iterations by simply taking log_α(·). Note that this is not a comment about the degree of a potential sparse graph that defines the code. This is just about the maximum degree of the decoder's computational nodes, and is a bound on the number of computational iterations required to hit the desired average probability of error.
The lower bounds are plotted in Fig. 2 for various values of β, and reveal a log(1/gap) scaling for the required number of iterations when the decoder has bounded degree for message passing. This is much larger than the trivial lower bound of log log(1/gap), but is much smaller than the Khandekar–McEliece conjectured 1/gap or (1/gap) log2(1/gap) scaling for the number of iterations required to traverse such paths toward certainty at capacity.
D. Lower bounds on the average neighborhood size
A lower bound on the maximum neighborhood size may be too weak in certain situations. For example, it does not rule out decoders in which just one bit has a large neighborhood while all the other bits have neighborhoods of size 1! In this section we derive lower bounds on the average neighborhood size, which show that while the average neighborhood size may scale more slowly than the maximum neighborhood size, it is still unbounded as gap → 0 and 〈Pe〉 → 0. The main lower bound and its scaling for the BSC are stated in the following theorem. To avoid repetition, we do not include the lower bound for the AWGN channel, since it can be derived analogously.
Theorem 5: For a BSC with crossover probability p < 1/2, the following lower bound holds on the average probability of bit error for a given average neighborhood size n = (1/k) Σ_{i=1}^k ni:

〈Pe〉 ≥ sup_{C^{-1}(R)<g≤1/2} [ (hb^{-1}(δ(G)))² / 8 ] · 2^{ −(4n/hb^{-1}(δ(G))) D(g||p) } · ( p(1−g)/(g(1−p)) )^{ ε( hb^{-1}(δ(G))/2 ) √( 4n/hb^{-1}(δ(G)) ) }, (34)

for neighborhood sizes ni fixed in advance. The definitions of ε(·) and D(·||·) are as in Theorem 1. The following expressions lower bound the asymptotic scaling of n for 〈Pe〉 = gap^β and gap → 0:
• For β < 2, n = Ω( 1/gap^{β/2−ν} );
[Figure 2: variousbeta.pdf]

Fig. 2. Lower bounds on the neighborhood size vs the gap to capacity for 〈Pe〉 = gap^β, for various values of β. The curve titled "balanced gaps" (see (16)) shows the behavior for C/(1 − hb(〈Pe〉)) − C = C − R. The curves are plotted by brute-force optimization of (1), but reveal slopes that are as predicted in Theorem 4.
• For β ≥ 2, n = Ω( 1/gap^{1−ν} ),
for any ν > 0.
Proof: See Appendix F.
IV. DISCUSSIONS AND CONCLUSIONS
We gave lower bounds on the required neighborhood size for a specified code performance, without making any assumptions on the code structure. Decoding based on a fixed neighborhood size may seem *********************** TO BE COMPLETED ************* The bounds are order optimal: consider a random block-code with neighborhood size equal to the block-length. Then clearly, n ≤ log2(1/〈Pe〉) / Er(R), where Er(R) is the random coding error exponent [30]. The main implication of the proofs above is then that allowing for different neighborhood sizes, instead of using the entire block, does not improve the behavior of the error exponent at high rates.
For decoding algorithms that operate on regular graphs (e.g. regular LDPC codes), these bounds can
be used to obtain lower bounds on the required number of iterations to achieve a certain performance.
The scaling of the bound with the error probability and gap simplifies to

l ≳ [ 1/log2(α − 1) ] · log2( log2(1/〈Pe〉) / gap² ). (35)
As a function of 〈Pe〉, there exist codes that attain this double-logarithmic scaling. In particular, regular LDPC codes attain this behavior. However, it is well known that regular LDPC codes do not approach the channel capacity for any memoryless channel under belief-propagation decoding, and thus they operate at a substantial gap from capacity. In contrast, there exist irregular LDPC codes and other families of codes (e.g., IRA codes, ARA codes) that achieve capacity, at least for the BEC. These codes, however, require the number of iterations to scale as log2(1/〈Pe〉), and thus require a large number of iterations at low error probability. Interestingly, recent advances by Lentmaier et al. suggest that there exist codes that might approach capacity with the number of iterations scaling double-logarithmically in 1/〈Pe〉.
If the objective is to minimize the energy used per bit (under a rate constraint), then at short distances a small gap from capacity may be undesirable. This direction has been explored in [27], [28], where we account for the decoding energy as well. Using a VLSI model of decoding similar to the one used by Thompson in [35], [40], we show that the transmit energy needs to be at a certain gap from that predicted by Shannon theory in order for the decoding complexity (and hence the decoding energy) to be small. Since there is little advantage in approaching capacity in such situations, regular LDPCs may offer a class of codes that is well suited for short-distance communication.
Technical tools developed in this paper advance those in [36] to neighborhood sizes (instead of delay) and to average bit-error probability (instead of block-error probability) in non-asymptotic cases. The results on the scaling of the average neighborhood size also yield new results for average delay, since decoding with bounded delay is a special case of decoding with bounded neighborhood sizes. These tools are then further advanced in [41] to obtain bounds on distortion for Witsenhausen's counterexample and its vector extensions in distributed control.
APPENDIX A
PROOF OF THEOREM 1: LOWER BOUND ON 〈Pe〉 FOR THE BSC
The derivation of the lower bound can be divided into the following steps: we first show that the average probability of error for any code must be significant if the (test) channel were such that its capacity fell below the rate. Then, a mapping is given that maps the probability of an individual error event under the test channel to a lower bound on its probability under the true channel. This mapping is shown to be convex-∪ in the probability of error. The convexity allows us to obtain a lower bound on the average probability of error under the true channel.
Proof of Theorem 1: Suppose we run the given encoder and decoder over a test channel G instead.
Lemma 7 (Lower bound on 〈Pe〉 under test channel G): If a rate-R code is used over a channel G with C(G) < R, then the average probability of bit error satisfies

〈Pe〉G ≥ hb^{-1}(δ(G)), (36)

where δ(G) = 1 − C(G)/R. This holds for any channel model G, not just BSCs.
Proof: See Appendix A-A.
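As a quick sanity check of Lemma 7 (our own illustration, not an argument from the paper), consider a rate-1/5 repetition code with majority decoding over a BSC(g) whose capacity lies below the rate. The decoder's exact bit-error probability indeed dominates hb^{-1}(δ(G)):

```python
import math

def hb(x):
    """Binary entropy in bits."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def hb_inv(y):
    """Inverse of hb on [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if hb(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def majority_error(n, g):
    """Exact error probability of an n-fold repetition code with majority
    decoding over BSC(g), for odd n."""
    return sum(math.comb(n, k) * g ** k * (1 - g) ** (n - k)
               for k in range((n + 1) // 2, n + 1))

n, g = 5, 0.45                      # rate R = 1/5, test channel BSC(0.45)
R = 1.0 / n
cap = 1 - hb(g)                     # C(G) ~ 0.007 < R, so Lemma 7 applies
assert cap < R
fano_bound = hb_inv(1 - cap / R)    # right side of (36)
assert majority_error(n, g) >= fano_bound
```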
Let b_1^k denote the entire message, and let x_1^m be the corresponding codeword. Let the common randomness available to the encoder and decoder be denoted by the random variable U, and its realizations by u.

Consider the i-th message bit Bi. Its decoding is performed by observing a particular decoding neighborhood³ of channel outputs y_{nbd,i}^n. The corresponding channel inputs are denoted by x_{nbd,i}^n, and the relevant channel noise by z_{nbd,i}^n = x_{nbd,i}^n ⊕ y_{nbd,i}^n, where ⊕ denotes modulo-2 addition. The decoder simply checks whether the observed y_{nbd,i}^n ∈ D_{y,i}(0, u) to decode to B̂i = 0, or whether y_{nbd,i}^n ∈ D_{y,i}(1, u) to decode to B̂i = 1.

For given x_{nbd,i}^n, the error event is equivalent to z_{nbd,i}^n falling in a decoding region D_{z,i}(x_{nbd,i}^n, b_1^k, u) = D_{y,i}(1 ⊕ bi, u) ⊕ x_{nbd,i}^n. Thus, by the linearity of expectations, (36) can be rewritten as

(1/k) Σ_i (1/2^k) Σ_{b_1^k} Σ_u Pr(U = u) Pr_G( Z_{nbd,i}^n ∈ D_{z,i}(x_{nbd,i}^n(b_1^k, u), b_1^k, u) ) ≥ hb^{-1}(δ(G)). (37)
The following lemma gives a lower bound on the probability of an event under the channel BSC(p), given a lower bound on its probability under the channel G.

Lemma 8: Let A be a set of BSC(p) channel-noise realizations z_1^n such that Pr_G(A) = δ. Then

Pr(A) ≥ fG(δ), (38)
3For any given decoder implementation, the size of the decoding neighborhood might be different for different bits i. However,
for notational clarity, we assume that all the neighborhoods are of the same size n corresponding to the largest possible
neighborhood size. This can be assumed without loss of generality since smaller decoding neighborhoods can be supplemented
with additional channel outputs that are ignored by the decoder.
where

fG(x) = (x/2) · 2^{−nD(g||p)} · ( p(1−g)/(g(1−p)) )^{ε(x)√n} (39)

is a convex-∪ increasing function of x, and

ε(x) = √( (1/K(g)) log2(2/x) ). (40)
Proof: See Appendix A-B.
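The monotonicity and convexity claimed in Lemma 8 are easy to check numerically. The following sketch is our own illustration; K(g) is approximated on a grid, which slightly perturbs fG but not its qualitative shape:

```python
import math

def dkl(g, p):
    """KL divergence D(g||p) between Bernoulli(g) and Bernoulli(p), in bits."""
    return g * math.log2(g / p) + (1 - g) * math.log2((1 - g) / (1 - p))

def K(g):
    """Grid approximation of K(g) = inf_{0 < eta <= 1-g} D(g+eta||g)/eta^2."""
    vals = [dkl(g + (1 - g) * i / 2000.0, g) / ((1 - g) * i / 2000.0) ** 2
            for i in range(1, 2000)]
    vals.append(math.log2(1.0 / g) / (1 - g) ** 2)
    return min(vals)

def f_G(x, n, g, p, Kg):
    """The mapping (39): a lower bound under BSC(p) on the probability of an
    event that has probability x under BSC(g)."""
    eps = math.sqrt(math.log2(2.0 / x) / Kg)
    return (x / 2.0) * 2.0 ** (-n * dkl(g, p)) \
        * (p * (1 - g) / (g * (1 - p))) ** (eps * math.sqrt(n))

g, p, n = 0.2, 0.1, 50
Kg = K(g)
xs = [i / 1000.0 for i in range(1, 1000)]
vals = [f_G(x, n, g, p, Kg) for x in xs]
# Monotonically increasing ...
assert all(b > a for a, b in zip(vals, vals[1:]))
# ... and convex-cup: each point lies below the average of its neighbors.
assert all(vals[i] <= 0.5 * (vals[i - 1] + vals[i + 1]) * (1 + 1e-9)
           for i in range(1, len(xs) - 1))
```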
Applying Lemma 8 in the style of (37) tells us that:

〈Pe〉 = (1/k) Σ_i (1/2^k) Σ_{b_1^k} Σ_u Pr(U = u) Pr( Z_{nbd,i}^n ∈ D_{z,i}(x_{nbd,i}^n(b_1^k, u), b_1^k, u) )
≥ (1/k) Σ_i (1/2^k) Σ_{b_1^k} Σ_u Pr(U = u) fG( Pr_G( Z_{nbd,i}^n ∈ D_{z,i}(x_{nbd,i}^n(b_1^k, u), b_1^k, u) ) ). (41)

Since the increasing function fG(·) is also convex-∪, (41) and (37) imply that

〈Pe〉 ≥ fG( (1/k) Σ_i (1/2^k) Σ_{b_1^k} Σ_u Pr(U = u) Pr_G( Z_{nbd,i}^n ∈ D_{z,i}(x_{nbd,i}^n(b_1^k, u), b_1^k, u) ) ) ≥ fG( hb^{-1}(δ(G)) ).

This proves Theorem 1.
At the cost of slightly more complicated notation, by following the techniques in [36], similar results can be proved for decoding over any discrete memoryless channel by using Hoeffding's inequality in place of the Chernoff bounds used here in the proof of Lemma 8. In place of the KL-divergence term D(g||p), for a general DMC the arguments give rise to a term max_x D(G_x||P_x) that picks out the channel input letter that maximizes the divergence between the two channels' outputs. For output-symmetric channels, the combination of these terms and the outer maximization over channels G with capacity less than R means that the divergence term behaves like the standard sphere-packing bound when n is large. When the channel is not output-symmetric (in the sense of [30]), the resulting divergence term behaves like the Haroutunian bound for fixed-blocklength coding over DMCs with feedback [42].
A. Proof of Lemma 7: A lower bound on 〈Pe〉G.
The lemma can be proved using tools from rate-distortion theory. We give an alternative proof that is more closely related to Fano's inequality.
Proof:
H(B_1^k) − H(B_1^k|Y_1^m) = I(B_1^k; Y_1^m) ≤ I(X_1^m; Y_1^m) ≤ mC(G).

Since the Ber(1/2) message bits are iid, H(B_1^k) = k. Therefore,

(1/k) H(B_1^k|Y_1^m) ≥ 1 − C(G)/R. (42)

Suppose the message bit sequence is decoded to be B̂_1^k. Denote the error sequence by B̃_1^k. Then,

B̃_1^k = B_1^k ⊕ B̂_1^k, (43)

where the addition ⊕ is modulo 2. The only complication is the possible randomization of both the encoder and decoder. However, note that even with randomization, the true message B_1^k is independent of B̂_1^k conditioned on Y_1^m. Thus,

H(B̃_1^k|Y_1^m) = H(B_1^k ⊕ B̂_1^k|Y_1^m)
= H(B_1^k ⊕ B̂_1^k|Y_1^m) + I(B_1^k; B̂_1^k|Y_1^m)
= H(B_1^k ⊕ B̂_1^k|Y_1^m) − H(B_1^k|Y_1^m, B̂_1^k) + H(B_1^k|Y_1^m)
= H(B_1^k ⊕ B̂_1^k|Y_1^m) − H(B_1^k ⊕ B̂_1^k|Y_1^m, B̂_1^k) + H(B_1^k|Y_1^m)
= I(B_1^k ⊕ B̂_1^k; B̂_1^k|Y_1^m) + H(B_1^k|Y_1^m)
≥ H(B_1^k|Y_1^m)
≥ k (1 − C(G)/R).

Here, the second equality holds because the added conditional mutual information is zero by the independence noted above, and the fourth holds because, conditioned on B̂_1^k, the map B_1^k ↦ B_1^k ⊕ B̂_1^k is a bijection. By the chain rule and subadditivity of entropy, this implies

(1/k) Σ_{i=1}^k H(B̃_i|Y_1^m) ≥ 1 − C(G)/R. (44)

Since conditioning reduces entropy, H(B̃_i) ≥ H(B̃_i|Y_1^m). Therefore,

(1/k) Σ_{i=1}^k H(B̃_i) ≥ 1 − C(G)/R. (45)

Since the B̃_i are binary random variables, H(B̃_i) = hb(〈Pe,i〉G), where hb(·) is the binary entropy function. Since hb(·) is a concave-∩ function, hb^{-1}(·) is convex-∪ when restricted to output values in [0, 1/2]. Thus, (45) together with Jensen's inequality implies the desired result (36).
B. Proof of Lemma 8: a lower bound on 〈Pe,i〉 as a function of 〈Pe,i〉G.
Proof: First, consider a strongly G-typical set of z_{nbd,i}^n, given by

T_{ε,G} = { z_1^n : Σ_{i=1}^n zi − ng ≤ ε√n }. (46)

In words, T_{ε,G} is the set of noise sequences with weight smaller than ng + ε√n. The probability of an event A can be bounded using

δ = Pr_G(Z_1^n ∈ A)
= Pr_G(Z_1^n ∈ A ∩ T_{ε,G}) + Pr_G(Z_1^n ∈ A ∩ T_{ε,G}^c)
≤ Pr_G(Z_1^n ∈ A ∩ T_{ε,G}) + Pr_G(Z_1^n ∈ T_{ε,G}^c).

Consequently,

Pr_G(Z_1^n ∈ A ∩ T_{ε,G}) ≥ δ − Pr_G(T_{ε,G}^c). (47)
We now need the following lemma.

Lemma 9: The probability of the atypical set of Bernoulli-g channel noise Zi is bounded above by

Pr_G( (Σ_{i=1}^n Zi − ng)/√n > ε ) ≤ 2^{−K(g)ε²}, (48)

where K(g) = inf_{0<η≤1−g} D(g+η||g)/η².
Proof: See Appendix A-C.
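Lemma 9 can be spot-checked against exact binomial tail probabilities. In the sketch below (ours), K(g) is approximated by a grid minimization; since the grid minimum can only overestimate the infimum, the asserted inequality is, if anything, stronger than the lemma's:

```python
import math

def dkl(g, p):
    """KL divergence D(g||p) between Bernoulli(g) and Bernoulli(p), in bits."""
    return g * math.log2(g / p) + (1 - g) * math.log2((1 - g) / (1 - p))

def K(g):
    """Grid approximation of K(g) = inf_{0 < eta <= 1-g} D(g+eta||g)/eta^2."""
    vals = [dkl(g + (1 - g) * i / 2000.0, g) / ((1 - g) * i / 2000.0) ** 2
            for i in range(1, 2000)]
    vals.append(math.log2(1.0 / g) / (1 - g) ** 2)   # endpoint eta = 1 - g
    return min(vals)

def binom_tail_above(n, g, t):
    """P(X > t) for X ~ Binomial(n, g), computed exactly."""
    return sum(math.comb(n, k) * g ** k * (1 - g) ** (n - k)
               for k in range(math.floor(t) + 1, n + 1))

g, n = 0.2, 200
Kg = K(g)
for eps in (0.5, 1.0, 2.0):
    exact = binom_tail_above(n, g, n * g + eps * math.sqrt(n))
    assert exact <= 2.0 ** (-Kg * eps ** 2)     # the bound (48)
```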
Choose ε such that

2^{−K(g)ε²} = δ/2, i.e., ε² = (1/K(g)) log2(2/δ). (49)

Thus (47) becomes

Pr_G(Z_1^n ∈ A ∩ T_{ε,G}) ≥ δ/2. (50)

Let n_{z_1^n} denote the number of ones in z_1^n. Then,

Pr_G(Z_1^n = z_1^n) = g^{n_{z_1^n}} (1−g)^{n−n_{z_1^n}}. (51)
This allows us to lower bound the probability of A under the underlying channel as follows:

Pr(Z_1^n ∈ A) ≥ Pr(Z_1^n ∈ A ∩ T_{ε,G})
= Σ_{z_1^n ∈ A∩T_{ε,G}} [ Pr(z_1^n)/Pr_G(z_1^n) ] Pr_G(z_1^n)
= Σ_{z_1^n ∈ A∩T_{ε,G}} [ p^{n_{z_1^n}}(1−p)^{n−n_{z_1^n}} / ( g^{n_{z_1^n}}(1−g)^{n−n_{z_1^n}} ) ] Pr_G(z_1^n)
≥ [ (1−p)^n/(1−g)^n ] ( p(1−g)/(g(1−p)) )^{ng+ε√n} Σ_{z_1^n ∈ A∩T_{ε,G}} Pr_G(z_1^n)
= [ (1−p)^n/(1−g)^n ] ( p(1−g)/(g(1−p)) )^{ng+ε√n} Pr_G(A ∩ T_{ε,G})
≥ (δ/2) · 2^{−nD(g||p)} ( p(1−g)/(g(1−p)) )^{ε√n}.

This results in the desired expression:

fG(x) = (x/2) · 2^{−nD(g||p)} ( p(1−g)/(g(1−p)) )^{ε(x)√n}, (52)
where ε(x) = √( (1/K(g)) log2(2/x) ). To see the convexity-∪ of fG(x), it is useful to apply some substitutions. Let c1 = 2^{−nD(g||p)}/2 > 0 and let ξ = √( n/(K(g) ln 2) ) · ln( p(1−g)/(g(1−p)) ). Notice that ξ < 0, since the term inside the ln is less than 1. Then fG(x) = c1 x exp( ξ √(ln 2 − ln x) ).

Differentiating fG(x) once results in

fG′(x) = c1 exp( ξ √(ln(2) + ln(1/x)) ) ( 1 − ξ/( 2√(ln(2) + ln(1/x)) ) ). (53)

By inspection, fG′(x) > 0 for all 0 < x < 1, and thus fG(x) is a monotonically increasing function.

Differentiating fG(x) twice with respect to x gives

fG″(x) = [ −ξ c1 exp( ξ √(ln(2) + ln(1/x)) ) / ( 2x √(ln(2) + ln(1/x)) ) ] · ( 1 + 1/( 2(ln(2) + ln(1/x)) ) − ξ/( 2√(ln(2) + ln(1/x)) ) ). (54)

Since ξ < 0, it is evident that all the terms in (54) are strictly positive. Therefore, fG(·) is convex-∪.
C. Proof of Lemma 9: Bernoulli Chernoff bound
Proof: Recall that the Zi are iid Bernoulli random variables with mean g ≤ 1/2. Then

Pr( Σ_i(Zi − g)/√n ≥ ε ) = Pr( Σ_i(Zi − g)/n ≥ ε̄ ), (55)

where ε̄ = ε/√n, so that n = ε²/ε̄². Therefore, by a Chernoff bound,

Pr( Σ_i(Zi − g)/√n ≥ ε ) ≤ [ ((1−g) + g exp(s)) × exp(−s(g + ε̄)) ]^n for all s ≥ 0. (56)

Choose s satisfying

exp(−s) = [ g/(1−g) ] × ( 1/(g + ε̄) − 1 ). (57)

It is safe to assume that g + ε̄ ≤ 1 since otherwise the relevant probability is 0 and any bound will work. Substituting (57) into (56) gives

Pr( Σ_i(Zi − g)/√n ≥ ε ) ≤ 2^{ −( D(g+ε̄||g)/ε̄² ) ε² }.

This bound holds under the constraint ε²/ε̄² = n. To obtain a bound that holds uniformly for all n, we fix ε and take the supremum over all possible values of ε̄:

Pr( Σ_i(Zi − g)/√n ≥ ε ) ≤ sup_{0<ε̄≤1−g} exp( −ln(2) ( D(g+ε̄||g)/ε̄² ) ε² )
≤ exp( −ln(2) ε² inf_{0<ε̄≤1−g} D(g+ε̄||g)/ε̄² ),

giving us the desired bound.
APPENDIX B
PROOF OF THEOREM 2
We use Gallager’s technique from [15, Pg. 38] to upper bound the mutual information I(Xm1 ,Y
m1 )
under channel behavior G.
I(Xm1 ,Y
m1 ) = H(Ym
1 )−H(Ym1 |Xm
1 )
(a)
≤ mR+m(1−R)hb(gd)−mH(Yi|Xi)
= mR+m(1−R)hb(gd)−mhb(g). (58)
The crucial part of Gallager’s proof is used to derive (a) above, and is detailed in the following. The
output vector Ym1 can alternatively be specified by providing Y at any mR linearly independent positions
in the code (say YmR1 , and the syndrome vector S
m(1−R)1 . Now, using the chain rule and the fact that
conditioning reduces entropy,
H(Ym1 ) ≤ H(YmR
1 ) +H(Sm(1−R)1 ). (59)
The entropy of YmR1 ≤ mR, giving the first term in (58) (a) above. The entropy of the syndrome vector
Sm(1−R)1 can be upper bounded by
∑m(1−R)i=1 H(Si). Probability of Si being 1 is the probability that
there is an odd number of errors among the d output values that form the check set. Thus Pr(Si = 1) = gd, and the second term in (a) follows. Now (42) changes to

(1/k) H(B_1^k|Y_1^m) ≥ 1 − [ mR + m(1−R) hb(gd) − m hb(g) ]/k
= hb(gd) − ( hb(gd) − hb(g) )/R.
The rest of the proof is the same as that for Theorem 1.
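The syndrome-bit probability Pr(Si = 1) = gd used above is the probability of an odd number of flips in a degree-d check set, with the closed form (1 − (1−2g)^d)/2; a quick check against the direct sum over odd-weight patterns (our sketch):

```python
import math

def p_odd_closed(d, g):
    """Closed form for the probability of an odd number of flips among d
    iid Bernoulli(g) noise bits: (1 - (1-2g)^d)/2."""
    return (1.0 - (1.0 - 2.0 * g) ** d) / 2.0

def p_odd_direct(d, g):
    """Direct sum over the odd-weight error patterns in a degree-d check set."""
    return sum(math.comb(d, k) * g ** k * (1 - g) ** (d - k)
               for k in range(1, d + 1, 2))

for d in (3, 6, 17):
    for g in (0.05, 0.11, 0.3):
        assert abs(p_odd_closed(d, g) - p_odd_direct(d, g)) < 1e-12
```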
For sparse-graph codes under message-passing decoding with a known threshold under the chosen scheduling algorithm, if there exists a δ3 > 0 such that 〈Pe〉G ≥ δ3 for any channel parameter above the threshold, then the same proof works by using 〈Pe〉G ≥ δ3 in place of (36).
APPENDIX C
PROOF OF THEOREM 3: LOWER BOUND ON 〈Pe〉 FOR AWGN CHANNELS
The AWGN case can be proved using an argument almost identical to the BSC case. Once again, the focus is on the channel noise Z in the decoding neighborhoods [43]. Notice that Lemma 7 already applies to this channel, even if the power constraint only has to hold on average over all codebooks and messages. Thus, all that is required is a counterpart to Lemma 8 giving a convex-∪ mapping from the probability of a set of channel-noise realizations under a Gaussian channel with noise variance σ_G² back to their probability under the original channel with noise variance σ_0².
Lemma 10: Let A be a set of Gaussian channel-noise realizations z_1^n such that Pr_G(A) = δ. Then

Pr(A) ≥ fG(δ), (60)

where

fG(δ) = (δ/2) exp( −nD(σ_G²||σ_0²) − √n (3/2 + 2 ln(2/δ)) (σ_G²/σ_0² − 1) ). (61)

Furthermore, fG(δ) is a convex-∪ increasing function of δ for all values of σ_G² ≥ σ_0².

In addition, the following bound is also convex-∪ whenever σ_G² > σ_0² μ(n), with μ(n) as defined in (12):

fL(δ) = (δ/2) exp( −nD(σ_G²||σ_0²) − (1/2) φ(n, δ) (σ_G²/σ_0² − 1) ), (62)

where φ(n, δ) is as defined in (15).
Proof: See Appendix C-A.
With Lemma 10 playing the role of Lemma 8, the proof for Theorem 3 proceeds identically to that
of Theorem 1.
It should be clear that similar arguments can be used to prove similar results for any additive-noise model of a continuous-output communication channel. However, we do not believe that this will result in the best possible bounds. Indeed, even the bounds for the AWGN case seem suboptimal, because we are ignoring the possibility of a large deviation in the noise that happens to be locally aligned with the codeword itself.
A. Proof of Lemma 10: a lower bound on 〈Pe,i〉 as a function of 〈Pe,i〉G
Proof: Consider the length-n set of G-typical additive noise given by

T_{ε,G} = { z_1^n : ( ||z_1^n||² − nσ_G² )/n ≤ ε }. (63)

With this definition, (47) continues to hold in the Gaussian case.
There are two different Gaussian counterparts to Lemma 9. They are both expressed in the following
lemma.
Lemma 11: For Gaussian noise Zi with variance σ_G²,

Pr( (1/n) Σ_{i=1}^n Zi²/σ_G² > 1 + ε/σ_G² ) ≤ [ (1 + ε/σ_G²) exp(−ε/σ_G²) ]^{n/2}. (64)

Furthermore,

Pr( (1/n) Σ_{i=1}^n Zi²/σ_G² > 1 + ε/σ_G² ) ≤ exp( −√n ε/(4σ_G²) ) (65)

for all ε ≥ 3σ_G²/√n.
Proof: See Appendix C-B.
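The first bound of Lemma 11 can be spot-checked by Monte Carlo simulation (our sketch; we take σ_G = 1 without loss of generality, and the trial count is an arbitrary choice):

```python
import math
import random

random.seed(0)

def chi2_tail_mc(n, thresh, trials=10000):
    """Monte Carlo estimate of P((1/n) * sum(Z_i^2) > thresh), Z_i ~ N(0,1)."""
    hits = 0
    for _ in range(trials):
        s = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
        hits += (s / n > thresh)
    return hits / trials

# With sigma_G = 1, eps/sigma_G^2 = eps; check (64) at eps = 0.4, n = 50.
n, eps = 50, 0.4
bound = ((1 + eps) * math.exp(-eps)) ** (n / 2)   # right side of (64), ~0.204
empirical = chi2_tail_mc(n, 1 + eps)              # true tail is ~0.03
assert empirical <= bound
```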
To have Pr_G(T_{ε,G}^c) ≤ δ/2, it suffices to pick any ε(δ, n) large enough. So

Pr(A) ≥ ∫_{z_1^n ∈ A∩T_{ε,G}} f_P(z_1^n) dz_1^n = ∫_{z_1^n ∈ A∩T_{ε,G}} [ f_P(z_1^n)/f_G(z_1^n) ] f_G(z_1^n) dz_1^n, (66)

where f_P and f_G denote the noise densities under the true channel and under G, respectively.
Consider the ratio of the two pdfs for z_1^n ∈ T_{ε,G}:

f_P(z_1^n)/f_G(z_1^n) = ( σ_G²/σ_0² )^{n/2} exp( −||z_1^n||² ( 1/(2σ_0²) − 1/(2σ_G²) ) )
≥ exp( −( nσ_G² + nε(δ, n) ) ( 1/(2σ_0²) − 1/(2σ_G²) ) + n ln(σ_G/σ_0) )
= exp( −( ε(δ, n) n/(2σ_G²) ) ( σ_G²/σ_0² − 1 ) − nD(σ_G²||σ_0²) ), (67)
where D(σ_G²||σ_0²) is the KL-divergence between two Gaussian distributions of variances σ_G² and σ_0², respectively. Substitute (67) back into (66) to get

Pr(A) ≥ exp( −( ε(δ, n) n/(2σ_G²) )( σ_G²/σ_0² − 1 ) − nD(σ_G²||σ_0²) ) ∫_{z_1^n ∈ A∩T_{ε,G}} f_G(z_1^n) dz_1^n
≥ (δ/2) exp( −nD(σ_G²||σ_0²) − ( ε(δ, n) n/(2σ_G²) )( σ_G²/σ_0² − 1 ) ). (68)

At this point, it is necessary to make a choice of ε(δ, n). If we are interested in studying the asymptotics as n gets large, we can use (65). This reveals that it is sufficient to choose ε ≥ σ_G² max{ 3/√n, 4 ln(2/δ)/√n }. A safe bet is ε = σ_G² (3 + 4 ln(2/δ))/√n, or nε(δ, n) = √n (3 + 4 ln(2/δ)) σ_G². Thus (50) holds as well with this choice of ε(δ, n).
Substituting into (68) gives

Pr(A) ≥ (δ/2) exp( −nD(σ_G²||σ_0²) − √n (3/2 + 2 ln(2/δ)) (σ_G²/σ_0² − 1) ).

This establishes the desired fG(·) function from (61). To see that this function fG(δ) is convex-∪ and increasing in δ, define

c1 = exp( −nD(σ_G²||σ_0²) − √n (3/2 + 2 ln(2)) (σ_G²/σ_0² − 1) − ln(2) )

and ξ = 2√n (σ_G²/σ_0² − 1) > 0. Then fG(δ) = c1 δ exp(ξ ln(δ)) = c1 δ^{1+ξ}, which is clearly monotonically increasing and convex-∪ by inspection.
Attempting to use (64) is a little more involved. Let ε̄ = ε/σ_G² for notational convenience. Then we must solve (1 + ε̄) exp(−ε̄) = (δ/2)^{2/n}. Substitute u = 1 + ε̄ to get u exp(−u + 1) = (δ/2)^{2/n}. This immediately simplifies to −u exp(−u) = −exp(−1)(δ/2)^{2/n}. At this point, we can immediately verify that (δ/2)^{2/n} ∈ [0, 1], and hence, by the definition of the Lambert W function in [38], u = −W_L( −exp(−1)(δ/2)^{2/n} ). Thus

ε̄(δ, n) = −W_L( −exp(−1)(δ/2)^{2/n} ) − 1. (69)
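Assuming W_L denotes the lower real branch of the Lambert W function (as the use in [38] suggests), ε̄(δ, n) can be computed without any special-function library by bisecting the defining equation directly; a sketch (ours):

```python
import math

def eps_bar(delta, n):
    """Solve (1 + e) * exp(-e) = (delta/2)**(2/n) for e > 0 by bisection;
    this is the quantity -W_L(-exp(-1)*(delta/2)**(2/n)) - 1 of (69)."""
    target = (delta / 2.0) ** (2.0 / n)
    lo, hi = 0.0, 1.0
    while (1 + hi) * math.exp(-hi) > target:   # grow until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (1 + mid) * math.exp(-mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

e = eps_bar(0.1, 100)
# The returned value satisfies the defining equation ...
assert abs((1 + e) * math.exp(-e) - 0.05 ** 0.02) < 1e-9
# ... and, as used below, eps_bar is decreasing in delta.
assert eps_bar(0.01, 100) > eps_bar(0.1, 100)
```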
Substituting this into (68) immediately gives the desired expression (62). All that remains is to verify the convexity-∪. Let v = (1/2)(σ_G²/σ_0² − 1). As above, fL(δ) = δ c2 exp(−nv ε̄(δ, n)). The derivatives can be taken using very tedious manipulations involving the relationship W_L′(x) = W_L(x)/( x(1 + W_L(x)) ) from [38], and can be verified using computer-aided symbolic calculation. In our case, −ε̄(δ, n) = W_L(x) + 1 with x = −exp(−1)(δ/2)^{2/n}, and so this allows the expressions to be simplified:

fL′(δ) = c2 exp(−nv ε̄) ( 2v + 1 + 2v/ε̄ ). (70)

Notice that all the terms above are positive, and so the first derivative is always positive and the function is increasing in δ. Taking another derivative gives

fL″(δ) = [ c2 2v (1 + ε̄) exp(−nv ε̄) / (δ ε̄) ] ( 1 + 4v + 4v/ε̄ − 4/(n ε̄) − 2/(n ε̄²) ). (71)
Recall from (69) and the properties of the Lambert W_L function that ε̄ is a monotonically decreasing function of δ that is +∞ at δ = 0 and goes down to 0 at δ = 2. Look at the term in brackets above and multiply it by the positive quantity n ε̄². This gives the quadratic expression

(4v + 1) n ε̄² + 4(vn − 1) ε̄ − 2. (72)

This expression (72) is clearly convex-∪ in ε̄ and negative at ε̄ = 0. Thus it must have a single zero-crossing for positive ε̄ and be strictly increasing there. This also means that the quadratic expression is implicitly a strictly decreasing function of δ. It thus suffices to just check the quadratic expression at δ = 1 and make sure that it is non-negative. Evaluating (69) at δ = 1 gives ε̄(1, n) = T(n), where T(n) is defined in (13). It is also clear that (72) is a strictly increasing linear function of v, and so we can find the minimum value of v above which (72) is guaranteed to be non-negative. This will guarantee that the function fL is convex-∪. The condition turns out to be v ≥ (2 + 4T − nT²)/( 4nT(T + 1) ), and hence σ_G² = σ_0²(2v + 1) ≥ (σ_0²/2)( 1 + 1/(T+1) + (4T+2)/( nT(T+1) ) ). This matches up with (12), and hence the lemma is proved.
B. Proof of Lemma 11: Chernoff bound for Gaussian noise
Proof: The sum Σ_{i=1}^n Zi²/σ_G² is a standard χ² random variable with n degrees of freedom. Thus

Pr( (1/n) Σ_{i=1}^n Zi²/σ_G² > 1 + ε/σ_G² )
(a)≤ inf_{0<s<1/2} ( exp( −s(1 + ε/σ_G²) ) / √(1 − 2s) )^n
(b)≤ ( √(1 + ε/σ_G²) exp( −ε/(2σ_G²) ) )^n (73)
= ( (1 + ε/σ_G²) exp( −ε/σ_G² ) )^{n/2}, (74)

where (a) follows using standard moment generating functions for χ² random variables and Chernoff bounding arguments, and (b) results from the substitution s = ε/( 2(σ_G² + ε) ). This establishes (64).
For tractability, the goal is to replace (73) with an exponential of an affine function of ε/σ_G². For notational convenience, let ε̄ = ε/σ_G². The idea is to bound the polynomial term √(1 + ε̄) by an exponential as long as ε̄ > ε̄*.

Let ε̄* = 3/√n and let K = 1/2 − 1/(4√n). Then it is clear that

√(1 + ε̄) ≤ exp(K ε̄) (75)
as long as ε̄ ≥ ε̄*. First, notice that the two sides agree at ε̄ = 0, and that the slope of the concave-∩ function √(1 + ε̄) there is 1/2, while the slope of the convex-∪ function exp(K ε̄) at 0 is K < 1/2. This means that exp(K ε̄) starts out below √(1 + ε̄). However, it has crossed to the other side by ε̄ = ε̄*. This can be verified by taking the logs of both sides of (75) and multiplying them both by 2. Consider the LHS evaluated at ε̄*, and upper-bound it by a third-order power-series expansion:

ln(1 + 3/√n) ≤ 3/√n − 9/(2n) + 9/n^{3/2}.

Meanwhile, the RHS of (75) can be dealt with exactly:

2K ε̄* = (1 − 1/(2√n)) (3/√n) = 3/√n − 3/(2n).

For n ≥ 9, the above immediately establishes (75), since then 9/(2n) − 3/(2n) = 3/n ≥ 9/n^{3/2}. The cases n = 1, 2, ..., 8 can be verified by direct computation.
Using (75), for ε̄ > ε̄* we have:

Pr(T_{ε,G}^c) ≤ ( exp(K ε̄) exp(−ε̄/2) )^n = exp( −(√n/4) ε̄ ). (76)
APPENDIX D
APPROXIMATION ANALYSIS FOR THE BSC
A. Lemma 2
Proof: (22) and (25) are immediate from the concave-∩ nature of the binary entropy function and its values at 0 and 1/2. For (23),

hb(x) = x log2(1/x) + (1 − x) log2(1/(1 − x))
(a)≤ 2x log2(1/x) = 2x ln(1/x)/ln(2)
(b)≤ 2x d ( 1/x^{1/d} − 1 )/ln(2) for all d > 1
≤ 2x^{1−1/d} d/ln(2).

Inequality (a) follows from the fact that x^x < (1−x)^{1−x} for x ∈ (0, 1/2). For inequality (b), observe that ln(x) ≤ x − 1. This implies ln(x^{1/d}) ≤ x^{1/d} − 1, and therefore ln(x) ≤ d(x^{1/d} − 1) for all x > 0, since 1/d ≤ 1 for d ≥ 1.
The bound (24) on hb^{-1}(·) follows immediately by identical arguments.
B. Lemma 3
Proof: First, we investigate the small-gap asymptotics of δ(g*), where g* = p + gap^r and r < 1:

δ(g*) = 1 − C(g*)/R
= 1 − C(p + gap^r)/( C(p) − gap )
= 1 − [ C(p) − gap^r hb′(p) + o(gap^r) ] / [ C(p)(1 − gap/C(p)) ]
= 1 − ( 1 − (hb′(p)/C(p)) gap^r + o(gap^r) ) × ( 1 + gap/C(p) + o(gap) )
= (hb′(p)/C(p)) gap^r + o(gap^r). (77)

Plugging (77) into (25) and using Lemma 2 gives

log2( hb^{-1}(δ(g*)) ) ≤ log2( (hb′(p)/(2C(p))) gap^r + o(gap^r) ) (78)
= log2( hb′(p)/(2C(p)) ) + r log2(gap) + o(1) (79)
= r log2(gap) − 1 + log2( hb′(p)/C(p) ) + o(1), (80)

and this establishes the upper half of (26).
and this establishes the upper half of (26).
To see the lower half, we use (24):

log2( hb^{-1}(δ(g*)) ) ≥ (d/(d−1)) ( log2(δ(g*)) + log2( ln 2/(2d) ) )
= (d/(d−1)) ( log2( (hb′(p)/C(p)) gap^r + o(gap^r) ) + log2( ln 2/(2d) ) )
= (d/(d−1)) ( r log2(gap) + log2( hb′(p)/C(p) ) + o(1) + log2( ln 2/(2d) ) )
= (d/(d−1)) r log2(gap) − 1 + K1 + o(1),

where K1 = (d/(d−1)) ( log2( hb′(p)/C(p) ) + log2( ln(2)/d ) ) and d > 1 is arbitrary.
C. Lemma 4
Proof:
\begin{align*}
D(g^*\|p) &= D(p + \mathrm{gap}^r\|p) \\
&= 0 + 0\times\mathrm{gap}^r + \frac{1}{2}\,\frac{\mathrm{gap}^{2r}}{p(1-p)\ln(2)} + o(\mathrm{gap}^{2r}),
\end{align*}
since $D(p\|p) = 0$ and the first derivative is also zero. Simple calculus shows that the second derivative of $D(p + x\|p)$ with respect to $x$ is $\frac{\log_2(e)}{(p+x)(1-p-x)}$.
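The second-order expansion in Lemma 4 can be confirmed numerically (a sketch with illustrative values $p = 0.1$, $x = 10^{-4}$):

```python
import math

# Check Lemma 4: for small x,
#   D(p + x || p)  ~  x**2 / (2 * p * (1 - p) * ln(2))   (in bits),
# since the zeroth- and first-order terms of the expansion vanish.
def D(a, p):
    return a * math.log2(a / p) + (1 - a) * math.log2((1 - a) / (1 - p))

p, x = 0.1, 1e-4
exact = D(p + x, p)
approx = x ** 2 / (2 * p * (1 - p) * math.log(2))
print(exact / approx)  # close to 1
```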
D. Lemma 5
Proof:
\begin{align*}
\log_2\left(\frac{g^*(1-p)}{p(1-g^*)}\right) &= \log_2\left(\frac{1-p}{p}\right) + \log_2\left(\frac{g^*}{1-g^*}\right) \\
&= \log_2\left(\frac{1-p}{p}\right) + \log_2(g^*) - \log_2(1-g^*) \\
&= \log_2\left(\frac{1-p}{p}\right) + \log_2(p + \mathrm{gap}^r) - \log_2(1 - p - \mathrm{gap}^r) \\
&= \log_2\left(\frac{1-p}{p}\right) + \log_2(p) + \log_2\left(1 + \frac{\mathrm{gap}^r}{p}\right) - \log_2(1-p) - \log_2\left(1 - \frac{\mathrm{gap}^r}{1-p}\right) \\
&= \frac{\mathrm{gap}^r}{p\ln(2)} + \frac{\mathrm{gap}^r}{(1-p)\ln(2)} + o(\mathrm{gap}^r) \\
&= \frac{\mathrm{gap}^r}{p(1-p)\ln(2)} + o(\mathrm{gap}^r) \\
&= \frac{\mathrm{gap}^r}{p(1-p)\ln(2)}(1 + o(1)).
\end{align*}
E. Lemma 6
Proof: Expand (3):
\begin{align*}
\varepsilon &= \sqrt{\frac{1}{K(p+\mathrm{gap}^r)}}\sqrt{\log_2\left(\frac{2}{h_b^{-1}(\delta(G))}\right)} \\
&= \sqrt{\frac{1}{\ln(2)K(p+\mathrm{gap}^r)}}\sqrt{\ln(2) - \ln\left(h_b^{-1}(\delta(G))\right)} \\
&\geq \sqrt{\frac{1}{\ln(2)K(p+\mathrm{gap}^r)}}\sqrt{\ln(2) - r\ln(\mathrm{gap}) + \ln(2) - K_2\ln(2) + o(1)} \\
&= \sqrt{\frac{1}{\ln(2)K(p+\mathrm{gap}^r)}}\sqrt{r\ln\left(\frac{1}{\mathrm{gap}}\right) + (2-K_2)\ln(2) + o(1)} \\
&= \sqrt{\frac{1}{\ln(2)K(p+\mathrm{gap}^r)}}\sqrt{r\ln\left(\frac{1}{\mathrm{gap}}\right)}\,(1+o(1)),
\end{align*}
and similarly
\begin{align*}
\varepsilon &= \sqrt{\frac{1}{\ln(2)K(p+\mathrm{gap}^r)}}\sqrt{\ln(2) - \ln\left(h_b^{-1}(\delta(G))\right)} \\
&\leq \sqrt{\frac{1}{\ln(2)K(p+\mathrm{gap}^r)}}\sqrt{(2-K_2)\ln(2) + \frac{d}{d-1}r\ln\left(\frac{1}{\mathrm{gap}}\right) + o(1)} \\
&= \sqrt{\frac{rd}{\ln(2)(d-1)K(p+\mathrm{gap}^r)}}\sqrt{\ln\left(\frac{1}{\mathrm{gap}}\right)}\,(1+o(1)).
\end{align*}
All that remains is to show that $K(p + \mathrm{gap}^r)$ converges to $K(p)$ as gap → 0. Examine (4). The continuity of $D(g+\eta\|g)/\eta^2$ is clear in the interior $\eta \in (0, 1-g)$ and for $g \in (0, \frac{1}{2})$, so only the two boundaries need to be checked. By the Taylor expansion of $D(g+\eta\|g)$ done in the proof of Lemma 4, $\lim_{\eta\to 0} D(g+\eta\|g)/\eta^2 = \frac{1}{2g(1-g)\ln 2}$. Similarly, $\lim_{\eta\to 1-g} D(g+\eta\|g)/\eta^2 = D(1\|g)/(1-g)^2$, where $D(1\|g) = \log_2\left(\frac{1}{1-g}\right)$ is finite. Since $K$ is a minimization of a continuous function over a compact set, it is itself continuous, and thus $\lim_{\mathrm{gap}\to 0}K(p+\mathrm{gap}^r) = K(p)$. Converting from natural logarithms to base 2 completes the proof.
F. Approximating the solution to the quadratic formula
In (21), for $g = g^* = p + \mathrm{gap}^r$,
\begin{align*}
a &= D(g^*\|p), \\
b &= \varepsilon\log_2\left(\frac{g^*(1-p)}{p(1-g^*)}\right), \\
c &= \log_2(\langle P_e\rangle) - \log_2\left(h_b^{-1}(\delta(g^*))\right) + 1.
\end{align*}
The first term, $a$, is approximated by Lemma 4, so
\[ a = \mathrm{gap}^{2r}\left(\frac{1}{2p(1-p)\ln(2)} + o(1)\right). \tag{81} \]
Applying Lemma 5 and Lemma 6 reveals
\begin{align}
b &\leq \sqrt{\frac{rd}{(d-1)K(p)}}\sqrt{\log_2\left(\frac{1}{\mathrm{gap}}\right)}\,\frac{\mathrm{gap}^r}{p(1-p)\ln(2)}(1+o(1)) \nonumber \\
&= \frac{1}{p(1-p)\ln(2)}\sqrt{\frac{rd}{(d-1)K(p)}}\sqrt{\mathrm{gap}^{2r}\log_2\left(\frac{1}{\mathrm{gap}}\right)}\,(1+o(1)), \tag{82} \\
b &\geq \frac{1}{p(1-p)\ln(2)}\sqrt{\frac{r}{K(p)}}\sqrt{\mathrm{gap}^{2r}\log_2\left(\frac{1}{\mathrm{gap}}\right)}\,(1+o(1)). \tag{83}
\end{align}
The third term, $c$, can be bounded similarly using Lemma 3 as follows:
\begin{align}
c &= \beta\log_2(\mathrm{gap}) - \log_2\left(h_b^{-1}(\delta(g^*))\right) + 1 \nonumber \\
&\leq \left(\frac{d}{d-1}r - \beta\right)\log_2\left(\frac{1}{\mathrm{gap}}\right) + K_3 + o(1). \tag{84} \\
\text{Also } c &\geq (r - \beta)\log_2\left(\frac{1}{\mathrm{gap}}\right) + K_4 + o(1) \tag{85}
\end{align}
for a pair of constants $K_3, K_4$. Thus, for gap small enough and $r < \frac{\beta(d-1)}{d}$, we know that $c < 0$.
The lower bound on $\sqrt{n}$ is thus
\[ \sqrt{n} \geq \frac{\sqrt{b^2 - 4ac} - b}{2a} = \frac{b}{2a}\left(\sqrt{1 - \frac{4ac}{b^2}} - 1\right). \tag{86} \]
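The algebraic identity in (86), equating the two forms of the positive root when $a > 0$ and $c < 0$, can be spot-checked with arbitrary coefficients (a sketch; the values are illustrative):

```python
import math

# Check (86): for a > 0 and c < 0, the positive root of a*x**2 + b*x + c = 0 is
#   (sqrt(b**2 - 4*a*c) - b) / (2*a) = (b/(2*a)) * (sqrt(1 - 4*a*c/b**2) - 1),
# so any x >= 0 with a*x**2 + b*x + c >= 0 must satisfy x >= that root.
a, b, c = 2.0, 3.0, -5.0
root1 = (math.sqrt(b * b - 4 * a * c) - b) / (2 * a)
root2 = (b / (2 * a)) * (math.sqrt(1 - 4 * a * c / b ** 2) - 1)
print(abs(root1 - root2) < 1e-12)  # True: both forms agree
```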
Plugging in the bounds (81) and (83) reveals that
\[ \frac{b}{2a} \geq \sqrt{\log_2\left(\frac{1}{\mathrm{gap}}\right)}\,\frac{1}{\mathrm{gap}^r}\sqrt{\frac{r}{K(p)}}\,(1+o(1)). \tag{87} \]
Similarly, using (81), (83) and (84), we get
\begin{align}
\frac{4ac}{b^2} &\leq \frac{4\,\mathrm{gap}^{2r}\left(\frac{1}{p(1-p)\ln(2)}\right)\times\left(\left(\frac{d}{d-1}r - \beta\right)\log_2\left(\frac{1}{\mathrm{gap}}\right) + K_3\right)(1+o(1))}{\left(\frac{1}{p(1-p)\ln(2)}\right)^2\frac{r}{K(p)}\,\mathrm{gap}^{2r}\log_2\left(\frac{1}{\mathrm{gap}}\right)(1+o(1))} \nonumber \\
&= 4p(1-p)K(p)\ln(2)\left(\frac{d}{d-1} - \frac{\beta}{r}\right) + o(1). \tag{88}
\end{align}
This tends to a negative constant since $r < \frac{\beta(d-1)}{d}$.
Plugging (87) and (88) into (86) gives:
\begin{align}
n &\geq \frac{r}{K(p)}\,\frac{\log_2\left(\frac{1}{\mathrm{gap}}\right)}{\mathrm{gap}^{2r}}\,(1+o(1))\left(\sqrt{1 + 4p(1-p)\ln(2)K(p)\left(\frac{\beta}{r} - \frac{d}{d-1}\right) + o(1)} - 1\right)^2 \nonumber \\
&= \frac{\log_2\left(\frac{1}{\mathrm{gap}}\right)}{\mathrm{gap}^{2r}}\,\frac{1}{K(p)}\left(\sqrt{r + 4p(1-p)\ln(2)K(p)\left(\beta - \frac{rd}{d-1}\right)} - \sqrt{r}\right)^2(1+o(1)) \nonumber \\
&= \Omega\left(\frac{\log_2(1/\mathrm{gap})}{\mathrm{gap}^{2r}}\right) \tag{89}
\end{align}
for all $r < \min\left\{\frac{\beta(d-1)}{d}, 1\right\}$. By taking $d$ arbitrarily large, we arrive at Theorem 4 for the BSC (the term in the numerator can be absorbed into the ν for small enough gap).
APPENDIX E
APPROXIMATION ANALYSIS FOR THE AWGN CHANNEL
Taking logs on both sides of (10) for a fixed test channel $G$,
\[ \ln(\langle P_e\rangle) \geq \ln\left(h_b^{-1}(\delta(G))\right) - \ln(2) - nD(\sigma_G^2\|\sigma_0^2) - \sqrt{n}\left(\frac{3}{2} + 2\ln 2 - 2\ln\left(h_b^{-1}(\delta(G))\right)\right)\left(\frac{\sigma_G^2}{\sigma_0^2} - 1\right). \tag{90} \]
Rewriting this in the standard quadratic form using
\begin{align}
a &= D(\sigma_{G^*}^2\|\sigma_0^2), \tag{91} \\
b &= \left(\frac{3}{2} + 2\ln 2 - 2\ln\left(h_b^{-1}(\delta(G))\right)\right)\left(\frac{\sigma_G^2}{\sigma_0^2} - 1\right), \tag{92} \\
c &= \ln(\langle P_e\rangle) - \ln\left(h_b^{-1}(\delta(G))\right) + \ln(2), \tag{93}
\end{align}
it suffices to show that the terms exhibit behavior as gap → 0 similar to their BSC counterparts. For Taylor approximations, we use the channel $G^*$, with corresponding noise variance $\sigma_{G^*}^2 = \sigma_0^2 + \zeta$, where
\[ \zeta = \mathrm{gap}^r\left(\frac{2\sigma_0^2(P_T + \sigma_0^2)}{P_T}\right). \tag{94} \]
Lemma 12: For small enough gap, for ζ as in (94), if $r < 1$ then $C(G^*) < R$.
Proof: Since $C - \mathrm{gap} = R > C(G^*)$, we must satisfy
\[ \mathrm{gap} \leq \frac{1}{2}\log_2\left(1 + \frac{P_T}{\sigma_0^2}\right) - \frac{1}{2}\log_2\left(1 + \frac{P_T}{\sigma_0^2 + \zeta}\right). \]
So the goal is to lower-bound the RHS above to show that the choice (94) guarantees that it is bigger than the gap. The RHS equals
\begin{align}
&\frac{1}{2}\left(\log_2\left(1 + \frac{\zeta}{\sigma_0^2}\right) - \log_2\left(1 + \frac{\zeta}{\sigma_0^2 + P_T}\right)\right) \nonumber \\
&= \frac{1}{2}\left(\log_2\left(1 + 2\mathrm{gap}^r\left(1 + \frac{\sigma_0^2}{P_T}\right)\right) - \log_2\left(1 + 2\mathrm{gap}^r\frac{\sigma_0^2}{P_T}\right)\right) \nonumber \\
&\geq \frac{1}{2}\left(\frac{c_s}{\ln(2)}\,2\mathrm{gap}^r\left(1 + \frac{\sigma_0^2}{P_T}\right) - \frac{1}{\ln(2)}\,2\mathrm{gap}^r\frac{\sigma_0^2}{P_T}\right) \nonumber \\
&= \mathrm{gap}^r\,\frac{1}{\ln(2)}\left(c_s - (1-c_s)\frac{\sigma_0^2}{P_T}\right). \tag{95}
\end{align}
For small enough gap, this is a valid lower bound as long as $c_s < 1$. Choose $c_s$ so that $\frac{\sigma_0^2}{P_T + \sigma_0^2} < c_s < 1$, which makes the constant in (95) positive. Thus, for ζ as in (94), the RHS is lower-bounded by $\mathrm{gap}^r$ times a positive constant, and having $r < 1$ suffices for small enough gap, since $\mathrm{gap}^r/\mathrm{gap} = \mathrm{gap}^{r-1} \to \infty$ as gap → 0.
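Lemma 12 is easy to confirm numerically; a sketch with the illustrative values $P_T = \sigma_0^2 = 1$ and $r = \frac{1}{2}$:

```python
import math

# Check Lemma 12: with zeta = gap**r * 2*s0*(PT + s0)/PT  (s0 stands for sigma_0^2)
# and r < 1, the capacity drop C(s0) - C(s0 + zeta) exceeds gap for small gap,
# so that C(G*) < R = C - gap.
def C(PT, s2):
    return 0.5 * math.log2(1 + PT / s2)

PT, s0, r = 1.0, 1.0, 0.5
for gap in (1e-4, 1e-6, 1e-8):
    zeta = gap ** r * 2 * s0 * (PT + s0) / PT
    assert C(PT, s0) - C(PT, s0 + zeta) > gap
print("C(G*) < R for all tested gaps")
```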
In the next lemma, we perform the approximation analysis for the terms in (91), (92) and (93).
Lemma 13: Assume that $\sigma_{G^*}^2 = \sigma_0^2 + \zeta$, where ζ is defined in (94). Then:
(a)
\[ \frac{\sigma_{G^*}^2}{\sigma_0^2} - 1 = \mathrm{gap}^r\left(\frac{2(P_T + \sigma_0^2)}{P_T}\right). \tag{96} \]
(b)
\[ \ln(\delta(G^*)) = r\ln(\mathrm{gap}) + o(1) - \ln(C). \tag{97} \]
(c)
\[ \ln(h_b^{-1}(\delta(G^*))) \geq \frac{d}{d-1}r\ln(\mathrm{gap}) + c_2, \tag{98} \]
for some constant $c_2$ that is a function of $d$, and
\[ \ln(h_b^{-1}(\delta(G^*))) \leq r\ln(\mathrm{gap}) + c_3, \tag{99} \]
for some constant $c_3$.
(d)
\[ D(\sigma_{G^*}^2\|\sigma_0^2) = \frac{(P_T + \sigma_0^2)^2}{P_T^2}\,\mathrm{gap}^{2r}(1 + o(1)). \tag{100} \]
Proof: (a) immediately follows from the definitions and (94).
(b) We start by simplifying $\delta(G^*)$:
\begin{align*}
\delta(G^*) &= 1 - \frac{C(G^*)}{R} = \frac{C - \mathrm{gap} - \frac{1}{2}\log_2\left(1 + \frac{P_T}{\sigma_{G^*}^2}\right)}{C - \mathrm{gap}} \\
&= \frac{\frac{1}{2}\log_2\left(1 + \frac{P_T}{\sigma_0^2}\right) - \frac{1}{2}\log_2\left(1 + \frac{P_T}{\sigma_0^2 + \zeta}\right) - \mathrm{gap}}{C - \mathrm{gap}} \\
&= \frac{\frac{1}{2}\log_2\left(\left(\frac{\sigma_0^2 + P_T}{\sigma_0^2}\right)\left(\frac{\sigma_0^2 + \zeta}{P_T + \sigma_0^2 + \zeta}\right)\right) - \mathrm{gap}}{C - \mathrm{gap}} \\
&= \frac{\frac{1}{2}\log_2\left(1 + \frac{\zeta}{\sigma_0^2}\right) - \frac{1}{2}\log_2\left(1 + \frac{\zeta}{P_T + \sigma_0^2}\right) - \mathrm{gap}}{C - \mathrm{gap}} \\
&= \frac{\frac{1}{2}\frac{\zeta}{\sigma_0^2} - \frac{1}{2}\frac{\zeta}{P_T + \sigma_0^2} + o(\zeta) - \mathrm{gap}}{C - \mathrm{gap}} \\
&= \frac{\frac{1}{2}\left(\frac{\zeta P_T}{\sigma_0^2(P_T + \sigma_0^2)} + o(\zeta)\right) - \mathrm{gap}}{C - \mathrm{gap}} \\
&= \frac{1}{C}\left(\frac{1}{2}\left(\mathrm{gap}^r\,\frac{2\sigma_0^2(P_T + \sigma_0^2)}{P_T}\cdot\frac{P_T}{\sigma_0^2(P_T + \sigma_0^2)} + o(\mathrm{gap}^r)\right) - \mathrm{gap}\right)\left(1 + \frac{\mathrm{gap}}{C} + o(\mathrm{gap})\right) \\
&= \frac{\mathrm{gap}^r}{C}(1 + o(1)).
\end{align*}
Taking ln(·) on both sides, the result is evident.
(c) follows from (b) and Lemma 2.
(d) comes from the definition of $D(\sigma_{G^*}^2\|\sigma_0^2)$, followed immediately by the expansion $\ln(\sigma_{G^*}^2/\sigma_0^2) = \ln(1 + \zeta/\sigma_0^2) = \frac{\zeta}{\sigma_0^2} - \frac{1}{2}\left(\frac{\zeta}{\sigma_0^2}\right)^2 + o(\mathrm{gap}^{2r})$. All the constant and first-order (in $\mathrm{gap}^r$) terms cancel since $\frac{\sigma_{G^*}^2}{\sigma_0^2} = 1 + \frac{\zeta}{\sigma_0^2}$. This gives the result immediately.
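Part (d) can be checked numerically using the standard Gaussian divergence $D(\sigma_1^2\|\sigma_0^2) = \frac{1}{2}\left(\frac{\sigma_1^2}{\sigma_0^2} - 1 - \ln\frac{\sigma_1^2}{\sigma_0^2}\right)$ in nats, consistent with the natural logarithms in (90); the parameter values below are illustrative:

```python
import math

# Check Lemma 13(d): with sigma_G*^2 = s0 + zeta and zeta from (94),
#   D(sigma_G*^2 || s0) = 0.5*(zeta/s0 - ln(1 + zeta/s0))   (nats)
#                       ~ ((PT + s0)**2 / PT**2) * gap**(2*r).
PT, s0, r, gap = 1.0, 0.5, 0.5, 1e-8
zeta = gap ** r * 2 * s0 * (PT + s0) / PT
exact = 0.5 * (zeta / s0 - math.log(1 + zeta / s0))
approx = (PT + s0) ** 2 / PT ** 2 * gap ** (2 * r)
print(exact / approx)  # close to 1
```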
Now, we can use Lemma 13 to approximate (91), (92) and (93):
\begin{align}
a &= \frac{(P_T + \sigma_0^2)^2}{P_T^2}\,\mathrm{gap}^{2r}(1 + o(1)), \tag{101} \\
b &= \left(\frac{3}{2} + 2\ln 2 - 2\ln(h_b^{-1}(\delta(G)))\right)\mathrm{gap}^r\,\frac{2(P_T + \sigma_0^2)}{P_T} \nonumber \\
&\leq \frac{2d(P_T + \sigma_0^2)}{(d-1)P_T}\,r\ln\left(\frac{1}{\mathrm{gap}}\right)\mathrm{gap}^r(1 + o(1)), \tag{102} \\
b &\geq \frac{2(P_T + \sigma_0^2)}{P_T}\,r\ln\left(\frac{1}{\mathrm{gap}}\right)\mathrm{gap}^r(1 + o(1)), \tag{103} \\
c &\leq \left(\frac{d}{d-1}r - \beta\right)\ln\left(\frac{1}{\mathrm{gap}}\right)(1 + o(1)), \tag{104} \\
c &\geq (r - \beta)\ln\left(\frac{1}{\mathrm{gap}}\right)(1 + o(1)). \tag{105}
\end{align}
Therefore, in parallel to (87), we have for the AWGN bound
\[ \frac{b}{2a} \geq \frac{rP_T}{(P_T + \sigma_0^2)}\,\frac{\ln\left(\frac{1}{\mathrm{gap}}\right)}{\mathrm{gap}^r}\,(1 + o(1)). \tag{106} \]
Similarly, in parallel to (88), we have for the AWGN bound
\[ \frac{4ac}{b^2} \leq (1 + o(1))\,\frac{1}{r^2}\left(\frac{d}{d-1}r - \beta\right)\frac{1}{\ln\left(\frac{1}{\mathrm{gap}}\right)}. \]
This is negative as long as $r < \frac{\beta(d-1)}{d}$, and so for every $c_s < \frac{1}{2}$, for small enough gap, we know that
\[ \sqrt{1 - \frac{4ac}{b^2}} - 1 \geq c_s\,\frac{1}{r^2}\left(\beta - \frac{d}{d-1}r\right)\frac{1}{\ln\left(\frac{1}{\mathrm{gap}}\right)}\,(1 + o(1)). \]
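The elementary fact used in the last step, that $\sqrt{1+t} - 1 \geq c_s t$ for any $c_s < \frac{1}{2}$ once $t > 0$ is small enough (here $t$ plays the role of $-4ac/b^2$), is easy to confirm numerically (a sketch with $c_s = 0.45$):

```python
import math

# For c_s < 1/2 and small enough t > 0, sqrt(1 + t) - 1 >= c_s * t,
# since sqrt(1 + t) - 1 = t/2 - t**2/8 + ... .
cs = 0.45
for t in (1e-1, 1e-3, 1e-6):
    assert math.sqrt(1 + t) - 1 >= cs * t
print("sqrt(1 + t) - 1 >= c_s * t verified")
```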
Combining this with (106) gives the bound:
\begin{align}
n &\geq (1 + o(1))\left(c_s\,\frac{1}{r^2}\left(\beta - \frac{d}{d-1}r\right)\frac{1}{\ln\left(\frac{1}{\mathrm{gap}}\right)}\cdot\frac{rP_T}{P_T + \sigma_0^2}\,\frac{\ln\left(\frac{1}{\mathrm{gap}}\right)}{\mathrm{gap}^r}\right)^2 \tag{107} \\
&= (1 + o(1))\left(c_s\,\frac{P_T}{r(P_T + \sigma_0^2)}\left(\beta - \frac{d}{d-1}r\right)\left(\frac{1}{\mathrm{gap}^r}\right)\right)^2. \tag{108}
\end{align}
Since this holds for all $0 < c_s < \frac{1}{2}$ and all $r < \min\left\{1, \frac{\beta(d-1)}{d}\right\}$ for all $d > 1$, Theorem 4 for AWGN channels follows.
APPENDIX F
LOWER BOUND ON AVERAGE NEIGHBORHOOD SIZE
Proof of Theorem 5: From Lemma 8, for any bit $i$ of neighborhood size $n_i$,
\[ \langle P_{e,i}\rangle \geq \langle P_{e,i}\rangle_G\, 2^{-n_i D(g\|p)}\left(\frac{p(1-g)}{g(1-p)}\right)^{\varepsilon(\langle P_{e,i}\rangle_G)\sqrt{n_i}}. \tag{109} \]
Now, the average neighborhood size is $\bar{n} = \frac{1}{k}\sum_{i=1}^k n_i$. Applying Markov's inequality to the counting measure, the fraction of bits that have neighborhood size larger than $\eta\bar{n}$ for some $\eta$ is at most $\frac{\bar{n}}{\eta\bar{n}} = \frac{1}{\eta}$. Thus there exists a set $S$, with $|S| \geq k\left(1 - \frac{1}{\eta}\right)$, such that $n_i \leq \eta\bar{n}$ for all $i \in S$. Thus,
\[ \langle P_{e,i}\rangle \geq \langle P_{e,i}\rangle_G\, 2^{-\eta\bar{n}D(g\|p)}\left(\frac{p(1-g)}{g(1-p)}\right)^{\varepsilon(\langle P_{e,i}\rangle_G)\sqrt{\eta\bar{n}}} \tag{110} \]
for all $i \in S$. Also notice that since $\frac{1}{k}\sum_{i=1}^k \langle P_{e,i}\rangle_G \geq h_b^{-1}(\delta(G))$, there exists a set $T$ of size $|T| \geq k\,\frac{h_b^{-1}(\delta(G))}{2}$ such that $\langle P_{e,i}\rangle_G \geq \frac{h_b^{-1}(\delta(G))}{2}$ for all $i \in T$ (this follows readily from the observation that $\langle P_{e,i}\rangle_G \leq 1$). Choose $\eta$ satisfying
\[ k\left(1 - \frac{1}{\eta}\right) = k\left(1 - \frac{h_b^{-1}(\delta(G))}{4}\right), \tag{111} \]
i.e., $\eta = \frac{4}{h_b^{-1}(\delta(G))}$. Then,
\begin{align*}
|S\cup T| + |S\cap T| = |S| + |T| &\geq k\left(1 - \frac{1}{\eta}\right) + k\,\frac{h_b^{-1}(\delta(G))}{2} \\
&= k\left(1 - \frac{h_b^{-1}(\delta(G))}{4}\right) + k\,\frac{h_b^{-1}(\delta(G))}{2} \\
&= k\left(1 + \frac{h_b^{-1}(\delta(G))}{4}\right).
\end{align*}
Thus,
\[ |S\cap T| \geq |S| + |T| - |S\cup T| \geq k\,\frac{h_b^{-1}(\delta(G))}{4}, \tag{112} \]
since |S ∪ T | ≤ k. Now,
〈Pe〉 =1
k
k∑i=1
〈Pe,i〉
≥ 1
k
∑i∈S∩T
〈Pe,i〉G2−niD(g||p)(p(1− g)
g(1− p)
)ε(〈Pe,i〉G)√ni
≥ 1
k
∑i∈S∩T
〈Pe,i〉G2−ηnD(g||p)(p(1− g)
g(1− p)
)ε(〈Pe,i〉G)√ηn
(a)
≥ |S ∩ T |k
h−1b (δ(G))
22−ηnD(g||p)
(p(1− g)
g(1− p)
)ε(h−1b
(δ(G))
2)√ηn
(b)
≥(h−1b (δ(G))
)28
2− 4n
h−1b
(δ(G))D(g||p)
(p(1− g)
g(1− p)
)ε(h−1b
(δ(G))
2
)√4n
h−1b
(δ(G))
.
where (a) follows from the monotonicity of fG(·) in Lemma 8, and the observation that 〈Pe,i〉G ≥h−1b (δ(G))
2 for all i ∈ T , and hence for all i ∈ S ∪ T . Inequality (b) follows from (112).
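The counting facts used above, the Markov bound defining $S$, the averaging bound defining $T$, and the inclusion-exclusion step (112), can be sanity-checked numerically. In this sketch, $m$ plays the role of $h_b^{-1}(\delta(G))$, and the random values are stand-ins for the per-bit quantities:

```python
import random

# (i)  If values in [0,1] have average >= m, at least k*m/2 of them are >= m/2 (set T).
# (ii) Markov on the counting measure: at least k*(1 - 1/eta) indices have
#      n_i <= eta * n_bar (set S).
# (iii) Inclusion-exclusion: |S ∩ T| >= |S| + |T| - k.
random.seed(0)
k = 10000
vals = [random.random() for _ in range(k)]        # stand-ins for <P_e,i>_G in [0,1]
m = sum(vals) / k                                 # average plays the role of h_b^{-1}
T = {i for i in range(k) if vals[i] >= m / 2}
assert len(T) >= k * m / 2                        # fact (i)

ns = [random.expovariate(1.0) for _ in range(k)]  # stand-ins for neighborhood sizes
n_bar = sum(ns) / k
eta = 4 / m                                       # matches eta = 4 / h_b^{-1}(delta(G))
S = {i for i in range(k) if ns[i] <= eta * n_bar}
assert len(S) >= k * (1 - 1 / eta)                # fact (ii), Markov's inequality
assert len(S & T) >= len(S) + len(T) - k          # fact (iii), inclusion-exclusion
print("counting bounds verified")
```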
For the approximation analysis, we take $\log_2(\cdot)$ of both sides of the lower bound:
\[ \log_2(\langle P_e\rangle) \geq 2\log_2\left(h_b^{-1}(\delta(G))\right) - 3 - \frac{4\bar{n}}{h_b^{-1}(\delta(G))}D(g\|p) - \varepsilon\left(\frac{h_b^{-1}(\delta(G))}{2}\right)\sqrt{\frac{4\bar{n}}{h_b^{-1}(\delta(G))}}\log_2\left(\frac{g(1-p)}{p(1-g)}\right). \tag{113} \]
Choosing $\langle P_e\rangle = \mathrm{gap}^\beta$,
\[ \frac{4\bar{n}}{h_b^{-1}(\delta(G))}D(g\|p) + \varepsilon\left(\frac{h_b^{-1}(\delta(G))}{2}\right)\sqrt{\frac{4\bar{n}}{h_b^{-1}(\delta(G))}}\log_2\left(\frac{g(1-p)}{p(1-g)}\right) + \beta\log_2(\mathrm{gap}) - 2\log_2\left(h_b^{-1}(\delta(G))\right) + 3 \geq 0. \tag{114} \]
In this quadratic, $a = D(g^*\|p)$, $b = \varepsilon\log_2\left(\frac{g(1-p)}{p(1-g)}\right)$, and $c = \beta\log_2(\mathrm{gap}) - 2\log_2\left(h_b^{-1}(\delta(G))\right) + 3$, while the variable is $\sqrt{\frac{4\bar{n}}{h_b^{-1}(\delta(G))}}$. Again, the constant 3 can be ignored at small gap.
\begin{align}
c &= \beta\log_2(\mathrm{gap}) - 2\log_2\left(h_b^{-1}(\delta(g^*))\right) + 3 \nonumber \\
&\leq \left(\frac{2d}{d-1}r - \beta\right)\log_2\left(\frac{1}{\mathrm{gap}}\right) + K_3 + o(1). \tag{115} \\
\text{Also } c &\geq (2r - \beta)\log_2\left(\frac{1}{\mathrm{gap}}\right) + K_4 + o(1) \tag{116}
\end{align}
for a pair of constants $K_3, K_4$. Thus, for gap small enough and $r < \frac{\beta(d-1)}{2d}$, we know that $c < 0$. Following the analysis in Theorem 4, we obtain
\[ \frac{4\bar{n}}{h_b^{-1}(\delta(G))} \geq \Omega\left(\frac{\log_2(1/\mathrm{gap})}{\mathrm{gap}^{2r}}\right) \tag{117} \]
for all $r < \min\left\{\frac{\beta(d-1)}{2d}, 1\right\}$. From Lemma 3, $\log_2\left(h_b^{-1}(\delta(G))\right) \geq \frac{d}{d-1}r\log_2(\mathrm{gap}) - 1 + K_1 + o(1)$. Thus, by taking $d$ arbitrarily large, we arrive at the approximation result.
REFERENCES
[1] C. Shannon, R. Gallager, and E. Berlekamp, “Lower bounds to error probability for coding on discrete memoryless channels.
I,” Information and Control, vol. 10, no. 1, pp. 65–103, 1967.
[2] C. E. Shannon, R. G. Gallager, and E. R. Berlekamp, “Lower bounds to error probability for coding on discrete memoryless channels. II,” Information and Control, vol. 10, no. 5, pp. 522–552, 1967.
[3] R. Gallager, “A simple derivation of the coding theorem and some applications,” IEEE Trans. Inform. Theory, vol. 11, no. 1, pp. 3–18, Jan. 1965.
[4] A. Valembois and M. Fossorier, “Sphere-packing bounds revisited for moderate block lengths,” IEEE Trans. Inform. Theory,
vol. 50, no. 12, pp. 2998–3014, 2004.
[5] G. Wiechman and I. Sason, “An improved sphere-packing bound for finite-length codes over symmetric memoryless
channels,” IEEE Trans. Inform. Theory, vol. 54, no. 5, pp. 1962–1990, 2008.
[6] Y. Polyanskiy, H. Poor, and S. Verdu, “Dispersion of Gaussian channels,” in IEEE International Symposium on Information
Theory (ISIT). IEEE, 2009, pp. 2204–2208.
[7] P. Elias, “Coding for noisy channels,” IRE National Convention Record, vol. 4, pp. 37–46, 1955.
[8] A. Viterbi, “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm,” IEEE Trans. Inform.
Theory, vol. 13, no. 2, pp. 260–269, Apr. 1967.
[9] G. D. Forney, “Convolutional codes II. Maximum-likelihood decoding,” Information and Control, vol. 25, no. 3, pp. 222–266, Jul. 1974.
[10] I. M. Jacobs and E. R. Berlekamp, “A lower bound to the distribution of computation for sequential decoding,” IEEE
Trans. Inform. Theory, vol. 13, no. 2, pp. 167–174, Apr. 1967.
[11] G. D. Forney, “Convolutional codes III. Sequential decoding,” Information and Control, vol. 25, no. 3, pp. 267–297, Jul. 1974.
[12] F. Jelinek, “Upper bounds on sequential decoding performance parameters,” IEEE Trans. Inform. Theory, vol. 20, no. 2,
pp. 227–239, Mar. 1974.
[13] E. Arikan, “Sequential decoding for multiple access channels,” Ph.D. dissertation, Massachusetts Institute of Technology,
Cambridge, MA, 1985.
[14] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge University Press, 2007.
[15] R. G. Gallager, “Low-density parity-check codes,” Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge,
MA, 1960.
[16] T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,”
IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 599–618, Feb. 2001.
[17] R. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inform. Theory, vol. 27, no. 5, pp. 533–547, Sep.
1981.
[18] I. Sason and R. L. Urbanke, “Parity-check density versus performance of binary linear block codes: New bounds and
applications,” IEEE Trans. Inform. Theory, vol. 53, no. 2, pp. 550–579, 2007.
[19] H. D. Pfister, I. Sason, and R. Urbanke, “Capacity-achieving ensembles for the binary erasure channel with bounded
complexity,” IEEE Trans. Inform. Theory, vol. 51, no. 7, pp. 2352–2379, Jul. 2005.
[20] H. D. Pfister and I. Sason, “Accumulate-repeat-accumulate codes: Capacity-achieving ensembles of systematic codes for the erasure channel with bounded complexity,” IEEE Trans. Inform. Theory, vol. 53, no. 6, pp. 2088–2115, Jun. 2007.
[21] C.-H. Hsu and A. Anastasopoulos, “Capacity-achieving codes for noisy channels with bounded graphical complexity and
maximum likelihood decoding,” submitted to IEEE Transactions on Information Theory, 2006.
[22] A. Khandekar, “Graph-based codes and iterative decoding,” Ph.D. dissertation, California Institute of Technology, Pasadena,
CA, 2002.
[23] A. Khandekar and R. McEliece, “On the complexity of reliable communication on the erasure channel,” in IEEE
International Symposium on Information Theory, 2001.
[24] I. Sason, “Bounds on the number of iterations for turbo-like ensembles over the binary erasure channel,” submitted to
IEEE Transactions on Information Theory, 2007.
[25] J. Jiang and K. Narayanan, “Iterative soft decision decoding of Reed Solomon codes based on adaptive parity check
matrices,” in Proceedings of the 2004 IEEE Symposium on Information Theory, Chicago, Illinois, Jun. 2004, p. 261.
[26] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” Jul. 2008. [Online]. Available: http://arxiv.org/abs/0807.3917
[27] P. Grover and A. Sahai, “Green codes: Energy-efficient short-range communication,” in Proceedings of the 2008 IEEE
Symposium on Information Theory, Toronto, Canada, Jul. 2008.
[28] ——, “Time-division multiplexing for green broadcasting,” in Proceedings of the 2009 IEEE Symposium on Information
Theory, Seoul, South Korea, Jul. 2009.
[29] P. Grover, “Bounds on the tradeoff between rate and complexity for sparse-graph codes,” in 2007 IEEE Information Theory
Workshop (ITW), Lake Tahoe, CA, 2007.
[30] R. G. Gallager, Information Theory and Reliable Communication. New York, NY: John Wiley & Sons, 1968.
[31] D. Burshtein, M. Krivelevich, S. Litsyn, and G. Miller, “Upper bounds on the rate of LDPC codes,” IEEE Trans. Inform.
Theory, vol. 48, Sep. 2002.
[32] I. Sason and R. Urbanke, “Parity-check density versus performance of binary linear block codes over memoryless symmetric
channels,” IEEE Trans. Inform. Theory, vol. 49, no. 7, pp. 1611–1635, Jul. 2003.
[33] P. Grover and A. K. Chaturvedi, “Upper bounds on the rate of LDPC codes for a class of finite-state Markov channels,”
IEEE Trans. Inform. Theory, vol. 53, pp. 794–804, Feb. 2007.
[34] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, 1989.
[35] C. D. Thompson, “Area-time complexity for VLSI,” in STOC ’79: Proceedings of the eleventh annual ACM symposium
on Theory of computing. New York, NY, USA: ACM, 1979, pp. 81–88.
[36] A. Sahai, “Why do block length and delay behave differently if feedback is present?” IEEE Trans. Inform. Theory, vol. 54, no. 5, pp. 1860–1886, May 2008. [Online]. Available: http://arxiv.org/abs/cs/0610138
[37] M. S. Pinsker, “Bounds on the probability and of the number of correctable errors for nonblock codes,” Problemy Peredachi
Informatsii, vol. 3, no. 4, pp. 44–55, Oct./Dec. 1967.
[38] R. M. Corless, G. H. Gonnet, D. E. G. Hare, and D. E. Knuth, “On the Lambert W function,” Advances in Computational
Mathematics, vol. 5, pp. 329–359, 1996.
[39] H. Palaiyanur, personal communication, 2007.
[40] C. D. Thompson, “A complexity theory for VLSI,” Ph.D. dissertation, Carnegie Mellon University, Pittsburgh, PA, USA, 1980.
[41] P. Grover, A. Sahai, and S. Y. Park, “The finite-dimensional Witsenhausen counterexample,” in Proceedings of the 7th International Symposium on Modeling and Optimization in Ad-Hoc and Wireless Networks (WiOpt), Workshop on Control over Communication Channels (ConCom), Seoul, Korea, Jun. 2009.
[42] E. A. Haroutunian, “Lower bound for error probability in channels with feedback,” Problemy Peredachi Informatsii, vol. 13,
no. 2, pp. 36–44, 1977.
[43] C. Chang, personal communication, Nov. 2007.