Codes on Graphs: Introduction and Recent Advances
Mohammad Hossein Taghavi
In Collaboration with Prof. Paul H. Siegel
University of California, San Diego
E-mail: [email protected]
Hossein Taghavi 2
Outline
1. Introduction to Coding Theory
• Shannon’s Channel Coding Theorem
• Error-Correcting Codes – State-of-the-Art
2. Low-Density Parity-Check Codes
• Design and Message-Passing Decoding
• Performance
3. Linear Programming Decoding
• LP Relaxation
• Properties
• Improvements
4. Conclusion and Open Problems
A Noisy Communication System
[Shannon's block diagram: INFORMATION SOURCE → TRANSMITTER → (SIGNAL) → CHANNEL → (RECEIVED SIGNAL) → RECEIVER → DESTINATION, with a NOISE SOURCE feeding the channel; a MESSAGE enters at the source and is reproduced at the destination.]
Common Channels
• Binary Symmetric Channel BSC(p): each input bit is received correctly with probability 1 − p and flipped with probability p.
• Binary Erasure Channel BEC(ε): each input bit is received correctly with probability 1 − ε and erased (received as "?") with probability ε.
• Additive White Gaussian Noise (AWGN) Channel: y = x + n with Gaussian noise n, described by the conditional densities f(y | +1) and f(y | −1).
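The two discrete channels above can be simulated in a few lines. A minimal sketch (the function names `bsc` and `bec` are ours, not from the slides):

```python
import random

def bsc(bits, p, rng=None):
    """Binary symmetric channel: flip each bit independently with probability p."""
    rng = rng or random.Random(0)
    return [b ^ (rng.random() < p) for b in bits]

def bec(bits, eps, rng=None):
    """Binary erasure channel: erase each bit (None = '?') with probability eps."""
    rng = rng or random.Random(0)
    return [None if rng.random() < eps else b for b in bits]
```

For p = 0 the BSC is noiseless, and for p = 1 it deterministically flips every bit.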
Coding
• Code rate: R = k / n

  Source → Encoder → Channel → Decoder → Sink
  message x = (x1, …, xk) → codeword c = (c1, …, cn) → received y = (y1, …, yn) → estimate x̂ = (x̂1, …, x̂k)
•We add redundancy to the message for protection against noise.
• Coding Theory:
1. Design Good Codes (mappings)
2. Design Good Encoding and Decoding Algorithms
Shannon’s Coding Theorem
• Every communication channel is characterized by a single number C, called the channel capacity.
• If a code has rate R > C, then the probability of error in decoding this code is bounded away from 0.
• For any rate R < C and any δ > 0, there exists a code of rate R and sufficiently large block length such that the probability of error under maximum-likelihood decoding is at most δ.
  R ≜ (# information bits) / (channel use)

• Bottom line: reliable communication is possible if and only if R < C.
Designing Binary Codes
• Coding generally involves mapping each of the 2^k vectors in {0,1}^k to some vector (codeword) in {0,1}^n.
• A good mapping places the codewords at the maximum possible distances.
• The proof of Shannon’s theorem is based on (long) random codes and optimal decoding.
  – i.e., a random mapping from {0,1}^k to {0,1}^n
  – The decoder uses a look-up table (practically infeasible)
• Algebraic coding theory studies highly-structured codes
  – e.g. Reed–Solomon codes have efficient decoders but cannot get very close to the capacity of binary channels.
State-of-the-Art
• Solution
  – Long, structured, “pseudorandom” codes
  – Practical, near-optimal decoding algorithms
• Examples
  – Turbo codes (1993)
  – Low-density parity-check (LDPC) codes (1960, 1999)
• State-of-the-art
  – Turbo codes and LDPC codes have brought Shannon limits within reach on a wide range of channels.
Evolution of Coding Technology
[Figure: evolution of coding technology, highlighting LDPC codes; from Trellis and Turbo Coding, Schlegel and Perez, IEEE Press, 2004]
Binary Linear Codes
• Codewords are chosen to satisfy a number of (binary) linear constraints.
• Parameters of binary linear block code C
k = number of information bits
n = number of code bits
R = k / n
d_min = minimum distance
• There are many ways to describe C
– Codebook (list)
– Parity-check matrix / generator matrix
– Graphical representation (“Tanner graph”)
Linear Block Codes on Graphs
• A binary linear code C is the collection of points x in {0,1}^n that satisfy
  H_{m×n} x = 0 mod 2.
• If H is full rank, there will be 2^{n−m} codewords.
• The code can be described by a Tanner graph: n variable nodes (one per code bit) on one side, m check nodes (one per row of H) on the other, with an edge between variable node i and check node j iff H_{ji} = 1.
• Neighborhood N_j: the set of nodes directly connected to node j.
• Degree d_j: the size of the neighborhood N_j.
• Example (n = 7, m = 4): variable nodes x1, …, x7 and four check nodes; the first check enforces x1 + x3 + x6 = 0 mod 2. The parity-check equations H x ≡ 0 (mod 2) read

  H = [ 1 0 1 0 0 1 0
        0 1 0 0 1 1 0
        1 0 0 1 1 0 1
        0 1 1 1 0 0 1 ],   x = (x1, …, x7)^T.
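The parity checks are easy to verify by brute force. A small sketch, using the slide's example matrix as reconstructed here (note that this particular H happens to have rank 3 over GF(2), so the code has 2^(7−3) = 16 codewords; the 2^(n−m) count assumes a full-rank H):

```python
import itertools

# Example parity-check matrix (n = 7, m = 4), reconstructed from the slide.
H = [
    [1, 0, 1, 0, 0, 1, 0],  # the check x1 + x3 + x6 = 0 mod 2
    [0, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 1, 1, 0, 0, 1],
]

def is_codeword(x, H):
    """x is a codeword iff every row of H gives an even parity sum."""
    return all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)

# Enumerate the code by checking all 2^7 binary vectors.
code = [x for x in itertools.product([0, 1], repeat=7) if is_codeword(x, H)]
```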
Low-Density Parity-Check Codes
• Proposed by Gallager (1960)
• “Sparseness” of matrix and graph descriptions
– Number of 1’s in H grows linearly with block length
– Number of edges in Tanner graph grows linearly with block length
• “Randomness” of construction in:
– Placement of 1’s in H
– Connectivity of variable and check nodes
• Iterative, message-passing decoder
– Simple “local” decoding at nodes
– Iterative exchange of information (message-passing)
• Other families of graph-based codes:
– Repeat Accumulate Codes
– Fountain Codes
Code Construction
• LDPC codes are generally constructed randomly, according to certain distributions on the degrees of variable and check nodes.
– Once the node degrees are selected, connections are made randomly
1. Regular LDPC: Each check node has degree dc, and each variable
node has degree dv
2. Irregular LDPC: The degrees of variable and check nodes need not be constant. (generalization of regular LDPC)
• Ensemble defined by “node degree distribution” functions.
  λ(x) = Σ_{i=1}^{d_v} λ_i x^{i−1},   where λ_i ∝ (number of variable nodes of degree i)
  ρ(x) = Σ_{i=2}^{d_c} ρ_i x^{i−1},   where ρ_i ∝ (number of check nodes of degree i)
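For the regular case, the design rate follows directly from the degree counts, and a random regular Tanner graph can be built with the usual "socket" construction. A sketch (the helper names are ours; this simple construction allows occasional parallel edges, which practical designs clean up):

```python
import random
from collections import Counter

def design_rate(dv, dc):
    """(dv, dc)-regular LDPC: n*dv = m*dc edges, so m/n = dv/dc and
    R = 1 - dv/dc (assuming the parity-check matrix is full rank)."""
    return 1 - dv / dc

def regular_tanner_graph(n, dv, dc, seed=0):
    """Return, for each check node, the list of variable nodes it connects to."""
    assert (n * dv) % dc == 0, "n*dv must be divisible by dc"
    m = n * dv // dc
    sockets = [v for v in range(n) for _ in range(dv)]  # dv edge stubs per variable
    random.Random(seed).shuffle(sockets)
    return [sockets[j * dc:(j + 1) * dc] for j in range(m)]
```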
Optimal Bit Decoding
• Consider transmission of binary inputs X ∈ {+1, −1} over a memoryless
channel using linear code C.
• Assume codewords are transmitted equiprobably.
• The maximum a posteriori (MAP) bit decoding rule minimizes the bit error probability:

  x̂_i^{MAP}(y) = argmax_{x_i ∈ {+1,−1}} P_{X_i|Y}(x_i | y)
              = argmax_{x_i ∈ {+1,−1}} Σ_{~x_i} P_{X|Y}(x | y)
              = argmax_{x_i ∈ {+1,−1}} Σ_{~x_i} P_{Y|X}(y | x) P_X(x)
              = argmax_{x_i ∈ {+1,−1}} Σ_{~x_i} Π_{j=1}^{n} f_{Y_j|X_j}(y_j | x_j) · f_C(x)

• where Σ_{~x_i} denotes the sum over all bits of x except x_i, and f_C(x) is 1 if x is a codeword of C, and 0 otherwise.
Belief Propagation
• If the Tanner graph is cycle-free, there is a message-passing approach to bit-wise MAP decoding.
• The nodes of the Tanner graph exchange updated messages, i.e.
conditional bit distributions, denoted u = [u(1), u(-1)].
• The initial messages presented by the channel to the variable nodes are of the form
• The variable-to-check and check-to-variable message updates are determined by the “sum-product” update rule.
  u_{ch,i} = [u_{ch,i}(1), u_{ch,i}(−1)] = [ p_{Y_i|X_i}(y_i | 1), p_{Y_i|X_i}(y_i | −1) ]

[Figure: Tanner graph with variable nodes x1, …, x7 exchanging messages with check nodes.]
Sum-Product Update Rule
• Variable to check:

  v(b) = u_ch(b) · Π_{k=1}^{d−1} u_k(b),   for b ∈ {+1, −1}

• Check to variable:

  u(b) = Σ_{x_1, …, x_{d−1}} f(b, x_1, …, x_{d−1}) · Π_{k=1}^{d−1} v_k(x_k)

  – where f is the parity-check indicator function.

[Figure: a variable node of degree d combines the channel message u_ch with incoming messages u_1, …, u_{d−1} into the outgoing message v; a check node of degree d combines incoming messages v_1, …, v_{d−1} into the outgoing message u.]
Log-Likelihood Formulation
• The sum-product update is simplified using log-likelihood ratios.
• For message u, define

  L(u) = log( u(1) / u(−1) )

• Variable-to-check update:

  L(v) = Σ_{k=0}^{d−1} L(u_k),   where u_0 is the message from the channel

• Check-to-variable update:

  L(u) = 2 tanh⁻¹( Π_{k=1}^{d−1} tanh( L(v_k) / 2 ) )

[Figure: on edge e, a variable node outputs L(v) from L(u_0), …, L(u_{d−1}); a check node outputs L(u) from L(v_1), …, L(v_{d−1}).]
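The two LLR updates above are essentially one-liners. A sketch (with a small clamp added by us, since for large LLRs the tanh product can round to ±1 and atanh would overflow):

```python
import math

def var_to_check(llr_channel, incoming):
    """Variable-to-check: L(v) = channel LLR plus the other incoming check LLRs."""
    return llr_channel + sum(incoming)

def check_to_var(incoming):
    """Check-to-variable: L(u) = 2 atanh( prod_k tanh(L(v_k)/2) )."""
    prod = 1.0
    for llr in incoming:
        prod *= math.tanh(llr / 2.0)
    prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)  # numerical clamp
    return 2.0 * math.atanh(prod)
```

The sign of the check output follows the product of the incoming signs, and its magnitude is dominated by the least reliable incoming message.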
Performance Analysis
• In the spirit of Shannon, we can analyze the performance of message-passing decoding on ensembles of LDPC codes
with specified degree distributions (λ , ρ).• The results provide criteria for designing LDPC codes that
transmit reliably with MP decoding at rates very close to the Shannon capacity.
• By symmetry, the analysis can assume that the all-zeros codeword is sent.
• The results are asymptotic, i.e. hold for very large block lengths.
Key Results
• Concentration
  – With high probability, the performance of ℓ rounds of BP decoding on a randomly selected code converges to the ensemble-average performance as the length n → ∞.
• Convergence to cycle-free performance
  – The average performance of ℓ rounds of BP decoding on the (n, λ, ρ) ensemble converges to the performance on a graph with no cycles of length ≤ 2ℓ, as the length n → ∞.
Cycle-free Performance: Density Evolution
• We can get asymptotic results by looking at the cycle-free case.
• The cycle-free performance can be computed using density evolution.
• For an ensemble of LDPC codes:
  – the incoming messages to each node are i.i.d.
  – the channel observations, L(u_0), at different variable nodes are i.i.d.
Density Evolution, cont.
So at each iteration, we can compute the pdf of the outgoing messages in terms of those of the incoming messages.
– Having the pdf of the LLRs after ℓ iterations, we can compute the bit error probability.
• Threshold calculation
– There is a threshold channel parameter p*(λ,ρ) such that, for any “better” channel parameter p, the cycle-free error probability approaches 0 as the number of iterations ℓ→∞.
• For some channels, we can optimize the degree distributions of irregular LDPC codes for the best p*.
• This technique has produced rate-1/2 LDPC ensembles with thresholds within 0.0045 dB of the Shannon limit on the AWGN channel!
• A rate-1/2 code with block length 10^7 achieved a bit error rate of 10^{−6} within 0.04 dB of the Shannon limit!
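On the BEC these computations become scalar: the message "density" is just an erasure probability, and for a (dv, dc)-regular ensemble density evolution reduces to x_{ℓ+1} = ε (1 − (1 − x_ℓ)^{dc−1})^{dv−1}. A sketch that locates the threshold by bisection (the iteration cap and tolerance are our choices):

```python
def bec_threshold(dv, dc, iters=2000, tol=1e-9):
    """Largest erasure rate eps for which density evolution drives the
    message erasure probability to ~0 for the (dv, dc)-regular ensemble."""
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on the channel parameter eps
        eps = (lo + hi) / 2
        x = eps
        for _ in range(iters):
            x = eps * (1 - (1 - x) ** (dc - 1)) ** (dv - 1)
            if x < tol:
                break
        if x < tol:
            lo = eps  # decoding succeeds: the threshold is above eps
        else:
            hi = eps
    return lo

# The (3,6)-regular ensemble (rate 1/2) has threshold approximately 0.4294,
# against the capacity limit of 0.5 for a rate-1/2 code on the BEC.
```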
Finite-length Performance
• Remaining questions:
1. How does the waterfall region scale with block length?
2. Where does the error floor occur?
• In practice, the algorithm sometimes does not converge.
– We can view MP decoding as an optimization algorithm that sometimes gets trapped in local minima.
– These events are analytically characterized for the BEC, but are still not fully understood in general.
• There are some promising results obtained by techniques from the statistical mechanics literature.
Linear Programming Decoding: Motivation
• Linear Programming (LP) decoding is an alternative to MP decoding for Turbo and LDPC codes.
• Its performance has connections to that of MP decoding.
• Advantages:
  – Amenability to finite-length analysis
  – Potential for improvement
  – Detectable failures
• Major drawback:
– Higher complexity than MP
• LP decoding is sometimes seen as a tool to characterize MP decoding.
The figure is courtesy of Jon Feldman.
Maximum-likelihood (ML) Decoding of Binary Linear Codes
• ML (MAP) Block Decoding: Find the sequence, x, that maximizes the likelihood of the received vector.
• Optimization with a linear objective function, but nonlinear constraints:

  (x̂_1, …, x̂_n) = argmax_{x ∈ C} Pr[ r | x ]
               = argmax_{x ∈ C} ( Σ_{i: x_i = 0} ln Pr[r_i | 0] + Σ_{i: x_i = 1} ln Pr[r_i | 1] )
               = argmin_{x ∈ C} Σ_i γ_i x_i,   where γ_i ≜ ln( Pr[r_i | 0] / Pr[r_i | 1] )

  (the term Σ_i ln Pr[r_i | 0] is constant in x, so it can be dropped).

• Equivalently:  Minimize γ^T x  subject to  x ∈ C.

[Block diagram: Encoder → BPSK map t (0 ↦ +1, 1 ↦ −1) → Channel → Decoder; transmitted t(x), received r.]
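For tiny codes the ML rule above can be evaluated exactly by enumerating all codewords. A sketch (illustrative only; real block lengths make this intractable, which is why a tractable relaxation is needed):

```python
import itertools
import math

def ml_decode(gamma, H):
    """Minimize gamma^T x over all codewords of the code defined by H."""
    n = len(gamma)
    best, best_cost = None, math.inf
    for x in itertools.product([0, 1], repeat=n):
        if any(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H):
            continue  # x violates a parity check
        cost = sum(g * xi for g, xi in zip(gamma, x))
        if cost < best_cost:
            best, best_cost = x, cost
    return best
```

With H = [[1,1,0],[0,1,1]] (the length-3 repetition code, codewords 000 and 111), costs γ = (−1, −1, 0.5) favor 111.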
Feldman’s LP Relaxation of ML
• LP decoding:
  1. Each binary parity-check constraint

       Σ_{i ∈ N_j} x_i ≡ 0 (mod 2)

     is replaced by the linear inequalities

       Σ_{i ∈ V} x_i − Σ_{i ∈ N_j \ V} x_i ≤ |V| − 1,   for all V ⊆ N_j with |V| odd.

  2. Each binary condition x_i ∈ {0, 1} is relaxed to a box constraint 0 ≤ x_i ≤ 1.
• These linear inequalities define the fundamental polytope, P.
• Linear optimization:  Minimize γ^T x  subject to  x ∈ P.

[Figure: Tanner graph with variable nodes x1, …, x7 and a check node c_j with neighborhood N_j.]
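For small codes the fundamental polytope can be written out explicitly and handed to an off-the-shelf LP solver. A sketch using SciPy's `linprog` (assumed available); the odd-subset enumeration is exponential in the check degree, so this direct form is only viable for short, low-degree codes:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def lp_decode(gamma, H):
    """Minimize gamma^T x over Feldman's fundamental polytope of H."""
    n = len(gamma)
    A_ub, b_ub = [], []
    for row in H:
        N = [i for i in range(n) if row[i]]
        for size in range(1, len(N) + 1, 2):       # all odd-sized subsets V of N
            for V in itertools.combinations(N, size):
                a = np.zeros(n)
                for i in V:
                    a[i] = 1.0
                for i in N:
                    if i not in V:
                        a[i] = -1.0
                A_ub.append(a)
                b_ub.append(len(V) - 1)
    res = linprog(gamma, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0.0, 1.0)] * n)
    return res.x
```

An integral solution is the ML codeword; a fractional one is a pseudo-codeword, so the failure is detectable.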
The Fundamental Polytope
• Properties:
• The integer vertices of P are exactly the codewords.
• There are additional, non-integral vertices too, called pseudo-codewords (PCWs).
• γ determines a direction to search for the minimum cost vertex, x.
• Possible algorithm outputs:
• ML codeword (if the solution is integral)
• A non-integral vector (PCW) => declare failure
We always know if LP finds the ML solution.
[Figure: the polytope P with integral codeword vertices and non-integral pseudo-codeword vertices; −γ is the optimization direction.]
Some Performance Results
• For binary linear codes, WER is independent of the transmitted codeword.
• Motivated by the Hamming weight, we can define the pseudo-weight wp.
• For AWGN:
– For binary vectors, it is equal to the Hamming weight.
• [Koetter et al.]: If 1^n is transmitted and r is received, the Euclidean distance in the signal space between 1^n and r is w_p(r).
– So, the performance can be described by the pseudo-weight spectrum.
– The minimum pseudo-weight describes the error floor.
• More results:
– The minimum w_p increases sublinearly in n for regular codes.
– Bounds on the threshold of LP decoding
  w_p(x) = ( Σ_i x_i )² / ( Σ_i x_i² )
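The pseudo-weight is a one-line computation; on 0/1 vectors it reduces to the Hamming weight, while fractional pseudo-codewords can score lower:

```python
def pseudo_weight(x):
    """AWGN pseudo-weight: (sum_i x_i)^2 / (sum_i x_i^2)."""
    s = sum(x)
    return s * s / sum(xi * xi for xi in x)
```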
Size of the LP Decoding Problem
• For every check node with a neighborhood N of size d:
  – 2^{d−1} constraints of the form

      Σ_{i ∈ V} x_i − Σ_{i ∈ N \ V} x_i ≤ |V| − 1,   V ⊆ N, |V| odd.

• Total of O(m · 2^{d_max}) constraints.
• High-density codes: d_max = O(n)
  – The complexity is exponential in n.
• [Feldman et al.]: An equivalent relaxation with O(n³) constraints.
• Do we need all the constraints?
Properties
• Definition: A constraint of the form a_i^T x ≤ b_i is a cut at a point x̂ ∈ [0,1]^n if it is violated at x̂, i.e. a_i^T x̂ > b_i.
• Theorem 1: At any given point x̂ ∈ [0,1]^n, at most one of the constraints introduced by each parity check can be violated.
• There is an O(m · d_max) way to find all these cuts (if any), where d_max is the maximum check-node degree.
Adaptive LP Decoding
• Reduce the complexity by decreasing the number of constraints/vertices.
• Do not use a constraint until it is violated.
• Algorithm 1 (Adaptive LP):
  1. Set up the problem with a minimal number of constraints to guarantee boundedness of the result.
  2. Find the optimum point x(k) by linear programming.
  3. For each check node, check if it introduces a cut; if so, add the cut to the set of constraints.
  4. If at least one cut was added, go to step 2; otherwise, we have found the LP solution.
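Algorithm 1 can be sketched as follows, again with SciPy's `linprog` as the inner solver. The per-check cut search (take V = {i ∈ N : x̂_i > 1/2} and, if |V| is even, toggle the member of N whose value is closest to 1/2) is our simplified reading of the O(d) search, so treat it as an assumption rather than the exact published procedure:

```python
import numpy as np
from scipy.optimize import linprog

def find_cut(N, x, tol=1e-9):
    """Return (V, rest) describing a violated odd-set inequality at x, or None."""
    V = [i for i in N if x[i] > 0.5]
    if len(V) % 2 == 0:
        flip = min(N, key=lambda i: abs(x[i] - 0.5))
        V = [i for i in V if i != flip] if flip in V else V + [flip]
    rest = [i for i in N if i not in V]
    if sum(x[i] for i in V) - sum(x[i] for i in rest) > len(V) - 1 + tol:
        return V, rest
    return None

def adaptive_lp_decode(gamma, H, max_iter=100):
    n = len(gamma)
    A_ub, b_ub = [], []
    x = None
    for _ in range(max_iter):
        res = linprog(gamma,
                      A_ub=np.array(A_ub) if A_ub else None,
                      b_ub=np.array(b_ub) if b_ub else None,
                      bounds=[(0.0, 1.0)] * n)  # box constraints keep the LP bounded
        x = res.x
        added = False
        for row in H:
            cut = find_cut([i for i in range(n) if row[i]], x)
            if cut is not None:
                V, rest = cut
                a = np.zeros(n)
                for i in V:
                    a[i] = 1.0
                for i in rest:
                    a[i] = -1.0
                A_ub.append(a)
                b_ub.append(len(V) - 1)
                added = True
        if not added:  # no check introduces a cut: x is the LP solution
            break
    return x
```

On the small repetition-code example the loop adds a single cut and then terminates with the integral LP optimum.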
Upper Bound on the Complexity
• Theorem 2: Algorithm 1 converges in at most n iterations.
• Proof outline: The final solution is determined by n independent constraints

    κ_i :  a_i^T x ≤ b_i,   i = 1, …, n.

  Each intermediate solution x(k) must violate at least one of the κ_i; hence there are at most n intermediate solutions/iterations.
• Corollary 2: Algorithm 1 has at most n + nm constraints at the final iteration.
• Proof: We start with n constraints, and each of the m parity checks adds at most one constraint per iteration.
Numerical Results at Low SNR
[Plot 1: number of constraints (average and maximum) vs. number of parity checks m; fixed length n = 360 and rate R = 0.5.]
[Plot 2: number of constraints used (standard LP decoding, Corollary 2 upper bound, maximum and average # constraints) vs. check-node degree d_c, on a log scale; fixed length n = 120 and variable-node degree d_v = 3.]
• Observation: Adaptive LP decoding converges within O(1) iterations and uses fewer than 2 constraints per parity check.
Numerical Results: Gain in Running Time
• Reducing the number of constraints translates into a gain in running time:
• ~10² times faster than the standard implementation for d_c = 8.
• Even faster if we use a “warm start” at each iteration.
• The running time remains roughly constant even with a high-density code (d_c = O(n)); the standard LP solver is not making use of the sparsity!

[Plot: decoding time vs. block length; dashed lines: d_c = 6, solid lines: d_c = 8.]
Open Problems
• Finite-length analysis of ensembles of LDPC codes under LP decoding
• Computing the thresholds of LP decoding
• Designing LP solvers that exploit the properties of the decoding problem
• Given a code, finding its best Tanner graph representation for LP decoding
Conclusion
• LDPC codes are becoming the mainstream in coding technology.
  – Already implemented in 3G and LAN standards
• Many new applications, beyond LDPC decoding, are being studied for MP algorithms:
  – Joint equalization and decoding
  – Classical combinatorial problems, e.g. SAT
• Connections to other fields are being made.
  – Coding theory is being invaded by statistical physicists!
• Many important questions are still open, waiting for you!
Some References
1. R. G. Gallager, Low-Density Parity-Check Codes, M.I.T. Press, Cambridge, Mass., 1963.
2. T. Richardson and R. Urbanke, Modern Coding Theory, Cambridge University Press. (Preliminary version available online at Urbanke's webpage at EPFL.)
3. Special Issue on Codes on Graphs and Iterative Algorithms, IEEE Transactions on Information Theory, February 2001.
Thanks!