Hossein Taghavi: Codes on Graphs


DESCRIPTION

Hossein Taghavi's talk, organized by the Knowledge Diffusion Network at Sharif University.

TRANSCRIPT

Page 1

Codes on Graphs: Introduction and Recent Advances

Mohammad Hossein Taghavi, in collaboration with Prof. Paul H. Siegel

University of California, San Diego

E-mail: [email protected]

Page 2

Outline

1. Introduction to Coding Theory

• Shannon’s Channel Coding Theorem

• Error-Correcting Codes – State-of-the-Art

2. Low-Density Parity-Check Codes

• Design and Message-Passing Decoding

• Performance

3. Linear Programming Decoding

• LP Relaxation

• Properties

• Improvements

4. Conclusion and Open Problems


Page 4

A Noisy Communication System

Information Source → Transmitter → Channel → Receiver → Destination

The message is encoded into a signal; a noise source perturbs the signal in the channel; the receiver reconstructs the message from the received signal.

Page 5

Common Channels

• Binary Symmetric Channel BSC(p): each input bit is flipped with probability p and received correctly with probability 1 − p.

• Binary Erasure Channel BEC(ε): each input bit is erased with probability ε and received correctly with probability 1 − ε.

• Additive White Gaussian Noise (AWGN) Channel: the received value y has conditional densities f(y | +1) and f(y | −1), Gaussians centered at +1 and −1.
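As a concrete illustration (a sketch added here, not part of the slides; the parameter names p, eps, and sigma are mine), the three channel models in a few lines of Python:

```python
import numpy as np

rng = np.random.default_rng(0)

def bsc(bits, p):
    """BSC(p): flip each bit independently with probability p."""
    return bits ^ (rng.random(bits.shape) < p)

def bec(bits, eps):
    """BEC(eps): erase each bit with probability eps (erasure marked as -1)."""
    out = bits.copy()
    out[rng.random(bits.shape) < eps] = -1
    return out

def awgn(bits, sigma):
    """AWGN: BPSK-map bits (0 -> +1, 1 -> -1), then add N(0, sigma^2) noise."""
    return (1 - 2 * bits) + rng.normal(0.0, sigma, bits.shape)

bits = rng.integers(0, 2, size=8)
print(bsc(bits, 0.1), bec(bits, 0.2), awgn(bits, 0.5), sep="\n")
```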

Page 6

Coding

Source → Encoder → Channel → Decoder → Sink

  x = (x_1, …, x_k): message bits
  c = (c_1, …, c_n): codeword
  y = (y_1, …, y_n): channel output
  x̂ = (x̂_1, …, x̂_k): decoded message

Code rate: R = k / n

• We add redundancy to the message for protection against noise.

• Coding Theory:

1. Design Good Codes (mappings)

2. Design Good Encoding and Decoding Algorithms

Page 7

Shannon’s Coding Theorem

• Every communication channel is characterized by a single number C, called the channel capacity.

• The rate of a code is defined as R = (# information bits) / (channel use).

• If a code has rate R > C, then the probability of error in decoding this code is bounded away from 0.

• For any information rate R < C and any δ > 0, there exists a code of length n_δ and rate R, such that the probability of error in maximum-likelihood decoding of this code is at most δ.

• Bottom line: reliable communication is possible if and only if R < C.

Page 8

Designing Binary Codes

• Coding generally involves mapping each of the 2^k vectors in {0,1}^k to some vector (codeword) in {0,1}^n.

• A good mapping places the codewords as far apart as possible.

• The proof of Shannon's theorem is based on (long) random codes and optimal decoding:

  – i.e., a random mapping from {0,1}^k to {0,1}^n

  – The decoder uses a look-up table (practically infeasible)

• Algebraic coding theory studies highly structured codes

  – e.g., Reed-Solomon codes have efficient decoders but cannot get very close to the capacity of binary channels.

Page 9

State-of-the-Art

• Solution

  – Long, structured, "pseudorandom" codes

  – Practical, near-optimal decoding algorithms

• Examples

  – Turbo codes (1993)

  – Low-density parity-check (LDPC) codes (1960, 1999)

• State of the art

  – Turbo codes and LDPC codes have brought the Shannon limit within reach on a wide range of channels.

Page 10

Evolution of Coding Technology

[Figure: evolution of coding technology (LDPC codes highlighted), from Trellis and Turbo Coding, Schlegel and Perez, IEEE Press, 2004.]


Page 12

Binary Linear Codes

• Codewords are chosen to satisfy a number of (binary) linear constraints.

• Parameters of binary linear block code C

k = number of information bits

n = number of code bits

R = k / n

d_min = minimum distance

• There are many ways to describe C

– Codebook (list)

– Parity-check matrix / generator matrix

– Graphical representation (“Tanner graph”)

Page 13

Linear Block Codes on Graphs

• A binary linear code C is the collection of points x in {0,1}^n that satisfy

  H_{m×n} x = 0 (mod 2).

• If H is full rank, there will be 2^{n−m} codewords.

• The code can be described by a Tanner graph: a bipartite graph with n variable nodes (one per code bit) and m check nodes (one per parity check), where variable node i is connected to check node j whenever H_{ji} = 1.

• Neighborhood N_j: the set of nodes directly connected to node j.

• Degree d_j: the size of the neighborhood N_j.

• Example (n = 7, m = 4): variable nodes x_1, …, x_7; the first check node enforces x_1 + x_3 + x_6 = 0 (mod 2), and the full set of checks is H x = 0 (mod 2) with

  H = [ 1 0 1 0 0 1 0
        0 1 1 1 0 0 1
        1 0 0 1 1 0 1
        0 1 0 0 1 1 0 ],   x = (x_1, …, x_7)ᵀ.
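A quick check of the example (code added here, not in the talk): enumerate all x in {0,1}^7 and keep those with Hx = 0 (mod 2). This H happens to have rank 3 (the fourth row is the mod-2 sum of the other three), which illustrates why the 2^{n−m} count requires full rank:

```python
import itertools
import numpy as np

# Parity-check matrix of the Page 13 example (m = 4, n = 7)
H = np.array([[1,0,1,0,0,1,0],
              [0,1,1,1,0,0,1],
              [1,0,0,1,1,0,1],
              [0,1,0,0,1,1,0]])

# x is a codeword iff H x = 0 (mod 2)
codewords = [x for x in itertools.product([0, 1], repeat=7)
             if not (H @ x % 2).any()]
print(len(codewords))  # 16 = 2**(7-3), since rank(H) = 3, not 4
```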

Page 14

Low-Density Parity-Check Codes

• Proposed by Gallager (1960)

• “Sparseness” of matrix and graph descriptions

– Number of 1’s in H grows linearly with block length

– Number of edges in Tanner graph grows linearly with block length

• “Randomness” of construction in:

– Placement of 1’s in H

– Connectivity of variable and check nodes

• Iterative, message-passing decoder

– Simple “local” decoding at nodes

– Iterative exchange of information (message-passing)

• Other families of graph-based codes:

– Repeat Accumulate Codes

– Fountain Codes

Page 15

Code Construction

• LDPC codes are generally constructed randomly, according to certain distributions on the degrees of variable and check nodes.

– Once the node degrees are selected, connections are made randomly

1. Regular LDPC: each check node has degree d_c, and each variable node has degree d_v.

2. Irregular LDPC: The degrees of variable and check nodes need not be constant. (generalization of regular LDPC)

• The ensemble is defined by "node degree distribution" functions:

  λ(x) = Σ_{i=1}^{d_v} λ_i x^i,   λ_i = fraction of variable nodes of degree i

  ρ(x) = Σ_{i=2}^{d_c} ρ_i x^i,   ρ_i = fraction of check nodes of degree i
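A toy generator for regular ensembles (a sketch under the usual random edge-matching "socket" model, not necessarily the construction used in the talk; it may create parallel edges, which a production construction would remove):

```python
import numpy as np

def regular_ldpc(n, dv, dc, seed=0):
    """Random (dv, dc)-regular parity-check matrix via the socket model:
    n*dv variable-side edge sockets are matched to m*dc check-side sockets
    by a random permutation. Parallel edges collapse into a single 1 in H."""
    assert (n * dv) % dc == 0, "need n*dv divisible by dc"
    m = n * dv // dc
    rng = np.random.default_rng(seed)
    var_sockets = np.repeat(np.arange(n), dv)   # each variable node gets dv sockets
    chk_sockets = np.repeat(np.arange(m), dc)   # each check node gets dc sockets
    rng.shuffle(chk_sockets)                    # random matching of sockets
    H = np.zeros((m, n), dtype=int)
    H[chk_sockets, var_sockets] = 1
    return H

H = regular_ldpc(n=24, dv=3, dc=6)   # m = 12 checks
print(H.shape, H.sum())              # ~72 ones (fewer if parallel edges collided)
```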

Page 16

Optimal Bit Decoding

• Consider transmission of binary inputs X ∈ {±1} over a memoryless channel using linear code C.

• Assume codewords are transmitted equiprobably.

• The maximum a posteriori (MAP) bit decoding rule minimizes the bit error probability:

  x̂_i^MAP(y) = argmax_{x_i ∈ {±1}} P_{X_i|Y}(x_i | y)
             = argmax_{x_i ∈ {±1}} Σ_{~x_i} P_{X|Y}(x | y)
             = argmax_{x_i ∈ {±1}} Σ_{~x_i} f_{Y|X}(y | x) P_X(x)
             = argmax_{x_i ∈ {±1}} Σ_{~x_i} ∏_j f_{Y_j|X_j}(y_j | x_j) · f_C(x),

  where Σ_{~x_i} sums over all coordinates of x except x_i, and f_C(x) is 1 if x is a codeword of C, and 0 otherwise.

Page 17

Belief Propagation

• If the Tanner graph is cycle-free, there is a message-passing approach to bit-wise MAP decoding.

• The nodes of the Tanner graph exchange updated messages, i.e., conditional bit distributions, denoted u = [u(+1), u(−1)].

• The initial messages presented by the channel to the variable nodes are of the form

• The variable-to-check and check-to-variable message updates are determined by the “sum-product” update rule.

  u_ch,i = [u_ch,i(+1), u_ch,i(−1)] = [p_{Y_i|X_i}(y_i | +1), p_{Y_i|X_i}(y_i | −1)]

Page 18

Sum-Product Update Rule

• Variable to check:

  v(b) = u_ch(b) ∏_{k=1}^{d−1} u_k(b),   for b ∈ {±1},

  where u_ch is the channel message and u_1, …, u_{d−1} are the incoming messages on the other edges of a degree-d variable node.

• Check to variable:

  u(b) = Σ_{(x_1, …, x_{d−1})} f(b, x_1, …, x_{d−1}) ∏_{k=1}^{d−1} v_k(x_k),

  where f is the parity-check indicator function and v_1, …, v_{d−1} are the incoming messages on the other edges of a degree-d check node.

Page 19

Log-Likelihood Formulation

• The sum-product update is simplified using log-likelihoods. For a message u, define

  L(u) = log( u(+1) / u(−1) )

• Variable-to-check update (u_0 is the channel message):

  L(v) = Σ_{k=0}^{d−1} L(u_k)

• Check-to-variable update:

  L(u) = 2 tanh⁻¹( ∏_{k=1}^{d−1} tanh( L(v_k) / 2 ) )

[Figure: messages L(u_0), …, L(u_{d−1}) entering a variable node and L(v_1), …, L(v_{d−1}) entering a check node, with the outgoing message on edge e.]
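These two updates transcribe directly into code. A per-node sketch (mine; message scheduling, graph bookkeeping, and the final hard decision are omitted):

```python
import numpy as np

def variable_to_check(L_ch, L_in):
    """Variable-node update: the outgoing LLR on each edge is the channel
    LLR plus the sum of incoming LLRs on all OTHER edges (extrinsic rule)."""
    total = L_ch + L_in.sum()
    return total - L_in                        # leave-one-out sums, one per edge

def check_to_variable(L_in):
    """Check-node update: L(u) = 2*atanh( prod_k tanh(L(v_k)/2) ), taken over
    all OTHER edges (assumes no incoming LLR is exactly 0)."""
    t = np.tanh(L_in / 2.0)
    return 2.0 * np.arctanh(np.prod(t) / t)    # leave-one-out products

# Example: one degree-3 variable node and one degree-3 check node
print(variable_to_check(0.5, np.array([1.2, -0.7])))
print(check_to_variable(np.array([1.2, -0.7, 2.5])))
```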

Page 20

Performance Analysis

• In the spirit of Shannon, we can analyze the performance of message-passing decoding on ensembles of LDPC codes with specified degree distributions (λ, ρ).

• The results provide criteria for designing LDPC codes that transmit reliably with MP decoding at rates very close to the Shannon capacity.

• For symmetric channels, the analysis can assume the all-zeros codeword is sent.

• The results are asymptotic, i.e., they hold for very large block lengths.

Page 21

Key Results

• Concentration

  – With high probability, the performance of ℓ rounds of BP decoding on a randomly selected code converges to the ensemble-average performance as the length n → ∞.

• Convergence to cycle-free performance

  – The average performance of ℓ rounds of BP decoding on the (n, λ, ρ) ensemble converges to the performance on a graph with no cycles of length ≤ 2ℓ, as the length n → ∞.

Page 22

Cycle-free Performance: Density Evolution

• We can get asymptotic results by looking at the cycle-free case.

• The cycle-free performance can be computed using density evolution.

• For an ensemble of LDPC codes:

  – the incoming messages to each node are i.i.d.

  – the channel observations, L(u_0), at different variable nodes are i.i.d.

Page 23

Density Evolution, cont.

• At each iteration, we can compute the pdf of the outgoing messages from the pdfs of the incoming messages.

  – Having the pdf of the LLRs after ℓ iterations, we can compute the bit error probability.

• Threshold calculation

  – There is a threshold channel parameter p*(λ, ρ) such that, for any "better" channel parameter p, the cycle-free error probability approaches 0 as the number of iterations ℓ → ∞.

• For some channels, we can optimize the degree distributions of irregular LDPC codes for the best p*.

• This technique has produced rate-1/2 LDPC ensembles with thresholds within 0.0045 dB of the Shannon limit on the AWGN channel!

• A rate-1/2 code with block length 10^7 achieved a BER of 10^−6 within 0.04 dB of the Shannon limit!
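On the BEC, density evolution reduces to the one-dimensional recursion x_{ℓ+1} = ε·λ(1 − ρ(1 − x_ℓ)) in edge-perspective notation. A sketch (mine) that recovers the well-known threshold ε* ≈ 0.4294 of the regular (3,6) ensemble by bisection:

```python
def de_bec_36(eps, iters=5000, tol=1e-10):
    """Density evolution for the regular (3,6) LDPC ensemble on BEC(eps):
    x' = eps * lambda(1 - rho(1 - x)) with lambda(x) = x^2, rho(x) = x^5
    (edge perspective). Returns True if the erasure probability -> 0."""
    x = eps
    for _ in range(iters):
        x = eps * (1 - (1 - x) ** 5) ** 2
        if x < tol:
            return True
    return False

# Bisect for the threshold eps*(lambda, rho)
lo, hi = 0.0, 1.0
for _ in range(30):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if de_bec_36(mid) else (lo, mid)
print(round(lo, 4))  # ~0.4294; increase iters for more precision near threshold
```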

Page 24

Finite-length Performance

• Remaining questions:

1. How does the waterfall region scale with block length?

2. Where does the error floor occur?

• In practice, the algorithm sometimes does not converge.

  – We can view MP decoding as an optimization algorithm that sometimes gets trapped in local minima.

  – These events are analytically characterized for the BEC, but are still not fully understood in general.

• There are some promising results obtained by techniques from the statistical mechanics literature.


Page 26

Linear Programming Decoding: Motivation

• Linear Programming (LP) decoding is an alternative to MP decoding for Turbo and LDPC codes.

• Its performance has connections to that of MP decoding.

• Advantages:

  – Amenability to finite-length analysis

  – Potential for improvement

  – Detectable failures

• Major drawback:

  – Higher complexity than MP

• LP decoding is sometimes seen as a tool to characterize MP decoding.

[Figure courtesy of Jon Feldman.]

Page 27

Maximum-likelihood (ML) Decoding of Binary Linear Codes

• ML (MAP) Block Decoding: Find the sequence, x, that maximizes the likelihood of the received vector.

• Optimization with linear objective function, but nonlinear constraints

Encoder → x_t ∈ {0,1} → BPSK map (0 → +1, 1 → −1) → y_t → Channel (noise n_t) → r_t → Decoder

  x̂ = (x̂_1, …, x̂_n) = argmax_{x ∈ C} Pr[ r | x ]
     = argmax_{x ∈ C} Σ_i ( 1[x_i = 0] ln Pr[r_i | 0] + 1[x_i = 1] ln Pr[r_i | 1] )
     = argmax_{x ∈ C} ( const − Σ_i γ_i x_i )
     = argmin_{x ∈ C} Σ_i γ_i x_i,   where γ_i = ln( Pr[r_i | 0] / Pr[r_i | 1] )

• Equivalently:

  Minimize γᵀ x
  Subject to x ∈ C
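To make the objective concrete, a brute-force ML decoder (an illustrative sketch, feasible only for tiny codes since it enumerates all 2^n vectors; the gamma values are made up):

```python
import itertools
import numpy as np

def ml_decode(H, gamma):
    """ML decoding as: minimize gamma^T x over all codewords of H."""
    n = H.shape[1]
    best, best_cost = None, np.inf
    for x in itertools.product([0, 1], repeat=n):
        if (H @ x % 2).any():          # skip non-codewords
            continue
        cost = float(np.dot(gamma, x))
        if cost < best_cost:
            best, best_cost = np.array(x), cost
    return best

H = np.array([[1,0,1,0,0,1,0],
              [0,1,1,1,0,0,1],
              [1,0,0,1,1,0,1],
              [0,1,0,0,1,1,0]])
gamma = np.array([-0.8, 0.3, -0.1, 0.9, -0.4, 0.2, 0.5])  # example LLRs
print(ml_decode(H, gamma))
```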

Page 28

Feldman’s LP Relaxation of ML

• LP decoding:

1. Each binary parity-check constraint

   Σ_{i ∈ N_j} x_i = 0 (mod 2)

   is replaced by the linear inequalities

   Σ_{i ∈ V} x_i − Σ_{i ∈ N_j \ V} x_i ≤ |V| − 1,   for all V ⊆ N_j with |V| odd.

2. Each binary condition x_i ∈ {0, 1} is relaxed to a box constraint 0 ≤ x_i ≤ 1.

• These linear inequalities define the fundamental polytope, P.

• Linear optimization:

  Minimize γᵀ x
  Subject to x ∈ P

[Figure: Tanner graph with variable nodes x_1, …, x_7; check node c_j has neighborhood N_j.]
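A sketch of this relaxation with scipy.optimize.linprog (my illustration, not Feldman's original implementation; it enumerates the odd subsets explicitly, so it only suits low-degree checks):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def lp_decode(H, gamma):
    """Feldman's LP relaxation: minimize gamma^T x over the fundamental
    polytope (box constraints plus one inequality per odd-sized subset
    of each check neighborhood)."""
    m, n = H.shape
    A, b = [], []
    for j in range(m):
        N = np.flatnonzero(H[j])
        for size in range(1, len(N) + 1, 2):        # odd |V| only
            for V in itertools.combinations(N, size):
                row = np.zeros(n)
                row[list(V)] = 1.0                  # +x_i for i in V
                row[np.setdiff1d(N, V)] = -1.0      # -x_i for i in N\V
                A.append(row)
                b.append(len(V) - 1)
    res = linprog(gamma, A_ub=np.array(A), b_ub=np.array(b),
                  bounds=[(0, 1)] * n)
    return res.x

H = np.array([[1,0,1,0,0,1,0],
              [0,1,1,1,0,0,1],
              [1,0,0,1,1,0,1],
              [0,1,0,0,1,1,0]])
gamma = np.array([-0.8, 0.3, -0.1, 0.9, -0.4, 0.2, 0.5])
print(np.round(lp_decode(H, gamma), 3))  # integral => ML codeword; fractional => PCW
```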

Page 29

The Fundamental Polytope

• Properties:

  – The integral vertices of P are exactly the codewords.

  – There are additional, non-integral vertices too: pseudo-codewords (PCWs).

• γ determines a direction in which to search for the minimum-cost vertex, x.

• Possible algorithm outputs:

  – the ML codeword (if the solution is integral);

  – a non-integral vector (PCW) => declare failure.

=> We always know whether LP has found the ML solution.

[Figure: the fundamental polytope, with an integral codeword vertex and a non-integral pseudo-codeword vertex.]

Page 30

Some Performance Results

• For binary linear codes, WER is independent of the transmitted codeword.

• Motivated by the Hamming weight, we can define the pseudo-weight w_p. For the AWGN channel:

  w_p(x) = ( Σ_i x_i )² / ( Σ_i x_i² )

  – For binary vectors, it is equal to the Hamming weight.

• [Koetter et al.]: If 1^n is transmitted and r is received, the Euclidean distance in the signal space between 1^n and r is w_p(r).

  – So, the performance can be described by the pseudo-weight spectrum.

  – The minimum pseudo-weight describes the error floor.

• More results:

  – The minimum w_p increases sublinearly in n for regular codes.

  – Bounds on the threshold of LP decoding.
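For example (the vectors are mine), the fractional vector (1, 1, 0.5, 0.5, 0) has w_p = 3²/2.5 = 3.6, while a binary vector of Hamming weight 3 has w_p = 9/3 = 3:

```python
import numpy as np

def pseudo_weight(x):
    """AWGN pseudo-weight: (sum x_i)^2 / sum x_i^2."""
    x = np.asarray(x, dtype=float)
    return x.sum() ** 2 / (x ** 2).sum()

print(pseudo_weight([1, 1, 0.5, 0.5, 0]))  # 3.6
print(pseudo_weight([1, 1, 1, 0, 0]))      # 3.0 = Hamming weight
```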

Page 31

Size of the LP Decoding Problem

• For every check node with a neighborhood N of size d, there are 2^{d−1} constraints of the form

  Σ_{i ∈ V} x_i − Σ_{i ∈ N \ V} x_i ≤ |V| − 1.

• Total of O(m 2^{d_max}) constraints (e.g., a single degree-8 check already contributes 2⁷ = 128 inequalities).

• High-density codes: d_max = O(n), so the complexity is exponential in n.

• [Feldman et al.]: An equivalent relaxation with O(n³) constraints.

• Do we need all the constraints?

Page 32

Properties

• Definition: A constraint of the form a_iᵀ x ≤ b_i is a cut at a point x̂ ∈ Rⁿ if it is violated at x̂, i.e., a_iᵀ x̂ > b_i.

• Theorem 1: At any given point x̂ ∈ [0,1]ⁿ, at most one of the constraints introduced by each parity check can be violated.

• There is an O(m d_max) way to find all these cuts (if any), where d_max is the maximum check-node degree.

Page 33

Adaptive LP Decoding

• Reduce the complexity by decreasing the number of constraints/vertices: do not use a constraint until it is violated.

• Algorithm 1 (Adaptive LP):

  1. Set up the problem with a minimal number of constraints that guarantees boundedness of the result.

  2. Find the optimum point x^(k) by linear programming.

  3. For each check node, check whether it introduces a cut; if so, add the cut to the set of constraints.

  4. If at least one cut was added, go to step 2; otherwise, we have found the LP solution.
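A compact sketch of Algorithm 1 (my illustration, reusing numpy/scipy as in the earlier sketches; the cut search below uses the rule V = {i : x̂_i > 1/2} with a parity fix, one plausible realization of the O(m·d_max) search from Page 32, not necessarily the authors' exact routine):

```python
import numpy as np
from scipy.optimize import linprog

def find_cut(x_hat, N):
    """Look for a violated odd-set inequality at x_hat for the check with
    neighborhood N: take V = {i : x_hat[i] > 1/2}; if |V| is even, toggle
    the membership of the coordinate closest to 1/2."""
    vals = x_hat[N]
    in_V = vals > 0.5
    if in_V.sum() % 2 == 0:                       # need |V| odd
        in_V[np.argmin(np.abs(vals - 0.5))] ^= True
    V, W = N[in_V], N[~in_V]
    if x_hat[V].sum() - x_hat[W].sum() > len(V) - 1 + 1e-9:
        return V, W
    return None

def adaptive_lp_decode(H, gamma):
    """Algorithm 1: solve the LP with box constraints only, then
    repeatedly add violated parity-check cuts and re-solve."""
    m, n = H.shape
    A, b = [], []
    while True:
        kw = dict(A_ub=np.array(A), b_ub=np.array(b)) if A else {}
        x = linprog(gamma, bounds=[(0, 1)] * n, **kw).x
        added = False
        for j in range(m):
            cut = find_cut(x, np.flatnonzero(H[j]))
            if cut is not None:
                V, W = cut
                row = np.zeros(n)
                row[V], row[W] = 1.0, -1.0        # new cut: row @ x <= |V| - 1
                A.append(row); b.append(len(V) - 1)
                added = True
        if not added:                             # no cuts left: LP solution found
            return x

# Usage: same H and gamma as in the earlier sketches
# x = adaptive_lp_decode(H, gamma)
```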

Page 34

Upper Bound on the Complexity

• Theorem 2: Algorithm 1 converges in at most n iterations.

• Proof outline: The final solution can be determined by n linearly independent active constraints

  κ_i : a_iᵀ x ≤ b_i,   i = 1, …, n.

  Each intermediate solution, x^(k), must violate at least one of the κ_i. Hence, there are at most n intermediate solutions/iterations.

• Corollary 2: Algorithm 1 has at most n + nm constraints at the final iteration.

• Proof: We start with n constraints, and each of the m parity checks adds at most one constraint per iteration.

Page 35

Numerical Results at Low SNR

[Figure: average and maximum number of constraints vs. the number of parity checks, m.]

[Figure (log scale): number of constraints used vs. check-node degree d_c, comparing standard LP decoding, the Corollary 2 upper bound, and the maximum and average number of constraints used by adaptive LP.]

Parameters: fixed length n = 360 and rate R = 0.5; fixed length n = 120 and variable-node degree d_v = 3.

• Observation: Adaptive LP decoding converges with O(1) iterations and fewer than 2 constraints per parity check.

Page 36

Numerical Results: Gain in Running Time

• Reducing the number of constraints translates into a gain in running time:

  – ~10² times faster than the standard implementation for d_c = 8.

  – Even faster if we use a "warm start" at each iteration.

• The running time remains nearly constant even with a high-density code (d_c = O(n)): the LP solver is not making use of the sparsity!

[Figure: decoding time comparison; dashed lines: d_c = 6, solid lines: d_c = 8.]

Page 37

Open Problems

• Finite-length analysis of ensembles of LDPC codes under LP decoding

• Computing the thresholds under LP decoding

• Designing LP solvers that exploit the properties of the decoding problem

• Given a code, finding its best Tanner graph representation for LP decoding


Page 39

Conclusion

• LDPC codes are becoming mainstream in coding technology.

  – Already implemented in 3G and LAN standards.

• Many new applications of MP algorithms, beyond LDPC decoding, are being studied:

  – Joint equalization and decoding

  – Classical combinatorial problems, e.g., SAT

• Connections to other fields are being made:

  – Coding theory is being invaded by statistical physicists!

• Many important questions are still open, waiting for you!

Page 40

Some References

1. R. G. Gallager, Low-Density Parity-Check Codes, M.I.T. Press, Cambridge, Mass., 1963.

2. T. Richardson and R. Urbanke, Modern Coding Theory, Cambridge University Press (preliminary version available online at Urbanke's webpage at EPFL).

3. Special Issue on Codes on Graphs and Iterative Algorithms, IEEE Transactions on Information Theory, February 2001.

Thanks!