
Source: webstaff.itn.liu.se/.../course_material/LDPC_codes.pdf (retrieved 2008-11-07)

Efficient Architectures for Error Control

using Low-Density Parity-Check Codes

David Haley

Thesis submitted for the degree of

Doctor of Philosophy

Institute for Telecommunications Research

July 2004


To my wife Jane, my mother Verna, and my father Peter.

Without my family anything that I achieve is meaningless.


Contents

List of Figures v

List of Tables vii

List of Algorithms viii

Glossary ix

Notation and Symbols xi

Summary xv

Publications xvi

Declaration xvii

Acknowledgments xviii

1 Introduction 1

1.1 The Digital Communication System . . . . . . . . . . . . . . . . . . . . . 2

1.1.1 Source Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1.2 Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1.3 Channel Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1.4 Modulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.1.5 The Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2 Codec Design Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3 Overview of Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . 8

1.4 Contributions of This Thesis . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Low-Density Parity-Check Codes 11

2.1 Historical Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2 Linear Block Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.3 Low-Density Parity-Check Codes . . . . . . . . . . . . . . . . . . . . . . 14


2.4 Factor Graph Representation of LDPC Codes . . . . . . . . . . . . . . . 16

2.5 Iterative Decoding of LDPC Codes . . . . . . . . . . . . . . . . . . . . . 18

2.5.1 The Sum-Product Algorithm . . . . . . . . . . . . . . . . . . . . . 19

2.5.2 Hard Decision Decoding . . . . . . . . . . . . . . . . . . . . . . . 22

2.6 Alternative Representations and Algorithms . . . . . . . . . . . . . . . . 25

2.7 Encoding LDPC Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.7.1 Structured Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.7.2 Code Transformation . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.8 Analysis of Codes and Decoding on Graphs . . . . . . . . . . . . . . . . . 33

2.8.1 Short Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.8.2 Stopping Sets and Extrinsic Message Degree . . . . . . . . . . . . 35

2.8.3 Graph Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

2.8.4 Near-Codewords . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

2.8.5 The Computation Tree . . . . . . . . . . . . . . . . . . . . . . . . 37

2.8.6 Finite Graph-Covers . . . . . . . . . . . . . . . . . . . . . . . . . 38

2.9 Developments in LDPC Code Design . . . . . . . . . . . . . . . . . . . . 43

2.9.1 Irregular Constructions . . . . . . . . . . . . . . . . . . . . . . . . 43

2.9.2 Non-Binary Construction . . . . . . . . . . . . . . . . . . . . . . . 43

2.9.3 Algebraic LDPC Constructions . . . . . . . . . . . . . . . . . . . 44

2.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3 Iterative Encoding of LDPC Codes 46

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.2 The Sum-Product Encoder . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.3 Encoding via Iterative Matrix Inversion . . . . . . . . . . . . . . . . . . . 49

3.4 Reversible LDPC Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.4.1 Building Iteratively Encodable Circulants . . . . . . . . . . . . . . 52

3.4.2 Enforcing the Overlap Constraint . . . . . . . . . . . . . . . . . . 55

3.4.3 Building Reversible LDPC Codes from Circulants . . . . . . . . . 57

3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4 Performance Analysis 61

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.2 Performance on the AWGN Channel . . . . . . . . . . . . . . . . . . . . 62

4.3 Performance on the Binary Erasure Channel . . . . . . . . . . . . . . . . 68

4.4 Analysis of Codes and Decoder Behaviour . . . . . . . . . . . . . . . . . 69

4.4.1 Minimum Distance . . . . . . . . . . . . . . . . . . . . . . . . . . 74

4.4.2 Stopping Sets and Extrinsic Message Degree . . . . . . . . . . . . 76

4.4.3 Cycles and Near-Codewords . . . . . . . . . . . . . . . . . . . . . 78


4.4.4 Finite Graph-Covers . . . . . . . . . . . . . . . . . . . . . . . . . 81

4.4.5 Graph Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

5 Improved Reversible LDPC Codes 88

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

5.2 Simple Metrics for Implementation Complexity . . . . . . . . . . . . . . 89

5.3 High Rate Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

5.4 Codes Built From Improved Expanders . . . . . . . . . . . . . . . . . . . 92

5.5 Recursive Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5.6 Simulation Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

6 Analog Decoding 99

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

6.2 Potential Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

6.3 Existing Work and Remaining Challenges . . . . . . . . . . . . . . . . . . 101

6.4 The Subthreshold CMOS Analog Approach . . . . . . . . . . . . . . . . . 103

6.4.1 Subthreshold Operation . . . . . . . . . . . . . . . . . . . . . . . 103

6.4.2 The Gilbert Multiplier . . . . . . . . . . . . . . . . . . . . . . . . 105

6.4.3 Probability Normalisation . . . . . . . . . . . . . . . . . . . . . . 105

6.5 Analog Computation using Soft-Logic Gates . . . . . . . . . . . . . . . . 106

6.5.1 The Analog Soft-XOR Gate . . . . . . . . . . . . . . . . . . . . . 107

6.5.2 The Analog Soft-Equal Gate . . . . . . . . . . . . . . . . . . . . . 107

6.5.3 Building Variable and Check Nodes . . . . . . . . . . . . . . . . . 108

6.6 The Analog Sum-Product Decoder . . . . . . . . . . . . . . . . . . . . . 109

6.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

7 The Reversible LDPC Codec 113

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

7.2 Mode-Switching Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

7.3 Codec Core Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

7.3.1 Decode Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

7.3.2 Encode Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

7.3.3 Estimate of Encoder Implementation Overhead . . . . . . . . . . 120

7.4 Circuit Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

8 Conclusion 125

8.1 Summary of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125


8.2 Suggestions for Further Work . . . . . . . . . . . . . . . . . . . . . . . . 128

Bibliography 130


List of Figures

1.1 Digital communication system model. . . . . . . . . . . . . . . . . . . . . 3

2.1 (3,6)-regular LDPC construction using random permutation matrices. . . 16

2.2 Factor graph representation of the global function g(x1, x2, x3). . . . . . . 17

2.3 Factor graph representation of a (2,4)-regular code. . . . . . . . . . . . . 17

2.4 Soft gates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.5 An upper-triangular parity-check matrix. . . . . . . . . . . . . . . . . . . 28

2.6 A staircase structure for Hp. . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.7 A quasi-cyclic parity-check matrix. . . . . . . . . . . . . . . . . . . . . . 29

2.8 Systematic quasi-cyclic form of H. . . . . . . . . . . . . . . . . . . . . . . 30

2.9 Shift register based encoder for a quasi-cyclic code. . . . . . . . . . . . . 30

2.10 The parity-check matrix rearranged into approximate lower-triangular form. 32

2.11 A cycle of length four. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.12 A size five stopping set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.13 Computation tree example. . . . . . . . . . . . . . . . . . . . . . . . . . 38

2.14 A simple graph G and a double cover. . . . . . . . . . . . . . . . . . . . . 39

2.15 An m-fold cover of G. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.1 Jacobi algorithm as message-passing. . . . . . . . . . . . . . . . . . . . . 52

4.1 Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n = 64, on the AWGN channel. . . . . . . . . . . . 64

4.2 Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n ≈ 500, on the AWGN channel. . . . . . . . . . . . 65

4.3 Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n ≈ 1000, on the AWGN channel. . . . . . . . . . . 66

4.4 Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n ≈ 4000, on the AWGN channel. . . . . . . . . . . 67

4.5 Shifting the error floor for Rev4096, at Eb/N0 = 2dB, by varying η. . . . 68

4.6 Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n = 64, on the binary erasure channel. . . . . . . . . 70

4.7 Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n ≈ 500, on the binary erasure channel. . . . . . . . 71


4.8 Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n ≈ 1000, on the binary erasure channel. . . . . . . 72

4.9 Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n ≈ 4000, on the binary erasure channel. . . . . . . 73

4.10 Occurrence of maximal stopping sets for Rev4096, when ǫ = 0.375. . . . . 76

4.11 Parity bits in the |S| = 11 stopping set. . . . . . . . . . . . . . . . . . . . 78

4.12 Graph of Hp viewed as a lattice. . . . . . . . . . . . . . . . . . . . . . . . 79

4.13 Error vector weight histograms for Rev4096 at Eb/N0 = 2dB. . . . . . . . 80

4.14 Cycles in a variable row of the lattice for Hp. . . . . . . . . . . . . . . . . 80

4.15 Building a pseudo-codeword on the lattice for Hp. . . . . . . . . . . . . . 82

4.16 Comparison of expansion for random and reversible structures. . . . . . . 83

4.17 Loci of the spectra for Hp of the reversible codes. . . . . . . . . . . . . . 85

5.1 Comparing the performance of some r ≈ 0.763 LDPC codes. . . . . . . . 91

5.2 Expansion of type-II reversible structures. . . . . . . . . . . . . . . . . . 94

5.3 Performance of type-II reversible (3,6)-regular LDPC codes. . . . . . . . 96

5.4 Performance of rate r = 0.6 type-II reversible LDPC codes. . . . . . . . . 97

6.1 An n-type MOSFET. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

6.2 An n-type differential pair. . . . . . . . . . . . . . . . . . . . . . . . . . . 105

6.3 Vector multiplication core. . . . . . . . . . . . . . . . . . . . . . . . . . . 106

6.4 Vector normalisation using a biased current mirror. . . . . . . . . . . . . 107

6.5 Subthreshold CMOS soft-XOR gate. . . . . . . . . . . . . . . . . . . . . 108

6.6 Subthreshold CMOS soft-equal gate. . . . . . . . . . . . . . . . . . . . . 109

6.7 Voltage-mode soft-XOR gate. . . . . . . . . . . . . . . . . . . . . . . . . 110

6.8 Factor graph representation of bi-directional 3-port soft-logic gates. . . . 111

6.9 Check and variable nodes. . . . . . . . . . . . . . . . . . . . . . . . . . . 111

6.10 Analog decoder circuit for H. . . . . . . . . . . . . . . . . . . . . . . . . 112

7.1 Mode-switching XOR gate. . . . . . . . . . . . . . . . . . . . . . . . . . . 115

7.2 Codec circuit for H. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

7.3 Information variable node structure for x9 . . . . . . . . . . . . . . . . . 117

7.4 Parity variable node structure for x1 . . . . . . . . . . . . . . . . . . . . 118

7.5 Shift register clock phases. . . . . . . . . . . . . . . . . . . . . . . . . . . 120

7.6 Decoder output with bits x7 and x11 corrected. . . . . . . . . . . . . . . . 122

7.7 Encoder parity bit outputs. . . . . . . . . . . . . . . . . . . . . . . . . . 123

7.8 Encoder output for parity bit x1. . . . . . . . . . . . . . . . . . . . . . . 123


List of Tables

4.1 Reversible LDPC codes. . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.2 Randomly constructed benchmark LDPC codes. . . . . . . . . . . . . . . 62

4.3 Partial weight distribution of Rev64, for w (xu) ≤ 5. . . . . . . . . . . . . 74

4.4 Partial weight distribution of Rev512, for w (xu) ≤ 3 (first 14 entries only). 75

4.5 Bounds on dmin for the reversible codes. . . . . . . . . . . . . . . . . . . . 75

4.6 Stopping set configurations. . . . . . . . . . . . . . . . . . . . . . . . . . 77

5.1 Comparison of high rate code performance and complexity. . . . . . . . . 92

7.1 Mode-switching gate settings. . . . . . . . . . . . . . . . . . . . . . . . . 114

7.2 Transistor overhead for encoding. . . . . . . . . . . . . . . . . . . . . . . 120

7.3 Codec core simulation parameters. . . . . . . . . . . . . . . . . . . . . . 121

7.4 Example iterative solution. . . . . . . . . . . . . . . . . . . . . . . . . . . 121

7.5 Codec core specification. . . . . . . . . . . . . . . . . . . . . . . . . . . . 122


List of Algorithms

2.1 Sum-Product Decoder (Probability Domain) . . . . . . . . . . . . . . . . . 23

2.2 Sum-Product Decoder (Log-Likelihood Domain) . . . . . . . . . . . . . . . 24

2.3 Erasure Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.1 Message-Passing Jacobi Method Over F2 . . . . . . . . . . . . . . . . . . . 51

3.2 Construction of Hp for a Type-I Reversible Code . . . . . . . . . . . . . . 57

5.1 Construction of Hp for a Type-II Reversible Code . . . . . . . . . . . . . . 93


Glossary

The following acronyms are used in this thesis.

A/D analog to digital page 101

ARQ automatic-repeat-request page 130

AWGN additive white Gaussian noise page 6

BCJR Bahl-Cocke-Jelinek-Raviv page 101

BEC binary erasure channel page 7

BER bit error rate page 4

BiCMOS bipolar complementary metal-oxide-silicon page 105

BIST built-in self-test page 103

BPSK binary phase shift keyed page 5

BSG bi-directional soft-logic gate page 20

BSGC bi-directional soft-logic gate count page 89

CD compact disc page 2

CMOS complementary metal-oxide-silicon page 102

CRC cyclic redundancy check page 3

DVD digital versatile disc page 2

EG-LDPC Euclidean-geometry low-density parity-check page 90

EMD extrinsic message degree page 35

EXIT extrinsic information transfer page 43

FEC forward error correction page 4

FET field-effect transistor page 103

FFT fast Fourier transform page 18


LDPC low-density parity-check page 11

LLR log-likelihood ratio page 5

MAP maximum a posteriori page 4

MIFG multiple-input floating-gate page 102

ML maximum-likelihood page 4

MOSFET metal-oxide-silicon field-effect transistor page 103

MSG mode-switching gate page 115

RS Reed-Solomon page 46

RTL resistor-transistor-logic page 115

SG soft-logic gate page 19

SiGe silicon-germanium page 104

SNR signal to noise ratio page 4

SOC system-on-a-chip page 101

SOR successive over relaxation page 102

TSMC Taiwan Semiconductor Manufacturing Company page 122

T-SPICE Simulation program with integrated circuit emphasis page 122

VLSI very large scale integration page 99

WER word error rate page 62

XOR exclusive-or page 19


Notation and Symbols

This thesis employs the following notation and symbol set.

Notation relating to coding theory

X Matrix notation page 13

[X | Y] Matrix concatenation page 13

|X| Determinant page 83

X−1 Matrix inverse page 29

X ⊗ Y Matrix Kronecker product page 93

X⊤ Matrix transpose page 13

X Set notation page 16

X\Y Set subtraction page 20

|X | Order of set page 36

x Vector notation page 3

[x | y] Vector concatenation page 13

x⊤ Vector transpose page 13

|·| Absolute value page 36

A Adjacency matrix page 34

b Intermediate encoding vector page 27

c Check node in factor graph page 14

C Code page 12

x∗ Complex conjugate of x page 84

dmin Minimum distance page 13

Eb Bit energy page 6


Eb/N0 Signal to noise ratio page 6

ǫ Channel erasure probability page 7

Es Symbol energy page 5

η Decoder clipping parameter page 24

⌊·⌋ Floor function page 13

F2 The binary Galois field page 3

G Factor graph page 39

Γ (s) Set of nodes adjacent to node s on the factor graph page 20

G Factor graph cover page 39

G Generator matrix page 13

Gsys Systematic form of generator matrix page 13

H Parity-check matrix page 13

Hsys Systematic form of parity-check matrix page 13

Hp Section of parity-check matrix relating to parity bits page 27

Hu Section of parity-check matrix relating to information bits page 27

h(x) First row polynomial of Hp for type-I reversible codes page 51

i Row weight of parity-check matrix page 15

[S] Iverson’s convention indicating truth of statement S page 17

j Column weight of parity-check matrix page 15

k Code dimension page 12

κ Jacobi encoder iterations page 49

λ Log-likelihood ratio page 5

m Number of rows in parity-check matrix page 13

M(·) Map operator page 5

m Degree of graph cover page 39

µδ Normalised spectral gap of graph page 36

µ Eigenvalue page 36

µs→r Message passed from node s to node r page 19


n Code length page 12

N0 Noise spectral density page 6

ω Pseudo-codeword page 40

pX(X = x) Probability distribution/mass for random variable X at value x page 5

pX(x) Abbreviated form of pX(X = x) page 5

r Rate of information transfer (code rate) page 13

R The real numbers page 34

sgn (·) Sign operator page 24

σ2 Noise variance page 6

u Information vector page 3

u Decoded estimate of information vector page 3

v Variable node in factor graph page 14

w (x) Weight of vector x page 13

wp (ω) Pseudo-weight of pseudo-codeword ω page 41

wminp Minimum pseudo-weight page 41

x Codeword vector page 3

xp Parity section of codeword vector page 12

xu Information section of codeword vector page 12

x Decoded estimate of codeword page 4

y Received vector page 3

Z The integers page 43

ζ Graph expansion factor page 36

z(x) Syndrome of vector x page 36


Notation relating to electronics

Gnd Ground page 103

ID Transistor drain current page 104

Iu Unit current page 103

I0 Specific current page 104

MUX Multiplexer page 117

NDIFF Differential pair (n-type) page 119

NORM Normalisation circuit page 119

Vdd Supply voltage page 103

Vdiff Differential reference voltage page 104

Vds Transistor drain-to-source voltage page 104

Vgs Transistor gate-to-source voltage page 104

VrefN Reference voltage (n-type) page 105

VrefP Reference voltage (p-type) page 105

VT Thermal voltage page 104

Vth Transistor threshold voltage page 104


Summary

Recent designs for low-density parity-check (LDPC) codes have exhibited capacity-approaching performance for large block lengths, overtaking the performance of turbo codes. While theoretically impressive, LDPC codes present some challenges for practical implementation. In general, LDPC codes have higher encoding complexity than turbo codes, both in terms of computational latency and architecture size. Decoder circuits for LDPC codes have a high routing complexity and thus demand large amounts of circuit area.

There has been recent interest in developing analog circuit architectures suitable for decoding. These circuits offer a fast, low-power alternative to the digital approach. Analog decoders also have the potential to be significantly smaller than digital decoders.

In this thesis we present a novel and efficient approach to LDPC encoder/decoder (codec) design. We propose a new algorithm which allows the parallel decoder architecture to be reused for iterative encoding. We present a new class of LDPC codes which are iteratively encodable, exhibit good empirical performance, and provide a flexible choice of code length and rate.

Combining the analog decoding approach with this new encoding technique, we design a novel time-multiplexed LDPC codec, which switches between analog decode and digital encode modes. In order to achieve this behaviour from a single circuit we have developed mode-switching gates. These logic gates are able to switch between analog (soft) and digital (hard) computation, and represent a fundamental circuit design contribution. Mode-switching gates may also be applied to built-in self-test circuits for analog decoders. Only a small overhead in circuit area is required to transform the analog decoder into a full codec. The encode operation can be performed two orders of magnitude faster than the decode operation, making the circuit suitable for full-duplex applications. Throughput of the codec scales linearly with block size, for both encode and decode operations. The low power and small area requirements of the circuit make it an attractive option for small portable devices.
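The iterative encoding approach summarised above, developed in Chapter 3 as encoding via iterative matrix inversion and the message-passing Jacobi method over F2, can be sketched as follows. This is an illustration only: the function name, plain-list matrix layout, and convergence test are hypothetical, and the thesis realises the idea as message passing on the decoder's factor graph rather than as explicit matrix arithmetic. The sketch assumes the parity section Hp of the parity-check matrix has a unit diagonal, so that every parity bit can be refreshed in parallel from the current estimates of the others.

```python
def jacobi_encode_gf2(Hp, Hu, u, max_iters=100):
    """Iteratively solve Hp . xp = Hu . u (mod 2) for the parity bits xp.

    A minimal sketch of Jacobi iteration over F2. Hp and Hu are 0/1
    matrices given as lists of rows; u is the information bit vector.
    Assumes Hp has a unit diagonal (iteratively encodable structure).
    """
    m = len(Hp)
    # Right-hand side: b = Hu . u (mod 2)
    b = [sum(Hu[i][j] * u[j] for j in range(len(u))) % 2 for i in range(m)]
    xp = [0] * m
    for _ in range(max_iters):
        # Parallel Jacobi update: xp_i <- b_i + sum_{j != i} Hp[i][j] xp_j (mod 2).
        # Over F2 subtraction equals addition, so the unit diagonal moves
        # xp_i to the right-hand side with no sign change.
        new_xp = [(b[i] + sum(Hp[i][j] * xp[j] for j in range(m) if j != i)) % 2
                  for i in range(m)]
        if new_xp == xp:  # fixed point reached: Hp . xp = b is satisfied
            return xp
        xp = new_xp
    return xp
```

For a toy upper-triangular Hp (whose off-diagonal part is nilpotent, so the iteration is guaranteed to converge), the returned xp satisfies every parity-check equation Hp · xp + Hu · u = 0 (mod 2).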


Publications

• D. Haley, A. Grant, and J. Buetefuer, “Iterative encoding of low-density parity-check codes,” in Proc. 3rd Australian Commun. Theory Workshop, Canberra, Australia, Feb. 2002, pp. 15–17.

• D. Haley, A. Grant, and J. Buetefuer, “Iterative encoding of low-density parity-check codes,” in Proc. GLOBECOM 2002, Taipei, Taiwan, Nov. 2002, vol. 2, pp. 1289–1293.

• D. Haley, C. Winstead, A. Grant, and C. Schlegel, “Architectures for error control in analog subthreshold CMOS,” in Proc. 4th Australian Commun. Theory Workshop, Melbourne, Australia, Feb. 2003, pp. 75–80.

• D. Haley, C. Winstead, A. Grant, and C. Schlegel, “An analog LDPC codec core,” in Proc. Int. Symp. on Turbo Codes, Brest, France, Sept. 2003, pp. 391–394.

• D. Haley, C. Winstead, A. Grant, V. Gaudet, and C. Schlegel, “Reusing analog decoders for encoding,” in 2nd Analog Decoder Workshop, Zurich, Switzerland, Sept. 2003, available at http://www.isi.ee.ethz.ch/adw/slides/haley.pdf.

• D. Haley and A. Grant, “High rate reversible LDPC codes,” in Proc. 5th Australian Commun. Theory Workshop, Newcastle, Australia, Feb. 2004, pp. 114–117.

• D. Haley, C. Winstead, A. Grant, V. Gaudet, and C. Schlegel, “Robust analog decoder design with mode-switching cells,” in 3rd Analog Decoder Workshop, Banff, Canada, June 2004, available at http://www.analogdecoding.org/docs/Haley ADW04.pdf.


Declaration

I declare that this thesis does not incorporate without acknowledgment any material previously submitted for a degree or diploma in any university, and that to the best of my knowledge it does not contain any materials previously published or written by another person except where due reference is made in the text.

David Haley


Acknowledgments

I first thank my principal supervisor, Prof. Alex Grant. Thank you, Alex, for giving me the chance to learn the value of research, analysis and the language of mathematics. Thanks for being one of the hardest-working fundraisers in the field, and for giving me the chance to visit three other continents on multiple occasions. Finally, thanks for all of the good times and for the bass riffs that you’ve hammered out in our band. Rock on, Alex.

I also acknowledge the support of my associate supervisor, Dr. Paul Alexander, and the financial assistance of Southern Poro Communications.

I thank Prof. Bill Cowley for approaching me to start a doctorate, pointing me in the direction of LDPC codes, and for introducing me to Alex and Paul. My research has been primarily supported by the Australian Government under ARC SPIRT C00002232. I also thank the Institute for Telecommunications Research for their financial support.

I thank the South Australian Section of the IEEE for providing the financial assistance that allowed me to attend Globecom 2002 in Taiwan.

It has been a pleasure to collaborate with Dr. John Buetefuer during my candidature. Thank you, John, for some very useful discussions about LDPC codes, and for some motivational advice during my time writing up.

The next time you enjoy a concert, after applauding the band, go and thank the person behind the sound mixing desk. We usually take this person for granted. The same can often be said about good network administrators. I wish to sincerely thank Bill Cooper for maintaining a reliable network at ITR. In particular I would like to acknowledge the help that he has provided with my perpetually problematic laptop. Cheers, Governor.

I thank all of the other friendly ITR staff with whom I have had the chance to work, in either a research or development role.

I cannot thank Prof. Christian Schlegel enough for hosting me at the University of Utah in 2002, and again last year at the University of Alberta. Thank you for your generous support and supervision. I also thank Professors Chris Myers and Reid Harrison for their supervision during my stay in Utah.

I thank Prof. Vincent Gaudet and the rest of the HCDC lab in Alberta for their support and for being such a great group of people. Thanks, Charmaine, for being so good at organising everything. Thank you Vince, Tony, Anne, Sheryl and Walt for the great times in Edmonton and France last year. I look forward to collaborating with you again in the future. I also thank the Canadian Microelectronics Corporation for providing the device models that were used to simulate the circuits described in this thesis.

The majority of the knowledge that I have about analog decoding I have learned from Chris Winstead. His thorough understanding of this field (and many others) comes without any hint of arrogance. I consider myself privileged to have had the opportunity to collaborate with him, and look forward to future collaborations. Thank you, Chris, for the fun times in North America, Europe and down here in Australia. Thanks also for proofreading sections of this thesis. As we both now come to the conclusion of our postgraduate roles, “The Aussie” wishes you and Erin all of the success and happiness that you so richly deserve.

I thank Dr. Pascal Vontobel for his insightful and valuable suggestions regarding the finite graph-cover analysis presented in this thesis. Thank you, Pascal, for being so friendly and helpful. Thank you also for proofreading sections of this thesis.

I thank my friends for putting up with my sleep-deprived, preoccupied, and rare company over the last six months.

I wish to thank my parents for never getting mad at me when I pulled things apart to find out what made them work, even in the cases when they didn’t quite make it back together again. Your emphasis on my efforts, regardless of the results, founded the drive that I have today. From the first time I connected a light bulb to a battery, through my teenage interests in chemistry¹, and now to the proofreading of this thesis. Your support and love have always been there and will always be returned.

I come now to the hardest part of all. The part where I must try to sum up, in mere words, all of the appreciation that I have for my wife, her ever-constant support, encouragement and love. I simply haven’t worked out how to do it. It is a much harder task than any problem addressed by this thesis. Jane, like everything that I have achieved since we met, and may achieve in the future, this book is as much yours as it is mine. It simply wouldn’t exist without you, and I don’t think that I could either. I love you.

¹Sorry about blowing up the garden.


Chapter 1

Introduction

In recent times mobile (cell) phones have become smaller, more reliable, and more afford-

able. Modern communication systems are now able to offer a wide range of features, and

this has changed the way many of us live our daily lives. For example, it is now common

for friends to communicate by sending simple text messages, or even high-resolution

photographs. As a result, many school teachers have been forced to ban the use of mobile

phones in the classroom. With this in mind, we now consider the following example in

which two students wish to communicate, without the use of their mobile phone.

Graeme wishes to send a message to his friend Tim, who is on the other side of the

room. Their teacher does not appreciate such interruptions, so Graeme carefully chooses

a short message. They do not want the other classmates to know what they are talking

about. Therefore, they use a predetermined method to jumble the ordering of words in

the message, so that it no longer makes sense. Graeme tells the person sitting next to

him the message, and then they pass it on, until it reaches Tim. A boy who is known to

cause mischief in such situations, Bill, is sitting in the same row. They expect that Bill

may try to disrupt the communication by switching one of the words in the message for

something completely different. However, the friends have previously agreed on a way

to handle Bill. Once the short, jumbled message has been prepared, Graeme tells it to

the person sitting next to him. In order to combat the presence of Bill, he repeats each

word. As Tim receives the message he checks that each word appears twice. If not then

he knows that Bill has altered the message, and he shakes his head in view of Graeme.

Otherwise, he waves to Graeme to signal that the message was received intact.

The above example demonstrates some simple techniques that assist in the secure

and reliable communication of information. However, weaknesses in Graeme and Tim’s

approach are easily identified. For example, if Bill modifies a word then Tim is able to


detect that an error has been introduced but he cannot correct it. He signals to Graeme,

and the entire message must be resent. Moreover, if Bill modifies two words then the

error may pass undetected. By sending each word three times they could combat this.

However, such an approach is hardly efficient, and they run a higher risk of being caught

by the teacher.

Outside of the classroom, if Graeme needs to send Tim some information he may do

so using a text message from his mobile phone. The challenges of providing fast, secure

and reliable communication are no longer his concern but rather that of the communi-

cation systems engineer. In this thesis we focus upon the reliable transfer and storage

of data through the use of error control codes. These codes appear in many applications

that we take for granted in our daily lives. In addition to increasing the reliability of

modern communication networks, error control codes are also used in digital data stor-

age systems, e.g. compact disc (CD) and digital versatile disc (DVD) players, and hard

disk drive systems. In this chapter we introduce the general model of a modern commu-

nication system, outline the properties of a good error control scheme, and provide an

overview of this thesis.

1.1 The Digital Communication System

The basic model that we use to describe the components of a digital communication

system is shown in Figure 1.1. A textbook introduction to this model can be found in [1].

This model shows only one direction of communication. It is common for communication

to occur in both directions, on either a full-duplex (simultaneous) or half-duplex (time

multiplexed) basis. Hence we consider each component and its inverse operation as a

functional pair. In this dissertation we focus specifically upon the channel encoder and

decoder, or codec.

The overall aim of the system is to ensure that the data at the sink matches that at

the source. The source and sink may be in different physical locations, e.g. a satellite

and ground station, or the same location at different times, e.g. a hard disk drive. We

now describe the functional components that are used to meet this aim in an efficient,

secure and reliable manner.


[Figure 1.1 shows a block diagram. The transmit path runs Data Source → Source Encoder → Encryption → Channel Encoder → Transmitter/Modulator → Noisy Channel; the receive path runs Receiver/Demodulator → Channel Decoder → Decryption → Source Decoder → Data Sink. The channel encoder and decoder together form the codec, and the modulator and demodulator form the modem. The information vector u enters the channel encoder, the codeword x is transmitted, and the vector y is received.]

Figure 1.1: Digital communication system model.

1.1.1 Source Coding

The source encoder is responsible for compressing the source data by removing uncon-

trolled redundancy. The decoder at the sink then applies the inverse algorithm to regen-

erate the original message. Data compression is achieved by making the distribution of

the source information as uniform as possible.

1.1.2 Encryption

In order to provide a secure transmission we may apply an encryption algorithm at the

source, and then decrypt upon reception. The aim of this stage is to ensure that the

transmitted message can be interpreted only by the intended recipient. In this thesis we

do not focus upon the components discussed to this point. Hence we consider the data

source, source encoder and encryption components to be lumped together and label this

larger component the source. We consider this source to present an information vector,

u, to the following stage of the system. Moreover, in this dissertation we focus upon the

case when u takes values from the binary Galois field, F2.

1.1.3 Channel Coding

It is the task of the channel coding scheme to provide detection and correction of errors

introduced during transmission.

Error detection schemes, such as the cyclic redundancy check (CRC) [1], inform the

receiver that an error has occurred during transmission. This is done by generating a


checksum value based on the data to be transmitted. The checksum is appended to the

message. The receiver then uses the received data to generate its own checksum according

to the same algorithm as the sender. If this checksum does not match the one received

from the sender then an error is declared. The CRC adds very little overhead in terms

of both transmission size and processing.
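The append-and-recompute flow just described can be sketched in a few lines. This is a minimal illustration of a CRC-style check via polynomial long division over F2, not the CRC variant of any particular standard; the generator polynomial x^3 + x + 1 and the message bits are arbitrary examples.

```python
def f2_poly_mod(dividend, divisor):
    """Remainder of polynomial long division over F2 (bit lists, MSB first)."""
    rem = list(dividend)
    for i in range(len(rem) - len(divisor) + 1):
        if rem[i]:  # leading term is set: subtract (XOR) the divisor here
            for j, d in enumerate(divisor):
                rem[i + j] ^= d
    return rem[len(rem) - (len(divisor) - 1):]

def crc_append(data, poly):
    """Sender: append the checksum (remainder) to the data bits."""
    return data + f2_poly_mod(data + [0] * (len(poly) - 1), poly)

def crc_ok(frame, poly):
    """Receiver: recompute; a zero remainder means no error was detected."""
    return not any(f2_poly_mod(frame, poly))

poly = [1, 0, 1, 1]                 # x^3 + x + 1 (example generator)
frame = crc_append([1, 0, 1, 1, 0, 1], poly)
assert crc_ok(frame, poly)          # intact frame passes the check
frame[2] ^= 1                       # a single flipped bit...
assert not crc_ok(frame, poly)      # ...is always detected by this generator
```

Because the generator has more than one term, every single-bit error leaves a nonzero remainder, matching the detection guarantee discussed above.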

A forward error correction (FEC) scheme allows us not only to detect corrupted data

but to repair it. In contrast to error detection algorithms, which require the corrupted

message to be resent, FEC schemes protect the data prior to transmission. Increased

reliability is provided by controlled redundancy that is appended to the message. We

denote the ratio of the source information to the total size of the transmission, inclusive

of redundancy, as the rate of information transfer.

The encoder maps the information vector, u, onto a codeword vector, x. In this thesis

we consider binary codes, such that the elements of x come from F2. The codeword is

transmitted across the channel, resulting in the received vector y. The alphabet used

for the elements of y is dependent upon the channel and demodulator. Considering the

received vector and the set of all valid codewords, the decoder then produces an estimate,

x̂, of the transmitted codeword x, with elements coming from F2. From this estimate, we

obtain the estimate, û, of the transmitted information u.

The maximum-likelihood (ML) decoder selects the estimate x̂ that maximises the

probability p(y|x̂). The maximum a posteriori (MAP) rule selects x̂ such that p(x̂|y) is

maximised. Recall that it is the goal of the source encoder to make the distribution of

the source information as uniform as possible. In the case of a uniform distribution of

source information, the ML and MAP decision rules are identical.
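A small numerical sketch of this equivalence, with an invented two-codeword set, channel likelihoods, and priors chosen purely for illustration:

```python
# Toy decoder comparison: two candidate codewords with invented channel
# likelihoods p(y|x) for one observed y. By Bayes' rule p(x|y) is
# proportional to p(y|x)p(x), so under a uniform prior ML and MAP agree.
likelihood = {"00": 0.02, "11": 0.07}   # invented p(y|x) values

def ml():
    return max(likelihood, key=likelihood.get)

def map_rule(prior):
    # the normalising factor p(y) does not affect the argmax
    return max(likelihood, key=lambda x: likelihood[x] * prior[x])

uniform = {"00": 0.5, "11": 0.5}
skewed  = {"00": 0.9, "11": 0.1}
assert ml() == map_rule(uniform) == "11"   # identical under a uniform prior
assert map_rule(skewed) == "00"            # a skewed prior can change MAP
```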

We distinguish between two possible types of error event. If the decoder arrives at

an estimate x̂ which is not a valid codeword, then we know that an error has occurred,

without having explicit knowledge of the transmitted codeword. Hence we label this

a detected error event. However, we cannot make such a deduction when the decoder

estimate represents a different valid codeword to x. We therefore label this an undetected

error event.

The performance of an error control code is measured in terms of the probability of

error for transmission at a given signal to noise ratio (SNR). We define the probability

of bit error, or bit error rate (BER), as the ratio of the number of erroneous bits at the

sink to the total number of bits transmitted.

Decoder implementation can often be simplified if we map probabilities into the log


domain [2]. Multiplication operations in the probability domain map to addition oper-

ations in the log domain, thus reducing digital implementation complexity. A common

representation is the log-likelihood ratio (LLR), for which we have the following definition.

Definition 1.1 (Log-Likelihood Ratio). Consider the binary random variable X with

probability mass function1 pX(x). The log-likelihood ratio for X is defined as

λ = log_e ( pX(0) / pX(1) ).
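As a quick illustration (probability values invented), the sign of λ carries the hard decision, its magnitude the reliability, and products of likelihoods become sums of LLRs:

```python
import math

def llr(p0):
    """Log-likelihood ratio of a binary variable with P(X = 0) = p0."""
    return math.log(p0 / (1.0 - p0))

assert llr(0.5) == 0.0                    # no information: lambda = 0
assert llr(0.9) > 0 > llr(0.1)            # sign encodes the hard decision
assert abs(llr(0.9) + llr(0.1)) < 1e-12   # symmetric magnitudes
# Multiplying independent likelihood ratios maps to adding LLRs:
assert abs(llr(0.9) + llr(0.8) - math.log((0.9 * 0.8) / (0.1 * 0.2))) < 1e-12
```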

1.1.4 Modulation

We modulate elements from the codeword, also called codeword symbols, in order to

transmit them across a physical channel. Many modulation schemes are available for use

in a modern communication system, each presenting a tradeoff involving parameters such

as bandwidth requirement and power consumption. In this thesis we consider only the

binary phase shift keyed (BPSK) approach2 [1, 3]. Using BPSK modulation the binary

elements x from x each undergo an antipodal mapping, M(x), onto a physical signal as

follows.

M(0) = s0(t) = √(2Es/Ts) cos(2πfct),      0 ≤ t ≤ Ts

M(1) = s1(t) = √(2Es/Ts) cos(2πfct + π),  0 ≤ t ≤ Ts

Here the signal s1(t), representing the case x = 1, is π radians out of phase with the

signal s0(t), which represents x = 0. The spectrum of the transmitted signal is centred

about the frequency fc. Each symbol is transmitted over a period of time Ts, with symbol

energy Es. One bit, of either information or redundancy, is transmitted during this

period. The received signal is then demodulated, in order to reverse the BPSK mapping.

The output of the demodulator is a real number, or member of a discrete alphabet,

representing the transmitted bit value.
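The two waveforms can be checked numerically. This sketch samples s0(t) and s1(t) with arbitrary example parameters (Es = Ts = 1, fc chosen so that fc·Ts is an integer) and verifies the antipodal relationship and the symbol energy:

```python
import math

Es, Ts, fc, N = 1.0, 1.0, 4.0, 10000     # example parameters, fc*Ts integer
dt = Ts / N
t = [k * dt for k in range(N)]
s0 = [math.sqrt(2 * Es / Ts) * math.cos(2 * math.pi * fc * tk) for tk in t]
s1 = [math.sqrt(2 * Es / Ts) * math.cos(2 * math.pi * fc * tk + math.pi)
      for tk in t]

# s1 is the antipodal (pi phase-shifted) version of s0 ...
assert all(abs(a + b) < 1e-9 for a, b in zip(s0, s1))
# ... and each waveform carries symbol energy Es over one symbol period
energy = sum(a * a for a in s0) * dt
assert abs(energy - Es) < 1e-3
```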

1.1.5 The Channel

The channel is the medium over which we transmit. The real world channel may atten-

uate, amplify, distort and/or contaminate the transmission with noise. There are many

1Here the probability mass function pX(x) is an abbreviated form of p(X = x), representing the probability of the random variable X taking the particular value x.

2Note however that many of the codes that are studied in this thesis may be used in conjunction with more general modulation schemes.


causes of noise in the modern communication environment. Hence the topic of channel

modelling alone represents an entire field of research.

In the late 1940s Claude Shannon introduced the concept of channel capacity [4]. He

showed that error control codes exist which allow transmission across the channel with

arbitrarily low probability of error, provided that the rate of information transfer is less

than the capacity. This limit, which we now term the Shannon bound, tells us that such codes

exist; however, it does not indicate how they are to be constructed. Shannon’s discovery

presents a challenge to communication systems engineers, to develop codes and decoding

algorithms that can meet its predictions.

For simplicity of analysis, in this dissertation we consider only the channel models

that follow. We assume that these channels are memoryless, in the sense that there is no

correlation between the noise applied to individual symbols.

The Additive White Gaussian Noise Channel

We assume a BPSK transmission and consider the discrete time additive white Gaussian

noise (AWGN) channel [1, 3] with binary input and real output. In the discrete time

domain we consider each binary bit x from x to be represented by a value x ∈ {±√Es},

where +√Es and −√Es correspond to x = 0 and x = 1 respectively. The received value

corresponding to the bit is y = x + ρ. Here the random variable ρ represents normally

distributed noise with zero mean and variance σ². The noise source is assumed to have

a single-sided spectral density, N0 = 2σ², which is independent of frequency. From the

Gaussian probability density function, we have

p(y|x) = (1/√(2πσ²)) e^(−(y−x)²/2σ²).

For the purposes of empirical simulation we set Es = 1 and adjust the signal to

noise ratio by altering the noise variance. In order to compare two error control systems

that operate at a different rate of information transfer, we adjust the symbol energy

according to the rate r, to get the bit energy Eb = Es/r. The signal to noise ratio

is then calculated as the (dimensionless) ratio of received bit energy to noise spectral

density, Eb/N0 = Es/(2σ²r). It is common to report the SNR in decibels, according to

Eb/N0 (dB) = 10 log10(Eb/N0).
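The simulation setup just described can be sketched as follows. This is a hypothetical uncoded (r = 1) BPSK link with Es = 1, where the noise variance is derived from the target Eb/N0, and the empirical BER is obtained by hard decisions on the sign of y:

```python
import math
import random

def awgn_ber(ebno_db, r=1.0, nbits=100_000, seed=1):
    """Empirical BER of hard-decision BPSK over AWGN with Es = 1 and rate r."""
    ebno = 10 ** (ebno_db / 10)              # Eb/N0 from decibels
    sigma = math.sqrt(1 / (2 * r * ebno))    # from Eb/N0 = Es/(2*sigma^2*r)
    rng = random.Random(seed)
    errors = 0
    for _ in range(nbits):
        x = rng.randint(0, 1)
        tx = 1.0 if x == 0 else -1.0         # antipodal mapping, +/- sqrt(Es)
        y = tx + rng.gauss(0.0, sigma)       # channel adds N(0, sigma^2) noise
        errors += (y < 0) != (x == 1)        # hard decision on the sign of y
    return errors / nbits

# BER falls as the SNR rises (theory: Q(sqrt(2*Eb/N0)) for uncoded BPSK)
assert awgn_ber(0.0) > awgn_ber(4.0) > awgn_ber(8.0)
```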


The Binary Erasure Channel

The second type of channel that we consider in this dissertation is the binary erasure

channel (BEC). This channel has input alphabet {−1, +1} and output alphabet {−1, 0, +1}. We again assume a BPSK transmission, where bit x from x is mapped according to

M(0) = +1, M(1) = −1 and presented at the channel input. The channel either leaves

x unaltered, or marks it as an erasure, to which we assign the value zero. We denote

the probability that x is erased by the channel as ǫ. This channel erasure probability is

assumed to be symmetric, i.e. independent of the transmitted value of x.
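A sketch of the BEC as just defined (the erasure probability here is an arbitrary example value). Note that this channel never flips a symbol, so any zero at the output is unambiguously an erasure:

```python
import random

def bec(x, eps, rng):
    """Binary erasure channel: pass +/-1 through, or erase to 0 with prob eps."""
    return 0 if rng.random() < eps else x

rng = random.Random(7)
eps = 0.3
tx = [1 if rng.random() < 0.5 else -1 for _ in range(100_000)]
rx = [bec(x, eps, rng) for x in tx]

# Every non-erased output symbol matches its input, independent of its sign
assert all(y == 0 or y == x for x, y in zip(tx, rx))
erasure_rate = rx.count(0) / len(rx)
assert abs(erasure_rate - eps) < 0.01   # empirical rate close to eps
```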

1.2 Codec Design Criteria

We now list some design criteria to be considered by the communication systems engineer,

when developing an error control codec.

Good Error Correcting Performance We aim to have the code provide a strong

error correcting capability. It is common to simulate operation of the error control

system and obtain empirical BER performance results, prior to implementation.

Flexibility of Code Design The code design approach should allow flexible choice of

parameters such as code length, i.e. the length of the codeword vector, and code

rate, i.e. the ratio of the length of the information vector to the code length.

Simplicity of Design It is desirable for the code to be easy to construct and repre-

sentable in a concise form.

High Speed The computational latency for both the decode and encode operations

should be low.

Low Power Consumption It is desirable that the codec is power efficient, especially

if it is to form part of a battery powered device.

Small Area It is desirable that a circuit implementation of the codec occupy only a

small area of silicon, especially if it is to form part of a portable device.

Ease of Verification The design and implementation should be easy to test.

The importance of each of the above parameters in the overall design will depend

upon the application. For example, small implementation size is more significant for


mobile handsets than for base stations. Throughput is a major motivation for many

applications, such as data storage and broadband communication. For some applications,

such as satellite transmission and portable devices, low power consumption is important

as it allows longer battery life and/or a lighter battery.

Biomedical applications, such as implantable devices, call for very small and low power

circuits. The designer may choose to trade off one parameter against another, and the decisions

made will be motivated by the demands of the application.

1.3 Overview of Thesis Structure

This thesis may be considered to contain two components. In Chapters 2 through to 5 we

discuss coding theory aspects of the work, developing a new approach to codec design and

a new class of low-density parity-check codes. We then investigate practical approaches

to codec circuit implementation in Chapters 6 and 7. The remainder of this dissertation

is organised as follows.

Chapter 2 provides a review of the current state of the literature pertaining to error

control using low-density parity-check codes. We provide an overview of approaches to

code construction, graphical representations, and analytical tools. Existing methods for

encoding and decoding LDPC codes are also presented.

In Chapter 3 we explore new techniques for iterative encoding of LDPC codes which

allow the decoder architecture to be reused for encoding. A novel encoding algorithm is

presented which is based upon the Jacobi method for iterative matrix inversion over F2.

We define a convergence criterion for the algorithm and show how it can be viewed in

message-passing terms. We label any LDPC code that is encodable using the Jacobi

encoder as a reversible LDPC code, and show that any code with a triangular parity-

check matrix is reversible. An algorithm is presented for the algebraic construction of

4-cycle free reversible LDPC codes using circulant matrices. We label these codes type-I

reversible LDPC codes. The construction algorithm provides some flexibility in the choice

of code length and rate.

In Chapter 4 we present the empirical performance and a thorough analysis of some

type-I reversible codes. We investigate the performance of (3,6)-regular codes on both

the AWGN and binary erasure channels. We characterise and explain the empirical

observations, using current analytical tools. A weakness in the graphical structure of

these (3,6)-regular codes is identified, which leads to an error floor effect as we increase


code length.

In Chapter 5 we show that type-I reversible codes may be constructed which offer

good performance for high rate applications. We use results from the analysis in Chapter 4

to provide a design metric for developing a new class of reversible LDPC codes which

offer improved performance. A recursive algorithm is presented for constructing type-II

reversible LDPC codes which are 4-cycle free and are iteratively encodable using eight

iterations of the Jacobi encoder. This algorithm also provides greater flexibility for the

choice of code length and rate than the type-I approach. We demonstrate that (3,6)-

regular type-II reversible codes may be constructed that offer good empirical performance.

Since it was first proposed that analog circuits be used to build iterative decoders, interest

in this area has grown. Analog Decoder Workshops have been held on an

annual basis for the last three years, with an increased participation and an impressive

progression of results being presented each year. In Chapter 6 we review the current

state of the literature in this area, and highlight some of the potential advantages that

analog decoder implementation can offer in comparison to digital implementation.

In Chapter 7 we extend the analog decoder so that it can also be used to perform

encoding. A novel circuit architecture is presented for the core of a reversible LDPC codec.

We implement a time multiplexed architecture which switches between analog decode and

digital encode modes. The encode operation is two orders of magnitude faster than the

decode operation, and hence the circuit is well suited to use in full-duplex communication

systems. In order to achieve this we introduce a new type of logic gate circuit, namely

the mode-switching gate, which is able to switch between digital and analog operation.

Encoding is performed using the Jacobi encoder presented in Chapter 3, requiring only

a small amount of circuit overhead.

In Chapter 8 we summarise the work presented in this dissertation, and discuss the

proposed reversible LDPC codec architecture in terms of the codec design criteria listed

in this introductory chapter. We also present some suggestions for further work.

1.4 Contributions of This Thesis

The following list summarises the main contributions made by this thesis.

Chapter 3

• Theorem 3.1 shows that we can use the sum-product decoder to iteratively

encode certain types of low-density parity-check codes.


• Theorem 3.2 provides a convergence criterion for the Jacobi iterative matrix

inversion method for matrices which have elements from F2.

• A new method for iteratively encoding LDPC codes is presented in Algo-

rithm 3.1. The algorithm employs the Jacobi method for iterative matrix

inversion over F2 and allows reuse of the decoder architecture for encoding.

• Theorem 3.5 provides an algebraic method for testing the existence of 4-cycles

in the factor graph corresponding to a circulant matrix, given its first row

polynomial and size.

• Algorithm 3.2 provides a method for algebraically constructing reversible LDPC

codes using circulant matrices. These 4-cycle free codes are suitable for high

rate applications, and are encodable using the iterative Jacobi method over F2.

Chapter 5

• A recursive construction method is presented for reversible LDPC

codes which are 4-cycle free, and are encodable using eight iterations of the

iterative Jacobi method over F2. The algorithm is capable of, but not limited

to, generating (3,6)-regular codes.

Chapter 7

• In Section 7.2 a fundamental circuit design contribution is made, namely the

mode-switching gate. These logic gates are able to switch between analog (soft)

and digital (hard) computation. This contribution is the result of collabora-

tive work with the High Capacity Digital Communications Laboratory at the

University of Alberta.

• A novel circuit architecture for the core of a reversible LDPC codec is presented

in Chapter 7. This circuit results from collaborations with the Electrical Engi-

neering Department at the University of Utah, and the High Capacity Digital

Communications Laboratory at the University of Alberta.


Chapter 2

Low-Density Parity-Check Codes

2.1 Historical Summary

Just over a decade after Shannon’s founding work was published [4], Robert Gallager

introduced low-density parity-check (LDPC) codes [5, 6]. His approach would eventually

form the basis for a class of codes which perform extremely close to the Shannon bound.

Gallager’s codes, and iterative decoding algorithms, were however initially overlooked by

the coding community. That era was not one in which every researcher had the benefit

of a powerful computer at their fingertips, hence his work was not developed further.

Moreover, with the benefit of hindsight, Gallager’s approach challenged the paradigm of

coding at a time when the community was focussed upon minimum distance. It was not

until some three decades later, in the mid 1990s, that the true potential of low-density

parity-check codes was rediscovered.

A small number of researchers continued to work with Gallager’s codes during the

few decades after publication of his thesis, e.g. see [7, 8]. In the early 1980s Tanner

provided a graphical representation of LDPC and other coding schemes, now commonly

known as the Tanner graph [9]. He proposed that the decoding problem be approached by

factoring the codes into simpler component codes, and introduced what is now known as

the min-sum decoding algorithm. Tanner also foresaw the huge potential of parallelism,

allowing the simpler component codes to be processed simultaneously due to their low

complexity.

In the early 1990s, research in the area of channel coding became very popular due

to the introduction of turbo codes by Berrou et al. [10]. However, the initial develop-

ment of turbo codes was undertaken with very little connection being made to graphical

representations. The iterative algorithms used for decoding turbo codes have since been


linked to the principles of belief propagation described by Pearl [11], as have the algo-

rithms proposed by Gallager [12, 13]. Moreover, the structure of a turbo code is such

that it may be viewed as an LDPC code [14]. The attention drawn to iterative decoding

techniques through turbo coding made it almost inevitable that the work of Gallager

would eventually be revisited. In the mid 1990s LDPC codes were independently

rediscovered [15–17]. Shortly after, a surge of papers appeared in this area (see [18] and the

references therein).

Tanner’s work, and further developments from Wiberg et al. [15, 19], formed the

basis for the now commonly used factor graph representation of codes. These graphs can

be used to represent a wide range of algorithms and structures [20].

Several approaches to improving low-density parity-check codes have been proposed

since their rediscovery. Some of these codes, notably in the case of very long block sizes

(around 10^5 to 10^7 bits), are able to achieve performance within hundredths of a decibel

of the Shannon bound [21–23].

Most recently, there have been some developments in the area of analysing iterative

decoding on the graphs of finite length codes [24–28].

2.2 Linear Block Codes

Low-density parity-check codes fall into the class of linear block codes. We define the

following properties for binary linear block codes.

Source Information We partition a binary source sequence into row vectors of length

k prior to encoding. We label such a vector u.

Encoding The encoding process is undertaken on a block-wise basis, mapping u onto a

row vector codeword x. An (n, k) code has codeword length n.

Systematic Code A code in which the information vector appears as part of the code-

word is systematic. We label the information segment of the codeword xu = u, and

the length n − k appended parity bits xp.

Linearity Property The code C is a binary linear code if and only if C forms a vector

subspace over F2. Hence, the sum of any two codewords must itself be a codeword.

Code Dimension The dimension of the code is the dimension of its corresponding

vector space. A binary (n, k) code has dimension k, and thus has a total of 2^k


codewords, each of length n.

All-Zero Codeword A direct consequence of the linearity property is that the all-zero

codeword, x = 0, is a member of every linear code.

Code Rate The rate of the code is r = k/n. The code rate reflects the proportion of

information transferred per channel use.

Generator Matrix The linearity property implies the existence of a basis for the code.

We may construct a generator matrix for the code, denoted G, by using the k

independent basis codewords as row vectors for G. The generator matrix has

dimension k×n and can be used to encode the information vector. We may consider

the systematic form Gsys = [P|Ik], such that the codeword parity bits prefix the

information bits. Here [P|Ik] denotes the concatenation of the k×k identity matrix

to P. Encoding is then performed according to x = [xp|xu] = uG. The generator

matrix describes the structure of the code, however it is not uniquely defined for

the code.

Parity-Check Matrix Another means of describing the code structure is provided by

the parity-check matrix, denoted H. In general, H has dimension m × n, where

m = n − k. All codewords must satisfy the condition Hx⊤ = 0, i.e. the set of

all codewords span the nullspace of H. The parity-check matrix is not uniquely

defined for the code, and is related to G by HG⊤ = 0. The systematic form of H

may be obtained from G, and vice versa, using Hsys = [Im|P⊤].

Weight Distribution We define the weight of a vector x, w (x), as the number of

nonzero elements that it contains. The weight distribution of a code is a list of the

number of codewords at each weight, for all codeword weights.

Hamming Distance The Hamming distance between two codewords is the number of

positions in which they differ, i.e. the weight of their binary sum.

Minimum Distance The minimum distance, dmin, is the smallest Hamming distance

between any two codewords in the codeword set. As the all-zero codeword is a

member of every linear code, the minimum distance of a linear code is equivalent

to the weight of the lowest weight codeword. In general, a code with minimum

distance dmin can correct ⌊(dmin − 1)/2⌋ errors1.

1The floor function, ⌊x⌋, returns the largest integer less than or equal to x.


As the rows of the generator matrix are codewords, the lowest weight row represents

a simple upper bound on minimum distance. The following theorem relates dmin to the

structure of the code, through its parity-check matrix [29].

Theorem 2.1 (Massey). The minimum distance of a binary linear block code is equal

to the minimum nonzero number of columns in its parity-check matrix which sum to zero.

A general description of linear block codes is provided in [1]. Specific information

relating to LDPC codes is given in [30, 31].

2.3 Low-Density Parity-Check Codes

In general terms a low-density parity-check (LDPC) code is a linear block code which has

a sparsely populated parity-check matrix. In the remainder of this chapter, we will review

the development of LDPC codes, from the codes first presented by Gallager through to

recent capacity approaching structures.

As an example, we now consider the parity-check matrix H of a simple code (c.f. [20]).

H =
[ 1 0 1 0 1 0 1 0 ]
[ 1 0 0 1 0 1 0 1 ]
[ 0 1 1 0 0 1 1 0 ]
[ 0 1 0 1 1 0 0 1 ]    (2.1)

We assign each codeword bit to a variable, vs : s ∈ {1 . . . n}, where each variable also

corresponds to a column of H. Each row of H then represents a parity-check constraint

of the code, cr : r ∈ {1 . . . m}. The participation of variable vs in check constraint cr

is implied by a nonzero element at position (r, s) in H. Variable and check nodes can

also have values associated with them. We use the symbols vs and cr both to label these

nodes and to represent their value. We denote the number of nonzero elements in a row,

or column, as its weight.

Each row of H represents a parity-check constraint. Hence the full set of constraints,

assuming binary arithmetic, follows. If the values assigned to the set of variables represent

a valid codeword then cr = 0 for all r ∈ {1 . . . m}.

v1 + v3 + v5 + v7 = c1 (2.2)

v1 + v4 + v6 + v8 = c2 (2.3)

v2 + v3 + v6 + v7 = c3 (2.4)

v2 + v4 + v5 + v8 = c4 (2.5)
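The neighbour sets read off each row of H are exactly the variable indices appearing in (2.2)–(2.5); a minimal sketch:

```python
# Example parity-check matrix from (2.1).
H = [[1, 0, 1, 0, 1, 0, 1, 0],
     [1, 0, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 1, 0, 0, 1]]

# Variables participating in each check (1-based, matching v_1 ... v_8).
checks = [[s + 1 for s, h in enumerate(row) if h] for row in H]
assert checks == [[1, 3, 5, 7], [1, 4, 6, 8], [2, 3, 6, 7], [2, 4, 5, 8]]

def syndrome(x):
    """Evaluate c_r for r = 1 ... m; a valid codeword gives all zeros."""
    return [sum(x[s - 1] for s in members) % 2 for members in checks]

assert syndrome([0] * 8) == [0, 0, 0, 0]   # the all-zero codeword is valid
```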


Gallager defined a regular structure for low-density parity-check codes as follows.

Definition 2.1 (Regular LDPC Code). A regular LDPC code with parameters (n, j, i)

is a block length n code having a parity-check matrix with exactly j ones per column and i ones per row.

In contrast to the above, irregular LDPC codes permit variables to participate in

different numbers of checks, and allow check constraints to apply to different numbers

of variables. An irregular LDPC code can be described by the row and column vector

weight distributions of its parity check matrix. Techniques for designing row and column

weight distributions which provide significantly improved performance over regular codes

have been proposed [21–23, 32], and are discussed in Section 2.9.1.

We note that the example code in (2.1) has a regular (8, 2, 4) structure. It is also

common to label regular LDPC codes without specifying the block length, e.g. (2,4)-

regular. As we shall see in Section 2.5, it is the sparsity property of the parity-check

matrix which allows efficient decoding of these codes.

There are many ways to build a (j, i)-regular parity-check matrix. A common ap-

proach is to concatenate or superpose random permutation matrices, i.e. the identity

matrix with randomly permuted columns [5, 14]. In Figure 2.1 we use the notation

of [14], where a circled number represents that number of (non-overlapping) superposed

random permutation matrices. Figure 2.1(a) shows an example (3,6)-regular construc-

tion. Figure 2.1(b) shows how random permutation matrices may be concatenated to

build a (4,8)-regular code. Figure 2.1(c) shows how we may also view the superposed

construction as a random permutation of edge connections. A total of n left nodes of

degree j represent variables, and m right nodes of degree i represent check constraints.

Graphical code representations are discussed in more detail below.

If the rows of H are linearly independent then the rate of the code is r = (i − j)/i; otherwise the rate is r = (n − l)/n, where l is the dimension of the row space of H over the binary field.
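The edge-permutation view of Figure 2.1(c) suggests a simple sampling procedure: pair variable-node and check-node "sockets" through a random permutation. A sketch under that assumption (practical constructions would also remove or resample repeated edges); for (j, i) = (3, 6) the design rate is (i − j)/i = 1/2:

```python
import random
from collections import Counter

def regular_ldpc_edges(n, j, i, seed=0):
    """Sample the edges of a (j,i)-regular bipartite graph by randomly
    pairing variable and check 'sockets' (the edge-permutation view of
    Fig. 2.1(c)); repeated edges are possible and would be cleaned up
    in a practical construction."""
    assert (n * j) % i == 0, "n*j must be divisible by i"
    m = n * j // i
    rng = random.Random(seed)
    var_sockets = [v for v in range(n) for _ in range(j)]
    chk_sockets = [c for c in range(m) for _ in range(i)]
    rng.shuffle(chk_sockets)
    return list(zip(var_sockets, chk_sockets)), m

edges, m = regular_ldpc_edges(n=12, j=3, i=6)
assert m == 6
# Every variable node has degree j = 3 and every check node degree i = 6.
assert all(d == 3 for d in Counter(v for v, _ in edges).values())
assert all(d == 6 for d in Counter(c for _, c in edges).values())
```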

The following two important results exist for randomly generated regular LDPC

codes having column weight j ≥ 3 [5, 14]. Firstly, the minimum distance of such a code

increases linearly with block length. Secondly, if decoded by an optimal decoder, and

if the (nonzero) code rate is below some maximum rate, such a code will achieve an

arbitrarily low probability of error in the limit as block length tends to infinity.


[Figure content not reproduced: (a) Superposed (MacKay 1A), two blocks each formed from three superposed permutation matrices; (b) Concatenated, an array of permutation matrices; (c) Edge permutation, variable and check node sockets joined through a random permutation Π.]

Figure 2.1: (3,6)-regular LDPC construction using random permutation matrices.

2.4 Factor Graph Representation of LDPC Codes

We use the term compound code to classify any code which can be broken down into

constituent subcodes, e.g. an LDPC code [33]. The resurgence of interest in compound code structures and iterative decoding algorithms has brought with it new representations

for codes on graphs. These provide a useful tool for code design, and assist in the analysis

of decoding algorithms.

Consider a global function, i.e. a function of all n variables, g(x1, . . . , xn). Assume

that this global function factors into a product of local functions, fr(Xr). Here r ∈ R is

the local function index and Xr represents the subset of variables which participate in fr.

Hence we have

g(x1, . . . , xn) = ∏_{r ∈ R} fr(Xr).    (2.6)

We may describe this factorisation using the following graphical model [20].

Definition 2.2 (Factor Graph). A factor graph is a bipartite graph which represents

the factorisation described in (2.6). The graph consists of variable and function nodes.

A variable node exists for each variable xs such that s ∈ {1 . . . n}. A function node exists

for each local function fr. An edge exists in the graph between the variable node xs and

function node fr if and only if xs ∈ Xr.

As a simple example, we consider a real valued global function of three variables,

g(x1, x2, x3), that can be written as a product of two local functions f1 and f2, as follows.

g(x1, x2, x3) = f1(x1, x2)f2(x1, x2, x3) (2.7)

The factor graph representation of (2.7), is shown in Figure 2.2. We represent variable

nodes and function nodes using circles and boxes respectively. Variable node xs, for



Figure 2.2: Factor graph representation of the global function g(x1, x2, x3).

s ∈ {1, 2, 3}, is connected via an edge to the function node fr, for r ∈ {1, 2}, if and only

if xs is an argument of fr.
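Definition 2.2 can be illustrated with the example above. In the following small sketch, only the argument lists of the local functions matter for the graph:

```python
# Argument lists for the example factorisation
# g(x1, x2, x3) = f1(x1, x2) * f2(x1, x2, x3).
scopes = {"f1": ("x1", "x2"), "f2": ("x1", "x2", "x3")}

# Factor-graph edges: one per (variable, function) incidence (Def. 2.2).
edges = sorted((x, f) for f, args in scopes.items() for x in args)
assert edges == [("x1", "f1"), ("x1", "f2"), ("x2", "f1"),
                 ("x2", "f2"), ("x3", "f2")]

# x3 is not an argument of f1, so there is no edge between them.
assert ("x3", "f1") not in edges
```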

The bipartite factor graph structure is very well suited to representing LDPC codes.

Function nodes represent parity-check constraints of the code, and are therefore also

called check nodes. The (2,4)-regular structure described by (2.1) is mapped onto the

factor graph shown in Figure 2.3. The matrix H specifies the (bi-directional) edges of the

graph according to the position of its nonzero elements. Here the n = 8 variable nodes,

each having degree j = 2, represent codeword symbols. The m = 4 check nodes, each

having degree i = 4, represent parity-checks.


Figure 2.3: Factor graph representation of a (2,4)-regular code.


Consider a code C of length n with parity-check matrix H. Recall that a valid codeword x satisfies Hx⊤ = 0, and that each row of H represents an individual parity-check constraint. We assign a binary local indicator function to each check constraint. The local function associated with check cr is satisfied when all variables participating in the check, i.e. the set Xr, sum to zero over F2, i.e. [∑_{xs ∈ Xr} xs = 0]. Here we use

Iverson’s convention [34] to indicate the truth of a statement S. If S is true then [S] = 1,

otherwise [S] = 0. Hence, when the local function is satisfied it assumes the value 1,

otherwise it has value 0. The product of these local indicator functions is then called

the global indicator function. The vector x is a valid codeword, corresponding to a valid

configuration on the factor graph, if and only if it satisfies the global indicator function

of the code. Therefore the global function is also called the code set membership function

and denoted [(x1, . . . , xn) ∈ C] for a code C of length n. Consider the simple example code given in (2.1) with constraint set listed in (2.2)–(2.5). Using arithmetic over F2, the global

indicator function for this code is given by

[(x1, . . . , x8) ∈ C] = [x1 + x3 + x5 + x7 = 0] · [x1 + x4 + x6 + x8 = 0]

· [x2 + x3 + x6 + x7 = 0] · [x2 + x4 + x5 + x8 = 0]. (2.8)
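The global indicator function (2.8) is directly computable. A sketch, which also confirms the codeword count implied by the rank deficiency of H in (2.1) (its four rows sum to zero over F2, so the rank is 3 and k = 5):

```python
import itertools

def member(x):
    """Global indicator function (2.8) for the example code; Iverson
    brackets are realised as Python booleans cast to int."""
    x1, x2, x3, x4, x5, x6, x7, x8 = x
    return int((x1 + x3 + x5 + x7) % 2 == 0) \
         * int((x1 + x4 + x6 + x8) % 2 == 0) \
         * int((x2 + x3 + x6 + x7) % 2 == 0) \
         * int((x2 + x4 + x5 + x8) % 2 == 0)

codewords = [x for x in itertools.product((0, 1), repeat=8) if member(x)]
assert len(codewords) == 32   # rank(H) = 3, so the code has 2^5 codewords
```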

Tanner first introduced what is now known as the factor graph representation of

Gallager’s codes [9]. He also foresaw the parallel processing potential of breaking the long

codes into smaller constituent components for decoding. Wiberg, Loeliger and Kotter [15,

19] added state variable representation to the Tanner graph, allowing representation of

turbo and trellis codes². Applications of factor graphs extend well beyond the field of

error control coding, and they may be used to describe many other algorithms, such as

the Kalman filter and certain fast Fourier transform (FFT) algorithms [20].

2.5 Iterative Decoding of LDPC Codes

The transmitted codeword x is subject to noise from the channel, as described in Sec-

tion 1.1.5. The decoder provides an estimate x̂ of the transmitted information, based upon the code constraints and the received vector y. In this section we review some decoding

approaches which employ iterative message-passing on the code’s factor graph. Message

values are computed at each node based upon the local code constraints. The sparse

nature of H admits low complexity computation.

²Factor graphs are sometimes referred to in the literature as Tanner graphs and/or TWLK graphs.


2.5.1 The Sum-Product Algorithm

Soft decision decoding algorithms make use of the received soft information. Considering

the channel in use, a vector of prior values is generated from y and input to the decoder.

When the decoding process is complete, a vector of posterior values is returned. Enforcing

a hard decision on this vector then yields x̂.

The sum-product algorithm can be used to perform soft decision decoding on the

factor graph, by considering each node as a local processor [15, 20]. The bi-directional

edges are used to carry messages between processors, during each iterative step. In

this section we limit our discussion of the sum-product algorithm specifically to the

application of decoding LDPC codes. More general descriptions of the algorithm are

provided in [15, 20].

The messages that are passed during decoding are binary probability mass functions,

or some representation thereof, e.g. a log-likelihood ratio. Each message µns→nr represents a belief being passed from the source node ns to the receiving node nr, based upon

the constraints of the code. Hence the process is also referred to as belief propagation.

The value of each received symbol, ys, is used to calculate the prior message, ps,

for all s ∈ {1 . . . n}. In the probability domain, the prior message represents a proba-

bility mass function such that ps(0) = p(xs = 0|ys) and ps(1) = p(xs = 1|ys). All other

messages internal to the decoder are initialised to µ(0) = µ(1) = 0.5. We may decrease

computational complexity with only a small loss in performance, by using low precision

messages rather than real numbers [35].

As discussed in Section 2.4, the factor graph represents the factorisation of a global

function which enforces the code set membership constraint. Each check node enforces a

local exclusive-or (XOR) constraint, stating that the overall parity of messages entering

the node should be zero. Each variable enforces a local equality constraint on messages

entering the node, stating that they should all indicate the same value for the variable.

These constraints dictate the computation that occurs at the nodes, and hence the mes-

sages that result at the node outputs. As the messages represent soft probability values,

we use soft-logic gates [3, 20] to enforce the constraints. Moreover, the nodes calculate

an output message for each bi-directional edge. Initially we consider a 3-edge node,

describing the computation which generates an output message for one edge only, and

then extend this to the bi-directional case. We calculate the output message, i.e. binary

probability mass function, pZ(z) = (pZ(0), pZ(1)) using input messages pX(x) and pY (y),


which are assumed to be independent³.

Definition 2.3 (Soft-Logic Gate (SG)). A soft-logic gate is a single output, dual

input device. The inputs, pX(x) and pY (y), and output, pZ(z), represent the probability

mass of a binary random variable. The output is calculated according to some function

pZ(z) = f(pX(x), pY (y)) which returns a value in the range [0,1].

Two types of soft-logic gate are required to build a sum-product decoder, namely

the soft-XOR and soft-equal gate. These gates are used to construct check and variable

nodes respectively.

Definition 2.4 (Soft-XOR Gate). The soft-XOR gate calculates the output probability

mass pZ(z), using the inputs pX(x) and pY (y), as follows.

[ pZ(0) ]   [ pX(0)pY(0) + pX(1)pY(1) ]
[ pZ(1) ] = [ pX(0)pY(1) + pX(1)pY(0) ]    (2.9)

Definition 2.5 (Soft-Equal Gate). The soft-equal gate calculates the output probability

mass pZ(z) according to the following expression, where the normalisation factor γ is

chosen to ensure pZ(0) + pZ(1) = 1.

[ pZ(0) ]     [ pX(0)pY(0) ]
[ pZ(1) ] = γ [ pX(1)pY(1) ]    (2.10)
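The two gates are direct to implement. A sketch of (2.9) and (2.10) on probability pairs (pZ(0), pZ(1)):

```python
def soft_xor(pX, pY):
    """Soft-XOR gate, Definition 2.4 / Eq. (2.9)."""
    return (pX[0] * pY[0] + pX[1] * pY[1],
            pX[0] * pY[1] + pX[1] * pY[0])

def soft_equal(pX, pY):
    """Soft-equal gate, Definition 2.5 / Eq. (2.10)."""
    a, b = pX[0] * pY[0], pX[1] * pY[1]
    g = 1.0 / (a + b)              # normalisation factor gamma
    return (g * a, g * b)

# Hard (certain) inputs behave like an ordinary XOR gate:
assert soft_xor((0, 1), (0, 1)) == (1, 0)        # 1 XOR 1 = 0
# A totally uncertain input leaves the soft-XOR output uncertain:
assert soft_xor((0.5, 0.5), (0.9, 0.1)) == (0.5, 0.5)
```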

Figure 2.4 shows how we may use these gates to build a bi-directional gate, for which

we have the following definition.

Definition 2.6 (Bi-directional Soft-Logic Gate (BSG)). The bi-directional soft-

logic gate is a 3-port device for which each port has an output and input. The out-

put from each port is calculated using the two extrinsic edge inputs, according to the

same function, i.e. pX(x)out = f(pY (y)in, pZ(z)in), pY (y)out = f(pX(x)in, pZ(z)in), and

pZ(z)out = f(pX(x)in, pY (y)in). Hence a BSG may be constructed from three soft gates,

by aligning the output of one to each BSG output edge.

As a result of Forney’s normal graph realisations [36], we may break down the struc-

ture of a multiple edged node into constituent components. Variable and check nodes of

³Note that we have temporarily reused x and y for the soft-logic gate descriptions, and they do not have their usual meaning, i.e. representing codeword and received symbols respectively. Here x, y, and z represent particular values for the random variables X, Y, and Z respectively.


[Figure content not reproduced: (a) Single output soft gate computing pZ(z) = f(pX(x), pY(y)); (b) Bi-directional 3-port soft gate built from three copies of f.]

Figure 2.4: Soft gates.

arbitrary size may be constructed by cascading bi-directional soft-equal and soft-XOR

gates respectively [3, 20]. This is discussed in more detail in Section 6.5.3.

Once the variable and check nodes have been constructed, we connect them using

bi-directional edges, according to the structure of the factor graph. We define Γ (vs)

as the set of all check nodes adjacent to variable node vs, specified by column s of H.

Similarly, the set of all variable nodes adjacent to check node cr, Γ (cr), is specified by row

r of H. Variable-check messages are denoted µvs→crand check-variable messages µcr→vs

.

We let Γ (vs) \c denote all checks Γ (vs) excluding check c, and similarly define Γ (cr) \v.

A single direction input edge, used to carry the prior message, ps, is appended to each

variable node in the factor graph. Similarly, a single direction output edge is appended

to carry the posterior message, qs.

There are many possible schedules for passing messages around the graph. In this

work we assume a flooding schedule [33], such that all messages µvs→cr are passed, followed

by all messages µcr→vs, in a single iterative step.

The posterior value, qs, is generated for each codeword symbol, at the end of each

iteration. A hard decision is then performed on these values, to generate the codeword

estimate x̂. If this estimate satisfies all code constraints, then the process is halted,

otherwise it is repeated until some maximum number of iterations has expired.

Given the received values y and factor graph for H, we now summarise the soft


decision sum-product algorithm. Both probability and log-likelihood domain descriptions

are provided, assuming an AWGN channel and a BPSK mapped transmission M(0) =

+1, M(1) = −1. We also assume knowledge of the channel noise variance, σ2. The

probability domain check-variable and variable-check update rules follow from the above

soft-XOR and soft-equal gate descriptions respectively. The log-likelihood domain rules

are easily derived [20], using the LLR definition (Def. 1.1).

When implementing soft decision algorithms it is important to consider potential numerical issues. In Step 3 of the log-likelihood domain algorithm, we must handle the singularity of tanh⁻¹(a) at a = ±1. To do so we define the clipping parameter η. For the floating-point simulations discussed in this work, we avoid numerical overflow by clipping the input a, such that a = ±η when |a| > η. We generally set the clipping parameter value close to one, η = 1 − 10⁻¹⁰, in order to offer good dynamic range for computation while protecting against overflow.

2.5.2 Hard Decision Decoding

We can perform hard decision decoding in the case of the AWGN, binary erasure, and

binary symmetric [1] channels. If soft information is carried with the received vector

then it is discarded. These algorithms do not offer the same level of performance as soft

decision algorithms. However, their simplicity makes them easier to implement, and has

also led to some interesting analytical results [5, 21, 24, 35].

In this discussion we consider y to represent the hard received vector, obtained by mapping each soft sample to +1 when ys ≥ 0 and to −1 otherwise. Gallager originally proposed the following two hard

decision algorithms [5].

Gallager Decoding Algorithm A

The message output from variable node vs along edge e is equal to the received value ys

for that node, unless all messages entering vs other than that along edge e disagree with

ys. In this case the opposite of ys is sent, i.e. if ys = −1 then a message value of +1 is

sent, and vice versa.

The message output from check cr to a connected variable along edge e, is the product

of all messages incoming on edges connected to cr, excluding the one from e.
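A sketch of the two message rules, representing messages as ±1 values (the surrounding schedule and data structures are omitted):

```python
def var_message(y_s, incoming):
    """Gallager A variable rule: send y_s unless *all* other incoming
    messages disagree with it, in which case send the opposite of y_s."""
    if incoming and all(msg == -y_s for msg in incoming):
        return -y_s
    return y_s

def check_message(incoming):
    """Gallager A check rule: product of the extrinsic +/-1 messages."""
    prod = 1
    for msg in incoming:
        prod *= msg
    return prod

assert var_message(+1, [-1, -1]) == -1   # unanimous disagreement flips
assert var_message(+1, [-1, +1]) == +1   # otherwise the received value wins
assert check_message([+1, -1, -1]) == +1
```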


Algorithm 2.1: Sum-Product Decoder (Probability Domain)

Step 1 (Initialisation): Initialise messages µvs→cr(0) = µvs→cr(1) = 0.5, and µcr→vs(0) = µcr→vs(1) = 0.5.

Step 2 (Set Priors): Using each received symbol ys, calculate ν = 2ys/σ². Set ps(0) = e^ν/(1 + e^ν), and ps(1) = e^−ν/(1 + e^−ν).

Step 3 (Check → Variable): Let δvs = µvs→cr(0) − µvs→cr(1) and calculate

    δcr→v = ∏_{vs ∈ Γ(cr)\v} δvs.

From each check cr to each variable v ∈ Γ(cr) send

    µcr→v(0) = ½(1 + δcr→v), and µcr→v(1) = ½(1 − δcr→v).

Step 4 (Variable → Check): From each variable vs to each c ∈ Γ(vs) send

    µvs→c(0) = γ ps(0) ∏_{cr ∈ Γ(vs)\c} µcr→vs(0), and µvs→c(1) = γ ps(1) ∏_{cr ∈ Γ(vs)\c} µcr→vs(1).

The normalisation factor γ is chosen so that µvs→c(0) + µvs→c(1) = 1.

Step 5 (Calculate Posteriors): For each symbol calculate

    qs(0) = γ ps(0) ∏_{cr ∈ Γ(vs)} µcr→vs(0), and qs(1) = γ ps(1) ∏_{cr ∈ Γ(vs)} µcr→vs(1).

The normalisation factor γ is chosen so that qs(0) + qs(1) = 1.

Step 6 (Stop/Continue): Calculate x̂ from x̂s = [qs(1) > ½]. If Hx̂⊤ = 0, then exit declaring success; else if the iteration limit is reached then exit declaring failure; otherwise return to Step 3.


Algorithm 2.2: Sum-Product Decoder (Log-Likelihood Domain)

Step 1 (Initialisation): Initialise messages µvs→cr = µcr→vs = 0.

Step 2 (Set Priors): Using each received symbol ys, set ps = 2ys/σ².

Step 3 (Check → Variable): From each check cr to each variable v ∈ Γ(cr) send

    µcr→v = 2 tanh⁻¹( ∏_{vs ∈ Γ(cr)\v} tanh(½ µvs→cr) ).

Step 4 (Variable → Check): From each variable vs to each c ∈ Γ(vs) send

    µvs→c = ps + ∑_{cr ∈ Γ(vs)\c} µcr→vs.

Step 5 (Calculate Posteriors): For each symbol calculate

    qs = ps + ∑_{cr ∈ Γ(vs)} µcr→vs.

Step 6 (Stop/Continue): If all checks cr satisfy

    ∏_{vs ∈ Γ(cr)} qs > 0

then exit declaring success; else if the iteration limit is reached then exit declaring failure; otherwise return to Step 3.
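The log-likelihood domain algorithm can be sketched directly in code. A minimal floating-point implementation for the example H of (2.1), with the tanh product clipped at ±η as discussed in Section 2.5.1; the flooding schedule and dictionary message storage are illustrative choices only, not the architectures developed later in the thesis:

```python
import math

H = [[1, 0, 1, 0, 1, 0, 1, 0],
     [1, 0, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 1, 0, 0, 1]]
m, n = len(H), len(H[0])
Gc = [[s for s in range(n) if H[r][s]] for r in range(m)]   # Gamma(c_r)
Gv = [[r for r in range(m) if H[r][s]] for s in range(n)]   # Gamma(v_s)

def decode(priors, max_iter=50, eta=1 - 1e-10):
    """Sketch of Algorithm 2.2: priors[s] = 2*y_s/sigma^2; returns the
    hard-decision estimate, or None if the iteration limit expires."""
    mu_vc = {(s, r): 0.0 for s in range(n) for r in Gv[s]}
    mu_cv = {(r, s): 0.0 for r in range(m) for s in Gc[r]}
    for _ in range(max_iter):
        # Step 3: check -> variable; clip the tanh product at +/- eta
        # to avoid the singularity of atanh at +/- 1.
        for r in range(m):
            for s in Gc[r]:
                prod = 1.0
                for t in Gc[r]:
                    if t != s:
                        prod *= math.tanh(0.5 * mu_vc[(t, r)])
                prod = max(-eta, min(eta, prod))
                mu_cv[(r, s)] = 2.0 * math.atanh(prod)
        # Step 4: variable -> check (extrinsic sums).
        for s in range(n):
            for r in Gv[s]:
                mu_vc[(s, r)] = priors[s] + sum(
                    mu_cv[(t, s)] for t in Gv[s] if t != r)
        # Steps 5-6: posteriors, hard decision, parity test.
        q = [priors[s] + sum(mu_cv[(r, s)] for r in Gv[s]) for s in range(n)]
        x_hat = [int(qs < 0) for qs in q]
        if all(sum(x_hat[s] for s in Gc[r]) % 2 == 0 for r in range(m)):
            return x_hat
    return None

# All-zero codeword sent as BPSK +1; one symbol received unreliably.
y = [1.0] * 8
y[2] = -0.1
priors = [2 * ys / 0.5 for ys in y]     # assuming sigma^2 = 0.5
assert decode(priors) == [0] * 8
```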


Gallager Decoding Algorithm B

This algorithm is a modified version of Algorithm A. In the variable message calculation,

the opposite of the received value will be sent if at least b incoming messages disagree.

The value of b is typically a function of the edge degrees and iteration number.

Erasure Decoding

Consider the binary erasure channel described in Section 1.1.5, with output mapping

M(0) = +1, M(1) = −1, M(erasure) = 0. Define the sign operator sgn (a) = ±1 for a ≷ 0,

and sgn (0) = 0. The message-passing erasure decoder operates, using real arithmetic,

as follows [37]. We may consider this algorithm to be a hard decision version of the

sum-product decoder (Algorithm 2.2).

Algorithm 2.3: Erasure Decoder

Step 1 (Initialisation): Set the value of each variable vs to the value of the corresponding received symbol ys ∈ {0, −1, +1}. Initialise all messages to 0.

Step 2 (Variable → Check): From each variable vs to each c ∈ Γ(vs) send

    µvs→c = sgn( vs + ∑_{cr ∈ Γ(vs)\c} µcr→vs ).

Step 3 (Check → Variable): From each check cr to each variable v ∈ Γ(cr) send

    µcr→v = ∏_{vs ∈ Γ(cr)\v} µvs→cr.

For all variables vs, if at least one µcj→vs ≠ 0, then assign that value to vs.

Step 4 (Stop/Continue): If vs ≠ 0 for all variables vs, then exit declaring success. If this is not the first iteration, and the set of values currently assigned to the variables is identical to that assigned at the last iteration, then exit declaring failure. Otherwise return to Step 2.
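Algorithm 2.3 can be sketched compactly. In this sketch the stagnation test of Step 4 is replaced by a simple iteration cap for brevity:

```python
def sgn(a):
    """sgn(a) = +/-1 for a greater/less than 0, and sgn(0) = 0."""
    return (a > 0) - (a < 0)

def erasure_decode(H, y, max_iter=20):
    """Message-passing erasure decoding (sketch of Algorithm 2.3);
    y uses +1/-1 for known bits and 0 for erasures."""
    m, n = len(H), len(H[0])
    Gc = [[s for s in range(n) if H[r][s]] for r in range(m)]
    Gv = [[r for r in range(m) if H[r][s]] for s in range(n)]
    v = list(y)
    mu_cv = {(r, s): 0 for r in range(m) for s in Gc[r]}
    for _ in range(max_iter):
        # Step 2: variable -> check.
        mu_vc = {(s, r): sgn(v[s] + sum(mu_cv[(t, s)] for t in Gv[s] if t != r))
                 for s in range(n) for r in Gv[s]}
        # Step 3: check -> variable (extrinsic products), then fill erasures.
        for r in range(m):
            for s in Gc[r]:
                prod = 1
                for t in Gc[r]:
                    if t != s:
                        prod *= mu_vc[(t, r)]
                mu_cv[(r, s)] = prod
        for s in range(n):
            if v[s] == 0:
                for r in Gv[s]:
                    if mu_cv[(r, s)] != 0:
                        v[s] = mu_cv[(r, s)]
                        break
        if all(v):
            return v
    return None

H = [[1, 0, 1, 0, 1, 0, 1, 0], [1, 0, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1, 0], [0, 1, 0, 1, 1, 0, 0, 1]]
y = [0, 1, 1, 1, 1, 1, 1, 1]     # all-zero codeword with symbol 1 erased
assert erasure_decode(H, y) == [1] * 8
```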

2.6 Alternative Representations and Algorithms

In this dissertation we focus on factor graphs, however other code representations have

also been proposed. Bayesian networks may be used to represent error control codes [13,


33], and have a similar structure to that of the factor graph. However, Bayesian networks

are directed acyclic graphs which are not bipartite, i.e. all nodes are of the same type.

At around the same time as Kschischang et al. were preparing the factor graph

approach [20], Aji and McEliece were developing an alternative view [38]. They presented

a generalised distributive law message-passing algorithm, and related it to several other

well known iterative decoding algorithms. They describe its operation using Bayesian

networks and junction trees.

Forney introduced normal graphs in [36]. These graphs are a modified form of factor

graph, such that all symbol variables have degree one and all internal state variables

have degree two. Symbol variables represent an external interface, while state edges are

used for message-passing. These edge restrictions imply that computation is performed

only at constraint nodes. Hence there is a well defined separation of tasks for each graph

component.

There are several well known decoding algorithms, e.g. the min-sum algorithm, which

are closely related to the sum-product algorithm. They use the same message-passing

approach but perform alternative node calculations [15].

Performance improvements can be obtained by altering the message schedule, so

that it accounts for structural properties of the factor graph which can cause problems

for iterative decoding [39]. Some investigation of the effects of attenuating messages has

also been undertaken [40–42].

Modified versions of the belief propagation decoder have been proposed, which offer a

tradeoff between error correcting performance and implementation complexity. Fossorier

et al. have presented a simplified decoding algorithm [43] which does not depend upon the

channel variance, and hence does not require channel parameter estimation. A two phase

iterative reliability based algorithm has also been proposed [44]. Alternative approaches

to practical decoding have been presented in [45–47].

Feldman has recently introduced a decoding algorithm which is based upon linear

programming [26]. In contrast to standard message-passing, the behaviour of this decoder

is well defined, even in the presence of cycles.

2.7 Encoding LDPC Codes

Low-density parity-check codes are a class of linear block code. They may therefore

be encoded using the generator matrix, as described in Section 2.2. In the absence of


structure built into the code, this method places large storage and processing demands on the encoder as the block length increases. Although H is designed to be sparse,

the generator matrix for the code, G, is generally not sparse⁴. Encoders may therefore

become considerably slower and larger when the block length n is increased, as matrix-

vector multiplication has complexity O (n2). This has motivated recent research into

finding computationally efficient encoders and structured codes which are amenable to

low complexity encoder implementation.

In this section we review several approaches to building a dedicated encoder archi-

tecture for linear time encoding of LDPC codes. We first provide an overview of existing

methods that may be used to build structured codes, which are specifically designed to

allow simple encoder implementation. We then describe a technique that may be used

to transform codes, so that an encoder with approximately linear time complexity may

be built.

In Chapter 3 we present novel approaches to low complexity encoding which reuse

the decoder architecture. The further motivation behind this approach is to reduce the

overall size of the communication system by allowing one circuit to perform both functions

on a time switched basis.

2.7.1 Structured Codes

Sparse Computation and Back Substitution

The design of LDPC codes with parity-check matrices which have an almost triangular

structure was proposed by MacKay et al. [32]. This method allows most of the parity

bits to be calculated in linear-time using sparse operations, i.e. functions which only

involve a small number of variables. Their experiments show that such codes have a

performance which approximates that of regular LDPC codes. A triangular parity-check

matrix is shown in Figure 2.5.

We consider codewords arranged as row vectors x = [xp | xu], where xu are the

information bits and xp are the parity bits. Similarly we have partitioned the parity-

check matrix, H = [Hp | Hu]. We define the intermediate vector b ≜ Hu x⊤u, which

may be evaluated in linear time via sparse matrix-vector computation. When Hp has

triangular form we may compute xp in linear time using Hp and b, via back-substitution.

For the above example this may be done in the following three steps.

⁴In fact, if G were sparse then we would not expect the code to have a good minimum distance. Recall that the lowest weight row in G upper bounds dmin.


H =
[ 1 1 1 0 1 0 0 0 0 | 1 0 0 0 1 0 0 0 1 ]
[ 0 1 0 1 0 0 0 1 0 | 0 1 0 1 0 0 1 0 0 ]
[ 0 0 1 0 0 1 0 0 1 | 0 0 1 0 0 1 0 1 0 ]
[ 0 0 0 1 0 0 0 0 0 | 1 0 0 1 0 0 0 0 1 ]
[ 0 0 0 0 1 0 1 0 0 | 0 1 0 0 1 1 0 0 0 ]
[ 0 0 0 0 0 1 0 0 0 | 0 0 0 1 0 0 0 1 0 ]
[ 0 0 0 0 0 0 1 0 0 | 0 0 1 0 0 0 1 1 0 ]
[ 0 0 0 0 0 0 0 1 0 | 1 0 0 0 1 0 1 0 1 ]
[ 0 0 0 0 0 0 0 0 1 | 0 1 0 0 0 1 0 0 0 ]

(columns xp1 . . . xp9 form Hp; columns xu1 . . . xu9 form Hu)

Figure 2.5: An upper-triangular parity-check matrix.

1. xp4 = b4, xp6 = b6, xp7 = b7, xp8 = b8, xp9 = b9

2. xp2 = xp4 + xp8 + b2, xp3 = xp6 + xp9 + b3, xp5 = xp7 + b5

3. xp1 = xp2 + xp3 + xp5 + b1
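For a general upper-triangular Hp with unit diagonal, the procedure above can be sketched as follows; the staircase Hp of Figure 2.6 and the dense random Hu block used in the example are illustrative choices only:

```python
import numpy as np

def encode_back_substitution(Hp, Hu, xu):
    """Solve Hp xp = Hu xu over F_2 by back-substitution, assuming Hp is
    upper triangular with a unit diagonal (the structure of Figure 2.5)."""
    b = Hu @ xu % 2                        # sparse matrix-vector product
    xp = np.zeros(Hp.shape[0], dtype=int)
    for r in range(Hp.shape[0] - 1, -1, -1):   # solve from the last row up
        xp[r] = (b[r] + Hp[r, r + 1:] @ xp[r + 1:]) % 2
    return xp

# The staircase Hp of Figure 2.6, with a hypothetical Hu block.
mp = 6
Hp = np.eye(mp, dtype=int)
Hp[np.arange(mp - 1), np.arange(1, mp)] = 1
rng = np.random.default_rng(0)
Hu = rng.integers(0, 2, size=(mp, 6))
xu = rng.integers(0, 2, size=6)

xp = encode_back_substitution(Hp, Hu, xu)
# The full codeword [xp | xu] satisfies H x^T = 0 over F_2.
assert not ((Hp @ xp + Hu @ xu) % 2).any()
```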

A special case of triangularity occurs when Hp has the dual diagonal structure shown

in Figure 2.6. The structure was first proposed by Ping et al. [48], and has recently

reappeared in the literature [49, 50]. Here the back-substitution process follows a simple

pattern and may be performed using an accumulator.

Hp =
[ 1 1 0 0 0 0 ]
[ 0 1 1 0 0 0 ]
[ 0 0 1 1 0 0 ]
[ 0 0 0 1 1 0 ]
[ 0 0 0 0 1 1 ]
[ 0 0 0 0 0 1 ]

Figure 2.6: A staircase structure for Hp.

In Section 3.2 we show how the decoder architecture can be reused to perform encod-

ing, via back-substitution, when Hp is triangular. We note that if the parity-check matrix

has any columns of weight j > 1, and is strictly triangular, then the code structure is

necessarily irregular.


Cyclic Code Structures

Another approach to providing linear time encodability employs cyclic [51, 52] or quasi-

cyclic structures [53, 54]. A cyclic code has the property that any cyclic shift of a codeword

is itself a codeword. A quasi-cyclic code has the property that, for a fixed shift size, a

cyclic shift of any codeword by that size is itself a codeword. We can build the parity-

check matrix for a quasi-cyclic LDPC code by horizontally concatenating sparse circulant

matrices, each having dimension m × m [55].

Definition 2.7 (Circulant Matrix). A circulant matrix is a square matrix in which

each row is a cyclic shift of the previous row.

An example parity-check matrix for a quasi-cyclic code having n = 18 and k = 12,

and hence r = 2/3, is shown in Figure 2.7. This matrix may be easily rearranged into the

quasi-cyclic form presented in [53] by permuting its columns in the order 1, m+1, 2m+1, 2, m+2, 2m+2, . . . , m, 2m, 3m. Note that permuting the rows or columns of H does

not alter the code structure but merely relabels checks or variables respectively.

H =

1 1 0 1 0 0 | 1 0 1 0 1 0 | 0 0 1 0 1 1
0 1 1 0 1 0 | 0 1 0 1 0 1 | 1 0 0 1 0 1
0 0 1 1 0 1 | 1 0 1 0 1 0 | 1 1 0 0 1 0
1 0 0 1 1 0 | 0 1 0 1 0 1 | 0 1 1 0 0 1
0 1 0 0 1 1 | 1 0 1 0 1 0 | 1 0 1 1 0 0
1 0 1 0 0 1 | 0 1 0 1 0 1 | 0 1 0 1 1 0

(the three 6 × 6 blocks are Hp, Hu1 and Hu2 respectively)

Figure 2.7: A quasi-cyclic parity-check matrix.
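A circulant block is fully specified by its first row, so the whole matrix of Figure 2.7 can be generated compactly. A small NumPy sketch, assuming the first rows of the three blocks above (the function name is illustrative):

```python
import numpy as np

def circulant(first_row):
    """Build a circulant matrix: each row is a right cyclic shift of the previous."""
    first_row = np.asarray(first_row, dtype=int)
    return np.array([np.roll(first_row, r) for r in range(len(first_row))])

# First rows of the three 6x6 blocks of Figure 2.7.
Hp  = circulant([1, 1, 0, 1, 0, 0])
Hu1 = circulant([1, 0, 1, 0, 1, 0])
Hu2 = circulant([0, 0, 1, 0, 1, 1])
H = np.hstack([Hp, Hu1, Hu2])   # 6 x 18 quasi-cyclic parity-check matrix
```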

We split H into three m×m circulant components. Here there are a total of n0 = 3

component matrices, with k0 = 2 of these corresponding to codeword information bits.

More specifically, Hp corresponds to the parity bits, while Hu1 and Hu2 correspond to the

information bits. If Hp is non-singular then we may pre-multiply H by Hp⁻¹ to obtain the systematic form Hsys = [Im|Hu(sys)] = [Im|Hp⁻¹Hu1|Hp⁻¹Hu2]. We then permute the columns of Hu(sys), in the order 1, m+1, 2, m+2, . . . , m, 2m, and relabel the information

bits accordingly. This places it into systematic quasi-cyclic form as shown in Figure 2.8.

Note that each row of Hu(sys) is a right cyclic shift of the previous row, by k0 places.

This allows quasi-cyclic codes to be encoded using the same method as that used for cyclic

codes [56]. To generate the parity bits for the above code we use the k-stage feedback

shift register architecture shown in Figure 2.9. In this diagram the source vector, u, is


Hsys =

1 0 0 0 0 0 | 1 0 0 1 1 1 0 1 1 1 0 1
0 1 0 0 0 0 | 0 1 1 0 0 1 1 1 0 1 1 1
0 0 1 0 0 0 | 1 1 0 1 1 0 0 1 1 1 0 1
0 0 0 1 0 0 | 0 1 1 1 0 1 1 0 0 1 1 1
0 0 0 0 1 0 | 1 1 0 1 1 1 0 1 1 0 0 1
0 0 0 0 0 1 | 0 1 1 1 0 1 1 1 0 1 1 0

(columns xp1 . . . xp6 of Im, followed by xu1 . . . xu12 of Hu(sys))

Figure 2.8: Systematic quasi-cyclic form of H.

input from the left and shifted to the right, starting with bit u1. Each box represents

a memory cell which holds the previously loaded value until a shift is applied, at which

point it takes the value of the cell preceding it. The switch s is initially connected to

the serial information source, to load the source vector into the register. This operation

takes place in k shifts. The first parity bit is then calculated using an exclusive-or (XOR)

operation on selected information bits, chosen by the shift register taps. The tap positions

are set according to the (reversed) position of nonzero terms in the first row of Hu(sys).

The switch is then closed into the feedback position and the data is shifted cyclically k0

times. The next parity bit is then calculated. This process is repeated until all m parity

bits have been generated.


Figure 2.9: Shift register based encoder for a quasi-cyclic code.
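The register-based procedure above can be modelled in software. A hedged NumPy sketch, using the first row of Hu(sys) from Figure 2.8 as the tap pattern; the shift orientation here is chosen so that reading the fixed taps after r shifts reproduces row r of Hu(sys), which realises the same computation as the hardware of Figure 2.9 (where the tap positions are reversed):

```python
import numpy as np

def qc_encode(u, taps, m, k0):
    """Generate the m parity bits of a systematic quasi-cyclic code.

    Software model of the k-stage feedback shift register of Figure 2.9:
    the register is loaded with the information vector, each parity bit
    is the XOR of the tapped cells, and the register is cyclically
    shifted k0 places between parity bits.
    """
    reg = list(u)
    xp = []
    for _ in range(m):
        xp.append(sum(reg[t] for t in taps) % 2)   # XOR of tapped cells
        reg = reg[k0:] + reg[:k0]                  # cyclic shift by k0 places
    return np.array(xp)

# First row of Hu(sys) from Figure 2.8 (m = 6, k0 = 2, k = 12).
row0 = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1])
taps = np.flatnonzero(row0)
u = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0])
xp = qc_encode(u, taps, m=6, k0=2)   # equals Hu(sys) u^T (mod 2)
```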

We can extend the parity-check matrix to build higher rate codes by horizontally

concatenating further circulant matrices. For the general case of n0 concatenated circu-

lant matrices the block length is n = mn0. Hence, the choice of code rate is restricted by

the relation r = (n0 − 1)/n0.


Other Code Structures

Sipser and Spielman have proposed a class of linear-time encodable and decodable ex-

pander codes in [17]. Their approach involves using cascaded graphs to recursively build

irregular codes based upon simple subcodes at each stage of the graph. Error correcting

codes are built by recursively combining weaker error-reducing codes. They have proven

that if the error-reducing codes are encodable/decodable in linear time, then this prop-

erty will carry through to the resulting error-correcting code [57]. Luby et al. have built

codes based upon these structures which exhibit very good performance [21].

Echard and Chang have presented an approach to building structured quasi-regular

LDPC codes, called π-rotation codes [58, 59]. The parity-check matrix for these codes is

built from component permutation matrices and their rotations. They may be encoded

in linear time using a flip-flop based encoder circuit.

Zhang and Parhi have presented a codec design for (3, i)-regular LDPC codes [60].

Their encoding algorithm is similar to that presented in the following section. However,

rather than transforming an existing code, they specifically construct the code to optimise

encoder speed.

2.7.2 Code Transformation

Richardson and Urbanke have shown that H can be manipulated for most LDPC codes,

so that the coefficient of the quadratic term in the encoding complexity is very small [61].

This is a general solution to the problem of time-efficient encodability, as it relies not upon a new code structure but rather on restructuring any existing (non-singular and

sparse) parity-check matrix. Motivated by the advantages of triangular computation, H

is first rearranged into the approximate triangular form shown in Figure 2.10. This is

done by exploiting the sparseness of H, using column and row reordering only, and hence

the density of the matrix remains unaltered.

The gap, g, is defined as the number of rows by which the rearranged matrix falls short of being fully lower-triangular. Encoding complexity is proportional to n + g². The problem of reducing encoding complexity therefore becomes one of reducing g. For a (3,6)-regular LDPC code, the actual number of operations required is no more than 0.0172n² + O(n). Hence the very small coefficient of the quadratic term admits approximately linear

encoding complexity even in the case of large block lengths. Another key result is that

all provably good LDPC codes [62] have very small gaps, typically in the range from 1


to 3, and therefore have linear encoding complexity.

H =

[ A  B  T ]
[ C  D  E ]

Here the top block has m − g rows and the bottom block has g rows: A is (m − g) × (n − m), B is (m − g) × g, T is (m − g) × (m − g) with zeros above its diagonal, C is g × (n − m), D is g × g, and E is g × (m − g).

Figure 2.10: The parity-check matrix rearranged into approximate lower-triangular form.

The matrix is split into six components as shown. The matrix T is lower-triangular

and all diagonal elements of T are one. We consider the parity section of the (systematic)

codeword to be split into two segments, i.e. x = [u|xp1|xp2]. The parity bits are generated

as follows.

Pre-computation

Pre-compute the dense g × g matrix φ = −ET⁻¹B + D and calculate its inverse.

Note that H must be full rank for φ to be invertible.

Compute First Parity Vector xp1

1. z1 = Auᵀ (linear time sparse multiplication).

2. z2 = T⁻¹z1. As T is lower triangular and sparse, this operation may be performed by back-substitution in linear time.

3. z3 = −Ez2 (linear time sparse multiplication).

4. z4 = z3 + Cuᵀ (linear time sparse multiplication and addition).

5. xp1 = φ⁻¹z4 (multiplication by dense g × g matrix).

Compute Second Parity Vector xp2

1. z5 = Bxp1ᵀ (linear time sparse multiplication).

2. z6 = z1 + z5 (linear time sparse addition).

3. xp2 = −T⁻¹z6 (linear time back-substitution).
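The pre-computation and the two parity-vector computations above can be sketched end-to-end. An illustrative NumPy implementation over GF(2), where minus signs vanish; the toy H (with g = 1) and all helper names are mine, not from the thesis:

```python
import numpy as np

def gf2_solve_lower(T, z):
    """Solve T x = z (mod 2) by back-substitution; T is unit lower-triangular."""
    x = np.zeros(len(z), dtype=int)
    for r in range(len(z)):
        x[r] = (z[r] + T[r, :r].dot(x[:r])) % 2
    return x

def gf2_inv(M):
    """Invert a square matrix over GF(2) by Gauss-Jordan elimination."""
    n = M.shape[0]
    A = np.hstack([M.copy() % 2, np.eye(n, dtype=int)])
    for c in range(n):
        pivot = next(r for r in range(c, n) if A[r, c])   # assumes full rank
        A[[c, pivot]] = A[[pivot, c]]
        for r in range(n):
            if r != c and A[r, c]:
                A[r] ^= A[c]
    return A[:, n:]

def ru_encode(A, B, T, C, D, E, u):
    """Richardson-Urbanke encoding for H = [[A, B, T], [C, D, E]], x = [u|xp1|xp2]."""
    # Pre-computation: phi = E T^-1 B + D, and its inverse.
    TinvB = np.column_stack([gf2_solve_lower(T, B[:, c]) for c in range(B.shape[1])])
    phi_inv = gf2_inv((E.dot(TinvB) + D) % 2)
    # First parity vector.
    z1 = A.dot(u) % 2                       # sparse multiply
    z2 = gf2_solve_lower(T, z1)             # back-substitution
    z4 = (E.dot(z2) + C.dot(u)) % 2         # z3 + C u^T (signs drop mod 2)
    xp1 = phi_inv.dot(z4) % 2               # dense g x g multiply
    # Second parity vector.
    z6 = (z1 + B.dot(xp1)) % 2
    xp2 = gf2_solve_lower(T, z6)
    return np.concatenate([u, xp1, xp2])

# Toy example: n = 6, m = 3, g = 1 (hypothetical, for illustration only).
A = np.array([[1, 0, 1], [0, 1, 1]]); B = np.array([[1], [1]])
T = np.array([[1, 0], [1, 1]])
C = np.array([[1, 1, 0]]); D = np.array([[1]]); E = np.array([[0, 1]])
x = ru_encode(A, B, T, C, D, E, np.array([1, 0, 1]))   # H x^T = 0 (mod 2)
```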


2.8 Analysis of Codes and Decoding on Graphs

In this section we explore some of the analytical tools that have been developed for LDPC

codes, while paying particular attention to finite length codes. Recent work in this area

has followed from the foundations laid by Gallager [5], Tanner [9] and Wiberg [15].

2.8.1 Short Cycles

In the example segment of a code factor graph shown in Figure 2.11, a cycle of length

four is highlighted. A general definition follows.

Figure 2.11: A cycle of length four.

Definition 2.8 (Cycle). A cycle of length d in the factor graph representation of a code,

is a connected set of d/2 variable nodes and d/2 constraint nodes. The set is connected

such that, for each node, a path of d edges exists that connects the node back to itself, in

which all nodes are visited without traversing an edge twice. We use the term d-cycle to

denote a cycle of length d. In a bipartite graph, d is always even.

Definition 2.9 (Girth). The girth of a graph is the length of its shortest cycle.

It is well known that if the factor graph of a code is cycle free then the sum-product

decoder will converge to the maximum-likelihood code sequence [15]. Cycles, especially

those of short length, allow the feedback and reinforcement of incorrect values throughout

the network. This behaviour causes the decoder to stray from the maximum-likelihood

solution. However, we seek only a final hard decision, and are less concerned with the

exact soft solution. Despite the presence of cycles in the factor graph, it is now widely ac-

cepted that iterative message-passing algorithms can make good decoders (e.g. see [14]).

Some detailed investigations regarding the effect that cycles have on the behaviour of the

decoder appear in [33, 41, 63–65].


In contrast to the above, it is well known that cycle-free graphs cannot support good

codes [66, 67]. However, it is still widely accepted that removing short cycles from a

graph will improve performance when using message-passing decoding. Some theoretical

support for the removal, or avoidance, of short cycles has recently been presented [68].

In particular when constructing the parity-check matrix, it is common to constrain the

corresponding factor graph such that it is 4-cycle free. This appears to be especially

important for high rate codes [40]. A mapping of the 4-cycle free graph constraint onto

the parity-check matrix follows.

Definition 2.10 (Overlap Constraint). Constraining the parity-check matrix H, such

that the maximum overlap between any two columns is one, prevents 4-cycles from ap-

pearing in the graph of H.
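Definition 2.10 gives a direct test: the product HᵀH, computed over the integers, tabulates every pairwise column overlap. A small NumPy sketch (the function name is illustrative):

```python
import numpy as np

def is_4cycle_free(H):
    """Overlap constraint (Definition 2.10): the graph of H is free of
    4-cycles iff no two distinct columns of H share more than one row."""
    overlap = H.T.dot(H)           # integer arithmetic: pairwise column overlaps
    np.fill_diagonal(overlap, 0)   # ignore each column's overlap with itself
    return bool(overlap.max() <= 1)

# The staircase Hp of Figure 2.6 passes; two identical weight-2 columns fail.
ok = is_4cycle_free(np.eye(6, dtype=int) + np.eye(6, k=1, dtype=int))
bad = is_4cycle_free(np.array([[1, 1], [1, 1]]))
```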

By manipulating the adjacency matrix of a code’s factor graph, and in turn altering

H, McGowan and Williamson have presented an algorithm for detecting and removing

short cycles [69].

Definition 2.11 (Adjacency Matrix). The adjacency matrix of a graph, A, is a matrix

with each row and column labelled by the index of a vertex in the graph. If an edge exists

between two vertices vr and vs then the elements in A at position ars and asr are one,

otherwise they are zero.

In this thesis we interpret the adjacency matrix over the reals. The adjacency matrix

for the factor graph of a code is related to its parity-check matrix as follows, where we

make the obvious mapping from F2 → R.

A =

[ 0   H ]
[ Hᵀ  0 ]

The algorithm breaks cycles by rearranging the entries in H, without altering the

row and column weight distributions. The experimental results presented in [69] suggest

that removing short cycles becomes more important when operating at high SNR, and

that there is proportionally more to be gained from raising the girth from four to six,

than from six to eight.

Campello et al. have proposed a bit filling approach to constructing LDPC codes [70,

71]. The algorithm generates random codes, allowing girth to be specified as a design

parameter.


2.8.2 Stopping Sets and Extrinsic Message Degree

Consider the case when we are using LDPC codes over the binary erasure channel. Here,

we are able to analyse the behaviour of the decoder using stopping sets. Such analysis

was first introduced by Richardson et al. [61]. Using stopping sets and combinatorial

arguments, Di et al. have developed algorithms for calculating the exact average bit and

block erasure probabilities for regular ensembles of LDPC codes [24]. A stopping set on

the graph of a code is defined as follows.

Definition 2.12 (Stopping Set). A stopping set S is a group of variable nodes, such

that all checks connected to S are connected to S at least twice.

Let the set of erasures made by the channel be denoted E , and those remaining when

the decoder fails be denoted S. When the iterative decoder fails, i.e. arrives at state

S, further iterations will not shift it from this state. The set of erased bits remaining

at this point represents the maximal stopping set from E . We note that the empty set

is a stopping set, and is the maximal stopping set of E when the decoder succeeds. An

example of a stopping set of size five is highlighted in Figure 2.12.

Figure 2.12: A size five stopping set.
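Definition 2.12 can be checked directly by counting, for each check, its edges into the candidate set. A small NumPy sketch with an illustrative H that is not from the thesis:

```python
import numpy as np

def is_stopping_set(H, S):
    """Definition 2.12: S is a stopping set if every check connected to S is
    connected to S at least twice (checks with no edge into S are ignored)."""
    counts = H[:, sorted(S)].sum(axis=1)   # edges from each check into S
    return bool(np.all((counts == 0) | (counts >= 2)))

# Illustrative 3 x 3 parity-check matrix.
H = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])
partial = is_stopping_set(H, {0, 1})    # check 2 touches the set only once
full = is_stopping_set(H, {0, 1, 2})    # every check touched at least twice
empty = is_stopping_set(H, set())       # the empty set is a stopping set
```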

In an analysis aimed at lowering the error floor for irregular codes, Tian et al. have

related stopping sets to cycles, and to the linear dependence of columns in H [25]. In a

graph where every variable has degree two or more, a stopping set must contain at least

one cycle. They have introduced the following code design metric.

Definition 2.13 (Extrinsic Message Degree (EMD)). The extrinsic message degree

of a variable node set is the number of checks that have only one edge connection into the

set.
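The EMD of a set is likewise a per-check edge count. A small NumPy sketch (the matrix and function name are illustrative, not from the thesis):

```python
import numpy as np

def emd(H, S):
    """Definition 2.13: the extrinsic message degree of the variable set S is
    the number of checks with exactly one edge connection into S."""
    counts = H[:, sorted(S)].sum(axis=1)
    return int(np.sum(counts == 1))

# Illustrative 3 x 3 parity-check matrix.
H = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])
e1 = emd(H, {0, 1})       # check 2 contributes the single extrinsic edge
e0 = emd(H, {0, 1, 2})    # a stopping set has EMD zero
```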


They also present an efficient code construction algorithm which ensures that, for all

cycles up to a chosen length, the set of variables participating in such a cycle has some

minimum EMD. Their empirical results show that the error floor for irregular codes can

be significantly improved using this technique.

2.8.3 Graph Expansion

The above views imply that strong local connectivity in the factor graph can adversely

affect iterative decoder performance. It is therefore desirable that the graph have a good

expansion property. A set of nodes in a graph are said to expand well if they have a

large number of neighbours. More specifically, the expansion of a graph is characterised

as follows [17].

Definition 2.14 (Graph Expansion). Consider a set of nodes S in a graph with node set N, and let R denote the set of nodes neighbouring S. Every set of at most p nodes expands by a factor ζ if |R| ≥ ζ|S| for every set S ⊆ N such that |S| ≤ p.

We label a graph in which all nodes have degree j, as j-regular. A common metric

for testing the expansion of a j-regular graph follows [72, 73].

Definition 2.15 (Normalised Spectral Gap). Consider a connected j-regular graph

GA, with a total of p nodes, having adjacency matrix A (Def. 2.11). As A is symmetric

it has real eigenvalues, which we order µ1 ≥ µ2 ≥ · · · ≥ µp. Since GA is connected and

regular, µ1 has multiplicity one and |µ1| = j. We define the normalised spectral gap of

GA as µδ ≜ (j − |µ2|)/j.

If the normalised spectral gap is large, then the graph is a good expander. It is

well known from the Ramanujan bound that |µ2| ≥ 2√(j − 1) in the limit of infinite graph size [73]. Therefore, for large j-regular graphs, we expect that the best normalised spectral gap attainable will be bounded by µδ ≤ (j − 2√(j − 1))/j. A graph which achieves

this bound is said to have optimal expansion. A randomly generated regular graph is

likely to be a good expander [74]. Explicit construction of codes using graphs with good

expansion has also been investigated [17, 67, 75–78].
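Definition 2.15 can be evaluated numerically from the eigenvalues of A. A small NumPy sketch, illustrated on the complete graph K4, a 3-regular graph with eigenvalues 3, −1, −1, −1 (this example graph and the function name are mine, not from the thesis):

```python
import numpy as np

def normalised_spectral_gap(A, j):
    """mu_delta = (j - |mu_2|)/j, where mu_2 is the second-largest eigenvalue
    of the symmetric adjacency matrix A of a connected j-regular graph."""
    mu = np.sort(np.linalg.eigvalsh(A))[::-1]   # mu_1 >= mu_2 >= ... >= mu_p
    return (j - abs(mu[1])) / j

# Complete graph K4: all-ones matrix minus the identity.
A_K4 = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)
gap = normalised_spectral_gap(A_K4, 3)   # (3 - |-1|)/3 = 2/3
```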

2.8.4 Near-Codewords

MacKay and Postol introduced the concept of near-codewords in [79], as follows.


Definition 2.16 (Near-Codeword). A (w, v) near-codeword is a vector e with weight w(e) = w, such that its syndrome z(e) = He has weight w(z(e)) = v.

A near-codeword e having low weight w and low syndrome weight v can lead to

convergence problems, and hence detected errors, for iterative decoding via belief propa-

gation on the AWGN channel [79]. We note the distinction between the decoding process

arriving at a near-codeword versus it arriving at a wrong codeword. The latter represents

an undetected error which may be attributed to a poor minimum distance property of

the code. In contrast near-codewords represent error states related to the structure of the

code’s factor graph, such as the connection of short cycles. If the decoder arrives at such

an error state, it is likely to remain there. The structure of a near-codeword is generally

different to that of a stopping set. A (w, v) near-codeword typically has v checks that are

connected into the set of w variables only once, giving the set an EMD of v, compared to

zero for a stopping set. However, this is not always the case, e.g. the stopping set shown

in Figure 2.12 also corresponds to a (5, 1) near-codeword.
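The (w, v) parameters of Definition 2.16 are two weight computations. A small NumPy sketch, using an illustrative column-weight-3 circulant matrix (mine, not from the thesis) to show that a single-one vector is a (1, 3) near-codeword, as discussed later for (j, i)-regular codes:

```python
import numpy as np

def near_codeword_params(H, e):
    """Return (w, v) for a candidate near-codeword e: its Hamming weight w
    and the weight v of its syndrome z(e) = H e (mod 2), per Definition 2.16."""
    w = int(np.sum(e))
    v = int(np.sum(H.dot(e) % 2))
    return w, v

# Illustrative circulant H with column weight 3.
H = np.array([np.roll([1, 1, 0, 1, 0, 0], r) for r in range(6)])
e = np.zeros(6, dtype=int)
e[0] = 1
w, v = near_codeword_params(H, e)   # a single one gives syndrome weight 3
```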

2.8.5 The Computation Tree

As discussed in Section 2.5.1, the factor graph representation of a code provides a recipe

for constructing a message-passing decoder. We may consider any variable node as the

root of a tree which represents the unwrapped factor graph. This tree, which we call

the computation tree, was introduced by Wiberg [15] to model iterative message-passing

decoding.

Definition 2.17 (Computation Tree). The computation tree associated with a factor

graph G is a singly connected bipartite graph, rooted at some variable node vs, such that

s ∈ {1, . . . , n} for a code of length n. The computation tree of depth d is formed by

recursively unwrapping G, as follows.

1. Initialise the tree to be the variable root node vs, such that s ∈ {1, . . . , n}.

2. For each leaf node l in tier t of the tree, identify the corresponding node lG in G.

Label the parent node of l in the tree p. Let Γ (lG) \pG represent the set of nodes

adjacent to lG in G, excluding the node pG in G corresponding to the parent node p.

Create nodes at tier t + 1 of the tree, by attaching nodes representing each member

of this set to the leaf.

3. Repeat Step 2 a total of 2(d − 1) times.


In general, each node in the code’s factor graph has multiple instances in the tree.

This view allows us to analyse iterative decoding on a graph that is free of cycles. Figure 2.13 shows an example computation tree corresponding to the factor graph shown in

Figure 2.3. In this case the root node is v1 and the tree has been expanded to depth

d = 3. Variable (bit) nodes are represented on the tree by circles and check (constraint)

nodes by boxes.


Figure 2.13: Computation tree example.

We consider the local parity-check constraints for each check node in the factor graph

to be mapped onto all corresponding check nodes in the computation tree. A configuration, i.e. an assignment of values from {0, 1} to all variable nodes in the computation tree, is called

valid if all local parity-check constraints in the tree are satisfied. Valid configurations

on the computation tree represent pseudo-codewords. Analysis using pseudo-codewords

and pseudo-weight on the computation tree has been developed in [41, 42, 65]. We further

explore the concepts of pseudo-codewords and pseudo-weight in the following section.

2.8.6 Finite Graph-Covers

Explaining the behaviour of iterative decoding algorithms on the AWGN channel is a

challenging task. For the BEC case, stopping set analysis allows us to predict the final

state of the decoding process exactly, given the set of channel erasures and the code struc-

ture. The AWGN channel case is not as well understood. Here, error events can have


different characteristics. It is easy to see that poor minimum distance will lead to unde-

tected errors. However, accounting for the case when the decoder fails to converge, i.e. a

detected error, is more difficult. It has become common to employ stopping set analysis,

in conjunction with the argument that BEC performance is related to performance on

the AWGN channel; however, this approach is only qualitative. Empirical analysis using

near-codewords appears more appropriate in the presence of AWGN, as it allows us to

account for problematic states during the decoding process. However, we do not expect

all near-codewords to cause problems. For example, when considering a (j, i)-regular

(n, k) code, a vector of length n containing a single one has a syndrome of weight j.

Thus it falls under the definition of a near-codeword. However, such a near-codeword is

present for every code, and we would not necessarily expect it to be problematic [27].

Kotter and Vontobel have recently introduced a new technique for analysing the

iterative decoding process for finite length codes [27, 28]. We now review their approach,

which is based upon finite graph-covers.

Let G denote the factor graph of a code C. We obtain an m-fold cover of G, denoted G̃, by replicating each node in the graph m times, and then adding edges so that the original local adjacency relationships are preserved (for an exact definition see [27]). Figure 2.14 shows the graph, G, for the trivial code C = {(0, 0), (1, 1)}, and an example 2-fold (double) cover, G̃.

(a) The graph G of C, with variable nodes v1 and v2. (b) An example double cover of G, with nodes v1,1, v1,2, v2,1, v2,2.

Figure 2.14: A simple graph G and a double cover.

Note that the cover G̃ is not unique. Moreover, the number of valid covers grows quickly

with m. We may consider the general case of an m-fold cover of G, by replicating each

node m times and then permuting edge connections, as shown in Figure 2.15.

A locally operating iterative decoder cannot distinguish between G and G̃. Moreover, G̃ is itself the factor graph for a code C̃, having length mn. Every codeword x ∈ C also



Figure 2.15: An m-fold cover of G.

has a valid representation on G̃. We may obtain a valid x̃ by assigning the value of each variable from x to each occurrence of it in the cover. We term this process lifting, and use the symbol ˜ to distinguish objects relating to the cover from those in the underlying graph. For the above double cover example, C is lifted to {(0, 0, 0, 0), (1, 1, 1, 1)}. However, there will be configurations on G̃ for which nodes corresponding to the same variable in G assume different values. In the above example, G̃ also supports the configurations x̃ = (1, 0, 0, 1) and x̃ = (0, 1, 1, 0). Hence, we require a means of characterising x̃ in n-dimensional space. For this we have the following definition of a pseudo-codeword, in

the context of finite graph-covers [27].

Definition 2.18 (Pseudo-Codeword). Consider a codeword x̃ ∈ C̃ and let ωs represent the fraction of times that variable vs ∈ G assumes the value 1 on G̃,

ωs(x̃) ≜ |{l : x̃s,l = 1}| / m,   where 0 ≤ ωs(x̃) ≤ 1.

The vector ω = ω(x̃) = (ω1(x̃), ω2(x̃), . . . , ωn(x̃)) is a pseudo-codeword of C.

Assume that the BPSK-mapped all-zero codeword, i.e. the all-ones vector 1, is transmitted and that the vector y is received across an AWGN channel. We define a point, ωBPSK, representing the

pseudo-codeword in n-dimensional signal space, as follows [15].


Definition 2.19 (Generalised BPSK Mapping). For a BPSK mapping M(0) = +1, M(1) = −1, on an AWGN channel, the generalised BPSK mapping of the nonzero pseudo-codeword ω is

ωBPSK ≜ 1 − 2 (∑s ωs / ∑s ωs²) ω

A decision boundary is formed in n-dimensional signal space, between 1 and ωBPSK .

The decision made by the maximum-likelihood decoder will be governed by which side

of this boundary y lies. While such analysis is, in general, not exact for the iterative

sum-product decoder, it can provide a very good prediction. The pseudo-weight of ω

determines the position of the decision boundary [15].

Definition 2.20 (Pseudo-Weight). The pseudo-weight of a nonzero pseudo-codeword ω on the AWGN channel is

wp(ω) ≜ (∑s ωs)² / ∑s ωs²
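Definitions 2.18 and 2.20 can be computed directly. A small NumPy sketch, using the double-cover configuration x̃ = (1, 0, 0, 1) from the text (the function names are mine; the m copies of each variable are assumed stored consecutively in the lifted vector):

```python
import numpy as np

def pseudo_codeword(x_lift, m, n):
    """omega_s: fraction of the m copies of variable v_s set to 1 in the lifted
    configuration (Definition 2.18)."""
    return np.asarray(x_lift).reshape(n, m).sum(axis=1) / m

def pseudo_weight(omega):
    """AWGN pseudo-weight of a nonzero pseudo-codeword (Definition 2.20)."""
    omega = np.asarray(omega, dtype=float)
    return omega.sum() ** 2 / (omega ** 2).sum()

# Double cover (m = 2) of the length n = 2 code: the configuration
# (1, 0, 0, 1) supported by the cover but not corresponding to a codeword.
omega = pseudo_codeword([1, 0, 0, 1], m=2, n=2)   # (0.5, 0.5)
wp = pseudo_weight(omega)                         # 1.0^2 / 0.5 = 2.0
```

Note that for a lifted codeword, ω is {0, 1}-valued and the pseudo-weight reduces to the Hamming weight.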

We note the importance of the minimum pseudo-weight, wp^min, which is the lowest pseudo-weight of any valid pseudo-codeword. This property is analogous to the minimum distance of the code, dmin. However, where dmin is a function purely of the code, wp^min

considers the factor graph structure and operation of the iterative decoding algorithm.

To analyse the effect of a valid configuration x̃ on the cover G̃, we consider length mn vectors ỹ and λ̃, which represent lifted versions of the received vector and corresponding log-likelihood vector. We assume transmission of the all-zero codeword 0, with lifted representation 0̃. For BPSK mapping M(0) = +1, M(1) = −1 on the AWGN channel, we have λ̃ = 2ỹ/σ². Hence the LLRs represent the received values, scaled by a positive

constant. For this case, Proposition 2.1 dictates the decision that will be made by the

decoder [27].

Proposition 2.1 (Kotter and Vontobel). The decoder makes a decision between the all-zero word 0 and any other valid configuration x̃ on the graph-cover. Given the received vector y ⇒ ỹ and corresponding LLR vector λ ⇒ λ̃, the decision made relates to the pseudo-codeword ω = ω(x̃) as follows:

p(x̃|λ̃) > p(0̃|λ̃) ⇐⇒ 〈ω(x̃), λ〉 < 〈ω(0̃), λ〉 ⇐⇒ 〈ω(x̃), y〉 < 0

Here the vector inner product is 〈a, b〉 = ∑s asbs.


Recall that xs ∈ {0, 1}, and that the indicator functions on G test the validity of x. A similar means of testing pseudo-codeword validity is provided by modifying the definition of the local indicator functions in G. The new indicator functions allow variable assignments ωs ∈ [0, 1].

The code C contains a discrete set of 2k binary codewords. However, the set of valid

pseudo-codewords is dense in the fundamental polytope, which is the convex hull in Rn

defined by the set of pseudo-codeword indicator functions [28].

Definition 2.21 (Pseudo-Codeword Indicator Function). Consider a check cr, cor-

responding to row r of H. Denote the total set of variable indices, i.e. the columns of

H, as S = {1, . . . , n}. Let Sr denote the indices of all variables connected to cr, i.e.

Sr = {s ∈ S|[H]r,s = 1}, and let Sr\p denote the set excluding p. Given a vector ω,

assign the value ωs to each variable vs, for all s ∈ S. The local indicator function for cr

is,

Ir =

1 if ∑s∈Sr\p ωs ≥ ωp for all p ∈ Sr,
0 otherwise.

The global indicator function is the product of all local indicator functions in the

graph. If it has value one for a given assignment ω, then ω is a valid pseudo-codeword.

We may use canonical completion to build a valid pseudo-codeword as follows [27].

Definition 2.22 (Canonical Completion). Consider a breadth-first spanning tree [80]

of the (j, i)-regular graph G, rooted at some arbitrary variable node vr. Construct the

pseudo-codeword ω, such that all variables vc at distance d edges from vr, are assigned

the value

ωc = 1 / (i − 1)^(d/2)

Canonical completion always generates a valid pseudo-codeword, the weight of which

provides an upper bound on wp^min. This simple technique demonstrates an important fact. In the limit, as we increase the block length of a (j, i)-regular code to infinity, wp^min vanishes. This stands in sharp contrast to the good minimum distance property

of randomly generated LDPC codes discussed in Section 2.3. By modifying the process

slightly such that it is rooted at a set of variables, we are able to quantitatively assess

the effect of near-codewords. This technique is employed in Section 4.4.4.
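Canonical completion is essentially a breadth-first traversal of the factor graph. A small NumPy sketch, assuming a connected graph; the (3, 3)-regular circulant example is mine, not from the thesis:

```python
import numpy as np
from collections import deque

def canonical_completion(H, root, i):
    """Canonical completion (Definition 2.22): breadth-first traversal of the
    factor graph of H from variable node `root`, assigning each variable at
    distance d edges the value 1/(i - 1)^(d/2). Assumes a connected graph."""
    dist = {('v', root): 0}
    queue = deque([('v', root)])
    while queue:
        kind, idx = queue.popleft()
        if kind == 'v':   # variable -> adjacent checks (rows of H)
            nbrs = [('c', r) for r in np.flatnonzero(H[:, idx])]
        else:             # check -> adjacent variables (columns of H)
            nbrs = [('v', s) for s in np.flatnonzero(H[idx])]
        for nb in nbrs:
            if nb not in dist:
                dist[nb] = dist[(kind, idx)] + 1
                queue.append(nb)
    n = H.shape[1]
    # Variable-to-variable distances in a bipartite graph are always even.
    return np.array([1.0 / (i - 1) ** (dist[('v', s)] // 2) for s in range(n)])

# Illustrative circulant H with row and column weight 3: here every other
# variable shares a check with v0, so each receives 1/(i - 1) = 1/2.
H = np.array([np.roll([1, 1, 0, 1, 0, 0], r) for r in range(6)])
omega = canonical_completion(H, root=0, i=3)
```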


2.9 Developments in LDPC Code Design

Since the introduction of regular binary LDPC codes, several modifications have been

proposed which can offer improved performance. We now review these approaches, some

of which have been considered breakthroughs within the coding community.

2.9.1 Irregular Constructions

By relaxing the regularity design constraint and allowing irregular node degree sequences,

we can improve upon the performance of Gallager’s original LDPC constructions.

Luby et al. introduced a class of irregular codes which exhibit considerable improve-

ment in performance over regular constructions [21]. Some example irregular construc-

tions have also been investigated by MacKay et al. [32].

Richardson et al. have shown how to determine convergence thresholds [35], and

optimise irregular degree distributions [22], using density evolution. A simplified Gaus-

sian approximation to density evolution has been provided by Chung et al. [81]. The

performance of long codes constructed according to these degree sequences approaches

capacity. For example, a block length n = 10^7 irregular LDPC code can achieve performance within 0.04 dB of the Shannon bound [23]. The technique is based upon the local

tree assumption, that the graph girth (Def. 2.9) will be large enough to sustain cycle free

local subgraphs during decoding. Hence the results degrade as block length is decreased.

Irregular codes also exhibit a higher error floor property than regular codes [32], and this

has been addressed by Tian et al. [25].

A method for selecting good degree distributions, using curve fitting on extrinsic

information transfer (EXIT) charts, has been presented by ten Brink and Kramer [82].

They also propose a technique for integrating the LDPC coding scheme with a modulator

and detector.

2.9.2 Non-Binary Construction

Davey and MacKay have generalised the LDPC coding scheme to build codes which use

symbols coming from finite fields of the form Fq, where q = 2^b for some b ∈ Z+ [83]. The

iterative decoder is appropriately modified to deal with messages of higher cardinality.

Codes built over F4, F8 and F16, exhibit significant empirical improvement over the

performance of binary codes.


Chapter 2. Low-Density Parity-Check Codes

They have also demonstrated that incorporating irregular construction into these

code designs leads to further performance improvements [84].

2.9.3 Algebraic LDPC Constructions

Algebraic code construction has several potential advantages over the random approach.

From a theoretical perspective, we can often say more about code properties, such as

minimum distance, for algebraic constructions. This stands in contrast to the fact that

most of the results for random codes pertain to ensemble averages. We can incorporate

algebraic conditions, e.g. on girth (Def. 2.9), into the design process. Moreover, we can

provide concise descriptions for algebraic codes. For example, an algebraic code can often

be described by a polynomial which defines H, rather than requiring a verbose description

of H. From an implementation perspective they offer deterministic placement of edges

in the factor graph representation of the code, thus reducing the complexity of circuit

routing.

As a pioneer in the area, Margulis presented explicit code designs by constructing Ramanujan graphs [8, 75]. Independently, Lubotzky et al. explored a similar approach [76]. Rosenthal and Vontobel have since extended this approach to build LDPC

codes which have a good expansion property (Def. 2.14) [67, 78].

Lucas et al. [51] and Kou et al. [52] have built LDPC codes based upon finite

projective geometries. These codes have good minimum distance and have been shown to

outperform Gallager codes. However, they have girth limited to six. Vontobel and Tanner

have proposed alternate designs, based upon generalised quadrangles [85], to which this

constraint does not apply.

The above codes fall under the category of partial geometry constructions, for which

Johnson and Weller have recently presented some new results [86]. They have also in-

vestigated the combination of algebraic constructions with irregular graphs [55] and non-

binary fields [87].

For further information regarding the algebraic construction of sparse graph codes

see [67] and the references therein.

2.10 Summary

Coding theory has undergone a paradigm shift in recent years, with the introduction of

iterative decoding and codes on graphs. The low-density parity-check codes introduced


by Gallager offer good error correcting performance. Since their rediscovery, several mod-

ifications have been proposed which further improve their performance. There are now

many impressive empirical results in the literature, especially in the case of long codes.

However, there is still work to be done in the area of analysing iterative decoding, and

in designing good codes of short and medium block length. The finite graph-cover anal-

ysis discussed in this chapter represents a recent step toward the analytical tools coming

into line with empirical results. This analysis accounts for the structure of a particular

parity-check matrix and considers the operation of the iterative decoding process. In the

case of iterative decoding, it has been shown that the traditional metric of code minimum

distance is less significant than the minimum pseudo-weight associated with the parity-

check matrix. Perhaps the most exciting step is still yet to be made, i.e. the design of

finite length codes with high minimum pseudo-weight.

Several techniques have been suggested to improve the performance of iterative de-

coding when the factor graph of a code has cycles. We may either constrain (or transform)

the parity-check matrix so that it does not have short cycles, or modify the decoding al-

gorithm so that it accounts for the presence of short cycles. New tools for analysing

LDPC codes are likely to assist in the development of both approaches.

Encoding LDPC codes is, in general, not a trivial problem. Several approaches have

been suggested to date, either based upon specific code designs, or for the transformation

of an existing code. These approaches may be separated into two categories. Firstly, codes

may be designed (or transformed) to exploit back substitution through either a triangular

(or approximately triangular) structure of the parity-check matrix. Secondly, cyclic and

quasi-cyclic structures may be encoded using a simple shift register approach, in a similar

manner to that used to encode turbo codes. Both techniques are based upon a serial

encoding operation, with computational latency that scales linearly with code length. In

the following chapter we introduce a novel approach to iterative encoding which reuses

the decoder. Moreover, the proposed approach allows encoding to be performed using a

parallel architecture. For this new technique, the number of iterations required to encode,

and hence latency, is fixed as we scale code length.

There are several challenges that must be faced in order to build an LDPC decoder

circuit which satisfies the design criteria introduced in Section 1.2. In Chapters 6 and 7 of

this thesis we see how such challenges may be met through the recent suggestion of using

analog circuits for iterative decoding. Finally, we propose a novel codec architecture and

outline the advantages offered by this circuit in terms of the codec design criteria.


Chapter 3

Iterative Encoding of LDPC Codes

3.1 Introduction

Inspired by the principle of iterative decoding, in this chapter we investigate the use of

iterative techniques for encoding LDPC codes. A previous study of high rate LDPC codes

with short block length [40] has shown that randomly generated codes (decoded using a

practical soft decision sum-product decoder) outperform comparable Reed-Solomon (RS)

codes (decoded by a hard input decoder). RS codes, however, possess an advantage over

randomly constructed LDPC codes, in that an RS erasure decoder can also be used for

encoding [1].

Motivated by this idea of decoder re-use, and the potential to encode on the factor

graph [9], we develop a class of reversible LDPC codes. We aim not only to provide an

encoding technique which has time complexity that varies linearly with block length but

more specifically one which reuses the sum-product message-passing decoder architecture

described in Section 2.5.1.

By reusing the decoder architecture for encoding, both operations can be performed

by the same circuit on a time switched basis [88]. The utilisation of area within practical

LDPC decoder implementations to date has been limited by routing congestion [89].

The area required for wire routing in a dedicated LDPC encoder implementation is also

significant. Thus, by eliminating the need for a separate dedicated encoder we aim to

reduce the overall size of the circuit.

Another benefit of reusing the decoder for encoding is that we reduce the number

of individual components that must be tested. Hence, the overall burden of system

verification is reduced. This is of particular interest for codec circuit implementation.

The output of the encoder is deterministically defined by the input information vector.


Therefore an error in routing, or similar, will be exposed quickly when testing the encoder.

In contrast, it is the nature of the decoder to correct errors, and testing the decoder via

circuit simulation can be an arduous task. Hence an implementation error, such as that

described in [90], can be hard to detect. By using the same circuit to perform both tasks,

the encoder implicitly provides a further level of verification for the decoder.

We present a parallel encoding algorithm, which employs an adaptation of the Jacobi

method for iterative matrix inversion. The standard Jacobi method, which operates over

R, is modified so that it instead operates over F2. We propose an algebraic construction

for reversible LDPC codes built from circulant matrices and show how the overlap con-

straint (Def. 2.10) may be viewed algebraically. Using these results we then develop an

algorithm for building iteratively encodable, 4-cycle free circulant matrices, suitable for

use in (3,6)-regular and high rate reversible codes. The codes have parity-check matrices

which combine circulant and random components, in a similar manner to that presented

by Bond et al. [91, 92]. However, we use different criteria for selecting the circulant

components, such that the codes are iteratively encodable.

As discussed in Section 2.7.1, we consider a binary systematic (n, k) code with code-

words arranged as row vectors x = [xp | xu], where xu are the information bits and xp are

the parity bits. Likewise the parity-check matrix is partitioned such that H = [Hp | Hu].

Thus x is a codeword iff [Hp | Hu][xp | xu]⊤ = 0, or equivalently Hp xp⊤ = Hu xu⊤. We maintain our previous definition of b ≜ Hu xu⊤. Encoding then becomes equivalent to solving

Hp xp⊤ = b. (3.1)

Therefore, for an m × m non-singular Hp, the parity bits satisfy xp⊤ = Hp^{-1} b.

We investigate iterative solution methods for (3.1) and the corresponding convergence

criteria and constraints imposed on Hp. The idea behind using the code constraints

to perform encoding on the graph was originally suggested by Tanner [9]. The work

presented here forms a link between this concept and classical iterative matrix inversion

techniques, allowing the design of good codes that encode quickly.

This chapter contains original work, which was presented in part at the 2002 IEEE

Globecom conference, in Taipei, Taiwan [93]. The approach and code design techniques

were respectively presented at the 2002 and 2004 Australian Communications Theory

Workshops [94, 95].


3.2 The Sum-Product Encoder

As discussed in Section 2.7.1, if Hp is upper triangular then encoding (i.e. solving (3.1))

may be performed in m steps by simply performing back substitution. This implies a

solution for each of the parity bits in a particular order. Upper triangular matrices are

therefore of interest. For any upper triangular A with elements from F2,

A non-singular ⇐⇒ diag A = I (3.2)

(since the diagonal elements are the eigenvalues, none of which may be 0). Let T be the

set of all binary non-singular m×m matrices that may be made upper triangular, using

only row and column permutations.

The message-passing erasure decoder, introduced as Algorithm 2.3 in Section 2.5.2,

can be used for encoding certain types of LDPC codes, as we shall now show.

Theorem 3.1. Let binary A ∈ T and b be given. Algorithm 2.3 solves Ax = b in at

most m iterations, without regard to the actual order of node updates.

Proof. Without loss of generality assume that A is upper triangular. Obtain bM ∈ {−1, +1}^m from b, using M(0) = +1, M(1) = −1. Recall that Algorithm 2.3 employs real arithmetic, and returns a vector xM ∈ {−1, +1}^m. The solution, i.e. the binary form

of x, is obtained using the inverse mapping.

Construct the bipartite graph with variable nodes vs connected to checks cr according

to A. Also connected to each check cr is the additional variable node v′r. Initialise the

nodes (Step 1) v′s = M(bs) ∈ {−1, +1} with values from bM, and set all vs = 0.

Call vs active if at least one λcr→vs ≠ 0. An active vs is correct if λcr→vs ∈ {sgn(M(xs)), 0} for all adjacent checks cr ∈ Γ(vs). For any correct vs, sgn(λvs→cr) ∈ {sgn(M(xs)), 0}. In the case that every node is either correct or not active, nodes can only be made correct, left correct, or left inactive at the next Step 3, since each new λcr→vs ∈ {M(xs), 0}.

After the first Step 3, v1 will be correct (since the only nonzero incoming message

will be M(b1)). Similarly, any other nodes activated will be correct.

Assume there is a set of correct nodes C, such that |C| ≥ 1, and that every node

v ∉ C is inactive. It remains only to show that at least one correct node is created at the

next Step 3. This is true since there will exist an integer r ≥ 1 such that v1, . . . , vr are

correct. At the next Step 3, v_{r+1} ∉ C will be correct since, by A triangular and (3.2),


cs ∈ Γ(vs) and cr ∉ Γ(vs) for r < s. Likewise, vr ∈ Γ(cr) and vs ∉ Γ(cr) for s > r. The

induction requires at most m steps before every node is correct.

Hence, if Hp ∈ T , we may perform encoding by applying Algorithm 2.3 to (3.1),

initialising the variables representing xu with ±1 and those representing xp with 0. The

idea of Theorem 3.1 is certainly not new, but we have not seen it made explicit.

The number of iterations required for convergence may be greatly reduced below the

upper bound of m for LDPC codes, as they are represented by sparse matrices. It is

possible to design Hp ∈ T using a tiered approach, similar to that described in [32]. In

this construction, the parity bits for one or more tiers will be evaluated at each iteration,

and therefore the total number of iterations may be set by the designer.

The selection of Hu is always arbitrary with respect to the sum-product encodability

of H.
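Theorem 3.1 in effect says that, for Hp ∈ T with unit diagonal, the erasure decoder performs back substitution. A minimal sketch of that encoding step, assuming a small hypothetical Hp (encode_back_substitute is an illustrative name, not from the thesis):

```python
import numpy as np

def encode_back_substitute(Hp, b):
    """Solve Hp @ x = b over F2 for upper-triangular Hp with unit diagonal."""
    m = Hp.shape[0]
    x = np.zeros(m, dtype=np.uint8)
    for r in range(m - 1, -1, -1):                 # last parity bit first
        acc = int((Hp[r, r + 1:] & x[r + 1:]).sum()) & 1
        x[r] = b[r] ^ acc
    return x

# Hypothetical 4x4 parity partition Hp and syndrome b.
Hp = np.array([[1, 1, 0, 1],
               [0, 1, 1, 0],
               [0, 0, 1, 1],
               [0, 0, 0, 1]], dtype=np.uint8)
b = np.array([1, 0, 1, 1], dtype=np.uint8)
xp = encode_back_substitute(Hp, b)
assert np.array_equal(Hp @ xp % 2, b)              # valid parity bits
```

Each pass of the loop resolves one parity bit, mirroring the at-most-m iterations of Theorem 3.1; a sparse Hp lets many bits resolve in the same message-passing iteration.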

3.3 Encoding via Iterative Matrix Inversion

Having reduced the encoding problem statement to one of matrix inversion, it is natural

to wonder whether classical iterative matrix inversion techniques, such as those described

in [96], can be applied in the case that A ∉ T.

Suppose we wish to solve Ax = b. Split A according to A = S − T. We can then

write Sx = Tx + b, and try the iteration

Sx_{κ+1} = Tx_κ + b (3.3)

for some initial guess x_0. In order to compute x_{κ+1} easily, S should be easily invertible. The Gauss-Seidel method chooses S triangular, so for A ∈ T, we see that the

method of the previous section actually implements Gauss-Seidel (in this case simply

back-substitution). The classical Jacobi method for real matrices chooses S = diag A

and converges for any initial guess provided that the spectral radius, i.e. magnitude of

the largest eigenvalue, of the matrix S^{-1}T is less than 1. We will consider the use of this

method for F2 matrices, necessitating different convergence criteria.

In order for S to be invertible, all values along its diagonal must be nonzero. Over

F2 this implies that S = I and diag A = I. Hence (3.3) becomes

x_{κ+1} = (A + I)x_κ + b (3.4)


Theorem 3.2. For arbitrary x_0, the iteration (3.4) with all matrices and vectors over F2, yields x_{κ′} = A^{-1}b for κ′ ≥ κ iff (A + I)^κ = 0.

Proof. Let the error term at iteration κ be e_κ = (x − x_κ). Subtracting x_{κ+1} = Tx_κ + b from x = Tx + b gives e_{κ+1} = Te_κ. So e_κ = T^κ e_0, where e_0 is the error of the initialisation x_0. Hence the error term vanishes for iterations κ′ ≥ κ if T^κ = 0. Conversely, if T^κ ≠ 0 for all κ, the algorithm will fail to universally converge, since the error will be zero only if e_0 is in the null space of T^κ, which cannot be guaranteed independently of the initial guess.
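Iteration (3.4) is immediate to implement with matrix arithmetic modulo 2. A minimal sketch, assuming a small hypothetical A for which (A + I)^3 = 0 over F2 (jacobi_f2 is an illustrative name):

```python
import numpy as np

def jacobi_f2(A, b, num_iter):
    """Iterate x <- (A + I)x + b over F2, from the all-zero initial guess."""
    m = A.shape[0]
    T = (A + np.eye(m, dtype=np.uint8)) % 2
    x = np.zeros(m, dtype=np.uint8)
    for _ in range(num_iter):
        x = (T @ x + b) % 2
    return x

# Hypothetical upper-triangular A: T = A + I is nilpotent with T^3 = 0,
# so by Theorem 3.2 the iteration converges within three steps.
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=np.uint8)
b = np.array([1, 0, 1], dtype=np.uint8)
x = jacobi_f2(A, b, 3)
assert np.array_equal(A @ x % 2, b)                # x solves Ax = b over F2
```

Note that every coordinate is updated simultaneously in each iteration, which is what makes the method attractive for a parallel architecture.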

Denoting the eigenvalue of S^{-1}T with the largest magnitude as µ1, the convergence

of the Jacobi method over R is governed by the spectral radius |µ1|. More precisely, the

rate of convergence is asymptotic, becoming slower in the limit |µ1| → 1−. In contrast,

we can say less about the rate of convergence for the Jacobi method over F2. The above

theorem implies only that the method either converges after a finite number of steps, or

that it never converges.

Based on Theorem 3.2, we can in principle construct codes that are iteratively en-

codable in κ iterations using (3.4) by selecting Hp such that

(Hp + I)^κ = 0 (3.5)

We label these codes according to the following definition.

Definition 3.1 (Reversible Code). A reversible code is iteratively encodable using the

Jacobi method over F2.

We also say that such codes are Jacobi encodable. It is interesting to note that the

codes with Hp ∈ T mentioned in the last section are also Jacobi encodable.

Theorem 3.3. Any code with upper triangular Hp is Jacobi encodable over F2.

Proof. Let T = Hp + I. Hence diag T = 0. Each successive power of T will therefore

be upper triangular, with its first nonzero entry of each row occurring at least one place

later. Thus T^κ = 0 for some κ.
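The nilpotency argument can be checked numerically: for any upper-triangular Hp with unit diagonal, T = Hp + I is strictly upper triangular and T^m = 0 over F2. A quick sketch with a randomly drawn hypothetical matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 6
Hp = np.triu(rng.integers(0, 2, (m, m), dtype=np.uint8), k=1)
np.fill_diagonal(Hp, 1)                      # upper triangular, unit diagonal
T = (Hp + np.eye(m, dtype=np.uint8)) % 2     # strictly upper triangular
P = np.linalg.matrix_power(T.astype(np.int64), m) % 2
assert not P.any()                           # T^m = 0: Jacobi encodable
```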

We may view the Jacobi iteration as message-passing on a bipartite graph formed as

follows. Let variable node vs correspond to xs and let nodes v′r correspond to br. The vs

are connected to checks cr according to A and the v′r are connected to cr. This is the same


Algorithm 3.1: Message-Passing Jacobi Method Over F2

1. Initialisation:
   Set all vs = +1 and v′r = br.
2. Variable → Check:
   Send µvs→c = vs to all c ∈ Γ(vs) \ cs.
3. Check → Variable:
   From check cr send µcr→vr = ∏_{vs ∈ Γ(cr)\vr} µvs→cr to vr only.
   Let vs = µcs→vs. Return to Step 2.

connection structure as required for sum-product decoding. The Jacobi message-passing

schedule, for a binary mapping M(0) = +1, M(1) = −1, is defined in Algorithm 3.1.

An example of how this algorithm operates on the graph is shown in Figure 3.1.

During each iteration variables may be updated in parallel. For clarity Figure 3.1 shows

only those messages used to update v2.

We note that Algorithm 3.1 has a strong resemblance to the sum-product decoder.

The update process for µcr→vr in the Jacobi method is identical to that used in the update of λcr→vr in the sum-product case, so the decoder architecture may be reused. It is also

worth noting that only one operation per node needs to be performed in each step of

the Jacobi method, compared to one per connected edge for each of the nodes in the

sum-product case.

3.4 Reversible LDPC Codes

In the following sections we will demonstrate the use of the F2 Jacobi convergence rule,

to design 4-cycle free codes which are iteratively encodable in κ iterations of the Jacobi

method. We therefore seek a matrix Hp which satisfies (3.5). Once Hp has been obtained

we may then complete H by randomly generating Hu, whilst blocking the introduction

of 4-cycles.

In this chapter we focus on the algebraic construction of reversible LDPC codes by

constructing Hp as an m × m circulant matrix (Def. 2.7). The first row of a circulant

matrix is specified by the polynomial c(x), where the coefficient of x^{s−1} represents the entry in column s. The r-th row of the matrix is then specified by the polynomial p(x) = x^{r−1}c(x) mod (x^m + 1).

The algebra of circulant matrices over F2 is isomorphic to the algebra of polynomials

modulo x^m + 1 having coefficients from F2 [54]. Hence by choosing Hp to be a circulant


[Figure 3.1 depicts the bipartite graph with variable nodes v1–v4, check nodes c1–c4 and auxiliary nodes v′1–v′4; only the messages µv1→c2, µv4→c2, µv′2→c2 and µc2→v2 used to update v2 are shown.]

Figure 3.1: Jacobi algorithm as message-passing.

matrix we may define our constraints in terms of the algebraic manipulation of polyno-

mials. To this end we let h(x) denote the first row polynomial of Hp and then map the

iterative encodability constraint (3.5) into the polynomial domain as

(h(x) + 1)^κ ≡ 0 mod (x^m + 1) (3.6)

We refer to the number of nonzero terms in h(x) as the weight of the polynomial.

We will use j to denote both the weight of the polynomial h(x) and the column weight

of the corresponding circulant matrix.
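The correspondence between circulants and first-row polynomials makes the encodability condition (3.5) straightforward to test in software. A minimal sketch (helper names are illustrative; the small example with m = 4 and nonzero positions {0, 1, 3} demonstrates encodability only, and ignores the overlap constraint enforced in Section 3.4.2):

```python
import numpy as np

def circulant_from_exponents(exponents, m):
    """Circulant over F2 whose first row has ones at the given exponents of h(x)."""
    first = np.zeros(m, dtype=np.uint8)
    first[list(exponents)] = 1
    return np.array([np.roll(first, r) for r in range(m)])

def jacobi_encodable(Hp, kappa):
    """Test the iterative encodability condition (Hp + I)^kappa = 0 over F2."""
    m = Hp.shape[0]
    T = (Hp + np.eye(m, dtype=np.uint8)) % 2
    return not (np.linalg.matrix_power(T.astype(np.int64), kappa) % 2).any()

# h(x) = 1 + x + x^3 with m = 4: g(x) = x + x^3 and
# g(x)^2 = x^2 + x^6 = x^2 + x^2 = 0 mod (x^4 + 1),
# so this circulant is Jacobi encodable in kappa = 2 iterations.
Hp = circulant_from_exponents({0, 1, 3}, 4)
assert jacobi_encodable(Hp, 2)
```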

3.4.1 Building Iteratively Encodable Circulants

We must first develop a technique for generating polynomials which satisfy (3.6). We will

then refine the approach by including the matrix overlap constraint to block the creation

of 4-cycles.


Consider an m × m circulant matrix Hp and let a_s denote the coefficient of x^s in h(x), where s ∈ {0, . . . , m − 1}. We now introduce some further constraints on the choice

of the first row polynomial h(x) in order to simplify the algebra involved in searching for

candidate polynomials.

Lemma 3.1. If p(x) is a polynomial with binary coefficients of the form

p(x) = a_0 + a_1 x + a_2 x^2 + · · · + a_{m−1} x^{m−1}

and κ = 2^y for y ∈ Z+, then

p^κ(x) = a_0 + a_1 x^κ + a_2 x^{2κ} + · · · + a_{m−1} x^{κ(m−1)}

Proof. F2 has characteristic two, hence

p^2(x) = ( Σ_{s=0}^{m−1} a_s x^s )^2 = Σ_{s=0}^{m−1} a_s^2 x^{2s}.

All coefficients are binary and hence a_s^κ = a_s for all s and κ. By recursively substituting p(x) = p^2(x), the lemma holds for all κ such that κ is a power of 2.
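Lemma 3.1 can be sanity-checked with simple polynomial arithmetic over F2. A minimal sketch (illustrative helper names; m = 24 is chosen large enough that reduction modulo x^m + 1 does not merge any of the resulting terms):

```python
import numpy as np

def polymul_f2(p, q, m):
    """Multiply two F2 polynomials (length-m coefficient arrays) mod x^m + 1."""
    out = np.zeros(m, dtype=np.uint8)
    for s in np.flatnonzero(p):
        for t in np.flatnonzero(q):
            out[(s + t) % m] ^= 1          # coefficients add modulo 2
    return out

def polypow_f2(p, kappa, m):
    out = np.zeros(m, dtype=np.uint8)
    out[0] = 1                              # the constant polynomial 1
    for _ in range(kappa):
        out = polymul_f2(out, p, m)
    return out

m, kappa = 24, 4                            # kappa = 2^2, a power of two
p = np.zeros(m, dtype=np.uint8)
p[[0, 2, 5]] = 1                            # p(x) = 1 + x^2 + x^5
lhs = polypow_f2(p, kappa, m)
rhs = np.zeros(m, dtype=np.uint8)
rhs[[0, 2 * kappa, 5 * kappa]] = 1          # a_s x^{kappa s}, per Lemma 3.1
assert np.array_equal(lhs, rhs)             # cross terms all cancel over F2
```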

Henceforth, we restrict κ = 2^y. By Lemma 3.1 the encodability constraint (3.6) reduces to

h^κ(x) ≡ 1 mod (x^m + 1) (3.7)

The implementation of (3.4) is simplified if we have a direct connection for feedback

between variable vr and check cr on the factor graph representation of Hp. For this

reason, we further assume h0 = 1 and define g(x) = h(x) − 1, where g(x) represents the

remaining terms in h(x) once h0 has been assumed. The encodability constraint on g(x)

becomes

g^κ(x) ≡ 0 mod (x^m + 1) (3.8)

It is desirable to have κ as small as possible as it represents the number of iterations

that the encoder will take to converge.

As we are dealing with polynomials that have binary coefficients, g(x) must have an

even weight in order to satisfy (3.8), implying an odd weight for h(x). To construct g(x)

we must therefore select pairs of nonzero coefficients {a_p, a_s}, where p, s ∈ {1, . . . , m − 1} and p ≠ s, such that

a_p x^{κp} + a_s x^{κs} ≡ 0 mod (x^m + 1) (3.9)


We now seek a means of grouping candidate terms for g(x) so that we may choose

pairs which satisfy (3.9). To this end we constrain m such that it is some multiple

of κ, i.e. m = βκ, where β ∈ Z+. We will now show how the terms a_s x^s of h(x) may be grouped into cosets, according to their exponents. Hence we now focus on the set S = {0, 1, . . . , m − 1} of valid choices for coefficient indices, and therefore

exponents of x, in h(x). Note that S also represents the set of columns (indexed from

zero) in the first row of the matrix Hp.

Lemma 3.2. If m = βκ then, under modulo β addition, the set S = {0, 1, . . . , m − 1} will be grouped into the cosets of {0, β, 2β, . . . , m − β}, each of order κ.

Proof. Modulo β addition will group S into β distinct equivalence classes. The equiva-

lence class labelled zero will be {0, β, 2β, . . . ,m − β}. The remaining β − 1 classes then

form cosets of this, having the form {l, l + β, l + 2β, . . . , l + m − β}, where l ∈ {1, . . . , β − 1}. As m is a multiple of β, each coset must have the same size κ = m/β.

The following theorem shows how it is possible to construct g(x) by applying Lemma 3.2

and choosing pairs of terms from within the same coset.

Theorem 3.4. Let κ = 2^y and m = βκ for y, β ∈ Z+. If p, s ∈ S are in the same coset as specified by Lemma 3.2, then x^{κp} + x^{κs} ≡ 0 mod (x^m + 1).

Proof. If p and s are in the same coset then s = p + δβ for some constant δ ∈ Z. Therefore

x^{κp} + x^{κs} = x^{κp} + x^{κ(p+δβ)}
              = x^{κp} + x^{κp} x^{κδβ}
              = x^{κp}(1 + x^{δm}),

where, in the last line, the substitution β = m/κ has been made. By Lemma 3.1 note that

1 + x^{δm} = (1 + x^m)^δ ≡ 0 mod (x^m + 1).

Hence, x^{κp} + x^{κs} ≡ 0 mod (x^m + 1).

By choosing p and s from the same coset and then setting ap = as = 1 we may select

pairs of terms to include in g(x) such that (3.8) is satisfied.
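Lemma 3.2 and Theorem 3.4 are easy to verify computationally. A minimal sketch, assuming the illustrative choice κ = 4 and β = 3 (so m = 12):

```python
def cosets(m, kappa):
    """Group S = {0, ..., m-1} into the beta cosets of {0, beta, ..., m - beta}."""
    beta = m // kappa
    return [list(range(l, m, beta)) for l in range(beta)]

m, kappa = 12, 4                     # beta = 3
C = cosets(m, kappa)
assert C == [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]

# Theorem 3.4: for p, s in the same coset the exponents kappa*p and kappa*s
# agree modulo m, so x^{kappa p} + x^{kappa s} cancels modulo x^m + 1.
for coset in C:
    assert len({(kappa * s) % m for s in coset}) == 1
```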


3.4.2 Enforcing the Overlap Constraint

So far we have a means of selecting candidate terms for h(x) such that it satisfies (3.6).

We now require a means of enforcing the overlap constraint, i.e. blocking 4-cycle creation

in the graphical representation of Hp, as we build h(x).

We must constrain Hp such that the maximum overlap between any two rows is

one. Given that the remaining rows of Hp are cyclic shifts of the first row, the separation

between nonzero elements in the first row is therefore of interest. We now define a test for

4-cycle presence using h(x). We let S denote possible choices for the position of nonzero

terms in h(x) and H ⊂ S contain the chosen positions for nonzero terms in h(x).

Lemma 3.3.¹ A circulant matrix Hp with nonzero elements specified by H = {p, q, s, w} will contain 4-cycles if and only if

(s − w) ≡ (p − q) mod m (3.10)

for some p ≠ q, s ≠ w, p ≠ s and q ≠ w.

Proof. Due to the cyclic nature of Hp, if an overlap occurs it will be replicated through

the matrix and hence we need only to test the first row against all other rows. Assign the

index 0 to the first row of Hp. Row r ∈ {1 . . . m−1} is then a cyclic shift of row 0 modulo

m. In order for an overlap to occur between the first row and row r, it is sufficient that

(s + r) ≡ p mod m and (w + r) ≡ q mod m,

which, under modulo m addition, is equivalent to (3.10). Furthermore, in order for an

overlap to exist between two nonzero terms s and w it is necessary that

(s + r) ≡ w mod m and (w + r) ≡ s mod m,

which, under modulo m addition, reduces to the specific case of the initial statement

when p = w and q = s.

It is apparent from Lemma 3.3 that in order to detect the overlap between nonzero

terms p and q we must consider both the separation (p − q) mod m and also (q − p)

mod m. We therefore need a means of classifying the separation between two terms.

Definition 3.2 (Separation Class). Consider two pairs of nonzero terms at positions

{p1, q1} and {p2, q2} where p1, q1, p2, q2 ∈ S and p1 ≠ q1, p2 ≠ q2. These pairs are in

¹ While finalising this thesis we became aware of an independent discovery of Lemma 3.3 [97].


the same separation class, labelled by the unordered pair {t1, t2}, t1, t2 ∈ S, if t1 ≡ (p1 − q1) ≡ (p2 − q2) mod m and t2 ≡ (q1 − p1) ≡ (q2 − p2) mod m.

The pair of elements {t1, t2} in a separation class label are additive inverses modulo

m. Hence, for even m, let T be the set of distinct separation class labels {{1, m − 1}, {2, m − 2}, . . . , {m/2, m/2}}. Note that T does not contain {0, 0} as p ≠ q by

definition, and hence |T| = m/2. From this point forward, with some abuse of notation,

we will use the term separation class to refer to a separation class label {t1, t2}.

Theorem 3.5.² Let H ⊂ S contain the chosen positions for nonzero terms in h(x). Let D be the list of all separation classes that may be formed by selecting pairs of elements from H. Note that |D| = |H|(|H| − 1)/2. Let Hp be the m × m circulant matrix corresponding to h(x), where m is even. The overlap constraint will be violated for Hp if {m/2, m/2} ∈ D, or if any member of T appears in D more than once.

Proof. If {m/2,m/2} ∈ D then there must be a pair of nonzero terms at positions

p, q ∈ H, p ≠ q, such that (p − q) mod m = m/2 = (q − p) mod m. This violates the

overlap constraint according to Lemma 3.3.

Assume that a member {t1, t2} appears twice in D, and hence |H| > 2. The elements in H appear without repetition. So, there must exist an assignment of labels p, q, s, w ∈ H to these elements, for p ≠ q, s ≠ w, p ≠ s and q ≠ w, such that

(p − q) ≡ (s − w) ≡ t1 mod m

(q − p) ≡ (w − s) ≡ t2 mod m

These equivalence relationships violate Lemma 3.3.

When building h(x) Theorem 3.5 may therefore be used to check for the introduction

of 4-cycles. We form D based upon the contents of H and then let C ⊂ T be the set

of all separation classes that may be formed between a candidate nonzero element to

be added to H and the current contents of H. Hence if C ∩ D is not empty then the

introduction of the candidate element will violate the overlap constraint.

² While finalising this thesis we became aware of an independent discovery of Theorem 3.5 [97].
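The overlap test of Lemma 3.3 and Theorem 3.5 amounts to bookkeeping on separation classes. A minimal sketch (sep_class and has_4cycle are illustrative names):

```python
from itertools import combinations

def sep_class(p, q, m):
    """Unordered separation-class label {(p - q) mod m, (q - p) mod m}."""
    return frozenset({(p - q) % m, (q - p) % m})

def has_4cycle(H, m):
    """True iff the circulant with nonzero first-row positions H contains a
    4-cycle: some class is {m/2, m/2}, or some class repeats (Theorem 3.5)."""
    seen = set()
    for p, q in combinations(sorted(H), 2):
        c = sep_class(p, q, m)
        if c == frozenset({m // 2}) or c in seen:
            return True
        seen.add(c)
    return False

assert has_4cycle({0, 1, 3}, 4)        # classes {1,3} and {3,1} coincide
assert not has_4cycle({0, 1, 3}, 7)    # all separations distinct: 4-cycle free
```

For m = 4 and H = {0, 1, 3} the pairs (0, 1) and (0, 3) produce the same class, so the matrix has 4-cycles; for m = 7 the same positions give pairwise distinct separations and the circulant is 4-cycle free.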


3.4.3 Building Reversible LDPC Codes from Circulants

Based upon the work presented in the previous sections, the following algorithm provides

a method for building circulant matrices suitable for use in reversible codes. We label

this subclass of reversible code as follows.

Definition 3.3 (Type-I Reversible Code). A type-I reversible code has Hp constructed

as a circulant matrix.

Algorithm 3.2: Construction of Hp for a Type-I Reversible Code

1. Initialise:
   Choose input parameters:
     j : column weight for Hp, such that j ≥ 3 and odd.
     κ : number of iterations required to encode, such that κ = 2^y for y ∈ Z+.
     m : size of Hp, such that m = βκ, where β ∈ Z+.
   Initialise output parameters:
     Let the output polynomial h(x) be represented by the set H ⊂ S, containing the
     chosen positions for nonzero terms in h(x), where S = {0, 1, . . . , m − 1}.
     Initialise H = {0} and h(x) = 1.
   Initialise internal parameter:
     Let D be the list of all separation classes that may be formed by selecting pairs
     of elements from H. Initialise D to be the empty set.
2. Generate Cosets:
   Group S into cosets of {0, β, 2β, . . . , m − β}.
3. Select Candidate Terms:
   Select a candidate pair of terms from a coset created in Step 2.
4. Test for Overlap:
   Let C be the set of all separation classes (Def. 3.2) that may be formed between
   the candidate terms to be added to H and the current contents of H. If C does
   not contain {m/2, m/2} and C ∩ D is empty, then add the candidate terms to H
   and h(x), and update D.
5. Stop/Continue:
   If |H| = j then exit declaring success. Otherwise, if all possible paired
   combinations from S have been tried then exit declaring failure. Otherwise
   return to Step 3.
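Algorithm 3.2 can be sketched compactly in software. This greedy version (illustrative names; it scans candidate coset pairs in a fixed order rather than searching all combinations, does not enforce that j is odd, and also rejects a repeated separation class within a single candidate pair, as Theorem 3.5 requires):

```python
from itertools import combinations

def build_type1_circulant(j, kappa, beta):
    """Greedy sketch of Algorithm 3.2: return the nonzero positions H of a
    4-cycle-free, kappa-iteration Jacobi-encodable first-row polynomial h(x)."""
    m = beta * kappa
    H, D = {0}, set()
    cosets = [list(range(l, m, beta)) for l in range(beta)]
    pairs = [pair for c in cosets for pair in combinations(c, 2)]
    for p, s in pairs:
        if p in H or s in H:
            continue
        C, ok = set(), True                 # classes the pair would introduce
        for new, others in ((p, H), (s, H | {p})):
            for old in others:
                label = frozenset({(new - old) % m, (old - new) % m})
                if label == frozenset({m // 2}) or label in D or label in C:
                    ok = False              # would create a 4-cycle
                C.add(label)
        if ok:
            H |= {p, s}
            D |= C
            if len(H) == j:
                return sorted(H), m         # success
    return None                             # failure for these parameters

result = build_type1_circulant(j=3, kappa=4, beta=3)
assert result == ([0, 1, 4], 12)
```

With j = 3, κ = 4 and β = 3 the sketch returns h(x) = 1 + x + x^4 over m = 12, whose separations {1, 11}, {4, 8} and {3, 9} are pairwise distinct, and whose pair {1, 4} lies in a single coset modulo β, as Theorem 3.4 requires.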

Theorem 3.6. If Algorithm 3.2 terminates with success, then the first row polynomial corresponding to H will generate a 4-cycle free circulant matrix which is iteratively encodable in κ iterations.

Proof. In Step 3, candidate terms for H are selected from the cosets of {0, β, 2β, . . . ,m−β}. Hence the iterative encodability constraint will always be enforced, according to

Theorem 3.4, with κ iterations required for convergence. A candidate pair is rejected


if it would introduce an overlap in Step 4, according to Theorem 3.5. Hence, if the algorithm terminates successfully with |H| = j, then the polynomial represented by H will correspond to an iteratively encodable 4-cycle free circulant matrix.

It is now natural to ask what the minimum number of encoder iterations κ is for which we can build a polynomial of weight j.

Theorem 3.7. A circulant matrix Hp with odd column weight j, constructed using Algorithm 3.2, will require κ > j iterations to encode, where κ = 2^y for y ∈ Z+.

Proof. Consider a coset of {0, β, 2β, . . . , m − β} generated in Step 2 of Algorithm 3.2. Let the set of distinct separation classes (Def. 3.2) that can be formed by selecting pairs of elements from the coset, in Step 3, be E = {{β, (κ − 1)β}, {2β, (κ − 2)β}, . . . , {κβ/2, κβ/2}}. Once we have selected a pair of elements which form a separation class {t1, t2} ∈ E, we may not select another pair of elements from any coset which also form {t1, t2}. Doing so would violate the overlap constraint test in Step 4. We also note that choosing a pair of elements which form the separation class {κβ/2, κβ/2} = {m/2, m/2} will also violate this constraint. Hence the total number of separation classes that are available for selection without violating the overlap constraint is |E| − 1 = κ/2 − 1.

In Step 1 we initialise the coefficient of x^0 in h(x) to be h0 = 1 and hence the initial

weight of h(x) is one. So, to create a matrix of odd weight j, we will need to add (j−1)/2

pairs of terms. Each pair of terms will form a separation class. Therefore, to create a

weight j matrix without violating the overlap constraint we will need at least (j − 1)/2

separation classes. Hence we require κ/2 − 1 ≥ (j − 1)/2 and thus κ > j.

We may use the above theorem to select an appropriate value for κ in the initialisation

step of Algorithm 3.2, based upon the target weight for h(x).

We now present a method for constructing (3,6)-regular, 4-cycle free, reversible codes.

Using Algorithm 3.2 we may generate a 4-cycle free iteratively encodable matrix Hp with

column weight j = 3. By Theorem 3.7 we therefore seek a circulant which is encodable

in κ = 4 iterations.

Setting m = 8 and κ = 4, Algorithm 3.2 returns the candidate polynomial h(x) = 1 + x + x^3. The circulant matrix Hp which corresponds to h(x) follows.


Hp =

1 1 0 1 0 0 0 0
0 1 1 0 1 0 0 0
0 0 1 1 0 1 0 0
0 0 0 1 1 0 1 0
0 0 0 0 1 1 0 1
1 0 0 0 0 1 1 0
0 1 0 0 0 0 1 1
1 0 1 0 0 0 0 1
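The matrix above, and the property it is chosen for, can be checked numerically. The sketch below (the helper names are ours, not the thesis's) builds the circulant from the exponent set {0, 1, 3} and verifies that Hp^4 = I over F2, the condition for encodability in κ = 4 iterations.

```python
def circulant(h_exps, m):
    """Binary m x m circulant whose first row has ones at the given exponents."""
    first = [1 if c in h_exps else 0 for c in range(m)]
    return [[first[(c - r) % m] for c in range(m)] for r in range(m)]

def matmul_f2(A, B):
    """Matrix product over F2."""
    n = len(A)
    return [[sum(A[r][k] & B[k][c] for k in range(n)) % 2
             for c in range(n)] for r in range(n)]

Hp = circulant({0, 1, 3}, 8)            # h(x) = 1 + x + x^3, m = 8
assert Hp[0] == [1, 1, 0, 1, 0, 0, 0, 0]

# Hp^4 equals the identity over F2, so Hp^{-1} = Hp^3 and Jacobi encoding
# converges in kappa = 4 iterations.
M = Hp
for _ in range(3):
    M = matmul_f2(M, Hp)
assert M == [[int(r == c) for c in range(8)] for r in range(8)]
```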

We note that this polynomial has the form h(x) = 1 + x + x^(m/4+1), and now show that this general form may be used to generate larger matrices.

Theorem 3.8. If Hp is a binary m × m circulant matrix, where m = 2^y for y ∈ Z+, y > 2, built from cyclic rotations of the first row polynomial h(x) = 1 + x + x^(m/4+1), then Hp is a 4-cycle free (3,3)-regular matrix, which is iteratively invertible using 4 iterations of the Jacobi method over F2.

Proof. Given that the weight of h(x) is 3 and the transpose of a circulant matrix is also

circulant, it follows that Hp is (3,3)-regular.

By Lemma 3.1,

h^4(x) = x^(m+4) + x^4 + 1 ≡ 1 mod (x^m + 1).

Hence h(x) satisfies (3.7) for κ = 4, and Hp is iteratively invertible using 4 iterations of the Jacobi method over F2.

Let H correspond to the positions of nonzero terms in h(x). The set of separation classes that may be formed by selecting pairs of elements from H is

D = {{1, m − 1}, {m/4, 3m/4}, {m/4 + 1, 3m/4 − 1}}.

The members of D are necessarily unique and the set does not include {m/2, m/2}. Hence, by Theorem 3.5, Hp is 4-cycle free.

We complete H for a (3,6)-regular reversible code by randomly building a (3,3)-regular Hu, whilst blocking the introduction of 4-cycles.
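To make the encoding step concrete, here is one plausible realisation of the Jacobi iteration over F2 for the 8 × 8 example above; it is a sketch under our assumptions, not the codec circuit of later chapters. Because h0 = 1 the diagonal of Hp is all ones, so the iteration reduces to x ← b + (Hp + I)x over F2, and since (Hp + I)^4 = Hp^4 + I = 0 the exact solution of Hp x = b is reached in κ = 4 iterations.

```python
def circulant(h_exps, m):
    """Binary m x m circulant whose first row has ones at the given exponents."""
    first = [1 if c in h_exps else 0 for c in range(m)]
    return [[first[(c - r) % m] for c in range(m)] for r in range(m)]

def matvec_f2(A, x):
    """Matrix-vector product over F2."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in A]

m = 8
Hp = circulant({0, 1, 3}, m)                 # h(x) = 1 + x + x^3, Hp^4 = I
# Off-diagonal part: over F2, Hp - I = Hp + I (the diagonal is all ones).
R = [[Hp[r][c] ^ int(r == c) for c in range(m)] for r in range(m)]

b = [1, 0, 1, 1, 0, 0, 1, 0]                 # arbitrary right-hand side
x = [0] * m
for _ in range(4):                           # kappa = 4 Jacobi iterations
    x = [bi ^ ri for bi, ri in zip(b, matvec_f2(R, x))]

assert matvec_f2(Hp, x) == b                 # exact solution of Hp x = b
```

The convergence does not depend on the particular b chosen; the error after t iterations is (Hp + I)^t times the initial error, which vanishes at t = 4.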

3.5 Summary

We have investigated iterative approaches to encoding LDPC codes. In particular we

have presented a new encoding algorithm, based upon the Jacobi method for iterative


matrix inversion. The new algorithm allows the decoder architecture to be re-used for

encoding. Our investigations have also drawn a link between iterative encoding/decoding

and classical iterative matrix inversion techniques. Re-use of the decoder architecture has

the potential to reduce the size of the codec. In particular, by encoding on the factor

graph of the code, we are able to reuse the wire routing in place for the decoder. Routing

represents a significant amount of area in an LDPC decoder implementation. Thus being

able to reuse it for encoding is advantageous. Moreover, the proposed algorithm allows

parallel implementation, such that the number of iterations required to encode is fixed

by design. Hence the computational latency of the encoder remains constant as we scale

code length.

As discussed in Chapter 2, we may separate existing algorithms for the practical encoding of LDPC codes into two categories. The first approach is to design (or transform) codes such that they have a triangular (or approximately triangular) form, allowing the encoder to employ back substitution. The second approach involves the use of cyclic or quasi-cyclic code designs, and allows encoding to be performed using a shift register. Both of these approaches necessarily involve serial computation, implying a latency that scales linearly with code length. In contrast, the algorithm proposed in this chapter allows an architecture to be built whose computational latency remains fixed as we scale code length.

All codes having upper triangular parity-check matrices are Jacobi encodable. However, this is a sufficient but not necessary condition for iterative encodability. Furthermore, although the methods presented in this chapter focus upon the use of circulant matrices, it is not a requirement that Hp be circulant. Hence, the iterative Jacobi approach offers a structural alternative to the triangular and cyclic based approaches discussed in Section 2.7.

We have shown how a class of reversible codes may be constructed that are Jacobi

encodable. Furthermore, the techniques developed for constructing these codes guarantee

that they are 4-cycle free. The iterative encodability constraint only applies to a section of

the parity-check matrix, thus providing flexibility for the design of code rate and length.

In the following two chapters we present the empirical performance and a thorough

analysis of some reversible codes. We first investigate (3,6)-regular codes, built using

the methods described above, and then explore alternative reversible code structures.

Following that, we present a novel codec circuit implementation, which extends the sum-product decoder to include the iterative encoding algorithm.


Chapter 4

Performance Analysis

4.1 Introduction

Here we use the methods described in the previous chapter to build reversible LDPC

codes. We compare the performance of reversible LDPC codes to that of randomly

generated LDPC codes. Simulation results are provided for the additive white Gaussian

noise and binary erasure channels, as described in Section 1.1.5. An explanation of the

observed behaviour is then provided using several analytical tools.

We study the reversible LDPC codes listed in Table 4.1. These rate 1/2, (3,6)-regular

codes are all encodable using four iterations of the Jacobi algorithm. The parity-check

matrix of each code has the form [Hp|Hu]. Here Hp is a circulant matrix with the first row

polynomial h(x) coming from Theorem 3.8. In each case Hu is a (3,3)-regular randomly

generated matrix, chosen such that the graph of H is 4-cycle free.

The set of randomly constructed LDPC codes listed in Table 4.2 is used as a benchmark for comparison. Each of these rate 1/2, (3,6)-regular codes has a corresponding entry in Table 4.1 of approximately the same block length. Where possible, the random codes have been obtained from MacKay's online archive [98], in order to provide a standard point of reference. All of these codes are 4-cycle free.

This chapter contains original work, which was presented in part at the 2002 IEEE

Globecom conference, in Taipei, Taiwan [93]. Some of the results were also presented at

Code Index   Block Length (n)   Rows in H (m)   h(x)
Rev64        64                 32              1 + x + x^9
Rev512       512                256             1 + x + x^65
Rev1024      1024               512             1 + x + x^129
Rev4096      4096               2048            1 + x + x^513

Table 4.1: Reversible LDPC codes.


Code Index   Block Length (n)   Rows in H (m)   Construction
Rand64       64                 32              MacKay 1A (see Figure 2.1(a))
Rand504      504                252             MacKay 252.252.3.252
Rand1008     1008               504             MacKay 504.504.3.504
Rand4000     4000               2000            MacKay 4000.2000.3.243

Table 4.2: Randomly constructed benchmark LDPC codes.

the 2002 Australian Communications Theory Workshop, Canberra, Australia [94].

4.2 Performance on the AWGN Channel

In this section we empirically measure the performance of the codes over the AWGN

channel. The experiments were performed using the soft decision sum-product decoder

(Algorithm 2.2), for a maximum of 50 iterations.

We compare the performance of each reversible code to its corresponding random

benchmark code in the figures that follow. For the random codes, the performance

measured across the information bits matches that measured across the full codeword,

and hence it has not been plotted. However, this is not the case for the reversible codes.

Here we observe that, in general, the BER measured across the full codeword is higher

than that measured across the information bits alone. Hence the error rate curves are

shown for both the full codeword and information section. All simulation points shown

represent a minimum of 50 word errors, measured across the information section of the

codeword.

The reversible and random codes with length n = 64 exhibit similar performance, as shown in Figure 4.1. The reversible code exhibits bit and word error rate floors at around 10^-7 and 10^-6 respectively. The BER and WER of the reversible code measured across the full codeword match those measured across the information bits alone. Figure 4.2 shows the performance of the Rev512 code matching that of the Rand504 code, until flooring begins at bit and word error rates of around 10^-5 and 10^-3 respectively. As the SNR is increased for the reversible code, the BER and WER measured across the full codeword become higher than those measured across the information bits alone. Similar behaviour is shown in Figure 4.3 for the Rev1024 code, in comparison to the Rand1008 code. Here the BER and WER of the reversible code begin to floor at around 10^-5 and 10^-2 respectively. This behaviour is also exhibited by the Rev4096 code, which is compared to the Rand4000 code in Figure 4.4. Here we see the BER and WER of the reversible code, measured across the information bits alone, begin to floor at around 10^-5


and 10^-2 respectively. Moreover, the BER and WER measured across the full codeword are significantly higher, with the BER beginning to floor at around 10^-3 and the WER beginning to floor almost immediately.

Since collecting these results we have discovered that the choice of the check output clipping parameter η, described in Section 2.5.1, can have a large effect upon the performance of the reversible codes. All results for the reversible codes were obtained using η = 1 − 10^-4. In Figure 4.5 we see how the BER and WER vary in relation to η. The values plotted here for the Rev4096 code were taken at an SNR of 2 dB, with a minimum of 100 word errors per point, measured across the information section of the codeword. Hence we may lower the floor observed in Figure 4.4 by instead selecting η = 1 − 10^-2.

In contrast to the above, we have observed that applying a hard limit to the check outputs can slightly increase the error rate for the random benchmark codes at high SNR. However, the effect was much smaller than that observed for the reversible codes, and was not visible for the length 4000 code. Hence, when decoding the random codes we have chosen a high clipping limit, η = 1 − 10^-10, in order not to distort their performance. This limit is imposed purely to prevent numerical overflow.

We now summarise the above results for the reversible codes. An analytical account

for the observations follows in Section 4.4.

Unequal Bit Error Protection We observe a much higher error rate for the parity bits

of the codeword than for the information bits. This indicates a potential weakness

for message-passing in the code subgraph corresponding to Hp.

Unequal Word Error Protection Although the unequal bit error protection is not

always detrimental to the information bit error rate of the code, it does increase

the word error rate. A word error is declared if the decoder runs for the full allowed

iteration count without converging, i.e. if the stopping criterion has not been met.

The WER measured across the full codeword is therefore higher than that measured

across the information bits alone.

Error Floor The reversible codes display an error floor which appears at a different

level for each of the codes. We note that the flooring becomes more significant as

the code length is increased.

Dependence upon Clipping The bit and word error rates appear to be dependent

upon the choice of the clipping parameter η. By lowering η we may lower the


[Figure: BER and WER versus Eb/N0 (dB), comparing uncoded BPSK, Rev64 (information bits), Rev64 (full codeword) and Rand64. (a) Bit error rate; (b) Word error rate.]

Figure 4.1: Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n = 64, on the AWGN channel.


[Figure: BER and WER versus Eb/N0 (dB), comparing uncoded BPSK, Rev512 (information bits), Rev512 (full codeword) and Rand504. (a) Bit error rate; (b) Word error rate.]

Figure 4.2: Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n ≈ 500, on the AWGN channel.


[Figure: BER and WER versus Eb/N0 (dB), comparing uncoded BPSK, Rev1024 (information bits), Rev1024 (full codeword) and Rand1008. (a) Bit error rate; (b) Word error rate.]

Figure 4.3: Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n ≈ 1000, on the AWGN channel.


[Figure: BER and WER versus Eb/N0 (dB), comparing uncoded BPSK, Rev4096 (information bits), Rev4096 (full codeword) and Rand4000. (a) Bit error rate; (b) Word error rate.]

Figure 4.4: Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n ≈ 4000, on the AWGN channel.


[Figure: BER and WER versus −log10(1 − η), measured across the information bits and across the full codeword. (a) Bit error rate; (b) Word error rate.]

Figure 4.5: Shifting the error floor for Rev4096, at Eb/N0 = 2 dB, by varying η.

information error rate floor by a greater margin than that for the full codeword.

Again this indicates a weakness specific to the subgraph for Hp.

Dependence upon Block Length In general, the shorter reversible codes fare better

against their respective benchmarks than the longer codes do.

No Undetected Errors Undetected errors were observed only in the case of the very

short codes, having block length n = 64. We did not observe undetected errors for

any of the other codes.

4.3 Performance on the Binary Erasure Channel

In the figures that follow, we compare the performance of each reversible code to its corre-

sponding random benchmark code, for transmission over the binary erasure channel. The

experiments were performed using the binary erasure decoder described in Section 2.5.2.

All simulation points shown represent a minimum of 50 word erasures, measured across

the information section of the codeword.

The random and reversible length 64 codes exhibit the same performance on the

BEC, as shown in Figure 4.6. The bit and word erasure rates of the reversible code,

measured across the full codeword, match those measured across the information bits

alone. Figure 4.7 shows that the bit erasure performance of the Rev512 code closely

matches that of the Rand504 code, above the channel bit erasure probability of ε ≈ 0.31.

Below this point, the bit erasure rate measured across the full codeword becomes higher

than that measured across the information bits alone. The word erasure rate measured


across the full codeword matches that measured across the information bits alone, and exhibits a floor at around 10^-5. Similar behaviour is shown in Figure 4.8 for the Rev1024 code, in comparison to the Rand1008 code. Here the bit erasure rate of the reversible code matches that of the random code. For the Rev1024 code, the bit erasure rate measured across the full codeword becomes higher than that measured across the information bits alone, below ε ≈ 0.35. The word erasure rate of the reversible code begins to floor at around 10^-5. Performance of the Rev4096 code is compared to that of the Rand4000 code in Figure 4.9. The bit erasure rate measured across the information bits alone begins to floor at around 10^-8. The bit erasure rate measured across the full codeword becomes significantly higher than this at ε ≈ 0.39, beginning to floor at around 10^-7. The word erasure rate measured across the full codeword matches that measured across the information bits alone, and begins to floor at around 10^-4.

A summary of results for the reversible codes follows, with an analytical account

provided in Section 4.4.

Unequal Bit Erasure Protection As observed for the case of the AWGN channel, the

information bits appear to be more heavily protected than the parity bits. This

inequality begins to surface as we decrease the channel erasure probability and is

most visible for the longer codes.

Equal Word Erasure Protection In contrast to the AWGN case, the word erasure

rate measured across the information bits matches that measured across the full

codeword. This indicates that on average, when the decoder fails to correct all

erasures, the set of remaining erasures contains both information and parity bits.

Erasure Floor We observe a floor on the decoded erasure rate, which again appears to

be more significant for the longer codes.

Dependence upon Block Length The comparative performance of the reversible codes,

against their respective benchmark codes, appears to deteriorate with increased

block length. This observation is common to both channels.

4.4 Analysis of Codes and Decoder Behaviour

In this section we employ recently developed theoretical tools to analyse the behaviour of

the reversible LDPC codes presented above. We recognise that we have an error control

system that represents the pairing of a code to a sub-optimal decoder. It is therefore


[Figure: decoded bit and word erasure rates versus channel bit erasure probability ε, comparing Rev64 (information bits), Rev64 (full codeword) and Rand64. (a) Bit erasure rate; (b) Word erasure rate.]

Figure 4.6: Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n = 64, on the binary erasure channel.


[Figure: decoded bit and word erasure rates versus channel bit erasure probability ε, comparing Rev512 (information bits), Rev512 (full codeword) and Rand504. (a) Bit erasure rate; (b) Word erasure rate.]

Figure 4.7: Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n ≈ 500, on the binary erasure channel.


[Figure: decoded bit and word erasure rates versus channel bit erasure probability ε, comparing Rev1024 (information bits), Rev1024 (full codeword) and Rand1008. (a) Bit erasure rate; (b) Word erasure rate.]

Figure 4.8: Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n ≈ 1000, on the binary erasure channel.


[Figure: decoded bit and word erasure rates versus channel bit erasure probability ε, comparing Rev4096 (information bits), Rev4096 (full codeword) and Rand4000. (a) Bit erasure rate; (b) Word erasure rate.]

Figure 4.9: Comparing the performance of reversible and random (3,6)-regular LDPC codes, with length n ≈ 4000, on the binary erasure channel.


important not only to analyse the code but also to consider how the structure of the code

is suited to the operation of the decoding algorithm. We note that the analytical tools

each have a different purpose. For example, poor minimum distance exposes a weakness

of the code without considering the decoder, stopping set analysis is more appropriately

applied to the BEC than the AWGN channel, and so on. For an introduction to the

analytical methods used in this section the reader is referred to Section 2.8.

4.4.1 Minimum Distance

We first explore bounds on the minimum distance for the above reversible codes. The

problem of evaluating the minimum distance of an LDPC code is intractable, except

for codes with very small block length [99]. Some analytical bounds on dmin are well

known [67, 100]. However, we show here that some very simple bounds on dmin are

sufficient to suggest a weakness for the larger circulant based reversible structures.

Recall that the minimum distance of a linear block code is equal to the weight of its

lowest weight nonzero codeword. For a small code we may search for such a codeword, by

basing the search on low weight information vectors. We generate all possible information

vectors with weight w (xu) ≤ ν and then encode this set. We choose ν to be small enough

that the search is computationally feasible. Let de denote the weight of the member in

the encoded set which has the lowest weight. This value provides an upper bound on

dmin. If de ≤ ν, then this gives dmin = de. All codewords in a (3,6)-regular code have even

weight [5]. Hence, if ν is odd and de > ν then dmin ≥ ν + 1. Moreover, if de = ν + 1, and

ν is odd, then we may conclude that dmin = ν + 1.
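The search described above is easy to express in code. The sketch below is generic, and is demonstrated on the [7,4] Hamming code (a hypothetical stand-in with known dmin = 3) rather than on the reversible codes themselves, since their generator matrices are not reproduced here.

```python
from itertools import combinations

def dmin_upper_bound(G, nu):
    """Encode every nonzero information vector of weight <= nu and return the
    lowest codeword weight found: an upper bound on d_min (the value d_e)."""
    k, n = len(G), len(G[0])
    best = n
    for w in range(1, nu + 1):
        for support in combinations(range(k), w):
            # The codeword is the XOR of the generator rows selected by the
            # nonzero information bits.
            x = [0] * n
            for i in support:
                x = [a ^ b for a, b in zip(x, G[i])]
            best = min(best, sum(x))
    return best

# Demonstration on the [7,4] Hamming code, whose true d_min is 3.
G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]
assert dmin_upper_bound(G, 2) == 3
```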

The Rev64 code is small enough that we are able to find its lowest weight codeword

using such a search. Table 4.3 shows a partial weight distribution, corresponding to a

subset of codewords generated from all nonzero information vectors of weight w (xu) ≤ 5.

We see that the code has dmin = 6. Using the same approach for the Rev512 code, this

time for w (xu) ≤ 3, yields the partial weight distribution given in Table 4.4. From this

partial distribution we can conclude that 4 ≤ dmin ≤ 14.

w(x)     6    8    10    12    14     16     18     20     22     24     26
Count    1   444  1260  5989  13293  32043  60560  68045  48590  11803   796

Table 4.3: Partial weight distribution of Rev64, for w(xu) ≤ 5.

As a result of Theorem 2.1, it is well known that preventing 4-cycles in a (j, i)-regular LDPC code gives the lower bound dmin ≥ j + 1. The above (3,6)-regular reversible codes


w(x)    14  16  18  20  22  24  26   28   30   32   34    36    38    40
Count    1   3   4   8  43  41  94  332  284  629  988  1757  3279  4923

Table 4.4: Partial weight distribution of Rev512, for w(xu) ≤ 3 (first 14 entries only).

are 4-cycle free by design, and hence have dmin ≥ 4.

Recall that the rows of the generator matrix, G, are themselves codewords. Hence,

an upper bound on minimum distance comes from the weight of the lowest weight row

in G. These bounds are summarised in Table 4.5.

Code Index   Lower   From           Upper   From
Rev64        6       search         6       search
Rev512       4       search         14      search
Rev1024      4       4-cycle free   16      min row weight in G
Rev4096      4       4-cycle free   16      min row weight in G

Table 4.5: Bounds on dmin for the reversible codes.

Finally, we provide an upper bound on the minimum distance of any (3,6)-regular

reversible code with a circulant Hp that is encodable in four iterations.

Theorem 4.1. Consider a (3,6)-regular reversible LDPC code, with parity-check matrix H constructed such that Hp is an m × m circulant matrix with first row polynomial h(x) = 1 + x^r + x^s. If the code is encodable using 4 iterations of the Jacobi algorithm then its minimum distance is upper bounded by dmin ≤ 28, independently of the block length.

Proof. As Hp is encodable in 4 iterations we have Hp^4 = I ⇒ Hp^{-1} = Hp^3. Expanding h^3(x) gives the polynomial c(x) = 1 + x^r + x^s + x^(2r) + x^(2s) + x^(2r+s) + x^(r+2s) + x^(3r) + x^(3s) mod (x^m + 1). Hp^{-1} is therefore a circulant matrix having a maximum possible column weight of 9, when no terms in c(x) cancel. Consider Hsys = [Im | P^T] = Hp^{-1} H. Given that Hu is (3,3)-regular, the maximum weight of a column in P^T = Hp^{-1} Hu is 27. Hence the maximum row weight of G = [P | Ik], and thus an upper bound on dmin, is 28.
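The polynomial expansion in the proof can be checked mechanically. The sketch below (helper name is ours) multiplies sparse F2 polynomials modulo x^m + 1; for the Rev4096 polynomial h(x) = 1 + x + x^513 with m = 2048 it confirms that h^3(x) has the maximum weight of 9 (no cancellation) and, as a side check, that h^4(x) ≡ 1, i.e. that Hp is encodable in κ = 4 iterations.

```python
def poly_mul_f2(a, b, m):
    """Multiply two sparse F2 polynomials (sets of exponents) mod x^m + 1."""
    out = set()
    for i in a:
        for j in b:
            out ^= {(i + j) % m}   # symmetric difference: coefficients mod 2
    return out

m, h = 2048, {0, 1, 513}           # Rev4096: h(x) = 1 + x + x^513
h2 = poly_mul_f2(h, h, m)
h3 = poly_mul_f2(h2, h, m)
h4 = poly_mul_f2(h2, h2, m)

assert len(h3) == 9                # no cancellation: Hp^3 has column weight 9
assert h4 == {0}                   # h^4(x) = 1, i.e. Hp^4 = I
```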

From Theorem 4.1, reversible codes that are encodable in four iterations and built

using a circulant matrix for Hp have a minimum distance that does not scale well with

block length. However, this does not fully explain our experimental observations. We

expect a poor minimum distance to cause the decoder to make undetected errors, yet

none occurred for the reversible codes with n > 64. For the case of n = 64 they were

observed for the random code also, as is expected to be the case for such a short block

length. Instead, the observed performance floor appears to be related to problems with


decoder convergence. Minimum distance is a property of the code which is independent

of the chosen decoding algorithm. In the following analysis we consider the structure of

the reversible codes in relation to the operation of the decoder.

4.4.2 Stopping Sets and Extrinsic Message Degree

In Section 2.8.2 we introduced stopping set analysis, as an appropriate tool for investi-

gating the behaviour of message-passing decoding of LDPC codes over the binary erasure

channel [24]. Let the set of erasures made by the channel be denoted E , and those remain-

ing when the decoder fails be denoted S. Recall that S represents the maximal stopping

set from E . We now consider the simulation results for the Rev4096 code shown in Fig-

ure 4.9. We empirically observe the final state of the decoder for each word erasure event

in the result sample, when the channel bit erasure probability is ǫ = 0.375. Five different

stopping set configurations exist in the sample, each having a different size. Figure 4.10

shows how the occurrence of these sets is distributed over the fifty word erasure events

in the empirical sample space.
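On the BEC the maximal stopping set left by a failed decoding can be found mechanically with the peeling decoder: any check with exactly one erased neighbour recovers that bit, and iteration continues until no such check remains. A minimal sketch on a toy parity-check matrix (the matrix is illustrative, not one of the codes studied here):

```python
def maximal_stopping_set(checks, erased):
    """Peel a BEC erasure pattern: a check with exactly one erased
    neighbour recovers that bit; what survives is the maximal stopping
    set S contained in the erasure set E."""
    S = set(erased)
    progress = True
    while progress:
        progress = False
        for check in checks:
            hit = check & S
            if len(hit) == 1:      # single erased neighbour -> recoverable
                S -= hit
                progress = True
    return S

# Toy code: 5 variables, 3 checks (rows of H given as sets of indices).
checks = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}]

assert maximal_stopping_set(checks, {0, 1}) == set()         # fully recovered
assert maximal_stopping_set(checks, {1, 2, 3}) == {1, 2, 3}  # a stopping set:
# every check meets {1,2,3} in at least two positions, so peeling stalls.
```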

[Figure: histogram of maximal stopping set size (|S|, ranging 10–18) against count.]

Figure 4.10: Occurrence of maximal stopping sets for Rev4096, when ε = 0.375.

Table 4.6 lists each stopping set configuration observed for Rev4096 when ε = 0.375.

Each set of size |S| consists of |S| − 1 parity bits and only one information bit. This

structure is in agreement with the observed inequality in bit erasure protection. It is also

worth noting that it is the same information bit which appears in all five stopping set

structures.

Figure 4.11 shows a subset of the graph corresponding to Hp for the Rev4096 code.

|S|   Parity bits                                                         Info bit
11    484, 486, 996, 997, 998, 999, 1508, 1509, 2020, 2021                3245
12    484, 485, 486, 996, 997, 998, 999, 1508, 1509, 2020, 2021           3245
14    483, 485, 486, 995, 996, 997, 998, 999, 1507, 1509, 2019,           3245
      2020, 2021
16    482, 484, 485, 486, 994, 995, 996, 997, 998, 999, 1506, 1509,      3245
      2018, 2019, 2021
17    480, 485, 486, 992, 993, 997, 998, 999, 1504, 1506, 1509, 2016,    3245
      2017, 2018, 2019, 2021

Table 4.6: Stopping set configurations.

Here we see that the cyclic nature of Hp leads to a lattice¹ structure in the graph. Variable

(bit) nodes are represented on the graph by circles and check (constraint) nodes by boxes.

Each node is labelled by its index, and members of the size |S| = 11 stopping set are

highlighted. Checks 999 and 1509 have only a single connection into the parity bit subset

of this stopping set. Information bit 3245 is connected to checks 999, 1509 and 2019.

Note that this information bit does not appear in Figure 4.11, as it lies on the graph of

Hu. By considering the complete graph of H and connecting this information bit, we

close the stopping set. Similar structures appear on the lattice for all of the configurations

listed in Table 4.6. In each case, the information bit plays a crucial role in closing the

stopping set.

The lattice may be extended to completely represent the graph of Hp, as shown

in Figure 4.12. This general graph represents Hp for all codes listed in Table 4.1, by

assigning node labels such that β = m/4. Each row of the lattice contains four nodes,

and consists entirely of either variables or checks. There are a total of β rows of each

type.

The lattice structure contains small sets of variables which have a low extrinsic

message degree [25]. An example set shown in Figure 4.12 contains nine parity bits.

All checks connected to the set are connected twice, with the exception of the check

labelled 3β + 4. Hence the set has an extrinsic message degree (Def. 2.13) of one. The

repetitive form of the lattice causes this undesirable structure, and others like it, to be

replicated throughout the graph. These structures increase the probability of stopping

set creation and contribute to the observed floor in performance. However, as they have a

nonzero EMD they require the inclusion of one or more information bits for stopping set

closure. This accounts for the observation that the word erasure rate measured for the full

¹We use the term lattice here to describe the appearance of the graph with respect to the repeated pattern of node connectivity. The term is not used in accordance with its mathematical meaning, as this meaning does not make sense in the context of our discussion.

[Figure: subgraph of the lattice for Hp, showing variable and check nodes indexed 483–487, 995–999, 1507–1511 and 2019–2023, with the members of the |S| = 11 stopping set highlighted.]

Figure 4.11: Parity bits in the |S| = 11 stopping set.

codeword matches that measured for the information bits alone. At high channel erasure

rates the maximal stopping set for a given set of erased bits consists of smaller component

stopping sets. The overlapping of such sets accounts for the observed symmetry across

the number of information and parity bits in the maximal stopping set. However, at low

channel erasure rates, the maximal stopping set is often also the minimal stopping set

(ignoring the empty set). In these cases, the heavy weighting of parity bits within the

set causes the observed bit erasure rate measured across the information bits to be less

than that measured across the full codeword.
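The extrinsic message degree used in the discussion above (Def. 2.13) is simple to compute from H: count the checks that connect to the candidate variable subset by exactly one edge. A sketch on an illustrative graph (not the Rev4096 lattice itself):

```python
def emd(checks, subset):
    """Extrinsic message degree (Def. 2.13): the number of checks joined
    to the variable subset by exactly one edge. A check connected two or
    more times passes no extrinsic information into the set."""
    return sum(1 for check in checks if len(check & subset) == 1)

# Illustrative graph: variables 0..4, checks as sets of variable indices.
checks = [{0, 1}, {1, 2}, {2, 3, 4}]

assert emd(checks, {0, 1, 2}) == 1   # only check {2,3,4} enters the set once
assert emd(checks, {0}) == 1         # a single degree-one variable
```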

4.4.3 Cycles and Near-Codewords

We now turn our attention to the experiments performed in the presence of AWGN. In
this case we are interested in the behaviour of the message-passing sum-product decoder,
in particular at high SNR, where the performance of the reversible codes begins to diverge
from that of the random codes.

In contrast to the equal word error protection that we observed on the BEC, here we
observe error events which affect only parity bits. We investigate results for the Rev4096
code at an SNR of 2 dB, setting the decoder clipping parameter η = 1 − 10^−4. There

[Figure: the complete lattice for Hp, with node labels 1 . . . 4β arranged in rows of four, alternating between variable and check rows; a set of nine parity bits with extrinsic message degree one is marked, its only singly-connected check being 3β + 4.]

Figure 4.12: Graph of Hp viewed as a lattice.

are a total of 931 word errors, although only 100 of these affect information bits. The

histograms in Figure 4.13 correspond to the observed error vector weight, i.e. a count of

the number of positions in which the transmitted and decoded vectors differ. We compare

the histogram for the full codeword to that taken across only the parity section of the

codeword.

The distribution of low weight error vectors is dominated by those which corrupt
only parity bits. In particular we see a large number of error vectors, e, with weight
w(e) = 4. A closer inspection of these vectors shows that they all conform to the pattern

    e(x) = x^s + x^(β+s) + x^(2β+s) + x^(3β+s) mod (x^m + 1),        (4.1)

where β = m/4 and s ∈ {0, . . . , m − 1}. Here, the algebraic representation e(x) of e has

[Figure: two histograms of error vector weight (0–35) against count: (a) full codeword; (b) parity section only.]

Figure 4.13: Error vector weight histograms for Rev4096 at Eb/N0 = 2 dB.

the same form as that used to represent the first row vector of Hp in Section 3.4. This

error pattern corresponds to a row of variables in the lattice representation of Hp, as

shown in Figure 4.12. Although the lattice is 4-cycle free, it contains many 6-cycles².

Each variable in a row of the lattice is connected to its neighbours on either side via a

6-cycle, while an 8-cycle connects all variables in the row, as shown in Figure 4.14.

[Figure: the variable row s, β+s, 2β+s, 3β+s of the lattice, with a 6-cycle linking neighbouring variables and an 8-cycle linking all four.]

Figure 4.14: Cycles in a variable row of the lattice for Hp.

As discussed in Section 2.8.4, MacKay and Postol have proposed that near-codewords

can cause convergence problems for the message-passing decoder [79]. Near-codewords

can arise from the connection of short cycles in the graph of H, and hence it is not

surprising that the error pattern e(x) corresponds to a near-codeword. More precisely, if

e is a vector with ones positioned according to e(x), having zero valued entries elsewhere,

then e is a (4, 4) near-codeword. In this case, the weight of the syndrome for e corresponds

²In fact, any graph built from circulants having row weight i ≥ 3 has girth at most six [101].

to the extrinsic message degree of a row of variables in the lattice.
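The (4, 4) near-codeword claim is easy to verify numerically: build the circulant Hp from its first-row polynomial h(x) = 1 + x + x^(m/4+1) (the choice of Theorem 3.8), form e according to (4.1), and confirm that e and its syndrome both have weight four. A sketch in pure Python, using a small m for brevity:

```python
def circulant_rows(m, exponents):
    """Support of each row of an m x m binary circulant whose first row
    has ones at the given exponents (row r = first row shifted by r)."""
    return [{(r + e) % m for e in exponents} for r in range(m)]

def syndrome_weight(rows, error_set):
    """Number of parity checks violated by the given error positions."""
    return sum(1 for row in rows if len(row & error_set) % 2 == 1)

m = 32                                       # small illustrative size
beta = m // 4
rows = circulant_rows(m, [0, 1, beta + 1])   # h(x) = 1 + x + x^(m/4+1)

s = 3                                        # any shift s works
e = {(s + k * beta) % m for k in range(4)}   # the pattern of (4.1)

assert len(e) == 4                       # weight-4 error vector ...
assert syndrome_weight(rows, e) == 4     # ... with weight-4 syndrome: (4,4)
```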

A similar method may be used to account for other low weight error vectors which

dominate the distribution. In particular, the pattern of parity bits with an EMD of one,

shown in Figure 4.12, represents a (9, 1) near-codeword.

In order for these structures to cause a word erasure on the BEC, it is necessary to

include at least one information bit to close the stopping set. However, on the AWGN

channel the existence of these near-codewords alone appears sufficient to inhibit decoder

convergence and thus cause a detected word error. Hence we see separation of the infor-

mation section and full codeword word error rates on the AWGN channel, in contrast to

the BEC observations.

4.4.4 Finite Graph-Covers

Our empirical results suggest that the decoder arrives at near-codewords due to
convergence problems on the AWGN channel. However, we do not yet have a precise
explanation for this. To this end, we now extend our analysis by using the finite
graph-cover techniques of Kotter and Vontobel [27, 28], introduced in Section 2.8.6.

In order to simplify the analysis in this section, our discussion will assume the
transmission of the all-zero codeword. As suggested in [27], we may build a pseudo-codeword

ωe using a procedure similar to that of canonical completion (Def. 2.22), rooted at the

(4, 4) near-codeword e defined by (4.1). At this point we ignore Hu and operate only

on the graph corresponding to Hp. This process is illustrated in Figure 4.15, using the

general lattice for the reversible codes. Variable tiers of the lattice are labelled
consecutively, starting at the root tier, t = 0, which corresponds to the nonzero elements of

e. The lattice is constructed to a depth t = β − 1, such that all parity variables appear

exactly once. Each variable vs on tier t is then assigned the value ωs = 1/2t. These

values are shown to the right of each variable. The assignment ensures that all local

indicator functions (Def. 2.21) are satisfied based upon the parity variables alone. Thus,

appending an all-zero information sequence generates a valid pseudo-codeword.

It is straightforward to show that the pseudo-weight (Def. 2.20) of the pseudo-codeword
ωe constructed using the above technique satisfies wp(ωe) < 12, independently of the
code length. A similar process can be used to find other pseudo-codewords, e.g. by
rooting the canonical completion at the (9, 1) near-codeword discussed

in the previous section. Furthermore, these low weight pseudo-codewords are replicated

through the structure of the lattice.
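The bound wp(ωe) < 12 follows directly from the geometric tier assignment: tier t contributes four entries of value 1/2^t, so the pseudo-weight wp(ω) = (Σωs)² / Σωs² approaches 12 from below as β grows. A quick numerical check of this:

```python
def pseudo_weight(omega):
    """AWGN pseudo-weight (Def. 2.20): (sum w)^2 / (sum w^2)."""
    return sum(omega) ** 2 / sum(w * w for w in omega)

def canonical_completion(beta):
    """Tier t of the lattice holds four variables of value 1/2^t, for
    t = 0 .. beta-1 (the assignment of Figure 4.15)."""
    return [2.0 ** -t for t in range(beta) for _ in range(4)]

# (sum w)^2 -> 64 and sum w^2 -> 16/3 as beta grows, so wp -> 12 from below.
for beta in (2, 16, 128, 512):
    assert pseudo_weight(canonical_completion(beta)) < 12.0
```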

[Figure: the lattice for Hp with variable tiers labelled t = 0, 1, 2, . . . , β − 1 and assigned the values 1, 1/2, 1/4, . . . , 1/2^(β−1) respectively.]

Figure 4.15: Building a pseudo-codeword on the lattice for Hp.

To build this example pseudo-codeword we have considered only the graph of Hp.

Hence ωe has an all-zero information section. We have found that the reversible codes

support other pseudo-codewords which have lower pseudo-weight than ωe, by considering

the full graph of H and allowing nonzero elements in the information section³.

The analysis in this section provides a quantitative account for the near-codewords

observed through empirical investigation. We have identified that the structure of Hp

admits pseudo-codewords with low pseudo-weight, thus leading to convergence problems

for iterative sum-product decoding in the case of the AWGN channel.

4.4.5 Graph Expansion

For both channel models, the reversible code performance degrades with respect to that

of the random codes, as the block length is increased. We now consider graph expansion

(Def. 2.14), and show how the normalised spectral gap (Def. 2.15) may be used to account

for this general observation.

In contrast to the good expansion of random graphs, the expansion of the 3-regular

³Such a pseudo-codeword has been found for the Rev512 code, using a heuristic search performed by Pascal Vontobel.

graphs of Hp for the reversible codes is limited by the lattice structure. In Figure 4.16
we compare the normalised spectral gap for the graph of Hp, for each of the reversible
codes, to that of a randomly generated graph of the same size.

[Figure: normalised spectral gap µδ against matrix size m (200–2000), comparing the reversible and random structures with the Ramanujan bound.]

Figure 4.16: Comparison of expansion for random and reversible structures.

The random graphs exhibit a good expansion property, which appears to approach

the Ramanujan bound as they get larger. In contrast, the expansion metric for the

circulant based reversible structures vanishes as they get larger. We now show that µδ

can be obtained directly from a circulant Hp. This will allow us to investigate the spectral

properties of these matrices, without considering the adjacency matrix.

Lemma 4.1. Let C be an m × m circulant matrix, having eigenvalues µp ordered such
that |µ1| ≥ |µ2| ≥ · · · ≥ |µm|, with corresponding adjacency matrix

    A = [ 0    C ]
        [ C⊤   0 ].

Denoting the (real) eigenvalues of A as αs, and ordering them α1 ≥ α2 ≥ · · · ≥ α2m, we
have µδ = (j − |α2|)/j = (j − |µ2|)/j.

Proof. The characteristic polynomial of A is |A − αI| = |α^2 I − C⊤C|. Therefore, if χ is
an eigenvalue of C⊤C then ±√χ are eigenvalues of A.

The singular values of C are the square roots of the eigenvalues of C⊤C. Moreover,
as C is circulant, its singular values correspond to the magnitudes of its eigenvalues.

If µ is an eigenvalue of C, then |µ| is a singular value of C, and |µ|^2 is an eigenvalue
of C⊤C. Hence ±|µ| are eigenvalues of A and the result follows.
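Since the eigenvalues of a circulant are the evaluations of its first-row polynomial at the m-th roots of unity, Lemma 4.1 lets us compute µδ without ever forming the adjacency matrix. A sketch, using the weight-3 first-row polynomial h(x) = 1 + x + x^(m/4+1) of the reversible codes:

```python
import cmath

def circulant_eigs(m, exponents):
    """Eigenvalues of a binary circulant: the first-row polynomial h(x)
    evaluated at each complex m-th root of unity."""
    return [sum(cmath.exp(2j * cmath.pi * g * e / m) for e in exponents)
            for g in range(m)]

def normalised_spectral_gap(m, exponents):
    j = len(exponents)                  # row/column weight
    mags = sorted(abs(mu) for mu in circulant_eigs(m, exponents))
    return (j - mags[-2]) / j           # (j - |mu2|)/j, by Lemma 4.1

# h(x) = 1 + x + x^(m/4+1): the gap shrinks as the circulant grows.
gaps = {m: normalised_spectral_gap(m, [0, 1, m // 4 + 1])
        for m in (32, 256, 2048)}
assert gaps[32] > gaps[256] > gaps[2048]
```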

Let Hp be a circulant m × m matrix with first row polynomial chosen according
to Theorem 3.8, such that h(x) = 1 + x + x^(m/4+1). If φγ = e^(iγ2π/m) is a complex mth
root of unity, then µγ = h(φγ) is an eigenvalue of Hp [102]. We may therefore consider
each eigenvalue of Hp to be the sum of three unit vectors in the complex plane, such
that µ = 1∠0 + 1∠θ + 1∠(m/4 + 1)θ, where θ = γ2π/m. Figure 4.17(a) shows the
locus of the spectrum for the size m = 32 circulant Hp used in the Rev64 code. The
vector components of each eigenvalue are also shown. All eigenvalues are located in a
disc of radius 3, centred at the origin. Recall that µ1 = 3 + i0, and thus a small spectral
gap arises when an eigenvalue is positioned close to the edge of this disc. Loci of the
spectra for the other reversible codes are also shown in Figure 4.17; however, the vector
components have been omitted to improve clarity.

We now focus on the set of eigenvalues for which the second and third vector
components have the same argument. This occurs when θ + 2sπ = (m/4 + 1)θ, and thus
θ = 8sπ/m, for s ∈ {1, . . . , m/4}. Of particular interest is the eigenvalue in this set which

has the smallest nonzero argument. We label this eigenvalue and its complex conjugate

µ2 and µ2∗ respectively, as shown in Figure 4.17(a). There are m/4 eigenvalues in the

set, having uniform angular distribution about the point 1+ i0. Therefore, as we increase

the matrix size, m, the argument of µ2 reduces and |µ2| → 3. Hence, from Lemma 4.1,

the spectral gap of the graph vanishes.
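This limiting behaviour can also be checked in closed form: when θ = 8sπ/m the third component e^(i(m/4+1)θ) equals e^(iθ), so the aligned eigenvalue collapses to µ = 1 + 2e^(iθ), giving |µ2| = sqrt(5 + 4 cos(8π/m)) for s = 1. The closed form is our own simplification of the geometry described above; a quick numerical confirmation:

```python
import cmath, math

def mu_aligned(m, s):
    """Aligned eigenvalue: at theta = 8*s*pi/m the second and third
    vector components coincide, so h(phi) = 1 + 2*exp(i*theta)."""
    return 1 + 2 * cmath.exp(1j * 8 * s * math.pi / m)

for m in (32, 256, 512, 2048):
    # |mu2| = sqrt(5 + 4*cos(8*pi/m)) for s = 1
    assert abs(abs(mu_aligned(m, 1))
               - math.sqrt(5 + 4 * math.cos(8 * math.pi / m))) < 1e-12

# |mu2| increases towards 3 with m, so the gap (3 - |mu2|)/3 vanishes.
assert abs(mu_aligned(32, 1)) < abs(mu_aligned(2048, 1)) < 3
```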

Finally, we note that the poor expansion of the graph may provide some insight

into the effect of changing the clipping parameter. Altering η appears to affect the

information error rate for the reversible structures much more than for the random codes.

In a poor expander, strong local misconception can quickly overpower the correct belief

in neighbouring nodes. Clipping prevents the magnitude of the check output decisions

from becoming too powerful too quickly. Hence for a bad expander this is likely to

be advantageous, up to a point where limiting valid check outputs begins to degrade

performance. From the results in Figure 4.5, this point appears to be η = 1 − 10^−2 for

the Rev4096 code.

4.5 Summary

We have investigated, characterised and analysed the behaviour of some rate 1/2, (3,6)-

regular reversible codes, which employ a circulant matrix structure for Hp. Empirical

results, for both the AWGN and binary erasure channels, show that the performance of

[Figure: four loci of eigenvalues in the complex plane (Re against Im), with µ2 and µ2* marked in panel (a).]

Figure 4.17: Loci of the spectra for Hp of the reversible codes: (a) Rev64, m = 32; (b) Rev512, m = 256; (c) Rev1024, m = 512; (d) Rev4096, m = 2048.

this class of codes degrades as their block length is increased. A thorough analysis has

been provided, in which an appropriate analytical tool has been employed to account for

each experimental observation.

The results presented in this chapter become less encouraging as we consider longer

codes. However, there were no undetected errors observed, and the comparatively poorer

performance of the longer codes has been linked to instances of decoder convergence

failure. It may therefore be possible to modify the operation of the decoding algorithm

to improve performance. Some evidence of this is provided by the fact that modifying

the clipping parameter can shift the performance floor. Further investigation of how

we may attenuate [42], or reschedule [39], messages may be worthwhile. It may also be

worthwhile investigating the iterative reliability based approach suggested in [44]. On

the AWGN channel, the WER measured across the full codeword is higher than that

measured across the information bits alone. This is due to cases when only the parity

bits of the decoder output are corrupt. In such a case, a cyclic redundancy check applied

across the information bits may be employed. As described in Section 1.1.3, the CRC may

be used to flag the error free information vector and prevent it from being unnecessarily

discarded.

We attribute the decoder convergence problems to the fact that the circulant-based
lattice structures of the reversible codes are not good expanders. In the case of the BEC it may be possible to

improve performance by a more careful selection of Hu, such that small stopping sets are

avoided. However, in the AWGN channel case, our finite graph-cover analysis highlights

that the weakness in Hp is sufficient to cause convergence problems, regardless of the

choice of Hu. The poor expansion property of the circulant structure admits low weight

pseudo-codewords. We note, however, that it is not necessary for Hp to be
circulant in order to be iteratively encodable. The good expansion property of

random graphs motivates the search for iteratively encodable matrices which incorporate

some form of randomness. This approach is explored in the next chapter.

The reversible codes exhibit different characteristics depending upon the channel,

and corresponding decoder being employed. They therefore provide an interesting case

study and highlight some general points to be considered when analysing a class of LDPC

codes. We close this chapter by summarising these points, the first two of which reiterate

those presented in [79].

Simulate to a Low WER It is common practice to run simulations only to an
information BER of around 10^−5. However, this may not be low enough to expose bad

properties of a code. When presenting earlier versions of this work [93], the error

floor effect was not discovered for this reason. By simulating down to a word error

rate of around 10^−5, or lower, we are much more likely to uncover such a weakness.

Distinguish Between Error Types It is also common not to distinguish between
detected and undetected errors. However, such information can be very useful in

determining the potential cause for a performance issue. Undetected errors relate

to minimum distance, whereas detected errors relate to minimum pseudo-weight.

Good Properties Do Not Always Scale While the good properties of random codes

scale well, we cannot assume this to be the case when considering structured codes.

Not All Decoders are Equal The way in which we handle extreme messages in the
decoder implementation can affect performance. This is evident in the above analysis,

from the choice of the clipping parameter. Furthermore, favourable implementation

settings are likely to be code dependent.

Choose the Most Appropriate Tool In recent times it has become common to apply

stopping set analysis on the BEC, and then suggest that the results carry through to

the AWGN channel. While there is some evidence that the two are related, stopping

sets do not capture the full picture for the AWGN case [27]. This is reflected in

the above analysis of the reversible codes. Stopping set closure shows a dependence

upon the inclusion of an information bit. However, decoder convergence problems

on the AWGN channel can arise which are dependent only upon the parity bits.

A quantitative account for the latter is provided by finite graph-cover analysis.

Furthermore, the absence of undetected errors tends to suggest that the potentially

poor minimum distance properties are much less significant than the poor minimum

pseudo-weight.

Chapter 5

Improved Reversible LDPC Codes

5.1 Introduction

The analysis presented in the previous chapter has exposed a weakness of using weight

j = 3 circulant matrices to construct reversible LDPC codes. The graphs corresponding

to such structures have a poor expansion property. This admits low weight pseudo-

codewords, and ultimately leads to decoder convergence problems. In this chapter we

design some new reversible codes which have a better expansion property. We draw

motivation from the good expansion of random graphs, and the recursive constructions

presented by Tanner [9].

We compare simulation performance of the new codes to that of random codes.

In addition to this benchmark we use the finite geometry codes of Kou et al. [52].

These codes were selected because they have a similar encoding complexity to reversible

codes, both in terms of computation time and architecture size. Furthermore, finite

geometry codes have been shown to outperform regular random codes. We compare the

new reversible codes against their respective benchmarks, in terms of both performance

and implementation complexity.

All results in this section assume transmission on the AWGN channel, with sum-product
decoding, for a maximum of 50 iterations. In all cases the decoder employs a high
clipping limit, η = 1 − 10^−10, imposed only to avoid numerical overflow. All simulation
points represent a minimum of 50 word errors, measured across the information section
of the codeword.

This chapter contains original work, presented in part at the 2004 Australian
Communications Theory Workshop, Newcastle, Australia [95].

5.2 Simple Metrics for Implementation Complexity

Each nonzero element in H represents an edge in the factor graph representation of the

code, and thus a connection between nodes in the circuit of the corresponding sum-

product decoder. Hence, the density of H gives us some indication of implementation

complexity, in terms of routing.

A simple count of edges in the factor graph does not account for the complexity

of the nodes themselves. For example, a 6-edge check node has the same number of

edges as two 3-edge checks, yet it has a higher internal complexity. As discussed in

Section 2.5.1, we can construct a multiple edged node by using bi-directional soft-logic

gates (Def. 2.6) as building blocks. We may use a count of these blocks as a simple metric

for implementation complexity.

Definition 5.1 (Bi-directional Soft-Logic Gate Count (BSGC)). The bi-directional
soft-logic gate count for a parity-check matrix H represents the total number of
bi-directional soft-logic gates required to implement the sum-product decoder core
corresponding to H.

By cascading two bi-directional soft-logic gates we can build a 4-port node, and so
on. Hence, the BSGC of a single i-edged check node is i − 2. We assume that each
j-edged variable node requires an extra edge for channel interfacing, and thus assign it a
BSGC of j − 1. The BSGC of a (j, i)-regular LDPC code is therefore n(2j(1 − 1/i) − 1).

As we focus upon analog implementations in this thesis, the complexity of check and
variable BSGs is assumed to be equal. This may not be the case for other
implementations. The BSGC metric may be easily modified to account for such cases, by weighting
the counts of the different node types. Moreover, for this simple metric, we ignore any
possible optimisation of the nodes that may be achieved through partial reuse of
intermediate results. It is intended that the BSGC be used only as an approximate metric for
implementation complexity, rather than as a means of determining the transistor count
of an optimised decoder circuit.
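As a sanity check, the BSGC can be computed from the degree profile alone. The sketch below reproduces the totals quoted later for the high rate codes; the even check-degree spread is our own assumption (only the edge total matters for the count):

```python
def bsgc(var_degrees, check_degrees):
    """Def. 5.1: j-1 gates per j-edged variable (one extra port serves
    the channel), i-2 gates per i-edged check."""
    return sum(j - 1 for j in var_degrees) + sum(i - 2 for i in check_degrees)

def spread_degrees(m, total_edges):
    # Any m check degrees summing to total_edges give the same BSGC;
    # an even spread is assumed here purely for illustration.
    base, extra = divmod(total_edges, m)
    return [base + 1] * extra + [base] * (m - extra)

# EG-LDPC (1023,781): square H, every variable and check of degree 32.
assert bsgc([32] * 1023, [32] * 1023) == 62403

# Reversible (1082,826), j = {5,4}: 256 parity columns of weight 5,
# 826 information columns of weight 4, m = 256 checks, 4584 edges.
assert bsgc([5] * 256 + [4] * 826, spread_degrees(256, 4584)) == 7574

# (j,i)-regular sanity check of n(2j(1 - 1/i) - 1): a (3,6)-regular
# code with n = 12 has BSGC 4n = 48.
assert bsgc([3] * 12, [6] * 6) == 48
```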

5.3 High Rate Codes

In this section we consider the construction of Hp using a weight j = 5 circulant matrix.

We then use such a matrix to build high rate codes that are encodable in κ = 8 iterations

of the Jacobi method. Our motivation for selecting a higher column weight is twofold.

Firstly, when considering high rate binary LDPC codes, MacKay and Davey have

presented arguments against the use of column weight j = 3 [40]. We are considering

finite length codes, and the potential for weakness increases for high rate codes of short

block length. This is reflected by the performance of a randomly generated code, having

(n, k) = (1082, 826) and j = 3, shown in Figure 5.1. The code performs well at low SNR

but then exhibits an error floor. Furthermore, undetected errors were observed during

simulation at high SNR.

Using Algorithm 3.2, we can construct iteratively encodable circulants with weight
j = 5. Our second point of motivation for using such a matrix is the improved expansion
it offers. We have built an iteratively encodable circulant of size m = 256, having
h(x) = 1 + x^21 + x^53 + x^119 + x^183. This matrix has a normalised spectral gap
µδ = 0.108, which is just over 50% of the Ramanujan bound when j = 5. Although not
optimal, this represents a large improvement upon the j = 3 case (see Figure 4.16),
where µδ is less than 2% of its corresponding optimal value. We may complete H to
build an (n, k) = (1082, 826) code,

by appending columns of weight j = 5, whilst blocking the creation of 4-cycles. The

performance of this code is shown in Figure 5.1. We note that the BER taken across the

information section closely follows that taken across the full codeword. Also, there is only

a slight deviation between the WER taken across the information section and that taken

across the full codeword. Empirically we observe that the decoder is failing to converge in

the presence of some (5, 5) near-codewords. These near-codewords have nonzero elements

in the parity bits only, thus causing the slight imbalance in word error rate. There were

no undetected errors observed during simulation. The agreement between information

and full codeword error rates, and the absence of an error floor, stands in contrast to the

experimental results for the reversible codes of the previous chapter. We attribute this

to the improved expansion property of the graph corresponding to Hp.
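A standard check (not spelled out in the text) is that a circulant block is 4-cycle free precisely when the pairwise differences of its first-row exponents are distinct modulo m, since a repeated difference gives two rows that overlap in two columns:

```python
def circulant_4cycle_free(m, exponents):
    """A circulant block is 4-cycle free iff all pairwise differences of
    its first-row exponents are distinct mod m: a repeated difference
    gives two rows overlapping in two columns."""
    diffs = [(a - b) % m for a in exponents for b in exponents if a != b]
    return len(diffs) == len(set(diffs))

# h(x) = 1 + x^21 + x^53 + x^119 + x^183, m = 256: the circulant above.
assert circulant_4cycle_free(256, [0, 21, 53, 119, 183])

# Consecutive exponents repeat the difference 1, creating 4-cycles.
assert not circulant_4cycle_free(256, [0, 1, 2])
```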

It is a requirement that the circulant Hp have odd column weight in order to be

iteratively encodable. However, aside from enforcing the 4-cycle free constraint, the

choice of Hu is unconstrained. Hence we may choose Hp as above but select columns of

weight j = 4 for Hu. The performance improvement offered by this (n, k) = (1082, 826)

code with j = {5, 4} is shown in Figure 5.1.

As a benchmark for comparison to these rate r ≈ 0.763 reversible codes, we use

a type-I Euclidean-geometry (EG-LDPC) code of the same rate and approximately the

same length. The performance of this code (cf. [52]) is also shown in Figure 5.1. A

comparative summary of performance and implementation cost, in terms of both the

[Figure: (a) BER and (b) WER against Eb/N0 (2–4.5 dB) for: uncoded BPSK; EG-LDPC (1023,781), j = 32; Rand (1082,826), j = 3; Rev (1082,826), j = 5 (information and full codeword); Rev (1082,826), j = {5,4} (information and full codeword).]

Figure 5.1: Comparing the performance of some r ≈ 0.763 LDPC codes.

total number of edges in the code's factor graph and the BSGC metric is provided in
Table 5.1.

Code type    (n, k)       Column weight (j)   Eb/N0 (dB) at BER=10^−6   Edge count   BSGC
EG-LDPC      (1023,781)   32                  3.63                      32736        62403
Reversible   (1082,826)   5                   4.08                      5410         9226
Reversible   (1082,826)   {5,4}               3.89                      4584         7574

Table 5.1: Comparison of high rate code performance and complexity.

We see that the EG-LDPC code outperforms the j = {5, 4} reversible code by ap-

proximately 0.26dB at a BER of 10−6. However, this comes with a considerable cost in

decoder complexity. The EG-LDPC code has redundant rows in its square parity-check

matrix. Every variable and check has degree 32, giving a much higher node implemen-

tation complexity than that of the reversible codes. Furthermore, as the reversible codes

have a much smaller edge count they offer a significant reduction in routing complexity.

5.4 Codes Built From Improved Expanders

The results of the last section suggest that we can improve graph expansion of type-I

reversible codes, and thus improve performance, by increasing the column weight of Hp.

Although suitable for building high rate codes, type-I reversible structures with col-

umn weight j > 3 are not as well suited to rate 1/2 applications. This is because

increasing the column weight of rate 1/2 codes, having (j,2j)-regular structures, shifts

their performance away from capacity [5, 35]. As the (3,6)-regular ensemble is the best-performing regular ensemble at rate 1/2, we seek a means of building a 3-regular iteratively encodable Hp with

good expansion.

5.5 Recursive Construction

It is well known that random graphs exhibit a good expansion property. Here we aim to

break the lattice structure of the codes presented in the previous chapter, by incorporating

randomness into the graph. The challenge in doing this is that the constructions must

also satisfy the iterative encodability (3.5) and overlap (Def. 2.10) constraints. The

following algorithm uses a recursive approach [9] to generate Hp, while incorporating two

randomly generated components. We use the terminology in the following definition to

identify codes built using this technique.


Definition 5.2 (Type-II Reversible Code). A type-II reversible code has Hp recur-

sively constructed according to Algorithm 5.1.

Algorithm 5.1: Construction of Hp for a Type-II Reversible Code

Step 1 (Initialise):
    Choose the target size m for Hp. Select the template matrix size s ≪ m, and
    set the component matrix size p = m/s.

Step 2 (Create Template):
    Construct the following s × s code template matrix:

        T = [ I  N ]
            [ I  I ]

    Select N to be a 4-cycle free, 2-regular, s/2 × s/2 matrix having the property
    N^4 = 0. This matrix can be constructed as a circulant with first row polynomial
    n(x) = x + x^(s/8+1); it is easy to see that this polynomial generates N by
    considering Theorem 3.8. Small circulants offer good expansion, and as s ≪ m
    they make suitable template components. We note that T is 4-cycle free, since
    diag N = 0.

Step 3 (Generate Components):
    Choose two p × p component matrices as follows:
        P : random permutation matrix
        R : random 2-regular, 4-cycle free matrix.

Step 4 (Apply Template):
    Let A ⊗ B denote the matrix Kronecker product of A and B. Setting K1 = N ⊗ P
    and K2 = I ⊗ R, insert the components into the template to create Hp as follows:

        Hp = [ I   K1 ]
             [ K2  I  ]

Theorem 5.1. Algorithm 5.1 generates a 3-regular, 4-cycle free matrix, which is itera-

tively encodable in κ = 8 iterations.

Proof. The template T created in Step 2 is 4-cycle free, and the components selected in Steps 2 and 3 are also 4-cycle free, so it follows that Hp is free of 4-cycles. The Kronecker products performed in Step 4 generate 2-regular K1 and K2, and hence Hp is 3-regular. We now show that Hp is iteratively encodable, by considering

    (Hp + I)^8 = [ (K1K2)^4     0     ]
                 [     0     (K2K1)^4 ]

We note that the identity (A ⊗ B)(C ⊗ D) = AC ⊗ BD holds for appropriately dimensioned matrices. Thus (K1K2)^4 = (N ⊗ PR)^4 and (K2K1)^4 = (N ⊗ RP)^4. Step 2 selects N such that N^4 = 0, causing the result to vanish in both cases. Hence Hp satisfies (3.5) for κ = 8 iterations.
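The construction and the encodability check can be sketched numerically. The following is a minimal NumPy model of Algorithm 5.1, assuming s = 32 and p = 8 (so m = 256); for brevity a fixed two-shift circulant stands in for the random 2-regular component R, which is an illustrative simplification of the randomised construction. The assertions at the end verify 3-regularity and the iterative encodability property (Hp + I)^8 = 0 over GF(2).

```python
import numpy as np

def circulant(size, shifts):
    """Binary circulant matrix whose first row has ones at the given shifts."""
    C = np.zeros((size, size), dtype=int)
    for sh in shifts:
        C += np.roll(np.eye(size, dtype=int), sh, axis=1)
    return C % 2

def build_Hp(s=32, p=8, seed=1):
    """Construct Hp per Algorithm 5.1: template T = [I N; I I], with N a
    2-regular circulant satisfying N^4 = 0 over GF(2), expanded by the
    p x p components P (random permutation) and R (2-regular)."""
    rng = np.random.default_rng(seed)
    half = s // 2
    N = circulant(half, [1, s // 8 + 1])          # n(x) = x + x^(s/8+1)
    P = np.eye(p, dtype=int)[rng.permutation(p)]  # random permutation matrix
    R = circulant(p, [0, 1])  # 2-regular stand-in for the random component
    K1 = np.kron(N, P)
    K2 = np.kron(np.eye(half, dtype=int), R)
    I = np.eye(half * p, dtype=int)
    return np.block([[I, K1], [K2, I]]) % 2

Hp = build_Hp()                                   # m = s*p = 256
m = Hp.shape[0]
assert (Hp.sum(axis=0) == 3).all() and (Hp.sum(axis=1) == 3).all()  # 3-regular
# Iterative encodability: (Hp + I)^8 = 0 over GF(2)
M = (Hp + np.eye(m, dtype=int)) % 2
A = np.eye(m, dtype=int)
for _ in range(8):
    A = (A @ M) % 2
assert not A.any()
```

Since (Hp + I) over GF(2) reduces to the off-diagonal blocks K1 and K2, the eighth power collapses to Kronecker powers of N, which vanish by construction.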

We complete H by extending Hp with a randomly generated Hu, while blocking

the creation of 4-cycles. Using Algorithm 5.1, with s = 32, we have built example Hp

matrices of size m =256, 512 and 2048. Figure 5.2 shows that these matrices offer much

better expansion than the circulant structures presented in the previous chapter.

[Figure 5.2: Expansion of type-II reversible structures. Normalised spectral gap, µδ, versus matrix size m, for reversible (circulant) and reversible (recursive) structures, against the Ramanujan bound.]

We have broken the lattice structure discussed in the previous chapter, thus reducing

the probability of creating small stopping sets. Hence we expect that reversible codes

built using the above matrices should also provide improved BEC performance. Further-

more, incorporating randomness into the structure should statistically improve minimum

distance. For example, the lowest weight row in the generator matrix for RevII512 has

weight 48, in contrast to the values presented in Table 4.5.

In contrast to the circulant designs presented in the previous chapter, the size of

type-II reversible codes is not constrained such that m = 2^y, y ∈ Z+. As the choice of

component matrix size is arbitrary, the size of Hp is constrained only to be some multiple

of the template size. Recall also that the choice of the number of columns in Hu is

arbitrary. Hence type-II reversible codes can be designed for a very wide selection of rate

and block length.


5.6 Simulation Performance

Here we investigate the performance of some type-II reversible codes, built using the

three example Hp matrices of the previous section. In each case we have extended Hp

to build 4-cycle free codes, using a randomly generated Hu. There were no undetected

errors observed during any of the simulations presented in this section.

Figure 5.3 shows the AWGN performance of three (3,6)-regular type-II reversible

codes. Performance of the random codes listed in Table 4.2 is again provided as a bench-

mark.

The performance of the n = 512 and n = 1024 codes matches that of their respective

random benchmarks, with no evidence of a floor. In both cases the BER and WER

measured across the information section matches that measured across the full codeword.

The n = 4096 code initially compares well to its random benchmark but shows signs of

flooring at around WER = 10−5. For this code we empirically note convergence failure

in the presence of some (14, 2) near-codewords. All codes perform significantly better

than those presented in the previous chapter.

Using the example Hp structure with size m = 512, we have also constructed two

example codes having (n, k) = (1280, 768) and rate r = 0.6. The performance of these

type-II reversible LDPC codes is shown in Figure 5.4. All columns in Hu for the code

labelled A have weight j = 4. The code labelled B has an Hu section with 512 columns

of weight j = 3 and 256 columns of weight j = 4. As a comparative benchmark, we

also show the performance of an extended Euclidean-geometry code (c.f. [52]) having the

same rate and approximately the same length.

All three codes exhibit similar performance. At low SNR the reversible code having

some weight j = 3 columns in Hu performs slightly better than the other reversible code.

However, this gap closes as the SNR increases. The parity-check matrix of an extended

EG-LDPC code has a much lower density than that of a type-I EG-LDPC code. In this

case, the benchmark code has columns of weight j = {3, 4} and row weight i = 8. As a

result, all three codes have approximately the same implementation complexity.

5.7 Summary

In this chapter we have demonstrated that reversible codes exist which do not exhibit con-

vergence problems. We have achieved this by targeting the weakness in graph expansion


[Figure 5.3: Performance of type-II reversible (3,6)-regular LDPC codes. (a) Bit error rate and (b) word error rate versus Eb/N0 (dB), for uncoded BPSK and the RevII512, RevII1024 and RevII4096 codes (Info and Full), against the Rand504, Rand1008 and Rand4000 benchmarks.]


[Figure 5.4: Performance of rate r = 0.6 type-II reversible LDPC codes. (a) Bit error rate and (b) word error rate versus Eb/N0 (dB), for uncoded BPSK, the extended EG-LDPC (1275,765) code, and the RevII (1280,768) codes A and B (Info and Full).]


that was exposed in the previous chapter. Although the circulant design approach pre-

sented in Chapter 3 is not well suited to (3,6)-regular codes, it may be used to construct

high rate codes. We have presented a simple recursive algorithm for constructing (3,6)-

regular, type-II reversible codes. This reversible design employs random components,

and provides improved expansion to the graph of Hp.

Simulation results for the example type-II reversible codes compare well to random

and extended EG-LDPC benchmarks. The high rate circulant based reversible codes offer

a performance/complexity tradeoff in comparison to the type-I EG-LDPC structures.

The example length n = 512 and n = 1024 type-II reversible codes do not show

signs of a performance floor. However, the expansion and performance of the n = 4096

code are not as good. This relationship again motivates the expansion of Hp as a metric

for the expected performance of reversible codes. From Figure 5.2, we note that the

Hp component of the RevII512 code has optimal expansion. However, the expansion

metric value decreases as the size of Hp is increased. By exploring alternative recursive

approaches, it may be possible to improve the performance of the longer codes.

Recursive constructions offer a wide range of choices for the size of Hp. We also have

an arbitrary choice for the number of columns in Hu. Hence, type-II reversible codes

may be designed with a high degree of freedom for rate and block length.


Chapter 6

Analog Decoding

6.1 Introduction

Many high performance channel codes, such as LDPC codes, may be represented using

the factor graph structure presented in Section 2.4. Iterative decoding algorithms for

these codes, such as the sum-product algorithm presented in Section 2.5.1, are then

viewed as message-passing on the graph. The highly parallel structure of the factor

graph representation of LDPC codes offers the potential for very high throughput decoder

architectures to be built.

The standard approach to decoder implementation has been to design parallel [89],

or partly parallel [60, 103], digital circuits which map the structure of the code’s factor

graph. Each node on the factor graph can be considered as a small processor for message-

passing decoding. Messages are passed around the decoder circuit in a quantised form.

Analog implementations of the Viterbi algorithm have been used since the late

1970s [104], with recent application to magnetic disk drive channels [105]. Following

on from this has been the suggestion that analog VLSI circuits be used to implement

iterative soft decision decoding on factor graphs [106–108]. Since then a research community has developed around this alternative approach. The principles that underlie analog decoding are the same as those used to build analog circuits for some

artificial intelligence applications [109].

An analog decoder circuit maps the constraints of the code, in a similar manner to a

digital decoder. In contrast to the digital decoder, the analog decoder operates by passing

unquantised messages in continuous time, as it settles to the steady state output1. The

1An exception to the generalised digital decoder description is the recently proposed stochastic de-coder [110, 111]. This algorithm operates by passing randomised digital messages synchronously orasynchronously.


designer may choose to consider the messages being passed between processing blocks,

as either currents or voltages. Hence, the designs are often referred to as being either

current-mode or voltage-mode respectively. Of course, current and voltage are related

and there is no fundamental difference between the two views [112]. We use the following

definition to determine the mode of a processing block [3].

Definition 6.1 (Current/Voltage-Mode). A processing block with low impedance in-

puts is a current-mode block, otherwise it is a voltage-mode block.

6.2 Potential Advantages

In this thesis we focus upon the analog implementation of an LDPC decoder. We now

list some of the potential advantages that an analog decoder presents in contrast to a digital implementation. Some of these advantages are particularly relevant to LDPC

decoder implementation.

Preservation of Soft Information A digital circuit must sample the received vector

and pass quantised message values around the circuit. Analog circuits internally

pass messages as real values without quantisation.

Fast Decoding Digital decoders operate by passing messages in discrete time iterations

around the circuit. An analog decoder operates in continuous time, as it settles to

a steady state output2. During the decoding process, voltage variations throughout

the decoder become progressively smaller, thus leading to fast convergence [113].

Low Power The analog decoder circuit is only required to settle to a steady state once it

is switched. This offers a potential power saving when compared to digital circuits,

which must operate for several iterations. Digital power consumption is propor-

tional to the switching (clock) rate, whereas analog decoders are only switched

once per block. Analog decoders can outperform digital decoders by two orders of

magnitude in terms of power consumption [106, 114]. The largest fabricated analog

decoder to date is the product decoder presented in [115]. This analog decoder

consumes approximately 2% of the power consumed by the digital LDPC decoder

presented in [89].

2As is the case for a digital implementation, it is also possible for the analog decoding process tooscillate [3].


Small Area Analog decoders have the potential to be significantly smaller than digital

decoders [106, 114]. An analog message value may be represented on a single wire,

compared to q wires required for q-bit quantisation in the digital case. This reduces

the amount of wires required for routing and thus saves circuit space. This saving is

of particular relevance to LDPC decoders, for which the size of a digital implemen-

tation is determined by routing congestion rather than the transistor count [89].

After accounting for differences in code length and process technology, the analog

decoder presented in [115] requires approximately one quarter of the area of the digital

decoder presented in [89].

Programmability The ability to represent real valued messages on a single wire allows

the design of efficient reprogrammable message-passing networks [116, 117]. This

offers the potential for the code structure to be dynamically loaded in a boot stage

prior to decoding, rather than being fixed at fabrication.

Elimination of Analog to Digital Converter A digital implementation requires a

separate circuit to convert the received analog matched-filter samples into digital

form before decoding. However, the analog decoder is effectively a smart analog to

digital (A/D) converter [118]. During the decoding process it converts analog sam-

ples directly to digital information bits. Eliminating the need for an A/D converter

stage provides a significant reduction in power consumption and circuit area.

Suitability to System-on-a-Chip Implementations Analog decoders are well suited

to sharing a single chip with other system components without causing interference,

as they have no high-frequency switching components. Furthermore, the large par-

allel decoding approach makes the analog decoder an effective low-pass filter, so

that it is robust against interference from other digital circuit components. Hence

analog decoders provide excellent radio frequency compatibility. The low voltage

analog designs presented in [118, 119] have very flexible supply voltages, and do not

require regulated reference voltages. This makes them particularly attractive for

system-on-a-chip (SOC) applications.

6.3 Existing Work and Remaining Challenges

So far the analog approach has been used to design trellis based BCJR-style decoders [114,

120, 121], turbo decoders [117, 122, 123], decoders suitable for LDPC codes [3, 124], min-


sum decoders [125], and a decoder for block product codes [115, 126]. Several of these

designs have been successfully fabricated. Most of these decoders are proof-of-concept

designs having very small block length (<50 bits), with notable exceptions being [115,

123].

Work is ongoing into the analysis of analog iterative decoding circuits. Distortion in

analog computation can arise through mismatch between transistors in the circuit, and

a theoretical model for this effect is provided in [127]. For the small analog decoders

built to date, empirical evidence suggests that mismatch effects are not of significant

concern [124, 128, 129]. A recent density evolution based model for the effects of mismatch

in larger decoders [130] indicates that a mismatch of up to 20-25% may be tolerated.

Device mismatch may be easily set below this point by slightly increasing transistor size

beyond minimum dimensions.

Message-passing in a digital decoder reflects the discrete time schedule exactly. How-

ever, operation of the continuous time analog circuit is not as well understood. A recent

model has been proposed which approximates message-passing in the analog network as

successive over relaxation (SOR) [131]. Simulation results suggest that SOR provides

improved performance over the flooding schedule (Section 2.5.1).

New approaches for building larger analog decoders are required. The desire to op-

erate the circuit in continuous time, in contrast to time multiplexed digital computation,

calls for intelligent routing techniques [116]. A tailbiting ring decoder which operates

using a sliding window [132] provides an efficient architecture for analog decoding of

convolutional codes. However, building large sum-product based LDPC decoders is dif-

ficult, due to fast growth in routing complexity. Manual schematic design and layout for

large decoders is a tedious and error prone process. Using standard cells and automated

routing is attractive, with the potential to automatically generate a chip layout directly

from the parity-check matrix [3]. However, approaches to date lead to impractically large

layouts [127]. Hence, further investigation into design automation is required.

An alternative approach for constructing analog computation cells, using multiple-

input floating-gate (MIFG) transistors has been proposed [133]. These circuits have the

potential to offer smaller chip area.

The requirement that transistors operate in saturation places a limit of around 1.2V

on the minimum supply voltage of typical analog decoder cells. As CMOS processes have

evolved, their allowable supply voltages have correspondingly decreased. An alternative

cell design has therefore been proposed which can operate from a supply voltage as low


as 0.6V [118, 119].

While the recent focus has been placed upon building analog decoders, potential

applications for analog processing are by no means limited to decoding. Essentially any

algorithm that operates on a factor graph is a candidate for analog implementation.

Research is ongoing into using analog computation for other applications, such as equali-

sation [134], timing recovery and synchronisation [135]. By replacing digital components

the potential exists to create a fully analog front end, and thus realise the true potential

of eliminating the A/D converter.

As discussed in Section 3.1, verification of the decoder circuit is a challenging task.

We may reduce this burden through importance sampling, when obtaining empirical

BER results from circuit simulation [136]. An alternate approach is to test the routing

connectivity of the circuit independently from operating the decoder, and provide a built-

in self-test (BIST) function for the fabricated circuit [137, 138].

6.4 The Subthreshold CMOS Analog Approach

We represent the probability mass function pX(x), using the currents on two wires, as

the vector (Ix,0,Ix,1) corresponding to (pX(0),pX(1)). We define the sum current, IX , as

the sum Ix,0 + Ix,1. We denote a probability of 1 by the unit current, Iu.

6.4.1 Subthreshold Operation

An n-type MOSFET [139] is shown in Figure 6.1. The device has four terminals, namely

the gate (G), source (S), drain (D), and bulk. The bulk terminal represents a connection

to the substrate. We assume that the bulk terminal is tied to either Gnd or Vdd for n-type

and p-type FETs respectively, and hence it has been omitted from the diagram.

[Figure 6.1: An n-type MOSFET, showing the gate (G), drain (D) and source (S) terminals, the gate-to-source voltage Vgs and the drain current ID.]


In simplistic terms we may consider the above FET to act as a switch, which connects

the drain to source when the gate-to-source voltage Vgs is above the transistor threshold

voltage, Vth, or isolates the drain and source when Vgs is below Vth. This relationship

is exploited in digital circuit design. However, in reality the device does not turn off

abruptly as soon as Vgs drops below Vth. When Vgs ≈ Vth, a weak inversion layer exists,

supporting some current flow from drain to source. In this case the device is said to

be operating in the weak inversion, or subthreshold, region [139]. We assume that the

device is saturated when the drain-to-source voltage is Vds ≳ 200mV. If the device is

saturated and operating in the subthreshold region then the drain current, ID, exhibits

the following exponential relationship to Vgs [139]. This relationship is similar to the

relationship between the collector current and base-emitter voltage of a bipolar device.

ID = I0 e^(Vgs/(ζVT))    (6.1)

Here the specific current, I0, is a process dependent constant. The factor ζ > 1

is used to account for non-ideal device behaviour [139]. The thermal voltage is denoted

VT = kBT/q, where T is the absolute temperature, kB is Boltzmann's constant and q is the

carrier charge. We may consider the term ζVT to be a constant normalisation factor, with

units of voltage, leading to a dimensionless exponential component that varies with Vgs.

We exploit the relationship of (6.1) to build the simple analog processing circuits

that are reviewed in this section [3]. To do so, we choose a unit current such that the

devices in these circuits are biased to operate in the subthreshold region.

We now consider the n-type differential pair circuit shown in Figure 6.2. Here the

drain currents of the pair represent the probability mass pX(x) of a binary random vari-

able X, normalised to the unit current Iu. Using (6.1) it is straightforward to show

that the differential voltage VX0 − VX1 represents the corresponding log-likelihood ratio

loge(pX(0)/pX(1)). Hence, if we set a constant differential reference voltage, VX1 = Vdiff,

we can use this circuit to convert a voltage VX0 representing a log-likelihood value, into a

pair of currents (IupX(0),IupX(1)) representing the corresponding probability mass. The

reference voltage Vdiff represents a log-likelihood value of zero.
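As a sketch of this conversion, the following behavioural model splits an assumed tail current between the two branches of the differential pair according to (6.1); the values chosen here for ζ, VT and Iu are illustrative assumptions, not process parameters.

```python
import math

def diff_pair(V_x0, V_x1, I_u=1.0, zeta=1.5, V_T=0.02585):
    """Ideal subthreshold differential pair (Fig. 6.2): the tail current I_u
    splits between the two branches according to (6.1), so each drain
    current represents I_u * pX(b)."""
    ratio = math.exp((V_x0 - V_x1) / (zeta * V_T))   # I_D0 / I_D1
    I0 = I_u * ratio / (1.0 + ratio)                 # I_u * pX(0)
    I1 = I_u / (1.0 + ratio)                         # I_u * pX(1)
    return I0, I1

# The differential input voltage encodes the log-likelihood ratio:
I0, I1 = diff_pair(0.55, 0.50)
assert abs(math.log(I0 / I1) - (0.55 - 0.50) / (1.5 * 0.02585)) < 1e-9
assert abs(I0 + I1 - 1.0) < 1e-12   # branch currents sum to the tail current
```

The assertions confirm the two properties used in the text: the log of the current ratio equals the normalised differential voltage, and the output pair is a scaled probability mass.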

In this thesis we focus upon circuits which operate in subthreshold CMOS. Alterna-

tively, we may use bipolar devices to implement high speed analog decoders. Existing

subthreshold CMOS decoders are able to offer throughput in the order of 1-500Mbit/sec,

and we expect 1Gbit/sec throughput to be achievable with newer CMOS processes. Re-

cent predictions for the throughput of SiGe decoders are in the order of 10Gbit/sec [140].


[Figure 6.2: An n-type differential pair, with tail current Iu, gate voltages VX0 and VX1, and drain currents IupX(0) and IupX(1).]

Bipolar circuits use higher currents, as they do not operate in subthreshold mode. Hence

they present a speed/power tradeoff in relation to subthreshold CMOS decoders. More-

over, fabrication costs for the BiCMOS [121] process, and the more advanced SiGe [141]

process, are significantly higher than for the standard CMOS process.

6.4.2 The Gilbert Multiplier

Consider two probability masses pX(x) = (pX(0), pX(1)) and pY (y) = (pY (0), pY (1))

represented by current vectors (Ix,0,Ix,1) and (Iy,0,Iy,1) respectively, as described above.

Current vectors IX(pX(0), pX(1)) and IY (pY (0), pY (1)) are presented as input to the

circuit shown in Figure 6.3. When operating in the subthreshold region, this circuit outputs currents which are the pairwise products of pX(x) and pY (y), scaled

by the sum current IZ = IX . This form of the standard Gilbert multiplier circuit is used

in the core of the analog decoder processing blocks [106, 142]. Here the n-type reference

potential is labelled VrefN. The multiplier circuit may be easily extended to accommodate

wider input vectors [3].
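A minimal behavioural sketch of the multiplier core, operating on current values rather than a transistor-level model:

```python
def gilbert_core(IX_vec, IY_vec):
    """Behavioural model of the Gilbert multiplier core (Fig. 6.3):
    the outputs are the pairwise products pX(i)*pY(j), scaled by the
    sum current IZ = IX of the first input vector."""
    IX, IY = sum(IX_vec), sum(IY_vec)
    return [[IX * (ix / IX) * (iy / IY) for iy in IY_vec] for ix in IX_vec]

# Current vectors representing pX = (0.8, 0.2) with IX = 1.0,
# and pY = (0.5, 0.5) with IY = 2.0:
out = gilbert_core([0.8, 0.2], [1.0, 1.0])
# out[i][j] is the current IZ * pX(i) * pY(j)
```

Note that the output sum current equals IX regardless of IY, mirroring the circuit's property that IZ = IX.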

6.4.3 Probability Normalisation

We may normalise the current vector resulting from a calculation such as that performed

in Figure 6.3, using the biased current mirror circuit shown in Figure 6.4. Here the p-type

reference potential is labelled VrefP.

Using this circuit the normalised output current Ioutj may be calculated using the


[Figure 6.3: Vector multiplication core. Inputs IXpX(0), IXpX(1), IY pY(0) and IY pY(1); outputs IZpX(0)pY(0), IZpX(0)pY(1), IZpX(1)pY(0) and IZpX(1)pY(1); n-type reference potential VrefN.]

following expression.

    Ioutj = Iu · Iinj / Σ_{m=1}^{k} Iinm
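Behaviourally, this normalisation is a simple rescaling of the input current vector; a minimal sketch:

```python
def normalise(I_in, I_u=1.0):
    """Biased current mirror normalisation (Fig. 6.4):
    Iout_j = Iu * Iin_j / sum_m(Iin_m)."""
    total = sum(I_in)
    return [I_u * i / total for i in I_in]

# The outputs sum to Iu while preserving the ratios of the inputs:
out = normalise([0.3, 0.1])
```
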

6.5 Analog Computation using Soft-Logic Gates

In this section we review the variable and check node circuits which appear in the core

of an analog sum-product decoder [3]. In Section 2.5.1 we showed how these nodes may

be considered to be a concatenation of soft-logic gate components. Here we provide a

detailed description of soft-XOR and soft-equal gate functionality and implementation.

So far in our description we have the means to multiply and normalise currents.

Addition of currents is done simply by connecting wires together. If a current is required

more than once then we may use current mirrors to replicate it. We now use these rules

to build the two principal processing structures used in an analog LDPC decoder.


[Figure 6.4: Vector normalisation using a biased current mirror. Inputs Iin1, ..., Iink; outputs Iout1, ..., Ioutk; bias current Iu; p-type reference potential VrefP.]

6.5.1 The Analog Soft-XOR Gate

The soft-XOR gate (Def. 2.4) may be used to build check nodes. We can implement

this processing block using an analog circuit, as shown in Figure 6.5. Current vectors

representing the probability masses pX(x) and pY (y) are set as input to the core circuit

of Figure 6.3. The outputs for IZpX(0)pY (0) and IZpX(1)pY (1) are then connected,

resulting in the current labelled IZ0, such that IZ0 ∝ pZ(0). The current labelled IZ1 is

formed in a similar manner, such that IZ1 ∝ pZ(1). Finally the resulting k = 2 currents

are used as input to the normalisation circuit of Figure 6.4 to generate IupZ(z).
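The computation performed by this circuit can be modelled numerically; the following sketch mirrors the wire connections described above (summing the agreeing and disagreeing product terms) followed by the normalisation of Figure 6.4.

```python
def soft_xor(pX, pY):
    """Behavioural model of the soft-XOR gate (Fig. 6.5): the check-node
    operation pZ(0) ∝ pX(0)pY(0) + pX(1)pY(1),
              pZ(1) ∝ pX(0)pY(1) + pX(1)pY(0), then normalised."""
    z0 = pX[0] * pY[0] + pX[1] * pY[1]   # wires for IZ0 connected together
    z1 = pX[0] * pY[1] + pX[1] * pY[0]   # wires for IZ1
    total = z0 + z1
    return (z0 / total, z1 / total)

# pZ(1) is the probability that X xor Y = 1:
pZ = soft_xor((0.9, 0.1), (0.8, 0.2))
```
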

6.5.2 The Analog Soft-Equal Gate

The soft-equal gate (Def. 2.5) may be used to build variable nodes. The analog circuit

implementation of this processing block shown in Figure 6.6 is similar to that of the

soft-XOR gate. In this case however, the outputs for IZpX(0)pY (1) and IZpX(1)pY (0)

are not used in the computation of the output distribution, and hence these paths are

terminated. The remaining outputs are then connected as input to the normalisation

circuit, to generate pZ(z).
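Correspondingly, a behavioural sketch of the soft-equal gate, keeping only the agreeing product terms before normalisation:

```python
def soft_equal(pX, pY):
    """Behavioural model of the soft-equal gate (Fig. 6.6): the variable-node
    operation pZ(b) ∝ pX(b)pY(b); the cross terms pX(0)pY(1) and pX(1)pY(0)
    are terminated in the circuit and do not contribute."""
    z0, z1 = pX[0] * pY[0], pX[1] * pY[1]
    total = z0 + z1
    return (z0 / total, z1 / total)

# Two estimates that agree on bit 0 reinforce each other:
pZ = soft_equal((0.9, 0.1), (0.8, 0.2))
```
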

The soft-logic gates presented so far have a diode connected (gate tied to drain)

FET at each input. Hence the blocks have a low input impedance and are therefore

current-mode circuits (Def. 6.1). Transforming the blocks to voltage-mode operation is

easily achieved by shifting the diode connected FETs from the input to the output side.


IXpX(0)-

IXpX(1)-

IY pY (0) -

IY pY (1) -

IupZ(0)-

IupZ(1)-

Iu

IZ0? IZ1?

VrefN

VrefN

VrefP VrefP

Figure 6.5: Subthreshold CMOS soft-XOR gate.

Figure 6.7 shows a voltage-mode soft-XOR gate. Here we represent probability mass

pX(x) using the voltage pair (Vx,0, Vx,1). The simplicity of this transformation reflects

the fact that voltage-mode and current-mode views are fundamentally the same [112].

The designer is therefore free to choose whichever approach best fits the overall system [3].

6.5.3 Building Variable and Check Nodes

We may construct bi-directional soft-logic gates (Def. 2.6) using the method introduced

in Section 5.2. The analog soft-logic gates described above are specific implementations

of the single output soft-logic gate shown in Figure 2.4(a). Here the function pZ(z)out =

f(pX(x)in, pY (y)in) is implemented using the circuits shown in Figure 6.5 and Figure 6.6,

for soft-XOR and soft-equal gates respectively. The single output analog gates are then

connected as shown in Figure 2.4(b), to form bi-directional gates. Each edge in Figure 2.4

represents two wires in the circuit, e.g. carrying pX(0) and pX(1). The factor graph

representation of these bi-directional soft-logic gates is shown in Figure 6.8.


Figure 6.6: Subthreshold CMOS soft-equal gate.

We can construct a 4-port check node by cascading two 3-port bi-directional soft-XOR gates, as shown in Figure 6.9(a). Larger nodes may be built in a similar manner.

Variable nodes are built using a similar approach, as shown in Figure 6.9(b). Here the variable has 3.5 ports (3 bi-directional check connections plus an input-only channel connection) and an output bit slicer [3]. The output bit slicer uses a single output soft-equal gate to incorporate channel information into the posterior output calculation. Messages (probability masses) are passed as vectors in a single direction along 2 wires (dotted lines) and bidirectionally between cells along 4 wires (solid lines).
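The cascade construction can be mirrored in software: each outgoing port of a bidirectional check node carries the soft-XOR cascade of the other incoming messages. A sketch (function names are mine), with the 3-port gate of Section 6.5.1 as the building block:

```python
def soft_xor(px, py):
    # 3-port soft-XOR gate acting on probability pairs (p(0), p(1))
    z0 = px[0] * py[0] + px[1] * py[1]
    z1 = px[0] * py[1] + px[1] * py[0]
    s = z0 + z1
    return (z0 / s, z1 / s)

def check_node_out(msgs, i):
    """Outgoing message on port i of a bidirectional check node: the
    soft-XOR cascade of all other incoming messages, as in the 4-port
    construction of Figure 6.9(a)."""
    others = [m for j, m in enumerate(msgs) if j != i]
    acc = others[0]
    for m in others[1:]:
        acc = soft_xor(acc, m)
    return acc
```

Because the soft-XOR is commutative and associative, the cascade order does not affect the result, which is what allows larger nodes to be built from 3-port gates.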

6.6 The Analog Sum-Product Decoder

We may build an analog version of the sum-product decoder (Algorithm 2.1) using the

above processing blocks. The basic approach is to map the factor graph of the code

onto a circuit, using analog check and variable nodes. A detailed description has been completed by Lustenberger [3].

Figure 6.7: Voltage-mode soft-XOR gate.

In this thesis we design a proof-of-concept codec core for a reversible LDPC code

having (n, k) = (16, 8). The code is (3,6)-regular, with Hp structured according to

Theorem 3.8 and Hu being randomly generated. The resulting parity-check matrix,

which contains 4-cycles due to the very small size of the code, is defined as follows.

H =
[ 1 1 0 1 0 0 0 0 1 0 0 1 1 0 0 0 ]
[ 0 1 1 0 1 0 0 0 1 1 0 0 0 0 1 0 ]
[ 0 0 1 1 0 1 0 0 0 1 0 0 0 1 0 1 ]
[ 0 0 0 1 1 0 1 0 0 1 1 0 1 0 0 0 ]
[ 0 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 ]
[ 1 0 0 0 0 1 1 0 1 0 1 0 0 0 0 1 ]
[ 0 1 0 0 0 0 1 1 0 0 0 1 0 0 1 1 ]
[ 1 0 1 0 0 0 0 1 0 0 1 1 0 1 0 0 ]
(6.2)
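The stated structure of H can be checked directly. A Python sketch (the list layout is mine) verifying (3,6)-regularity, the circulant unit-diagonal Hp, and the presence of 4-cycles:

```python
# Rows of the parity-check matrix H of (6.2); columns 1-8 form Hp,
# columns 9-16 form Hu.
H = [
    [1,1,0,1,0,0,0,0, 1,0,0,1,1,0,0,0],
    [0,1,1,0,1,0,0,0, 1,1,0,0,0,0,1,0],
    [0,0,1,1,0,1,0,0, 0,1,0,0,0,1,0,1],
    [0,0,0,1,1,0,1,0, 0,1,1,0,1,0,0,0],
    [0,0,0,0,1,1,0,1, 0,0,0,0,1,1,1,0],
    [1,0,0,0,0,1,1,0, 1,0,1,0,0,0,0,1],
    [0,1,0,0,0,0,1,1, 0,0,0,1,0,0,1,1],
    [1,0,1,0,0,0,0,1, 0,0,1,1,0,1,0,0],
]
# (3,6)-regular: every row has weight 6 and every column weight 3.
assert all(sum(row) == 6 for row in H)
assert all(sum(row[j] for row in H) == 3 for j in range(16))
# Hp is circulant with a unit diagonal, per the Theorem 3.8 structure.
Hp = [row[:8] for row in H]
assert all(Hp[i][j] == Hp[0][(j - i) % 8] for i in range(8) for j in range(8))
assert all(Hp[i][i] == 1 for i in range(8))
# 4-cycles: some pair of rows shares at least two columns.
supports = [{j for j, v in enumerate(row) if v} for row in H]
assert any(len(supports[a] & supports[b]) >= 2
           for a in range(8) for b in range(a + 1, 8))
```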

The analog sum-product circuit which maps the factor graph of H is shown in Figure 6.10. Each 6-edge check node is implemented using a 6-port bi-directional soft-XOR gate, and each variable node using a 3.5-port soft-equal gate with an output bit slicer.


Figure 6.8: Factor graph representation of bi-directional 3-port soft-logic gates: (a) soft-equal gate; (b) soft-XOR gate.

Figure 6.9: Check and variable nodes: (a) 4-port check node; (b) 3.5-port variable node with output bit slicer.

Channel observations are passed into the variable nodes and the decoded soft decisions

are presented at the variable node outputs.

If we consider the soft-equal and soft-XOR gates to be clocked so that they operate

in discrete time iterations, then it is easy to show that the above circuit implements the

sum-product algorithm. However, the analog implementation operates asynchronously

in continuous time, and for this case the connection is not so clear. Research into more accurate algorithmic descriptions of the analog circuit operation has recently gained interest [131].
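Under the clocked view just described, the circuit computes flooding sum-product decoding. A behavioural Python sketch of that discrete-time view (all function and variable names are mine; the analog circuit itself runs asynchronously):

```python
# Parity-check matrix H of (6.2).
H = [
    [1,1,0,1,0,0,0,0, 1,0,0,1,1,0,0,0],
    [0,1,1,0,1,0,0,0, 1,1,0,0,0,0,1,0],
    [0,0,1,1,0,1,0,0, 0,1,0,0,0,1,0,1],
    [0,0,0,1,1,0,1,0, 0,1,1,0,1,0,0,0],
    [0,0,0,0,1,1,0,1, 0,0,0,0,1,1,1,0],
    [1,0,0,0,0,1,1,0, 1,0,1,0,0,0,0,1],
    [0,1,0,0,0,0,1,1, 0,0,0,1,0,0,1,1],
    [1,0,1,0,0,0,0,1, 0,0,1,1,0,1,0,0],
]

def sum_product(H, priors, iters=20):
    """Flooding sum-product decoding with the gates 'clocked' once per
    iteration; messages are probability pairs (p0, p1), as carried on the
    2-wire probability vector buses."""
    m, n = len(H), len(H[0])
    edges = [(i, j) for i in range(m) for j in range(n) if H[i][j]]
    v2c = {e: priors[e[1]] for e in edges}     # variable-to-check messages
    c2v = {e: (0.5, 0.5) for e in edges}       # check-to-variable messages
    for _ in range(iters):
        for i, j in edges:                     # check update: soft-XOR of others
            d = 1.0
            for i2, j2 in edges:
                if i2 == i and j2 != j:
                    d *= v2c[(i2, j2)][0] - v2c[(i2, j2)][1]
            c2v[(i, j)] = ((1 + d) / 2, (1 - d) / 2)
        for i, j in edges:                     # variable update: soft-equal of others
            q0, q1 = priors[j]
            for i2, j2 in edges:
                if j2 == j and i2 != i:
                    q0, q1 = q0 * c2v[(i2, j2)][0], q1 * c2v[(i2, j2)][1]
            v2c[(i, j)] = (q0 / (q0 + q1), q1 / (q0 + q1))
    hard = []
    for j in range(n):                         # posterior hard decisions
        q0, q1 = priors[j]
        for i2, j2 in edges:
            if j2 == j:
                q0, q1 = q0 * c2v[(i2, j2)][0], q1 * c2v[(i2, j2)][1]
        hard.append(1 if q1 > q0 else 0)
    return hard

# A valid codeword of H, observed with every bit 90% reliable.
x = [0,0,1,1,0,0,1,0,0,0,1,0,1,0,1,0]
priors = [(0.9, 0.1) if b == 0 else (0.1, 0.9) for b in x]
decoded = sum_product(H, priors)
```

The check update uses the standard identity that the difference p(0) − p(1) of an XOR of independent bits is the product of the input differences.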

6.7 Summary

The principle of analog decoding has passed the proof-of-concept phase. The analog

decoder community is still growing, and is now setting its sights upon larger chip designs.

The analog approach has also proven suitable to other front end applications. However,

further work is required before we can replace all digital components, and thus eliminate

the large and power-demanding A/D converter. New theoretical models and circuit analysis techniques are also being developed.

Figure 6.10: Analog decoder circuit for H. Information variables (v9 . . . v16) and parity variables (v1 . . . v8) are implemented as 3.5-port soft-equal gates with output bit slicers; checks (c1 . . . c8) are 6-port soft-XOR gates. Received channel data enters as the prior input and decoder soft decisions leave as the posterior output; probability vector buses use 2 wires, or 4 wires where bidirectional.

The analog approach appears well suited to LDPC decoder design. It offers fast,

accurate, and power efficient decoding. The promise of reduced routing complexity is

also of particular significance to LDPC decoder implementation.

The iterative encoding techniques developed in Chapter 3 allow a digital sum-product

decoder to be easily re-used for encoding reversible codes. However, re-use of the analog

decoder architecture to implement the encoding algorithm is not as obvious. In the

following chapter we modify the analog sum-product decoder core to also allow encoding,

with very little overhead to circuit area.


Chapter 7

The Reversible LDPC Codec

7.1 Introduction

In this chapter we show how the analog sum-product decoder may be extended to allow

iterative encoding of reversible codes. We focus only upon the codec core. Interface

circuits such as those discussed in [126, 143] may be added to complete the chip design.

A novel architecture is presented, based upon the iterative Jacobi message-passing

encoder. This discrete time hard decision algorithm offers straightforward reuse of a

digital sum-product decoder for encoding reversible codes. However, methods for reusing

the continuous time soft decision analog decoder to implement the algorithm are not as

obvious.

This chapter contains original work, resulting from collaborations with the Electrical Engineering Department at the University of Utah, and the High Capacity Digital Communications Laboratory at the University of Alberta. The initial concept codec design was presented at the 2003 Australian Communications Theory Workshop, in Melbourne [144]. This circuit reused the subthreshold computation cells of the decoder to

perform encoding, allowing half-duplex operation of encode and decode modes. A refined

version of the design was then presented at the 3rd International Symposium on Turbo

Codes and Related Topics, in Brest, France [88]. An alternative approach is presented

in this chapter, which allows the codec to switch between analog decoding and digital

encoding, and offers full-duplex operation. This design was presented at the 2nd Analog

Decoder Workshop, in Zurich, Switzerland [145]. An alternative application for the cell

designs, toward circuit verification, was recently presented at the 3rd Analog Decoder

Workshop, in Banff, Canada [137].


Mode   | enc | enc̄ | Vu  | VrefN | VrefP
Decode | Gnd | Vdd | Vu  | VrefN | VrefP
Encode | Vdd | Gnd | Gnd | Gnd   | Vdd

Table 7.1: Mode-switching gate settings.

7.2 Mode-Switching Gates

From initial investigations into reusing the analog decoder to perform iterative encoding

(Algorithm 3.1) we have identified two issues [88, 144]. Firstly, the output of the soft-XOR gates decays over time during the iterative process, thus requiring amplification.

The second issue is related to the continuous time operation of the analog circuit. As

the arrival of messages at a node is asynchronous, it is possible for the circuit to stray

from the iterative Jacobi path, before settling to the correct steady state solution. We

therefore latch the feedback, forcing the circuit to update each iteration in discrete time.

These problems both arise from the fact that we are trying to map the discrete time

digital encoding algorithm onto a continuous time analog circuit. The algorithm passes

hard messages ∈ {0, 1} yet the analog circuit passes soft messages ∈ (0, 1). Motivated

by this we present an alternative design, using mode-switching gates (MSG) in the check

nodes. These gates operate as analog soft-XOR gates (Section 6.5.1) during decoding.

They are then switched to operate as digital resistor-transistor-logic (RTL) XOR gates

during the encode operation.

A mode-switching XOR gate is shown in Figure 7.1. This cell is based upon the

voltage-mode soft-XOR gate shown in Figure 6.7, with the addition of four transmission

gates. We may use minimum sized FETs for these transmission gates and hence they

present very little overhead to the original circuit. The cell may be switched between

encode (hard) and decode (soft) modes according to Table 7.1.
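The two modes can be summarised behaviourally. A sketch (function name and mode strings are mine): in decode mode the gate acts as the analog soft-XOR on probability pairs, in encode mode as a hard RTL XOR on {0, 1} values:

```python
def mode_switching_xor(mode, x, y):
    """Behavioural model of a mode-switching XOR gate."""
    if mode == "decode":
        # analog soft-XOR on probability pairs (p(0), p(1)), renormalised
        z0 = x[0] * y[0] + x[1] * y[1]
        z1 = x[0] * y[1] + x[1] * y[0]
        s = z0 + z1
        return (z0 / s, z1 / s)
    if mode == "encode":
        # digital RTL XOR on hard bits
        return x ^ y
    raise ValueError("mode must be 'decode' or 'encode'")
```

At degenerate probability vectors the soft mode reproduces the hard mode, which is what makes the dual-role cell consistent.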

We now explain how the circuit generates the output pZ(1). The output pZ(0) is

generated in a similar manner. The transmission gates are used to dynamically rewire

the cell between encode (enc) and decode (enc̄) modes.

When switched to operate in decode mode (enc = Gnd, enc̄ = Vdd), the circuit

is wired as a soft-XOR gate, equivalent to that shown in Figure 6.7. The voltage Vu

represents an external connection to the driving FET of a p-type current mirror, such

that transistor M1 is biased to supply the unit current Iu. Transistors M2 and M3 form

part of the normalisation network, and M4 is diode connected.

Figure 7.1: Mode-switching XOR gate.

When switched to operate in encode mode (enc = Vdd, enc̄ = Gnd), the circuit is wired as a digital resistor-transistor-logic XOR gate. We assume that hard decision

{Gnd, Vdd} voltages drive the inputs. We set VrefP = Vdd and use transistor M2 as

a resistive load to the multiplication matrix. The gate voltage of M2 represents the

inverted result for pZ(1). We set Vu = Gnd to turn on M1, and VrefN = Gnd so that

transistors M3 and M4 form an inverter. The true form of pZ(1) is then presented at the

output.

Potential applications for mode-switching gates [137] extend beyond the encoding

architecture presented in this work. Research is ongoing into using these gates to incorporate built-in self-test functionality into codec designs [138].

7.3 Codec Core Architecture

We now extend the analog sum-product decoder core described in Section 6.6 to allow

encoding. The encode operation is performed digitally, using the mode-switching XOR

gates presented above. This proof-of-concept design is based upon a type-I reversible code

(Def. 3.3), however the codec architecture may also be used for type-II codes (Def. 5.2).

Consider the decoder circuit corresponding to (6.2), shown in Figure 6.10. We separate this circuit into information and parity sections by splitting each 6-port XOR gate, cr, into two 4-port gates, cur and cpr, as shown in Figure 7.2.

soft-XOR gates used in Figure 6.10 with mode-switching XOR gates.

The information variable nodes (x9 . . . x16) are extended to allow encoding as shown

in Figure 7.3 for the case of x9. Similarly, the parity variables (x1 . . . x8) are extended as

shown in Figure 7.4 for the case of x1. Transmission gates and a multiplexer (MUX) are

used to switch each variable node between encode (enc) and decode (enc) modes. The

multiplexer is also built from transmission gates. Here 2-wire probability vector buses

(thick lines) carry messages representing (pX(0),pX(1)). Where necessary these have been

expanded into single wires (thin lines).

7.3.1 Decode Mode

In decode mode (enc = Gnd, enc̄ = Vdd) the circuit operates as a subthreshold CMOS

analog sum-product decoder [3]. All checks (mode-switching XOR gates) are set to

operate as soft-XOR gates.

In this mode, information and parity variables both perform the same function. Each

variable is based upon a soft-equal gate, as shown in the shaded region of Figure 7.3

and Figure 7.4. The equal gate operates as a voltage-mode cell; however, the output result (posterior) is presented as a current-mode pair. Gate edge connections E1Out, E2Out and E3Out are routed to adjacent check nodes. Edge connections E1In, E2In and E3In receive messages routed from adjacent check nodes. We assume that the received channel observation (prior) is represented as a voltage, in log-likelihood form. This voltage is converted into a current-mode probability vector, using the n-type differential pair (NDIFF) circuit described in Section 6.4.1. The reference voltage (Vdiff) represents a log-likelihood value of zero. A vector normalisation circuit (NORM), as described in Section 6.4.3, then feeds two diode-connected FETs. These FETs provide a voltage-mode representation of the vector, for input (via GateIn) to the equal gate. The decoded soft output (posterior) is then taken from the output (GateOut port) of the equal gate.

Figure 7.2: Codec circuit for H.

Figure 7.3: Information variable node structure for x9.

Figure 7.4: Parity variable node structure for x1.

A reset FET, M1, is connected across the input to the normalisation circuit. To reset

the decoder, i.e. clear the previous result, we briefly set drst = Vdd. This causes M1 to

turn on and sets a uniform distribution, p(xs = 0) = p(xs = 1) = 0.5, for all variables.

Once released from reset we allow the decoder circuit some fixed time, tdec, to settle

to its steady state. At this point an interface circuit [143] may be used to sample the

posterior core output, and present this result as a hard decision at the output of the chip.


7.3.2 Encode Mode

In encode mode (enc = Vdd, enc̄ = Gnd) the circuit operates as a digital message-passing

Jacobi encoder (Algorithm 3.1). All checks (mode-switching XOR gates) are set to op-

erate as digital RTL XOR gates.

The first step toward encoding is to generate the vector b = Hu xu⊤ from the information variables, as shown in Figure 7.2. Using a multiplexer we bypass the equal gate

that is used in decode mode (see Figure 7.3). The information bit value, p(xs = 1), is presented as a hard-decision voltage at the prior node input. An inverter is then used to

generate p(xs = 0), and both values are sent to each outgoing edge of the variable node.

Transistor M2 at the input of this inverter connects it to ground during the decode mode

(enc̄ = Vdd). This connection is made to prevent the input of the inverter from floating,

which can create a resistive path between the power rails and waste power.

Each check node includes the three adjacent incoming information bits in the hard

XOR operation, thus producing b. For example, the XOR operation at check cu1 includes

information bits x9, x12 and x13 to produce b1.

To implement the Jacobi iteration (3.4) we require only the check node output passed along the path µcs→vs for each vs representing the parity bit xs. This value for vs is then

fed back into the checks c ∈ Γ (vs) \ cs, and also forms the final decision for xs. We use a

two phase shift register, consisting of two transmission gates and two inverters, to latch

the feedback (see Figure 7.4). The input to this shift register is taken from the voltage

representing p(xs = 1). We initially ignore the value representing p(xs = 0), and generate

it later at the output stage of the shift register.

Signals φ1 and φ2 are used to clock the shift register, as shown in Figure 7.5. These

clock signals have duty cycle less than 50% and phases constrained such that they are

never both high at the same time. This prevents feedback during the present iteration

from altering results from the previous iteration. During the input phase (φ1 high) the

first transmission gate is closed, i.e. its input and output are connected, and the result

of the present iteration is stored on the gate of the first inverter. Upon completion of this

phase (φ1 low) the first transmission gate is opened. Shortly after this, the output phase

(φ2 high) begins and the second transmission gate closes. Calculation of the next iteration

begins immediately, and continues on after the second transmission gate is opened (φ2

low). The result is then sampled on the next input phase. The process is repeated for the

number of iterations required to encode, e.g. κ = 4 for the reversible code used in this design. Upon completion, at time tenc, the result is latched through to the chip interface.

Figure 7.5: Shift register clock phases: non-overlapping input phase φ1 and output phase φ2.

Function                     | Components                   | FET requirement
Dynamic cell reconfiguration | Transmission gates           | 192
Edge multiplexing            | Transmission gates           | 512
Input multiplexing           | Transmission gates           | 32
Feedback shift registers     | Transmission gates/inverters | 80

Table 7.2: Transistor overhead for encoding.
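The non-overlapping clocking discipline can be sketched in software. The generator below (all parameters are illustrative, not taken from the design) holds each phase high for under half the period, with guard gaps so the phases are never simultaneously high:

```python
def two_phase_clock(ticks=40, period=10, width=4):
    """Sketch of non-overlapping clock phases in the style of Figure 7.5:
    phi1 is high for `width` ticks at the start of each period, phi2 for
    `width` ticks from mid-period, leaving guard gaps in between."""
    phi1 = [1 if t % period < width else 0 for t in range(ticks)]
    phi2 = [1 if period // 2 <= t % period < period // 2 + width else 0
            for t in range(ticks)]
    return phi1, phi2

phi1, phi2 = two_phase_clock()
```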

Initialisation is performed using transistor M3, which ties the input of the second

inverter to Vdd. We hold the encoder in the reset state by setting erst = Gnd. During

the decode mode (enc̄ = Vdd) we set erst = Gnd. Transistors M2 and M3 then turn on,

and connect the input of each inverter to a supply rail, in order to prevent these input

voltages from floating.

7.3.3 Estimate of Encoder Implementation Overhead

An approximate count of the number of FETs used to add encoder functionality to this

(n = 16) analog decoder core is summarised in Table 7.2. We note that it is not necessary

to transform all soft-XOR gates into mode-switching gates. Only those gates which route

messages in the path of the iterative encoding algorithm require conversion, thus saving

some transistor overhead.

The total number of FETs required to transform the above decoder core into a codec

is 816. This represents approximately 15% of the total number of transistors used to build

the core. The average overhead is 51 FETs per bit, and this count can be used as a linear

guide to predict the overhead requirement for a larger codec. Most of these transistors

are used to build transmission gates, and hence they can have minimum dimensions. The

routing overhead is negligible, as only a small number of control signals have been added, and data paths have been reused from the decoder.

Parameter                        | Symbol | Value
Unit bias current                | Iu     | 100 nA
Voltage supply                   | Vdd    | 1.8 V
Reference voltage (n-type)       | VrefN  | 0.4 V
Reference voltage (p-type)       | VrefP  | 1.4 V
Reference voltage (differential) | Vdiff  | 0.8 V

Table 7.3: Codec core simulation parameters.

Iteration | Parity vector (x1 . . . x8)
0         | 0 0 0 0 0 0 0 0
1         | 1 1 0 1 0 0 1 1
2         | 1 1 1 0 1 0 1 0
3         | 0 1 0 1 0 0 0 1
≥ 4       | 1 1 1 1 1 0 1 1

Table 7.4: Example iterative solution.

7.4 Circuit Simulations

We have built a T-SPICE description of the complete codec core described above, for

the code with H defined by (6.2), using the TSMC 0.18µm CMOS technology1. Circuit

simulation parameters are provided in Table 7.3. The voltages representing log-likelihood

prior input to the decoder variables have a maximum deflection of ±0.2V about Vdiff.

Figure 7.6 shows the simulation of an example block decode. The circuit output

current Ixs,1, representing p(xs = 1), is shown for each symbol. In this example the

decoder is released from reset at time zero, and the channel has flipped bits x7 and

x11. The decoder successfully corrects these two errors, to arrive at the codeword x =

[0011001000101010]. The settled decoding process may be safely sampled by the interface

at time tdec = 10µs.

To demonstrate the operation of the encoder we apply the information vector x9...16 =

[00110111] and expect the parity vector x1...8 = [11111011]. From (3.4) we obtain each

step of the Jacobi iterative solution shown in Table 7.4.
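The iterative solution can be reproduced in software. A sketch of the Jacobi message-passing encoder over F2 (function and variable names are mine), using H from (6.2): each parity bit is replaced, once per clocked iteration, by bs XOR the other parity bits seen by its check:

```python
H = [
    [1,1,0,1,0,0,0,0, 1,0,0,1,1,0,0,0],
    [0,1,1,0,1,0,0,0, 1,1,0,0,0,0,1,0],
    [0,0,1,1,0,1,0,0, 0,1,0,0,0,1,0,1],
    [0,0,0,1,1,0,1,0, 0,1,1,0,1,0,0,0],
    [0,0,0,0,1,1,0,1, 0,0,0,0,1,1,1,0],
    [1,0,0,0,0,1,1,0, 1,0,1,0,0,0,0,1],
    [0,1,0,0,0,0,1,1, 0,0,0,1,0,0,1,1],
    [1,0,1,0,0,0,0,1, 0,0,1,1,0,1,0,0],
]

def jacobi_encode(H, u, iters=4):
    """Jacobi iterative encoding over F2: H = [Hp | Hu] with Hp having a
    unit diagonal, b = Hu u^T, and the parity vector updated in parallel
    each iteration."""
    m = len(H)
    Hp = [row[:m] for row in H]
    Hu = [row[m:] for row in H]
    b = [sum(h * x for h, x in zip(row, u)) % 2 for row in Hu]
    p = [0] * m                                  # parity vector, initially zero
    for _ in range(iters):
        p = [(b[i] + sum(Hp[i][j] * p[j] for j in range(m) if j != i)) % 2
             for i in range(m)]
    return p

u = [0, 0, 1, 1, 0, 1, 1, 1]                     # information bits x9 ... x16
parity = jacobi_encode(H, u)                     # -> [1, 1, 1, 1, 1, 0, 1, 1]
```

The intermediate iterates match the rows of Table 7.4, and the final word satisfies every parity check.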

For this example, the voltage Vxs,1 representing p(xs = 1) for each parity output bit

is shown in Figure 7.7. The information vector is applied at time zero and the reset latch

1The circuit description uses BSIM3v3 simulation device models, obtained through Canadian Micro-electronics Corporation.

121

Page 143: Efficient Architectures for Error Control using Low-Density Parity-Check Codeswebstaff.itn.liu.se/.../course_material/LDPC_codes.pdf · 2008. 11. 7. · 2.4 Factor Graph Representation

Chapter 7. The Reversible LDPC Codec

0 5 10 15

0

20

40

60

80

100

Time (µs)

Outp

ut

curr

ent,

I xs,1

(nA

)A

AKIx7,1

AAK

Ix11,1

Figure 7.6: Decoder output with bits x7 and x11 corrected.

released 5ns later. The circuit is then clocked for κ = 4 iterations, to arrive at the correct

codeword, i.e. x6 is the only output bit having p(xs = 1) = 0. The iterative solution for

x1, shown in Figure 7.8, matches that predicted in Table 7.4. The encoded result may

be safely latched through to the output interface at time tenc = 50ns.

A summary of specifications for the codec core, measured from circuit simulation,

is provided in Table 7.5. We expect the power requirements and throughput of a larger

core to grow linearly with block length, for both encode and decode modes.

Parameter         | Decode                         | Encode
Time (per block)  | 10 µs                          | 50 ns
Power (per block) | 110 µW                         | 21.6 mW
Energy            | 138 pJ/decoded information bit | 135 pJ/encoded parity bit
Throughput        | 800 kbit/s (information bits)  | 160 Mbit/s (parity bits)

Table 7.5: Codec core specification.

Although the digital encoder draws significantly more power than the analog decoder,

it operates 200 times faster, and hence the per block energy requirements of the two

modes balance. The average power consumption, assuming an equal number of encode

and decode operations, is 217µW per block. Based upon these results, and considering

a reversible code that is encodable in κ iterations, we expect the time taken to encode a

block to be approximately 0.125κ% of that taken to decode a block.
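These figures can be checked with a few lines of arithmetic; the constants below are taken from Table 7.5, with k = 8 information (and parity) bits per block for this code:

```python
t_dec, p_dec = 10e-6, 110e-6    # decode: 10 us per block at 110 uW
t_enc, p_enc = 50e-9, 21.6e-3   # encode: 50 ns per block at 21.6 mW
k = 8                           # information (= parity) bits per block

e_dec = t_dec * p_dec / k       # ~138 pJ per decoded information bit
e_enc = t_enc * p_enc / k       # 135 pJ per encoded parity bit
# average power over an equal number of encode and decode operations
p_avg = (t_dec * p_dec + t_enc * p_enc) / (t_dec + t_enc)   # ~217 uW
ratio = t_enc / t_dec           # 0.5% = 0.125 * kappa % with kappa = 4
```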



Figure 7.7: Encoder parity bit outputs.


Figure 7.8: Encoder output for parity bit x1.


7.5 Summary

In this chapter we have designed and simulated the circuit architecture for the core of

a reversible LDPC codec. The analog decoder architecture is well suited to continuous

time soft decision decoding. However, the discrete time hard decision Jacobi encoder is

more appropriately implemented with a digital circuit. In order to achieve these seemingly

contradictory roles for a circuit that is based upon architecture re-use, we have introduced

mode-switching logic gates. Moreover, these gates have other potential applications, e.g.

for circuit verification, that go beyond that studied in this thesis [137, 138].

Analog decoders are robust to design and fabrication errors [90]. This makes their

behaviour difficult to verify, both at design time and after fabrication. However, encoding

is a deterministic process in which errors are exposed quickly. Since we are reusing the decoder for encoding, verification of the encode operation also provides implicit verification

for components of the decoder circuit.

The additional area required to convert the analog sum-product decoder into a full

codec circuit is very low. Most of the additional transistors may have minimum dimensions and the routing overhead is negligible. This efficient use of circuit area results from

both the novel encoding algorithm, and the novel circuit architectures that are used in

its implementation.

The example circuit in this work is based upon a small type-I reversible code, i.e.

the code has a circulant Hp. However, by scaling up the design the same architecture can in principle be used to build a codec for one of the larger type-II reversible codes

presented in Chapter 5. As both encode and decode operations are fully parallel, the

block computation time remains constant as we scale block length.

The time taken for the codec to perform an encode operation is insignificant in

comparison to that taken to decode. For example, we expect the block encode time for

a type-II reversible code to be around 1% of the block decode time. Hence this circuit is

well suited to use in full-duplex communication systems.


Chapter 8

Conclusion

8.1 Summary of Results

In this thesis we propose a new approach to LDPC codec design, from both coding theory and circuit implementation perspectives. The approach is based upon the reuse of

the sum-product decoder to perform encoding. It is motivated by the fact that encoding

LDPC codes is in general not a trivial problem, and by the potential to reduce implementation area. Given that the routing of wires in an LDPC decoder represents a significant

proportion of the total chip area, making further use of the architecture by reusing it for

encoding is advantageous.

In Chapter 3 we propose a novel encoding algorithm and a corresponding class of

reversible LDPC codes. The encoding algorithm is based upon the Jacobi method for iterative matrix inversion, for matrices which have elements from F2. The algorithm allows

reuse of a parallel sum-product decoder implementation, and thus offers an alternative to

the existing methods for encoding LDPC codes presented in Chapter 2, which are based

upon serial processing. Moreover, by using the same circuit for encoding and decoding,

verification of the encoder operation implicitly provides verification for components of

the decoder. This represents a further advantage of the iterative Jacobi approach, that

is not shared by the existing dedicated encoder architectures described in Chapter 2.

We determine the convergence constraint that a matrix must satisfy in order to be

iteratively invertible and thus the total number of iterations required to encode. We

also present an algebraic method for constraining a circulant matrix such that it is 4-cycle free. Using these two constraints we present an algorithm for constructing 4-cycle

free type-I reversible codes, which are encodable using the Jacobi method over F2. The

algorithm allows some flexibility in the choice of code length and rate. From the analysis


presented in Chapters 4 and 5, we find that the type-I construction allows good codes

to be developed for high rate applications. However, the method is not as well suited to

building (3,6)-regular codes, where we attribute decoder convergence problems to poor

expansion in the factor graph representation of these codes.

Motivated by the good expansion property of random graphs, in Chapter 5 we recursively construct type-II reversible LDPC codes, such that they incorporate random

components. These codes are 4-cycle free and are encodable using eight iterations of

the Jacobi encoder. The algorithm produces codes which have better graph expansion,

and performance, than the type-I reversible codes. It also offers greater flexibility in the

choice of code length and rate. Moreover, it allows the construction of (3,6)-regular codes

with good error correcting performance.
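The 4-cycle-free property of these constructions admits a simple direct test (a generic check, not the algebraic circulant method of the thesis): a Tanner graph contains a 4-cycle exactly when two rows of the parity-check matrix share ones in two or more column positions, which appears as an off-diagonal entry of at least 2 in the integer product H Hᵀ. The helper name `has_4_cycle` is illustrative:

```python
import numpy as np

def has_4_cycle(H):
    """Detect 4-cycles in the Tanner graph of parity-check matrix H.

    Two rows sharing ones in two or more columns close a 4-cycle,
    so check the off-diagonal entries of the integer product H H^T.
    """
    overlap = H @ H.T                 # integer row-overlap counts
    np.fill_diagonal(overlap, 0)      # every row overlaps itself
    return bool((overlap >= 2).any())
```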

In Chapter 7 we present a novel circuit architecture for the core of a reversible

LDPC codec. The circuit switches between analog decode and digital encode operations.

Encoding is performed via the iterative Jacobi method over F2, by reusing the decoder

architecture. In order to switch the circuit between decode and encode modes, we have

presented a novel circuit design for mode-switching gates. These logic gates are able to

switch between analog (soft) and digital (hard) computation, and may also be applied

to built-in self-test circuits for analog decoders. We now discuss the proposed codec

implementation in the context of the design criteria presented in Section 1.2.

Good Error Correcting Performance We have shown that reversible LDPC codes

can be constructed which are 4-cycle free and offer good performance. Empirical

simulation results compare well to those of random and finite geometry benchmark

codes. The subthreshold CMOS sum-product decoder design used as a basis for

this codec is not new. Results from several fabricated proof-of-concept decoders of

this form indicate that implementation performance agrees with that predicted by

simulation. Hence we expect that combining the reversible codes with an analog

decoder will result in good error correcting performance.

Flexibility of Code Design The reversible design approach offers a large amount of

flexibility in the choice of code rate, as the iterative encodability constraint only ap-

plies to a section of the parity-check matrix. We have provided examples of several

reversible codes with block length n ≈ 1000 that offer good empirical performance.

The type-II reversible code with n = 4096 exhibits signs of an error floor at a word

error rate of approximately 10⁻⁵. We expect that performance improvements for

codes of this length may result from further investigation into alternative recursive

code constructions.

Simplicity of Design The algorithms presented for generating type-I and type-II re-

versible codes are easy to implement. Moreover, the underlying ideas behind these

algorithms, and the encoder itself, are straightforward.

High Speed Analog decoders are expected to operate at speeds of around two orders

of magnitude faster than digital decoders. Recent results for fabricated analog

decoders are in agreement with this expectation. Using the proposed architecture,

in the case of a type-II reversible code, the computational latency of the encode

operation is around 1% of the decode latency. Hence the codec is suitable for

full-duplex applications. Moreover, the parallel implementation implies that the

computational latency of both encode and decode operations is fixed as code length

is scaled.

Low Power Consumption The subthreshold CMOS decoder operates with very low

transistor bias currents and is thus very power efficient. Recent results for fab-

ricated analog subthreshold CMOS decoders indicate that they can operate with

power consumption that is around two orders of magnitude less than that of a dig-

ital decoder. Furthermore, for the proposed codec, the energy requirement of the

encoder is approximately the same as that of the decoder.

Small Area Analog decoder circuits offer a significant saving in routing complexity. As

the size of an LDPC decoder implementation is primarily determined by routing

congestion rather than the transistor count, we expect the analog LDPC decoder

to be smaller than a digital decoder. Using the proposed approach, only a small

additional area overhead is required to transform the decoder into a reversible codec.

Ease of Verification It is the task of a decoder to correct errors. Hence the correct

behaviour of a decoder circuit is difficult to verify, both at design time and after fab-

rication. However, encoding is a deterministic process in which errors are exposed

quickly. The proposed codec reuses the decoder for encoding. Hence, verification

of the encode operation also provides implicit verification for components of the

decoder circuit.

By combining the analog approach and the reversible LDPC coding scheme we are

able to build a small, power efficient codec, which offers good performance with a flexible

selection of code length and rate. In particular, the small size and low power requirements

of this codec design make it a candidate for use in mobile telephony, wireless networking,

implantable devices and other biomedical applications.

8.2 Suggestions for Further Work

• In this work we have developed design approaches for reversible codes by consider-

ing the encoding constraint, and factor graph expansion metric, in terms of matrix

manipulation. An alternative approach may be to consider how the iterative encod-

ability constraint maps onto the factor graph, and design codes from a graphical

viewpoint. In Chapter 5 we have approached the problem of building codes with

improved expansion, by incorporating randomness into the structure of the graph,

while maintaining its iterative encodability. By considering the encodability con-

straint in graphical terms, we may be able to devise a way of explicitly designing

good expanders that are iteratively encodable. A maximum of eight iterations is

required to encode the reversible codes presented in this thesis. However, the high

speed at which the encode operation may be performed using the architecture pro-

posed in Chapter 7 implies that it should be practical to allow more than eight

encoder iterations. Hence we may relax this design constraint in the search for

codes which offer high error correcting performance.

• This thesis focusses upon encoding via the application of the Jacobi method for iter-

ative matrix inversion. Investigating the suitability of applying other techniques for

solving linear systems of equations to the encoding problem may prove worthwhile.

• The recursive method proposed in Algorithm 5.1 demonstrates that good (3,6)-

regular codes with length n ≈ 1000 can be built. We should be able to develop

similar algorithms which allow reversible codes to be constructed with longer block

length and/or higher column weight.

• An investigation of irregular reversible LDPC code construction was undertaken

in [93]. An example irregular reversible code was built, having rate 1/2 and length

n = 1008. The code contained many connected cycles of length six, and exhibited

a BER floor at around 10⁻⁴. Hence an alternative approach to building irregular

structures is required. Using a recursive method, e.g. similar to that of Algo-

rithm 5.1, we may be able to develop good irregular reversible codes.

• Results from the finite graph-cover analysis presented in Section 4.4.4 imply that

a weakness in the structure of Hp alone is sufficient to cause convergence problems

in the AWGN channel case. Hence in this work we have focussed on the section of

the factor graph corresponding to Hp. We have randomly generated Hu, imposing

only node degree constraints upon the graph, and blocking 4-cycles. However, there

may be ways to structure Hu which block the creation of small stopping sets, and

hence improve BEC performance.

• It is well known that we may test the parity-check constraints during sum-product

decoding of LDPC codes, and terminate the algorithm early if all constraints are

satisfied. Existing designs for analog LDPC decoders do not incorporate early

stopping criteria. The circuit is instead allowed to run for a fixed amount of

time before the result is latched. By testing the state of check nodes, it may be

possible to incorporate an early stopping criterion into the architecture, so that the

decoder stops once a valid codeword is obtained. This may reduce the average

computational latency of the decode operation.
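The stopping test itself is just the syndrome check H x̂ = 0 (mod 2). The sketch below shows the idea; the `sp_iteration` callback is a placeholder for one sum-product update and, like the function names, is an illustrative assumption rather than part of any fabricated design:

```python
import numpy as np

def checks_satisfied(H, x_hat):
    """True when every parity check holds, i.e. H x_hat = 0 (mod 2)."""
    return not ((H @ x_hat) % 2).any()

def decode_early_stop(H, llrs, sp_iteration, max_iter=50):
    """Message-passing loop that terminates as soon as the current
    hard decisions form a valid codeword."""
    x_hat = (np.asarray(llrs) < 0).astype(int)   # sign decision on channel LLRs
    for it in range(max_iter):
        if checks_satisfied(H, x_hat):           # all checks pass: stop early
            return x_hat, it
        x_hat = sp_iteration(x_hat)              # one decoder update (placeholder)
    return x_hat, max_iter
```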

• In the case of undetected error events, the number of satisfied constraints may

provide some indication of the correctness of the codeword estimate. We are able

to extract soft information pertaining to the satisfaction of individual checks, from

the soft-XOR gate cells. It may then be possible to combine this information using

other soft-logic gates, and provide an analog estimate of how many codeword bits

are erroneous. This may be useful for error tolerant systems which incorporate an

automatic-repeat-request (ARQ) protocol.

• Once the iterative Jacobi method arrives at a valid codeword it does not shift from

that state as further iterations are performed. We may therefore incorporate an

early stopping criterion into the architecture, so that the encoder stops once a valid

codeword is obtained. This may reduce the average computational latency of the

encode operation.

• The mode-switching gates introduced in Chapter 7 have further application to

built-in self-test circuits for analog decoders. Work is ongoing in this area in the

High Capacity Digital Communications Laboratory at the University of Alberta.

Moreover, these gates are not limited to use in an error control codec. They may be

used in any application that could benefit from being able to switch logic between

hard and soft operation.

Bibliography

[1] S. B. Wicker, Error Control Systems for Digital Communication and Storage. Prentice Hall, 1995.

[2] J. Hagenauer, E. Offer, and L. Papke, “Iterative decoding of binary block and convolutional codes,” IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 429–445, Mar. 1996.

[3] F. Lustenberger, “On the design of analog iterative VLSI decoders,” Ph.D. dissertation, ETH, Zurich, Switzerland, 2000.

[4] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, July, Oct. 1948.

[5] R. G. Gallager, Low-density parity-check codes. Cambridge, MA: MIT Press, 1963.

[6] ——, “Low-density parity check codes,” IRE Trans. Inform. Theory, vol. IT-8, pp. 21–28, Jan. 1962.

[7] V. Zyablov and M. S. Pinsker, “Estimation of the error-correcting complexity of Gallager low-density codes,” Problems of Info. Trans., vol. 11, no. 1, pp. 23–26, 1975.

[8] G. A. Margulis, “Explicit constructions of graphs without short cycles and low density codes,” Combinatorica, vol. 2, no. 1, pp. 71–78, 1982.

[9] R. M. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inform. Theory, vol. IT-27, pp. 533–547, Sept. 1981.

[10] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error correcting coding and decoding: Turbo codes,” in Proc. International Conference on Communications (ICC 93), Geneva, Switzerland, 1993, pp. 1064–1070.

[11] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann, 1988.

[12] R. J. McEliece, D. J. C. MacKay, and J. F. Cheng, “Turbo decoding as an instance of Pearl’s “belief propagation” algorithm,” IEEE J. Select. Areas Commun., vol. 16, pp. 140–152, Feb. 1998.

[13] B. J. Frey and F. R. Kschischang, “Probability propagation and iterative decoding,” in Proc. Allerton Conf. on Communication, Control and Computing, Allerton House, Monticello, IL, 1996.

[14] D. J. C. MacKay, “Good error correcting codes based on very sparse matrices,” IEEE Trans. Inform. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999.

[15] N. Wiberg, “Codes and decoding on general graphs,” Ph.D. dissertation, Univ. Linkoping, Linkoping, Sweden, 1996.

[16] D. J. C. MacKay and R. M. Neal, “Near Shannon limit performance of low density parity check codes,” Electron. Lett., vol. 33, no. 6, pp. 457–458, Mar. 1997.

[17] M. Sipser and D. A. Spielman, “Expander codes,” IEEE Trans. Inform. Theory, vol. 42, no. 6, pp. 1710–1722, Nov. 1996.

[18] G. D. Forney Jr., “Codes on graphs: News and views,” in Proc. Int. Symp. on Turbo Codes and Related Topics, Brest, France, 2000, pp. 9–16.

[19] N. Wiberg, R. Kotter, and H.-A. Loeliger, “Codes and iterative decoding on general graphs,” European Trans. on Telecommun., vol. 6, pp. 513–525, Sept./Oct. 1995.

[20] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.

[21] M. G. Luby, M. A. Shokrollahi, M. Mitzenmacher, and D. A. Spielman, “Improved low-density parity-check codes using irregular graphs,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 585–598, Feb. 2001.

[22] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 619–637, Feb. 2001.

[23] S.-Y. Chung, G. D. Forney Jr., T. J. Richardson, and R. L. Urbanke, “On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit,” IEEE Commun. Lett., vol. 5, no. 2, pp. 58–60, Feb. 2001.

[24] C. Di, D. Proietti, I. E. Telatar, T. J. Richardson, and R. L. Urbanke, “Finite-length analysis of low-density parity-check codes on the binary erasure channel,” IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1570–1579, Jun. 2002.

[25] T. Tian, C. Jones, J. D. Villasenor, and R. D. Wesel, “Construction of irregular LDPC codes with low error floors,” in Proc. ICC 2003, vol. 5, Anchorage, AK, 2003, pp. 3125–3129.

[26] J. Feldman, “Decoding error-correcting codes via linear programming,” Ph.D. dissertation, Massachusetts Institute of Technology, 2003.

[27] R. Kotter and P. O. Vontobel, “Graph-covers and iterative decoding of finite length codes,” in Proc. Int. Symp. on Turbo Codes and Related Topics, Brest, France, 2003, pp. 75–82.

[28] P. Vontobel and R. Kotter, “Lower bounds on the minimum pseudo-weight of linear codes,” in Proc. IEEE Int. Symp. on Inform. Theory, Chicago, IL, 2004, p. 70.

[29] J. L. Massey, Threshold decoding. Cambridge, MA: MIT Press, 1963.

[30] M. C. Davey, “Error-correction using low-density parity-check codes,” Ph.D. dissertation, Univ. Cambridge, Cavendish Laboratory, 1999.

[31] D. J. C. MacKay, Information theory, inference, and learning algorithms. Cambridge University Press, 2003.

[32] D. J. C. MacKay, S. T. Wilson, and M. C. Davey, “Comparison of constructions of irregular Gallager codes,” IEEE Trans. Communications, vol. 47, no. 10, pp. 1449–1454, Oct. 1999.

[33] F. R. Kschischang and B. J. Frey, “Iterative decoding of compound codes by probability propagation in graphical models,” IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 219–230, Feb. 1998.

[34] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics. Reading, MA: Addison-Wesley, 1989.

[35] T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 599–618, Feb. 2001.

[36] G. D. Forney Jr., “Codes on graphs: Normal realizations,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 520–548, Feb. 2001.

[37] M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi, D. A. Spielman, and V. Stemann, “Practical loss-resilient codes,” in Proc. 29th Symp. Theory Computing, 1997, pp. 150–159.

[38] S. M. Aji and R. J. McEliece, “The generalized distributive law,” IEEE Trans. Inform. Theory, vol. 46, no. 2, pp. 325–343, Mar. 2000.

[39] Y. Mao and A. H. Banihashemi, “Decoding low-density parity-check codes with probabilistic scheduling,” IEEE Commun. Lett., vol. 5, no. 10, pp. 414–416, Oct. 2001.

[40] D. J. C. MacKay and M. C. Davey, “Evaluation of Gallager codes for short block length and high rate applications,” in Codes, Systems and Graphical Models, ser. IMA Volumes in Mathematics and its Applications, B. Marcus and J. Rosenthal, Eds. New York: Springer-Verlag, 2000, vol. 123, pp. 113–130.

[41] B. Frey, R. Kotter, and A. Vardy, “Skewness and pseudocodewords in iterative decoding,” in Proc. IEEE Int. Symp. on Inform. Theory, Cambridge, MA, 1998, p. 148.

[42] B. J. Frey, R. Kotter, and A. Vardy, “Signal-space characterization of iterative decoding,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 766–781, Feb. 2001.

[43] M. P. C. Fossorier, M. Mihaljevic, and H. Imai, “Reduced complexity iterative decoding of low-density parity check codes based on belief propagation,” IEEE Trans. Communications, vol. 47, no. 5, pp. 673–680, May 1999.

[44] M. P. Fossorier, “Iterative reliability-based decoding of low-density parity check codes,” IEEE J. Select. Areas Commun., vol. 19, no. 5, pp. 908–917, May 2001.

[45] F. Guilloud, E. Boutillon, and J.-L. Danger, “λ-min decoding algorithm of regular and irregular LDPC codes,” in Proc. Int. Symp. on Turbo Codes and Related Topics, Brest, France, 2003, pp. 451–454.

[46] P. Zarrinkhat and A. Banihashemi, “Hybrid decoding of LDPC codes,” in Proc. Int. Symp. on Turbo Codes and Related Topics, Brest, France, 2003, pp. 503–506.

[47] S. Howard, “Two new LDPC decoding algorithms: Soft bit decoding and sub-min-sum decoding,” in 3rd Analog Decoder Workshop, Banff, Canada, 2004.

[48] L. Ping, W. K. Leung, and N. Phamdo, “Low density parity check codes with semi-random parity check matrix,” Electron. Lett., vol. 35, no. 1, pp. 38–39, Jan. 1999.

[49] M. Rashidpour and S. H. Jamali, “Low-density parity-check codes with simple irregular semi-random parity-check matrix for finite-length applications,” in Proc. IEEE Int. Symp. on Personal, Indoor and Mobile Communication, PIMRC2003, vol. 1, Beijing, China, 2003, pp. 439–443.

[50] M. Yang, W. E. Ryan, and L. Yan, “Design of efficiently encodable moderate-length high-rate irregular LDPC codes,” IEEE Trans. Communications, vol. 52, no. 4, pp. 564–571, Apr. 2004.

[51] R. Lucas, M. P. C. Fossorier, Y. Kou, and S. Lin, “Iterative decoding of one-step majority logic decodable codes based on belief propagation,” IEEE Trans. Communications, vol. 48, no. 6, pp. 931–937, June 2000.

[52] Y. Kou, S. Lin, and M. P. Fossorier, “Low density parity check codes based on finite geometries: A rediscovery and new results,” IEEE Trans. Inform. Theory, vol. 47, no. 7, pp. 2711–2736, Nov. 2001.

[53] R. L. Townsend and E. J. Weldon Jr., “Self orthogonal quasi-cyclic codes,” IEEE Trans. Inform. Theory, vol. 13, no. 2, pp. 183–195, Apr. 1967.

[54] M. Karlin, “New binary coding results by circulants,” IEEE Trans. Inform. Theory, vol. 15, pp. 81–92, 1969.

[55] S. J. Johnson and S. R. Weller, “A family of irregular LDPC codes with low encoding complexity,” IEEE Commun. Lett., vol. 7, no. 2, pp. 79–81, Feb. 2003.

[56] W. W. Peterson, Error Correcting Codes. Cambridge, MA: MIT Press, 1961.

[57] D. A. Spielman, “Linear-time encodable and decodable error-correcting codes,” IEEE Trans. Inform. Theory, vol. 42, no. 6, pp. 1723–1731, Nov. 1996.

[58] R. Echard and S. C. Chang, “The π-rotation low-density parity check codes,” in Proc. GLOBECOM 2001, vol. 2, San Antonio, TX, 2001, pp. 980–984.

[59] ——, “The extended irregular π-rotation low-density parity check codes,” IEEE Commun. Lett., vol. 7, no. 5, pp. 230–232, May 2003.

[60] T. Zhang and K. K. Parhi, “Joint (3,k)-regular LDPC code and decoder/encoder design,” IEEE Trans. Sig. Proc., vol. 52, no. 4, pp. 1065–1079, Apr. 2004.

[61] T. J. Richardson and R. L. Urbanke, “Efficient encoding of low-density parity-check codes,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 638–656, Feb. 2001.

[62] T. J. Richardson, A. Shokrollahi, and R. Urbanke, “Design of provably good low-density parity-check codes,” in Proc. Int. Symp. Information Theory, Sorrento, Italy, 2000, p. 199.

[63] S. M. Aji, G. B. Horn, and R. J. McEliece, “Iterative decoding on graphs with a single cycle,” in Proc. IEEE Int. Symp. on Inform. Theory, Cambridge, MA, 1998, p. 276.

[64] Y. Weiss, “Correctness of local probability propagation in graphical models with loops,” Neural Computation, vol. 12, pp. 1–41, 2000.

[65] G. D. Forney Jr., R. Koetter, F. R. Kschischang, and A. Reznik, “On the effective weights of pseudocodewords for codes defined on graphs with cycles,” Codes, Systems and Graphical Models, vol. 123, pp. 101–112, 2001.

[66] T. Etzion, A. Trachtenberg, and A. Vardy, “Which codes have cycle-free Tanner graphs?” IEEE Trans. Inform. Theory, vol. 45, no. 6, pp. 2173–2181, Sept. 1999.

[67] P. Vontobel, “Algebraic coding for iterative decoding,” Ph.D. dissertation, ETH, Zurich, Switzerland, 2003.

[68] S. Ikeda, T. Tanaka, and S.-i. Amari, “Information geometry of turbo and low-density parity-check codes,” IEEE Trans. Inform. Theory, vol. 50, no. 6, pp. 1097–1114, June 2004.

[69] J. A. McGowan and R. C. Williamson, “Loop removal from LDPC codes,” in Proc. Inform. Theory Workshop, Paris, France, 2003, pp. 230–233.

[70] J. Campello, D. S. Modha, and S. Rajagopalan, “Designing LDPC codes using bit-filling,” in Proc. ICC 2001, vol. 1, Helsinki, Finland, 2001, pp. 55–59.

[71] J. Campello and D. S. Modha, “Extended bit-filling and LDPC code design,” in Proc. GLOBECOM 2001, vol. 2, San Antonio, TX, 2001, pp. 985–989.

[72] R. M. Tanner, “Explicit concentrators from generalized N-gons,” SIAM J. Alg. Disc. Meth., vol. 5, no. 3, pp. 287–293, Sept. 1984.

[73] N. Alon, “Eigenvalues and expanders,” Combinatorica, vol. 6, no. 2, pp. 83–96, 1986.

[74] J. Friedman, “On the second eigenvalue and random walks in random d-regular graphs,” Combinatorica, vol. 11, no. 4, pp. 331–362, 1991.

[75] G. A. Margulis, “Explicit group-theoretic constructions of combinatorial schemes and their applications in the construction of expanders and concentrators,” Problems Inform. Transmission, vol. 24, no. 1, pp. 39–46, 1988.

[76] A. Lubotzky, R. Phillips, and P. Sarnak, “Ramanujan graphs,” Combinatorica, vol. 8, no. 3, pp. 261–277, 1988.

[77] J. Lafferty and D. Rockmore, “Codes and iterative decoding on algebraic expander graphs,” in Proc. Int. Symp. on Inform. Theory and Its Applications, Honolulu, HI, 2000, p. 276.

[78] J. Rosenthal and P. O. Vontobel, “Constructions of LDPC codes using Ramanujan graphs and ideas from Margulis,” in Proc. Allerton Conf. on Communication, Control and Computing, Allerton House, Monticello, IL, 2000, pp. 248–257.

[79] D. J. C. MacKay and M. S. Postol, “Weaknesses of Margulis and Ramanujan-Margulis low-density parity-check codes,” in Proc. 2nd Irish Conference on the Mathematical Foundations of Comp. Sci. and Info. Tech., MFCSIT2002, ser. Electronic Notes in Theoretical Computer Science, vol. 74, Galway, Ireland, 2003.

[80] E. W. Weisstein, “Spanning tree,” [Online] http://mathworld.wolfram.com/SpanningTree.html.

[81] S.-Y. Chung, T. J. Richardson, and R. L. Urbanke, “Analysis of sum-product decoding of low-density parity-check codes using a Gaussian approximation,” IEEE Trans. Inform. Theory, vol. 47, pp. 657–670, Feb. 2001.

[82] S. ten Brink, G. Kramer, and A. Ashikhmin, “Design of low-density parity-check codes for modulation and detection,” IEEE Trans. Communications, vol. 52, no. 4, pp. 670–678, Apr. 2004.

[83] M. C. Davey and D. MacKay, “Low-density parity check codes over GF(q),” IEEE Commun. Lett., vol. 2, no. 6, pp. 165–167, June 1998.

[84] D. J. C. MacKay and M. C. Davey, “Low density parity check codes over GF(q),” in Proc. Inform. Theory Workshop, Killarney, Ireland, 1998, pp. 70–71.

[85] P. O. Vontobel and R. M. Tanner, “Construction of codes based on finite generalized quadrangles for iterative decoding,” in Proc. IEEE Int. Symp. on Inform. Theory, Washington, DC, 2001, p. 223.

[86] S. J. Johnson and S. R. Weller, “Codes for iterative decoding from partial geometries,” IEEE Trans. Communications, vol. 52, no. 2, pp. 236–243, Feb. 2004.

[87] ——, “Structured low-density parity-check codes over non-binary fields,” in Proc. 5th Australian Communications Theory Workshop, Newcastle, Australia, 2004, pp. 18–22.

[88] D. Haley, C. Winstead, A. Grant, and C. Schlegel, “An analog LDPC codec core,” in Proc. Int. Symp. on Turbo Codes and Related Topics, Brest, France, 2003, pp. 391–394.

[89] A. Blanksby and C. Howland, “A 690-mW 1-Gb/s 1024-b, rate-1/2 low-density parity-check code decoder,” IEEE Journal of Solid-State Circuits, vol. 37, no. 3, pp. 404–412, Mar. 2002.

[90] C. Winstead, J. Dai, W. J. Kim, S. Little, Y. B. Kim, C. Myers, and C. Schlegel, “Analog MAP decoder for (8,4) Hamming code in subthreshold CMOS,” in Proc. ARVLSI (Advanced Research in VLSI), Salt Lake City, Utah, 2001, pp. 132–147.

[91] J. Bond, S. Hui, and H. Schmidt, “Constructing low-density parity-check codes with circulant matrices,” in Proc. Inform. Theory and Networking Workshop, Metsovo, Greece, 1999, p. 52.

[92] ——, “Constructing low-density parity-check codes,” in EUROCOMM 2000, Munich, Germany, 2000, pp. 260–262.

[93] D. Haley, A. Grant, and J. Buetefuer, “Iterative encoding of low-density parity-check codes,” in Proc. GLOBECOM 2002, vol. 2, Taipei, Taiwan, 2002, pp. 1289–1293.

[94] ——, “Iterative encoding of low-density parity-check codes,” in Proc. 3rd Australian Communications Theory Workshop, Canberra, Australia, 2002, pp. 15–17.

[95] D. Haley and A. Grant, “High rate reversible LDPC codes,” in Proc. 5th Australian Communications Theory Workshop, Newcastle, Australia, 2004, pp. 114–117.

[96] G. Strang, Linear Algebra and its Applications, 3rd ed. Saunders College Publishing, 1988.

[97] T. Shibuya and K. Sakaniwa, “Construction of cyclic codes suitable for iterative decoding via generating idempotents,” IEICE Trans. on Fundamentals, vol. E86-A, no. 4, pp. 928–939, 2003.

[98] D. J. C. MacKay, “Encyclopedia of sparse graph codes,” [Online] http://wol.ra.phy.cam.ac.uk/mackay/codes/.

[99] A. Vardy, “The intractability of computing the minimum distance of a code,” IEEE Trans. Inform. Theory, vol. 43, no. 6, pp. 1757–1766, Nov. 1997.

[100] R. M. Tanner, “Minimum-distance bounds by graph analysis,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 808–821, Feb. 2001.

[101] ——, “Spectral graphs for quasi-cyclic LDPC codes,” in Proc. IEEE Int. Symp. on Inform. Theory, Washington, DC, 2001, p. 226.

[102] E. W. Weisstein, “Circulant determinant,” [Online] http://mathworld.wolfram.com/CirculantDeterminant.html.

[103] E. Boutillon, J. Castura, and F. R. Kschischang, “Decoder first code design,” in Proc. Int. Symp. on Turbo Codes and Related Topics, Brest, France, 2000, pp. 459–462.

[104] A. S. Acampora and R. P. Gilmore, “Analog Viterbi decoding for high speed digital satellite channels,” IEEE Trans. Communications, vol. COM-26, pp. 1463–1470, Oct. 1978.

[105] M. S. Shakiba, D. A. Johns, and K. W. Martin, “BiCMOS circuits for analog Viterbi decoders,” IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 45, no. 12, pp. 1527–1537, Dec. 1998.

[106] H.-A. Loeliger, F. Tarkoy, F. Lustenberger, and M. Helfenstein, “Decoding in analog VLSI,” IEEE Commun. Magazine, vol. 37, no. 4, pp. 99–101, Apr. 1999.

[107] H.-A. Loeliger, F. Lustenberger, M. Helfenstein, and F. Tarkoy, “Probability propagation and decoding in analog VLSI,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 837–843, Feb. 2001.

[108] J. Hagenauer and M. Winklhofer, “The analog decoder,” in Proc. IEEE Int. Symp. on Inform. Theory, Cambridge, MA, 1998, p. 145.

[109] C. A. Mead, Analog VLSI and Neural Systems, ser. Addison Wesley Computation and Neural Systems Series. Reading, MA: Addison Wesley, 1989.

[110] V. Gaudet and A. Rapley, “Iterative decoding using stochastic computation,” Electron. Lett., vol. 39, no. 3, pp. 299–301, Feb. 2003.

[111] A. Rapley, C. Winstead, V. Gaudet, and C. Schlegel, “LDPC decoder design using stochastic computation,” in 3rd Analog Decoder Workshop, Banff, Canada, 2004.

[112] H. P. Schmid, “Single-amplifier biquadratic MOSFET-C filters,” Ph.D. dissertation, Swiss Federal Institute of Technology, Zurich, Switzerland, 2000.

[113] V. C. Gaudet, “Architecture and implementation of analog iterative decoders,” Ph.D. dissertation, Univ. Toronto, Toronto, Canada, 2003.

[114] C. Winstead, J. Dai, S. Yu, C. Myers, R. R. Harrison, and C. Schlegel, “CMOS analog MAP decoder for (8,4) Hamming code,” IEEE Journal of Solid-State Circuits, vol. 39, no. 1, pp. 122–131, Jan. 2004.

[115] C. Winstead et al., “A CMOS analog (16, 11)2 turbo product decoder,” in 3rd Analog Decoder Workshop, Banff, Canada, 2004, http://www.analogdecoding.org/docs/Winstead ADW04.pdf.

[116] V. Gaudet, R. Gaudet, and G. Gulak, “Programmable interleaver design for analog iterative decoders,” IEEE Trans. on Circuits and Systems II, vol. 49, no. 7, pp. 457–464, July 2002.

[117] V. Gaudet and G. Gulak, “A 13.3Mbps 0.35µm CMOS analog turbo decoder IC with a configurable interleaver,” IEEE Journal of Solid-State Circuits, vol. 38, no. 11, pp. 2010–2015, Nov. 2003.

[118] C. Winstead, N. Nguyen, C. Schlegel, and V. C. Gaudet, “Low-voltage CMOS circuits for analog decoders,” in Proc. Int. Symp. on Turbo Codes and Related Topics, Brest, France, 2003, pp. 271–274.

[119] D. Nguyen, “Implementation of low voltage analog decoders,” in 3rd Analog DecoderWorkshop, Banff, Canada, 2004.

[120] F. Lustenberger, M. Helfenstein, H.-A. Loeliger, F. Tarkoy, and G. S. Moschytz,“All-analog decoder for a binary (18,9,5) tail-biting trellis code,” in Proc. EuropeanSolid-State Circuits Conference, Duisburg, 1999, pp. 362–365.

[121] M. Morz, T. Gabara, R. Yan, and J. Hagenauer, “An analog 0.25µm BiCMOStailbiting MAP decoder,” in Proc. International Solid-State Circuits Conference,San Francisco, 2000, pp. 356–357.

[122] A. Xotta, D. Vogrig, A. Gerosa, A. Neviani, A. Graell i Amat, G. Montorsi,M. Bruccoleri, and G. Betti, “An all-analog CMOS implementation of a turbodecoder for hard-disk drive read channels,” in Proc. IEEE Int. Symp. on Circuitsand Systems, ISCAS 2002, Arizona, 2002, pp. V–69–V–72.

[123] A. Graell i Amat, S. Benedetto, G. Montorsi, D. Vogrig, A. Neviani, and A. Gerosa, "An analog decoder for the UMTS standard," in Proc. IEEE Int. Symp. on Inform. Theory, Chicago, 2004, p. 296.

[124] P. Merkli, H.-A. Loeliger, and M. Frey, "Measurements and observations on analog decoders for an [8,4,4] extended Hamming code," in 2nd Analog Decoder Workshop, Zurich, Switzerland, 2003.

[125] S. Hemati and A. Banihashemi, "Full CMOS min-sum analog iterative decoder," in Proc. IEEE Int. Symp. on Inform. Theory, Yokohama, Japan, 2003, p. 347.

[126] C. Winstead, C. Myers, C. Schlegel, and R. Harrison, "Analog decoding of product codes," in Proc. Information Theory Workshop, Cairns, Australia, 2001, pp. 131–133.

[127] J. Dai, "Design methodology for analog VLSI implementations of error control decoders," Ph.D. dissertation, Univ. of Utah, Utah, 2001.

[128] F. Lustenberger and H.-A. Loeliger, "On mismatch errors in analog-VLSI error correcting decoders," in Proc. IEEE Int. Symp. on Circuits and Systems, ISCAS 2001, vol. 4, Sydney, Australia, 2001, pp. 198–201.

[129] M. Frey, H.-A. Loeliger, F. Lustenberger, P. Merkli, and P. Strebel, "Analog-decoder experiments with subthreshold CMOS soft-gates," in Proc. IEEE Int. Symp. on Circuits and Systems, ISCAS 2003, vol. 1, 2003, pp. 85–88.

[130] C. Winstead and C. Schlegel, "Density evolution analysis of device mismatch in analog decoders," in Proc. IEEE Int. Symp. on Inform. Theory, Chicago, 2004, p. 293.

[131] S. Hemati and A. Banihashemi, "Comparison between continuous-time asynchronous and discrete-time synchronous iterative decoding," in Proc. GLOBECOM 2004, Dallas, TX, 2004, to appear.

[132] M. Morz and J. Hagenauer, "Decoding of convolutional codes using an analog ring decoder," in 2nd Analog Decoder Workshop, Zurich, Switzerland, 2003.

[133] A. F. Mondragon-Torres and E. Sanchez-Sinencio, "Floating gate analog implementation of the additive soft-input soft-output decoding algorithm," in Proc. IEEE Int. Symp. on Circuits and Systems, ISCAS 2002, Arizona, 2002, pp. 89–92.

[134] J. Hagenauer, E. Offer, C. Measson, and M. Morz, "Decoding and equalization with analog non-linear networks," European Trans. on Telecomm. (ETT), vol. 10, no. 6, pp. 659–679, Nov.–Dec. 1999.

[135] M. Frey and P. Merkli, "An analog circuit that locks onto a pseudo-noise signal," in 3rd Analog Decoder Workshop, Banff, Canada, 2004.

[136] C. Winstead and C. Schlegel, "Importance sampling for SPICE-level verification of analog decoders," in Proc. IEEE Int. Symp. on Inform. Theory, Yokohama, Japan, 2003, p. 103.

[137] D. Haley, C. Winstead, A. Grant, V. Gaudet, and C. Schlegel, "Robust analog decoder design with mode-switching cells," in 3rd Analog Decoder Workshop, Banff, Canada, 2004.

[138] M. Yiu et al., "A digital built-in self-test approach for analog iterative decoders," in 3rd Analog Decoder Workshop, Banff, Canada, 2004.

[139] B. Razavi, Design of Analog CMOS Integrated Circuits. New York: McGraw-Hill, 2001.

[140] C. Winstead, V. Gaudet, and C. Schlegel, "Analog iterative decoding of error control codes," in Canadian Conf. on Electrical and Computer Eng., vol. 3, Montreal, Canada, 2003, pp. 1539–1542.

[141] J. Hagenauer, M. Morz, and A. Schaefer, "Analog decoders and receivers for high-speed applications," Int. Zurich Seminar on Broadband Commun., Access, Transmission, Networking, pp. 3-1–3-8, Feb. 2002.

[142] M. Morz, J. Hagenauer, and E. Offer, "On the analog implementation of the APP (BCJR) algorithm," in Proc. IEEE Int. Symp. on Inform. Theory, Sorrento, Italy, 2000, p. 425.

[143] M. Helfenstein, F. Lustenberger, H.-A. Loeliger, F. Tarkoy, and G. S. Moschytz, "High-speed interfaces for analog, iterative decoders," in Proc. IEEE Int. Symp. on Circuits and Systems, ISCAS '99, vol. II, Orlando, Florida, 1999, pp. 424–427.

[144] D. Haley, C. Winstead, A. Grant, and C. Schlegel, "Architectures for error control in analog subthreshold CMOS," in Proc. 4th Australian Communications Theory Workshop, Melbourne, Australia, 2003, pp. 75–80.

[145] D. Haley, C. Winstead, A. Grant, V. Gaudet, and C. Schlegel, "Reusing analog decoders for encoding," in 2nd Analog Decoder Workshop, Zurich, Switzerland, 2003, http://www.isi.ee.ethz.ch/adw/slides/haley.pdf.
