optimisation conjointe de codes ldpc et de leurs ...€¦ · viterbi tanner 1969 1981 1993 v i t e...
TRANSCRIPT
research & development
Jean-Baptiste Doré
Thèse présentée devant l’INSA de Rennes en vue de l’obtention du doctorat d’Électronique
26 Octobre 2007 à 10H00 – Amphithéâtre de FT R&D Cesson-Sévigné
Optimisation conjointe de codes LDPC et de leurs architectures de décodage et mise en œuvre sur FPGA
2
Foreword
France Telecom R&D UnitsBroadband Wireless Access
• Innovative Radio Interface (RESA/BWA/IRI)• Broadcasting network Cooporation and radio access Mobility (RESA/BWA/BCM)
SupervisorsMarie-Hélène Hamon – R&D engineer at France Telecom R&DPénard Pierre – R&D engineer at France Telecom R&D Ramesh Pyndiah – ENST Bretagne
ContextsPRICE - Internal project
• Prospective Research for Infrastructure and Communication Enhancement VERITY - Internal project
• Validation and Evaluation of Research studies in digITal sYstems
3
Outline
Introduction
Structured LDPC codes
Decoding architectures for LDPC decoders
FPGA implementation of LDPC coder/decoder
Conclusion
4
Introduction
● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
■ Context■ LDPC codes■ Decoding LDPC codes■ Encoding LDPC codes■ Codes construction
5
Motivations
Digital communicationsHigh data rates
• Base assumption is now Hundreds of Mbit/s• 1-10 Gbit/s in the future?
Low complexity implementation • Small component size• Low power consumption• Low cost
Physical layerForward Error Correction Scheme
• Close to the theoretical limit
● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
▪ ContextLDPC codesDecoding LDPC codesEncoding LDPC codesCodes construction
6
Background history
1948
Shannon
SNR
BER
Theo
retic
al li
mit
● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
time
▪ ContextLDPC codesDecoding LDPC codesEncoding LDPC codesCodes construction
7
Background history
1948
1955
Shannon
Elias
1962
Gallager
Convo
lution
alco
des
LDPC
code
s
Viterbi Tanner
1969 1981
Viter
bialg
orith
m
Tann
er co
des
> 3dB
SNRB
ERTh
eore
tical
lim
itSNR
BER
Theo
retic
al li
mit
time
● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
▪ ContextLDPC codesDecoding LDPC codesEncoding LDPC codesCodes construction
8
Background history
1948
1955
Shannon
Elias
1962
Gallager
Convo
lution
alco
des
LDPC
code
s
Viterbi Tanner
1969 1981 1993
Viter
bialg
orith
m
Tann
er co
des
Berrou & Glavieux
> 3dB
SNRB
ERTh
eore
tical
lim
itSNR
BER
Theo
retic
al li
mit
Bre
akth
roug
h
Turbo-CodesIterative principle
time
● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
▪ ContextLDPC codesDecoding LDPC codesEncoding LDPC codesCodes construction
9
Background history
1948
1955
Shannon
Elias
1962
Gallager
Convo
lution
alco
des
LDPC
code
s
Viterbi Tanner
1969 1981 1993
Viter
bialg
orith
m
Tann
er co
des
Berrou & Glavieux
MacKay
1995
> 3dB
SNRB
ERTh
eore
tical
lim
itSNR
BER
Theo
retic
al li
mit
Bre
akth
roug
h
Re-dis
cove
ry of
LDPC
code
s
1998 2000
Irreg
ular L
DPC co
des
Duo bi
nary
Turb
o-Cod
es
SNR
BER
Theo
retic
al li
mit
Error floor
Threshold effect
Turbo-CodesIterative principle
time
<0.01 dB
● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
▪ ContextLDPC codesDecoding LDPC codesEncoding LDPC codesCodes construction
10
Definitions
LDPC codesLow Density Parity Check codesParity check constraints, M parity equations and N bits
ModelingMatrix representation
Graphical definition• Bipartite graph (Tanner graph)
Variable node (VN)
Hidden variable node (HVN)
Check node (CN)
Context
▪ LDPC codesDecoding LDPC codesEncoding LDPC codesCodes construction
● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
11
Decoding algorithm
Belief Propagation (BP) AlgorithmGraph based algorithmComputation of messages which are propagated along the edges
• Exchange of extrinsic information
Optimal decoding• No cycle into the code graph
ContextLDPC codes
▪ Decoding LDPC codesEncoding LDPC codesCodes construction
● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
12
Problematic of encoding
Encoding LDPC codesUnconstraint parity check matrices
• Encoding through the generator matrices G
ContextLDPC codesDecoding LDPC codes
▪ Encoding LDPC codesCodes construction
● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
13
Problematic of encoding
Encoding LDPC codesUnconstraint parity check matrices
• Encoding through the generator matrices G
Constraint parity check matrices• Quasi-cyclic codes
• Upper/Lower triangular matrices– Unconstraint– Strictly or not dual-diagonal structure
Simple encoding in a linear time
● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
ContextLDPC codesDecoding LDPC codes
▪ Encoding LDPC codesCodes construction
14
Design of LDPC codes
Objective:Define the position of all the non null elements into the parity check matrix
• Degree distribution optimization (EXIT Chart, Density Evolution)
ContextLDPC codesDecoding LDPC codesEncoding LDPC codes
▪ Codes construction
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
ICV
I VC
check nodes
variable nodes
● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
15
Design of LDPC codes
Objective:Define the position of all the non null elements into the parity check matrix
• Degree distribution optimization (EXIT Chart, Density Evolution)
Unconstraint constructionPseudo random constructionProgressive Edge-Growth (PEG) algorithm [Hu01]
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
ICV
I VC
● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
ContextLDPC codesDecoding LDPC codesEncoding LDPC codes
▪ Codes construction
16
Design of LDPC codes
Objective:Define the position of all the non null elements into the parity check matrix
• Degree distribution optimization (EXIT Chart, Density Evolution)
Unconstraint constructionPseudo random constructionProgressive Edge-Growth (PEG) algorithm
Structured LDPC codesDual-diagonal structure (RA and IRA codes)Protograph based codes
• Quasi-Cyclic codes
Etc..
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
ICV
I VC
● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
ContextLDPC codesDecoding LDPC codesEncoding LDPC codes
▪ Codes construction
17
Problematic
How to define an efficient coding system using LDPC codes?
Structured LDPC codes family
Study the link between architectures and codes design
Optimize jointly codes and architectures
A joint definition of the codes and the encoding/decodingmethods is highly recommended
● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
ContextLDPC codesDecoding LDPC codesEncoding LDPC codes
▪ Codes construction
18
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
■ Structured LDPC codes design■ Codes analysis■ Decoding structured LDPC codes■ Conclusions
Structured LDPC codes
19
Motivations
Constraints on family of LDPC codes
Good codes have strictly concentrated CN degree distribution [Chung01]
Richardson et al. design rules about degree 2 variables nodes [Richardson01,03]
Simple characterization• Protograph based codes
dual-diagonal structure for H
Parity check matrices designed from permutation matrices
▪ Structured LDPC codes designCodes analysisDecoding Structured LDPC codesConclusions
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
20
Some definitions
Definition of the code consideredThe parity check matrix H (M x N) is divided into two sub matrices Hs(M x K) and Hp(M x M)
Hp is defined to be a dual-diagonal matrix• Stability condition
• No short cycles involving only degree 2 variable nodes
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
▪ Structured LDPC codes designCodes analysisDecoding Structured LDPC codesConclusions
21
Characterization
Matrix Hs of size MxK is constructed with both:Circularly shifted identity matrices of size z x z
• Notation: Iδ , δ≥0, is a right shifted identity matrix by δ positions (modulo z)
Null matrices of size z x z• Notation: Iδ , δ<0, is a null matrix
Hs can be defined by a (m x k) block matrix• Simple characterization
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
▪ Structured LDPC codes designCodes analysisDecoding Structured LDPC codesConclusions
22
Characterization
Matrix Hp of size MxM is a dual-diagonal matrixAvoid low weight codeword requires a new definition of Hp
Quasi-Cyclic Irregular Repeat Accumulate Codes (QC IRA) [Tanner99]
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
▪ Structured LDPC codes designCodes analysisDecoding Structured LDPC codesConclusions
23
Framework
Distances propertiesWhich kind of configurations are critical for performance?
Cycles propertiesHow to detect cycles into the code graph?What is the role of short cycles on decoder behavior?
Definition of a design algorithm for the family of codes studied
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
Structured LDPC codes design
▪ Codes analysisDecoding Structured LDPC codesConclusions
24
Distances properties
Main resultsBased on Return To Zero properties of the dual-diagonal part of H
• Accumulator code
Bound on minimal distance• Influence the choice of parameter m and the smallest variable node degree q
Rules on permutation coefficients• Weight-Spectrum of the codes can be constrained• Avoid the generation of low weight codeword from low weight information word
Equi-repartition of permutation coefficients on [0,z-1] into a column of Hsbut not strictly…
Structured LDPC codes design
▪ Codes analysisDecoding Structured LDPC codesConclusions
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
25
Cycles detection
Detection of cycle and enumeration of the distributionGeometrical approach
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
Structured LDPC codes design
▪ Codes analysisDecoding Structured LDPC codesConclusions
26
Algorithm for code design
Problematic: Find the unknown coefficient which
Maximizes the cycle length Guarantees a minimal cycle length (Target Cycle Length-TCL)
Application to the description of a design algorithm Incremental construction of the codePEG like algorithm
• Based on protograph representation of the code
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
Structured LDPC codes design
▪ Codes analysisDecoding Structured LDPC codesConclusions
27
Additional constraints
Improve design algorithmTarget Cycle Length (TCL) depends on variable node degree
• ACE (Approximate Cycle Extrinsic message) [Tian03]
Avoid low weight codeword and pseudo-codeword (Trapping-set)• Better minimal distance• Better behavior of the BP decoder
Num
bero
f erro
rPseudo codeword
Decoding failure
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
Structured LDPC codes design
▪ Codes analysisDecoding Structured LDPC codesConclusions
28
State of art
LDPC codes decoding algorithmNo a priori information on the code structure
• BP with flooding scheduling
When the structure of the code is known• Explore other decoding strategies
Example: Codes defined by a protographLayered BP decoding
Structured LDPC codes designCodes analysis
▪ Decoding Structured LDPC codesConclusions
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
29
Main idea
What’s about the dual-diagonal structure properties?“Isolate trellis-like sub graphs and locally applying the MAP algorithm is a good scheduling” [Forney01]
Modeling of the considered codesConsider the decoder as the dual of the encoder
Association with layer decoding concept
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
Structured LDPC codes designCodes analysis
▪ Decoding Structured LDPC codesConclusions
30
Illustration of the sequencing
Layered 1
Layered 2
Layered 3
Turbo Layered BP: Scheduling
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
Structured LDPC codes designCodes analysis
▪ Decoding Structured LDPC codesConclusions
31
Illustration of the sequencing
Layered 1
Layered 2
Layered 3
Turbo Layered BP: Scheduling
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
Structured LDPC codes designCodes analysis
▪ Decoding Structured LDPC codesConclusions
32
Illustration of the sequencing
Layered 1
Layered 2
Layered 3
Turbo Layered BP: Scheduling
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
Structured LDPC codes designCodes analysis
▪ Decoding Structured LDPC codesConclusions
33
Illustration of the sequencing
Layered 1
Layered 2
Layered 3
Turbo Layered BP: Scheduling
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
Structured LDPC codes designCodes analysis
▪ Decoding Structured LDPC codesConclusions
34
Simulation results
Comparison LBP/TLBP
Rules on permutation coefficients to reach the best possible convergence
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
Structured LDPC codes designCodes analysis
▪ Decoding Structured LDPC codesConclusions
35
Synthesis
Definition of structured LDPC codesGood performanceSimple encoding (linear time)Simple characterization
Analysis of codes propertiesDistance propertiesCycles properties
Studies on the decoding of structured LDPC codesA priori information on code structure is exploited
QC IRA codes
Codes design algorithm
Turbo Layered BP
Structured LDPC codes designCodes analysisDecoding Structured LDPC codes
▪ Conclusions
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
36
Synthesis
Definition of structured LDPC codesGood performanceSimple encoding (linear time)Simple characterization
Analysis of code propertiesDistance propertiesCycles properties
Studies on the decoding of structured LDPC codesA priori information on code structure is exploited
QC IRA codes
Codes design algorithm
Turbo Layered BP
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
Structured LDPC codes designCodes analysisDecoding Structured LDPC codes
▪ Conclusions
37
Synthesis
Definition of structured LDPC codesGood performanceSimple encoding (linear time)Simple characterization
Analysis of code propertiesDistance propertiesCycles properties
Studies on the decoding of structured LDPC codesA priori information on code structure is exploited at the decoder side
QC IRA codes
Codes design algorithm
Turbo Layered BP
Introduction
● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion
Structured LDPC codes designCodes analysisDecoding Structured LDPC codes
▪ Conclusions
38
Decoding architectures for LDPC decoders
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
■ Conception flow■ Architectures for LBP decoding algorithm■ Architectures for TLBP decoding algorithm■ Conclusions
39
Framework
Methodology“Cross stage” design flow
Codes design
Decoding AlgorithmLayered BP
Turbo Layered BP
▪ Conception flowArchitectures for LBP decoding algorithmArchitectures for TLBP decoding algorithmConclusions
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
40
Codes design
Decoding AlgorithmLayered BP
Turbo Layered BP
Architectures
Framework
Methodology“Cross stage” design flow
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
▪ Conception flowArchitectures for LBP decoding algorithmArchitectures for TLBP decoding algorithmConclusions
41
Framework
Methodology“Cross stage” design flow
Codes design
Decoding AlgorithmLayered BP
Turbo Layered BP
ArchitecturesOptimization of complexityOptimization of processors activityOptimization of data rate
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
▪ Conception flowArchitectures for LBP decoding algorithmArchitectures for TLBP decoding algorithmConclusions
42
Framework
Methodology“Cross stage” design flow
Codes design
Decoding AlgorithmLayered BP
Turbo Layered BP
Architectures
A joint design of both code and decoder architecture is highly recommended for the design of an efficient system
Hardware integration
Optimization of complexityOptimization of processors activityOptimization of data rate
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
▪ Conception flowArchitectures for LBP decoding algorithmArchitectures for TLBP decoding algorithmConclusions
43
Efficiency of the decoder
Problematic: Maximize the activity of processors?Layered BP with serial architecture for CNP
z is the size of a shifted identity matrixnp is the number of processors working in parallel
z
np parallel processors
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flow
▪ Architectures for LBP decoding algorithmArchitectures for TLBP decoding algorithmConclusions
44
Efficiency of the decoder
Problematic: Maximize the activity of processors?
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flow
▪ Architectures for LBP decoding algorithmArchitectures for TLBP decoding algorithmConclusions
45
Configurations studied
np = max zAlready studied in the literature
• WiMAX LDPC codes R=1/2 and 2/3
But• Very complex for large z• Not very efficient when z is not constant
np < max zMotivations
• Optimize the activity of processor• Target a complexity
Goals• Look for design rules on permutation coefficients in order to keep the Layered BP
properties
z=np processors
z > np processors
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flow
▪ Architectures for LBP decoding algorithmArchitectures for TLBP decoding algorithmConclusions
46
Turbo Layered BP: Summary of the work
The decoding of a window can start if all the most up-to-date extrinsic information are available
Various sequencing have been studiedConstraints on the code design
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flowArchitectures for LBP decoding algorithm
▪ Architectures for TLBP decoding algorithmConclusions
47
Turbo Layered BP: Summary of the work
Various sequencing have been studiedSerial scheduling (pipelined or not)
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flowArchitectures for LBP decoding algorithm
▪ Architectures for TLBP decoding algorithmConclusions
48
Turbo Layered BP: Summary of the work
Various sequencing have been studiedSerial scheduling (pipelined or not)Parallel scheduling (pipelined or not)
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flowArchitectures for LBP decoding algorithm
▪ Architectures for TLBP decoding algorithmConclusions
49
Various sequencing have been studiedSerial scheduling (pipelined or not)Parallel scheduling (pipelined or not)
Definition of an efficient multi-rate decoder
Turbo Layered BP: Summary of the work
Coding rate
Check nodedegree
Good codes
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flowArchitectures for LBP decoding algorithm
▪ Architectures for TLBP decoding algorithmConclusions
50
J
J
Genericity problematic
Exploit the structure of the parity check matrixProperties of the dual-diagonal structure
Throughput
ComplexitySerial architecture-Low data rate-Very flexible scheme
Parallel architecture-Very high throughput-Bad flexibility (coding rate)
Find a good tradeoff
J
J
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flowArchitectures for LBP decoding algorithm
▪ Architectures for TLBP decoding algorithmConclusions
51
Proposed solution
Parallel architecture for SPC processorsJ0 messages are processed in parallel
J0 ones per rows of Hs
J0=2
J0
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flowArchitectures for LBP decoding algorithm
▪ Architectures for TLBP decoding algorithmConclusions
52
Proposed solution
Parallel architecture for SPC processorsJ0 messages are processed in parallel
J0 ones per rows of Hs
J0=2
J0
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flowArchitectures for LBP decoding algorithm
▪ Architectures for TLBP decoding algorithmConclusions
53
Proposed solution
Parallel architecture for SPC processorsJ0 messages are processed in parallel
J0 ones per rows of Hs
J0=2
J0
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flowArchitectures for LBP decoding algorithm
▪ Architectures for TLBP decoding algorithmConclusions
54
Proposed solution
Parallel architecture for SPC processorsJ0 messages are processed in parallel
Extension of the dual-diagonal part Window decoding of the trellis (serial oriented)
• Memory size proportional to the size of the window
Efficient method for TLBP algorithmVery flexible scheme if J0 is well designed
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flowArchitectures for LBP decoding algorithm
▪ Architectures for TLBP decoding algorithmConclusions
55
Synthesis
"Architecture driven" approachJoint design of code and decoder architectures
Architecture for Layered BP decoding algorithmModeling of the CNP processorStudy the case of np < max zSome open issues
• flexible permutation network (barrel shifter)
Architecture for Turbo Layered BP decoding algorithmVarious sequencing have been describedProblematic of multi rate decoder
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flowArchitectures for LBP decoding algorithmArchitectures for TLBP decoding algorithm
▪ Conclusions
56
Synthesis
"Architecture driven" approachJoint design of code and decoder architectures
Architectures for Layered BP decoding algorithmModeling of the CNP processorsStudy the case of np < max zSome open issues
• Flexible permutation network (barrel shifter)
Architecture for Turbo Layered BP decoding algorithmVarious sequencing have been describedProblematic of multi rate decoder
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flowArchitectures for LBP decoding algorithmArchitectures for TLBP decoding algorithm
▪ Conclusions
57
Synthesis
"Architecture driven" approachJoint design of code and decoder architectures
Architectures for Layered BP decoding algorithmModeling of the CNP processorsStudy the case of np < max zSome open issues
• Flexible permutation network (barrel shifter)
Architectures for Turbo Layered BP decoding algorithmVarious sequencing have been describedProblematic of multi-rate decoder
IntroductionStructured LDPC codes
● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion
Conception flowArchitectures for LBP decoding algorithmArchitectures for TLBP decoding algorithm
▪ Conclusions
58
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
FPGA implementation of LDPC coder/decoder
■ Implementation options■ Quantization■ Complexity considerations■ Simulation results■ Conclusion
59
Options
Turbo Layered BP algorithmWith and without pipeline
2 decoding processors• p = 2• Duplication of the buffers
Double input buffers• Optimize the processor activity• Optimize the decoding throughput
Memory banks organization• Avoid simultaneous access• Exploit code structure
▪ Implementation optionsQuantizationComplexity considerationsSimulation resultsConclusions
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
60
Options
2 decoding processors• p = 2• Duplication of the buffers
Double input memories• Optimize the processor activity• Optimize the decoding throughput
Memory banks organization• Avoid simultaneous access• Exploit code structure
Turbo Layered BP algorithmWith and without pipeline
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
▪ Implementation optionsQuantizationComplexity considerationsSimulation resultsConclusions
61
Problematic
Continuous to discrete domainInfluence the performance
• Lower granularity• Introduction of erasures
Influence the complexity of the decoder• Size of the memories• Size of internal data path• Complexity of the basic operators (+, - , < …)
What is the good trade off between performance and complexity?
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation options
▪ QuantizationComplexity considerationsSimulation resultsConclusions
62
Problematic
Continuous to discrete domainInfluence the performance
• Lower granularity• Introduction of erasures
Influence the complexity of the decoder• Size of the memories• Size of internal data path• Complexity of the basic operators (+, - , < …)
What is the good trade off between performance and complexity?
Input data• Quantization on 4 bits is a good trade off
Internal data path• Various methods have been studied
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation options
▪ QuantizationComplexity considerationsSimulation resultsConclusions
63
FPGA integration
FIFO yp18%
Memory Banks30%
Decoding processors
52%
Control4%
FIFO yp3%
Memory Banks43%
Decoding processors
50%
Memory
Logic cells
ALTERA STRATIX EP1S80 - C6
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantization
▪ Complexity considerationsSimulation resultsConclusions
64
FPGA integration
FIFO yp18%
Memory Banks30%
Decoding processors
52%
Memory
ALTERA STRATIX EP1S80 - C6
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantization
▪ Complexity considerationsSimulation resultsConclusions
65
FPGA integration
FIFO yp18%
Memory Banks30%
Decoding processors
52%
Control4%
FIFO yp3%
Memory Banks43%
Decoding processors
50%
Memory
Logic cells
ALTERA STRATIX EP1S80 - C6
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantization
▪ Complexity considerationsSimulation resultsConclusions
66
FPGA integration
FIFO yp18%
Memory Banks30%
Decoding processors
52%
Control4%
FIFO yp3%
Memory Banks43%
Decoding processors
50%
Memory
Logic cells
Drawback of a double input buffer
ALTERA STRATIX EP1S80 - C6
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantization
▪ Complexity considerationsSimulation resultsConclusions
67
Simulation context
FPGA Hardware simulation chainALTERA Stratix EPS80-C6
Source QPSK Mapping
LDPC Encoder
AWGN Channel
BER Analyzer
QPSKDe mapping
Multi-Rate LDPC
Decoder
4
2x16
Source• PRBS 20
AWGN Channel• Box Muller algorithm
LDPC decoder/coder• 4 loaded codes
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
68
Simulation context
FPGA Hardware simulation chainALTERA Stratix EPS80-C6
Source QPSK Mapping
LDPC Encoder
AWGN Channel
BER Analyzer
QPSKDe mapping
Multi-Rate LDPC
Decoder
4
2x16
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
69
Simulation context
FPGA Hardware simulation chainALTERA Stratix EPS80-C6
Source QPSK Mapping
LDPC Encoder
AWGN Channel
BER Analyzer
QPSKDe mapping
Multi-Rate LDPC
Decoder
4
2x16
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
70
Simulation context
FPGA Hardware simulation chainALTERA Stratix EPS80-C6
■ QNX real time OS● Automatic measures● Real time performance curves
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
71
Applications
Broadcast contextLarge block size (≈16 kbits)
ParametersAWGN, QPSKQc = 4 bits (+/-7)15 it TLBP
No early error floor
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
FER (dot line)
BER (continuous line)
72
Applications
Broadcast contextLarge block size (≈16 kbits)
ParametersAWGN, QPSKQc = 4 bits (+/-7)15 it TLBP
No early error floor
Validation of both• Code design algorithm• Quantization strategy
Quasi Error Free
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
FER (dot line)
BER (continuous line)
73
Duo binary Turbo-Codes vs LDPC
ObjectivesTry to do a fair comparison
• 8 states duo binary Turbo-codes• LDPC codes studied
Difficulties of comparisonsUsually context are different
• Coding size, coding rate, coding structure, Target performance
Implementation choices• Architectures, Quantizations• FPGA (Altera or Virtex), mm² on x µm for ASIC
Proposed schemeSimilar context (Valentinno)
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
74
Duo binary Turbo-Codes vs LDPC
0.2- 0.4 dB
PerformanceDBTC
• 10 iterations • Max Log Map
LDPC• 20 iterations• TLBP
ContextAWGN, QPSKQc = 4 bits (+/-7)
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
75
Duo binary Turbo-Codes vs LDPC
0.2- 0.4 dB
PerformanceDBTC
• 10 iterations • Max Log Map
LDPC• 20 iterations• TLBP
ContextAWGN, QPSKQc = 4 bits (+/-7)
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
76
Implementation aspectsFPGA – ALTERA Stratix EP1S80F C6
Decoding throughput• At same data rates
x 1.33 x 1.34
Duo binary Turbo-Codes vs LDPC
Duo binary TC
LDPC studied
pipeline
Without pipeline
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
77
Implementation aspectsFPGA – ALTERA Stratix EP1S80F C6
Decoding throughput• At same data rates
Duo binary Turbo-Codes vs LDPC
x 1.33 x 1.34
Duo binary TC
LDPC studied
pipeline
Without pipeline
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
78
Duo binary Turbo-Codes vs LDPC
PerformanceDuo binary TC have a very good decoding threshold
• Even for small size
Proposed LDPC outperforms TC at low error rate
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
79
Duo binary Turbo-Codes vs LDPC
PerformanceDuo binary TC have a very good decoding threshold
• Even for small size• ….. For large coding size, the gap is reduced
Proposed LDPC outperforms TC at low error rate• ….. It is possible to design DBTC with better behavior at low error rate (3D TC,16 states)
ComplexityDTC outperforms LDPC codes studied…. however
• Maturity of the work and the architecture
A FEC technology should be considered into a global systemBe locally the best do not mean be the best globally
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
80
Duo binary Turbo-Codes vs LDPC
PerformanceDuo binary TC have a very good decoding threshold
• Even for small size• ….. For large coding size, the gap is reduced
Proposed LDPC outperforms TC at low error rate• ….. It is possible to design DBTC with better behavior at low error rate (3D TC,16 states)
ComplexityDBTC outperforms LDPC codes studied…. however
• Maturity of the work and the architecture
A FEC technology should be considered into a global systemBe locally the best do not mean be the best globally
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
81
Duo binary Turbo-Codes vs LDPC
PerformanceDuo binary TC have a very good decoding threshold
• Even for small size• ….. For large coding size, the gap is reduced
Proposed LDPC outperforms TC at low error rate• ….. It is possible to design DBTC with better behavior at low error rate (3D TC,16 states)
ComplexityDBTC outperforms LDPC codes studied…. however
• Maturity of the work and architectures
A FEC technology should be considered into a global systemBe locally the best do not mean be the best globally
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
82
Duo binary Turbo-Codes vs LDPC
PerformanceDuo binary TC have a very good decoding threshold
• Even for small size• ….. For large coding size, the gap is reduced
Proposed LDPC outperforms TC at low error rate• ….. It is possible to design DBTC with better behavior at low error rate (3D TC,16 states)
ComplexityDBTC outperforms LDPC codes studied…. however
• Maturity of the work and architectures
A FEC technology should be considered into a global system,according to the target application
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerations
▪ Simulation resultsConclusions
83
Synthesis
Integration of the decoders into a FPGADefinition of the computational unitsProof of concept: FPGA integration
Study of the quantization effectsInfluence of the quantization of channel observationsQuantization of internal data path
ApplicationsAnalysis of architectures proposed for different contextsComparison with duo binary Turbo-CodesLow error rate behavior: Track and analyze
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerationsSimulation results
▪ Conclusions
84
Synthesis
Integration of the decoders into a FPGADefinition of the computational unitsProof of concept: FPGA integration
Study of the quantization effectsInfluence of the quantization of channel observationsQuantization of internal data path
ApplicationsAnalysis of architectures proposed for different contextsComparison with duo binary Turbo-CodesLow error rate behavior: Track and analyze
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerationsSimulation results
▪ Conclusions
85
Synthesis
Integration of the decoders into a FPGADefinition of the computational unitsProof of concept: FPGA integration
Study of the quantization effectsInfluence of the quantization of channel observationsQuantization of internal data path
ApplicationsAnalysis of architectures proposed for different contextsComparison with duo binary Turbo-CodesLow error rate behavior: Track and analyze
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders
● FPGA implementation of LDPC coder/decoderConclusion
Implementation optionsQuantizationComplexity considerationsSimulation results
▪ Conclusions
86
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder
● Conclusion
Conclusions
■ General conclusion■ Future prospects■ Discussion
87
Analysis of QC IRA codesConstraints on permutation coefficients
Definition of a new algorithm for the design of codes
Joint studies on code design and decoding algorithm definition
Contributions
▪ General conclusionFuture prospectsDiscussion
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder
● Conclusion
88
Analysis of QC IRA codes
Decoding architectures for LDPC codesLayered BP algorithm
Turbo Layered BP algorithm
Contributions
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder
● Conclusion
▪ General conclusionFuture prospectsDiscussion
89
Analysis of QC IRA codes
Decoding architectures for LDPC codes
FPGA implementation of LDPC decodersStudy of quantization effects
Definition of computational units
Complexity analysis of the proposed architectures
Contributions
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder
● Conclusion
▪ General conclusionFuture prospectsDiscussion
90
Analysis of QC IRA codes
Decoding architectures for LDPC codes
FPGA implementation of LDPC decoders
Disseminations7 international conferences1 journal submission (under review)5 patents
Contributions
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder
● Conclusion
▪ General conclusionFuture prospectsDiscussion
91
Perspectives
Extension of the workIntegration of the hardware decoder into a realistic context
• Realistic channel• Integration into a whole communication system
Turbo Layered BP decoding• Application to other parity check matrix structures
– Modified dual-diagonal matrix (WiMax)
Theoretical aspectsAnalysis the behavior of the sequencing proposedHow to improve the convergence threshold of the codes?
• Practical aspect (Finite length)
Implementation issuesExplore the problematic of flexible ultra-parallelized LDPC decoder
• Very high throughput decoding
General conclusion
▪ Future prospectsDiscussion
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder
● Conclusion
92
Perspectives
Extension of the workIntegration of the hardware decoder into a realistic context
• Realistic channel• Integration into a whole communication system
Turbo Layered BP decoding• Application to other parity check matrix structures
– Modified dual-diagonal matrix (WiMax)
Theoretical aspectsAnalysis the behavior of the sequencing proposedHow to improve the convergence threshold of the codes?
• Practical aspect (Finite length)
Implementation issuesExplore the problematic of flexible ultra-parallelized LDPC decoder
• Very high throughput decoding
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder
● Conclusion
General conclusion
▪ Future prospectsDiscussion
93
Perspectives
Extension of the workIntegration of the hardware decoder into a realistic context
• Realistic channel• Integration into a whole communication system
Turbo Layered BP decoding• Application to other parity check matrix structures
– Modified dual-diagonal matrix (WiMax)
Theoretical aspectsAnalysis the behavior of the sequencing proposedHow to improve the convergence threshold of the codes?
• Practical aspect (Finite length)
Implementation issuesExplore the problematic of flexible ultra-parallelized LDPC decoder
• Very high throughput decoding
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder
● Conclusion
General conclusion
▪ Future prospectsDiscussion
94
Questions and answers
Optimisation conjointe de codes LDPC et de leursarchitectures de décodage et mise en œuvre sur FPGA
Jean-Baptiste Doré
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder
● Conclusion
General conclusion
▪ Future prospectsDiscussion
95
Bibliography
[Tanner99] - R. M. Tanner, “On quasi-cyclic repeat-accumulate codes”, in Proc. of. the 37th Allerton Conference, 1999.
[Forney01] - G. D. Forney, “Codes on graphs : Normal realizations”, IEEE Transactions on Information Theory, Feb 2001.
[Chung01] - S.-Y. Chung, T. Richardson, and R. Urbanke, “Analysis of sum-product decoding of low-density parity-check codes using a gaussian approximation”, IEEE Transactions on Information Theory, vol. 47, Feb 2001.
[Hu01] - X.-Y. Hu, E. Eleftheriou, and D.-M. Arnold, “Progressive edge-growth tanner graphs”, IEEE Global Telecommunications Conference, vol. 2, Nov 2001.
[Richardson01] - T. Richardson, A. Shokrollahi, and R. Urbanke, “Design of capacity approaching irregular low-density parity-check codes”, IEEE Transactions on Information Theory, vol. 47, Feb 2001.
[Tian03] - T. Tian, C. Jones, J. D. Villasenor, and R. D. Wesel, “Construction of irregular LDPC codes with low error floors”, IEEE International Conference on Communications, June 2003.
[Richardson03] - T. Richardson, “Error floors of LDPC codes”, 41st Annual AllertonConferenceon Communications, Control, and Computing, Oct 2003.
96
Annexes
General conclusionFuture prospectsDiscussion
▪ Annexes
Annexes
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder
● Conclusion
97
Outline
Simplification of BP Illustration of the design of codesPerformance example: effects of TCLBP/LBP/TLBPCNP modeling: case of Min-Sum approximation
IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder
● Conclusion
General conclusionFuture prospectsDiscussion
▪ Annexes
98
Simplification of BP algorithm
Practical implementation of BP algorithmApproximation of the function f(x)Limit the number of different edges messages
performance
complexity
99
Algorithm for codes design
Description of the algorithmTarget: > 6 length cyclez = 8Mask considered:
Incremental construction
100
Algorithm for codes design
101
Algorithm for codes design
Expansion of the protograph to detect the configurations
102
Algorithm for codes design
103
Algorithm for codes design
104
Algorithm for codes design
105
Simulation results
BERBLER
TCL>6 – 8 length-cycle
Algorithm parameterization: some resultsAvoid low length cycle involving low degree variable nodes
106
Simulation results
TCL depends on variable node degree
BERBLER
Algorithm parameterization: some resultsAvoid low length cycle involving low degree variable nodes
107
BP/LBP/TLBP
Comparison BP/LBP
Good convergence for LBP15-25 iterations
108
BP/LBP/TLBP
Comparison LBP/TLBP
Good convergence for TLBP10-20 iterations
109
About CNP processors: Modeling for Min-Sum algorithm (I)
Serial implementation of CNP Processorsdc cycles to compute mvc
dc cycles to compute Av
110
About CNP processors: Modeling for Min-Sum algorithm (II)
111
About CNP processors: Modeling for Min-Sum algorithm (II)
112
About CNP processors: Modeling for Min-Sum algorithm (II)
113
About CNP processors: Modeling for Min-Sum algorithm (II)
114
About CNP processors: Modeling for Min-Sum algorithm (II)
115
About CNP processors: Modeling for Min-Sum algorithm (III)
Min-Sum algorithmComparison of the input mvc
• Two smallest messages
It can be done during the forward step• The backward step is not required
but…• dc cycles are required for the computation of mvc
• At least dc cycles are required for the computation of the min (pipeline)• dc cycles are required to re-estimate Av
116
About CNP processors: Modeling for Min-Sum algorithm (IV)
Min-Sum algorithm
The proposed model is valid but parameter ε depends on the algorithm
117