[ieee - kuala lumpur, malaysia (2010.05.7-2010.05.10)] -

FPGA Implementation of High Performance LDPC Decoder using Modified 2-bit Min-Sum Algorithm

Vikram Arkalgud Chandrasetty and Syed Mahfuzul Aziz

School of Electrical and Information Engineering, University of South Australia, Mawson Lakes, SA 5095, Australia

[email protected], [email protected]

Abstract- In this paper, a reduced-complexity Low-Density Parity-Check (LDPC) decoder is designed and implemented on an FPGA using a modified 2-bit Min-Sum algorithm. Simulation results show that the proposed decoder achieves an improvement of 1.5 dB in Eb/No at a bit error rate (BER) of 10^-5 and requires fewer decoding iterations than the original 2-bit Min-Sum algorithm. With BER performance comparable to that of the 3-bit Min-Sum algorithm, the decoder implemented using the modified 2-bit Min-Sum algorithm saves about 18% of FPGA slices and achieves an average throughput of 10.2 Gbps at 4 dB Eb/No.

Keywords- digital communication; error correction coding; iterative decoding; field programmable gate array; logic design

I. INTRODUCTION

Low-Density Parity-Check (LDPC) codes [1] have become one of the most attractive classes of error correction codes due to their excellent performance [2] and suitability for high data rate applications such as WiMax and DVB-S2 [3]. The inherent structure of an LDPC code allows the decoder to achieve a high degree of parallelism in practical implementations [4]. LDPC decoding algorithms are primarily iterative and are based on the belief propagation message passing algorithm. The complexity of the decoding algorithm is critical to the overall performance of the LDPC decoder. Various algorithms have been proposed to trade off complexity against performance [5, 6]. The Sum-Product Algorithm (SPA) [7], a soft decision based message passing algorithm, achieves the best performance but has high decoding complexity. Bit-Flip, in contrast, is a hard decision based algorithm with the least decoding complexity, but it suffers from poor performance [6]. The Min-Sum Algorithm (MSA) is a simplified version of SPA that reduces implementation complexity at the cost of a slight degradation in performance [7].

The MSA performs simple arithmetic and logical operations, which makes it suitable for hardware implementation. However, the performance of the algorithm is significantly affected by the quantization of the soft input messages [8]. Reducing the number of quantization bits per message reduces the implementation complexity and hardware resources of the decoder, but this advantage comes with a degradation in decoding performance. The literature contains limited information on the performance and hardware implementation of such low complexity algorithms, especially the 2-bit MSA.

This paper discusses the performance and hardware implementation complexity associated with the 2-bit MSA. Modifications are proposed to improve the overall performance of the algorithm so that it becomes comparable to that of the 3-bit MSA. Simulation results show that the proposed Modified 2-bit Min-Sum (MMS2) algorithm achieves a significant improvement over the 2-bit MSA in both bit error rate (BER) and average number of decoding iterations. With BER performance comparable to that of the 3-bit MSA, an FPGA implementation of the proposed MMS2 saves up to 18% of slices and yields a 23% improvement in the maximum operating frequency of the LDPC decoder.

II. PROPOSED MODIFIED 2-BIT MIN-SUM ALGORITHM

Although the simplified check node operation in MSA has reduced complexity compared to SPA, MSA still requires high precision messages to be exchanged between the decoding nodes in order to achieve decoding performance close to that of SPA. The level of quantization used for the soft channel messages, represented as Log-Likelihood Ratios (LLR), and for the extrinsic messages of MSA directly impacts the decoding performance. As the number of quantization bits per message decreases, both the performance and the complexity of the algorithm decrease. Studies have shown that there is only a slight performance loss in going from 5-bit to 4-bit or even 3-bit messages [8]. Using 2-bit quantized messages in MSA leads to a large reduction in implementation complexity but suffers from a significant loss in decoder performance compared to the 3-bit MSA. The performance of the 2-bit MSA has been improved through the optimization reported in [9]. It is further improved by the Modified 2-bit Min-Sum (MMS2) algorithm proposed in this paper. The check node and variable node operations of the MMS2 algorithm are described as follows:

A. Variable Node Operation

The variable node operation is similar to that of the original Min-Sum algorithm [7]. The difference in the proposed algorithm is that the variable node (Vi) performs its operations on higher precision quantized LLRs (LLRn), but maps the computed result to a 2-bit message to be passed to the check nodes, as in (1). The 2-bit message consists of a sign bit and a magnitude bit representing the computed LLR sum. The mapping is based on a threshold (Tm) obtained from simulations. Each 2-bit message received from the check nodes (Cj) is in turn mapped to a constant value (±W or ±w) before it enters the LLR sum operation in the variable node. These constant values are also obtained from simulations. The functions for mapping the 2-bit messages are shown in (2) and (3).

Second International Conference on Computer Research and Development

978-0-7695-4043-6/10 $26.00 © 2010 IEEE

DOI 10.1109/ICCRD.2010.186

$$V_i = g\Big(LLR_n + \sum_{j \neq i} f(C_j)\Big) \qquad (1)$$

where n = 1, 2, ..., N (variable nodes) and i, j = 1, 2, ..., dv (degree of variable node ‘n’).

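$$g(y) = \begin{cases} 01, & \text{if } y \geq T_m \\ 00, & \text{if } 0 \leq y < T_m \\ 10, & \text{if } -T_m < y < 0 \\ 11, & \text{if } y \leq -T_m \end{cases} \qquad (2)$$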

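$$f(x) = \begin{cases} +W, & \text{if } x = 01 \\ +w, & \text{if } x = 00 \\ -w, & \text{if } x = 10 \\ -W, & \text{if } x = 11 \end{cases} \qquad (3)$$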

where Tm is the optimized threshold for mapping, W is the optimized higher integer constant, and w is the optimized lower integer constant. Monte Carlo simulations are carried out to obtain the Tm, W and w values that provide the best decoding performance.
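The variable node operation in (1) to (3) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the constants are the values reported in Section III (Tm = 2, W = 3, w = 1), a sign bit of 1 is assumed to denote a negative value, a sum whose magnitude equals Tm is assumed to map to the high-magnitude symbol, and saturation of the LLR sum to the quantization range is not modeled.

```python
from __future__ import annotations

# Constants reported in Section III for the 4-bit LLR case.
T_M = 2      # mapping threshold Tm
W_HIGH = 3   # higher constant W
W_LOW = 1    # lower constant w


def g(y: int, t_m: int = T_M) -> tuple[int, int]:
    """Map an LLR sum to a 2-bit message (sign bit, magnitude bit)."""
    sign = 1 if y < 0 else 0                 # assumption: 1 marks a negative value
    magnitude = 1 if abs(y) >= t_m else 0    # assumption: |y| == Tm counts as high
    return sign, magnitude


def f(msg: tuple[int, int], big: int = W_HIGH, small: int = W_LOW) -> int:
    """Map a 2-bit check node message back to an integer weight (+/-W or +/-w)."""
    sign, magnitude = msg
    value = big if magnitude else small
    return -value if sign else value


def variable_node(llr: int, check_msgs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the outgoing 2-bit message on every edge of one variable node.

    The message on edge i excludes the incoming message on that same edge,
    so it is computed as the full sum minus that edge's own contribution.
    """
    weights = [f(c) for c in check_msgs]
    total = llr + sum(weights)
    return [g(total - w_i) for w_i in weights]
```

For example, `variable_node(1, [(0, 1), (1, 0), (0, 0)])` maps the incoming messages to the weights 3, -1 and 1, forms the sum 4, and returns `[(0, 0), (0, 1), (0, 1)]`.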

B. Check Node Operation

In MSA, the check node determines the product of the signs of the incoming messages and finds the minimum of their magnitudes [7]. In the proposed MMS2, the product of the signs of the incoming messages is computed using an XOR operation (Sk), as in (4), and the minimum is determined using an AND operation (Mk), as in (5). The check node output message (Ck) is obtained simply by concatenating the sign bit and the magnitude bit, as in (6). Message passing between the nodes continues until the parity check is satisfied or the maximum number of iterations is reached.

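$$S_k = V_1^{(s)} \oplus V_2^{(s)} \oplus \cdots \oplus V_l^{(s)}, \quad l \neq k \qquad (4)$$

$$M_k = V_1^{(m)} \,\&\, V_2^{(m)} \,\&\, \cdots \,\&\, V_l^{(m)}, \quad l \neq k \qquad (5)$$

$$C_k = \{S_k, M_k\} \qquad (6)$$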

where l, k = 1, 2, ..., dc (degree of check node);
S = sign bit of the check node message;
M = magnitude bit of the check node message;
Vl(s) = sign bit of the message ‘l’ from the variable node;
Vl(m) = magnitude bit of the message ‘l’ from the variable node.

The message mapping in the variable node described above is similar to that presented in [9]. However, the proposed MMS2 algorithm eliminates the overhead of the scaling factor used in [9], uses higher precision LLRs for the variable node operation and incorporates simple logic for the check node operation. These modifications lead to a further improvement in performance while retaining the reduced complexity of routing only 2-bit messages between the variable and check nodes in the LDPC decoder.
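The check node operation in (4) to (6) reduces to bitwise logic, which the following Python sketch illustrates. It is only a software model under stated assumptions: the function name is hypothetical, each message is represented as a (sign bit, magnitude bit) pair, and the outgoing message on edge k combines every incoming message except the one on edge k.

```python
from functools import reduce
from operator import and_, xor


def check_node(var_msgs):
    """Return the outgoing 2-bit message (S_k, M_k) on every edge of one check node.

    var_msgs holds one (sign bit, magnitude bit) pair per connected variable node.
    """
    out = []
    for k in range(len(var_msgs)):
        others = var_msgs[:k] + var_msgs[k + 1:]
        # Product of the signs, computed as XOR of the sign bits.
        sign = reduce(xor, (s for s, _ in others), 0)
        # Minimum of the 1-bit magnitudes, computed as AND of the magnitude bits.
        magnitude = reduce(and_, (m for _, m in others), 1)
        out.append((sign, magnitude))
    return out
```

With the three incoming messages `[(0, 1), (1, 1), (0, 0)]`, the function returns `[(1, 0), (0, 0), (1, 1)]`: a single low-magnitude input forces a low magnitude on every other edge, which is the 1-bit equivalent of taking the minimum.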

III. PERFORMANCE ANALYSIS

The performance of the proposed MMS2 algorithm has been evaluated using a software model written in C and run in the MATLAB environment. The LDPC codes were generated using the Progressive Edge Growth (PEG) algorithm [10]. Simulations were carried out assuming that the code words were modulated using Binary Phase Shift Keying (BPSK) and passed over an Additive White Gaussian Noise (AWGN) channel [11].
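The channel model described above can be sketched in Python as follows. This is a generic BPSK over AWGN model, not the authors' simulation code: the function names are hypothetical, bit 0 is assumed to map to +1 and bit 1 to -1, symbols are assumed to have unit energy, and the LLRs are left as floating-point values without quantization.

```python
import math
import random


def noise_sigma(ebno_db: float, rate: float) -> float:
    """Noise standard deviation for unit-energy BPSK at a given Eb/No and code rate."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return math.sqrt(1.0 / (2.0 * rate * ebno))


def channel_llrs(codeword, ebno_db, rate, rng=None):
    """Send bits over BPSK/AWGN and return floating-point channel LLRs."""
    rng = rng or random.Random()
    sigma = noise_sigma(ebno_db, rate)
    llrs = []
    for bit in codeword:
        symbol = 1.0 - 2.0 * bit                    # assumption: bit 0 -> +1, bit 1 -> -1
        received = symbol + rng.gauss(0.0, sigma)   # additive white Gaussian noise
        llrs.append(2.0 * received / sigma ** 2)    # LLR of a BPSK symbol in AWGN
    return llrs
```

For a rate ½ code at 4 dB Eb/No, `noise_sigma(4.0, 0.5)` evaluates to roughly 0.63. A BER curve is then estimated by decoding many such noisy code words at each Eb/No point and counting the residual bit errors.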

In [12], a ½ rate (3, 6) regular 1200-bit LDPC code with a maximum of 10 decoding iterations was used for the FPGA implementation of the 3-bit MSA. The same specification has been used for simulation and comparison of the proposed MMS2 algorithm. The corresponding FPGA implementation results are compared in Section IV-A. The LLR quantization used for MMS2 is 4-bit. In the variable node, a threshold (Tm) of 2 is used for the 4-bit to 2-bit mapping, and the weights W=3 and w=1 are used for the 2-bit to 4-bit mapping.

The BER performance of MMS2 compared to the original 2-bit and 3-bit MSA is shown in Fig. 1. MMS2 achieves a gain of 1.5 dB at a BER of 10^-5 over the 2-bit MSA and suffers a loss of about 0.3 dB at a BER of 10^-5 relative to the 3-bit MSA. Fig. 2 shows that MMS2 also requires significantly fewer decoding iterations on average than the 2-bit MSA.

IV. FPGA IMPLEMENTATION

A fully parallel LDPC decoder architecture was designed for the proposed MMS2 algorithm. The parameterized hardware model was developed using the Verilog Hardware Description Language (HDL) and synthesized using the Xilinx synthesis tool. The behavioral and post-synthesis simulations were carried out using ModelSim. The block diagram of the designed LDPC decoder is shown in Fig. 3.

The decoder has a global ‘Clock’ input and a synchronous ‘Reset’ input. The maximum permissible number of iterations is determined by the value supplied at the ‘MaxIter’ input, which can be set in the range 0-15. The ‘MaxIter’ value is read when the ‘Configure’ input is high. The LLRs are fed into the decoder using the ‘Load’ control signal. The decoding process is initiated by the ‘Start’ signal. After decoding is completed, the ‘Decoded Data’ can be read when indicated by the ‘DataOut Ready’ signal. The receipt of data can be acknowledged on ‘DataOut Ack’ to receive the next decoded bit. The number of iterations used for decoding can be obtained from the ‘Used Iter’ port. The ‘Decoder Status’ port indicates the state (Active/Idle) of the decoder.

Figure 1. BER performance of MMS2 compared to MSA

Figure 2. Average decoding iterations for MMS2 and MSA

Figure 3. Block diagram of the designed LDPC decoder

Note that the LLRs are loaded serially one at a time to the decoder. Similarly, the ‘Decoded Data’ is latched bit by bit serially. This technique is used because of the limited number of Input/Output ports available in the FPGA. It also provides flexibility for implementing LDPC decoders with variable codelength without modifying the port configuration.

A. Comparative Analysis

A parallel architecture for a 1200-bit LDPC decoder, as described in Section III, has been designed, synthesized, placed and routed for a Xilinx Virtex 4 (XC4VLX200) FPGA. The maximum operating clock frequency achievable for the decoder is 123 MHz. The throughput of the decoder is calculated using the formula presented in [12]. This calculation excludes the serial load time of the individual LLRs (before the decoding process starts) and the latch time of the decoded data (after decoding is complete). With an average of 7.2 decoding iterations at 4 dB Eb/No (see Fig. 2), the proposed decoder achieves an average throughput of 10.2 Gbps. A comparison of the proposed decoder with that presented in [12] is shown in Table I.
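The throughput formula from [12] is not reproduced in this paper. As a rough check, the reported figures are consistent with a fully parallel decoder that produces one code word every (iterations × 2) clock cycles. The factor of two clock cycles per iteration in the Python sketch below is an inference from the reported numbers, not a value stated in the text, and the function name is hypothetical.

```python
def throughput_bps(code_length: int, clock_hz: float, iterations: float,
                   cycles_per_iteration: int = 2) -> float:
    """Decoded bits per second for a fully parallel decoder, ignoring serial I/O time.

    cycles_per_iteration = 2 is an assumption inferred from the reported figures.
    """
    return code_length * clock_hz / (cycles_per_iteration * iterations)


# 1200-bit code at 123 MHz, as reported for the Virtex 4 design.
average = throughput_bps(1200, 123e6, 7.2)   # close to the reported 10.2 Gbps
minimum = throughput_bps(1200, 123e6, 10)    # close to the reported 7.4 Gbps
print(f"average: {average / 1e9:.2f} Gbps, minimum: {minimum / 1e9:.2f} Gbps")
```

Under the same assumption, a 100 MHz clock with 10 iterations gives 6 Gbps, which matches the minimum throughput listed for [12].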

TABLE I. COMPARISON OF FULLY PARALLEL LDPC DECODERS

Parameter | In [12] | Proposed | Improvement
LDPC Code | ½ rate (3,6) regular 1200-bit | ½ rate (3,6) regular 1200-bit | -
Algorithm | 3-bit Min-Sum | MMS2 | -
BER | 10^-5 at 3.6 dB | 10^-5 at 3.9 dB | -0.3 dB
FPGA | Xilinx Virtex 4 (xc4vlx200) | Xilinx Virtex 4 (xc4vlx200) | -
Slices | 40,613 | 33,345 | 18%
LUTs | 69,038 | 58,053 | 16%
Registers | 18,945 | 15,691 | 17%
Clock | 100 MHz | 123 MHz | 23%
Throughput (Avg.) | Not Available | 10.2 Gbps at 4 dB | -
Throughput (Min) | 6 Gbps at 10 iterations | 7.4 Gbps at 10 iterations | 23%
Results | Synthesized, Placed and Routed | Synthesized, Placed and Routed | -
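The improvement column of Table I can be reproduced from the raw resource counts. The short Python check below uses only the numbers listed in the table; the helper names are hypothetical.

```python
def saving_percent(reference: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


def gain_percent(reference: float, proposed: float) -> float:
    """Relative increase of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (proposed - reference) / reference


# (value in [12], value for the proposed decoder), taken from Table I.
resources = {
    "Slices": (40613, 33345),
    "LUTs": (69038, 58053),
    "Registers": (18945, 15691),
}

for name, (ref, new) in resources.items():
    print(f"{name}: {saving_percent(ref, new):.1f}% saved")

print(f"Clock: {gain_percent(100.0, 123.0):.1f}% faster")
```

The computed savings round to the 18%, 16% and 17% listed for slices, LUTs and registers, and the clock gain is 23%.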

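[Figure 3 block diagram, port labels on the LDPC Decoder block: LLR Input, MaxIter, Clock, Reset, Configure, Load, Start, DataOut Ack, Decoder Status, Decoded Data, Used Iter, DataOut Ready; three of the buses are marked as 4 bits wide.]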

B. Implementation Results

The 1200-bit LDPC decoder presented above was not implemented on an FPGA, as a Xilinx Virtex 4 device was not available. However, a smaller version of the decoder has been implemented on a Xilinx Virtex 5 FPGA development board. A ½ rate (3, 6) regular 648-bit LDPC code that complies with the WLAN standard [13] was chosen for implementation. A comprehensive testing environment was developed using RS232 serial communication [14] to test the decoder on the FPGA. The setup used to test the LDPC decoder is shown in Fig. 4. An RS232 transceiver module was embedded on the FPGA along with the LDPC decoder module to interface with the RS232 port. MATLAB was used to communicate with the FPGA through the serial port. LLRs were generated and sent to the FPGA with the appropriate control signals for decoding. The decoded data received via the same serial port was used to analyze the performance of the decoder.

The BER performance and the average number of iterations required by the decoder implemented on the FPGA are compared with those of the software model in Fig. 5 and Fig. 6, respectively. The FPGA implementation results for the LDPC decoder, including the RS232 serial communication module, are summarized in Table II. At a maximum operating frequency of 113 MHz, the implemented LDPC decoder achieves an average throughput of 5.4 Gbps with an average of 6.8 iterations at 4.25 dB Eb/No.

TABLE II. SUMMARY OF FPGA IMPLEMENTATION RESULTS

Resources | LDPC Decoder
Slices | 7,755
LUTs | 22,014
Registers | 8,555
Clock | 113 MHz
FPGA | Xilinx Virtex 5 (XC5VLX110T-3FF1136)

Figure 4. Block diagram of FPGA test setup for LDPC decoder

Figure 5. BER performance of LDPC decoder from FPGA

Figure 6. Average decoding iterations of LDPC decoder from FPGA

V. CONCLUSION

In this paper, a modified 2-bit Min-Sum algorithm is proposed to reduce the implementation complexity of LDPC decoders. It is shown that, with a slight degradation in performance of about 0.3 dB at a BER of 10^-5 compared to the 3-bit Min-Sum algorithm, the proposed decoder saves about 18% of FPGA slices and raises the maximum clock frequency, and with it the minimum throughput, by 23%. The performance of the proposed algorithm and its feasibility for practical systems are also verified by implementing a decoder suitable for WLAN. The proposed LDPC decoder is therefore an attractive solution for applications requiring high performance.

[Figure 4 block labels: Personal Computer (MatLab), Serial Port Connection, FPGA (RS232 Rx/Tx, LDPC Decoder).]

ACKNOWLEDGMENT

The authors wish to acknowledge Dr Mark Ho of the School of Electrical and Information Engineering, University of South Australia, for his advice on carrying out the performance simulations.

REFERENCES

[1] R. Gallager, Low-density parity-check codes. IRE Transactions on Information Theory, 1962. 8(1): p. 21-28.

[2] D.J.C. MacKay and R.M. Neal, Near Shannon limit performance of low density parity check codes. Electronics Letters, 1997. 33(6): p. 457-458.

[3] Tetsuo Nozawa (2005) LDPC Adopted for Use in Comms, Broadcasting, HDDs. Nikkei Electronics Asia.

[4] G.L.L. Nicolas Fau (2008) LDPC (Low Density Parity Check) - A Better Coding Scheme for Wireless PHY Layers. Design and Reuse Industry Article.

[5] S. Papaharalabos and P.T. Mathiopoulos, Simplified sum-product algorithm for decoding LDPC codes with optimal performance. Electronics Letters, 2009. 45(2): p. 116-117.

[6] N. Miladinovic and M.P.C. Fossorier, Improved bit-flipping decoding of low-density parity-check codes. IEEE Transactions on Information Theory, 2005. 51(4): p. 1594-1606.

[7] A. Anastasopoulos. A comparison between the sum-product and the min-sum iterative detection algorithms based on density evolution. in IEEE Global Telecommunications Conference. 2001.

[8] R. Zarubica, et al. Efficient quantization schemes for LDPC decoders. in IEEE Military Communications Conference. 2008.

[9] Z. Cui and Z. Wang, Improved low-complexity low-density parity-check decoding. IET Communications, 2008. 2(8): p. 1061-1068.

[10] X.-Y. Hu. Software to Construct PEG LDPC code. 2008 [cited 2009 May]; Available from: http://www.inference.phy.cam.ac.uk/mackay/PEG_ECC.html.

[11] J.G. Proakis, Digital communications. 5th ed., ed. M. Salehi. 2008, New York: McGraw-Hill.

[12] R. Zarubica, S.G. Wilson, and E. Hall. Multi-Gbps FPGA-Based Low Density Parity Check (LDPC) Decoder Design. in IEEE Global Telecommunications Conference. 2007.

[13] IEEE 802.11n Wireless LAN Medium Access Control MAC and Physical Layer PHY specifications. 2006, IEEE 802.11n-D1.0.

[14] RS232 Tutorial on Data Interface and Cables. 2009 [cited 2009 Sep]; Available from: http://www.arcelect.com/rs232.htm.
