constant log bcjr turbo decoder with pipelined architecture



International Journal of Scientific Research and Engineering Studies (IJSRES)

Volume 1 Issue 3, September 2014

ISSN: 2349-8862

www.ijsres.com Page 65

Constant Log BCJR Turbo Decoder With Pipelined Architecture

Aiswarya .S

Student

M. Tech, Embedded Systems

Sree Buddha College of Engineering

Alappuzha

Ragimol

Asst. professor

Dept. of ECE

Sree Buddha College of Engineering

Alappuzha

Abstract: Error-correcting codes are required to recover the correct data from a signal corrupted by noise and interference. Among the many error-correcting codes, turbo codes are considered the best because their performance is very close to the Shannon theoretical limit. The MAP algorithm is commonly used in the turbo decoder. Among the different versions of the MAP algorithm, the Constant log BCJR algorithm has low complexity and good error performance. It can be designed easily using a look-up table, which reduces the energy consumption. In this work the proposed decoder is designed to decode two blocks of data at a time, which increases both the throughput and the complexity of the decoder. The complexity is then reduced by the use of add compare select (ACS) units and registers. The proposed decoder is coded in Verilog and synthesized for a Spartan-3 device. In comparison to the LUT log BCJR decoder, the proposed design has lower power consumption and memory usage. The Constant log BCJR architecture shows a 7.06% reduction in gate count and a 38.08% reduction in power. The proposed design also shows good BER performance.

Keywords: Turbo decoder, MAP, ACS, Constant log

BCJR algorithm

I. INTRODUCTION

Over the years, the use of digital communication has increased in computer, cellular and satellite communication. In digital communication, the information is binary data that is modulated onto an analog waveform and transmitted through the communication channel. In the channel the information may be corrupted by noise. At the receiver the corrupted signal is demodulated and converted into binary bits. Because of the noise, the received data may not be a correct replica of the transmitted data, so a channel coding scheme is used to protect against noise and interference.

Turbo codes are known to be the best forward error correcting codes. A turbo code provides reliable data transmission within half a decibel of the Shannon limit [13]. The iterative nature of the turbo decoding algorithm increases the complexity. The two decoding algorithms used for turbo codes are Maximum A Posteriori (MAP) and the Soft Output Viterbi Algorithm (SOVA) [5]. The MAP algorithm estimates the most likely value of each data bit, while SOVA finds the most likely connected path through the trellis. The MAP algorithm performs better than SOVA at low SNR. However, the implementation of a decoder using the MAP algorithm is more difficult.

In order to reduce its computational complexity, the MAP algorithm is usually implemented in the logarithmic domain; the corresponding algorithm is the log-MAP algorithm [7]. The log-MAP algorithm can be simplified further, giving the max-log-MAP, constant-log-MAP and linear-log-MAP algorithms [5]. The log-MAP algorithm can be implemented using a look-up table, and the LUT-log-BCJR decoder is based on this. With the constant-log-BCJR (constant-log-MAP) algorithm, fewer look-up table elements are needed than in the LUT-log-BCJR algorithm [1]. The energy consumption of the decoder cannot be reduced by lowering the clock frequency and the throughput, so it is reduced through low-complexity hardware instead. A new architecture is therefore developed which contains only registers and add compare select units for the constant-log-BCJR block. The proposed design also works on two blocks of data at a time, which doubles the throughput.

Section II discusses the conventional architecture of the decoder. Section III deals with the proposed architecture. Simulation and comparison results are explained in Section IV, where the area is found to be reduced by 7.06% and the power by 38.08%. Section V concludes the paper.

II. SYSTEM ARCHITECTURE

The architecture of the turbo code has an encoder part and a decoder part. The overall structure is shown in Figure 1. The encoder is a parallel concatenation of recursive systematic convolutional encoders [14-15]. That is, each convolutional encoder has a feedback structure, which helps to produce high-weight codewords. If parity 1 has low weight then, because of the interleaver, the chance that parity 2 also has low weight is very small. Thus the encoded codeword has high weight.
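The parallel concatenation described above can be sketched in a few lines of Python. This is only an illustration: the generator polynomials (feedback 1 + D + D^2, feedforward 1 + D^2), the 8-bit block, the permutation `perm` and the function names `rsc_parity` and `turbo_encode` are assumptions, since the paper does not state the encoder parameters.

```python
from typing import List, Sequence, Tuple


def rsc_parity(bits: Sequence[int]) -> List[int]:
    """Parity sequence of a 4-state recursive systematic convolutional encoder.

    Assumed generators (not given in the paper): feedback 1 + D + D^2 and
    feedforward 1 + D^2, i.e. (1, 5/7) in octal notation.
    """
    s1 = s2 = 0  # two memory elements give four trellis states
    parity = []
    for d in bits:
        a = d ^ s1 ^ s2        # recursive (feedback) bit
        parity.append(a ^ s2)  # feedforward taps 1 + D^2
        s1, s2 = a, s1         # shift the register
    return parity


def turbo_encode(bits: Sequence[int], perm: Sequence[int]) -> Tuple[List[int], List[int], List[int]]:
    """Return (systematic, parity 1, parity 2) for one block."""
    interleaved = [bits[i] for i in perm]  # interleaver as a permutation table
    return list(bits), rsc_parity(bits), rsc_parity(interleaved)


# A weight-2 input chosen so that the first encoder returns to state 0 early.
bits = [1, 0, 0, 1, 0, 0, 0, 0]
perm = [3, 7, 1, 5, 0, 4, 2, 6]  # hypothetical interleaver pattern
systematic, parity1, parity2 = turbo_encode(bits, perm)
print(sum(parity1), sum(parity2))
```

For this input the first encoder produces a short, low-weight parity sequence, while the interleaved copy keeps the second encoder away from state 0 for longer, so parity 2 has a higher weight.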


Figure 1: Turbo encoder and decoder block diagram

In the channel the code gets corrupted by noise or interference. In a typical turbo decoding system, two decoders operate iteratively and pass their decisions to each other after each iteration. The turbo decoder generally uses the MAP algorithm in at least one of the component decoders. The decoding process begins by receiving partial information from the channel (i.e., the systematic output $Y_k^{s}$ and parity 1, $Y_k^{P1}$) and passing it to the first decoder. The rest of the information, parity 2 ($Y_k^{P2}$), goes to the second decoder and waits for the rest of the information to catch up. While the second decoder is waiting, the first decoder makes an estimate of the transmitted information, interleaves it to match the format of parity 2, and sends it to the second decoder. The second decoder takes the information from the first decoder and from the channel and re-estimates the information. This re-estimate is looped back to the first decoder, where the process starts again.

The turbo decoding process can be explained as follows. The encoded information sequence $X_k$ is transmitted over an Additive White Gaussian Noise (AWGN) channel, and a noisy received sequence $Y_k$ is obtained. Each decoder calculates the Log-Likelihood Ratio (LLR) for the k-th data bit $d_k$ as

$$L(d_k) = \log \frac{P(d_k = 1 \mid Y)}{P(d_k = 0 \mid Y)} \tag{1}$$

$L(d_k)$ can be found using different algorithms [5], [7], [11]. The LLR can be decomposed into three independent terms, as

$$L(d_k) = L_{apri}(d_k) + L_c(d_k) + L_e(d_k) \tag{2}$$

where $L_{apri}(d_k)$ is the a-priori information of data bit $d_k$, $L_c(d_k)$ is the channel measurement and $L_e(d_k)$ is the extrinsic information from the other decoder. The extrinsic information of the second decoder is the a-priori information of the first decoder.
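As a small worked illustration of equations (1) and (2), the following Python sketch computes an LLR from a bit probability and separates out the extrinsic term by rearranging equation (2). The function names and the numeric values are illustrative assumptions, not taken from the paper.

```python
import math


def llr(p_one: float) -> float:
    """Equation (1): LLR of a bit whose probability of being 1 is p_one."""
    return math.log(p_one / (1.0 - p_one))


def extrinsic(l_total: float, l_apri: float, l_channel: float) -> float:
    """Equation (2) rearranged: L_e = L - L_apri - L_c."""
    return l_total - l_apri - l_channel


# Hypothetical values for one data bit.
l_total = llr(0.9)                     # decoder is fairly sure the bit is 1
l_e = extrinsic(l_total, 0.4, 1.0)     # what is passed on to the other decoder
print(round(l_total, 3), round(l_e, 3))
```

Only the extrinsic part is handed to the other decoder, so that information a decoder already received as its a-priori input is not fed back to it.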

III. CONSTANT LOG BCJR ARCHITECTURE

The decoder processes two blocks of data at a time. It has two soft-input soft-output (SISO) decoders, which receive the data as soft (multibit) inputs. In the first iteration SISO1 operates and produces a soft output, which is interleaved and given to SISO2. SISO2 does not operate in the first iteration; after receiving the soft output from SISO1 it begins to calculate its own soft output, which is deinterleaved and given to SISO1. This process continues for a certain number of iterations. While blocks B1 and B2 are being processed in the decoder, two new blocks B3 and B4 are buffered in.

One level of pipelining is used across the blocks: separate hardware for SISO1 and SISO2 allows two blocks of data to be processed at a time. SISO1 receives system 1 and parity 1, and SISO2 receives system 2 and parity 2. Both SISOs receive the blocks in forward order and in reverse order, so that each can produce two outputs for the other decoder: a forward output and a backward output. If 10 iterations are required for the operation of the decoder, then 20 iterations are needed to decode two blocks of data.

Both SISOs take turns processing the sequence and exchange soft information until it converges to within a range. The hard decision is then made from the final soft information sequence. The block diagram of the decoder is shown in Fig. 2.

Figure 2: Decoder block diagram

The input buffer receives the system bits and parity bits. The control unit generates control signals for the operation of the SISOs, the input buffer and the hard decision units. The interleaver permutes the data. In order to reduce the interleaver area, the interleaver coefficients are stored in ROM. The deinterleavers have the same circuit as the interleaver. The hard decision is taken such that if the soft output after a particular iteration is negative, the hard bit is '1'. The internal diagram of the SISO is shown in Figure 3. The SISO operates using the Constant log BCJR algorithm. First the branch metric is calculated. It is either initialized to zero or to the a-priori probability [1]. To find $L(d_k)$, the forward recursion metric is found by traversing the trellis in the forward direction, and the backward recursion metric is found by traversing the trellis from the last data of the block.

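$$L(d_k) = \max^{*}_{(S_k, S_{k-1}, 1)} \left( \bar{\gamma}_1(S_{k-1}, S_k) + \bar{\alpha}_{k-1}(S_{k-1}) + \bar{\beta}_k(S_k) \right) - \max^{*}_{(S_k, S_{k-1}, 0)} \left( \bar{\gamma}_0(S_{k-1}, S_k) + \bar{\alpha}_{k-1}(S_{k-1}) + \bar{\beta}_k(S_k) \right) \tag{3}$$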

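The forward metric can be found using the following equation for k from 0 to N, where k is the time index and N is the block length.

$$\bar{\alpha}_k(S_k) = \max^{*}_{(S_{k-1}, i)} \left( \bar{\gamma}_i(S_{k-1}, S_k) + \bar{\alpha}_{k-1}(S_{k-1}) \right) \tag{4}$$

The reverse metric $\bar{\beta}$ can be found for k from N down to 0.

$$\bar{\beta}_k(S_k) = \max^{*}_{(S_{k+1}, i)} \left( \bar{\gamma}_i(S_k, S_{k+1}) + \bar{\beta}_{k+1}(S_{k+1}) \right) \tag{5}$$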
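The recursions in equations (3) to (5) can be sketched in Python as follows. This is a minimal software model under stated assumptions, not the hardware architecture of the paper: the branch metrics are taken as given inputs, the trellis is passed as a dictionary keyed by the hypothetical tuple (previous state, input bit, next state), the encoder is assumed to start in state 0, and the trellis is assumed to be unterminated. The exact Jacobian logarithm is used for max* by default, and any approximation can be passed in through the `max_star` argument. List index k in the code holds the branches between trellis times k and k+1.

```python
import math
from typing import Callable, Dict, List, Tuple

NEG_INF = float("-inf")
Branch = Tuple[int, int, int]  # (previous state, input bit, next state)


def jacobian(x: float, y: float) -> float:
    """Exact max*: ln(e^x + e^y), with -inf treated as 'no path yet'."""
    if x == NEG_INF:
        return y
    if y == NEG_INF:
        return x
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))


def bcjr_llr(gamma: List[Dict[Branch, float]], n_states: int,
             max_star: Callable[[float, float], float] = jacobian) -> List[float]:
    """Compute L(d_k) for every k from log-domain branch metrics."""
    n = len(gamma)
    alpha = [[NEG_INF] * n_states for _ in range(n + 1)]
    beta = [[NEG_INF] * n_states for _ in range(n + 1)]
    alpha[0][0] = 0.0             # assumption: encoder starts in state 0
    beta[n] = [0.0] * n_states    # assumption: final state unknown

    for k in range(n):            # forward recursion, equation (4)
        for (sp, _, sn), g in gamma[k].items():
            alpha[k + 1][sn] = max_star(alpha[k + 1][sn], alpha[k][sp] + g)

    for k in range(n - 1, -1, -1):  # backward recursion, equation (5)
        for (sp, _, sn), g in gamma[k].items():
            beta[k][sp] = max_star(beta[k][sp], g + beta[k + 1][sn])

    llrs = []
    for k in range(n):            # soft output, equation (3)
        num = den = NEG_INF
        for (sp, i, sn), g in gamma[k].items():
            metric = alpha[k][sp] + g + beta[k + 1][sn]
            if i == 1:
                num = max_star(num, metric)
            else:
                den = max_star(den, metric)
        llrs.append(num - den)
    return llrs


# Degenerate one-state trellis: each LLR equals the metric difference.
toy = [{(0, 0, 0): 0.0, (0, 1, 0): 1.5}, {(0, 0, 0): 0.0, (0, 1, 0): -2.0}]
print(bcjr_llr(toy, n_states=1))
```

In the toy trellis every step has one branch per bit value, so the forward and backward metrics cancel and the soft output reduces to the difference of the two branch metrics.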

The branch metric from one state to another can be found using the following equation.


$$\bar{\gamma}_i(S_{k-1}, S_k) = K + \ln P(S_k \mid S_{k-1}) + \frac{2}{N_0} \left( y_k^{s} x_k^{s}(i) + y_k^{p} x_k^{p} \right) \tag{6}$$

The constant-log-MAP algorithm approximates the correction term of the Jacobian logarithm by a constant, and max* is calculated as

$$\max^{*}(x, y) = \max(x, y) + \begin{cases} 0 & \text{if } |y - x| > T \\ C & \text{if } |y - x| \le T \end{cases} \tag{7}$$
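Equation (7) can be compared with the exact Jacobian logarithm in a short Python sketch. The paper does not give the values of C and T; the values C = 0.5 and T = 1.5 used here are a common choice in the literature and should be read as an assumption, as should the function names.

```python
import math

C = 0.5  # assumed correction constant (not stated in the paper)
T = 1.5  # assumed threshold (not stated in the paper)


def max_star_exact(x: float, y: float) -> float:
    """Jacobian logarithm: ln(e^x + e^y) = max(x, y) + ln(1 + e^-|y - x|)."""
    return max(x, y) + math.log1p(math.exp(-abs(y - x)))


def max_star_const(x: float, y: float, c: float = C, t: float = T) -> float:
    """Equation (7): the correction term is replaced by 0 or the constant c."""
    return max(x, y) + (c if abs(y - x) <= t else 0.0)


# Approximation error for a few input pairs.
for x, y in [(0.0, 0.0), (2.0, 2.5), (0.0, 5.0)]:
    exact = max_star_exact(x, y)
    approx = max_star_const(x, y)
    print(x, y, round(exact, 4), approx, round(approx - exact, 4))
```

The approximation needs only one comparison against T and one conditional addition, which is why it maps onto a compare-and-select structure with very little storage.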

Figure 3: SISO architecture

This SISO architecture (Fig. 3) produces two outputs, sof and sob. The sof output is the soft output for half of the block of data in forward order, and sob is the soft output for the other half of the data in reverse order. The block receives the soft inputs sif and sib from the decoder.

Mf and Mb units: These calculate the branch metrics.

FCAL/BCAL UNIT:

This unit consists of registers and four parallel metric calculation units. A single metric calculation unit is shown in Fig. 4. The ACS (Add Compare Select) unit consists of two adders and a compare-and-select unit, which contains the max* unit. The adders calculate the sum of the forward/reverse state metric and the branch metric for the 0-transition branch and the 1-transition branch.

Figure 4: Single F/BCAL unit

The max* unit for the forward and backward metric calculators is shown in Fig. 5 [1]. Registers R1 and R2 are loaded with the values from the adders. The max* unit takes its inputs from R1 and R2 and writes the result to R3. The four forward and four backward state metrics are stored in registers for the next calculation of the metrics.

Figure 5: Max* unit

Max* is calculated as follows.

OPERATION 1

O = 10110. In this clock cycle the max* calculation begins: the inputs are given to x~ and y~. The result C0 determines max(x~, y~).

OPERATION 2

O = 11001. x~ is loaded with the value T and y~ is loaded with the result of the first operation. C1 gives the outcome of the test |x~ - y~| > T.

OPERATION 3

O = 00000. x~ is provided with max(x~, y~) and y~ is loaded with the constant T. If C1 = 0, C is added to max(x~, y~). If C1 = 1, 0 is added to max(x~, y~).

ADD unit:

This unit takes three inputs: the branch metric M and the state metrics F and B. The circuit consists of two adders for adding the three inputs. These units produce M+F+B for the 0 and 1 state transitions.

COMPARE AND SELECT

The compare and select unit compares the outputs from the adders. This unit uses six max* units for the calculation of the temporary soft outputs.

SUM/COMP:

It compares the temporary outputs and adds them to the soft inputs.

IV. RESULTS AND DISCUSSION

The proposed architecture is simulated using VHDL and synthesized for a Spartan-3 device, giving the following results. The simulation result of the proposed architecture is shown in Fig. 6.


Figure 6: Simulation results of the constant log BCJR decoder

The Constant log BCJR algorithm and the LUT log BCJR algorithm [1] are both designed for the same architecture. The comparison based on this architecture is shown in Table 1.

| Parameter | LUT log BCJR | Constant log BCJR |
|---|---|---|
| No. of 4-input LUTs | 23235 | 21708 |
| Logic distribution: | | |
| No. of occupied slices | 12,640 | 11,547 |
| No. of slices containing only related logic | 12,640 | 11,547 |
| No. of slices containing unrelated logic | 0 | 0 |
| No. of 4-input LUTs used as logic | 23235 | 21708 |
| Total equivalent gate count for design | 193329 | 179672 |
| Power consumption | 140 mW | 88 mW |

Table 1: Comparison on the basis of the synthesis report

From the synthesis report it is found that the total gate count is reduced by 7.06%. From the power analysis of both architectures, the power is found to be reduced by 38.08%. The design simulated in MATLAB shows an average BER of $3.858 \times 10^{-4}$ at 10 iterations.
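The percentage reductions can be recomputed from the entries of Table 1 with a few lines of Python. The helper name `percent_reduction` is illustrative. With the table values as printed, the gate count reduction reproduces the quoted 7.06%, while the listed power figures of 140 mW and 88 mW give about 37.1%, slightly below the roughly 38% quoted in the text, possibly because the table entries are rounded.

```python
def percent_reduction(baseline: float, proposed: float) -> float:
    """Relative saving of the proposed design with respect to the baseline, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Values copied from Table 1 (LUT log BCJR first, Constant log BCJR second).
gate = percent_reduction(193329, 179672)
luts = percent_reduction(23235, 21708)
slices = percent_reduction(12640, 11547)
power = percent_reduction(140, 88)

for name, value in [("gate count", gate), ("4-input LUTs", luts),
                    ("occupied slices", slices), ("power", power)]:
    print(f"{name}: {value:.2f} % reduction")
```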

V. CONCLUSION

The study of turbo codes starts from error correction codes in information theory. Turbo codes are derived from convolutional codes: they are the parallel concatenation of recursive systematic convolutional codes. Turbo codes are considered to have high BER performance. Turbo encoding is easy but decoding is difficult. There are many algorithms for decoding; of these, the constant log BCJR algorithm is used for our decoder architecture. A low-complexity turbo decoder with the constant log BCJR algorithm is designed, in which two blocks of data are processed at the same time. Although the hardware requirement is high, the advantage is that the throughput is higher because two blocks of data are processed at a time. When the proposed architecture is compared with the LUT log architecture on the same design, the area is found to be reduced by 7.06% and the power consumption by 38.08%. The proposed architecture also has better BER performance.

In future, a decoder architecture can be designed which processes several blocks of input in parallel, as well as a decoder which decodes data of various block lengths.

REFERENCES

[1] L. Li, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, “A low-complexity turbo decoder architecture for energy-efficient wireless sensor networks,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 1, pp. 14–22, Jan. 2013.

[2] L. Li, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, “An energy-efficient error correction scheme for IEEE 802.15.4 wireless sensor networks,” IEEE Trans. Circuits Syst. II, vol. 57, no. 3, pp. 233–237, 2010.

[3] S. Papaharalabos, P. T. Mathiopoulos, G. Masera, and M. Martina, “On optimal and near-optimal turbo decoding using generalized max* operator,” IEEE Commun. Lett., 2009.

[4] Z. Wang, “High-speed recursion architectures for MAP-based turbo decoders,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 4, pp. 470–474, Apr. 2007.

[5] Z. He, P. Fortier, and S. Roy, “Highly-parallel decoding architectures for convolutional turbo codes,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 10, pp. 1063–8210, Oct. 2006.

[6] C. M. Wu, M. D. Shieh, C. H. Wu, Y. T. Hwang, and J. H. Chen, “VLSI architectural design tradeoffs for sliding-window log-MAP decoders,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 4, pp. 439–447, Apr. 2005.

[7] J. Kaza and C. Chakrabarti, “Design and implementation of low-energy turbo decoders,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 9, Sep. 2004.

[8] G. Masera, M. Mazza, G. Piccinini, F. Viglione, and M. Zamboni, “Architectural strategies for low-power VLSI turbo decoders,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 3, Jun. 2002.

[9] C. Schurgers, F. Catthoor, and M. Engels, “Memory optimization of MAP turbo decoder algorithms,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 2, pp. 305–312, Feb. 2001.

[10] M. C. Valenti and J. Sun, “The UMTS turbo code and an efficient decoder implementation suitable for software-defined radios,” Int. J. Wirel. Inform. Netw., vol. 8, no. 4, pp. 203–215, 2001.

[11] P. Robertson, E. Villebrun, and P. Hoeher, “A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain,” in Proc. IEEE Int. Conf. Commun., pp. 1009–1013, 1995.

[12] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error correcting coding and decoding: Turbo codes,” in Proc. IEEE Int. Conf. Commun., 1993, pp. 1064–1070.

[13] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Information Theory, vol. 13, no. 4, pp. 284–288, Mar. 1974.