
1020 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 18, NO. 6, JUNE 2010

A Novel Architecture for Block Interleaving Algorithm in MB-OFDM Using Mixed Radix System

Youngsun Han, Peter Harliman, Seon Wook Kim,

Jong-Kook Kim, and Chulwoo Kim

Abstract: In this paper, we present a novel architecture of a block interleaver in MB-OFDM systems based on the Mixed Radix System (MRS). We prove mathematically that the proposed architecture can support bit permutations in the interleaving process. The hierarchical property of our proposed MRS-based design methodology allows the proposed architecture to support all the required data rates in MB-OFDM systems with a simple modular design. Furthermore, the same design used for the interleaver can also be used for de-interleaving, which reduces the implementation complexity significantly. The latency of our architecture is as low as six MB-OFDM symbols. In addition, compared with the conventional approach, our architecture reduces the implementation complexity by 85.5%, 69.4%, and 40.3% for the 80, 200, and 480 Mb/s data rates, respectively, while improving the maximum operating clock frequency by more than 3.3 times over the conventional design. We also show that the power consumption is reduced by 87.4%, 73.6%, and 39.8% for 80, 200, and 480 Mb/s, respectively.

Index Terms: Array processor, block interleaving, MB-OFDM, Mixed Radix System (MRS).

I. INTRODUCTION

MB-OFDM [1] has been widely used as one of the de facto standards for Ultra Wide Band (UWB) communication. MB-OFDM supports various data rates with low power consumption, as shown in Table I. Due to these performance requirements, implementing an MB-OFDM processor is difficult and challenging for developers.

An interleaver reorders the input bit sequence in a non-contiguous way in order to improve robustness against burst errors in transmission [2]-[6]. The interleaver in MB-OFDM consists of three sequential sub-processes: symbol interleaving, tone interleaving, and cyclic shift. The mathematical equations for the sub-processes are as follows [7]:

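$$a_S[i] = a\left[\left\lfloor \frac{i}{N_{CBPS}} \right\rfloor + \frac{6}{N_{TDS}} \times \mathrm{mod}(i, N_{CBPS})\right] \tag{1}$$

$$a_T[i] = a_S\left[\left\lfloor \frac{i}{N_{Tint}} \right\rfloor + 10 \times \mathrm{mod}(i, N_{Tint})\right] \tag{2}$$

$$b[i] = a_T\left[m(i) \times N_{CBPS} + \mathrm{mod}(i + m(i) \times N_{cyc}, N_{CBPS})\right] \tag{3}$$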

where $i$ is an index for bit sequences with a range of $0 \le i$


1-bit memory cells and four 1-bit multiplexers. The hierarchical property of our MRS-based design methodology allows the proposed architecture to support all the required data rates in MB-OFDM systems when we apply a simple modular design technique. Furthermore, the same design used for the interleaver can also be used for de-interleaving, which reduces the implementation complexity significantly. The performance analysis shows that our design achieves better performance than the conventional designs [8] in terms of latency, hardware complexity, power consumption, and maximum operating clock frequency.

This paper is organized as follows. In Section II, we derive a mathematical relationship between MRS and the interleaving processes, and describe the architecture of our MRS-based interleaver. The implementation details of the architecture are given in Section III, and the performance is analyzed in Section IV. Finally, Section V concludes the paper.

II. INTERLEAVING/DE-INTERLEAVING PROCESSOR FOR MB-OFDM

A. Interleaving/De-Interleaving With MRS

The $i$th element of the DNS sequence $\phi$ can be represented in the $MRS(p_1)$ matrix $\Phi$ as follows:

$$\phi[i] = \Phi\left[\lfloor i/p_1 \rfloor,\ \mathrm{mod}(i, p_1)\right] \tag{7}$$

where $i$ in DNS is represented as $\langle \lfloor i/p_1 \rfloor \mid \mathrm{mod}(i, p_1) \rangle$ in $MRS(p_1)$. Since $i = \lfloor i/p_1 \rfloor \times p_1 + \mathrm{mod}(i, p_1)$, the matrix $\Phi$ in $MRS(p_1)$ can be derived as follows:

$$\Phi[k, j] = \phi[k \times p_1 + j] \tag{8}$$

where $0 \le j < p_1$, $0 \le k < M/p_1$, and $M$ is the length of the DNS sequence $\phi$.

Assume that the matrix $\Phi$ in $MRS(p_1)$ is transposed into a new matrix $\Phi'$ in $MRS(p'_1)$, where $p'_1 = M/p_1$, before being transformed back to its DNS representation:

$$\Phi'[j, k] = \Phi[k, j]. \tag{9}$$

From (7)-(9), a new form of the DNS sequence, $\phi'[i]$, can be generated in $MRS(p'_1)$ as follows:

$$\phi'[i] = \Phi'\left[\lfloor i/p'_1 \rfloor,\ \mathrm{mod}(i, p'_1)\right] \tag{10}$$

where $\Phi'[j, k]$ is the digit-reversal MRS in the new radix $p'_1$.

Finally, from (8)-(10), the new DNS sequence is related to the original DNS sequence as follows:

$$\phi'[i] = \Phi\left[\mathrm{mod}(i, p'_1),\ \lfloor i/p'_1 \rfloor\right] = \phi\left[\lfloor i/p'_1 \rfloor + p_1 \times \mathrm{mod}(i, p'_1)\right]. \tag{11}$$

It can be seen that the derived (11) has the same form as (1) and (2), but differs from (3). This common form enables us to implement the interleaving/de-interleaving permutations by transposing the $p'_1 \times p_1$ matrix in MRS with the two moduli $p_1$ and $p'_1$.
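The equivalence claimed by (11) can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the sequence is modeled as a plain list, and the example sizes (12 elements, modulus 3) are chosen for readability rather than taken from the paper.

```python
def interleave_by_formula(seq, p1):
    """Index formula of (11): out[i] = seq[floor(i / p1') + p1 * mod(i, p1')],
    where p1' = M / p1 and M is the sequence length."""
    m = len(seq)
    if m % p1 != 0:
        raise ValueError("sequence length must be a multiple of p1")
    p1_prime = m // p1
    return [seq[i // p1_prime + p1 * (i % p1_prime)] for i in range(m)]


def interleave_by_transpose(seq, p1):
    """Write seq row by row into an (M/p1) x p1 matrix as in (8), transpose it
    as in (9), and read the transposed matrix row by row as in (10)."""
    m = len(seq)
    if m % p1 != 0:
        raise ValueError("sequence length must be a multiple of p1")
    rows = [seq[k * p1:(k + 1) * p1] for k in range(m // p1)]
    transposed = [list(column) for column in zip(*rows)]
    return [item for row in transposed for item in row]


example = list(range(12))
print(interleave_by_formula(example, 3))
# [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
print(interleave_by_transpose(example, 3) == interleave_by_formula(example, 3))
# True
```

Both routes give the same permutation, which is the property the array processor relies on: a matrix transposition replaces explicit index arithmetic.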

In the following two sub-sections, we show how the MRS permutations are mapped into our interleaver architecture for MB-OFDM.

B. Architecture for MRS Modulo Permutation

Fig. 1 shows an array processor consisting of $M$ cells, which interleaves $M$ bits through a modular operation with modulus $p_1$. The array processor consists of a 2-D array of size $(M/p_1) \times p_1$, and each cell is connected with four adjacent cells: inputs from the lower and right adjacent cells, and outputs to the upper and left adjacent cells. For example, cell (1,1) receives its input from cells (1,2) and (2,1), and sends its output to cells (1,0) and (0,1). With this structure, the processor is able to transfer the incoming bits in both the horizontal (right to left) and vertical (lower to upper) directions.

Fig. 1. Interleaving/de-interleaving processor in $MRS(p_1)$.

In the case of interleaving with $MRS(p_1)$, the processor transfers the incoming data horizontally, along the solid line, from $IN_{encode}$ toward the left until it reaches the final cell (0,0) at the upper-left corner. Each individual bit at the $X$th position in the input stream is placed on a unique cell $(a_1, a_0)$ in the processor, where $X = a_1 \times p_1 + a_0$, as shown in (8). After the first bit of the input stream arrives at the final cell (0,0), the flow direction is changed from horizontal to vertical, so that bit positions in $MRS(p_1)$ are transposed into new positions in $MRS(M/p_1)$ as shown in (9). Finally, the processor produces the interleaved output from $OUT_{encode}$ along the dotted line. It transforms bit positions in $MRS(M/p_1)$ into interleaved bit positions in DNS as shown in (10).

The de-interleaving process can be performed in a similar way using the same architecture, but the process starts from $IN_{decode}$ vertically along the dotted line, and the de-interleaved bits are output horizontally along the solid line.
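The data flow described above can be modeled behaviorally in a few lines of Python. This is a minimal sketch, not the hardware design: the function name and the dictionary-of-cells representation are assumptions made for illustration, and the model only reproduces which cell each bit occupies and the order in which bits leave through cell (0,0).

```python
def shift_through_array(bits, rows, cols, deinterleave=False):
    """Load a block along one chain of cells and unload it along the other.

    rows corresponds to M / p1 and cols to p1 in the text.
    """
    if len(bits) != rows * cols:
        raise ValueError("expected exactly rows * cols bits")

    # Exit order through cell (0,0) when shifting horizontally (solid line):
    # row by row, each row from left to right.
    horizontal = [(r, c) for r in range(rows) for c in range(cols)]
    # Exit order when shifting vertically (dotted line):
    # column by column, each column from top to bottom.
    vertical = [(r, c) for c in range(cols) for r in range(rows)]
    load, unload = (vertical, horizontal) if deinterleave else (horizontal, vertical)

    # One bit enters per clock at the tail of the chain; after len(bits)
    # clocks the k-th input bit sits in the k-th cell of the loading chain.
    register = [None] * len(bits)
    for bit in bits:
        register = register[1:] + [bit]
    cells = dict(zip(load, register))

    # Changing the flow direction reads the same cells in the other order.
    return [cells[position] for position in unload]


block = list(range(12))
interleaved = shift_through_array(block, rows=4, cols=3)
print(interleaved)
# [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
print(shift_through_array(interleaved, rows=4, cols=3, deinterleave=True) == block)
# True
```

Swapping the loading and unloading chains is all that distinguishes de-interleaving from interleaving in this model, which mirrors why one array can serve both operations.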

C. Architecture for MB-OFDM

As mentioned in Section I, the interleaving process in the MB-OFDM system consists of three consecutive permutations, (1)-(3). As shown in Fig. 2(a), the first two consecutive modulo permutations, (1) and (2), are expressed as a series of the following three sub-processes: symbol interleaving in 1-Radix $MRS(p_1)$ for an $M$-cell block, division of the output from the symbol interleaving into $p_1$ sub-blocks of $(M/p_1)$ cells each, and tone interleaving in 1-Radix $MRS(p_2)$ for each sub-block. Finally, the last permutation, (3), is expressed as a process that cyclically shifts the bit sequence in each sub-block, interleaved by the previous two permutations, by $N_{cyc}$ in Table I.
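As a software reference for the three permutations (1)-(3), the following Python sketch applies them one after another. Several points are assumptions rather than statements from the paper: the function names are hypothetical, (2) is applied independently to each sub-block of $N_{CBPS}$ bits, $m(i)$ is taken to be $\lfloor i/N_{CBPS} \rfloor$, and the parameter values in the example are small toy numbers chosen for readability, not the values of Table I.

```python
def symbol_interleave(a, n_cbps, n_tds):
    """Permutation (1) over a block of (6 / n_tds) * n_cbps bits."""
    spread = 6 // n_tds
    if len(a) != spread * n_cbps:
        raise ValueError("block length must equal (6 / n_tds) * n_cbps")
    return [a[i // n_cbps + spread * (i % n_cbps)] for i in range(len(a))]


def tone_interleave(a_s, n_cbps, n_tint):
    """Permutation (2), applied to each n_cbps-bit sub-block (assumed reading)."""
    if n_cbps != 10 * n_tint:
        raise ValueError("n_cbps must equal 10 * n_tint")
    out = []
    for start in range(0, len(a_s), n_cbps):
        sub = a_s[start:start + n_cbps]
        out.extend(sub[i // n_tint + 10 * (i % n_tint)] for i in range(n_cbps))
    return out


def cyclic_shift(a_t, n_cbps, n_cyc):
    """Permutation (3), with m(i) = floor(i / n_cbps) (assumed definition)."""
    out = []
    for i in range(len(a_t)):
        m = i // n_cbps
        out.append(a_t[m * n_cbps + (i + m * n_cyc) % n_cbps])
    return out


def interleave(a, n_cbps, n_tds, n_tint, n_cyc):
    """Chain the three sub-processes in the order given in the text."""
    a_s = symbol_interleave(a, n_cbps, n_tds)
    a_t = tone_interleave(a_s, n_cbps, n_tint)
    return cyclic_shift(a_t, n_cbps, n_cyc)


# Toy parameters (hypothetical): 3 sub-blocks of 20 bits each.
block = list(range(60))
b = interleave(block, n_cbps=20, n_tds=2, n_tint=2, n_cyc=3)
print(sorted(b) == block)
# True
```

The final check only confirms that the chain is a permutation of the input block; it does not validate the toy parameters against any standard.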

Fig. 2(b) shows our architecture in 2-Radix $MRS(p_2 \mid p_1)$ for the MB-OFDM system, which combines all three sub-processes for the modulo permutations of (1) and (2) into a single process. It divides the whole block in DNS into $p_1$ sub-blocks in $MRS(p_1)$, differently colored, and divides each sub-block in $MRS(p_1)$ into $p_2$ sub-sub-blocks in $MRS(p_2 \mid p_1)$ at a time. Each sub-sub-block is represented as a vertical line in Fig. 2(b). All the operations are supported by only alternately arranging the sub-sub-blocks with different colors and extending some wire connections. For example, the $X$th bit position in DNS will be transformed into $\langle a_2 \mid a_1 \mid a_0 \rangle$ in $MRS(p_2 \mid p_1)$ on the 2-D array, where $X = (a_2 \times p_2 + a_1) \times p_1 + a_0$. In Fig. 2(b), $\langle a_2 \mid a_1 \mid a_0 \rangle$ is represented as cell $(a_2, a_1)$ with the $a_0$ color.

Fig. 2. Interleaver architecture for MB-OFDM. (a) General representation for interleaving processes. (b) Our interleaving processor in $MRS(p_2 \mid p_1)$.
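The mapping from a bit position $X$ to its mixed-radix digits can be illustrated with a short Python sketch. The function names are hypothetical, and the example moduli (10 and 3) are used only as a plausible illustration.

```python
def to_mrs(x, p2, p1):
    """Split position x into digits (a2, a1, a0) with x = (a2 * p2 + a1) * p1 + a0."""
    quotient, a0 = divmod(x, p1)   # a0 selects the color (sub-block)
    a2, a1 = divmod(quotient, p2)  # (a2, a1) selects the cell on the 2-D array
    return a2, a1, a0


def from_mrs(a2, a1, a0, p2, p1):
    """Inverse mapping back to the position in DNS."""
    return (a2 * p2 + a1) * p1 + a0


print(to_mrs(47, p2=10, p1=3))
# (1, 5, 2)
print(from_mrs(1, 5, 2, p2=10, p1=3))
# 47
```

In this example, position 47 lands on cell (1, 5) of the sub-block with color 2.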

The processor performs interleaving/de-interleaving in a similar way to the 1-Radix MRS processor in Fig. 1. In the interleaving process, the input bits from $IN_{encode}$ are moved along the solid line until they reach the final cell (0,0) with white color. After the first bit arrives at the final cell, the processor produces the interleaved output along the dotted line through $OUT_{encode}$. At this time, all cells with the same color are processed to the end before the output is taken from another color. $A_i$ is used to connect the final cell of one color with the first cell of another color. Similarly, the de-interleaving process starts from $IN_{decode}$ along the dotted line, and the de-interleaved bits are produced through $OUT_{decode}$ along the solid line.

Finally, in order to perform the cyclic shift shown in (3), we modified some wire connections and added some multiplexers on the processor. Fig. 3 shows the cyclic shift among three bit sequences, which are classified by the result of a modulo operation $(X \bmod p_1)$, where $X$ is the position of each bit in the original input bit sequence. By taking the bit sequences from the $(N_{cyc} \times 0)$th, $(N_{cyc} \times 1)$th, and $(N_{cyc} \times 2)$th positions, respectively, each of the bit sequences is cyclically shifted. The start point (s) of the second bit sequence is connected to the end point (e) of the first bit sequence, and the third start point is connected to the second end point. These connections complete the cyclic shift.

Fig. 3. Design of the cyclic shift.

TABLE II PARAMETERS OF BLOCK INTERLEAVING FOR MB-OFDM
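Reading the $k$th bit sequence starting from its $(N_{cyc} \times k)$th position amounts to rotating that sequence to the left. The Python sketch below illustrates this view; the function names, the sub-block length of 10, and the shift value of 3 are hypothetical and chosen only to keep the output readable.

```python
def rotate_left(sequence, offset):
    """Start reading at position offset and wrap around to the beginning."""
    offset %= len(sequence)
    return sequence[offset:] + sequence[:offset]


def cyclic_shift_sequences(sequences, n_cyc):
    """Rotate the k-th sequence left by k * n_cyc positions."""
    return [rotate_left(seq, k * n_cyc) for k, seq in enumerate(sequences)]


sequences = [list(range(0, 10)), list(range(10, 20)), list(range(20, 30))]
for shifted in cyclic_shift_sequences(sequences, n_cyc=3):
    print(shifted)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# [13, 14, 15, 16, 17, 18, 19, 10, 11, 12]
# [26, 27, 28, 29, 20, 21, 22, 23, 24, 25]
```

The first sequence is left unchanged, the second starts at its third position, and the third starts at its sixth position, matching the starting points described for Fig. 3 under these toy values.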

III. HARDWARE IMPLEMENTATION

Table II shows the parameter values of the block interleaving for the MB-OFDM system. The first column represents the data rates supported by the system. The second column represents the block size for each data rate, which determines the number of cells in the array processor. The other columns show the first ($p_1$) and second ($p_2$) moduli, which are used for the two consecutive modular operations in the symbol and tone interleaving processes.

A. Modular Design for Various Data Rates

For consecutive symbol and tone interleaving with moduli $p_1$ and $p_2$, we employed the architecture in Fig. 2(b). However, in order to provide a modular design for easy implementation, we do not directly use this architecture in our real implementation.

Fig. 4 shows the schematic diagram of the real implementation that supports all the data rates shown in Table II. The hardware consists of three parts, $(A)$, $(B)$, and $(C)$, and each part consists of several cell processors, multiplexers, and wire connections. Each successive addition of a part corresponds to a block size of 300, 600, and 1200 bits, respectively. The implementation is performed by duplicating part $(A)$ into $(B)$ and part $(A + B)$ into $(C)$. The multiplexers located between $(A)$ and $(B)$ are used to configure either part $(A)$ to be executed alone for the data rates up to 80 Mb/s, or the combined part $(A + B)$ to be executed for the other data rates up to 200 Mb/s. Instead of using $MRS(10 \mid 6)$, we combined two array processors, $(A + B)$ and $(C)$, with 600 cells each in $MRS(10 \mid 3)$. Additionally, a controller is added to enable the two array processors to operate alternately every 3 bits, according to the first modulus ($p_1$), while receiving input bits at the data rates above 200 Mb/s. It then produces the interleaved bit sequence by concatenating the output streams from the two processors. Through this controller, our implementation provides the same functionality as the array processor in $MRS(10 \mid 6)$. Finally, in order to support the cyclic shift shown in Fig. 3, the multiplexers in solid-line circles are added, and some wire connections are changed.

Fig. 4. Schematic diagram of an array processor to support 300, 600, and 1200 bit block sizes for all data rates in Table II.

Fig. 5. Schematic of each cell in Fig. 4.
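The claim that two $MRS(10 \mid 3)$ processors fed alternately every 3 bits behave like a single $MRS(10 \mid 6)$ processor can be checked with a small Python model. This is a sketch under assumptions: the function names are hypothetical, only the symbol and tone permutations (1) and (2) are modeled, and the cyclic shift handled by the extra multiplexers is left out.

```python
def mrs_interleave(bits, p1, p2):
    """Symbol interleaving with modulus p1, then tone interleaving with p2."""
    m = len(bits)
    if m % (p1 * p2) != 0:
        raise ValueError("block length must be a multiple of p1 * p2")
    sub_len = m // p1      # bits per sub-block
    tint = sub_len // p2   # tone interleaving depth within a sub-block
    a_s = [bits[i // sub_len + p1 * (i % sub_len)] for i in range(m)]
    out = []
    for start in range(0, m, sub_len):
        sub = a_s[start:start + sub_len]
        out.extend(sub[i // tint + p2 * (i % tint)] for i in range(sub_len))
    return out


def split_interleave(bits, group=3):
    """Feed two MRS(10|3) processors alternately every `group` bits and
    concatenate their output streams."""
    first = [b for x, b in enumerate(bits) if (x // group) % 2 == 0]
    second = [b for x, b in enumerate(bits) if (x // group) % 2 == 1]
    return mrs_interleave(first, 3, 10) + mrs_interleave(second, 3, 10)


block = list(range(1200))
print(split_interleave(block) == mrs_interleave(block, 6, 10))
# True
```

The result follows because bits whose position modulo 6 is 0, 1, or 2 form the first three sub-blocks and the remaining bits form the last three, so each 600-cell processor produces exactly half of the 1200-bit output.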

B. Cell Design for the Minimum Latency

Fig. 5 shows a schematic of each cell in the array processor. Each cell consists of two 1-bit memory cells (in our implementation, we used flip-flops) and four multiplexers. V_OUT and V_IN are used for the vertical movement of the bit stream, while H_OUT and H_IN are used for the horizontal movement. The roles of the two flip-flops alternate depending on the control signal SELECT.

Each cell processes the interleaved bit sequence through one flip-flop for the duration of six symbols and through the other flip-flop for the next six symbols. One flip-flop is used to store a new input bit while the other flip-flop is used for the output. With this approach, our architecture does not need any additional delay beyond six MB-OFDM symbols, which is the minimum required latency for the interleaving processes. As shown in (1)-(3), the interleaving algorithm is performed in units of six MB-OFDM symbols. Hence, in order to produce the interleaved bit stream, a latency of at least six MB-OFDM symbols is required.
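The alternating use of the two flip-flops can be pictured as a double-buffering scheme. The Python sketch below is a behavioral illustration only: the class name, the two index-addressed banks standing in for the two flip-flops of every cell, and the toy array size are all assumptions, and the multiplexer wiring of Fig. 5 is not modeled.

```python
class PingPongInterleaver:
    """Accept one bit per clock and emit the previous block, interleaved."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.size = rows * cols
        self.banks = [[None] * self.size, [None] * self.size]
        self.select = 0  # which bank currently stores incoming bits
        self.count = 0   # position within the current block

    def clock(self, bit):
        write_bank = self.banks[self.select]
        read_bank = self.banks[1 - self.select]
        i = self.count
        # Column-wise read of a bank that was filled row by row.
        out = read_bank[(i % self.rows) * self.cols + i // self.rows]
        write_bank[i] = bit
        self.count += 1
        if self.count == self.size:  # block boundary: swap the two roles
            self.count = 0
            self.select ^= 1
        return out


dut = PingPongInterleaver(rows=4, cols=3)
outputs = [dut.clock(bit) for bit in list(range(24)) + [None] * 12]
print(outputs[12:24])
# [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
```

In this model the first interleaved bit appears exactly one block after the first input bit, and input never stalls, which corresponds to the one-block (six-symbol) latency discussed above.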

TABLE III PERFORMANCE AND COMPLEXITY OF CONVENTIONAL AND PROPOSED ARCHITECTURES

IV. PERFORMANCE ANALYSIS

In order to study the performance advantage of the proposed architecture, we implemented it in Verilog HDL and synthesized it with Xilinx ISE, targeting a Xilinx Virtex-4 XC4VLX100-10FF1148 FPGA [17]. We used a conventional interleaver design [8] as a baseline architecture for comparison with our proposed architecture. The conventional architecture combines multiple permutations into pipelined processes in order to increase the throughput. It performs each permutation sequentially and uses embedded memory to keep the temporary results of each permutation process [9].

Table III shows the performance of the proposed architecture and the conventional architecture in terms of maximum operating clock frequency, hardware complexity, and latency. Since our architecture results in a different hardware implementation for each maximum data rate, we compare three of our hardware implementations, for the 80, 200, and 480 Mb/s data rates (200 and 480 Mb/s are the mandatory and maximum data rates of the MB-OFDM system, respectively), with the conventional architecture, which supports all of the data rates in Table I using the same hardware. The proposed architecture reduces the hardware complexity by 85.5% for 80 Mb/s, 69.4% for 200 Mb/s, and 40.3% for 480 Mb/s, while improving the maximum allowed clock frequency by more than 3.3 times. The maximum clock frequency of our architecture is about 500 MHz. In addition, our architecture incurs a latency of six MB-OFDM symbols for all three processes of the interleaver, while the conventional architecture requires a latency of eight MB-OFDM symbols. This latency difference is due to the fact that the conventional architecture executes all the sub-processes sequentially, while our architecture performs them at the same time.

Table IV shows a power consumption comparison between the conventional architecture and our proposed architecture. The operating clock frequency is set to 132 MHz as in [1]. Power consumption was estimated by using the Xilinx XPower tool. The inputs are assumed to toggle continuously in order to obtain the worst-case estimate of the toggle rate in the circuit.

The proposed architecture reduces the power consumption of the conventional design by 54.5% in clock power, 88.3% in logic power, and 88.1% in signal power for the 80 Mb/s data rate. For the 200 Mb/s data rate, the proposed architecture consumes only about 24.6% of the logic power of the conventional one. This is reasonable because the proposed architecture uses only 30.6% of the logic elements used in the conventional one. In total, the proposed architecture consumes only around 26.4% of the power consumed by the conventional one. Meanwhile, for the 480 Mb/s data rate, the proposed design consumes only 60.2% of the total power of the conventional one due to its 20.0% saving in logic power consumption.
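The text quotes some results as reductions and others as remaining fractions. The short Python sketch below, with hypothetical names, converts the reductions reported in this paper into remaining fractions so the two kinds of figures can be compared directly; it uses only numbers stated in the text.

```python
# Reductions (in percent) reported in the text, keyed by data rate in Mb/s.
reductions = {
    "complexity": {80: 85.5, 200: 69.4, 480: 40.3},
    "power": {80: 87.4, 200: 73.6, 480: 39.8},
}


def remaining_percent(reduction_percent):
    """Share of the conventional design that remains after a reduction."""
    return round(100.0 - reduction_percent, 1)


for metric, by_rate in reductions.items():
    for rate, reduction in by_rate.items():
        print(f"{metric:10s} {rate:3d} Mb/s: reduced by {reduction}%, "
              f"{remaining_percent(reduction)}% of the conventional design remains")
```

For example, the 69.4% complexity reduction at 200 Mb/s corresponds to the 30.6% of logic elements mentioned above, and the 73.6% and 39.8% power reductions correspond to the 26.4% and 60.2% total power figures.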


TABLE IV POWER CONSUMPTION OF CONVENTIONAL AND PROPOSED ARCHITECTURES

V. CONCLUSION

In this paper, a mathematical relationship between the interleaving processes and MRS was derived. Based on the derivation, we proposed an array processor architecture that supports the interleaving processes efficiently. The performance analysis demonstrates the benefits of our proposed architecture in terms of latency, complexity, and power consumption over the conventional approach. The latency of our architecture is six MB-OFDM symbols, which is the minimum. We also reduced the complexity by 85.5% for 80 Mb/s and 69.4% for 200 Mb/s, while improving the maximum allowed clock frequency by more than 3.3 times compared with the conventional approach. For 480 Mb/s, the complexity was reduced by 40.3% compared with the conventional one. In addition, we reduced the power consumption by 87.4%, 73.6%, and 39.8% for 80, 200, and 480 Mb/s, respectively.

REFERENCES

[1] A. Batra, J. Balakrishnan, G. R. Aiello, J. R. Foerster, and A. Dabak, "Design of a multiband OFDM system for realistic UWB channel environments," IEEE Trans. Microw. Theory Tech., vol. 52, no. 9, pp. 2123-2138, Sep. 2004.

[2] S. Lin and D. J. Costello, Error Control Coding: Fundamentals and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1983.

[3] J. L. Ramsey, "Realization of optimum interleavers," IEEE Trans. Inf. Theory, vol. IT-16, no. 3, pp. 338-344, May 1970.

[4] G. D. Forney, Jr., "Burst-correcting codes for the classic bursty channel," IEEE Trans. Commun. Technol., vol. COM-19, no. 5, pp. 772-780, Oct. 1971.

[5] K. Andrews, C. Heegard, and D. Kozen, "A theory of interleavers," presented at the IEEE Int. Symp. Inf. Theory, 1997, 97-1634.

[6] R. Garello, G. Montorsi, and G. C. Sergio Benedetto, "Interleaver properties and their applications to the trellis complexity analysis of turbo codes," IEEE Trans. Commun., vol. 49, no. 5, pp. 793-807, May 2001.

[7] WiMedia Alliance, "MAC-PHY Interface Specification 1.0," 2005. [Online]. Available: http://www.wimedia.org

[8] J. Kim, "Interleaver & Deinterleaver for MB-OFDM," Advanced System IC Technology Center (ASTEC), Jul. 2007. [Online]. Available: http://www.astec.re.kr:8080/ipSoC/ipInfo.jsp?ipno=576&left-image=4

[9] X. Jinsong, L. Xiaochun, W. Haitao, B. Yujing, Z. Decai, Z. Xiaolong, and W. Chaogang, "Implementation of MB-OFDM transmitter baseband based on FPGA," in Proc. Int. Conf. Circuits Syst. Commun., May 2008, pp. 50-54.

[10] E. Tell and D. Liu, "A hardware architecture for a multi mode block interleaver," in Proc. Int. Conf. Circuits Syst. Commun., Jun. 2004.

[11] D. F. Miller and W. S. McCormick, "An arithmetic free parallel mixed-radix conversion algorithm," IEEE Trans. Circuits Syst. II, vol. 45, no. 1, pp. 158-162, Jan. 1998.

[12] B. G. Jo and M. H. Sunwoo, "New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 5, pp. 911-919, May 2005.

[13] J. M. Camara, M. Moreto, E. Vallejo, R. Beivide, J. Miguel-Alonso, C. Martinez, and J. Navaridas, "Mixed-radix twisted torus interconnection networks," in Proc. Int. Par. Distr. Processing Symp., Mar. 2007, pp. 1-10.

[14] W. K. Jenkins and E. J. Altman, "Self-checking properties of residue number error checkers based on mixed radix conversion," IEEE Trans. Circuits Syst., vol. 35, no. 2, pp. 159-167, Feb. 1988.

[15] P. V. A. Mohan and A. B. Premkumar, "RNS-to-binary converters for two four-moduli sets $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$ and $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} + 1\}$," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 6, pp. 1245-1254, Jun. 2007.

[16] M. Akkal and P. Siy, "A new mixed radix conversion algorithm MRC-II," J. Syst. Arch., vol. 53, no. 9, pp. 577-586, Sep. 2007.

[17] Xilinx, San Jose, CA, "Virtex-4 Multi-Platform FPGA," 2006. [Online]. Available: http://www.xilinx.com/products
