    1020 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 18, NO. 6, JUNE 2010

    A Novel Architecture for Block Interleaving Algorithm in MB-OFDM Using Mixed Radix System

    Youngsun Han, Peter Harliman, Seon Wook Kim,

    Jong-Kook Kim, and Chulwoo Kim

    Abstract: In this paper, we present a novel architecture of a block interleaver in MB-OFDM systems based on the Mixed Radix System (MRS). We prove mathematically that the proposed architecture can support bit permutations in the interleaving process. The hierarchical property of our proposed MRS-based design methodology allows the proposed architecture to support all the required data rates in MB-OFDM systems with a simple modular design. Furthermore, the same design used for the interleaver can also be used for de-interleaving, which reduces the implementation complexity significantly. The latency of our architecture is as low as 6 MB-OFDM symbols. In addition, when comparing our proposed architecture with the conventional approach, we are able to reduce the implementation complexity by 85.5%, 69.4%, and 40.3% for 80, 200, and 480 Mb/s data rates, respectively, while improving the maximum operating clock frequency by more than 3.3 times over the conventional design. We also show that the power consumption is reduced by 87.4%, 73.6%, and 39.8% for 80, 200, and 480 Mb/s, respectively.

    Index Terms: Array processor, block interleaving, MB-OFDM, Mixed Radix System (MRS).

    I. INTRODUCTION

    MB-OFDM [1] has been widely used as one of the de facto standards for Ultra Wide Band (UWB) communication. MB-OFDM supports various data rates with low power consumption, as shown in Table I. Due to these performance requirements, implementing an MB-OFDM processor is difficult and challenging for developers.

    An interleaver reorders the input bit sequence in a non-contiguous way in order to improve robustness against burst errors in transmission [2]-[6]. The interleaver in MB-OFDM consists of three sequential sub-processes: symbol interleaving, tone interleaving, and cyclic shift. The mathematical equations for the sub-processes are as follows [7]:

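    $$a_S[i] = a\left[\left\lfloor \frac{i}{N_{CBPS}} \right\rfloor + \frac{6}{N_{TDS}} \times \mathrm{mod}(i, N_{CBPS})\right] \tag{1}$$

    $$a_T[i] = a_S\left[\left\lfloor \frac{i}{N_{Tint}} \right\rfloor + 10 \times \mathrm{mod}(i, N_{Tint})\right] \tag{2}$$

    $$b[i] = a_T\left[m(i) \times N_{CBPS} + \mathrm{mod}(i + m(i) \times N_{cyc}, N_{CBPS})\right] \tag{3}$$

    where $i$ is an index for bit sequences with a range of $0 \le i$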

    1-bit memory cells and four 1-bit multiplexers. The hierarchical property of our MRS-based design methodology allows the proposed architecture to support all the required data rates in MB-OFDM systems when we apply a simple modular design technique. Furthermore, the same design used for the interleaver can also be used for de-interleaving, which reduces the implementation complexity significantly. The performance analysis shows that our design achieves better performance than the conventional designs [8] in terms of latency, hardware complexity, power consumption, and maximum operating clock frequency.

    This paper is organized as follows. In Section II, we derive a mathematical relationship between MRS and the interleaving processes, and we describe the architecture of our MRS-based interleaver. The implementation details of the architecture are shown in Section III, and the performance is analyzed in Section IV. Finally, Section V concludes the paper.

    II. INTERLEAVING/DE-INTERLEAVING PROCESSOR FOR MB-OFDM

    A. Interleaving/De-Interleaving With MRS

    The $i$th element of the DNS sequence $\Psi$ can be represented in the MRS($p_1$) matrix as follows:

    $$\Psi[i] = \Psi\left[\lfloor i/p_1 \rfloor,\ \mathrm{mod}(i, p_1)\right] \tag{7}$$

    where $i$ in DNS is represented as $\langle \lfloor i/p_1 \rfloor \mid \mathrm{mod}(i, p_1) \rangle$ in MRS($p_1$). Since $i = \lfloor i/p_1 \rfloor \times p_1 + \mathrm{mod}(i, p_1)$, the matrix $\Psi$ in MRS($p_1$) can be derived as follows:

    $$\Psi[k, j] = \Psi[k \times p_1 + j] \tag{8}$$

    where $0 \le j < p_1$, $0 \le k < M/p_1$, and $M$ is the length of the DNS sequence $\Psi$.

    Assume that the matrix $\Psi$ in MRS($p_1$) is transposed into a new matrix $\Psi'$ in MRS($p_1'$), where $p_1' = M/p_1$, before being transformed back to its DNS representation:

    $$\Psi'[j, k] = \Psi[k, j]. \tag{9}$$

    From (7)-(9), a new form of the DNS sequence, $\Psi'[i]$, can be generated in MRS($p_1'$) as follows:

    $$\Psi'[i] = \Psi'\left[\lfloor i/p_1' \rfloor,\ \mathrm{mod}(i, p_1')\right] \tag{10}$$

    where $\Psi'[j, k]$ is the digit-reversal MRS in the new radix $p_1'$.

    Finally, from (8)-(10), the relationship between the new DNS sequence and the original one is:

    $$\Psi'[i] = \Psi\left[\mathrm{mod}(i, p_1'),\ \lfloor i/p_1' \rfloor\right] = \Psi\left[\lfloor i/p_1' \rfloor + p_1 \times \mathrm{mod}(i, p_1')\right]. \tag{11}$$

    It can be seen that (11) has the same form as (1) and (2), but differs from (3). This common form enables us to implement the interleaving/de-interleaving permutations by transposing the $p_1' \times p_1$ matrix in MRS with the two moduli $p_1$ and $p_1'$.

    In the following two sub-sections, we show how the MRS permutations are mapped into our interleaver architecture for MB-OFDM.
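As an illustrative check of (8)-(11), the following Python sketch builds the MRS matrix explicitly, transposes it, and compares the result with the closed-form index expression in (11). The function names and the example values M = 12 and p1 = 3 are assumptions chosen for illustration; they do not come from the paper.

```python
def mrs_transpose(seq, p1):
    """Interleave by explicit matrix transposition, following (8)-(10)."""
    m = len(seq)
    rows = m // p1  # this is p1' = M / p1
    # (8): matrix[k][j] = seq[k * p1 + j]
    matrix = [seq[k * p1:(k + 1) * p1] for k in range(rows)]
    # (9): transposed[j][k] = matrix[k][j]
    transposed = [[matrix[k][j] for k in range(rows)] for j in range(p1)]
    # (10): read the transposed matrix back as a flat sequence in radix p1'
    return [x for row in transposed for x in row]


def mrs_closed_form(seq, p1):
    """Interleave with the closed-form index expression of (11)."""
    m = len(seq)
    p1_prime = m // p1
    return [seq[i // p1_prime + p1 * (i % p1_prime)] for i in range(m)]


# Hypothetical example: a block of M = 12 positions with modulus p1 = 3.
example = list(range(12))
by_transpose = mrs_transpose(example, 3)
by_formula = mrs_closed_form(example, 3)
```

For this example both functions return `[0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]`, which is the row-major reading of the transposed 4 x 3 matrix.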

    B. Architecture for MRS Modulo Permutation

    Fig. 1 shows an array processor consisting of $M$ cells, which interleaves $M$ bits through a modular operation with modulus $p_1$. The array processor consists of a 2-D array of size $(M/p_1) \times p_1$, and each cell is connected to four adjacent cells: it takes inputs from its lower and right neighbors and sends outputs to its upper and left neighbors. For example, cell (1,1) receives its input from cells (1,2) and (2,1), and sends its output to cells (1,0) and (0,1). With this structure, the processor is able to transfer the incoming bits in both the horizontal (right to left) and vertical (lower to upper) directions.

    Fig. 1. Interleaving/de-interleaving processor in MRS($p_1$).

    In the case of interleaving with MRS($p_1$), the processor transfers the incoming data horizontally, along the solid line, from $IN_{encode}$ toward the left until it reaches the final cell (0,0) at the upper-left corner. Each individual bit at the $X$th position in the input stream is placed on a unique cell $(a_1, a_0)$ in the processor, where $X = a_1 \times p_1 + a_0$, as shown in (8). After the first bit of the input stream arrives at the final cell (0,0), the flow direction changes from horizontal to vertical, so that bit positions in MRS($p_1$) are transposed into new positions in MRS($M/p_1$), as shown in (9). Finally, the processor produces the interleaved output from $OUT_{encode}$ along the dotted line. This transforms bit positions in MRS($M/p_1$) into interleaved bit positions in DNS, as shown in (10).

    De-interleaving can be performed in a similar way using the same architecture, but the process starts from $IN_{decode}$ vertically along the dotted line, and the de-interleaved bits are output horizontally along the solid line.
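The data flow described above can be sketched as a small behavioral model in Python. This is only an assumed software analogue of the array, not the hardware itself: the horizontal and vertical shift chains are modeled as ordered lists of cell coordinates, and the names `run_array`, `array_interleave`, and `array_deinterleave` are hypothetical.

```python
def run_array(bits, p1, load_horizontally):
    """Shift all bits in along one chain, then shift them out along the other."""
    m = len(bits)
    rows = m // p1
    cells = [[None] * p1 for _ in range(rows)]
    # Chains are listed from the output end, cell (0, 0), to the input end.
    h_chain = [(k, j) for k in range(rows) for j in range(p1)]  # solid line
    v_chain = [(k, j) for j in range(p1) for k in range(rows)]  # dotted line

    def shift(chain, new_bit):
        """Advance every bit one cell along the chain; return the bit leaving it."""
        first_row, first_col = chain[0]
        out = cells[first_row][first_col]
        for (dk, dj), (sk, sj) in zip(chain, chain[1:]):
            cells[dk][dj] = cells[sk][sj]
        last_row, last_col = chain[-1]
        cells[last_row][last_col] = new_bit
        return out

    load, unload = (h_chain, v_chain) if load_horizontally else (v_chain, h_chain)
    for bit in bits:
        shift(load, bit)  # after M shifts, bit X sits in the X-th cell of the chain
    return [shift(unload, None) for _ in range(m)]


def array_interleave(bits, p1):
    return run_array(bits, p1, load_horizontally=True)


def array_deinterleave(bits, p1):
    return run_array(bits, p1, load_horizontally=False)


# Hypothetical example: M = 12 cells, modulus p1 = 3.
interleaved = array_interleave(list(range(12)), 3)
restored = array_deinterleave(interleaved, 3)
```

Loading along one chain and unloading along the other is what performs the transposition; swapping the two roles gives the inverse permutation, which is why one array can serve both directions.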

    C. Architecture for MB-OFDM

    As mentioned in Section I, the interleaving process in the MB-OFDM system consists of three consecutive permutations, (1)-(3). As shown in Fig. 2(a), the first two consecutive modulo permutations, (1) and (2), are expressed as a series of three sub-processes: symbol interleaving in 1-Radix MRS($p_1$) for an $M$-cell block, division of the output of the symbol interleaving into $p_1$ sub-blocks of $(M/p_1)$ cells each, and tone interleaving in 1-Radix MRS($p_2$) for each sub-block. Finally, the last permutation (3) is expressed as a process that cyclically shifts the bit sequence in each sub-block, as interleaved by the previous two permutations, by $N_{cyc}$ from Table I.
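As a plain software reference for the three permutations (1)-(3), a minimal Python sketch is given below. Several things in it are assumptions rather than content of the text above: the function names, the definition m(i) = floor(i / N_CBPS) used in the cyclic shift, and the example parameter values (N_CBPS = 100, N_TDS = 2, N_Tint = 10, N_cyc = 33), which are used only to obtain a 300-bit block.

```python
def symbol_interleave(a, n_cbps, n_tds):
    """Eq. (1): a_S[i] = a[floor(i / N_CBPS) + (6 / N_TDS) * mod(i, N_CBPS)]."""
    p1 = 6 // n_tds
    return [a[i // n_cbps + p1 * (i % n_cbps)] for i in range(len(a))]


def tone_interleave(a_s, n_cbps, n_tint):
    """Eq. (2), applied independently to each sub-block of N_CBPS bits."""
    out = []
    for base in range(0, len(a_s), n_cbps):
        block = a_s[base:base + n_cbps]
        out.extend(block[i // n_tint + 10 * (i % n_tint)] for i in range(n_cbps))
    return out


def cyclic_shift(a_t, n_cbps, n_cyc):
    """Eq. (3), assuming m(i) = floor(i / N_CBPS)."""
    out = []
    for i in range(len(a_t)):
        m = i // n_cbps
        out.append(a_t[m * n_cbps + (i + m * n_cyc) % n_cbps])
    return out


def interleave(a, n_cbps, n_tds, n_tint, n_cyc):
    """Apply the three sub-processes in sequence."""
    a_s = symbol_interleave(a, n_cbps, n_tds)
    a_t = tone_interleave(a_s, n_cbps, n_tint)
    return cyclic_shift(a_t, n_cbps, n_cyc)


# Assumed example parameters; the block length is (6 / N_TDS) * N_CBPS = 300.
block = list(range(300))
b = interleave(block, n_cbps=100, n_tds=2, n_tint=10, n_cyc=33)
```

Feeding the positions 0..299 as input makes the output list read directly as the permutation: `b[i]` is the input position that ends up at output position `i`.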

    Fig. 2(b) shows our architecture in 2-Radix MRS($p_2 | p_1$) for the MB-OFDM system, which combines all three sub-processes for the modulo permutations of (1) and (2) into a single process. It divides the whole block in DNS into $p_1$ sub-blocks in MRS($p_1$), each drawn in a different color, and divides each sub-block in MRS($p_1$) into $p_2$ sub-sub-blocks in MRS($p_2 | p_1$) at a time. Each sub-sub-block is represented as a vertical line in Fig. 2(b). All the operations are supported simply by arranging the sub-sub-blocks of different colors alternately and extending some wire connections. For example, the $X$th bit position in DNS is transformed into $\langle a_2 | a_1 | a_0 \rangle$ in MRS($p_2 | p_1$) on the 2-D array, where $X = (a_2 \times p_2 + a_1) \times p_1 + a_0$. In Fig. 2(b), $\langle a_2 | a_1 | a_0 \rangle$ is represented as cell $(a_2, a_1)$ with color $a_0$.

    Fig. 2. Interleaver architecture for MB-OFDM. (a) General representation for interleaving processes. (b) Our interleaving processor in MRS($p_2 | p_1$).
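The digit decomposition X = (a2 x p2 + a1) x p1 + a0 can be illustrated with a short Python sketch. The function names and the example sizes (a 300-bit block with p1 = 3 and p2 = 10) are assumptions for illustration. The sketch reads the cells out one color (a0) at a time, column by column, and compares that order with applying (1) and then (2) per sub-block.

```python
def to_mrs(x, p2, p1):
    """Split position X into digits (a2, a1, a0) with X = (a2 * p2 + a1) * p1 + a0."""
    a0 = x % p1
    a1 = (x // p1) % p2
    a2 = x // (p1 * p2)
    return a2, a1, a0


def two_radix_interleave(bits, p2, p1):
    """Read cells color by color (a0), column by column (a1), row by row (a2)."""
    n2 = len(bits) // (p1 * p2)  # number of rows, i.e. the range of a2
    return [bits[(a2 * p2 + a1) * p1 + a0]
            for a0 in range(p1) for a1 in range(p2) for a2 in range(n2)]


def symbol_then_tone(bits, p2, p1):
    """Reference: the permutation form of (1), then of (2) on each sub-block."""
    m = len(bits)
    sub = m // p1        # sub-block length
    n_tint = sub // p2   # second-stage modulus complement
    a_s = [bits[i // sub + p1 * (i % sub)] for i in range(m)]
    out = []
    for base in range(0, m, sub):
        blk = a_s[base:base + sub]
        out.extend(blk[i // n_tint + p2 * (i % n_tint)] for i in range(sub))
    return out


# Assumed example: 300-bit block, p1 = 3, p2 = 10.
digits = to_mrs(30, 10, 3)
single_pass = two_radix_interleave(list(range(300)), 10, 3)
two_pass = symbol_then_tone(list(range(300)), 10, 3)
```

Under these assumptions the single-pass read-out and the two-stage reference produce the same sequence, which is the sense in which the 2-Radix array merges the first two permutations.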

    The processor performs interleaving/de-interleaving in a similar way to the 1-Radix MRS processor in Fig. 1. In the interleaving process, the input bits from $IN_{encode}$ are moved along the solid line until they reach the final cell (0,0), shown in white. After the first bit arrives at the final cell, the processor produces the interleaved output along the dotted line through $OUT_{encode}$. At this time, all cells with the same color are processed to the end before the output is taken from another color. Ai is used to connect the final cell of one color with the first cell of another color. Likewise, the de-interleaving process starts from $IN_{decode}$ along the dotted line, and the de-interleaved bits are produced through $OUT_{decode}$ along the solid line.

    Finally, in order to perform the cyclic shift shown in (3), we modified some wire connections and added some multiplexers to the processor. Fig. 3 shows the cyclic shift among three bit sequences, which are classified by the result of a modulo operation ($X \bmod p_1$), where $X$ is the position of each bit in the original input bit sequence. By taking the bit sequences from the $(N_{cyc} \times 0)$th, $(N_{cyc} \times 1)$th, and $(N_{cyc} \times 2)$th positions, respectively, each of the bit sequences is cyclically shifted. The start point (s) of the second bit sequence is connected to the end point (e) of the first bit sequence, and the third start point is connected to the second end point. These connections complete the cyclic shift.

    Fig. 3. Design of the cyclic shift.

    TABLE II. PARAMETERS OF BLOCK INTERLEAVING FOR MB-OFDM
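Reading each sub-block from a shifted start point amounts to a rotation of that sub-block. The Python sketch below illustrates this view and compares it with a direct evaluation of (3). The function names, the assumption m(i) = floor(i / N_CBPS), and the example values (three sub-blocks of 100 bits, N_cyc = 33) are illustrative assumptions only.

```python
def cyclic_shift_by_rotation(a_t, n_cbps, n_cyc):
    """Rotate sub-block m so that reading starts at offset m * N_cyc."""
    out = []
    for m in range(len(a_t) // n_cbps):
        block = a_t[m * n_cbps:(m + 1) * n_cbps]
        start = (m * n_cyc) % n_cbps
        out.extend(block[start:] + block[:start])
    return out


def cyclic_shift_direct(a_t, n_cbps, n_cyc):
    """Direct evaluation of (3), assuming m(i) = floor(i / N_CBPS)."""
    return [a_t[(i // n_cbps) * n_cbps + (i + (i // n_cbps) * n_cyc) % n_cbps]
            for i in range(len(a_t))]


# Assumed example: three sub-blocks of 100 bits, N_cyc = 33.
rotated = cyclic_shift_by_rotation(list(range(300)), 100, 33)
direct = cyclic_shift_direct(list(range(300)), 100, 33)
```

In this example the first sub-block is left unchanged, the second starts at its 33rd position, and the third at its 66th.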

    III. HARDWARE IMPLEMENTATION

    Table II shows the parameter values of the block interleaving for the MB-OFDM system. The first column lists the data rates supported by the system. The second column gives the block size for each data rate, which determines the number of cells in the array processor. The other columns show the first ($p_1$) and second ($p_2$) moduli, which are used for the two consecutive modular operations in the symbol and tone interleaving processes.

    A. Modular Design for Various Data Rates

    For consecutive symbol and tone interleaving with moduli $p_1$ and $p_2$, we employed the architecture in Fig. 2(b). However, in order to provide a modular design for easy implementation, we do not use this architecture directly in our real implementation.

    Fig. 4 shows the schematic diagram of the real implementation, which supports all the data rates shown in Table II. The hardware consists of three parts, $A$, $B$, and $C$, and each part consists of several cell processors, multiplexers, and wire connections. The configurations $(A)$, $(A + B)$, and $(A + B + C)$ correspond to block sizes of 300, 600, and 1200 bits, respectively. The implementation is performed by duplicating part $(A)$ into $(B)$ and part $(A + B)$ into $(C)$. The multiplexers located between $(A)$ and $(B)$ are used to configure either part $(A)$ to be executed alone for data rates up to 80 Mb/s, or the combined part $(A + B)$ to be executed for the other data rates up to 200 Mb/s. Instead of using MRS($10 | 6$), we combined two array processors, $(A + B)$ and $(C)$, with 600 cells each in MRS($10 | 3$). Additionally, a controller is added so that the two array processors operate alternately every 3 bits, according to the first modulus ($p_1$), while receiving input bits at data rates over 200 Mb/s. It then produces the interleaved bit sequence by concatenating the output streams from the two processors. Through this controller, our implementation provides the same functionality as the array processor in MRS($10 | 6$). Finally, in order to support the cyclic shift shown in Fig. 3, the multiplexers in solid-line circles are added, and some wire connections are changed.

    Fig. 4. Schematic diagram of an array processor to support 300, 600, and 1200 bit block sizes for all data rates in Table II.

    Fig. 5. Schematic of each cell in Fig. 4.
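The equivalence between one MRS(10|6) array and two MRS(10|3) arrays fed alternately every 3 bits can be checked with a short behavioral sketch in Python. The function names are hypothetical, the cyclic shift is deliberately left out, and the read-out order (color, then column, then row) is an assumption about how the array is traversed.

```python
def two_radix_interleave(bits, p2, p1):
    """Read-out order of a 2-Radix MRS(p2 | p1) array (cyclic shift omitted)."""
    n2 = len(bits) // (p1 * p2)
    return [bits[(a2 * p2 + a1) * p1 + a0]
            for a0 in range(p1) for a1 in range(p2) for a2 in range(n2)]


def split_interleave(bits, p2=10, p1_half=3):
    """Feed two MRS(p2 | p1_half) arrays alternately every p1_half bits,
    then concatenate their output streams."""
    streams = ([], [])
    for x, bit in enumerate(bits):
        streams[(x // p1_half) % 2].append(bit)
    return (two_radix_interleave(streams[0], p2, p1_half)
            + two_radix_interleave(streams[1], p2, p1_half))


# 1200-bit block: one MRS(10 | 6) array versus two 600-cell MRS(10 | 3) arrays.
block = list(range(1200))
single_array = two_radix_interleave(block, 10, 6)
paired_arrays = split_interleave(block)
```

The first array receives the input positions whose remainder modulo 6 is 0, 1, or 2 and the second receives those with remainder 3, 4, or 5, so concatenating the two outputs reproduces the six-color read-out of the larger array.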

    B. Cell Design for the Minimum Latency

    Fig. 5 shows a schematic of each cell in the array processor. Each cell consists of two 1-bit memory cells (in our implementation, we used flip-flops) and four multiplexers. V_OUT and V_IN are used for the vertical movement of the bit stream, while H_OUT and H_IN are used for the horizontal movement. The roles of the flip-flops alternate depending on the control signal SELECT.

    Each cell processes the interleaved bit sequence through one flip-flop for the duration of six symbols and through the other flip-flop for the next six symbols. One flip-flop is used to store a new input bit while the other flip-flop is used for the output. With this approach, our architecture does not need any additional delay beyond six MB-OFDM symbols, which is the minimum latency required for the interleaving processes. As shown in (1)-(3), the interleaving algorithm is performed in units of six MB-OFDM symbols. Hence, in order to produce the interleaved bit stream, a latency of at least six MB-OFDM symbols is required.

    TABLE III. PERFORMANCE AND COMPLEXITY OF CONVENTIONAL AND PROPOSED ARCHITECTURES

    IV. PERFORMANCE ANALYSIS

    In order to study the performance advantage of the proposed architecture, we implemented it in Verilog HDL and synthesized it with Xilinx ISE, targeting the Xilinx Virtex-4 XC4VLX100-10FF1148 FPGA [17]. We used a conventional interleaver design [8] as the baseline architecture against which to compare the performance of our proposed architecture. The conventional architecture combines multiple permutations into pipelined processes in order to increase the throughput. It performs each permutation sequentially, and it uses embedded memory to keep the temporary results of each permutation process [9].

    Table III shows the performance of the proposed and conventional architectures in terms of maximum operating clock frequency, hardware complexity, and latency. Since our architecture results in a different hardware implementation for each maximum data rate, we compare three of our hardware implementations, for 80, 200, and 480 Mb/s data rates (200 and 480 Mb/s are the mandatory and maximum data rates of the MB-OFDM system, respectively), with the conventional architecture, which supports all of the data rates in Table I using the same hardware. The proposed architecture reduces the hardware complexity by 85.5% for 80 Mb/s, 69.4% for 200 Mb/s, and 40.3% for 480 Mb/s, while improving the maximum clock frequency by more than 3.3 times. The maximum clock frequency of our architecture is about 500 MHz. In addition, our architecture incurs a latency of six MB-OFDM symbols across all three processes of the interleaver, while the conventional architecture requires a latency of eight MB-OFDM symbols. This difference arises because the conventional architecture executes all the sub-processes sequentially, while our architecture performs them at the same time.

    Table IV compares the power consumption of the conventional architecture and our proposed architecture. The operating clock frequency is set to 132 MHz as in [1]. Power consumption was estimated using the Xilinx XPower tool. The inputs are assumed to toggle continuously in order to obtain the worst-case estimate of the toggle rate in the circuit.

    The proposed architecture reduces the power consumption of the conventional design by 54.5% in clock power, 88.3% in logic power, and 88.1% in signal power for the 80 Mb/s data rate. For the 200 Mb/s data rate, the proposed architecture consumes only about 24.6% of the logic power of the conventional one. This is reasonable because the proposed architecture uses only 30.6% of the logic elements used in the conventional one. In total, the proposed architecture consumes only around 26.4% of the power of the conventional one. Meanwhile, for the 480 Mb/s data rate, the proposed design consumes only 60.2% of the total power of the conventional one due to its 20.0% saving in logic power consumption.

    TABLE IV. POWER CONSUMPTION OF CONVENTIONAL AND PROPOSED ARCHITECTURES

    V. CONCLUSION

    In this paper, a mathematical relationship between the interleaving processes and MRS was derived. Based on the derivation, we proposed an array processor architecture that supports the interleaving processes efficiently. The performance analysis demonstrates the benefits of our proposed architecture in terms of latency, complexity, and power consumption over the conventional approach. The latency of our architecture is six MB-OFDM symbols, which is the minimum. We also reduced the complexity by 85.5% for 80 Mb/s and 69.4% for 200 Mb/s, while improving the maximum clock frequency by more than 3.3 times compared with the conventional approach. For 480 Mb/s, the complexity was reduced by 40.3% compared to the conventional one. In addition, we reduced the power consumption by 87.4%, 73.6%, and 39.8% for 80, 200, and 480 Mb/s, respectively.

    REFERENCES

    [1] A. Batra, J. Balakrishnan, G. R. Aiello, J. R. Foerster, and A. Dabak, "Design of a multiband OFDM system for realistic UWB channel environments," IEEE Trans. Microw. Theory Tech., vol. 52, no. 9, pp. 2123-2138, Sep. 2004.

    [2] S. Lin and D. J. Costello, Error Control Coding: Fundamentals and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1983.

    [3] J. L. Ramsey, "Realization of optimum interleavers," IEEE Trans. Inf. Theory, vol. IT-16, no. 3, pp. 338-344, May 1970.

    [4] G. D. Forney, Jr., "Burst-correcting codes for the classic bursty channel," IEEE Trans. Commun. Technol., vol. COM-19, no. 5, pp. 772-780, Oct. 1971.

    [5] K. Andrews, C. Heegard, and D. Kozen, "A theory of interleavers," presented at the IEEE Int. Symp. Inf. Theory, 1997, 97-1634.

    [6] R. Garello, G. Montorsi, and G. C. Sergio Benedetto, "Interleaver properties and their applications to the trellis complexity analysis of turbo codes," IEEE Trans. Commun., vol. 49, no. 5, pp. 793-807, May 2001.

    [7] WiMedia Alliance, "MAC-PHY Interface Specification 1.0," 2005. [Online]. Available: http://www.wimedia.org

    [8] J. Kim, "Interleaver & Deinterleaver for MB-OFDM," Advanced System IC Technology Center (ASTEC), Jul. 2007. [Online]. Available: http://www.astec.re.kr:8080/ipSoC/ipInfo.jsp?ipno=576&left-image=4

    [9] X. Jinsong, L. Xiaochun, W. Haitao, B. Yujing, Z. Decai, Z. Xiaolong, and W. Chaogang, "Implementation of MB-OFDM transmitter baseband based on FPGA," in Proc. Int. Conf. Circuits Syst. Commun., May 2008, pp. 50-54.

    [10] E. Tell and D. Liu, "A hardware architecture for a multi mode block interleaver," in Proc. Int. Conf. Circuits Syst. Commun., Jun. 2004.

    [11] D. F. Miller and W. S. McCormick, "An arithmetic free parallel mixed-radix conversion algorithm," IEEE Trans. Circuits Syst. II, vol. 45, no. 1, pp. 158-162, Jan. 1998.

    [12] B. G. Jo and M. H. Sunwoo, "New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 5, pp. 911-919, May 2005.

    [13] J. M. Camara, M. Moreto, E. Vallejo, R. Beivide, J. Miguel-Alonso, C. Martinez, and J. Navaridas, "Mixed-radix twisted torus interconnection networks," in Proc. Int. Par. Distr. Processing Symp., Mar. 2007, pp. 1-10.

    [14] W. K. Jenkins and E. J. Altman, "Self-checking properties of residue number error checkers based on mixed radix conversion," IEEE Trans. Circuits Syst., vol. 35, no. 2, pp. 159-167, Feb. 1988.

    [15] P. V. A. Mohan and A. B. Premkumar, "RNS-to-binary converters for two four-moduli sets $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$ and $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} + 1\}$," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 6, pp. 1245-1254, Jun. 2007.

    [16] M. Akkal and P. Siy, "A new mixed radix conversion algorithm MRC-II," J. Syst. Arch., vol. 53, no. 9, pp. 577-586, Sep. 2007.

    [17] Xilinx, San Jose, CA, "Virtex-4 Multi-Platform FPGA," 2006. [Online]. Available: http://www.xilinx.com/products