1020 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 18, NO. 6, JUNE 2010
A Novel Architecture for Block Interleaving Algorithm in
MB-OFDM Using Mixed Radix System
Youngsun Han, Peter Harliman, Seon Wook Kim,
Jong-Kook Kim, and Chulwoo Kim
Abstract: In this paper, we present a novel architecture of a block interleaver in MB-OFDM systems based on the Mixed Radix System (MRS). We prove mathematically that the proposed architecture can support bit permutations in the interleaving process. The hierarchical property of our proposed MRS-based design methodology allows the proposed architecture to support all the required data rates in MB-OFDM systems with a simple modular design. Furthermore, the same design used for the interleaver can also be used for de-interleaving, which reduces the implementation complexity significantly. The latency of our architecture is as low as 6 MB-OFDM symbols. In addition, compared with the conventional approach, our proposed architecture reduces the implementation complexity by 85.5%, 69.4%, and 40.3% for the 80, 200, and 480 Mb/s data rates, respectively, while improving the maximum operating clock frequency by more than 3.3 times over the conventional design. We also show that the power consumption is reduced by 87.4%, 73.6%, and 39.8% for 80, 200, and 480 Mb/s, respectively.
Index Terms: Array processor, block interleaving, MB-OFDM, Mixed Radix System (MRS).
I. INTRODUCTION
MB-OFDM [1] has been widely used as one of the de facto standards for Ultra Wide Band (UWB) communication. MB-OFDM supports various data rates with low power consumption, as shown in Table I. Due to these performance requirements, implementing an MB-OFDM processor is difficult and challenging for developers.
An interleaver reorders an input bit sequence in a non-contiguous way in order to improve robustness against burst errors in transmission [2]-[6]. The interleaver in MB-OFDM consists of three sequential sub-processes: symbol interleaving, tone interleaving, and cyclic shift. The sub-processes are described by the following equations [7]:
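$$a_S[i] = a\left[\left\lfloor \frac{i}{N_{CBPS}} \right\rfloor + \frac{6}{N_{TDS}} \times \operatorname{mod}(i, N_{CBPS})\right] \tag{1}$$
$$a_T[i] = a_S\left[\left\lfloor \frac{i}{N_{Tint}} \right\rfloor + 10 \times \operatorname{mod}(i, N_{Tint})\right] \tag{2}$$
$$b[i] = a_T\left[m(i) \times N_{CBPS} + \operatorname{mod}(i + m(i) \times N_{cyc}, N_{CBPS})\right] \tag{3}$$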
where i is an index for bit sequences with a range of 0 i
1-bit memory cells and four 1-bit multiplexers. The hierarchical property of our MRS-based design methodology allows the proposed architecture to support all the required data rates in MB-OFDM systems when we apply a simple modular design technique. Furthermore, the same design used for the interleaver can also be used for de-interleaving, which reduces the implementation complexity significantly. The performance analysis shows that our design achieves better performance than the conventional design [8] in terms of latency, hardware complexity, power consumption, and maximum operating clock frequency.
This paper is organized as follows. In Section II, we derive a math-
ematical relationship between MRS and interleaving processes. Also,
we describe an architecture of our interleaver approach with MRS. The
implementation details of the architecture are shown in Section III and
the performance is analyzed in Section IV. Finally, the conclusion is
made in Section V.
II. INTERLEAVING/DE-INTERLEAVING PROCESSOR FOR MB-OFDM
A. Interleaving/De-Interleaving With MRS
The $i$th element of the DNS sequence $\psi$ can be represented in the $MRS(p_1)$ matrix $\Psi$ as follows:
$$\psi[i] = \Psi\left[\lfloor i/p_1 \rfloor, \operatorname{mod}(i, p_1)\right] \tag{7}$$
where $i$ in DNS is represented as $\langle \lfloor i/p_1 \rfloor \mid \operatorname{mod}(i, p_1) \rangle$ in $MRS(p_1)$. Since $i = \lfloor i/p_1 \rfloor \times p_1 + \operatorname{mod}(i, p_1)$, the matrix $\Psi$ in $MRS(p_1)$ can be derived as follows:
$$\Psi[k, j] = \psi[k \times p_1 + j] \tag{8}$$
where $0 \le j < p_1$, $0 \le k < M/p_1$, and $M$ is the length of the DNS sequence $\psi$.
Assume that the matrix $\Psi$ in $MRS(p_1)$ is transposed into a new matrix $\Psi'$ in $MRS(p'_1)$, where $p'_1 = M/p_1$, before being transformed back to its DNS representation:
$$\Psi'[j, k] = \Psi[k, j]. \tag{9}$$
From (7)-(9), a new DNS sequence $\psi'[i]$ can be generated in $MRS(p'_1)$ as follows:
$$\psi'[i] = \Psi'\left[\lfloor i/p'_1 \rfloor, \operatorname{mod}(i, p'_1)\right] \tag{10}$$
where $\Psi'[j, k]$ is the digit-reversal MRS in the new radix $p'_1$.
Finally, from (8)-(10), we obtain the relationship between the new DNS sequence and the original one:
$$\psi'[i] = \Psi\left[\operatorname{mod}(i, p'_1), \lfloor i/p'_1 \rfloor\right] = \psi\left[\lfloor i/p'_1 \rfloor + p_1 \times \operatorname{mod}(i, p'_1)\right]. \tag{11}$$
Equation (11) has the same form as (1) and (2), but differs from (3). This common form enables us to implement the interleaving/de-interleaving permutations by transposing the $p'_1 \times p_1$ matrix in MRS with the two moduli $p_1$ and $p'_1$.
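The closed form in (11) can be checked numerically against the matrix transposition it was derived from. The following Python sketch is not part of the original design; the function names and the example values (a 12-element block with modulus 3) are illustrative assumptions.

```python
def interleave_by_transpose(seq, p1):
    """Fill an (M/p1) x p1 matrix row by row, transpose it, and read it back row by row."""
    M = len(seq)
    assert M % p1 == 0, "block length must be a multiple of the modulus"
    rows = M // p1  # the new radix p1' = M / p1
    # Step (8): element k*p1 + j of the sequence sits at row k, column j.
    matrix = [[seq[k * p1 + j] for j in range(p1)] for k in range(rows)]
    # Step (9): transpose into a p1 x (M/p1) matrix.
    transposed = [[matrix[k][j] for k in range(rows)] for j in range(p1)]
    # Step (10): read the transposed matrix back as a flat sequence.
    return [transposed[i // rows][i % rows] for i in range(M)]


def interleave_closed_form(seq, p1):
    """Closed form (11): out[i] = seq[floor(i / p1') + p1 * mod(i, p1')]."""
    M = len(seq)
    p1_new = M // p1
    return [seq[i // p1_new + p1 * (i % p1_new)] for i in range(M)]


if __name__ == "__main__":
    block = list(range(12))  # illustrative block, not an MB-OFDM block size
    print(interleave_by_transpose(block, 3))
    print(interleave_closed_form(block, 3))
```

Both functions return the same ordering for any block length that is a multiple of the modulus, which is the property the array processor relies on.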
In the following two sub-sections, we show how the MRS permuta-
tions are mapped into our interleaver architecture for MB-OFDM.
B. Architecture for MRS Modulo Permutation
Fig. 1 shows an array processor consisting of $M$ cells, which interleaves $M$ bits through a modular operation with modulus $p_1$. The array processor consists of a 2-D array of size $(M/p_1) \times p_1$, and each cell is connected to four adjacent cells: it takes inputs from the lower and right adjacent cells and sends outputs to the upper and left adjacent cells. For example, cell (1,1) receives its inputs from cells (1,2) and (2,1), and sends its outputs to cells (1,0) and (0,1). With this structure, the processor is able to transfer the incoming bits in both the horizontal (right to left) and vertical (bottom to top) directions.
Fig. 1. Interleaving/de-interleaving processor in $MRS(p_1)$.
In the case of interleaving with $MRS(p_1)$, the processor transfers the incoming data horizontally, along the solid line, from $IN_{encode}$ toward the left until the first bit reaches the final cell (0,0) at the upper-left corner. Each bit at the $X$th position in the input stream is placed in a unique cell $(a_1, a_0)$ of the processor, where $X = a_1 \times p_1 + a_0$, as shown in (8). After the first bit of the input stream arrives at the final cell (0,0), the flow direction changes from horizontal to vertical, so that bit positions in $MRS(p_1)$ are transposed into new positions in $MRS(M/p_1)$, as shown in (9). Finally, the processor produces the interleaved output from $OUT_{encode}$ along the dotted line. This transforms bit positions in $MRS(M/p_1)$ into interleaved bit positions in DNS, as shown in (10).
De-interleaving can be performed in a similar way using the same architecture, but the process starts from $IN_{decode}$ vertically along the dotted line, and the de-interleaved bits are output horizontally along the solid line.
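The symmetry between the two directions can be illustrated with a small behavioral model. This Python sketch is an assumption-laden simplification: it models only where each bit ends up in the 2-D array, not the cycle-by-cycle shifting between neighboring cells, and the function names are hypothetical.

```python
def array_interleave(bits, p1):
    """Load the array along rows (horizontal flow), unload along columns (vertical flow)."""
    M = len(bits)
    assert M % p1 == 0, "block length must be a multiple of the modulus"
    rows = M // p1
    cells = [[None] * p1 for _ in range(rows)]
    for x, bit in enumerate(bits):
        a1, a0 = divmod(x, p1)  # input position X = a1 * p1 + a0
        cells[a1][a0] = bit
    return [cells[a1][a0] for a0 in range(p1) for a1 in range(rows)]


def array_deinterleave(bits, p1):
    """Load the same array along columns (vertical flow), unload along rows (horizontal flow)."""
    M = len(bits)
    assert M % p1 == 0, "block length must be a multiple of the modulus"
    rows = M // p1
    cells = [[None] * p1 for _ in range(rows)]
    for y, bit in enumerate(bits):
        a0, a1 = divmod(y, rows)  # interleaved position Y = a0 * (M/p1) + a1
        cells[a1][a0] = bit
    return [cells[a1][a0] for a1 in range(rows) for a0 in range(p1)]


if __name__ == "__main__":
    block = list(range(12))  # illustrative block size
    shuffled = array_interleave(block, 3)
    print(shuffled)
    print(array_deinterleave(shuffled, 3) == block)
```

Because the same cell grid is used in both functions and only the load and unload directions are swapped, one array can serve both roles, which is the reuse argument made in the text.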
C. Architecture for MB-OFDM
As mentioned in Section I, the interleaving process in an MB-OFDM system consists of three consecutive permutations, (1)-(3). As shown in Fig. 2(a), the first two consecutive modulo permutations (1) and (2) are expressed as a series of the following three sub-processes: symbol interleaving in 1-Radix $MRS(p_1)$ for an $M$-cell block, division of the output of the symbol interleaving into $p_1$ sub-blocks of $(M/p_1)$ cells each, and tone interleaving in 1-Radix $MRS(p_2)$ for each sub-block. Finally, the last permutation (3) is expressed as a cyclic shift of the bit sequence in each sub-block, already interleaved by the previous two permutations, by $N_{cyc}$ as given in Table I.
Fig. 2. Interleaver architecture for MB-OFDM. (a) General representation of the interleaving processes. (b) Our interleaving processor in $MRS(p_2 \mid p_1)$.
Fig. 2(b) shows our architecture in 2-Radix $MRS(p_2 \mid p_1)$ for an MB-OFDM system, which combines all three sub-processes for the modulo permutations (1) and (2) into a single process. It divides the whole block in DNS into $p_1$ sub-blocks in $MRS(p_1)$, shown in different colors, and at the same time divides each sub-block in $MRS(p_1)$ into $p_2$ sub-sub-blocks in $MRS(p_2 \mid p_1)$. Each sub-sub-block is represented as a vertical line in Fig. 2(b). All the operations are supported merely by arranging the sub-sub-blocks of different colors alternately and extending some wire connections. For example, the $X$th bit position in DNS is transformed into $\langle a_2 \mid a_1 \mid a_0 \rangle$ in $MRS(p_2 \mid p_1)$ on the 2-D array, where $X = (a_2 \times p_2 + a_1) \times p_1 + a_0$. In Fig. 2(b), $\langle a_2 \mid a_1 \mid a_0 \rangle$ is represented as cell $(a_2, a_1)$ with color $a_0$.
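One way to see why a single 2-radix array can replace the two separate permutations is to compare them directly in software. The Python sketch below is illustrative only. It assumes that the tone permutation is applied independently to each sub-block and that the sub-block and tone-interleaver lengths are $M/p_1$ and $M/(p_1 p_2)$; the function names and the use of a 600-bit block with moduli 3 and 10 are choices made for the example.

```python
def symbol_then_tone(bits, p1, p2):
    """Apply the permutation form of (1) to the whole block, then of (2) to each sub-block."""
    M = len(bits)
    assert M % (p1 * p2) == 0, "block length must be a multiple of p1 * p2"
    n_cbps = M // p1       # sub-block length (assumed to play the role of N_CBPS)
    n_tint = n_cbps // p2  # tone-interleaver block length (assumed role of N_Tint)
    a_s = [bits[i // n_cbps + p1 * (i % n_cbps)] for i in range(M)]
    a_t = []
    for m in range(p1):
        sub = a_s[m * n_cbps:(m + 1) * n_cbps]
        a_t.extend(sub[i // n_tint + p2 * (i % n_tint)] for i in range(n_cbps))
    return a_t


def digit_reversal(bits, p1, p2):
    """Move input position X = (a2*p2 + a1)*p1 + a0 to the position with reversed digits."""
    M = len(bits)
    assert M % (p1 * p2) == 0, "block length must be a multiple of p1 * p2"
    p3 = M // (p1 * p2)    # range of the most significant digit a2
    out = [None] * M
    for x, bit in enumerate(bits):
        rest, a0 = divmod(x, p1)
        a2, a1 = divmod(rest, p2)
        out[(a0 * p2 + a1) * p3 + a2] = bit
    return out


if __name__ == "__main__":
    block = list(range(600))
    print(symbol_then_tone(block, 3, 10) == digit_reversal(block, 3, 10))
```

Under these assumptions the two-stage permutation coincides with reversing the three mixed-radix digits of each bit position, which is what the colored cell arrangement realizes with wiring alone.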
The processor performs interleaving/de-interleaving in a similar way to the 1-Radix MRS processor in Fig. 1. In the interleaving process, the input bits from $IN_{encode}$ are moved along the solid line until they reach the final cell (0,0), shown in white. After the first bit arrives at the final cell, the processor produces the interleaved output along the dotted line through $OUT_{encode}$. At this time, all cells of the same color are processed to the end before the output is taken from another color. $A_i$ is used to connect the final cell of one color to the first cell of another color. Likewise, the de-interleaving process starts from $IN_{decode}$ along the dotted line, and the de-interleaved bits are produced through $OUT_{decode}$ along the solid line.
Fig. 3. Design of the cyclic shift.
TABLE II PARAMETERS OF BLOCK INTERLEAVING FOR MB-OFDM
Finally, in order to perform the cyclic shift shown in (3), we modified some wire connections and added some multiplexers to the processor. Fig. 3 shows the cyclic shift among three bit sequences, which are classified by the result of the modulo operation $(X \bmod p_1)$, where $X$ is the position of each bit in the original input bit sequence. By taking the bit sequences from the $(N_{cyc} \times 0)$th, $(N_{cyc} \times 1)$th, and $(N_{cyc} \times 2)$th positions, respectively, each of the bit sequences is cyclically shifted. The start point (s) of the second bit sequence is connected to the end point (e) of the first bit sequence, and the third start point is connected to the second end point. These connections complete the cyclic shift.
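The effect of (3) on each sub-block can be sketched in a few lines of Python. This is a reference model rather than the wiring of Fig. 3. It assumes that $m(i)$ is the sub-block index $\lfloor i/N_{CBPS} \rfloor$, and the toy values used below (sub-blocks of 5 bits, a shift step of 2) are chosen only to keep the output readable; the real parameters are those of Table I.

```python
def cyclic_shift(a_t, n_cbps, n_cyc):
    """b[i] = a_T[m*N_CBPS + mod(i + m*N_cyc, N_CBPS)], with m = floor(i / N_CBPS) (assumed)."""
    assert len(a_t) % n_cbps == 0, "block must consist of whole sub-blocks"
    out = []
    for i in range(len(a_t)):
        m = i // n_cbps  # index of the sub-block that position i belongs to
        out.append(a_t[m * n_cbps + (i + m * n_cyc) % n_cbps])
    return out


if __name__ == "__main__":
    # Three sub-blocks of five bits: the m-th one starts reading at offset m * n_cyc.
    print(cyclic_shift(list(range(15)), n_cbps=5, n_cyc=2))
```

In this model the first sub-block is left untouched, the second starts at offset $N_{cyc}$, and the third at offset $2 N_{cyc}$, matching the three start positions named in the text.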
III. HARDWARE IMPLEMENTATION
Table II shows the parameter values of the block interleaving for the MB-OFDM system. The first column lists the data rates supported by the system. The second column lists the block size for each data rate, which determines the number of cells in the array processor. The other columns show the first ($p_1$) and second ($p_2$) moduli, which are used for the two consecutive modular operations in the symbol and tone interleaving processes.
A. Modular Design for Various Data Rates
For consecutive symbol and tone interleaving with moduli $p_1$ and $p_2$, we employed the architecture in Fig. 2(b). However, in order to provide a modular design that is easy to implement, we do not use that architecture directly in our real implementation.
Fig. 4. Schematic diagram of an array processor to support 300-, 600-, and 1200-bit block sizes for all data rates in Table II.
Fig. 5. Schematic of each cell in Fig. 4.
Fig. 4 shows the schematic diagram of the real implementation, which supports all the data rates shown in Table II. The hardware consists of three parts, $A$, $B$, and $C$, and each part consists of several cell processors, multiplexers, and wire connections. Adding the parts one after another ($A$, $A+B$, and $A+B+C$) yields block sizes of 300, 600, and 1200 bits, respectively. The implementation is performed by duplicating part $(A)$ into $(B)$ and part $(A+B)$ into $(C)$. The multiplexers located between $(A)$ and $(B)$ are used to configure part $(A)$ to be executed alone for data rates up to 80 Mb/s, or the combined part $(A+B)$ to be executed for the other data rates up to 200 Mb/s. Instead of using $MRS(10 \mid 6)$, we combined two array processors, $(A+B)$ and $(C)$, with 600 cells each in $MRS(10 \mid 3)$. Additionally, a controller is added to make the two array processors operate alternately every 3 bits, according to the first modulus ($p_1$), while receiving input bits at data rates above 200 Mb/s. It then produces the interleaved bit sequence by concatenating the output streams from the two processors. Through this controller, our implementation provides the same functionality as an array processor in $MRS(10 \mid 6)$. Finally, in order to support the cyclic shift shown in Fig. 3, the multiplexers in solid-line circles are added, and some wire connections are changed.
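The claim that two 600-cell processors can stand in for one 1200-cell processor can be checked with a short Python model. The sketch below covers only the symbol and tone permutations of (1) and (2), not the cyclic shift, and it assumes that the controller routes input bits to the two processors in alternating groups of three and emits the first processor's output before the second's. All function names are hypothetical.

```python
def symbol_then_tone(bits, p1, p2):
    """Permutation form of (1) on the whole block, then of (2) on each of the p1 sub-blocks."""
    M = len(bits)
    assert M % (p1 * p2) == 0, "block length must be a multiple of p1 * p2"
    n_cbps = M // p1       # sub-block length (assumed role of N_CBPS)
    n_tint = n_cbps // p2  # tone-interleaver block length (assumed role of N_Tint)
    a_s = [bits[i // n_cbps + p1 * (i % n_cbps)] for i in range(M)]
    a_t = []
    for m in range(p1):
        sub = a_s[m * n_cbps:(m + 1) * n_cbps]
        a_t.extend(sub[i // n_tint + p2 * (i % n_tint)] for i in range(n_cbps))
    return a_t


def two_processor_interleave(bits, p1=3, p2=10):
    """Feed two half-size processors alternately, p1 bits at a time, then concatenate outputs."""
    first, second = [], []
    for x, bit in enumerate(bits):
        target = first if (x // p1) % 2 == 0 else second
        target.append(bit)
    return symbol_then_tone(first, p1, p2) + symbol_then_tone(second, p1, p2)


if __name__ == "__main__":
    block = list(range(1200))
    combined = two_processor_interleave(block)         # two processors with moduli 3 and 10
    monolithic = symbol_then_tone(block, 6, 10)        # one processor with moduli 6 and 10
    print(combined == monolithic)
```

The equality holds because the least significant digit of a position in radix 6 splits into a processor index and a digit in radix 3, so the alternating controller only peels off one factor of the first modulus.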
B. Cell Design for the Minimum Latency
Fig. 5 shows a schematic of each cell in the array processor. Each cell consists of two 1-bit memory cells (in our implementation, flip-flops) and four multiplexers. V_OUT and V_IN are used for the vertical movement of the bit stream, while H_OUT and H_IN are used for the horizontal movement. The roles of the two flip-flops alternate depending on the control signal SELECT.
Each cell processes the interleaved bit sequence through one flip-flop for a period of six symbols and through the other flip-flop for the next six symbols. One flip-flop is used to store a new input bit while the other flip-flop is used for the output. With this approach, our architecture does not need any additional delay beyond six MB-OFDM symbols, which is the minimum latency required for the interleaving processes. As shown in (1)-(3), the interleaving algorithm operates on units of six MB-OFDM symbols. Hence, a latency of at least six MB-OFDM symbols is required to produce the interleaved bit stream.
TABLE III PERFORMANCE AND COMPLEXITY OF CONVENTIONAL AND PROPOSED ARCHITECTURES
IV. PERFORMANCE ANALYSIS
In order to study the performance advantage of the proposed architecture, we implemented it in Verilog HDL and then synthesized it with Xilinx ISE, targeting the Xilinx Virtex-4 XC4VLX100-10FF1148 FPGA [17]. We used a conventional interleaver design [8] as the baseline architecture against which to compare the performance of our proposed architecture. The conventional architecture combines multiple permutations into pipelined processes in order to increase the throughput. It performs each permutation sequentially, and it uses embedded memory to keep the temporary results of each permutation process [9].
Table III shows the performance of the proposed architecture and the conventional architecture in terms of maximum operating clock frequency, hardware complexity, and latency. Since our architecture results in a different hardware implementation for each maximum data rate, we compare three of our hardware implementations, for the 80, 200, and 480 Mb/s data rates (200 and 480 Mb/s are the mandatory and maximum data rates of the MB-OFDM system, respectively), with the conventional architecture, which supports all of the data rates in Table I using the same hardware. The proposed architecture reduces the hardware complexity by 85.5% for 80 Mb/s, 69.4% for 200 Mb/s, and 40.3% for 480 Mb/s, while improving the maximum allowed clock frequency by more than 3.3 times. The maximum clock frequency of our architecture is about 500 MHz. In addition, our architecture incurs a latency of six MB-OFDM symbols across all three sub-processes of the interleaver, while the conventional architecture requires a latency of eight MB-OFDM symbols. This latency difference arises because the conventional architecture executes all the sub-processes sequentially, while our architecture performs them at the same time.
Table IV shows a power consumption comparison between the conventional architecture and our proposed architecture. The operating clock frequency is set to 132 MHz as in [1]. Power consumption was estimated using the Xilinx XPower tool. The inputs are assumed to toggle continuously in order to obtain a worst-case estimate of the toggle rate in the circuit.
The proposed architecture reduces the power consumption of the conventional design by 54.5% in clock power, 88.3% in logic power, and 88.1% in signal power for the 80 Mb/s data rate. For the 200 Mb/s data rate, the proposed architecture consumes only about 24.6% of the logic power of the conventional one. This is reasonable given that the proposed architecture uses only 30.6% of the logic elements used in the conventional one. In total, the proposed architecture consumes only around 26.4% of the power of the conventional one. Meanwhile, for the 480 Mb/s data rate, the proposed design consumes only 60.2% of the total power of the conventional one, owing to its 20.0% saving in logic power consumption.
TABLE IV POWER CONSUMPTION OF CONVENTIONAL AND PROPOSED ARCHITECTURES
V. CONCLUSION
In this paper, a mathematical relationship between the interleaving processes and MRS was derived. Based on the derivation, we proposed an array processor architecture to support the interleaving processes efficiently. The performance analysis demonstrates the benefits of our proposed architecture in terms of latency, complexity, and power consumption over the conventional approach. The latency of our architecture is six MB-OFDM symbols, which is the minimum. Also, we reduced the complexity by 85.5% for 80 Mb/s and 69.4% for 200 Mb/s, while improving the maximum allowed clock frequency by more than 3.3 times compared with the conventional approach. For 480 Mb/s, the complexity was also reduced by 40.3% compared to the conventional one. In addition, we reduced the power consumption by 87.4%, 73.6%, and 39.8% for 80, 200, and 480 Mb/s, respectively.
REFERENCES
[1] A. Batra, J. Balakrishnan, G. R. Aiello, J. R. Foerster, and A. Dabak, "Design of a multiband OFDM system for realistic UWB channel environments," IEEE Trans. Microw. Theory Tech., vol. 52, no. 9, pp. 2123-2138, Sep. 2004.
[2] S. Lin and D. J. Costello, Error Control Coding: Fundamentals and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1983.
[3] J. L. Ramsey, "Realization of optimum interleavers," IEEE Trans. Inf. Theory, vol. IT-16, no. 3, pp. 338-344, May 1970.
[4] G. D. Forney, Jr., "Burst-correcting codes for the classic bursty channel," IEEE Trans. Commun. Technol., vol. COM-19, no. 5, pp. 772-780, Oct. 1971.
[5] K. Andrews, C. Heegard, and D. Kozen, "A theory of interleavers," presented at the IEEE Int. Symp. Inf. Theory, 1997, 97-1634.
[6] R. Garello, G. Montorsi, and G. C. Sergio Benedetto, "Interleaver properties and their applications to the trellis complexity analysis of turbo codes," IEEE Trans. Commun., vol. 49, no. 5, pp. 793-807, May 2001.
[7] WiMedia Alliance, "MAC-PHY Interface Specification 1.0," 2005. [Online]. Available: http://www.wimedia.org
[8] J. Kim, "Interleaver & Deinterleaver for MB-OFDM," Advanced System IC Technology Center (ASTEC), Jul. 2007. [Online]. Available: http://www.astec.re.kr:8080/ipSoC/ipInfo.jsp?ipno=576&left-image=4
[9] X. Jinsong, L. Xiaochun, W. Haitao, B. Yujing, Z. Decai, Z. Xiaolong, and W. Chaogang, "Implementation of MB-OFDM transmitter baseband based on FPGA," in Proc. Int. Conf. Circuits Syst. Commun., May 2008, pp. 50-54.
[10] E. Tell and D. Liu, "A hardware architecture for a multi mode block interleaver," in Proc. Int. Conf. Circuits Syst. Commun., Jun. 2004.
[11] D. F. Miller and W. S. McCormick, "An arithmetic free parallel mixed-radix conversion algorithm," IEEE Trans. Circuits Syst. II, vol. 45, no. 1, pp. 158-162, Jan. 1998.
[12] B. G. Jo and M. H. Sunwoo, "New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 5, pp. 911-919, May 2005.
[13] J. M. Camara, M. Moreto, E. Vallejo, R. Beivide, J. Miguel-Alonso, C. Martinez, and J. Navaridas, "Mixed-radix twisted torus interconnection networks," in Proc. Int. Par. Distr. Processing Symp., Mar. 2007, pp. 1-10.
[14] W. K. Jenkins and E. J. Altman, "Self-checking properties of residue number error checkers based on mixed radix conversion," IEEE Trans. Circuits Syst., vol. 35, no. 2, pp. 159-167, Feb. 1988.
[15] P. V. A. Mohan and A. B. Premkumar, "RNS-to-binary converters for two four-moduli sets $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$ and $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} + 1\}$," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 6, pp. 1245-1254, Jun. 2007.
[16] M. Akkal and P. Siy, "A new mixed radix conversion algorithm MRC-II," J. Syst. Arch., vol. 53, no. 9, pp. 577-586, Sep. 2007.
[17] Xilinx, San Jose, CA, "Virtex-4 Multi-Platform FPGA," 2006. [Online]. Available: http://www.xilinx.com/products