a novel multicore sdr architecture for smart vehicle systems-yao-hua2012


2012 12th International Conference on ITS Telecommunications

A novel multicore SDR architecture for smart vehicle systems

Yao-Hua Chen, Chia-Pin Chen, Pei-Wei Hsu, Chun-fan Wei, Wei-Min Cheng, Hsun-Lun Huang, Tai-Yuan Cheng

Information and Communications Research Laboratories, Industrial Technology Research Institute

Taiwan, R.O.C. {YaoHuaChen, apple.chen, pwhsu, PatrickWei, wmcheng, cf, tychen}@itri.org.tw

Albert Y.P. Chen

SVP/CTO, Inventec

Taiwan, R.O.C. [email protected]

978-1-4673-3070-1/12/$31.00 ©2012 IEEE

Abstract- A transceiver architecture based on multicore software-defined radio (SDR) technology is proposed for the physical-layer inner processing of IEEE 802.11p in intelligent transportation systems (ITS). By localizing data transmissions between adjacent digital signal processors (DSPs), concatenate memories and concatenate buses are introduced to ease the bandwidth requirement for data communication among the cores. The proposed transceiver architecture is verified on an electronic system-level (ESL) virtual platform with two application-specific instruction-set processors (ASIPs). High-level power estimation results are also provided. To enhance the channel estimation and equalization performance of IEEE 802.11p, the capability of the proposed architecture to run a decision feedback algorithm is analyzed.

Keywords-IEEE 802.11p; intelligent transportation systems (ITS); software-defined radio (SDR); application-specific instruction-set processor (ASIP); electronic system-level (ESL)

I. INTRODUCTION

IEEE 802.11p [5] is an approved amendment to the IEEE 802.11 standard that adds wireless access in vehicular environments (WAVE) to support Intelligent Transportation Systems (ITS) applications. Its physical layer specification is almost the same as that of IEEE 802.11a, except that the channel bandwidth and the data rate of IEEE 802.11p are half those of IEEE 802.11a. Intuitively, a transceiver for IEEE 802.11a could be reused for IEEE 802.11p. However, the design constraints of receivers for IEEE 802.11p and for the other IEEE 802.11 standards are quite different. For the other IEEE 802.11 standards, which target low-mobility applications, meeting the data throughput and processing latency requirements is the main design issue. For IEEE 802.11p, by contrast, channel estimation dominates system performance because of the high mobility of vehicular environments. Thus, to achieve maximal hardware sharing, this paper proposes Software-Defined Radio (SDR) technology for implementing 802.11p and the other IEEE 802.11 standards. SDR offers significant advantages over dedicated hardware designs: high flexibility, a short design cycle, and even high performance when combined with hardware accelerators. By selecting the appropriate software modules, different radio applications can coexist in the same equipment. Furthermore, the system can easily be upgraded to new specifications by loading updated software.

SDR architectures can be classified into two categories: reconfigurable architectures and DSP-centered, accelerator-assisted architectures [2]. The second approach offers a high degree of flexibility and can support multiple standards in mobile devices. To meet the throughput and latency requirements of high-data-rate applications, an application-specific instruction-set processor (ASIP) [4] is usually used to cover the common SDR operations. Such application-specific instructions may include complex multiply-accumulate (MAC) and complex butterfly operations [3]. Moreover, multicore architectures are used to further improve processing throughput. Conventionally, data transmission among the DSPs in a multicore system goes through a shared bus with an arbiter, a network with routers and switches, or a cache with synchronization mechanisms. The transmitted data is usually stored in a shared memory attached to the shared bus or network and visible to all DSPs, as shown in Figure 1. Because the shared memory is accessed frequently, it must provide high bandwidth and may become the performance bottleneck of the system. Moreover, complicated arbitration, routing or synchronization mechanisms are required to avoid data collisions.

In this paper, an ASIP for baseband processing is designed together with its instruction set simulator, disassembler and linker. A multicore architecture with concatenate memories and concatenate buses is proposed to ease the bandwidth and throughput requirements of the shared bus and shared memory in conventional multicore systems. An ESL virtual platform is built for functional verification, power estimation and decision feedback analysis of the inner receiver processing of IEEE 802.11p. The paper is organized as follows. Section II presents the proposed universal modem architecture. The processing criteria and the algorithms for the inner receiver of IEEE 802.11p are introduced in Section III. Section IV describes the detailed implementation of the ESL virtual platform for IEEE 802.11p and the simulation results. Conclusions are drawn in Section V.

Figure 1. Conventional architecture (DSPs exchange data through the shared memory; data flow without CC Bus)

Figure 2. Proposed architecture of universal modem (DSPs connected by the CC Bus)

II. PROPOSED UNIVERSAL MODEM ARCHITECTURE

The baseband data processing can be partitioned into two categories: streaming-based processing and block-based processing. Streaming-based processing performs symbol-by-symbol operations, whereas block-based processing must wait until a whole block of data has been collected before it can start. Streaming-based processing includes modulation, demodulation, channel estimation and equalization, while block-based processing includes interleaving, deinterleaving and channel decoding. Based on this partition, a multicore SDR architecture is investigated for a universal modem.
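The difference between the two categories can be illustrated with a minimal Python sketch, assuming generic placeholder operations rather than real modem kernels. A streaming stage emits one output per input symbol, while a block stage buffers until `block_len` items have arrived. The function names, the scaling "equalizer" and the reversing "deinterleaver" are hypothetical stand-ins.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def streaming_stage(symbols: Iterable[T], op: Callable[[T], T]) -> Iterator[T]:
    """Symbol-by-symbol processing: each output is produced as soon as its
    input symbol arrives (e.g. demodulation or equalization)."""
    for s in symbols:
        yield op(s)


def block_stage(symbols: Iterable[T], block_len: int,
                op: Callable[[List[T]], List[T]]) -> Iterator[T]:
    """Block-based processing: nothing is produced until a whole block has
    been collected (e.g. deinterleaving or channel decoding).
    A trailing partial block is held back and never emitted."""
    buf: List[T] = []
    for s in symbols:
        buf.append(s)
        if len(buf) == block_len:
            yield from op(buf)
            buf = []


# Toy usage: scaling stands in for equalization, block reversal for a deinterleaver.
equalized = streaming_stage([1, 2, 3, 4], lambda x: 2 * x)
print(list(block_stage(equalized, 2, lambda b: b[::-1])))  # [4, 2, 8, 6]
```

Chaining generators mirrors the idea that streaming stages can run back to back, while a block stage introduces a buffering point.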

Figure 2 shows an example of the proposed architecture. It comprises DSPs, concatenate memories (CC Mems), a shared memory, a concatenate bus (CC Bus) and a public bus. Accelerating coprocessors may also be included for performance enhancement if necessary. The DSPs are configured to perform the software functions required by the target radio application. The CC Bus connects the DSPs, hardware accelerators and concatenate memories serially, and the public bus connects the DSPs, hardware accelerators and the shared memory. Streaming-based processing is performed by the elements concatenated by the CC Bus, and data for block-based processing is transmitted from the DSPs or hardware accelerators on the CC Bus to the shared memory via the public bus. Block-based processing can start once the block data in the shared memory is ready.

Figure 3. The physical layer frame structure of IEEE 802.11p (short preamble: signal detection, AGC, diversity selection, coarse frequency offset estimation and timing synchronization; long preamble: channel and fine frequency offset estimation; preamble duration 16 + 16 = 32 µs; followed by the SIGNAL field (RATE, LENGTH) and the DATA symbols)

Figure 4. The baseband signal processing of generic receivers for OFDM-modulated IEEE 802.11 standards

The concatenate memory CC Mem_ij is accessible only by the DSPs or hardware accelerators at stages i and j on the CC Bus. By localizing data transmissions between adjacent DSPs or hardware accelerators on the CC Bus, streaming-based data transmission is achieved by passing or exchanging data in the CC Mems. Data for block-based processing, broadcast operations and feedback operations, as well as data traveling between non-adjacent elements on the CC Bus, is stored in the shared memory via the public bus. Since most operations in the inner transceiver processing are streaming-based, the CC Bus and CC Mems can greatly reduce the bandwidth and throughput requirements of the shared bus and shared memory.
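The routing rule above can be sketched as a small Python model. It is a behavioral illustration only, not the paper's SystemC platform: the class and method names are hypothetical, and it assumes streaming data flows forward along the chain, so backward (feedback) traffic is treated like non-adjacent traffic and goes through the shared memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ModemModel:
    """Toy model of the CC Bus / public bus split (hypothetical names).

    Stages 0..n-1 sit in a chain on the CC Bus. CC Mem (i, i+1) is private
    to stages i and i+1; other traffic uses the shared memory on the public bus.
    """
    n_stages: int
    cc_mems: Dict[Tuple[int, int], List[object]] = field(default_factory=dict)
    shared_mem: List[object] = field(default_factory=list)
    public_bus_transfers: int = 0

    def __post_init__(self) -> None:
        for i in range(self.n_stages - 1):
            self.cc_mems[(i, i + 1)] = []

    def send(self, src: int, dst: int, item: object, block_based: bool = False) -> str:
        # Forward streaming data between adjacent stages stays in their CC Mem.
        if dst == src + 1 and not block_based:
            self.cc_mems[(src, dst)].append(item)
            return "cc_mem"
        # Block-based, broadcast, feedback and non-adjacent traffic falls back
        # to the shared memory and costs one public-bus transfer.
        self.shared_mem.append(item)
        self.public_bus_transfers += 1
        return "shared_mem"

    def read_cc_mem(self, stage: int, pair: Tuple[int, int]) -> List[object]:
        # CC Mem_ij is accessible only by the elements at stages i and j.
        if stage not in pair:
            raise PermissionError(f"stage {stage} cannot access CC Mem {pair}")
        return self.cc_mems[pair]


modem = ModemModel(n_stages=3)
print(modem.send(0, 1, "symbol"))     # cc_mem
print(modem.send(0, 2, "broadcast"))  # shared_mem
print(modem.public_bus_transfers)     # 1
```

Counting `public_bus_transfers` makes the bandwidth argument concrete: only traffic that cannot be localized touches the shared bus.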

III. INNER RECEIVER OF IEEE 802.11P

The physical layer frame structure of IEEE 802.11p is given in Figure 3. Each frame contains three fields: preamble, signal and data. The preamble field is used for signal detection, automatic gain control (AGC), timing synchronization, initial channel estimation and so on. The signal field carries information about the data field, such as the data length and data rate. The data field carries the baseband-processed OFDM symbols of the user data.

A. The timing-related parameters of IEEE 802.11p

For IEEE 802.11p, 80 subcarriers (including the cyclic prefix) must be processed by the streaming-based processing within the duration of one OFDM symbol, which is 8 µs. The throughput criterion is therefore 10M subcarriers per second. The latency criterion for baseband processing is determined by the SIFS (short interframe space), the short time interval between a data frame and its acknowledgement. For 802.11p, the SIFS is 32 µs.
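These timing parameters translate directly into cycle budgets. The sketch below uses the 240 MHz modem clock chosen in Section IV and assumes that the 16 µs latency row of TABLE I is the part of the SIFS left to the inner receiver after the outer receiver's 16 µs share; the constant names are illustrative.

```python
CLOCK_MHZ = 240.0            # modem clock chosen in Section IV
SUBCARRIERS_PER_SYMBOL = 80  # 64 subcarriers + 16-sample cyclic prefix
SYMBOL_US = 8.0              # OFDM symbol duration of IEEE 802.11p
SIFS_US = 32.0
OUTER_RX_US = 16.0           # assumed share of the SIFS for the outer receiver


def cycles(duration_us: float, clock_mhz: float = CLOCK_MHZ) -> int:
    """Number of clock cycles available in a time window."""
    return round(duration_us * clock_mhz)


throughput_msps = SUBCARRIERS_PER_SYMBOL / SYMBOL_US  # 10.0 M subcarriers/s
clock_ratio = CLOCK_MHZ / throughput_msps              # 24.0 cycles per subcarrier
budgets = {
    "throughput (per OFDM symbol)": cycles(SYMBOL_US),        # 1920
    "SIFS": cycles(SIFS_US),                                  # 7680
    "inner-receiver latency": cycles(SIFS_US - OUTER_RX_US),  # 3840
}
print(throughput_msps, clock_ratio, budgets)
```

The three budgets match the cycle-count column of TABLE I, which is why the 240 MHz clock leaves 24 cycles per subcarrier.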

Figure 5. The pilot allocation of IEEE 802.11p (frequency axis: subcarrier index 0 to 63, with pilots at 7, 21, 43 and 57; time axis: OFDM symbols)

B. The baseband signal processing of the receiver

The baseband processing of a generic OFDM-based receiver for the IEEE 802.11 standards is shown in Figure 4. The processing can be divided into two parts: an inner part and an outer part [6]. The inner part handles the streaming-based processing, such as carrier synchronization, channel estimation and channel compensation. The outer part handles the block-based processing, such as deinterleaving and error correction. In this paper, the outer receiver is implemented with ASIC hardware accelerators because of its high computational requirements, and we focus on the ASIP design for the inner receiver.

C. The decision feedback equalization for IEEE 802.11p

Since IEEE 802.11p is designed for outdoor, high-mobility environments, the channel coherence time may be shorter than a packet transmission time, and the Doppler effects are significant. The pilot density is therefore crucial for channel estimation. Figure 5 shows the pilot allocation for IEEE 802.11p, with grey squares representing pilot positions. Let Nf and Nt be the maximal spacing of adjacent pilots in the frequency domain and the time domain, respectively. From Figure 5, Nf = 14 and Nt = 1 for IEEE 802.11p. To fulfill the sampling theorem for channel estimation [13], Nf and Nt must satisfy the following conditions:

$$
N_f \le N_{f,\min} = \frac{N}{L}, \qquad N_t \le N_{t,\min} = \frac{1}{2 f_d \left(1 + \frac{L}{N}\right)} \tag{1}
$$

where N is the number of subcarriers in an OFDM symbol, L is the number of delay-spread taps, and f_d is the normalized Doppler frequency. For a vehicle speed of 120 km/h, Nt,min = 111 and Nf,min = 10. Since Nf = 14 exceeds Nf,min, the pilot allocation of IEEE 802.11p cannot meet the channel estimation constraints in (1) at 120 km/h. To enhance the channel estimation performance of the IEEE 802.11p receiver, a decision feedback algorithm is used to increase the pilot density: the hard decisions on the demodulated data symbols serve as pseudo pilots for re-estimating the channel equalizer. Figure 6 shows the block diagram of the decision feedback processing in the IEEE 802.11p inner receiver.
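Equation (1) can be evaluated numerically. The sketch below takes Nf = 14 and Nt = 1 from Figure 5 and, as assumptions not stated in the text, uses a 5.9 GHz carrier, a useful symbol time of 6.4 µs (64-point FFT at 10 MHz) to normalize the Doppler shift, and a hypothetical delay spread of L = 6 taps. The exact bounds therefore differ slightly from the values quoted above, but the conclusion is the same.

```python
from typing import Tuple

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def pilot_spacing_limits(n_sub: int, taps: int, speed_kmh: float,
                         carrier_hz: float, t_useful_s: float) -> Tuple[float, float]:
    """Evaluate the bounds of equation (1) and return (Nf_min, Nt_min).

    fd is the maximum Doppler shift normalized to the subcarrier spacing,
    i.e. multiplied by the useful (FFT) symbol time.
    """
    doppler_hz = (speed_kmh / 3.6) * carrier_hz / SPEED_OF_LIGHT
    fd = doppler_hz * t_useful_s
    nf_min = n_sub / taps
    nt_min = 1.0 / (2.0 * fd * (1.0 + taps / n_sub))
    return nf_min, nt_min


# Assumed scenario: 5.9 GHz carrier, 6.4 us useful time, hypothetical L = 6 taps.
nf_min, nt_min = pilot_spacing_limits(64, 6, 120.0, 5.9e9, 6.4e-6)
NF, NT = 14, 1  # IEEE 802.11p pilot spacing read from Figure 5
print(f"Nf_min={nf_min:.1f}, Nt_min={nt_min:.0f}")
print("frequency OK:", NF <= nf_min, "time OK:", NT <= nt_min)
```

Under these assumptions the time-domain bound is easily met, while the frequency-domain bound is violated, which is the gap the pseudo pilots are meant to close.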

Figure 6. The decision feedback processing of IEEE 802.11p inner receiver
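A per-subcarrier decision feedback pass can be sketched in Python, following the four steps listed later in TABLE III (phase compensation, hard decision, re-estimate equalizer, update equalizer). This is not the authors' ASIP code: it assumes an unnormalized 16QAM grid, a least-squares pseudo-pilot estimate, and a hypothetical smoothing factor `alpha` for the equalizer update.

```python
import cmath
from typing import List

LEVELS = (-3.0, -1.0, 1.0, 3.0)  # unnormalized 16QAM levels per axis


def slice_16qam(z: complex) -> complex:
    """Hard decision: nearest point of the unnormalized 16QAM grid."""
    re = min(LEVELS, key=lambda a: abs(z.real - a))
    im = min(LEVELS, key=lambda a: abs(z.imag - a))
    return complex(re, im)


def decision_feedback(rx: List[complex], h_est: List[complex],
                      phase: float, alpha: float = 0.5) -> List[complex]:
    """One decision-feedback pass over the data subcarriers of one symbol.

    rx     -- received data subcarriers
    h_est  -- current channel estimate per subcarrier
    phase  -- common phase error tracked from the pilots (rad)
    alpha  -- hypothetical smoothing factor for the equalizer update
    """
    derotate = cmath.exp(-1j * phase)
    updated = []
    for y, h in zip(rx, h_est):
        y_c = y * derotate              # 1) phase compensation
        x_hat = slice_16qam(y_c / h)    # 2) hard decision on the equalized symbol
        h_pp = y_c / x_hat              # 3) re-estimate with x_hat as a pseudo pilot
        updated.append((1 - alpha) * h + alpha * h_pp)  # 4) update the equalizer
    return updated


h_true, x = 0.8 + 0.3j, 3 - 1j
rx = [h_true * x * cmath.exp(0.1j)]  # channel plus a 0.1 rad phase error
print(decision_feedback(rx, [0.75 + 0.25j], phase=0.1))  # close to 0.775+0.275j
```

As long as the hard decision is correct, the pseudo pilot gives an exact channel sample; the smoothing step limits the damage when a decision is wrong.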

IV. MULTICORE SDR VIRTUAL PLATFORM FOR IEEE 802.11P

Figure 7. The virtual platform of IEEE 802.11p receiver.

To examine whether the proposed architecture meets the timing parameters of IEEE 802.11p, an ESL virtual platform is introduced. The instruction set architecture (ISA) of PACDSP [8] is selected as the initial reference. Analysis of the IEEE 802.11p assembly code shows that the sine and cosine calculations for frequency offset compensation consume too many instructions and cycles. To meet the timing criterion of IEEE 802.11p, a CORDIC hardware accelerator is therefore introduced to perform the sine, cosine and phase calculations.
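The CORDIC algorithm computes sine, cosine (rotation mode) and phase (vectoring mode) with only shifts, adds and a small angle table. The floating-point sketch below illustrates the technique; a hardware CORDIC would use fixed-point shifts, and the iteration count `N_ITER = 24` and the restricted input ranges (no quadrant pre-rotation) are assumptions.

```python
import math
from typing import Tuple

N_ITER = 24
# Micro-rotation angles atan(2^-i) and the gain correction for N_ITER steps.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
K = 1.0
for _i in range(N_ITER):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * _i))


def cordic_sin_cos(theta: float) -> Tuple[float, float]:
    """Rotation mode: returns (sin(theta), cos(theta)) for |theta| <= pi/2."""
    x, y, z = K, 0.0, theta
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0          # rotate toward the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x


def cordic_phase(x: float, y: float) -> float:
    """Vectoring mode: returns atan2(y, x) for x > 0."""
    z = 0.0
    for i in range(N_ITER):
        d = -1.0 if y >= 0 else 1.0          # rotate the vector toward the x-axis
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return z


s, c = cordic_sin_cos(0.5)
print(s, c, cordic_phase(c, s))  # close to sin(0.5), cos(0.5) and 0.5
```

Each iteration only needs a shift by `i` bits and an add, which is why offloading this loop to an accelerator saves many ASIP cycles.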

A. The ASIP

Three types of application-specific instructions are implemented in the ISA of the ASIP. The first type comprises complex vector instructions, such as complex vector multiplication and addition. The second type accelerates the FFT and includes radix-2 and radix-4 complex butterflies and bit-reorder instructions. The last type speeds up the soft-output calculation when demapping QAM constellations [9]; for example, the instruction "subabs" computes X - ABS(Y) for two inputs X and Y.
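To show why a "subabs" instruction helps, the sketch below models it in Python and uses it in a simplified soft-output demapper of the kind cited above. It assumes the IEEE 802.11 Gray mapping, unnormalized constellation levels (odd integers), and the convention that a positive soft value favours bit 1; scaling by the noise variance is omitted. The function names are illustrative, not the paper's assembly mnemonics.

```python
from typing import Tuple


def subabs(x: float, y: float) -> float:
    """Behavioral model of the ASIP instruction 'subabs': X - ABS(Y)."""
    return x - abs(y)


def soft_bits_16qam_axis(y: float) -> Tuple[float, float]:
    """Soft outputs for one axis of 16QAM with levels -3, -1, 1, 3."""
    return (y, subabs(2.0, y))


def soft_bits_64qam_axis(y: float) -> Tuple[float, float, float]:
    """Soft outputs for one axis of 64QAM with odd levels from -7 to 7."""
    b1 = subabs(4.0, y)        # 4 - |y|
    b2 = subabs(2.0, b1)       # 2 - |4 - |y||
    return (y, b1, b2)


print(soft_bits_16qam_axis(-1.0))  # (-1.0, 1.0)
print(soft_bits_64qam_axis(5.0))   # (5.0, -1.0, 1.0)
```

Each inner soft bit costs a single subabs, and the 64QAM case simply chains two of them, so the demapper needs no branches or comparisons.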

To verify the proposed processor, the Language for Instruction-Set Architectures (LISA) [10] is used to generate the instruction set simulator (ISS). LISA is a mixed behavioral/structural modeling language for programmable processor architectures with peripherals and interfaces. The commercial tool Synopsys Processor Designer (PD) [11] is used to generate the tool chain (assembler, linker, simulator and debugger) of the DSP. The assembly code for each ASIP is verified on the ISS generated by PD before being integrated into the multicore system.

Figure 8. Task partition of DSP1 and DSP2 for the IEEE 802.11p inner receiver

B. The ESL virtual platform

To verify the functionality of the IEEE 802.11p inner receiver, an ESL virtual platform based on Synopsys Platform Architect (PA) [12] is built, as shown in Figure 7. The virtual platform uses a heterogeneous multicore architecture composed of an ARM926 processor, two DSPs, two CC Mems, two CORDICs, two CC Buses, an input memory (InMem), an output memory (OutMem) and the shared memory (Shared Memory). Figure 8 shows the task partition of DSP1 and DSP2.

The InMem and OutMem in Figure 7 store the received data from the digital front end and the results of the inner receiver, respectively. Common information and the data for block-based operations are stored in the Shared Memory. The ARM® processor, built from a commercial IP in Synopsys Platform Architect, is used only to initialize and trigger the two DSPs in this virtual platform. The functional SystemC models of the ASIPs generated by Synopsys Processor Designer are used for the two DSPs. The CORDIC hardware accelerator is modeled in SystemC at the transaction level (TLM). The CC Bus and the hardware interfaces in the virtual platform are modeled in compliance with the TLM 2.0 standard, and all memories are modeled as SystemC storage arrays. The simulation results on the ESL virtual platform show that the public bus is active only when the ARM processor initializes the two DSPs or when the DSPs access the MCS (modulation and coding scheme) processed by the outer receiver. Thus the bandwidth requirement of the shared bus and shared memory is greatly reduced compared with the conventional multicore architecture.

C. Simulation Results

Since the throughput criterion of IEEE 802.11p is 10M subcarriers per second, the clock frequency of the universal modem system is set to 240 MHz, 24 times the throughput criterion. TABLE I shows the simulation results for the system using the generic baseband processing of receivers for OFDM-modulated IEEE 802.11 standards in Figure 4. Both the throughput and latency criteria are met. Moreover, DSP1 retains a timing margin of nearly 40% (1920 - 1203 = 717 cycles per OFDM symbol), which can be used for the decision feedback algorithm to enhance the channel estimation of IEEE 802.11p.

D. High-level power estimation

To estimate the power consumption of the proposed architecture, an instruction-based power analysis is applied.

TABLE I. SIMULATION RESULTS OF THE PROPOSED SYSTEM

Item                                                         Time (µs)   Cycle count
Throughput criterion (per OFDM symbol)                       8           1920
SIFS                                                         32          7680
Latency criterion (16 µs reserved for the outer receiver)    16          3840
DSP1 averaged usage                                          5           1203
DSP2 averaged usage                                          4.6         1100
Universal modem total latency                                9.6         2303

TABLE II. INSTRUCTION-BASED POWER ANALYSIS

Type     Instructions                     Cycle count   Approximated power (mW)
Type 1   Complex vector multiplication    C1            P1 = 26
Type 2   Hardware accelerator             C2            P2 = 15.38
Type 3   Memory access                    C3            P3 = 6.5
Type 4   Arithmetic                       C4            P4 = 7.26
Type 5   Others                           C5            P5 = 0.244

The instructions are classified into five types according to their function, as shown in TABLE II. The total cycle count of the simulation is C = C1 + C2 + C3 + C4 + C5, the total consumed energy is E = C1*P1 + C2*P2 + C3*P3 + C4*P4 + C5*P5, and the average power is P = E/C. In this paper, the power figures are derived from the TSMC CLN90G 90 nm specifications. For Type 1 and Type 4 instructions, the power is estimated as that of sixteen and four 16*16 multipliers, respectively. For Type 2 and Type 3 instructions, the power is approximated by the active power of the CC Mem, a 928*32 single-port SRAM, plus that of five and one 16*16 multipliers, respectively. Type 5 instructions involve no arithmetic operations, so their power is estimated by the idle power of the CC Mem. Figure 9 shows the instruction statistics of the two DSPs in the proposed system according to the instruction classification in TABLE II. Figure 10 shows the high-level power analysis results of the proposed architecture. The average active powers of DSP1 and DSP2 are 4.6 mW and 7.7 mW, respectively. Thus the total active power consumption of the proposed system is about 123.6 mW for IEEE 802.11p inner receiver processing.
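The instruction-based model P = E/C can be written in a few lines of Python using the per-type powers of TABLE II. The cycle counts in the usage example are made up for illustration; they are not the instruction statistics of Figure 9.

```python
from typing import Dict

# Per-instruction-type power figures from TABLE II (mW).
POWER_MW: Dict[int, float] = {1: 26.0, 2: 15.38, 3: 6.5, 4: 7.26, 5: 0.244}


def average_power_mw(cycle_counts: Dict[int, int]) -> float:
    """P = E / C with E = sum(C_i * P_i) and C = sum(C_i).

    E has units of mW x cycles; only the ratio E / C is needed
    for the average power.
    """
    total_cycles = sum(cycle_counts.values())
    energy = sum(count * POWER_MW[t] for t, count in cycle_counts.items())
    return energy / total_cycles


hypothetical = {1: 100, 5: 100}  # made-up cycle counts, not Figure 9's data
print(round(average_power_mw(hypothetical), 3))  # 13.122
```

The model is a weighted average: a DSP that spends most cycles on Type 5 ("Others") instructions has an average power close to the CC Mem idle power.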

E. Decision Feedback Analysis

According to the simulation results in TABLE I, DSP1's timing margin of nearly 40% (717 of 1920 cycles per OFDM symbol) can be used for the decision feedback algorithm to enhance the channel estimation performance of IEEE 802.11p. TABLE III shows the cycle count analysis when the decision feedback algorithm in Figure 6 is applied. For data modulated by 16QAM, 45 extra cycles per subcarrier are needed to perform the decision feedback operations, so 15 subcarriers can be used as pseudo pilots for channel estimation enhancement. Similarly, for data modulated by 64QAM, 57 extra cycles per subcarrier are needed and 12 subcarriers can be used as pseudo pilots. If all the pseudo pilots are distributed evenly, Nf = 4, so the pilot allocation criterion required by the sampling theorem for channel estimation is satisfied.

Figure 9. The instruction statistics for IEEE 802.11p inner receiver (DSP1 and DSP2 instruction statistics)

Figure 10. The active power analysis for IEEE 802.11p inner receiver (DSP1 and DSP2 power analysis)
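The pseudo-pilot counts follow from dividing DSP1's spare cycles by the per-subcarrier cost in TABLE III. The sketch below reproduces that arithmetic; it assumes that the 4 real pilots and the pseudo pilots are spread evenly over all 64 subcarriers, with the resulting spacing rounded up.

```python
import math

CYCLES_PER_SYMBOL = 1920  # 8 us at 240 MHz (TABLE I)
DSP1_CYCLES = 1203        # DSP1 averaged usage (TABLE I)
DF_CYCLES_PER_SUBCARRIER = {       # TABLE III: phase comp. + decision + re-estimate + update
    "16QAM": 2 + 20 + 4 + 19,      # 45
    "64QAM": 2 + 32 + 4 + 19,      # 57
}
N_SUBCARRIERS = 64
N_PILOTS = 4

margin = CYCLES_PER_SYMBOL - DSP1_CYCLES  # 717 spare cycles per OFDM symbol


def pseudo_pilots(mod: str) -> int:
    """Subcarriers whose decision feedback fits into DSP1's spare cycles."""
    return margin // DF_CYCLES_PER_SUBCARRIER[mod]


def pilot_spacing(mod: str) -> int:
    """Frequency-domain pilot spacing Nf with evenly spread (pseudo) pilots."""
    return math.ceil(N_SUBCARRIERS / (N_PILOTS + pseudo_pilots(mod)))


for mod in ("16QAM", "64QAM"):
    print(mod, pseudo_pilots(mod), pilot_spacing(mod))  # 15 4, then 12 4
print(f"margin = {margin} cycles ({margin / CYCLES_PER_SYMBOL:.1%})")
```

The 64QAM case is the binding one: 12 pseudo pilots plus the 4 pilots give 16 evenly spaced pilots, hence Nf = 4.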

V. CONCLUSIONS

In this paper, a multicore SDR architecture targeting the IEEE 802.11p standard in ITS is proposed. Because of the increasing shared-bus bandwidth requirements of multicore SDR platforms, a platform with concatenate buses and concatenate memories is proposed to avoid the bandwidth bottleneck. The proposed transceiver architecture is verified on an electronic system-level (ESL) virtual platform with two application-specific instruction-set processors (ASIPs). According to the simulation results, both the throughput and latency requirements are satisfied. High-level power estimation results are also provided. To enhance the channel estimation and equalization performance of 802.11p, the capability of the proposed architecture to run a decision feedback algorithm is analyzed.

REFERENCES

[1] T. Ulversoy, "Software defined radio: challenges and opportunities," IEEE Communications Surveys & Tutorials, vol. 12, no. 4, pp. 531-550, 2010.

[2] U. Ramacher, "Software-defined radio prospects for multistandard mobile phones," IEEE Computer, vol. 40, no. 10, pp. 62-69, 2007.

[3] A. Nilsson, E. Tell, D. Liu, "An 11 mm², 70 mW Fully Programmable Baseband Processor for Mobile WiMAX and DVB-T/H in 0.12 µm CMOS," IEEE Journal of Solid-State Circuits, vol. 44, no. 1, pp. 90-97, 2009.

[4] G. Xuan, "Hierarchical design of an application-specific instruction set processor for high-throughput and scalable FFT processing," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 3, pp. 551-563, 2012.

[5] IEEE P802.11p: Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Amendment 6: Wireless Access in Vehicular Environments, IEEE Std. 802.11p-2010.

TABLE III. DECISION FEEDBACK CYCLE COUNT ANALYSIS

Function                             Cycle counts (16QAM data)   Cycle counts (64QAM data)
Phase compensation                   2                           2
Hard decision                        20                          32
Re-estimate equalizer                4                           4
Update equalizer                     19                          19
Total cycle counts per subcarrier    45                          57

[6] M. Sandell and O. Edfors, "A comparative study of pilot-based channel estimators for wireless OFDM," Sep. 1996. (http://www.sm.luth.se/csee/sp/research/report/sae96r.pdf)

[7] M. Speth, S. A. Fechtel, G. Fock, H. Meyr, "Optimum receiver design for wireless broad-band systems using OFDM. Part I," IEEE Transactions on Communications, vol. 47, no. 11, pp. 1668-1677, 1999.

[8] T. Vogt, N. Wehn, "A reconfigurable ASIP for convolutional and Turbo decoding in an SDR environment," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 10, pp. 1309-1320, 2008.

[9] C.-N. Liu, "Optimization techniques of AAC decoder on PACDSP VLIW processor," IEEE International Symposium on Circuits and Systems, pp. 1468-1471, May 2008.

[10] F. Tosato, P. Bisaglia, "Simplified soft-output demapper for binary interleaved COFDM with application to HIPERLAN/2," IEEE International Conference on Communications, vol. 2, pp. 664-668, Aug. 2002.

[11] U. Meyer-Baese, G. Botella, S. Mookherjee, E. Castillo, A. Garcia, "Energy Optimization of Application-Specific Instruction-Set Processors by using Hardware Accelerators in Semicustom ICs Technology," Microprocessors and Microsystems, vol. 36, no. 2, pp. 127-137, 2012.

[12] Synopsys Inc., Synopsys Processor Designer, http://www.synopsys.com/Systems/BlockDesign/processorDev/Pages/default.aspx

[13] Synopsys Inc., Synopsys Platform Architect, http://www.synopsys.com/Systems/ArchitectureDesign/pages/PlatformArchitect.aspx