cognitive radio baseband processing on a reconfigurable platform




Physical Communication 2 (2009) 33–46

Contents lists available at ScienceDirect

Physical Communication

journal homepage: www.elsevier.com/locate/phycom

Full length article

Cognitive Radio baseband processing on a reconfigurable platform

Q. Zhang ∗, A.B.J. Kokkeler, G.J.M. Smit, K.H.G. Walters

Department of Electrical Engineering, Mathematics and Computer Science, P.O. Box 217, 7500 AE, Enschede, The Netherlands

Article info

Keywords: Cognitive Radio; Reconfigurable platform; Montium

Abstract

Cognitive Radio is considered a promising technology to address the paradox of spectrum scarcity and spectrum under-utilization. It has to operate in different bands at various data rates and combat adverse channel conditions. Therefore, Cognitive Radio needs an adaptive physical layer, which has to be supported by a reconfigurable baseband processing platform. In this paper, we propose a Multiprocessor System-on-Chip (MPSoC) platform to fulfill the requirements of reconfigurability, speed and energy efficiency. The key element of this platform is the Montium tile processor. Our main interest is supporting adaptive DSP algorithms for Cognitive Radio on this platform. This paper discusses several key algorithms for Cognitive Radio baseband processing and their implementation on the Montium-based reconfigurable platform.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Multimedia wireless applications have been increasing rapidly in recent years, and this trend will continue in the future. The large demand for radio spectrum will leave no room to accommodate new wireless applications. However, recent studies have shown that most of the assigned radio spectrum is under-utilized. Cognitive Radio [1] is considered a promising technology to address the paradox of spectrum scarcity and spectrum under-utilization. In Cognitive Radio, a spectrum sensing process locates the unused spectrum segments in a targeted spectrum pool. These segments will be used optimally without harmful interference to licensed users (users who hold the legal license for the spectrum). This technology is called spectrum pooling [2]. Cognitive Radio has to operate in different bands at various data rates and combat adverse channel conditions. Therefore, Cognitive Radio needs an adaptive physical layer, which has to be supported by a reconfigurable baseband processing platform. This platform should offer high performance, reconfigurability and energy efficiency. To address these challenges, we propose a reconfigurable multiprocessor architecture for the baseband processing of Cognitive Radio. We investigate several key algorithms for Cognitive Radio baseband processing and implement them on the proposed platform.

Our work is part of the Adaptive Ad-hoc Freeband (AAF) project [3]. The aim of the AAF project is to design a Cognitive Radio based wireless ad-hoc network for emergency situations. The large amounts of multimedia data in emergency networks during disasters require a large amount of radio resources. To alleviate this spectrum shortage problem, a radio which dynamically accesses free spectrum resources turns out to be an interesting solution.

We focus on the design of the physical layer, which has a profound impact on the feasibility of communication processes at higher layers. The physical layer of a Cognitive Radio node considered in this paper includes the data transmission and signaling physical layer and the spectrum sensing physical layer. Multicarrier based techniques are considered for the data transmission of AAF radio nodes. The idea of spectrum pooling can be applied to such a multicarrier system by switching subcarriers on and off. Some requirements, application scenarios and system parameters for an OFDM based AAF node are considered in [4]. The physical layer sensing focuses on signal estimation and detection problems. We mainly consider two detection methods: energy detection and feature detection. Energy detection is based on the signal energy distribution in the frequency domain. Feature detection exploits signal features for detection.

The paper is organized as follows. Section 2 introduces our proposed multiprocessor architecture for Cognitive Radio and an example System-on-Chip (SoC). In Section 3, we discuss several interesting baseband algorithms for Cognitive Radio and how these algorithms are mapped onto the proposed architecture. We present a prototype demonstration platform in Section 4. Finally, the paper is concluded in Section 5.

∗ Corresponding address: University of Twente, Department of Electrical Engineering, Mathematics and Computer Science, Zilverling 4047, P.O. Box 217, 7500 Enschede, AE, Netherlands. Tel.: +31 4893788. E-mail address: [email protected] (Q. Zhang).

1874-4907/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.phycom.2009.02.008

2. A reconfigurable multiprocessor architecture for cognitive radio

Current General Purpose Processor (GPP) and Digital Signal Processor (DSP) based Software Defined Radio (SDR) platforms are inadequate for future high data rate wireless communications in terms of throughput and energy efficiency for battery-operated terminals. With the advance of semiconductor technology, wireless baseband processors are moving toward Multiprocessor Systems-on-Chip (MPSoCs), which integrate heterogeneous processing elements tailored to different processing tasks.

2.1. A template tile MPSoC architecture

As already foreseen by Mitola [1], Cognitive Radio is the final point of software-defined radio platform evolution: a fully reconfigurable radio that changes its communication functions depending on network and/or user demands. It is interesting to consider using MPSoCs (also known as multi-core architectures) to support Cognitive Radio baseband processing in the future. Therefore, we propose a tiled MPSoC architecture (Fig. 1) which includes different interconnected heterogeneous tile processors: bit-level reconfigurable tiles (e.g. embedded FPGAs), word-level reconfigurable cores (e.g. Domain Specific Reconfigurable Hardware (DSRH)), general-purpose programmable cores (e.g. DSPs and microprocessor cores) and memory blocks. The reason for heterogeneity is that, typically, some algorithms run more efficiently on bit-level (fine-grained) reconfigurable architectures (e.g. PN-code generation), some on DSP-like architectures, and some perform optimally on word-level (coarse-grained) reconfigurable architectures (e.g. FIR filters or FFT algorithms). The general-purpose core is well suited for control-oriented tasks. Application designers or high-level compilers can choose the most efficient processing core for the type of processing needed for a given application task.

Fig. 1. Heterogeneous multiprocessor tile SoC.

A multi-core architecture has a number of advantages:

• It is a future-proof architecture, as the processing cores do not grow in complexity with technology. Instead, as technology scales, the number of cores on the chip simply grows.
• A multi-core organization can contribute to the energy efficiency of a SoC. The best energy savings can be obtained by simply switching off cores that are not used, which also helps to reduce the static power consumption. Furthermore, the processing of local data in small autonomous cores abides by the locality of reference principle. Moreover, a core processor might not need to run at full clock speed to achieve the required Quality-of-Service (QoS) at a particular moment in time.
• When one of the cores is discovered to be defective (either due to a manufacturing fault or discovered at operating time by built-in diagnosis), this defective core can be switched off and isolated from the rest of the design.
• A multi-core approach also eases verification of an integrated circuit design, since the design of identical cores only has to be verified once. The design of a single core is relatively simple, and therefore a lot of effort can be put into (area/power) optimizations at the physical level of integrated circuit design.
• The computational power of a multi-core architecture scales linearly with the number of cores. The more cores there are on a chip, the more computations can be done in parallel (provided that the network capacity scales with the number of cores and there is sufficient work).
• Although cores operate together in a complex system, an individual tile operates quite autonomously. In a multi-core architecture every processing core is configured independently. In fact, a core is a natural unit of partial reconfiguration. Unused cores can be configured for new tasks while other cores continue performing their tasks. That is to say, a multi-core architecture can be reconfigured partially and dynamically.

A heterogeneous multi-core architecture combines performance, flexibility and energy efficiency. It supports high performance through massive parallelism, it matches the computational model of the algorithm with the granularity and capabilities of the processing entity, and it can operate at minimum supply voltage and clock frequency. Hence it provides energy efficiency and flexibility at the right granularity, only when and where needed and desirable.

Dynamic reconfiguration is another important feature of the multiprocessor architecture, since Cognitive Radio requires the system to adapt dynamically to a changing environment in real time. Reconfigurability also has an economic motivation: it will be important to have a fast track from initial ideas to the final design. Time to market is crucial. If the design process takes too long, the return on investment will be lower. The combination of high-level design tools and reconfigurable hardware architectures will enable designers to develop highly flexible, efficient and adaptive systems and applications for future systems such as Cognitive Radio.

A multi-core architecture has to be supported by a predictable inter-core communication network, e.g. a Network-on-Chip (NoC). A NoC that routes data items has a higher bandwidth than an on-chip bus, as it supports multiple concurrent communications. The well-controlled electrical parameters of an on-chip interconnection network enable the use of high-performance circuits that result in significantly lower power dissipation, higher propagation velocity and higher bandwidth than is possible with conventional circuits.

Fig. 2. Block diagram of the Annabelle chip.

2.2. Case study: Annabelle SoC

Based on the template architecture in Fig. 1, a SoC called Annabelle has been developed in our group and was fabricated in ATMEL's proprietary 130 nm process.

In the Annabelle SoC (see Fig. 2), a conventional ARM926 architecture is complemented by ASIC blocks (for example Viterbi and DDC) and four domain-specific coarse-grained reconfigurable Montium cores [5]. The key issue in the design of future SoC platforms for streaming applications is to find a good balance between flexibility and high processing power on one side, and area and energy efficiency of the implementation on the other side. The key component of the Annabelle SoC is the Montium processor, developed by the University of Twente and recently commercialized by Recore Systems [6].

Fig. 3. The Montium tile processor.

2.2.1. Montium

The Montium targets the 16-bit DSP algorithm domain. A single Montium core is depicted in Fig. 3. At first glance, the Montium architecture bears a resemblance to a Very Long Instruction Word (VLIW) processor. However, the control structure of the Montium is optimized to minimize the control overhead, which is imperative for energy efficiency. The lower part of Fig. 3 shows the Communication and Configuration Unit (CCU) and the upper part shows the reconfigurable Tile Processor (TP). The CCU implements the interface for off-tile communication. The off-tile interface depends on the interconnect technology that is used in the SoC.

The TP is the computing part that can be configured to implement a particular algorithm. The five identical ALUs (ALU1…ALU5) in a tile can exploit spatial concurrency to enhance performance. The data path of the ALUs is 16 bits wide and the ALUs support both signed integer and signed fixed-point arithmetic. This parallelism demands a very high memory bandwidth, which is obtained by having 10 local memories (M01…M10) in parallel. The local memories imply a good locality of reference. A relatively simple sequencer controls the entire tile processor. The sequencer selects configurable tile instructions that are stored in the decoders (see Fig. 3).

Each local SRAM is 16 bits wide and has a depth of 1024 positions, which adds up to a storage capacity of 16 Kbit per local memory. A reconfigurable address generation unit (AGU) accompanies each memory. The AGU can generate the most frequently used address patterns, but when needed an ALU can also generate address patterns. It is also possible to use the memory as a lookup table for complicated functions that cannot be calculated using an ALU, such as sine or division (with one constant). A memory can be used for both integer and fixed-point lookups.

Each of the four 16-bit inputs of an ALU has a private input register file that can store up to four operands. The input register file cannot be bypassed, i.e. an operand is always read from an input register. Input registers can be written by various sources via a flexible interconnect. Two 16-bit outputs from an ALU are connected to the interconnect. Neighboring ALUs can also communicate directly: the West-output of an ALU connects to the East-input of the ALU neighboring on the left.
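To illustrate the lookup-table use of a local memory described above, the following minimal sketch fills one 1024-entry table with 16-bit sine samples. The Q15 scaling, the saturation behaviour and all names (`to_q15`, `SINE_LUT`, `sine_lookup`) are assumptions made for illustration; the text only states that the memories are 16 bits wide, 1024 positions deep and usable for fixed-point lookups.

```python
import math

DEPTH = 1024       # positions per local memory (from the text)
WORD_BITS = 16     # word width in bits (from the text)
FRAC_BITS = 15     # Q15 scaling: an assumption, not stated in the text


def to_q15(x):
    """Quantise x to a signed 16-bit Q15 integer, saturating at the range limits."""
    v = int(round(x * (1 << FRAC_BITS)))
    return max(-(1 << 15), min((1 << 15) - 1, v))


# One full sine period spread over the whole memory depth.
SINE_LUT = [to_q15(math.sin(2 * math.pi * i / DEPTH)) for i in range(DEPTH)]


def sine_lookup(phase_index):
    """Read the table like a memory: the address wraps around at DEPTH."""
    return SINE_LUT[phase_index % DEPTH]


# Storage used by one table: 1024 words of 16 bits = 16 Kbit.
capacity_bits = DEPTH * WORD_BITS
```

With this layout a single table occupies exactly one local memory, so a sine evaluation costs one memory read instead of an ALU computation.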

2.2.2. CCU network interface and predictable network-on-chip

The CCU is an interface developed to serve the Montium [7]. It provides the functionality to configure the Montium, to manage the memories by means of direct memory access (DMA) and to control the computation of the configured algorithm. Both streaming-mode and block-mode communication are supported by the CCU. On the Annabelle SoC, the ARM926 processor serves as the control processor by sending configuration messages.

A predictable circuit-switched Network-on-Chip (NoC) [8] interconnects the four Montium cores. Circuit switching has been chosen as it simplifies the network interface, because it does not have to embed the data in a specific network protocol and does not have to include the routing information. The connections in the NoC, i.e. the routers' configuration, are controlled via the AMBA bridge. The routers' control interfaces are included in the memory map of the bridge.

2.2.3. Implementation results

At the end of 2007, we received the first samples of the chip. An ASIC synthesis of the Annabelle was performed using ATMEL's proprietary 130 nm process technology. Results obtained with the synthesis show that the Montium has a small footprint of only 3.5 mm² and is very energy efficient for its algorithm domain [9]. Table 1 gives the static power consumption. Table 2 provides the dynamic power consumption in mW/MHz of the various Montium blocks for FFT-512 (denoting a 512-point FFT). Finally, the energy consumption of several DSP algorithms on the Montium and on the ARM926 is compared on Annabelle in Table 3.

3. Cognitive radio baseband processing on the Montium based reconfigurable platform

The baseband processing of Cognitive Radio includes DSP algorithms for spectrum sensing and digital transmission. Since Cognitive Radio is highly adaptive, these algorithms are often adaptive and dynamic, but also computationally complex. Complexity and adaptivity pose a big challenge to the baseband processing platform. With its combination of energy efficiency and reconfigurability, the Montium is a good candidate architecture to support adaptive baseband processing of Cognitive Radio. Since the development of our Montium based MPSoC platform, mapping DSP algorithms onto the Montium has been a continuous effort [10,11]. In particular, we have investigated several popular wireless standards and mapped some of the key baseband processing algorithms onto the Montium based platform in our previous work [11]. In the context of Cognitive Radio, we have considered several key algorithms for physical layer transmission and spectrum sensing. These algorithms include: a novel sparse FFT for OFDM based Cognitive Radio and spectrum sensing; filter bank multicarrier transmission; and cyclostationary feature detection. In this section, we introduce these algorithms and emphasize their importance for Cognitive Radio. The implementation results of these algorithms on the Montium are also presented.

Table 1
Static power consumption of one Montium on Annabelle.

Module       Static power [mW]
Datapath     0.09
Memories     0.06
Sequencer    0.01
Decoders     0.03
CCU          0.01
Total        0.20

Table 2
Dynamic power consumption of one Montium on Annabelle.

Module       FFT-512 [mW/MHz]
Datapath     0.24
Memories     0.27
Sequencer    0.07
Decoders     0.0
CCU          0.02
Total        0.60

Table 3
Energy comparison Montium/ARM926.

Algorithm    Montium [µJ]    ARM926 [µJ]
FIR-5        0.243           –
FFT-512      1.563           30
FFT-1920     5.054           168
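The figures in Tables 1–3 can be cross-checked with a few lines of arithmetic. The sketch below sums the per-module entries, derives the ARM926-to-Montium energy ratio for the two FFT sizes listed for both processors, and estimates total power at a given clock. The dictionary layout and the function name `total_power_mw` are illustrative; the 100 MHz clock in the comment is a hypothetical operating point, not a value from the paper, and the dynamic figures apply to the FFT-512 workload only.

```python
# Per-module values copied from Table 1 (mW) and Table 2 (mW/MHz, FFT-512).
static_mw = {"Datapath": 0.09, "Memories": 0.06, "Sequencer": 0.01,
             "Decoders": 0.03, "CCU": 0.01}
dynamic_mw_per_mhz = {"Datapath": 0.24, "Memories": 0.27, "Sequencer": 0.07,
                      "Decoders": 0.0, "CCU": 0.02}

# Table 3: (Montium, ARM926) energy in microjoules; FIR-5 has no ARM926 entry.
energy_uj = {"FFT-512": (1.563, 30.0), "FFT-1920": (5.054, 168.0)}

static_total = sum(static_mw.values())             # matches the 0.20 mW total
dynamic_total = sum(dynamic_mw_per_mhz.values())   # matches the 0.60 mW/MHz total


def total_power_mw(clock_mhz):
    """Static plus dynamic power of one Montium running FFT-512 at clock_mhz."""
    return static_total + dynamic_total * clock_mhz


# How many times more energy the ARM926 needs for the same algorithm.
energy_ratio = {name: arm / montium for name, (montium, arm) in energy_uj.items()}

# Hypothetical operating point: total_power_mw(100.0) is roughly 60.2 mW.
```

Under these numbers the ARM926 needs roughly 19 times (FFT-512) and 33 times (FFT-1920) the energy of the Montium.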

3.1. A novel sparse FFT for OFDM based Cognitive Radio and multi-resolution sensing

In the context of Cognitive Radio, we proposed a novel reconfigurable sparse FFT [12] for OFDM based spectrum pooling systems. This sparse FFT can reduce the computation of an OFDM system where a large number of subcarriers are switched off. Later, we proposed a multi-resolution energy based spectrum sensing scheme [13] for Cognitive Radio based on this sparse FFT. The scheme enables Cognitive Radio to focus on a small part of the bands of interest with finer resolution at low computational cost.


3.1.1. OFDM based Cognitive Radio

In spectrum pooling [2], an OFDM based Cognitive Radio loads zeros onto the subcarriers which would cause interference to licensed users and loads modulated symbols onto the others. Adaptive bit loading and power loading can be applied to optimally use the remaining spectrum. The bit loading and power loading are determined by the spectrum occupancy information and the channel estimation. The bit loading information is disseminated via a common control channel and is assumed to be constant during the channel coherence time. Based on this OFDM based Cognitive Radio system, we made an important observation: there can be a large number of zero inputs/outputs for the IFFT/FFT when a large part of the spectrum is not available to Cognitive Radio or there are many bad channels. In this case, the normal radix-2 IFFT/FFT is inefficient due to the wasted operations on zeros. In mathematical terms, arrays or matrices where most of the elements have the same value (called the default value, usually 0) and only a few elements have a non-default value are sparse. It is beneficial and often necessary to take advantage of the sparse structure algorithmically to reduce the operations of the standard algorithms. From a system point of view, the FFT and IFFT are the most computationally intensive parts of OFDM transceivers. Therefore, an efficient FFT is the key to reducing the system complexity of OFDM based Cognitive Radio.

The basic idea is to divide a large FFT into smaller FFTs, which offers the opportunity to reduce computation. We explain the algorithm in detail here. The DFT is defined as:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \tag{1}$$

where $W_N^{nk} = e^{-j 2\pi nk/N}$. We consider the case where $L$ outputs are nonzero. Let $N$ be factorized into two integers $N_1$ and $N_2$, so $N = N_1 N_2$. To facilitate the implementation, we make the FFT size $N$ a power-of-two integer. We choose $N_1$ as the nearest power-of-two integer larger than $L$ that is also a factor of $N$, denoted as $N_1 = \lceil L \rceil_{\mathrm{pow2}}$. Therefore, $N$, $N_1$ and $N_2$ are all power-of-two integers. The index $n$ can be written as:

$$n = N_2 n_1 + n_2, \qquad n_1 = 0, 1, \ldots, N_1 - 1, \quad n_2 = 0, 1, \ldots, N_2 - 1. \tag{2}$$

Substituting (2) into (1), the DFT can be rewritten as:

$$X(k) = \sum_{n_2=0}^{N_2-1} \sum_{n_1=0}^{N_1-1} x(N_2 n_1 + n_2)\, W_N^{(N_2 n_1 + n_2)k} = \sum_{n_2=0}^{N_2-1} \left[ \sum_{n_1=0}^{N_1-1} x(N_2 n_1 + n_2)\, W_N^{N_2 n_1 k} \right] W_N^{n_2 k}. \tag{3}$$

Since $N = N_1 N_2$, we have $W_N^{N_2 n_1 k} = W_{N_1}^{n_1 k}$, which is periodic in $k$ with period $N_1$. We define:

$$X_{n_2}(\langle k \rangle_{N_1}) = \sum_{n_1=0}^{N_1-1} x(N_2 n_1 + n_2)\, W_{N_1}^{n_1 k} = \sum_{n_1=0}^{N_1-1} x_{n_2}(n_1)\, W_{N_1}^{n_1 k}, \tag{4}$$

where $\langle \cdot \rangle_{N_1}$ denotes modulo $N_1$. So (3) can be written as:

$$X(k) = \sum_{n_2=0}^{N_2-1} X_{n_2}(\langle k \rangle_{N_1})\, W_N^{n_2 k}. \tag{5}$$

The original $N$-point DFT with $L$ nonzero outputs is decomposed into two major parts: $N_2$ times an $N_1$-point DFT in (4), which can be implemented as $N_2$ times an $N_1$-point FFT, and the multiplications with twiddle factors and recombination of the products in (5). Because the index $k$ takes only $L$ values with nonzero outputs, only $L$ twiddle factors are multiplied with each $X_{n_2}(\langle k \rangle_{N_1})$ for $n_2 = 1, 2, \ldots, N_2 - 1$ (the twiddle factor for $n_2 = 0$ equals 1). This multiplication part results in a computation reduction.

The number of complex multiplications in an $N$-point sparse FFT with $L$ nonzero outputs equals:

$$\mathrm{Mul}_{\mathrm{sparse}} = (N_2 - 1)\, L + \frac{N}{2} \log_2 N_1. \tag{6}$$
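As a rough numerical illustration of Eq. (6), the sketch below evaluates the multiplication count of the sparse FFT next to the usual $(N/2)\log_2 N$ count of a radix-2 FFT. It assumes Eq. (6) reads $(N_2 - 1)L + (N/2)\log_2 N_1$ and that $N_1$ is the smallest power of two not below $L$ (one reading of the ceiling notation); the function names are hypothetical.

```python
def next_pow2(L):
    """Smallest power of two that is >= L (assumed reading of the N1 choice)."""
    return 1 << max(0, (L - 1).bit_length())


def log2_int(n):
    """Exact log2 for a power-of-two integer."""
    return n.bit_length() - 1


def radix2_mults(N):
    """Complex multiplications of a radix-2 FFT: (N/2) * log2(N)."""
    return (N // 2) * log2_int(N)


def sparse_mults(N, L):
    """Complex multiplications of the sparse FFT for L nonzero outputs, Eq. (6)."""
    N1 = next_pow2(L)
    if N1 >= N:                      # L > N/2: computed as a normal N-point FFT
        return radix2_mults(N)
    N2 = N // N1
    return (N2 - 1) * L + (N // 2) * log2_int(N1)


# For N = 2048 and L = 102 (about 5% of the outputs): N1 = 128, N2 = 16.
example = sparse_mults(2048, 102)    # 15 * 102 + 1024 * 7 = 8698
reference = radix2_mults(2048)       # 1024 * 11 = 11264
```

Sweeping `L` from 1 to `N` with these two functions gives a staircase-shaped curve for the sparse FFT, because $N_1$ only changes at powers of two.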

From Eq. (6), we can see that the algorithm is more efficient for small $L$. The complexity increases with the number of nonzeros $L$ and reaches the break-even point when $L = N/2$. Since we set the constraints $N_1 = \lceil L \rceil_{\mathrm{pow2}}$ and $N = N_1 N_2$, the sparse FFT is calculated as a normal $N$-point FFT when $L > N/2$. These constraints are important modifications to the algorithm in [14], in the sense that they facilitate hardware implementations by exploiting power-of-two integers. In [14], the discussion is restricted to the case where only the first $L$ consecutive values are nonzero. Our proposed algorithm can be applied to any distribution of nonzero values.

Assuming an FFT size of $N = 2048$, we compare the complexity of the sparse FFT and the radix-2 FFT in Fig. 4. The sparse FFT has lower complexity than the radix-2 FFT when the nonzero ratio $L/N < 0.5$, and it becomes even more efficient as the nonzero ratio decreases. Clearly, it is more beneficial to use the sparse FFT if a large number of subcarriers are deactivated. Therefore, a dynamically reconfigurable FFT module is needed to switch from a normal radix-2 FFT to a sparse FFT and vice versa, based on the number of deactivated subcarriers. Moreover, this algorithm can be applied to an OFDMA system such as the Cognitive Radio based 802.22 standard, where multiuser data is spread across several non-contiguous bands.
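The decomposition of Eqs. (2)–(5) can be sketched in a few lines and checked against a direct DFT. This is a minimal illustration rather than the Montium implementation: the $N_1$-point blocks are computed with a plain DFT instead of a radix-2 FFT to keep the example short, and the names `dft` and `sparse_dft` are hypothetical.

```python
import cmath


def dft(x):
    """Direct DFT of Eq. (1); used as reference and for the small N1-point blocks."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def sparse_dft(x, wanted, N1):
    """Compute X(k) only for the output indices in `wanted`, following Eqs. (2)-(5).

    N1 must divide len(x); N2 = len(x) // N1 blocks are transformed.
    """
    N = len(x)
    N2 = N // N1
    # Eq. (4): block n2 holds x(N2*n1 + n2), i.e. samples picked with a constant hop N2.
    blocks = [dft(x[n2::N2]) for n2 in range(N2)]
    result = {}
    for k in wanted:
        # Eq. (5): recombine the blocks with the twiddle factors W_N^(n2*k).
        result[k] = sum(blocks[n2][k % N1] * cmath.exp(-2j * cmath.pi * n2 * k / N)
                        for n2 in range(N2))
    return result


# Small usage example: 16-point input, three wanted outputs, N1 = 4 (so N2 = 4).
signal = [complex(n % 5, -(n % 3)) for n in range(16)]
partial = sparse_dft(signal, [1, 5, 6], 4)
full = dft(signal)
max_error = max(abs(partial[k] - full[k]) for k in partial)
```

The wanted indices need not be contiguous, which reflects the statement that the algorithm applies to any distribution of nonzero values.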

3.1.2. Multi-resolution spectrum sensing

The topic of multi-resolution sensing for Cognitive Radio has been treated in [15,16]. Although different methods are applied in these papers, the basic idea is the same. The total bandwidth is first sensed using a coarse resolution. Fine resolution sensing is then performed on the portion of the band that is of interest to Cognitive Radio. In this way, Cognitive Radio avoids sensing the whole band at the maximum frequency resolution. Therefore, the sensing time is reduced and the power of unnecessary computations is saved.

Fig. 4. Complexity comparison of the sparse FFT.

Fig. 5. A multi-resolution sensing example.

Let us consider a typical energy based multi-resolution sensing case for a single-antenna Cognitive Radio receiver in Fig. 5. Suppose the receiver can digitize the total targeted bandwidth $B_{tot}$. The frequency resolution equals $f_r = B_{tot}/K$, where $K$ is the size of the FFT which produces the spectrum. Coarse resolution sensing is done by using a smaller FFT of size $K_1$ with a resolution $f_{r1} = B_{tot}/K_1$. The energy on each FFT bin, $E_i$ for $i = 0, 1, \ldots, K_1 - 1$, is compared with a threshold $th_1$. We define $Per$ as the percentage of the total number of bins where the energy is larger than $th_1$. If this percentage is larger than a limit $p$, $Per > p$, we assume the total band is too crowded to accommodate Cognitive Radio. If no bins have been found with significant energy (no $i$ where $E_i > th_1$), namely $Per = 0$, we assume the band is empty. In these two conditions, fine resolution sensing is not needed and Cognitive Radio will either start communication or wait for a licensed user to free the spectrum. Otherwise, Cognitive Radio will continue with fine resolution sensing with a resolution of $f_{r2}$ to focus on those high energy bands (like the colored portion in Fig. 5) where licensed users are potentially active. However, the specific method to select the bands of interest is not considered in our discussion. The interesting portion of the spectrum is actually a part of the output bins of a larger FFT of size $K_2$, where $K_2 = B_{tot}/f_{r2}$. Based on the result of fine resolution sensing, Cognitive Radio will determine the transmission scheme and wait for the next sensing cycle. A flowchart describing the multi-resolution sensing scheme is shown in Fig. 6. The total cost of multi-resolution sensing can be expressed as:

$$C_{tot} = \begin{cases} C_{coarse} + C_{fine} & \text{if } 0 < Per < p, \\ C_{coarse} & \text{otherwise}, \end{cases} \tag{7}$$

where $C$ denotes costs.

Fig. 6. Flowchart of multi-resolution sensing.

A similar observation can be made: only a portion of the larger FFT outputs is needed for fine resolution sensing. Therefore, the sparse FFT proposed for OFDM based Cognitive Radio can be reused for this multi-resolution spectrum sensing scheme. By switching from a small radix-2 FFT (coarse sensing) to a large sparse FFT (fine sensing), a small part of the bands of interest can be produced with finer resolution at low computational cost. We show the benefit of the algorithm in a concrete example. First, coarse resolution sensing is done by FFT-128 with a resolution of $B_{tot}/128$. If Cognitive Radio finds that 5% of the total bandwidth needs fine resolution sensing, the system is reconfigured to the sparse FFT-2048 with a resolution of $B_{tot}/2048$ (16 times finer) to focus only on the interesting bands. In this case, the total cost of multi-resolution sensing is the complexity of the sparse FFT-2048 with a 0.05 nonzero ratio plus the complexity of the radix-2 FFT-128, in total about 9000 complex multiplications. Compared with fixed resolution sensing by the radix-2 FFT-2048, this gives a saving of about 20%. However, this saving diminishes with an increasing percentage of bandwidth that requires fine resolution sensing. The break-even point in our example is at about 25% of the total bandwidth requiring fine resolution sensing. Beyond this point, the combined complexity of the radix-2 FFT-128 and the sparse FFT-2048 exceeds the complexity of the radix-2 FFT-2048. Clearly, the condition for applying the proposed multi-resolution sensing method is that only a small fraction of the total bandwidth requires fine resolution sensing.
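The coarse-sensing decision and the cost expression of Eq. (7) can be sketched as follows. The action labels, the function names and the treatment of the boundary case $Per = p$ (handled like a crowded band, following the strict inequality in Eq. (7)) are assumptions. The example costs 448 and 8698 are what $(N/2)\log_2 N$ gives for FFT-128 and what Eq. (6), read as $(N_2-1)L + (N/2)\log_2 N_1$, gives for $N = 2048$ and $L = 102$.

```python
def coarse_decision(bin_energies, th1, p):
    """Classify the band after coarse sensing.

    Returns (action, per), where per is the fraction of bins whose energy
    exceeds th1 and action is one of three illustrative labels.
    """
    per = sum(1 for e in bin_energies if e > th1) / len(bin_energies)
    if per == 0:
        return "transmit", per        # band assumed empty
    if per >= p:
        return "wait", per            # band assumed too crowded
    return "fine_sensing", per        # look closer at the occupied bins


def total_cost(per, p, c_coarse, c_fine):
    """Eq. (7): fine sensing cost is only paid when 0 < per < p."""
    return c_coarse + c_fine if 0 < per < p else c_coarse


# Worked example from the text: FFT-128 coarse stage, sparse FFT-2048 fine stage.
cost_multires = total_cost(0.05, 0.5, 448, 8698)   # 448 + 8698 = 9146
cost_fixed = 1024 * 11                             # radix-2 FFT-2048: 11264
saving = 1 - cost_multires / cost_fixed            # a little under 0.19
```

The limit `p = 0.5` used in the example call is arbitrary; the text does not give a value for it.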

3.1.3. Dynamically reconfigurable implementation on theMontiumThe core of the proposed algorithm is a reconfigurable

FFT which has to constantly switch from radix-2 FFT to

Page 7: Cognitive Radio baseband processing on a reconfigurable platform

Q. Zhang et al. / Physical Communication 2 (2009) 33–46 39

Fig. 7. Computational structure of the sparse FFT.

the sparse FFT and vice versa. The reconfiguration has to be done in real-time with minimum overhead. The computational structure of the sparse FFT is presented in Fig. 7. The sparse FFT decomposes the DFT into two major parts: $N_2$ blocks of $N_1$-point radix-2 FFTs, and the multiplications with twiddle factors together with the recombination of the products. We can find regularity in the memory addressing: a constant hop from the same position in one block to another (e.g. from $x_0(k)$ to $x_1(k)$). The normal radix-2 FFT is reusable in the first part of the sparse FFT. These properties of the sparse FFT can be exploited to make efficient and reconfigurable hardware solutions.

We made a dynamically reconfigurable implementation on the Montium. It can switch between different radix-2 FFTs and also between radix-2 FFT and sparse FFT. The basic idea to make radix-2 FFTs reconfigurable is to reuse the computation structure and the twiddle factors of larger FFTs for smaller FFTs. During the initialization stage, we load the configuration of the largest FFT into the sequencer. Switching to small FFTs only involves adapting a small part of the configuration memory by run-time reconfiguration. The same trick is done for switching back to large FFTs. Each entry in the configuration memory defines an instruction and contains a control part which implements a sequencing state machine. When reconfiguring to the sparse FFT, the radix-2 FFT configuration can be re-used for each memory block in the first part (see Fig. 7). The AGU for each memory of the Montium can generate the required block addressing pattern as we mentioned earlier. In the second part, where multiplications with twiddle factors and recombination are done, we use the 5th ALU to calculate the indices to be multiplied and used as a memory address for the AGU. However, this address generation costs 3 extra clock cycles for each nonzero value, which is the efficiency bottleneck of the sparse FFT on the Montium.

Due to the limited configuration space of the current Montium, the largest FFT in our implementation is 512. However, larger FFTs can be supported by simply extending the configuration space. Table 4 shows the costs of the reconfiguration for both the radix-2 FFT and the sparse FFT (less than 64 non-zeros) in terms of bytes. When

Fig. 8. The performance of the sparse FFT vs radix-2 FFT for FFT-512 on the Montium.

using reconfigurable code, a large configuration is sent to the Montium at initialization and changing the FFT size involves small pieces of reconfigurable code. When using static code, changing the FFT size requires a complete reload of configurations. Fig. 8 shows the number of clock cycles of the sparse FFT vs. the radix-2 FFT for 512 samples on the Montium as a function of the number of nonzeros. Due to the 3 extra clock cycles per non-zero value for address generation, the final sparse FFT implementation on the Montium turns out to be less efficient than the theoretical prediction. However, it is still more efficient than the radix-2 FFT in the case of a large number of non-zero values (e.g. 480 out of 512). An additional benefit of the sparse FFT is that only the non-zero outputs are sent through the on-chip communication architecture. It therefore results in lower communication costs, which are considered to be the bottleneck for SoCs with smaller feature sizes in the future.
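The two-part structure of the sparse FFT described in this section (Fig. 7) can be illustrated with a minimal software sketch. It assumes that block r is fed with the input samples r, r + N2, r + 2·N2, ..., which is one common way to decompose the DFT; the mapping actually used on the Montium is not detailed here, and the function names are hypothetical.

```python
import cmath


def fft_radix2(x):
    """Plain recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def sparse_fft_outputs(x, n1, wanted):
    """Compute only the DFT bins listed in `wanted` for an N = n1 * n2 point input."""
    n = len(x)
    n2 = n // n1
    # Part 1: n2 blocks of n1-point radix-2 FFTs (assumed input decimation by n2).
    blocks = [fft_radix2(x[r::n2]) for r in range(n2)]
    # Part 2: twiddle multiplications and recombination, only for the wanted bins.
    result = {}
    for k in wanted:
        acc = 0j
        for r in range(n2):
            # Same position (k mod n1) in every block: a constant hop between blocks.
            acc += cmath.exp(-2j * cmath.pi * r * k / n) * blocks[r][k % n1]
        result[k] = acc
    return result
```

In this sketch the second part costs N2 twiddle multiplications per requested bin, so its cost scales with the number of non-zero outputs, while the first part is independent of it and can reuse an ordinary radix-2 FFT.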

3.2. Filter bank multicarrier for Cognitive Radio

Due to the rectangular window in the time domain, an OFDM system has large sidelobes which cause interference to adjacent bands. This fact has also been recognized in [2]. The authors proposed two methods to mitigate the interference to the licensed user: deactivating more subcarriers adjacent to the licensed system or applying non-rectangular windows to reduce the spectrum leakage. Both methods mitigate the interference at the cost of bandwidth efficiency. Moreover, the two methods did not consider the system implementation issues. Therefore, the indication is that other multicarrier schemes could be interesting candidates for Cognitive Radio. This has also been observed in a recent publication [17]. In [18], we presented an oversampled filter bank multicarrier system for Cognitive Radio based on the generalized filter bank implementation. The same idea has been applied to very high-speed digital subscriber line technology to achieve high-level spectral containment in subchannels [19].


Table 4. Bytes that need to be sent for reconfiguration.

                                 Reconfigurable code   Static code
Initialization                   1792                  0
to 512-point                     104                   1138
to 256-point                     82                    1054
to 128-point                     78                    970
to 64-point                      80                    886
to 512-sparse (<64 nonzeros)     78                    –
Total                            2214                  4048

3.2.1. An oversampled filter bank multicarrier for Cognitive Radio

The basic idea of multicarrier transmission is to divide a broadband channel into parallel subchannels; the high-rate data stream is split into low-rate streams, each transmitted on its own subchannel. This transmission scheme can be modeled as a filter bank system [20]. At the transmitter, M complex symbols are upsampled by a factor of N and filtered by a baseband prototype filter. The output of each of the M symbol streams is properly shifted in frequency and added for transmission. The receiver demodulates the signal by a matched filter and downsampling by a factor of N. The transmitter and the receiver are in fact an M-band synthesis filter bank and an analysis filter bank, respectively. When critical sampling applies (N = M) and the prototype filter is selected as a sinc shaped filter in frequency, the multicarrier system becomes an OFDM system. When N > M, the filter bank system is called an oversampled filter bank (OSFB). The oversampling increases the intercarrier spacing by a factor of N/M (N > M). In this way intercarrier interference (the overlapping part between two subcarriers) is largely reduced, which is the basic idea of FMT [19]. One could argue that the increased intercarrier spacing results in fewer subcarriers in a given bandwidth, thus reducing the bandwidth efficiency. However, compared with OFDM systems, where an extra cyclic prefix always has to be introduced, OSFB is no worse than OFDM in terms of bandwidth efficiency.

The simplified OSFB based multicarrier Cognitive Radio system is shown in Fig. 9. The subcarriers causing interference to licensed users are deactivated. The deactivation can be realized by loading zeros on the intended subcarriers while the others are loaded with modulated complex symbols at the transmitter, which is an M-band oversampled synthesis filter bank. An M-band oversampled analysis filter bank at the receiver reconstructs the signal and only sends symbols from active subcarriers for demodulation. The deactivation information is sent to both the transmitter and the receiver through a control channel. The adaptive bit loading for OFDM based Cognitive Radio may also be applied to the oversampled filter bank system.

The implementation of the OSFB is not as straightforward as that of the critically sampled filter bank. The authors in [19] indicate implementing periodically time-varying filters in the OSFB. However, this is difficult in practice. Therefore, we suggest an efficient implementation based on the generalized DFT filter bank (GDFT) model in [21]. The transmitted analog signal $s_a$ can be expressed as:

$$s_a(t) = \sum_{m=0}^{M-1} \sum_{n=-\infty}^{+\infty} x_m(nT)\, h_{s,m,a}(t - nT), \qquad (8)$$

Fig. 9. An OSFB multicarrier system for Cognitive Radio.

where T denotes the symbol duration, $x_m(nT)$ is the symbol on the mth subcarrier at the nth instance and $h_{s,m,a}(t)$ is the analog synthesis prototype filter on the mth subband. Since the symbol rate is $1/T$, the sampling rate should be $N/T$. For each sampling instance k, the discrete signal s(k) can be written as:

$$s(k) = \sum_{m=0}^{M-1} \sum_{n=0}^{+\infty} x_m(n)\, h_{s,m}(k - nN) = \sum_{n=0}^{+\infty} s_n(k - Nn), \qquad (9)$$

where $x_m(n) = x_m(nT)$, $h_{s,m}(k)$ denotes the digitized synthesis filter and only symbol instances for $n \geq 0$ are considered. We define the signal $s_n(k)$ at each instance n as:

$$s_n(k) = \sum_{m=0}^{M-1} x_m(n)\, h_{s,m}(k). \qquad (10)$$

The subband filter $h_{s,m}(k)$ is derived from a real valued prototype filter p(k) by modulation as:

$$h_{s,m}(k) = p\left(k - \frac{L-1}{2}\right) e^{j2\pi \left(m - \frac{M-1}{2}\right)\left(k - \frac{L-1}{2}\right)/M}, \qquad (11)$$

where L is the filter length, $\frac{M-1}{2}$ is set as the carrier frequency and the delay of $\frac{L-1}{2}$ is introduced to make a causal system. Thus, $h_{s,m}(k) = 0$ for $k < 0$ or $k \geq L$. From Eq. (10), we can see that $s_n(k)$ is the summation of the multiplication of M band symbols with L filter coefficients. Thus the length of $s_n(k)$ is L. We can write Eq. (10) in matrix form as:

$$s_n = H_s^T x_n, \qquad (12)$$

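where $x_n$ is the symbol vector and $H_s$ is an $M \times L$ matrix:

$$H_s = \left[ p\left(k - \frac{L-1}{2}\right) e^{j2\pi \left(m - \frac{M-1}{2}\right)\left(k - \frac{L-1}{2}\right)/M} \right]_{M \times L}. \qquad (13)$$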


A matrix multiplication of $H_s^T$ and $x_n$ can be done to produce the signal $s_n$; however, it costs $M \times L$ complex multiplications. To reduce the computational complexity, we reconstruct $H_s$ from the $M \times M$ generalized DFT matrix T [21] and a diagonal matrix $\Lambda_p$ where the diagonal holds the L coefficients as:

$$H_s = T \times [I_M \;\; (-1)^{M-1} I_M] \times [I_{2M} \;\; I_{2M} \;\; \ldots \;\; I_{2M} \;\; I_{2M,u}] \times \Lambda_p \qquad (14)$$

$I_M$ and $I_{2M}$ denote the $M \times M$ and $2M \times 2M$ identity matrices respectively and $I_{2M,u}$ is the submatrix formed by the first u columns of $I_{2M}$, where u is L modulo 2M. The generalized DFT matrix T is expressed as:

$$T = \Lambda_1 W_M^* \Lambda_2, \qquad (15)$$

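where $W_M$ denotes an M point DFT matrix. $\Lambda_1$ and $\Lambda_2$ are diagonal matrices where the ith diagonal elements for $\Lambda_1$ and $\Lambda_2$ are $e^{-j\pi \left(i - \frac{M-1}{2}\right)(L-1)/M}$ and $e^{-j\pi i (M-1)/M}$ respectively. From Eqs. (12), (14) and (15), we have:

$$s_n = \Lambda_p \times [I_{2M} \;\; I_{2M} \;\; \ldots \;\; I_{2M} \;\; I_{2M,u}]^T \times [I_M \;\; (-1)^{M-1} I_M]^T \times \Lambda_2 W_M^* \Lambda_1 x_n. \qquad (16)$$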

Similarly at the receiver, the recovered symbol $x_n$ can be written in matrix form:

$$x_n = H_a r_n^T, \qquad (17)$$

where $x_n$ denotes the symbols from M bands and $r_n$ is the received signal with length L. In order to satisfy the perfect reconstruction condition, the analysis filter matrix $H_a = H_s^*$ [20]. The recovered symbol $x_n$ can be expressed as:

$$x_n = \Lambda_1^* W_M \Lambda_2^* \times [I_M \;\; (-1)^{M-1} I_M] \times [I_{2M} \;\; I_{2M} \;\; \ldots \;\; I_{2M} \;\; I_{2M,u}] \times \Lambda_p r_n. \qquad (18)$$
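The factorization behind Eq. (16) can be checked numerically. The following sketch builds $H_s$ directly from Eq. (13) and compares $H_s^T x_n$ with the GDFT structure of Eq. (16). The prototype filter is a placeholder window chosen only for illustration (it is not the filter used in this paper), the DFT is evaluated naively for brevity, and all function names are hypothetical.

```python
import cmath
import math


def prototype(L):
    """Placeholder real prototype coefficients (a Hann-like window, illustration only)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (k + 0.5) / L) for k in range(L)]


def synthesis_matrix(M, L, p):
    """H_s from Eq. (13); p[k] stands for the k-th of the L filter coefficients."""
    c = (L - 1) / 2
    return [[p[k] * cmath.exp(2j * math.pi * (m - (M - 1) / 2) * (k - c) / M)
             for k in range(L)] for m in range(M)]


def direct_tx(x, Hs):
    """s_n = H_s^T x_n (Eq. (12)): M * L complex multiplications."""
    M, L = len(Hs), len(Hs[0])
    return [sum(Hs[m][k] * x[m] for m in range(M)) for k in range(L)]


def gdft_tx(x, L, p):
    """s_n according to Eq. (16): phase shifts, inverse DFT, periodic extension, filter."""
    M = len(x)
    # Lambda_1: first set of M phase shifts.
    a = [cmath.exp(-1j * math.pi * (m - (M - 1) / 2) * (L - 1) / M) * x[m]
         for m in range(M)]
    # W_M^*: unnormalised inverse DFT (an IFFT would be used when M is a power of two).
    b = [sum(a[m] * cmath.exp(2j * math.pi * m * i / M) for m in range(M))
         for i in range(M)]
    # Lambda_2: second set of M phase shifts.
    X = [cmath.exp(-1j * math.pi * i * (M - 1) / M) * b[i] for i in range(M)]
    # [I_M (-1)^(M-1) I_M]^T: sequence of length 2M.
    sign = (-1) ** (M - 1)
    X2M = X + [sign * v for v in X]
    # Periodic repetition up to length L, then multiplication by the L coefficients.
    return [p[k] * X2M[k % (2 * M)] for k in range(L)]
```

For example, with M = 4 and L = 19 both functions return the same 19 samples up to rounding error, while the GDFT structure needs only L + 2M multiplications plus an M point transform instead of M × L multiplications.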

The generalized DFT implementation is based on Eqs. (16) and (18), where the filter coefficient matrices consist of periodically varying GDFT matrices. Unlike the implementation in [19], where the coefficients are time varying, we can incorporate the periodicity into the filter inputs. Figs. 10 and 11 show the implementations of the GDFT filter bank transmitter and receiver respectively.

At the transmitter, M symbols are first transformed by T, which can be implemented as 2M phase shifts of complex numbers and an M point IFFT (here we consider that M is a power-of-two integer) based on Eq. (15). The M transformed symbols $X_i$ ($i = 0, 1, \ldots, M-1$) are used to make the sequence $X_{2M} = [X_{i=0,1,\ldots,M-1} \;\; -X_{i=0,1,\ldots,M-1}]$. By repeating the sequence $X_{2M}$ $\lfloor L/(2M) \rfloor$ times ($\lfloor \cdot \rfloor$ denotes integer division) and appending the first $L \bmod 2M$ elements of $X_{2M}$ at the end, an L-element sequence is produced to be multiplied with the L filter coefficients. The multiplication results are accumulated into a length L shift register D, which is set to zeros at initialization. After the accumulation, the first N samples in D are shifted out as transmitted symbols and all other samples are shifted N positions ahead with N zeros shifted in.

At the receiver, L received symbols in a shift register are multiplied with L filter coefficients. The multiplication results whose indices i ($i = 0, 1, \ldots, L-1$) share the same value of $i \bmod 2M$ are combined to form

Fig. 10. The GDFT filter bank transmitter implementation.

Fig. 11. The GDFT filter bank receiver implementation.

a 2M sequence R. Then the second half of R is negated and combined with the first half to produce M symbols to be transformed by $T^*$, which is the conjugate of T. After the transform, M recovered symbols are obtained and N new symbols will be shifted in.

Based on the implementation, we made a computational complexity analysis by counting the number of complex multiplications. To transmit and receive M symbols, we need 2L complex multiplications with filter coefficients, 4M for the phase shifts, an M point IFFT and an M point FFT. The total computational complexity of the OSFB, $C_{OSFB}$, can be expressed as:

$$C_{OSFB} = 2L + 4M + 2 \times \left(\frac{M}{2} \log_2 M\right). \qquad (19)$$

The computational complexity of OFDM ($C_{OFDM}$) is:

$$C_{OFDM} = 2 \times \left(\frac{M}{2} \log_2 M\right). \qquad (20)$$
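To give a feeling for the magnitude of the difference, Eqs. (19) and (20) can be evaluated for M = 32 and L = 649, the parameters of the simulated system considered in this section. The snippet is only a worked example of the two formulas; the function names are illustrative.

```python
import math


def c_ofdm(m: int) -> int:
    """Eq. (20): one M point IFFT and one M point FFT."""
    return 2 * (m // 2) * int(math.log2(m))


def c_osfb(m: int, l: int) -> int:
    """Eq. (19): filter multiplications, phase shifts and the two transforms."""
    return 2 * l + 4 * m + c_ofdm(m)


M, L = 32, 649
print(c_ofdm(M), c_osfb(M, L))
```

For these values the filtering term 2L = 1298 dominates the total of 1586 complex multiplications, compared with 160 for OFDM.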

From (19) and (20), the OSFB is more computationallycomplex than OFDM due to the extra filtering. Especially


Fig. 12. BER performance on AWGN.

Fig. 13. Transmitted spectrum with null subcarriers.

if the length of the prototype filter L is large, the computational complexity increases enormously. This complexity raises the question of how to make efficient hardware implementations.

We consider an OSFB based Cognitive Radio with 32 subcarriers where 8 are deactivated to avoid the interference to licensed users. The oversampling ratio is chosen as N = 36. The group delay of the prototype filter is K = 9 and the filter length is L = 649. The BER performance is compared with the OFDM with the same number of subcarriers and deactivation pattern in Fig. 12. Fig. 13 shows the transmitted spectrum for both systems. The BER performance of the OSFB is slightly better than that of the OFDM due to its lower intercarrier interference. The sideband power rejection of the OSFB is far superior to that of the OFDM.

There is growing interest in the application of filter banks in Cognitive Radio. A recent publication [22] indicates that the analysis filter bank can be used as a spectrum analyzer for Cognitive Radio. The performance is comparable to one of the state-of-the-art spectrum sensing methods. Moreover, the author also suggests the

Table 5. The number of cycles for the GDFT synthesis filter bank on the Montium.

Task          #cycles
GDFT          156
Filter MAC    651
Total         807

possibility of combining filter bank based sensing with a filter bank multicarrier system, which could largely reduce the computational cost and enhance the hardware reusability.

3.2.2. Implementation on the Montium

The computational complexity of the filter bank multicarrier approach is much higher than that of the OFDM solution. Since the Montium is targeted at such computationally complex algorithms, we analyze how the proposed scheme can be mapped onto the Montium based platform. The mapping is based on the GDFT based implementation model. From Figs. 10 and 11, both the synthesis filter bank on the transmitter side and the analysis filter bank on the receiver side consist of two major computation tasks: the generalized DFT transform and the multiply-accumulate (MAC) operations with the filter coefficients. The two tasks are mapped onto two Montiums connected by the NoC.

The difference between the generalized DFT and the normal DFT is that the GDFT introduces the phase shifts. The phase shifts can be implemented as multiplications with two complex vectors. The GDFT transform can be mapped onto one Montium processor by combining the phase shifts with an existing FFT/IFFT module. The phase shift vectors and intermediate results can be stored in such a way that a minimum of clock cycles is wasted on the memory pipeline. The Montium can perform one complex multiplication with 4 ALUs in one clock cycle. Thus, the phase shifts cost 2M clock cycles. An M point FFT/IFFT module, with M a power-of-two, costs $(\frac{M}{2} + 2) \log_2 M$ clock cycles on the Montium [5]. The computation of a GDFT costs $2M + (\frac{M}{2} + 2) \log_2 M$ clock cycles in total on the Montium. Since the Montium offers plenty of computation and storage resources (5 ALUs and 10 memories), we have relatively large freedom to implement the MAC operations with the filter coefficients.

As an experiment, an oversampled filter bank multicarrier transmitter is mapped onto two Montiums. The parameters are taken from the simulation model in [18], where the number of subcarriers M = 32, the oversampling ratio N = 36 and the length of the prototype filter L = 649. We simulated the GDFT task and the filter MAC task on a single Montium tile with the Montium simulator. Table 5 gives an overview of the number of processor cycles required to calculate one data symbol (32 samples) for each task. The result in Table 5 indicates that the filter MAC is more computationally intensive than the GDFT transform. If the Montium runs at 100 MHz, the processing time for filter bank multicarrier transmission is about 8 µs in this case. As indicated in [5], the power consumption of the Montium in 0.13 µm technology is estimated at 0.577 mW/MHz. The energy for processing one data symbol (32 samples) is estimated as 465 nJ. We may conclude that mapping the filter bank multicarrier onto the


Montium results in acceptable latency and relatively low power consumption. Moreover, the Montium supports a reconfigurable filter bank system. Therefore, the Montium is a good candidate architecture to support the adaptive filter bank multicarrier system for Cognitive Radio.
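The latency and energy figures quoted in this section follow directly from the cycle counts of Table 5. The short calculation below is a sketch of that arithmetic; the constant names are illustrative, and the GDFT cycle formula is included only to compare it with the simulated value.

```python
import math


def gdft_cycles(m: int) -> int:
    """2M cycles of phase shifts plus (M/2 + 2) * log2(M) cycles for the FFT/IFFT."""
    return 2 * m + (m // 2 + 2) * int(math.log2(m))


CLOCK_MHZ = 100.0            # assumed clock frequency, as in the text
POWER_MW_PER_MHZ = 0.577     # estimate from [5] for 0.13 um technology
total_cycles = 156 + 651     # GDFT + filter MAC, Table 5

time_us = total_cycles / CLOCK_MHZ
# mW multiplied by microseconds gives nanojoules.
energy_nj = POWER_MW_PER_MHZ * CLOCK_MHZ * time_us

print(gdft_cycles(32), time_us, round(energy_nj, 1))
```

For M = 32 the formula gives 154 cycles, close to the 156 cycles obtained from the simulator, and the 807 cycles in total correspond to about 8.07 µs and roughly 465 nJ per data symbol.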

3.3. Cyclostationary feature detection

Cyclostationary feature detection has been proposed as a promising but computationally intensive alternative to energy detection [23]. The system performance of cyclostationary feature detection has been analyzed in many recent publications such as [23]. However, our interest mainly focuses on the computation and the efficient implementation.

3.3.1. Discrete cyclostationary feature detection

Cyclostationary Feature Detection (CFD) consists of a combination of an energy detector and a single correlator block. Because we aim at the implementation of CFD in the digital domain, we will give the time discrete expressions for CFD (DCFD). We first define the sampled signal.

$$x_k = x\left(k \cdot \frac{1}{f_s}\right) \qquad (21)$$

where $f_s$ indicates the sampling frequency. The Discrete Fourier Transform (DFT) is applied to K samples.

$$X_{n,v} = \sum_{k=0}^{K-1} x_{n+k} \cdot e^{j2\pi \frac{n+k}{f_s} v}. \qquad (22)$$

Finally, the Discrete Spectral Correlation Function (DSCF) is determined.

$$S_f^a = \frac{1}{N} \sum_{n=0}^{N-1} X_{n,f+a} \cdot X^*_{n,f-a} \qquad (23)$$

where ∗ indicates the complex conjugate. In case $N = 2^n$, where n = 1, 2, ..., the Discrete Fourier Transform becomes a Fast Fourier Transform (FFT) and the number of complex multiplications that are involved becomes $\frac{1}{2} N \log_2 N$. Calculating the DSCF involves $\frac{1}{4} N^2$ complex multiplications.

Determining the DSCF involves a summation over n (expression (23)). For each n, a similar type of computation has to be executed, which is illustrated in Fig. 14. It illustrates the structure of the calculations for a single n, a = −3..3 and for f = i, i + 1, i + 2, i + 3. For other values of a and f, the structure is similar and for that reason in the remainder of this paper we will use examples where f = 0, 1, 2, 3 and a = −3..3. The results of the FFT ($X_{n,v}$, expression (22)) and their complex conjugates ($X^*_{n,v}$) are at the top of Fig. 14. The solid dots represent the multiplications within expression (23). A solid line connects a spectral value $X_{n,v}$ to different multiplications, a dotted line connects a conjugated value $X^*_{n,v}$ to different multiplications. The interconnection pattern connects every multiplication to a 'normal' value and to a conjugated value. Within a row, all $S_f^a$ values are determined for a specific frequency f and within a column, all $S_f^a$ values are determined for a specific frequency offset a. The summation over n is illustrated in Fig. 15. For simplicity, we omitted the reshuffling of the conjugated

Fig. 14. Structure for single n.

values. For the specific value $S_2^{-3}$, it is illustrated that it is the result of the summation over n of corresponding multiplications. For all values $S_f^a$, a similar summation is executed.
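Expressions (22) and (23) can be sketched in software as follows. The sketch assumes that the frequency index v denotes the DFT bin at v · f_s / K, so that the exponent in expression (22) becomes 2π(n + k)v/K, and that negative bin indices wrap around modulo K. It uses a direct DFT instead of an FFT for brevity, and the function names are hypothetical.

```python
import cmath
import math


def spectrum(x, n, K):
    """X_{n,v} for v = 0..K-1 following expression (22), with bin v at v * f_s / K (assumed)."""
    return [sum(x[n + k] * cmath.exp(2j * math.pi * (n + k) * v / K)
                for k in range(K)) for v in range(K)]


def dscf(x, K, N, f, a):
    """S_f^a following expression (23): average of X_{n,f+a} * conj(X_{n,f-a}) over N windows."""
    acc = 0j
    for n in range(N):
        X = spectrum(x, n, K)
        # Negative or large bin indices wrap around modulo K (assumption of this sketch).
        acc += X[(f + a) % K] * X[(f - a) % K].conjugate()
    return acc / N


# Usage: a real cosine on bin 2 of an 8-point spectrum, averaged over 4 windows.
K, N = 8, 4
x = [math.cos(2 * math.pi * 2 * k / K) for k in range(K + N - 1)]
print(abs(dscf(x, K, N, 2, 0)), abs(dscf(x, K, N, 0, 2)), abs(dscf(x, K, N, 1, 0)))
```

In this usage example the cosine shows up both at offset a = 0 (its power at f = 2) and at f = 0 with offset a = 2, which is the kind of spectral correlation that distinguishes a modulated signal from stationary noise.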

3.3.2. Implementation on the Montium based MPSoC platform

Following a two-step methodology, we mapped cyclostationary feature detection onto the Montium based platform [24]. In the first step, we determine the tasks to be executed by the processing cores and analyze the interconnection patterns between the processing cores. In the second step, the Montiums are used as processing cores for implementing the tasks. As an example, 256-point spectra are analysed for the implementation. The targeted platform is the Annabelle platform, which consists of 4 Montium cores. The input values are mapped onto memories M09 and M10. Each memory contains 256 input values. From each memory a value is read every clock cycle. The read-address is generated by the Address Generation Unit (AGU) which accompanies each memory. We simulated the tasks for a single Montium tile with the Montium simulator. Table 6 gives an overview of the number of processor cycles required for the different tasks. The total number of complex multiply accumulate operations equals 4064. Simulations show that a multiply-accumulate requires three clock cycles, so the required number of clock cycles is 12192. For each 32 multiply accumulate operations, 3 additional clock cycles are needed to read data, which leads to 381 additional clock cycles. The input of the DSCF is a 256-point spectrum which can be calculated by one Montium in 1040 clock cycles. The reshuffling of the conjugated values (Fig. 14) is done in 256 clock cycles and initially loading the Montium with


Fig. 15. Representation of expression (23).

Table 6. Number of processor cycles.

Task                   #cycles
Multiply accumulate    12192
Read data              381
FFT                    1040
Reshuffling            256
Initialization         127
Total                  13996

data requires 127 clock cycles. The total number of clock cycles for one integration step in the calculation of a DSCF then equals 13996. If the Montium runs at 100 MHz, the time required for one integration step in the calculation of the DSCF equals 139.96 µs.
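The cycle budget of Table 6 can be retraced with the following arithmetic, which uses only the numbers given in this section; the variable names are illustrative.

```python
MACS = 4064                  # complex multiply accumulate operations
CYCLES_PER_MAC = 3           # from the Montium simulations
CLOCK_MHZ = 100.0            # assumed clock frequency, as in the text

mac_cycles = MACS * CYCLES_PER_MAC       # multiply accumulate
read_cycles = (MACS // 32) * 3           # 3 extra cycles per 32 MAC operations
fft_cycles = 1040                        # 256-point spectrum
reshuffle_cycles = 256                   # reshuffling of the conjugated values
init_cycles = 127                        # initial loading of data

total_cycles = (mac_cycles + read_cycles + fft_cycles
                + reshuffle_cycles + init_cycles)
time_us = total_cycles / CLOCK_MHZ

print(mac_cycles, read_cycles, total_cycles, time_us)
```

The multiply accumulate operations account for 12192 of the 13996 cycles, so they dominate the time of one integration step.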

4. Multiprocessor Demonstration Platform

In order to demonstrate the proposed algorithms, a multiprocessor demonstration platform called Basic Concept Verification Platform (BCVP) is used, see Fig. 16. Major components in the BCVP are two ARM processors, ARM920 and ARM946, where the ARM946 is the processor used by default. The ARM946 uses two tightly coupled memories, one instruction memory of 32 Kbytes and one data memory of 64 Kbytes. The whole BCVP has access to external memory of 3 Mbytes in total. Another major part of the BCVP is an FPGA, which emulates three Montiums connected by a circuit switched Network-on-Chip. The BCVP also includes a serial port and a USB port.

Since our focus is baseband processing on a multiprocessor platform, the RF frontend is not considered; the input data are therefore taken from computer simulations. We have demonstrated an OFDM based Cognitive Radio receiver. It can support various numbers of subcarriers by means of the reconfigurable FFTs implemented on the Montium. When a large number of subcarriers is switched off, a normal radix-2 FFT can be reconfigured to a sparse FFT on the Montium. Moreover, the dynamically reconfigurable FFT module also enables the multi-resolution sensing scheme discussed in Section 3.1.2. The FFT processing can also be switched from

Fig. 16. The major components in the BCVP.

the Montium processor to the ARM processor in order to make a performance comparison.

The demonstration setup is shown in Fig. 17, which has been part of the demonstration at the DySPAN2008 conference [25]. The PC loads the Montium initialization configuration and input data to the BCVP via a USB port. A graphical user interface on the PC can control the reconfiguration, send new reconfigurations and data to the BCVP, and visualize the results (e.g. constellation plots of the received signal and the power spectral density from spectrum sensing) retrieved from the BCVP via the USB port.

Some performance measurements on the demonstration platform have been presented in [26]. Due to the limitation of the FPGA, the emulated Montium on the BCVP only runs at 6.8 MHz while the ARM processor runs at 86 MHz on the BCVP. Although the Montium runs at a 12 times lower frequency than the ARM, it still outperforms the ARM by more than 4 times in the speed of processing the dynamically reconfigurable FFT module [26], which is the core algorithm of the OFDM based Cognitive Radio receiver.
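The two figures quoted above can be combined into a rough per-clock-cycle comparison. The calculation below is only a lower bound implied by the stated clock frequencies and the speed-up of "more than 4 times"; it is not a measurement reported in [26], and the variable names are illustrative.

```python
ARM_MHZ = 86.0          # ARM clock on the BCVP
MONTIUM_MHZ = 6.8       # emulated Montium clock on the FPGA
MIN_SPEEDUP = 4.0       # Montium is more than 4 times faster in wall-clock time

clock_ratio = ARM_MHZ / MONTIUM_MHZ
# Lower bound on how many times fewer clock cycles the Montium needs
# for the same reconfigurable FFT workload.
min_cycle_advantage = MIN_SPEEDUP * clock_ratio

print(round(clock_ratio, 1), round(min_cycle_advantage, 1))
```

Under these assumptions the Montium needs at least about 50 times fewer clock cycles than the ARM for the reconfigurable FFT module.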


Fig. 17. The AAF Cognitive Radio demonstration for the DySPAN2008 conference.

5. Conclusions

In this paper, we proposed an MPSoC architecture to support adaptive baseband processing of Cognitive Radio. The key element on this platform is a home grown coarse grain reconfigurable processor called the Montium, developed in our group. The Montium tile processor offers reconfigurability in an energy efficient manner. We investigated three DSP algorithms particularly interesting for Cognitive Radio, namely the reconfigurable sparse FFT, the filter bank and DCFD, and implemented these algorithms on the Montium based reconfigurable platform.

A sparse FFT is proposed as a novel alternative to a normal FFT for OFDM based Cognitive Radio where a large number of subcarriers is deactivated. Based on this sparse FFT, we also proposed a novel energy based multi-resolution spectrum sensing method which enables Cognitive Radio to focus on a small part of the spectrum in a finer resolution with low computational cost. The core of this algorithm, a dynamically reconfigurable sparse FFT module, has been mapped onto the Montium.

We proposed an oversampled filter bank multicarrier for Cognitive Radio based on the generalized DFT implementation. Based on the GDFT model, the oversampled filter bank has been mapped onto the Montium. The filter bank processing on the Montium results in acceptable latency and relatively low power consumption.

Due to its high computational complexity, the DCFD has been analyzed and mapped onto the Montium. The result of the analysis is that on this platform, a spectrum (256 points) and a DSCF (127 × 127 points) can be determined within approximately 140 µs.

Finally, some of the proposed algorithms have been successfully demonstrated on a prototype platform. The real measurement data confirm the advantage of performing the OFDM based Cognitive Radio baseband processing on the Montium.

References

[1] J. Mitola III, Cognitive radio: An integrated agent architecture for software defined radio, Ph.D. Thesis, Royal Institute of Technology, Sweden, May 2000.

[2] T.A. Weiss, F.K. Jondral, Spectrum pooling: An innovative strategy for the enhancement of spectrum efficiency, IEEE Commun. Mag. (Mar.) (2004).

[3] aaf.freeband.nl.

[4] F.W. Hoeksema, M. Heskamp, R. Schiphorst, C.H. Slump, A node architecture for disaster relief networking, in: Proceedings of the first IEEE Symposium on New Frontiers in Dynamic Spectrum Access Networks, DySPAN2005, Baltimore, USA, Nov. 2005.

[5] Paul Heysters, Coarse-grained reconfigurable processors; Flexibility meets efficiency, Ph.D. Thesis, University of Twente, Sep. 2004.

[6] www.recoresystems.com.

[7] M.D. van de Burgwal, G.J.M. Smit, G.K. Rauwerda, P.M. Heysters, Hydra: An energy-efficient and reconfigurable network interface, in: Proceedings of the 2006 International Conference on Engineering of Reconfigurable Systems and Algorithms, USA, 2006.

[8] P.T. Wolkotte, G.J.M. Smit, G.K. Rauwerda, L.T. Smit, An energy-efficient reconfigurable circuit switched network-on-chip, in: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS'05, USA, 2005.

[9] G.J.M. Smit, A.B.J. Kokkeler, P.T. Wolkotte, M.D. van de Burgwal, Multi-core architectures and streaming applications, in: Proceedings of the Tenth International Workshop on System-Level Interconnect Prediction, SLIP 2008, UK, 2006.

[10] P.M. Heysters, G.J.M. Smit, Mapping of DSP algorithms on the Montium architecture, in: Proceedings of Reconfigurable Architectures Workshop 2003, France, 2003.

[11] G.K. Rauwerda, P.M. Heysters, G.J.M. Smit, Towards software defined radios using coarse-grained reconfigurable hardware, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (January) (2008).

[12] Q. Zhang, A.B.J. Kokkeler, G.J.M. Smit, An efficient FFT for OFDM based cognitive radio on a reconfigurable architecture, in: Proceedings of IEEE International Conference on Communications 2007, UK, 2007.

[13] Q. Zhang, A.B.J. Kokkeler, G.J.M. Smit, An efficient multi-resolution spectrum sensing method for cognitive radio, in: Proceedings of CHINACOM 2008.

[14] H.V. Sorensen, B. Sidney, Efficient computation of the DFT with only a subset of input or output points, IEEE Trans. Signal Process. (Mar.) (1993).

[15] Y. Hur, J. Park, W. Woo, K. Lim, C.H. Lee, H.S. Kim, J. Laskar, A wideband analog multi-resolution spectrum sensing (MRSS) technique for cognitive radio (CR) systems, in: International Symposium on Circuits and Systems, ISCAS, USA, 2006.

[16] Nathan M. Neihart, Sumit Roy, David J. Allstot, A parallel, multi-resolution sensing technique for multiple antenna cognitive radios, in: International Symposium on Circuits and Systems, ISCAS, USA, 2007.

[17] P. Amini, R. Kempter, R-R. Chen, L. Lin, B. Farhang-Boroujeny, Filter bank multitone: A candidate for physical layer of cognitive radio, in: SDR Forum Technical Conference, USA, 2005.

[18] Q. Zhang, A.B.J. Kokkeler, G.J.M. Smit, An oversampled filter bank multicarrier system for cognitive radio, in: Proceedings of PIMRC 2008.

[19] G. Cherubini, E. Eleftheriou, S. Olcer, J.M. Cioffi, Filter bank modulation techniques for very high speed digital subscriber lines, IEEE Commun. Mag. (May) (2000).

[20] P.P. Vaidyanathan, Multirate System and Filter Banks, Prentice-Hall, 1993.

[21] R.E. Crochiere, L.R. Rabiner, Multirate Digital Signal Processing, Prentice-Hall, 1983.

[22] B. Farhang-Boroujeny, Filter bank spectrum sensing for cognitive radios, IEEE Trans. Signal Process. (May) (2008).

[23] R.B.D. Cabric, S.M. Mishra, Implementation issues in spectrum sensing for cognitive radios, in: 38th Annual Asilomar Conference on Signals, Systems and Computers, USA, 2004.

[24] A.B.J. Kokkeler, G.J.M. Smit, T. Krol, J. Kuper, Cyclostationary feature detection on a tiled-SoC, in: Proceedings of DATE2007, France, 2007.

[25] S.B. Raghunathan, M. Van den Oever, R. Doost-Mohammady, P. Pawelczak, I. Budiarjo, M. Heskamp, Q. Zhang, A.B.J. Kokkeler, H. Nikookar, Z. Qin, R. Hekmat, L.P. Lighart, Dynamic Spectrum Access AAF Platform, in: The demonstration track of IEEE International Symposium on New Frontier in Dynamic Spectrum Access 2008, DySpan2008, USA.

[26] Q. Zhang, K.H.G. Walters, A.B.J. Kokkeler, G.J.M. Smit, Dynamically reconfigurable FFTs for cognitive radio on a multiprocessor platform, in: International Conference on Engineering of Reconfigurable Systems and Algorithms 2008, ERSA2008, USA.


Q. Zhang received his B.S. degree in electrical engineering from Jilin University, Changchun, China, in 2002 and his M.Sc. (with distinction) degree in wireless communication from the University of Southampton, Southampton, UK, in 2004. Currently he is working towards his Ph.D. degree at the University of Twente, Enschede, The Netherlands. His research interests include Cognitive Radio, wireless communication, signal processing and low-power embedded system design.

A.B.J. Kokkeler received his M.Sc. degree in electrical engineering from the University of Twente, Enschede, The Netherlands. He finished his Ph.D. thesis entitled ''Analog-Digital Codesign using Coarse Quantization'' in 2005. He currently is an assistant professor with the faculty of EEMCS, University of Twente, where he is involved in research projects sponsored by the Dutch government and industry.

After receiving the M.Sc. degree, he worked for eight years at the Netherlands foundation for research in astronomy (ASTRON) as a scientific project manager and more than 6 years at Ericsson as a system engineer. Since 2003 he has worked at the University of Twente. He has a background in telecommunication, mixed-signal design and signal processing (beamforming). Currently, his main interest lies in the area of applying low-power design techniques for computationally intensive applications. The emphasis is on reconfigurable architectures for streaming applications.

G.J.M. Smit received his M.Sc. degree in electrical engineering from the University of Twente, Enschede, The Netherlands. He finished his Ph.D. thesis entitled ''The Design of Central Switch Communication Systems for Multimedia Applications'' in 1994. He currently is a Full Professor with the faculty of EEMCS, University of Twente, where he is responsible for a number of research projects sponsored by the EC, industry, and the Dutch government in the field of multimedia and efficient reconfigurable systems.

After receiving the M.Sc. degree, he worked for four years at the Research Laboratory of Océ, Venlo, The Netherlands. In 1994, he was a Visiting Researcher with the Computer Laboratory, Cambridge University, Cambridge, MA, and, in 1998, he was a Visiting Researcher with Lucent Technologies Bell Labs Innovations, Murray Hill, NJ. Since 1999, he has been leading the CHAMELEON group, which investigates new hardware and software architectures for energy-efficient systems. Currently, his research interests include low-power communication and reconfigurable architectures for energy reduction.

K.H.G. Walters finished his M.Sc. thesis, titled Cognitive Radio on a Reconfigurable platform, in computer science at the University of Twente, Enschede, The Netherlands in 2007. Currently he is working towards his Ph.D. degree at the University of Twente. His research interests include signal processing, floating-point hardware, low-power multi-core architectures and compiler construction.