
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 59, NO. 5, MAY 2012 1107

VLSI Architecture for a Reconfigurable Spectrally Efficient FDM Baseband Transmitter

Paul N. Whatmough, Member, IEEE, Marcus R. Perrett, Member, IEEE, Safa Isam, Student Member, IEEE, and Izzat Darwazeh, Senior Member, IEEE

Abstract—Spectrally efficient FDM (SEFDM) systems employ non-orthogonal overlapped carriers to improve spectral efficiency for future communication systems. One of the key research challenges for SEFDM systems is to demonstrate efficient hardware implementations for transmitters and receivers. Focusing on transmitters, this paper explains the SEFDM concept and examines the complexity of published modulation algorithms, with particular consideration to implementation issues. We then present two new variants of a digital baseband transmitter architecture for SEFDM, based on a modulation algorithm which employs the discrete Fourier transform (DFT) implemented efficiently using the fast Fourier transform (FFT). The algorithm requires multiple FFTs, which can be configured either as parallel transforms, which is optimal for throughput, or using a multi-stream FFT architecture, for reduced circuit area. We propose a simplified approach to IFFT pruning for pipeline architectures, based on a token-flow control style, specifically optimized for the SEFDM application. Reconfigurable implementations for different bandwidth compression ratios, including conventional OFDM, are easily derived from the proposed implementations. The SEFDM transmitters have been synthesized, placed and routed in a commercial 32-nm CMOS process technology and also verified in FPGA. We report circuit area and simulated power dissipation figures, which confirm the feasibility of SEFDM transmitters.

Index Terms—Bandwidth efficiency, multicarrier modulation, transmitter, wireless communications.

I. INTRODUCTION

IN ORDER to accommodate the ever-growing demand for bandwidth, spectrally efficient FDM (SEFDM) systems emerged as multicarrier communication systems promoting higher spectral efficiency than the well-known orthogonal frequency division multiplexing (OFDM). The first systems to appear were Fast OFDM (FOFDM) [1] and M-ary amplitude shift keying OFDM (MASK) [2], both of which halve the spectrum utilization, but are constrained to one-dimensional modulations such as BPSK and M-ary ASK. Following this came spectrally efficient FDM (SEFDM) [3], high compaction multicarrier-communications (HC-MCM) [4], overlapped FDM (Ov-OFDM) [5] and multi-stream faster than Nyquist signaling (FTN) [6]–[8], all of which promote variable spectral utilization savings for two-dimensional modulations. All variants of SEFDM systems are basically multicarrier modulation schemes that multiplex non-orthogonal overlapped sub-carriers. In principle, non-orthogonal multicarrier systems achieve spectral savings by reducing the spacing between the subcarriers in frequency and/or transmission time, thus communicating information at a faster than Nyquist rate. In theory, such spectral utilization improvement is supported by the Mazo limit established in [9], stating that signaling at rates beyond the Nyquist rate can be achieved without performance degradation.

Despite the favorable spectral savings on offer, in practice the loss of orthogonality complicates both signal generation and detection. For the detection problem, many detectors have been proposed and evaluated in the literature. Maximum likelihood (ML) is suggested for detection as the optimum technique in additive white Gaussian noise (AWGN) channels [1]. Nevertheless, ML detection is overly complex, with a computational complexity that grows exponentially with the size of the system. On the other hand, linear detectors, such as minimum mean square error (MMSE) and zero forcing (ZF), constrain the size of the SEFDM system in order to yield competitive bit error rate (BER) performance [10]. Truncated singular value decomposition (TSVD)-based detection, proposed in [11], has demonstrated improved error performance for systems with relatively large numbers of carriers. Furthermore, iterative detection in the form of sphere decoders (SDs), proposed in [12] and [13], showed optimum BER performance with a decreased computational complexity, and was later extended to fading channels in [14]. However, the variable complexity of the SD algorithms led to the proposal of the fixed complexity sphere decoder (FSD), where the complexity is tradable with error performance [15], [16].

As for the generation of SEFDM modulated symbols, recent work in [17]–[19] has shown that the SEFDM signal can be realized with a similar complexity to an OFDM system, by utilizing standard inverse discrete Fourier transform (IDFT) blocks, judiciously arranged for SEFDM modulation. Minor modifications on the input streams are needed and the designs rely mainly on standard IDFT operations that can be efficiently realized with the inverse fast Fourier transform (IFFT) algorithm. However, there are a number of practical implementation challenges for SEFDM systems, most significantly the minimization of area and power dissipation overheads, which are critical in meeting market demands for manufacturing cost and end product battery life.

SEFDM transmitters using field programmable gate arrays (FPGAs) have been reported recently [20]. Meanwhile, the practical VLSI implementation of the closely related FTN signaling scheme has been investigated in both FPGA and 65-nm CMOS technology [8]. A first study of SEFDM transmitter VLSI implementation in 65-nm CMOS technology was reported by the authors in [21], utilizing an IFFT-based architecture for a reconfigurable design which enables modification of signal subcarrier spacing and demonstrates the feasibility of ASIC integration. In addition to this prior work targeting wireless applications, there have also been a number of recent demonstrations of practical optical communications systems making use of spectrally efficient FDM modulation [22]–[24] and reporting substantial increases in system capacity.

In this paper, we significantly extend our work in [21] by adding novel contributions in three main areas. First, we present a detailed algorithmic complexity analysis, comparing asymptotic complexity for known SEFDM modulation algorithms over the dimensions of number of sub-carriers and also bandwidth compression ratio. Second, we introduce a novel VLSI architecture based on the multi-stream FFT, which offers substantially reduced circuit area and power consumption compared to the previously published parallel approach. Both the parallel and novel multi-stream architectures resemble conventional OFDM modulators in that they are based on IFFT operations, with the former optimal for highest throughput and the latter offering a significant saving in circuit area and power dissipation. Both architectures can be reconfigured for differing degrees of sub-carrier overlap, enabling the transmitter to switch from conventional OFDM to a more spectrally efficient FDM system on a symbol-by-symbol basis. Finally, we report a new implementation technique to address the increase in silicon area and power dissipation required for SEFDM, relative to previous generations of OFDM transmitters [25], [26]. This is achieved by a novel application of token-flow control to perform IFFT pruning [27] of butterfly calculations and first-in, first-out (FIFO) entries. Implementation results are given in 32-nm CMOS.

The rest of the paper is organized as follows. Section II gives a brief theoretical introduction to the SEFDM system and achievable receiver performance. Section III introduces the IDFT-based signal generation algorithm and discusses the complexity reduction potential. Section IV describes the algorithm-hardware mapping with optimizations and Section V gives corresponding implementation results. Section VI concludes the paper with a focus on the advantages of the proposed architectures.

Manuscript received September 01, 2011; revised November 04, 2011 and December 15, 2011; accepted December 19, 2011. Date of publication February 14, 2012; date of current version May 09, 2012. This work was supported in part by the U.K. Engineering and Physical Sciences Research Council (EPSRC) through the Engineering Doctorate program and ARM Ltd., Cambridge, U.K. This paper was recommended by Associate Editor A. B. da Silva.

P. N. Whatmough is with the Electrical and Electronic Engineering Department, University College London, London SW7 2AZ, U.K., and ARM Ltd, Cambridge, CB1 9NJ, U.K. (e-mail: [email protected]).

M. R. Perrett, S. Isam, and I. Darwazeh are with the Electrical and Electronic Engineering Department of University College London, London SW7 2AZ, U.K. (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSI.2012.2185304

1549-8328/$31.00 © 2012 IEEE

Fig. 1. SEFDM conceptual block diagram.

II. THE SEFDM SYSTEM

A generic SEFDM system is shown in Fig. 1. The signal is generated by the superposition of several non-orthogonal carriers, each carrying a complex symbol to represent two-dimensional modulations.

Fig. 2. BER performance of a 16-subcarrier SEFDM system carrying QPSK symbols for values of $\alpha$ up to 0.9; $\alpha = 1$ corresponds to OFDM.

The carriers in SEFDM systems are spaced by a fraction of the inverse of the symbol duration, thereby violating the orthogonality condition of the OFDM system, where the spacing is equal to the inverse of the symbol duration. The distance between the carriers in frequency, denoted by $\Delta f$, is given by $\Delta f = \alpha / T$, where $\alpha$ denotes the amount of bandwidth compression and $T$ is the duration of one SEFDM symbol. Equation (1) gives the baseband time-domain representation of a single frame of an SEFDM signal denoted by $x(t)$. We follow a representation similar to that in [25], where $N$ is the number of subcarriers and $s_n$ denotes a complex valued symbol modulated on the $n$th subcarrier and windowed by a time limited rectangular function defined over one symbol period:

(1)

A discrete representation of SEFDM signals can be obtained by sampling each SEFDM frame, shown in (1) above. Thus, a single discrete SEFDM symbol will be given by

(2)
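As a hedged illustration of direct generation, the sketch below assumes the commonly used discrete SEFDM form $X[k] = \sum_{n=0}^{N-1} s_n e^{j 2\pi n k \alpha / N}$ with $N$ samples per symbol and the scaling factor omitted. The function name `sefdm_direct` and the QPSK test vector are illustrative choices and are not taken from the paper.

```python
import cmath

def sefdm_direct(symbols, alpha):
    """Directly synthesise one discrete SEFDM symbol.

    Assumed form (scaling omitted): X[k] = sum_n s_n * exp(j*2*pi*n*k*alpha/N).
    alpha = 1 gives conventional OFDM; alpha < 1 compresses the carrier spacing.
    """
    n_sc = len(symbols)
    return [
        sum(s * cmath.exp(2j * cmath.pi * n * k * alpha / n_sc)
            for n, s in enumerate(symbols))
        for k in range(n_sc)
    ]

# Hypothetical 4-carrier QPSK example.
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
x_ofdm = sefdm_direct(qpsk, 1.0)    # orthogonal carriers
x_sefdm = sefdm_direct(qpsk, 0.8)   # 20% bandwidth compression
```

Each of the $N$ output samples is a sum over $N$ carriers, which is the source of the quadratic cost discussed later for direct synthesis.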

In the next section, it will be shown how the samples of the SEFDM signal as in (2) can be generated based on IDFT operations [17].

At the receiver, the SEFDM signal contaminated by AWGN is represented by

(3)

The SEFDM receiver generates statistics of the incoming signal by employing the dual of the transmitter operations [15], [16]. Such statistics are fed to a detector to generate estimates of the transmitted signal [1]. Many techniques have been investigated for the detection of SEFDM signals [3], [5] and it is confirmed that successful detection is achievable for a wide range of bandwidth compression levels. As a demonstration, Fig. 2 illustrates the BER performance of a received SEFDM signal detected with the modified FSD algorithm [16]. The curves in the figure demonstrate slight error performance degradation for attractive bandwidth savings.

The preceding analysis in AWGN channels is included here as a general introduction to SEFDM system performance. Notwithstanding, the effects of frequency selective fading are of key significance and transmission over different fading channel conditions is an important area of study for wireless systems. Operation under various frequency selective fading conditions in [14] and [28] has shown the suitability of joint equalization and detection, leading to the development of bespoke SEFDM channel estimation techniques. Furthermore, recent work on SEFDM has reported lower peak to average power ratio (PAPR) relative to an equivalent OFDM signal [28]. This is an important advantage, as it relaxes the detrimental effects of analogue front end nonlinearities and allows improved efficiency in the RF power amplifier, which can be enhanced further using a PAPR reduction algorithm for SEFDM signals [29]. Additionally, various techniques for signal precoding, optimized for specific values of bandwidth compression and number of carriers, to facilitate simplified reception and decoding, have been published recently [30]. However, the details of signal coding, fading channel and PAPR reduction are beyond the scope of the work presented here, which focuses on transmitter implementation aspects.

Fig. 3. Unit circle representation of the relationship between the frequency samples of SEFDM system (diamonds) and IDFT operation (diamonds and circles).

III. IDFT-BASED SEFDM MODULATION ALGORITHMS

A. General Description

Equation (2) describes the samples of the discrete SEFDM signal. In analogy to OFDM, the SEFDM signal can be expressed by IDFT operations with simple manipulations of the input symbol vectors [17]. These manipulations are merely zero insertions, either at the end of the vector only (in a manner similar to zero padding) and/or between the symbols. The change in length ensures the alignment of the IDFT frequency samples and the SEFDM subcarriers, and the zeros suppress the unwanted frequencies. Fig. 3 illustrates how the frequency samples are related to the IDFT samples. The figure depicts the frequency samples of a 4-carrier SEFDM system for $\alpha = 0.5$ and the frequency samples of an 8-point IDFT operation on the unit circle of the complex z-plane. The figure shows that the SEFDM system is equivalent to the 8-point IDFT of a vector whose last four elements are equal to zero.
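The 4-carrier example can be checked numerically. The following is a minimal sketch, assuming the discrete form $X[k] = \sum_n s_n e^{j 2\pi n k \alpha / N}$ (scaling omitted) and assuming that the first $N$ IDFT outputs are the ones retained; the helper names are hypothetical. Four symbols are zero padded to length 8 and the 8-point IDFT output is compared against direct synthesis with $\alpha = 0.5$.

```python
import cmath

def idft(vec):
    """Unnormalised IDFT evaluated directly, O(M^2); for illustration only."""
    m = len(vec)
    return [sum(v * cmath.exp(2j * cmath.pi * i * k / m) for i, v in enumerate(vec))
            for k in range(m)]

def sefdm_direct(symbols, alpha):
    """Reference: direct sum over compressed carriers (assumed form)."""
    n_sc = len(symbols)
    return [sum(s * cmath.exp(2j * cmath.pi * n * k * alpha / n_sc)
                for n, s in enumerate(symbols))
            for k in range(n_sc)]

def sefdm_single_idft(symbols, alpha):
    """Zero pad to N/alpha points, take one IDFT, keep the first N samples."""
    n_sc = len(symbols)
    length = round(n_sc / alpha)
    assert abs(length * alpha - n_sc) < 1e-9, "N/alpha must be an integer"
    padded = list(symbols) + [0j] * (length - n_sc)
    return idft(padded)[:n_sc]

symbols = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]
via_idft = sefdm_single_idft(symbols, 0.5)   # 8-point IDFT, last four inputs zero
reference = sefdm_direct(symbols, 0.5)
max_err = max(abs(u - v) for u, v in zip(via_idft, reference))
```

In this sketch `max_err` is at the level of floating-point rounding, which reflects the alignment between the IDFT frequency samples and the SEFDM subcarriers described above.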


Fig. 4. Generating SEFDM signal using Type-I algorithm, based on a single IDFT operation.

For integer values of $N/\alpha$, the work in [17]–[19] shows that a discrete SEFDM signal can be described as

(4)

where

(5)

and the transform is the $N/\alpha$-point IDFT of its argument, which is a vector of length $N/\alpha$ whose elements take the values of either input symbols or zeros as

(6)

Thus, the SEFDM signal can be realized with a single IDFT block, with a length longer than $N$, which we refer to as the Type-I modulation algorithm. The SEFDM transmitter in this case is depicted in Fig. 4. Furthermore, it is shown in [20] that by expressing the term $\alpha$ as a rational number, that is by taking $\alpha = b/c$, where both $b$ and $c$ are integers and $b < c$, the SEFDM signal can be expressed as

(7)

As for the case above, here we define the IDFT input to be a vector of length $cN$ whose elements take the values of either the input symbols or zeros as

otherwise(8)

Equation (7) can be rearranged, by a substitution of the summation index, as

(9)

Equation (9) clearly shows that the samples of the SEFDM signal can be generated using $c$ IDFT operations, each of length $N$ points. The input symbols are padded with $(c-1)N$ zeros and then arranged as a $c \times N$ matrix in column major order. An IDFT operation is then performed on each row. The signal is finally composed by combining rotated versions of the IDFT outputs as depicted in Fig. 5. We refer to this approach using multiple IDFT operations as the Type-II algorithm.
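The Type-II steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: the discrete signal is taken as $X[k] = \sum_n s_n e^{j 2\pi n k b / (cN)}$ with scaling omitted, symbol $n$ is assumed to be placed at index $bn$ of the length-$cN$ vector, and the rotation applied to row $l$ is assumed to be $e^{j 2\pi l k / (cN)}$. All function names are hypothetical.

```python
import cmath

def idft(vec):
    """Unnormalised IDFT evaluated directly; stands in for an N-point IFFT."""
    m = len(vec)
    return [sum(v * cmath.exp(2j * cmath.pi * i * k / m) for i, v in enumerate(vec))
            for k in range(m)]

def sefdm_direct(symbols, b, c):
    """Reference: direct synthesis with alpha = b/c (assumed form)."""
    n_sc = len(symbols)
    return [sum(s * cmath.exp(2j * cmath.pi * n * k * b / (c * n_sc))
                for n, s in enumerate(symbols))
            for k in range(n_sc)]

def sefdm_multi_idft(symbols, b, c):
    """Type-II style generation with c IDFTs of N points each."""
    n_sc = len(symbols)
    # Zero insertion: N symbols spread over a vector of length c*N.
    padded = [0j] * (c * n_sc)
    for n, s in enumerate(symbols):
        padded[b * n] = s
    # Column major c x N matrix: row l holds padded[l], padded[l + c], ...
    rows = [padded[l::c] for l in range(c)]
    outs = [idft(row) for row in rows]
    # Combine rotated IDFT outputs.
    return [sum(outs[l][k] * cmath.exp(2j * cmath.pi * l * k / (c * n_sc))
                for l in range(c))
            for k in range(n_sc)]

symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 - 1j, -1 - 1j, 1 + 1j, -1 + 1j]
y = sefdm_multi_idft(symbols, 4, 5)      # alpha = 4/5
ref = sefdm_direct(symbols, 4, 5)
max_err = max(abs(u - v) for u, v in zip(y, ref))
```

Row 0 is multiplied by unity for every $k$, so one of the $c$ rotations is trivial in this arrangement.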

B. Algorithm Complexity Analysis

The major advantage of the IDFT-based generation of the signal is the reduction in complexity. The computational complexity of the directly generated SEFDM signal as in (2) is of the order of $N^2$ complex multiplications and complex additions. In a digital system, this would typically be implemented using direct digital synthesis (DDS) [31], with asymptotic complexity of $O(N^2)$. On the other hand, the IDFT-based transmitters can be economically realized by means of the FFT algorithm with complexity of $O(N \log_2 N)$. The complexity of the Type-I algorithm is given by a single large IFFT with $N/\alpha$ points, thus

(10)

By contrast, the complexity of the Type-II transmitter is a combination of the complexity of $c$ IFFT blocks each of length $N$, giving

(11)
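The relative ordering of the approaches can be illustrated with simple operation-count models. The expressions below are assumptions chosen to match the verbal description (a single IFFT of $N/\alpha$ points for Type-I, $c$ IFFTs of $N$ points for Type-II, and quadratic cost for DDS); constant factors are ignored and the function names are hypothetical.

```python
import math

def ops_dds(n):
    """Direct synthesis: every output sample sums over every carrier."""
    return n * n

def ops_ofdm(n):
    """Conventional OFDM: one N-point IFFT."""
    return n * math.log2(n)

def ops_type1(n, alpha):
    """Type-I: one IFFT with N/alpha points (assumed model)."""
    m = n / alpha
    return m * math.log2(m)

def ops_type2(n, c):
    """Type-II: c IFFTs of N points each (assumed model)."""
    return c * n * math.log2(n)

n, b, c = 64, 4, 5          # hypothetical example with alpha = 4/5
counts = {
    "OFDM": ops_ofdm(n),
    "Type-I": ops_type1(n, b / c),
    "Type-II": ops_type2(n, c),
    "DDS": ops_dds(n),
}
```

For this example the counts increase in the order OFDM, Type-I, Type-II, DDS, although the ordering of Type-I and Type-II depends on the chosen $\alpha$.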

Fig. 6 illustrates the complexity of the two proposed IFFT SEFDM modulation algorithms for a fixed value of $\alpha$, along with the DDS approach and conventional OFDM, as a function of $N$. Clearly the Type-I algorithm requires fewer operations than Type-II for the same value of $N$ and $\alpha$. Both proposed approaches require several orders of magnitude fewer operations than DDS for many combinations of $N$ and $\alpha$. However, even the Type-I algorithm demands a significant increase in computational power compared to conventional OFDM.

It is important to note that, unlike the DDS approach, the complexity of IFFT-based algorithms varies not only with dimension $N$, but also $\alpha$. To this end, Fig. 7 shows how the complexity of the proposed algorithms varies as a function of $\alpha$, for a fixed value of $N$. Since the number of IFFT points required by the Type-I algorithm is given by $N/\alpha$, we find that for large values of $\alpha$ the complexity reduces. Conversely, the Type-II algorithm exhibits increasing complexity with $c$.

Due to the zero padding operations previously described, both IFFT-based modulation algorithms require the computation of IFFTs with a number of zero bins. There is thus scope to "prune" the IFFT trellis to remove redundant operations with zero operands [27], [32]. For a transform of size $N$ with $L$ non-zero inputs, the complexity for a maximally pruned IFFT tends to $O(N \log_2 L)$. Hence, we also consider ideally pruned versions, where all operations with zero operands are skipped. Thus, the ideally pruned Type-I modulation algorithm achieves complexity of

(12)


Fig. 5. SEFDM IDFT-based transmitter using Type-II algorithm with multiple IDFT operations.

Fig. 6. Asymptotic complexity of DDS- and IDFT-based SEFDM modulation algorithms as a function of $N$. Conventional OFDM is included for comparison.

and the ideally pruned Type-II is lower bounded by

(13)
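To give a feel for the pruning benefit, the sketch below applies the $N \log_2 L$ pruned-transform model to both algorithms. It assumes that the single Type-I transform has $N/\alpha$ points with $N$ non-zero inputs, and that the $N$ non-zero inputs of Type-II are spread evenly so that each of the $c$ transforms sees $N/c$ of them. These are modelling assumptions for illustration, not expressions quoted from the paper.

```python
import math

def ops_type1(n, alpha, pruned=False):
    """One transform of N/alpha points; N non-zero inputs when pruned."""
    m = n / alpha
    return m * math.log2(n if pruned else m)

def ops_type2(n, c, pruned=False):
    """c transforms of N points; assumed N/c non-zero inputs each when pruned."""
    return c * n * math.log2(n / c if pruned else n)

def saving(full, pruned):
    """Fraction of operations removed by ideal pruning."""
    return 1.0 - pruned / full

n = 64
# Hypothetical sweep over alpha = b/c.
results = {}
for b, c in [(1, 2), (2, 3), (3, 4), (4, 5)]:
    alpha = b / c
    results[(b, c)] = (
        saving(ops_type1(n, alpha), ops_type1(n, alpha, pruned=True)),
        saving(ops_type2(n, c), ops_type2(n, c, pruned=True)),
    )
```

Under these assumptions the Type-I saving shrinks as $\alpha$ approaches 1, while the Type-II saving grows with $c$, which agrees qualitatively with the trends described for Fig. 7.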

These are also plotted in Fig. 7 for comparison. Generally, the pruned Type-I algorithm shows little benefit, especially for larger bandwidth compression ratios where there are very few zero bins in the single large transform. The number of zero bins in the Type-II algorithm increases with bandwidth compression ratio and hence using a pruned transform gives an increasingly beneficial complexity reduction. In general, it is often difficult to achieve such a complexity reduction in a hardware implementation and hence the asymptotic notation is merely a lower bound in practice. We will discuss this in more detail in Section IV.

In summary, both IFFT-based algorithms show a strong advantage in complexity reduction compared to the directly generated DDS approach. Since they are based on IFFT blocks, there is also excellent compatibility with existing systems for multi-carrier orthogonal signaling. However, the number of IFFT points required for the Type-I algorithm, $N/\alpha$, often results in non-integer and/or non-power-of-2 values, and consequently the algorithm is only suitable for a small subset of possible parameters. Hence, although the Type-II algorithm has less attractive complexity, which increases with $c$, it is better suited to hardware implementation using IFFT blocks. Since the length-$N$ IFFT is common to conventional OFDM transmitters, it is possible to reuse this block, retaining backward compatibility with the incumbent OFDM systems. Advantageously, an SEFDM receiver may use the dual of the transmitter algorithm, and thus all the optimizations described in this paper, while focused on the transmitter, are equally relevant to receiver implementation.

Fig. 7. Asymptotic complexity of DDS- and IDFT-based SEFDM modulation algorithms as a function of $\alpha$, for a fixed value of $N$. Conventional OFDM is included for comparison.

IV. VLSI ARCHITECTURE

A. Parallel Transform Architecture

In this section, we describe a VLSI architecture which results from directly implementing the Type-II SEFDM signal generation algorithm described in Section III. Such an architecture is based on $c$ parallel length-$N$ IFFTs, which are the key building blocks of current OFDM transmitters and are therefore attractive both for reasons of backwards compatibility and general familiarity. For the sake of brevity, we describe here only aspects that differ from the traditional IFFT-based OFDM transmitter [26]. The algorithms presented in Section III are suitable for the generation of signals with arbitrary values of $\alpha$, but complexity and power dissipation rise linearly with $c$, so we focus on implementations optimized for a small set of $\alpha$ values that are realistic for practical systems.

A reconfigurable transmitter with adjustable sub-carrier spacing is also attractive for two key reasons. First, it allows us to adapt $\alpha$ in order to optimize the tradeoff between receiver complexity, spectral efficiency and prevailing channel conditions. Second, supporting $\alpha = 1$ allows us to maintain backward compatibility with the many incumbent OFDM systems [25]. To this end, we ensure that $\alpha$ can be reconfigured for each FDM symbol. We assume $M$-ary digital QAM modulation for sub-carrier symbols, and do not consider mapping of pilot symbols.

Fig. 8 shows a high-level block diagram of the reconfigurable SEFDM IFFT-based transmitter, with a pipeline radix-$2^2$ IFFT implementation [33]. The modulator consists of zero-insertion and reorder, the parallel IFFTs and postprocessing, as discussed individually in the following sections.

Fig. 8. Parallel SEFDM transmitter architecture, consisting of input reorder logic, parallel IFFTs and postprocessing complex multipliers. Diamonds represent trivial complex multiplications.

1) Zero Insertion and Reorder: Fig. 9 illustrates the general symbol reordering operation, which consists of padding the $N$ input symbols with $(c-1)N$ zeros before arranging them as a $c \times N$ matrix in column major order, as described in Section III. A naive implementation of this operation implies a buffer of $cN$ complex words to hold the sparse complex matrix. However, since each incoming symbol is mapped to only one IFFT input, it is only necessary to use a multiplexer in front of each IFFT to choose either the incoming symbol or a zero sample.

Fig. 9. General operation of symbol reordering for arbitrary $\alpha$. The resulting $c \times N$ matrix arises from the column major reordering of the preceding vector (defined in (8)), which is derived from the input symbol vector and contains complex values.

The control signals required to operate the multiplexers are generated by modulo arithmetic operations on a counter, which reconfigures for different parameters. In our optimized implementation, this process is replaced with a LUT containing addresses for preset values of $\alpha$ [21]. To keep the parallel IFFT pipelines fed, we require multiple read ports in the preceding symbol buffer. However, since the detection of SEFDM signals with $\alpha < 1/2$ incurs an increasing BER penalty, we generally only require $\alpha \geq 1/2$ and hence the maximum number of read ports required is two, to provide two symbols in parallel on each clock cycle.

2) Parallel IFFTs: The $N$-point IDFTs are implemented in this section as $N$-point IFFTs, which can be implemented as $c$ parallel IFFT blocks or as a smaller number of time-multiplexed blocks, which is considered in the next section. Using parallel IFFTs allows the highest throughput and constant latency independent of $\alpha$, at the cost of a linear increase in area and power. We have used 64-point, 16-bit complex IFFT blocks based on the radix-$2^2$ flowgraph [33]. The IFFTs have an enable signal which, when de-asserted, gates the internal clock and clears the output registers to zero.

3) Postprocessing: The postprocessing operation combines the parallel IFFT outputs after multiplication with a complex exponential in order to produce the discrete-time output samples. The complexity of the postprocessing, in terms of complex multiply accumulate (CMAC) operations, is a linear function of $c$. The hardware required includes the CMACs and LUTs to store precalculated rotation coefficients in read-only memory (ROM). Each configuration requires its own set of LUTs. Due to inherent symmetry in the rotation coefficients, the storage required per LUT can in practice be easily reduced to a fraction of the full range, at the cost of increased complexity in the address generator logic, which is subsequently required to count both up and down the LUT and requires a conditional negation of the output value. For higher performance implementations, the critical paths through the complex multipliers form a feed-forward cutset that can be arbitrarily pipelined to meet throughput requirements at the cost of additional latency.
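The multiplexer control for the zero-insertion and reorder step can be pictured as a small lookup table. The sketch below is illustrative only: it assumes that symbol $n$ occupies index $bn$ of the zero-inserted vector, so that under column major reordering it is routed to IFFT `(b*n) % c` at input slot `(b*n) // c`. The function names are hypothetical.

```python
from collections import Counter

def reorder_lut(n_sc, b, c):
    """For each symbol index n, return (ifft_index, input_slot).

    Assumed mapping: symbol n sits at index b*n of the length c*N vector,
    which is then read as a c x N matrix in column major order.
    """
    return [((b * n) % c, (b * n) // c) for n in range(n_sc)]

def max_symbols_per_slot(lut):
    """Largest number of symbols consumed in the same input slot (clock cycle)."""
    per_slot = Counter(slot for _, slot in lut)
    return max(per_slot.values())

lut = reorder_lut(8, 4, 5)              # hypothetical N = 8, alpha = 4/5
ports_needed = max_symbols_per_slot(lut)
```

For this example at most two symbols share an input slot, which is consistent with the two read ports mentioned above for $\alpha \geq 1/2$. Every other IFFT input in that slot receives a zero sample from its multiplexer.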

B. Multi-Stream Transform Architecture

The previously described parallel architecture is able to maintain iso-throughput for all compression modes. However, this requires an investment in circuit area proportional to $c$ over all modes. Such an increase in silicon area is obviously prohibitive for large values of $c$ and also possibly detrimental to power efficiency due to subthreshold leakage.

An alternative to the systolic implementation is to use time-multiplexed IFFTs and CMACs, which in the extreme case requires only a single IFFT and CMAC with sufficient storage to hold the intermediate accumulation results. This approach implies a reduction in throughput and an increase in latency, both of which scale with $c$. The reduction in throughput can be compensated by increasing the clock frequency to maintain comparable throughput at significantly lower silicon area. It is acknowledged that this approach has a practical limitation in that providing a derived clock at arbitrary integer multiples of the sample rate is not trivial and may require an additional clock synthesizer.

To avoid having to store a whole time-domain symbol in the serial implementation, it is possible to instead use a multi-stream IFFT. This idea is already well known from MIMO transmitters, which require multiple IFFTs to modulate separate data streams for multiple antennas [34]. The multi-stream IFFT accepts alternately interleaved samples from a number of input streams on successive cycles, such that interleaved transformed output samples appear at the output, as opposed to transforming the whole symbol at once, as shown in Fig. 10. This means that we only need to buffer a single sample at the output in order to complete the complex multiply accumulate. It also significantly reduces the latency. The main drawback with the multi-stream IFFT approach is that it requires an increase in internal storage by a factor of $c$.

Fig. 10. Multi-stream implementation of SEFDM transmitter, consisting of input reorder logic, multi-stream IFFT with extended FIFOs and single postprocessing complex multiply accumulate stage. Diamonds represent trivial complex multiplications. An increased clock frequency of $c \times$ the base sample rate is required to maintain equal throughput.

With a multi-stream IFFT, the symbol reorder block is only required to generate a single complex sample per cycle and therefore reduces to a single multiplexer, presuming the modulated sub-carrier symbols are suitably arranged in a preceding buffer. For low throughput systems, it is also possible to replace the CMAC with the CORDIC algorithm to further reduce circuit area and power dissipation [35].

Fig. 11. Partially pruned half BFs occur when one of the complex inputs is zero.
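The single-accumulator postprocessing of the multi-stream architecture can be sketched behaviourally as follows. This is a simplified model under stated assumptions: the $c$ streams are taken to emerge interleaved in natural order (a pipeline IFFT would normally produce a permuted order), the rotation for stream $l$ at sample $k$ is assumed to be $e^{j 2\pi l k / (cN)}$, and all names are hypothetical.

```python
import cmath

def idft(vec):
    """Unnormalised IDFT evaluated directly; stands in for one IFFT stream."""
    m = len(vec)
    return [sum(v * cmath.exp(2j * cmath.pi * i * k / m) for i, v in enumerate(vec))
            for k in range(m)]

def multistream_cmac(interleaved, n_sc, c):
    """Consume one IFFT sample per fast clock cycle; emit one output every c cycles."""
    out, acc = [], 0j
    for t, sample in enumerate(interleaved):
        k, l = divmod(t, c)                    # output index, stream index
        acc += sample * cmath.exp(2j * cmath.pi * l * k / (c * n_sc))
        if l == c - 1:                         # last stream: accumulation complete
            out.append(acc)
            acc = 0j
    return out

# Hypothetical example: N = 6 symbols, alpha = b/c = 2/3.
b, c = 2, 3
symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j]
n_sc = len(symbols)
padded = [0j] * (c * n_sc)
for n, s in enumerate(symbols):
    padded[b * n] = s
streams = [idft(padded[l::c]) for l in range(c)]
interleaved = [streams[l][k] for k in range(n_sc) for l in range(c)]
y = multistream_cmac(interleaved, n_sc, c)
```

Only the running accumulator is held between cycles, which mirrors the observation that a single output sample needs to be buffered to complete the complex multiply accumulate.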

C. Optimized IFFT for Sparse Inputs

The final reordered matrix of Fig. 9 contains $(c-1)N$ zero symbols, which leads to redundant operations with zero operands in the initial stages of the IFFT algorithm. It is important to take advantage of this if we are to efficiently implement modulators for large values of $c$. In software approaches, it is relatively straightforward to optimize away the redundant butterfly (BF) operations and at the same time reduce the storage requirement [27]. However, in hardware, it is often difficult to implement pruned IFFTs because they lose spatial regularity in the signal flow graph, which is important to retain for efficient implementation. Hence, instead, we propose a simple approach to reduce the storage requirement and the number of BF operations to be calculated. Our analysis focuses on the well known radix-$2^2$ pipeline architecture, which is particularly popular for OFDM systems. Its main advantage is that the same number of multiplications are required as a radix-4 algorithm, but with only simple radix-2 butterflies and a very regular topology.

In order to entirely prune a radix-2 BF operation, both complex inputs need to be zero. When only one complex input to a BF is zero, it cannot be pruned per se, but can be simplified to either a simple mapping to two output nodes or a mapping with negation, as shown in Fig. 11. In the following sections, we outline two practical approaches to pruning IFFTs for sparse inputs. The first allows an entire BF stage to be removed for some ratios of $\alpha$, while the second is suitable for all $\alpha$ and allows clock gating of BF datapaths for pruned operations, which reduces the energy cost of the redundant operations.

1) Whole Trellis Stage Pruning for $\alpha \leq 1/2$: In this case, the reordered zero bins (cf. Fig. 9) are arranged in a compact manner such that nonzero bins are followed by at least $N/2$ contiguous zeros. Hence, the first IFFT trellis stage contains only half BFs and can be pruned entirely up to the input to the first complex multiplier, removing $N/2$ BF operations and $N/2$ complex words of storage. Fig. 12 shows the IFFT signal flow graph for this case, with the redundant edges shown in gray. Unfortunately, as previously mentioned, bandwidth compression ratios less than 1/2 incur a BER penalty and hence this optimization is only really applicable to $\alpha = 1/2$, at least in this particular application.
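The effect of whole-stage pruning can be checked on a single radix-2 decimation-in-frequency stage. The sketch below is a simplified model (a plain radix-2 first stage with an assumed twiddle $e^{j 2\pi m / N}$ on the lower branch, rather than the full radix-$2^2$ pipeline): when the second half of the input is zero, every butterfly degenerates to a copy of its non-zero operand. Function names are hypothetical.

```python
import cmath

def dif_first_stage(x):
    """Full first IDFT stage: N/2 butterflies, twiddle on the lower outputs."""
    n = len(x)
    half = n // 2
    upper = [x[m] + x[m + half] for m in range(half)]
    lower = [(x[m] - x[m + half]) * cmath.exp(2j * cmath.pi * m / n)
             for m in range(half)]
    return upper, lower

def dif_first_stage_pruned(first_half, n):
    """Pruned stage: no butterflies, the first N/2 points are simply repeated."""
    upper = list(first_half)
    lower = [v * cmath.exp(2j * cmath.pi * m / n) for m, v in enumerate(first_half)]
    return upper, lower

n = 8
first_half = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]
x = first_half + [0j] * (n // 2)          # non-zero bins followed by N/2 zeros
full = dif_first_stage(x)
pruned = dif_first_stage_pruned(first_half, n)
```

Both versions produce the same stage outputs, so the additions, subtractions and the first-stage delay storage can be dropped while the complex multipliers are fed with the repeated inputs.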


Fig. 12. IFFT signal flow graph for the whole-stage pruning case. Gray edges of graph are redundant, leading to removal of the first stage by replacing the BF operations with a mapping operation which repeats the first $N/2$ points into the complex multipliers, requiring no additional hardware.

Fig. 13. Optimized BF and complex multiplier datapaths with clock gating for pruned operations controlled by zero flags which accompany the two complex operands. Both zero flags are cleared if one of the operands is nonzero. All datapath arithmetic is complex with 16-bit real and imaginary parts.

2) BF Clock Gating for $\alpha > 1/2$: For larger values of $\alpha$, zero bins are less regularly distributed in the trellis and it becomes increasingly difficult to implement a pruned algorithm, especially in the case of a reconfigurable $\alpha$. Therefore, we implement a suboptimal pruning approach based on token-flow style control, which allows us to clock gate BF stages and complex multipliers when the input operands are zero and also to use an optimized FIFO which reduces the required storage significantly, using the same mechanism.

Fig. 13 shows the general approach to reducing power in BFs and complex multipliers with zero input operands. Each 32-bit complex operand is associated with a 1-bit zero flag, which is initially generated by the input reordering logic. The zero flags then follow the datapath and are cleared when a BF operation is performed on one zero and one nonzero operand. The flag enables simple control of datapath clock gating (illustrated in Fig. 13 at the input to the registers), such that register clock pins are isolated from the clock tree when a redundant BF or complex multiplication occurs. Since both the input and output registers are gated in this case, power is saved both in switched datapath capacitance and the clock tree driver load. The area and power overhead of the logic for the zero flag is negligible.
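A behavioural sketch of the zero-flag mechanism is given below. It models only the flag logic, not the registers or clock tree: a butterfly is treated as gated when both operand flags are set, and the output flags are cleared as soon as either operand is non-zero. The function names and the example input are hypothetical.

```python
def butterfly(a, a_zero, b, b_zero):
    """Radix-2 butterfly with zero flags.

    Returns (sum, diff, out_zero, gated). out_zero applies to both outputs.
    """
    if a_zero and b_zero:
        # Both operands are zero: outputs are zero and the datapath is gated.
        return 0j, 0j, True, True
    # At least one non-zero operand: compute normally and clear the flags.
    return a + b, a - b, False, False

def stage_with_flags(x, flags):
    """Apply one butterfly stage (stride N/2) and count gated butterflies."""
    half = len(x) // 2
    gated = 0
    out, out_flags = [0j] * len(x), [True] * len(x)
    for m in range(half):
        s, d, z, g = butterfly(x[m], flags[m], x[m + half], flags[m + half])
        out[m], out[m + half] = s, d
        out_flags[m] = out_flags[m + half] = z
        gated += g
    return out, out_flags, gated

# Hypothetical sparse input: two non-zero samples out of eight.
x = [1 + 1j, 0j, 0j, 0j, 1 - 1j, 0j, 0j, 0j]
flags = [v == 0 for v in x]                # generated by the reorder logic
out, out_flags, gated = stage_with_flags(x, flags)
```

In this example three of the four butterflies are gated, and the surviving zero flags continue to gate the corresponding operations in later stages.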

D. Optimized FIFO Storage

It is well known that FIFO storage in pipeline FFTs consumes significant area and power, a problem which is multiplicatively worse in the case of either a multi-stream transform or equivalent parallel transforms. The FIFOs implement a fixed delay in order to establish the necessary stride between data moving through the trellis. As such, these structures are often implemented using serially connected registers. The power cost of this shift register (SR) structure is significant, since all the flip-flops are clocked on every cycle where the module is active, in order to shift every data item to the next register. Because the clock load increases with the FIFO length, the SR structure exhibits poor power efficiency. A dual-port (one read port, one write port) SRAM macro can perform a similar function to the SR with the addition of simple addressing logic, but is not area effective for such a relatively small number of words. Register files consisting of static latches offer low area and power [36], but generally require a semi-custom design approach, so are not considered here.

Instead of moving all data bits on every cycle, we opt to implement a ring-buffer (RB) FIFO style, where a circular pointer is updated to enable only the register that is to be written [37]. All registers except the one to be accessed can be clock gated, which reduces the clock load drastically, while the register that is to be read is addressed by the pointer using a combinatorial read-out path. Unlike the SR, the RB has a clock load which is approximately independent of the number of entries and is therefore well suited to the current design, where there are significant numbers of such FIFO arrays.

The zero flag is used to enable a reduction in the internal storage requirement of the IFFTs (both in the multi-stream and parallel architectures), leading to further savings in area and power. It is not necessary to use a full 32-bit complex word to represent the sparse samples. Instead, we compress zero-samples in the FIFO buffers by storing only a single bit in a full-length 1-bit FIFO buffer, as illustrated in Fig. 14. When the flag is asserted, the main array is not accessed and the bit goes low, which is used to gate the clock, reducing switching activity when reading or writing a zero sample. Furthermore, the size of the array itself is reduced to words, where in the case of the otherwise largest FIFO, at the first stage, is independent of and stays fixed at 32 words, instead of words, since the number of nonzero bins stays fixed at (cf. Fig. 5).

Fig. 14. Optimized FIFO with clock gated FIFO of entries and single 1-bit shift register (SR) of length .

Fig. 15. Area and simulated power consumption of SR and RB FIFO styles for the larger sizes required for a 64 point Radix- transform. Complex word size is 32 bits, clock frequency is 100 MHz.
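The zero-compressed FIFO can be sketched behaviourally as a full-length 1-bit flag delay line alongside a shorter ring buffer that holds only the nonzero words. The following is a minimal model under stated assumptions rather than the implemented circuit: the class name `ZeroCompressedFifo`, its parameters `delay` and `capacity`, and the `array_accesses` counter are all hypothetical, and the counter merely stands in for cycles in which the main array would not be clock gated.

```python
from collections import deque


class ZeroCompressedFifo:
    """Fixed-delay FIFO storing zero samples as a single flag bit (sketch)."""

    def __init__(self, delay: int, capacity: int):
        # Full-length 1-bit delay line; initially every slot is a zero sample.
        self.flags = deque([True] * delay)
        # Reduced-size main array, addressed by circular read/write pointers.
        self.words = [0j] * capacity
        self.rd = 0
        self.wr = 0
        self.count = 0
        self.array_accesses = 0  # reads plus writes of the main array

    def step(self, value: complex, zero: bool):
        """Push one sample and return the sample pushed `delay` steps ago."""
        out_zero = self.flags.popleft()
        out = 0j
        if not out_zero:
            # Only a nonzero sample causes a read of the main array.
            out = self.words[self.rd]
            self.rd = (self.rd + 1) % len(self.words)
            self.count -= 1
            self.array_accesses += 1
        self.flags.append(zero)
        if not zero:
            if self.count == len(self.words):
                raise OverflowError("more nonzero samples in flight than capacity")
            # Only a nonzero sample causes a write of the main array.
            self.words[self.wr] = value
            self.wr = (self.wr + 1) % len(self.words)
            self.count += 1
            self.array_accesses += 1
        return out, out_zero


# Illustrative use: a delay of 4 with at most 2 nonzero samples in flight.
fifo = ZeroCompressedFifo(delay=4, capacity=2)
samples = [(1 + 0j, False), (0j, True), (0j, True), (2 + 0j, False),
           (0j, True), (0j, True), (3 + 0j, False), (0j, True)]
flush = [(0j, True)] * 4
outputs = [fifo.step(v, z) for v, z in samples + flush]
```

The capacity only needs to cover the largest number of nonzero samples that can be in flight within one delay window, which is why the main array can be much shorter than the delay itself when the input is sparse.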

V. IMPLEMENTATION RESULTS

The proposed architectures have been implemented in Verilog and verified using register transfer level (RTL) simulation and FPGA prototyping [38]. The designs were implemented in a commercial 32-nm low-leakage CMOS cell library using standard-cell logic synthesis, automatic place and route (APR) and static timing analysis (STA) tools. Synopsys PrimeTime PX was used for power estimation at the TT process corner, nominal supply voltage and 25 °C temperature. For a baseline comparison, we also present results for a conventional OFDM modulator which consists of a single IFFT module. All designs were constrained to an achieved clock period of 10 ns, which was verified at the SS 125 °C corner using STA with suitable manufacturing margins.

A. FIFO Implementation

To illustrate the tradeoffs between FIFO implementation styles, we implemented the SR and RB FIFOs previously described. Fig. 15 shows circuit area¹ and power dissipation from simulation for FIFOs with 32 words down to 8 words from a 64-point pipeline IFFT (smaller buffers are required but have been omitted for clarity). We find that the RB design exhibits a power dissipation reduction of 78% compared to the traditional SR. The reason for the significantly lower power dissipation is that the SR clocks every flip-flop in the structure on every cycle and hence area and power both increase with number of words. The RB only clocks the flip-flops in a single entry per cycle, which is independent of the number of entries. The RB power dissipation does increase somewhat as the circular address pointer size increases and, more significantly, the multiplexer structure in the read-out path also increases in size. The RB FIFO is key to addressing power efficiency in multi-stream pipeline IFFTs and is used exclusively in the following implementations.
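The difference in clock load between the two styles can be made concrete with a first-order count of flip-flops that receive a clock edge on each active cycle. This is a rough sketch, not a power model: the function name `clocked_flops_per_cycle` is hypothetical, the RB is assumed to clock one 32-bit entry plus a binary address pointer, and the read-out multiplexer, clock gating cells, and leakage are ignored, so the numbers are not expected to reproduce the simulated 78% figure.

```python
def clocked_flops_per_cycle(depth: int, width: int = 32):
    """Return (shift_register, ring_buffer) clocked flip-flop counts per cycle."""
    # Assumed binary pointer wide enough to address `depth` entries.
    pointer_bits = (depth - 1).bit_length()
    # Shift register: every bit of every entry is clocked on every cycle.
    sr = depth * width
    # Ring buffer: one entry is written, and the circular pointer advances.
    rb = width + pointer_bits
    return sr, rb


for depth in (8, 16, 32):
    sr, rb = clocked_flops_per_cycle(depth)
    print(f"depth={depth:2d}  SR={sr:4d}  RB={rb:2d}")
```

Under these assumptions the SR count grows linearly with depth, while the RB count grows only by one pointer bit per doubling of depth.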

B. Parallel and Multi-Stream Architectures

Fig. 16 shows results for parallel and multi-stream implementations of modulators for bandwidth compression ratios of 1, 1/2, 2/3, and 3/4. The parallel architecture gives a linear area and power cost proportional to as all the designs operate at the same clock frequency of 100 MHz. The multi-stream architecture requires an increasing clock frequency constraint, which tends to require logic gates with larger device sizes, leading to a somewhat nonlinear increase in circuit area and power dissipation. Generally, the multi-stream architecture is more suited to area constrained designs, while the parallel architecture can reach minimum power consumption. The fully parallel and multi-stream are merely the limits of the design space and it is entirely feasible to exploit tradeoffs where , especially where this can relax otherwise awkward requirements for integer-multiple clock frequencies. At this design point, the raw (uncoded) throughput is 17.4 Mbps with QPSK modulation, up to 52.2 Mbps with 64 QAM modulation. Fig. 17 shows the SEFDM signal spectrum for the considered bandwidth compression ratios, with 64 QPSK modulated subcarriers.
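The two quoted throughput figures are consistent with a fixed rate of subcarrier symbols, since 64 QAM carries 6 bits per symbol against 2 bits for QPSK. As a brief check (the symbols $R_{\text{QPSK}}$ and $R_{64\text{-QAM}}$ are introduced here only for illustration):

```latex
R_{64\text{-QAM}}
  = R_{\text{QPSK}} \cdot \frac{\log_2 64}{\log_2 4}
  = 17.4~\text{Mbps} \times \frac{6}{2}
  = 52.2~\text{Mbps}
```

Equivalently, both figures correspond to 17.4 / 2 = 8.7 million subcarrier symbols per second.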

C. Optimized Architecture

The pruning techniques described in Section IV have also been implemented, leading to reductions in circuit area and power dissipation. The first stage of BFs has been entirely pruned for , while the token flow clock gating and FIFO reduction approaches have been applied for and . Fig. 18 shows area and power dissipation for these optimized designs, plotted on the same scale as Fig. 16 for comparison. In the case of the parallel implementation, the clock gating effectively addresses the power consumption increase with , such that as the number of IFFTs computed increases, the number of BFs and FIFO accesses does not increase significantly. The area increase is still proportional to Mc, although the absolute numbers are reduced due to the optimized storage. For , the area saving is around 25%, with a 13% power reduction. The multi-stream case shows a more marked improvement in absolute numbers, since the increased clock frequency requirement makes the design more sensitive to optimization. In this case, the saving in area at is 42%, while power dissipation is reduced by 33%.

Compared to the baseline OFDM implementation, we find that the optimized architectures give viable results. The multi-stream architecture requires only a 60% increase in silicon area and a fourfold increase in power dissipation when comparing SEFDM with OFDM. The parallel architecture, while offering higher performance, requires a 140% increase in area with a 180% increase in power dissipation.

¹Circuit area results are given in equivalent gates, by normalizing to the area of a 2-input NAND cell (NAND2X1) from the employed cell library.

Fig. 16. Area and simulated power consumption of parallel (squares, , MHz) and multi-stream (circles, , MHz) SEFDM transmitters.

Fig. 17. Spectra of SEFDM modulated signals for considered bandwidth compression ratios.

D. Reconfigurable Architecture

It is envisaged that a key feature of future spectrally efficient systems will be the ability to adapt the tradeoff between spectral occupancy and BER based on observed channel conditions, much as we currently do with adaptive modulation and coding. To this end, SEFDM transmitter implementations with reconfigurable bandwidth compression ratio are easily derived from the presented parallel and multi-stream architectures and can be similarly extended for further bandwidth compression ratios. In the case of the parallel architecture of Fig. 8, it is merely necessary to adapt the symbol mapping control logic and the complex gains in the postprocessing block, and to clock gate unused IFFT blocks. For the multi-stream version of Fig. 9, it is also necessary to adapt the clock frequency and FIFO buffer lengths based on the number of transforms required.

Fig. 19 shows area and power dissipation results for parallel and multi-stream reconfigurable SEFDM transmitters. The reconfigurable designs have around 8% greater circuit area than the design for and no more than 5% greater power dissipation. Compared to a single-mode OFDM modulator, the reconfigurable SEFDM transmitter has a factor of four increase in circuit area (for the multi-stream architecture) and around 70% increase in power dissipation (for the parallel architecture). Due to the clock tree gating in the design, dynamic power dissipation scales well with different values of , and in fact, for has a comparable power dissipation to the OFDM baseline, although with a greater leakage power contribution due to the significantly increased circuit area. It is also important to consider that, in context, successfully achieving reduced spectral occupancy may well lead to considerable improvements in power efficiency at the system level if it allows a reduction in transmission power at equal throughput.

Fig. 18. Optimized SEFDM modulators with pruned IFFT and FIFOs in parallel (squares, , MHz) and multi-stream (circles, , MHz) implementations. Scale is the same as Fig. 16 for comparison.

VI. CONCLUSION

The newly developed SEFDM system is described. After reviewing the practicality of proposed modulation algorithms for signal generation, we focus on a recently proposed algorithm which employs multiple IFFTs, for reasons of low complexity and general suitability to hardware implementation. We map the algorithm to two variants of a VLSI architecture, one with parallel IFFTs and one where we apply the concept of multi-stream FFT to realize the multiple transforms at minimal circuit area overhead. A number of optimizations due to transforms on sparse input vectors are described to further reduce the number of arithmetic operations and FIFO sizes using a novel token flow approach. The designs are detailed in the context of optimized ASIC implementations for and also considering the requirement for a reconfigurable transmitter.

Using a commercial 32-nm cell library, we report area and simulated power dissipation figures for the proposed architectures along with a baseline OFDM transmitter for comparison. Analysis of the multi-stream architecture shows a 60% increase in area, along with power dissipation that scales with up to a factor of four with , as compared to the baseline OFDM implementation. These results demonstrate that SEFDM transmitters can be realistically implemented with a viable increase in circuit area and power dissipation when compared to conventional OFDM. The work presented here clearly shows the feasibility and limitations of SEFDM signal generation and aims to serve as a first study of implementation in a modern integrated circuit manufacturing technology.

Fig. 19. Reconfigurable parallel (squares, , MHz) and multi-stream (circles, , MHz) SEFDM modulators.

REFERENCES

[1] M. R. D. Rodrigues and I. Darwazeh, “Fast OFDM: A proposal for doubling the data rate of OFDM schemes,” in Proc. Int. Conf. Telecomm., Jun. 2002, vol. 3, pp. 484–487.

[2] F. Xiong, “M-ary amplitude shift keying OFDM system,” IEEE Trans. Commun., vol. 51, no. 10, pp. 1638–1642, Oct. 2003.

[3] M. R. D. Rodrigues and I. Darwazeh, “A spectrally efficient frequency division multiplexing based communications system,” in Proc. 8th Int. OFDM Workshop, Hamburg, Germany, 2003, pp. 48–49.

[4] M. Hamamura and S. Tachikawa, “Bandwidth efficiency improvement for multi-carrier systems,” in Proc. 15th IEEE Int. Symp. Personal, Indoor, Mobile Radio Commun., Sep. 5–8, 2004, vol. 1, pp. 48–52.

[5] W. Jian, Y. Xun, Z. Xi-lin, and D. Li, “The prefix design and performance analysis of DFT-based overlapped frequency division multiplexing (OvFDM-DFT) system,” in Proc. 3rd Int. Workshop Signal Design and Its Applicat. Commun., Sep. 23–27, 2007, pp. 361–364.

[6] F. Rusek and J. B. Anderson, “The two dimensional Mazo limit,” in Proc. Int. Symp. Inf. Theory, 2005, pp. 970–974.

[7] F. Rusek and J. B. Anderson, “Multistream faster than Nyquist signaling,” IEEE Trans. Commun., vol. 57, no. 5, pp. 1329–1340, May 2009.

[8] D. Dasalukunte, F. Rusek, and V. Owall, “Multicarrier faster-than-Nyquist transceivers: Hardware architecture and performance analysis,” IEEE Trans. Circuits and Systems I: Regular Papers, vol. 58, no. 4, pp. 827–838, Apr. 2011.

[9] J. Mazo, “Faster than Nyquist signalling,” Bell Syst. Tech. J., vol. 54, pp. 429–458, Oct. 1975.

[10] I. Kanaras, “Spectrally efficient multicarrier communication systems: Signal detection, mathematical modelling and optimisation,” Ph.D. dissertation, Dept. of Electron. and Elect. Eng., Univ. College London, London, U.K., 2010.

[11] R. Grammenos, S. Isam, and I. Darwazeh, “FPGA design of a truncated SVD based receiver for the detection of SEFDM signals,” in Proc. IEEE 22nd Personal, Indoor, Mobile Radio Commun. Symp., 2011, pp. 2085–2090.

[12] I. Kanaras, A. Chorti, M. Rodrigues, and I. Darwazeh, “Spectrally efficient FDM signals: Bandwidth gain at the expense of receiver complexity,” in Proc. Int. Conf. Commun., 2009, pp. 1–6.

[13] I. Kanaras, A. Chorti, M. R. D. Rodrigues, and I. Darwazeh, “A fast constrained sphere decoder for ill conditioned communication systems,” IEEE Commun. Lett., vol. 14, no. 11, pp. 999–1001, Nov. 2010.

[14] A. Chorti, I. Kanaras, M. Rodrigues, and I. Darwazeh, “Joint channel equalization and detection of spectrally efficient FDM signals,” in Proc. IEEE 21st Personal, Indoor Mobile Radio Commun. Symp., Sep. 2010, pp. 177–182.

[15] S. Isam, I. Kanaras, and I. Darwazeh, “A truncated SVD approach for fixed complexity spectrally efficient FDM receivers,” in Proc. IEEE Wireless Commun. Netw. Conf., 2011, pp. 1584–1589.

[16] S. Isam and I. Darwazeh, “Design and performance assessment of fixed complexity spectrally efficient FDM receivers,” in Proc. IEEE 73rd Veh. Technol. Conf., 2011, pp. 1–5.

[17] S. Ahmed and I. Darwazeh, “IDFT based transmitters for spectrally efficient FDM system,” in Proc. London Commun. Symp., Sep. 2009 [Online]. Available: http://www.ee.ucl.ac.uk/lcs/previous/LCS2009/index.html

[18] S. Isam and I. Darwazeh, “On the digital design of non-orthogonal spectrally efficient frequency division multiplexed (FDM) signals,” in 4th Int. Symp. Broadband Commun. (ISBC’10), Jul. 2010.

[19] S. Isam and I. Darwazeh, “Simple DSP-IDFT techniques for generating spectrally efficient FDM signals,” in IEEE, IET Int. Symp. Commun. Syst., Netw., Digital Signal Process., Jul. 2010, pp. 20–24.

[20] M. R. Perrett and I. Darwazeh, “Flexible hardware architecture of SEFDM transmitters with real-time non-orthogonal adjustment,” in Proc. 18th Int. Conf. Telecomm., May 2011, pp. 369–374.

[21] P. N. Whatmough, M. R. Perrett, S. Isam, and I. Darwazeh, “VLSI architecture for a reconfigurable spectrally efficient FDM baseband transmitter,” in Proc. IEEE Int. Symp. Circuits Syst., May 2011, pp. 1688–1691.

[22] J. Zhao and A. D. Ellis, “A novel optical fast OFDM with reduced channel spacing equal to half of the symbol rate per carrier,” in Proc. Optical Fiber Commun. (OFC), Collocated National Fiber Optic Eng. Conf. (OFC/NFOEC), 2010, pp. 1–3.

[23] S. K. Ibrahim, J. Zhao, D. Rafique, J. A. O’Dowd, and A. D. Ellis, “Demonstration of world-first experimental optical Fast OFDM system at 7.174 Gbit/s and 14.348 Gbit/s,” in Proc. 36th Eur. Conf. Opt. Commun. (ECOC) Exhib., 2010, pp. 1–3.

[24] S. Yamamoto, K. Yonenaga, A. Sahara, F. Inuzuka, and A. Takada, “Achievement of subchannel frequency spacing less than symbol rate and improvement of dispersion tolerance in optical OFDM transmission,” IEEE/OSA J. Lightw. Technol., vol. 28, no. 1, pp. 157–163, Jan. 1, 2010.

[25] L. Hanzo, W. T. Webb, and T. Keller, Single- and Multi-Carrier Quadrature Amplitude Modulation. New York: IEEE Press–Wiley, 2000.

[26] K. Maharatna, E. Grass, and U. Jagdhold, “A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM,” IEEE J. Solid-State Circuits, vol. 39, no. 3, pp. 484–493, Mar. 2004.

[27] H. V. Sorensen and C. S. Burrus, “Efficient computation of the DFT with only a subset of input or output points,” IEEE Trans. Signal Process., vol. 41, no. 3, pp. 1184–1200, Mar. 1993.

[28] S. Ahmed, “Spectrally efficient FDM communication signals and transceivers: Design, mathematical modeling and system optimization,” Ph.D. dissertation, Dept. of Electron. and Elect. Eng., Univ. College London, London, U.K., 2011.

[29] S. Isam and I. Darwazeh, “Peak to average power ratio reduction in spectrally efficient FDM systems,” in Proc. 18th Int. Conf. Telecomm. (ICT), May 2011, pp. 363–368.

[30] S. Isam and I. Darwazeh, “Precoded spectrally efficient FDM system,” in Proc. IEEE 21st Int. Symp. Personal Indoor and Mobile Radio Commun. (PIMRC), Sep. 2010, pp. 99–104.

[31] P. O’Brien, S. McGrath, and C. J. Burkley, “The generation of bandwidth efficient modulation schemes using direct digital synthesis,” in Proc. IEEE Int. Symp. Personal, Indoor Mobile Radio Commun., 1992, pp. 393–396.

[32] R. G. Alves, P. L. Osorio, and M. N. S. Swamy, “General FFT pruning algorithm,” in Proc. 43rd IEEE Midwest Symp. Circuits Syst., 2000, vol. 3, pp. 1192–1195.

[33] S. He and M. Torkelson, “Designing pipeline FFT processor for OFDM (de)modulation,” in Proc. Int. Symp. Signal, Syst., Electron., 1998, pp. 257–262.

[34] Y. Lin and C. Lee, “Design of an FFT/IFFT processor for MIMO OFDM systems,” IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 54, no. 4, pp. 807–815, Apr. 2007.

[35] J. E. Volder, “The CORDIC trigonometric computing technique,” IRE Trans. Electron. Comput., pp. 330–334, Sep. 1959.

[36] M. Seok, D. Jeon, C. Chakrabarti, D. Blaauw, and D. Sylvester, “A 0.27 V 30 MHz 17.7 nJ/transform 1024-pt complex FFT core with super-pipelining,” in Proc. IEEE Int. Solid-State Circuits Conf., 2011, pp. 342–344.

[37] P. Hsieh, J. Jhuang, P. Tsai, and T. Chiueh, “A low-power delay buffer using gated driver tree,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 9, pp. 1212–1219, Sep. 2009.

[38] M. Perrett, R. Grammenos, and I. Darwazeh, “Verification methodology for the detection of spectrally efficient FDM signals generated using reconfigurable hardware,” IEEE Int. Conf. Commun., 2012, accepted for publication.

Paul N. Whatmough (S’09) received the B.Eng. degree (first class honors) in electronic communications engineering from the University of Lancaster, Lancaster, U.K., in 2003 and the M.Sc. degree (with distinction) in communications systems and signal processing from the University of Bristol, Bristol, U.K., in 2004. He is currently working toward the doctorate degree at the University College London, London, U.K.

From 2005 to 2008, he held the position of Research Scientist at Philips Research Labs, Redhill, U.K. (which became NXP Semiconductors Research in 2006). In 2008, he joined the R&D Department at ARM Ltd., Cambridge, U.K., while working towards the industrial doctorate degree at University College London. His current research interests are in low-power circuits, algorithms and architectures relating to wireless, DSP, and embedded computing.

Mr. Whatmough is a member of the IET. He was the recipient of the IET Student Project Award in 2003, the IEEE Communications Chapter Award in 2004, and the European Wireless Technology Conference (EuWiT) Young Engineering Prize in 2008.

Marcus R. Perrett (M’08) received the B.Sc. degree in electronics from Anglia Ruskin University, Cambridge, U.K., in 2003. He is currently working towards the doctorate degree in telecommunications at University College London, London, U.K.

From 2001 to 2009, he worked for Alps Electric UK Ltd., Milton Keynes, U.K., in the roles of Hardware Engineer, Senior Engineer, and Innovations Manager and now works as FPGA Development Manager for Fixnetics Ltd., London, U.K. His current research interests are the hardware encapsulation of algorithms relating to wireless communications.

Safa Isam (S’06) received the B.Sc. degree (first class honors) in electrical and electronic engineering from the University of Khartoum, Khartoum, Sudan, in 2003 and the M.Sc. degree (with distinction) in technologies for broadband communications from the University College London (UCL), London, U.K., in 2007. She is currently working towards the Ph.D. degree at UCL, researching spectrally efficient multicarrier systems.

From 2004 to 2008, she held many positions at the National Electricity Corporation, Sudan. Her current research interests are related to wireless communications, spectral efficiency, and systems optimization.

Mrs. Isam is a member of the IET. She was the recipient of the British Chevening Scholarship in 2006, UCL EE MSc course prize in 2007, and Overseas Research Student Award and UCL Graduate School Scholarship for 2008–2011.

Izzat Darwazeh (SM’03) received the M.Sc. and Ph.D. degrees in electronic engineering from the University of Manchester Institute of Science and Technology (UMIST), Manchester, U.K., in 1986 and 1991, respectively.

He holds the Chair of Communications Engineering and is head of the Communications and Information Systems Group, Department of Electronic and Electrical Engineering, University College London, London, U.K. His research interests are in the areas of high-speed optical communication systems and networks, microwave circuits, and MMICs for optical fiber applications and in mobile communication circuits and systems. He has authored/coauthored more than 150 papers in the areas of optical and cellular communications and MMICs and high-speed/frequency circuits. He is the coeditor of the 1995 IEE book Analogue Optical Fibre Communications (Institution of Engineering and Technology, 1995) and the coauthor of a book on Linear Circuit Analysis and Modelling (Elsevier, 2005).

Prof. Darwazeh is a Chartered Engineer and a fellow of the IET.
