FFT Implementation on FPGA

Chapter 1

INTRODUCTION

The Discrete Fourier Transform (DFT) plays an important role in the analysis,
design and implementation of discrete-time signal-processing algorithms and
systems, where it is used to convert samples from the time domain to the
frequency domain. The Fast Fourier Transform (FFT) is simply a fast
(computationally efficient) way to calculate the DFT. The wide use of DFTs in
digital signal processing applications is the motivation for implementing
FFTs. Almost every branch of engineering and science uses Fourier methods. The
words "frequency," "period," "phase," and "spectrum" are important parts of an
engineer's vocabulary.

The Discrete Fourier Transform is used to perform frequency analysis of
discrete non-periodic signals. The FFT achieves the same result with less
computational overhead. Transforms convert a function from one domain to
another with no loss of information; the Fourier transform converts a function
from the time (or spatial) domain to the frequency domain. The DFT is identical
to samples of the Fourier transform at equally spaced frequencies.
Consequently, computation of the N-point DFT corresponds to the computation of
N samples of the Fourier transform at the N equally spaced frequencies
$\omega_k = 2\pi k/N$. For a complex input x(n), N complex multiplications and
(N-1) complex additions are required to compute each value of the DFT if it is
computed directly from the formula

\[ X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi nk/N}, \qquad 0 \le k \le N-1. \]

Computing all N values therefore requires a total of $N^2$ complex
multiplications and $N(N-1)$ complex additions. Each complex multiplication
requires four real multiplications and two real additions, and each complex
addition requires two real additions. Therefore a total of $4N^2$ real
multiplications and $N(4N-2)$ real additions are required. Besides these
multiplications and additions, storage must be provided for the N complex
input samples and the N output values. In contrast, the radix-2
decimation-in-time FFT algorithm reduces the number of complex multiplications
and additions to $(N/2)\log_2 N$ and $N\log_2 N$, respectively, to compute the
DFT of a given complex x(n).

A large number of FFT algorithms have been developed over the years, notably
the Radix-2, Radix-4, Split-Radix, Fast Hartley Transform (FHT), Quick Fourier
Transform (QFT), and Decimation-in-Time-Frequency (DITF) algorithms. Among
these, Radix-2, Radix-4 and FHT are of prime concern in this project. In the
Radix-2 (RAD2) algorithm the DFT computation is initially split into two
summations, one over the first N/2 data points and the other over the
remaining N/2 data points, whereas a four-way split is used in Radix-4 (RAD4).
For the Discrete Hartley Transform (DHT), which the FHT computes, the kernel is
real, unlike the complex exponential kernel of the DFT.
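The operation counts quoted above can be tabulated for a few transform sizes to see how quickly the gap widens. The following is a minimal sketch, not part of the project code; the function names `direct_dft_ops` and `radix2_fft_ops` are hypothetical and simply evaluate the formulas from the text.

```python
def direct_dft_ops(n):
    """Operation counts for evaluating the N-point DFT sum directly."""
    return {
        "complex_mults": n * n,
        "complex_adds": n * (n - 1),
        "real_mults": 4 * n * n,          # 4 real mults per complex mult
        "real_adds": n * (4 * n - 2),     # 2N^2 from mults + 2N(N-1) from adds
    }


def radix2_fft_ops(n):
    """Operation counts for a radix-2 FFT; n must be a power of two."""
    stages = n.bit_length() - 1           # log2(n) for a power of two
    if n < 2 or (1 << stages) != n:
        raise ValueError("n must be a power of two (>= 2)")
    return {
        "complex_mults": (n // 2) * stages,
        "complex_adds": n * stages,
    }


for size in (8, 64, 1024):
    direct = direct_dft_ops(size)
    fast = radix2_fft_ops(size)
    print(size, direct["complex_mults"], fast["complex_mults"])
```

For N = 1024 the direct sum needs 1,048,576 complex multiplications against 5,120 for the radix-2 FFT, which is why the FFT is the practical choice for hardware implementation.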

The objectives of this project are:

• Familiarisation with the various FFT algorithms and a comparative study of
them on the basis of effective speed and area.

• Familiarisation with FPGAs and the steps involved in implementing a given
algorithm on them.


Chapter 2

FAST FOURIER TRANSFORM

The first major breakthrough in the implementation of Fast Fourier Transform
(FFT) algorithms was the Cooley-Tukey algorithm developed in the mid-1960s,
which reduced the complexity of a Discrete Fourier Transform from $O(N^2)$ to
$O(N \log N)$. At that time, this was a substantial saving for even the
simplest of applications. Since then, a large number of FFT algorithms have
been developed. The Cooley-Tukey algorithm became known as the Radix-2
algorithm and was shortly followed by the Radix-3, Radix-4, and Mixed Radix
algorithms. Further research led to the Fast Hartley Transform (FHT) and the
Split Radix (SRFFT) algorithms. More recently, two further algorithms have
emerged: the Quick Fourier Transform (QFT) and the Decimation-In-Time-Frequency
(DITF) algorithm. In this project we compare three contemporary FFT
algorithms: Radix-2 (RAD2), Radix-4 (RAD4) and FHT. The criteria used are the
operation count, memory usage and computation time.

The relationship between a finite sequence {x(n)} in the time domain and its
representation {X(k)} in the frequency domain is given by the Discrete Fourier
Transform and its inverse:

\[ X(k) = \mathrm{DFT}(x(n)) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad 0 \le k \le N-1 \]

\[ x(n) = \mathrm{IDFT}(X(k)) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, W_N^{-nk}, \qquad 0 \le n \le N-1 \]

\[ W_N^{nk} = e^{-j 2\pi nk / N} \]
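As a reference point for the fast algorithms discussed later, the transform pair can be evaluated directly from its definition. This is a minimal sketch using only the Python standard library; the names `twiddle`, `dft` and `idft` are chosen here for illustration and are not taken from the project.

```python
import cmath


def twiddle(n_points, exponent):
    """Return W_N^exponent = exp(-j*2*pi*exponent/N)."""
    return cmath.exp(-2j * cmath.pi * exponent / n_points)


def dft(x):
    """Direct O(N^2) evaluation of X(k) = sum_n x(n) W_N^(nk)."""
    n_points = len(x)
    return [
        sum(x[n] * twiddle(n_points, n * k) for n in range(n_points))
        for k in range(n_points)
    ]


def idft(spectrum):
    """Direct evaluation of x(n) = (1/N) sum_k X(k) W_N^(-nk)."""
    n_points = len(spectrum)
    return [
        sum(spectrum[k] * twiddle(n_points, -n * k) for k in range(n_points))
        / n_points
        for n in range(n_points)
    ]


samples = [1, 2, 3, 4]
spectrum = dft(samples)      # approximately [10, -2+2j, -2, -2-2j]
recovered = idft(spectrum)   # approximately the original samples
print(spectrum)
print(recovered)
```

Applying `idft` to the output of `dft` returns the original samples up to floating-point rounding, which illustrates that the transform loses no information.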

When computing the DFT, some of the factors $W_N^{nk}$ are computed
repeatedly. In addition, $W_N^{nk}$ has the following properties:

\[ W_N^{nk} = W_N^{(n+N)k} = W_N^{n(k+N)} \]

\[ W_N^{-nk} = \left(W_N^{nk}\right)^{*} = W_N^{n(N-k)} \]

\[ W_N^{n} = W_{N/n}, \qquad W_N = W_{nN}^{n} \]

\[ W_N^{N/2} = -1, \qquad W_N^{k+N/2} = -W_N^{k} \]
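The periodicity, conjugate-symmetry, reducibility and half-period properties of the twiddle factor can be checked numerically. The sketch below assumes $W_N = e^{-j2\pi/N}$ as defined earlier; the helper names `w` and `close` and the particular values N = 16, n = 3, k = 5 are arbitrary choices for illustration.

```python
import cmath


def w(n_points, exponent):
    """Twiddle factor W_N^exponent = exp(-j*2*pi*exponent/N)."""
    return cmath.exp(-2j * cmath.pi * exponent / n_points)


def close(a, b, tol=1e-9):
    """Compare two complex numbers up to floating-point rounding."""
    return abs(a - b) < tol


N, n, k = 16, 3, 5

# Periodicity in n and in k.
periodic = close(w(N, n * k), w(N, (n + N) * k)) and close(
    w(N, n * k), w(N, n * (k + N))
)

# Conjugate symmetry.
conjugate = close(w(N, -n * k), w(N, n * k).conjugate()) and close(
    w(N, -n * k), w(N, n * (N - k))
)

# Reducibility, checked with the factor 4 (which divides N = 16).
reducible = close(w(N, 4), w(N // 4, 1)) and close(w(N, 1), w(4 * N, 4))

# Half-period properties.
half_period = close(w(N, N // 2), -1) and close(w(N, k + N // 2), -w(N, k))

print(periodic, conjugate, reducible, half_period)
```

These identities are what allow an FFT to reuse a small table of twiddle factors instead of evaluating a new complex exponential for every product.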

2.1 Review of FFT algorithms: The basic principle behind most radix-based FFT
algorithms is to exploit the symmetry properties of the complex exponential
that is the cornerstone of the Discrete Fourier Transform (DFT). These
algorithms divide the problem into similar sub-problems (butterfly
computations) and thereby achieve a reduction in computational complexity. All
radix algorithms are similar in structure, differing only in the core
computation of the butterflies. The FHT differs from the other algorithms in
that it uses a real kernel, as opposed to the complex exponential kernel used
by the radix algorithms.

2.2 Types of FFTs

2.2.1 Radix-2 Decimation in Frequency Algorithm: The RAD2 DIF algorithm is
obtained by applying the divide-and-conquer approach to the DFT problem. The
DFT computation is initially split into two summations, one over the first N/2
data points and the other over the remaining N/2 data points, resulting in

\[ X(k) = \sum_{n=0}^{N/2-1} x(n)\, W_N^{nk} + \sum_{n=N/2}^{N-1} x(n)\, W_N^{nk} \]

The above equation can be simplified to

\[ X(k) = \sum_{n=0}^{N/2-1} \left[ x(n) + (-1)^{k}\, x\!\left(n + \tfrac{N}{2}\right) \right] W_N^{nk} \]

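Considering the even- and odd-numbered frequency samples separately gives, for
k = 0, 1, …, N/2-1,

\[ X(2k) = \sum_{n=0}^{N/2-1} \left[ x(n) + x\!\left(n + \tfrac{N}{2}\right) \right] W_{N/2}^{nk} \]

\[ X(2k+1) = \sum_{n=0}^{N/2-1} \left\{ \left[ x(n) - x\!\left(n + \tfrac{N}{2}\right) \right] W_N^{n} \right\} W_{N/2}^{nk} \]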

The same computational procedure can be repeated through decimation of the
N/2-point DFTs X(2k) and X(2k+1). The entire process involves $v = \log_2 N$
stages, with each stage involving N/2 butterflies. Thus the RAD2 algorithm
involves $(N/2)\log_2 N$ complex multiplications and $N\log_2 N$ complex
additions. Note that the output of the whole process is out of order and
requires a bit-reversal operation to place the frequency samples in the
correct order.
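The stage-by-stage decimation and the final bit-reversal step can be sketched in software before committing them to hardware. The following is an illustrative floating-point model under the assumption that N is a power of two; it is not the project's fixed-point VHDL design, and the names `bit_reverse` and `fft_radix2_dif` are hypothetical.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_radix2_dif(x):
    """In-place radix-2 decimation-in-frequency FFT of a power-of-two length."""
    n_points = len(x)
    stages = n_points.bit_length() - 1
    if n_points < 1 or (1 << stages) != n_points:
        raise ValueError("length must be a power of two")

    data = [complex(value) for value in x]
    span = n_points // 2                  # distance between butterfly inputs
    while span >= 1:
        block = 2 * span                  # size of the sub-DFT at this stage
        for start in range(0, n_points, block):
            for n in range(span):
                factor = cmath.exp(-2j * cmath.pi * n / block)
                upper = data[start + n]
                lower = data[start + n + span]
                data[start + n] = upper + lower                  # feeds even outputs
                data[start + n + span] = (upper - lower) * factor  # feeds odd outputs
        span //= 2

    # The butterflies leave the results in bit-reversed order; reorder them.
    return [data[bit_reverse(k, stages)] for k in range(n_points)]


print(fft_radix2_dif([1, 2, 3, 4]))   # approximately [10, -2+2j, -2, -2-2j]
```

Each pass of the outer loop corresponds to one of the $\log_2 N$ stages, and each inner iteration is one butterfly with a single complex multiplication and two complex additions, matching the counts given above.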

2.2.2 Radix-4 Algorithm: The RAD4 algorithm is very similar to the RAD2
algorithm in concept. Instead of dividing the DFT computation into halves as in
RAD2, a four-way split is used. The N-point input sequence is split into four
subsequences, x(4n), x(4n+1), x(4n+2) and x(4n+3), where n = 0, 1, …, N/4-1.

\[ X(k) = \sum_{n=0}^{N/4-1} x(n)\, W_N^{nk} + \sum_{n=N/4}^{N/2-1} x(n)\, W_N^{nk} + \sum_{n=N/2}^{3N/4-1} x(n)\, W_N^{nk} + \sum_{n=3N/4}^{N-1} x(n)\, W_N^{nk} \]

Setting

\[ F(l,q) = \sum_{m=0}^{N/4-1} x(l,m)\, W_{N/4}^{mq} \]

\[ X(p,q) = X\!\left(\tfrac{N}{4}\, p + q\right) \]

and

\[ x(l,m) = x(4m+l), \]

where l, p = 0, 1, 2, 3 and m, q = 0, 1, …, N/4-1.

The decimation process is similar to that of the RAD2 algorithm and uses
$v = \log_4 N$ stages, where each stage has N/4 butterflies. The RAD4 butterfly
involves 8 complex additions and 3 complex multiplications, or a total of 34
floating-point operations (6 per complex multiplication and 2 per complex
addition). Thus, the total number of floating-point operations involved in the
RAD4 computation of an N-point DFT is
$34 \cdot (N/4) \cdot \log_4 N = 4.25\,N\log_2 N$, which is 15% less than the
corresponding value for the RAD2 algorithm.
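The 15% figure follows from simple arithmetic, which the sketch below reproduces. It assumes, consistently with the counts given earlier in the text, that a complex multiplication costs 6 floating-point operations (4 multiplications and 2 additions), a complex addition costs 2, and a RAD2 butterfly consists of 1 complex multiplication and 2 complex additions. The function names are hypothetical.

```python
FLOPS_PER_COMPLEX_MULT = 6   # 4 real multiplications + 2 real additions
FLOPS_PER_COMPLEX_ADD = 2    # 2 real additions


def rad2_flops(n):
    """Floating-point operations for a radix-2 FFT; n must be a power of 2."""
    stages = n.bit_length() - 1                       # log2(n)
    if n < 2 or (1 << stages) != n:
        raise ValueError("n must be a power of two")
    per_butterfly = 1 * FLOPS_PER_COMPLEX_MULT + 2 * FLOPS_PER_COMPLEX_ADD
    return stages * (n // 2) * per_butterfly


def rad4_flops(n):
    """Floating-point operations for a radix-4 FFT; n must be a power of 4."""
    bits = n.bit_length() - 1
    if n < 4 or (1 << bits) != n or bits % 2 != 0:
        raise ValueError("n must be a power of four")
    stages = bits // 2                                # log4(n)
    per_butterfly = 3 * FLOPS_PER_COMPLEX_MULT + 8 * FLOPS_PER_COMPLEX_ADD
    return stages * (n // 4) * per_butterfly


n = 1024
print(rad2_flops(n), rad4_flops(n), rad4_flops(n) / rad2_flops(n))
```

With these assumptions the RAD2 total is $5\,N\log_2 N$ and the RAD4 total is $4.25\,N\log_2 N$, so the ratio is 0.85 for every valid N.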

2.2.3 Fast Hartley Transform: The main difference between the DFT computations
previously discussed and the Discrete Hartley Transform (DHT) is the core
kernel. For the DHT, the kernel is real, unlike the complex exponential kernel
of the DFT. The DHT coefficient is expressed in terms of the input data points
as

\[ X(k) = \sum_{n=0}^{N-1} x(n) \left[ \cos\!\left(\frac{2\pi nk}{N}\right) + \sin\!\left(\frac{2\pi nk}{N}\right) \right], \qquad 0 \le k \le N-1 \]

This results in the replacement of complex multiplications in a DFT by real
multiplications in a DHT. For complex data, each complex multiplication in the
DFT summation requires four real multiplications and two real additions. For
the DHT, this computation involves only two real multiplications and one real
addition. There exists an inexpensive mapping of coefficients from the Hartley
domain to the Fourier domain, which is required to convert the output of a DHT
to the traditional DFT coefficients. For real-valued x(n) and the DFT kernel
$e^{-j2\pi nk/N}$, the following equations relate the DFT coefficients to the
DHT coefficients for an N-point computation, with DHT(N) taken as DHT(0):

\[ \mathrm{Re}(\mathrm{DFT}(k)) = \frac{\mathrm{DHT}(k) + \mathrm{DHT}(N-k)}{2} \]

\[ \mathrm{Im}(\mathrm{DFT}(k)) = \frac{\mathrm{DHT}(N-k) - \mathrm{DHT}(k)}{2} \]
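The Hartley-to-Fourier mapping can be checked on a small real-valued sequence. This is a minimal sketch that evaluates the DHT directly from its definition rather than with a fast algorithm; the names `dht` and `dft_from_dht` are hypothetical. It assumes the forward DFT kernel $e^{-j2\pi nk/N}$, for which the imaginary part works out to (DHT(N-k) - DHT(k))/2; with the opposite kernel sign the imaginary part changes sign.

```python
import math


def dht(x):
    """Direct evaluation of the DHT: sum_n x(n) [cos(2*pi*n*k/N) + sin(2*pi*n*k/N)]."""
    n_points = len(x)
    result = []
    for k in range(n_points):
        total = 0.0
        for n in range(n_points):
            angle = 2 * math.pi * n * k / n_points
            total += x[n] * (math.cos(angle) + math.sin(angle))
        result.append(total)
    return result


def dft_from_dht(hartley):
    """Map DHT coefficients of a real sequence to DFT coefficients."""
    n_points = len(hartley)
    spectrum = []
    for k in range(n_points):
        mirror = hartley[(n_points - k) % n_points]   # DHT(N-k), with DHT(N) = DHT(0)
        real_part = (hartley[k] + mirror) / 2
        imag_part = (mirror - hartley[k]) / 2
        spectrum.append(complex(real_part, imag_part))
    return spectrum


samples = [1.0, 2.0, 3.0, 4.0]
print(dft_from_dht(dht(samples)))   # approximately [10, -2+2j, -2, -2-2j]
```

The mapping costs only two real additions and two halvings per output coefficient, which is why it is described as inexpensive.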

Chapter 3

FPGA IMPLEMENTATION OF FFTs

3.1 Basic Concept of FPGA: Field Programmable Gate Arrays (FPGAs) are one of
the fastest growing segments of the semiconductor industry. They were first
introduced in 1985, and since then they have quickly gained widespread
acceptance as an excellent technology for implementing moderately large digital
circuits in low production volumes. FPGAs are programmable devices that can be
directly configured by the end user without the use of an integrated circuit
fabrication facility. They offer the designer the benefits of custom hardware
while avoiding its high development costs and long manufacturing time. Figure
3.1 shows a conceptual diagram of a typical FPGA.

Field Programmable Gate Arrays are so called because, rather than having a
structure similar to a PAL or other programmable device, they are structured
very much like a gate array ASIC (Application Specific Integrated Circuit). One
of the early programmable devices was the programmable array logic (PAL),
which is one kind of Programmable Logic Device (PLD). PLDs are programmable
devices that can be configured for a wide variety of applications. They enable
faster implementation and emulation of circuit designs in hardware. The
flexibility provided by these devices through the presence of reconfigurable
elements has increased their popularity. There are two major types of PLDs:
Field Programmable Gate Arrays (FPGAs) and Complex Programmable Logic Devices
(CPLDs). Among the various possible FPGA architectures, lookup-table (LUT)
based architectures have been the most popular. A LUT-based FPGA consists of an
array of programmable logic blocks (PLBs) together with programmable
interconnections. The maximum number of gates in an FPGA is as high as
500,0000.
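To illustrate why a lookup table can implement arbitrary logic, a LUT can be modelled as a small memory addressed by its inputs: programming the device amounts to filling in the truth table. The sketch below is a conceptual software model only, not a description of any vendor's logic block; the class `Lut` and the helper `lut_from_function` are hypothetical names.

```python
class Lut:
    """A k-input lookup table: 2**k configuration bits addressed by the inputs."""

    def __init__(self, num_inputs, config_bits):
        if len(config_bits) != 2 ** num_inputs:
            raise ValueError("need exactly 2**num_inputs configuration bits")
        self.num_inputs = num_inputs
        self.config_bits = list(config_bits)

    def evaluate(self, *inputs):
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        address = 0
        for bit in inputs:                 # first input is the most significant bit
            address = (address << 1) | (bit & 1)
        return self.config_bits[address]


def lut_from_function(num_inputs, func):
    """Fill a LUT's configuration bits from any Boolean function of num_inputs bits."""
    bits = []
    for address in range(2 ** num_inputs):
        inputs = [(address >> (num_inputs - 1 - i)) & 1 for i in range(num_inputs)]
        bits.append(func(*inputs) & 1)
    return Lut(num_inputs, bits)


# Two 3-input LUTs configured as the sum and carry outputs of a full adder.
sum_lut = lut_from_function(3, lambda a, b, c: a ^ b ^ c)
carry_lut = lut_from_function(3, lambda a, b, c: (a & b) | (a & c) | (b & c))

print(sum_lut.evaluate(1, 1, 0), carry_lut.evaluate(1, 1, 0))
```

The same two table structures could be reconfigured to any other pair of 3-input functions simply by loading different configuration bits, which is the property that makes LUT-based blocks general purpose.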


Fig.3.1 Conceptual block diagram of FPGA

3.2 Field Programmable Devices: Field Programmable Device (FPD) is a general
term for any device that can be programmed (and possibly reprogrammed) after
fabrication. Several standard approaches to programmability are used in the
industry, in the form of PROMs, PLAs, PALs, CPLDs and FPGAs. These approaches
vary significantly in their complexity (and consequently their cost) and in the
applications for which they are best suited. Among the largest and fastest
growing FPDs are Field Programmable Gate Arrays (FPGAs). Although there are
many types of FPGAs, all architectures include logic blocks, I/O blocks and
programmable routing, arranged in a regular pattern. FPGAs provide narrow logic
resources; in other words, their logic blocks are generally small and
uncommitted. One advantage of FPGAs over other types of FPDs is that they
generally have much higher logic capacities and offer a higher ratio of
flip-flops to logic. A higher ratio of flip-flops to logic is important because
flip-flops are often the limiting factor in designs.

FPGAs are the most common form of FPD offered by programmable logic vendors.
One such vendor, Xilinx, offers several different "families" of FPGAs that
target different design sizes, design speeds and cost requirements. Some of the
more popular devices include the XC4000, the Spartan series and the Virtex-II
series.

Connection blocks facilitate connectivity between logic block pins and the
routing channels. Each input pin can be programmed to connect to one or more of
the tracks in a channel using either a multiplexer or multiple transistors (see
Figure 3.2). Output pins, on the other hand, are connected to tracks using
tri-state buffers. The number of tracks that a pin can connect to is called its
connection block flexibility. Switch blocks reside at the intersections of
horizontal and vertical routing channels. They provide programmable switches
used to connect tracks from both the vertical and horizontal channels incident
to the switch block. The number of outgoing tracks that each incoming track can
connect to is called its switch block flexibility. An FPGA generally consists
of a two-dimensional array of logic blocks that can be connected by general
interconnection resources.

Fig 3.2 Internal Structure of Control Logic Block


The interconnect comprises segments of wire, where the segments may be of
various lengths. The interconnect resources include programmable switches that
serve to connect the logic blocks to one another or one wire segment to
another. Logic circuits are implemented in the FPGA by partitioning the logic
into individual logic blocks and then interconnecting the blocks as required
via the switches. The structure and content of a logic block are called its
architecture. There are different kinds of logic block architecture available,
and logic blocks can be built using look-up tables (Xilinx), multiplexers
(Actel) or even PALs (Altera). The structure and content of the interconnect
resources in an FPGA is called its routing architecture. The routing
architecture consists of wire segments and programmable switches. There are
many different ways to design a routing architecture: some FPGAs offer many
simple connections between blocks, while others provide fewer but more complex
routes. Each manufacturer has a distinct name for its basic block, but the
fundamental unit is the LUT. Altera calls its block a Logic Element (LE), while
Xilinx FPGAs have configurable logic blocks (CLBs) organized in an array.

3.3 Basic Building Blocks: Xilinx user-programmable gate arrays include two
major configurable elements: configurable logic blocks (CLBs) and input/output
blocks (IOBs).

• CLBs provide the functional elements for constructing the user’s logic.

• IOBs provide the interface between the package pins and internal signal lines.

Programmable interconnect resources provide routing paths to connect the

inputs and outputs of these configurable elements to the appropriate networks.

Customized configuration is established by programming internal static memory

cells that determine the logic functions and internal connections implemented in

the FPGA. Configurable Logic Blocks implement most of the logic in an FPGA.


3.4 Steps involved in the implementation of VHDL code on FPGA:

Computer-aided design (CAD) is a very important aspect of FPGA technology. It
allows users to convert a circuit description, expressed in a hardware
description language (HDL) or as a schematic, into a bitstream that can be
downloaded to an FPGA to program it.

Examples of CAD software for FPGAs are Xilinx Alliance and Foundation, Altera
Quartus II and Max+Plus II, and Actel Libero. Implementing a logic design with
an FPGA usually consists of the following steps (depicted in Fig. 3.4):

1. Enter a description of your logic circuit using a hardware description
language (HDL) such as VHDL or Verilog. The design can also be drawn using a
schematic editor.

2. Use a logic synthesizer program to transform the HDL or schematic into a
netlist. The netlist is a description of the various logic gates in your design
and how they are interconnected.

3. Implementation tools are used to map the logic gates and interconnections

into the FPGA. The FPGA consists of many configurable logic blocks which

can be further decomposed into look-up tables that perform logic operations.

The mapping tool collects the netlist gates into groups that fit into the LUTs and

then the place & route tool assigns the gate collections to specific CLBs while

opening or closing the switches in the routing matrices to connect the gates

together.

4. Once the implementation phase is complete, a program extracts the state of
the switches in the routing matrices and generates a bitstream in which the
ones and zeroes correspond to open or closed switches. The bitstream is
downloaded into a physical FPGA chip. The electronic switches in the FPGA open
or close in response to the binary bits in the bitstream. Upon completion of
the downloading, the FPGA will perform the operations specified by the HDL code
or schematic.

Fig. 3.4 shows the design flow of the FPGA.

Fig.3.4 Design Flow Of FPGA


Chapter 4

IMPLEMENTATION TOOLS

4.1 XILINX ISE

Xilinx ISE (Integrated Software Environment) is a software tool produced by
Xilinx for the synthesis and analysis of HDL designs. It enables the developer
to synthesize ("compile") designs, perform timing analysis, examine RTL
diagrams, simulate a design's reaction to different stimuli, and configure the
target device with the programmer. Xilinx ISE consists of a set of programs to
create (capture), simulate and implement digital designs in an FPGA or CPLD
target device. All the tools use a graphical user interface (GUI) that allows
all programs to be executed from toolbars, menus or icons.

Fig.4.1 Simulation in Xilinx ISE


4.2 SPARTAN-6 FPGA:

The Xilinx Spartan®-6 FPGA family delivers an optimal balance of low risk, low
cost, low power and performance for cost-sensitive applications. These FPGAs
use a proven low-power 45 nm process technology. The Spartan-6 series also
offers advanced power management technology, up to 150k logic cells, integrated
PCI Express® blocks, advanced memory support, 250 MHz DSP slices and 3.2 Gbps
low-power transceivers.

Fig 4.2 SPARTAN 6 Development Board


Chapter 5

CONCLUSION

In this project we aim to implement a fixed-point FFT on an FPGA in
collaboration with NeST Technologies. The purpose of the project is to develop
an understanding of the underlying principles of FFT implementation using FPGAs
as an alternative to general- or special-purpose microprocessors. Implementing
a fixed-point FFT algorithm on an FPGA appears to be simple compared with other
techniques. The same approach can also be used to implement other applications
and algorithms in fields such as signal and image processing, communication
systems and electronic circuit design.


