Mario Garrido © 2010 1
LINKÖPING UNIVERSITY
Department of Electrical Engineering
Linköping, 28.09.2010
INTRODUCTION TO THE FFT ALGORITHM AND ITS HARDWARE ARCHITECTURES
Mario Garrido Gálvez
Division of Electronics Systems (ES)
INDEX
INTRODUCTION
DFT VS FFT
RADIX
DIF AND DIT
PIPELINED ARCHITECTURES
IN-PLACE ARCHITECTURES
HARDWARE IMPLEMENTATION
INTRODUCTION
Discrete Fourier Transform (DFT):
- Discrete version of the Fourier transform for digital systems.
- It obtains the spectrum of the input signal.
- Widely used algorithm for signal processing and signal analysis:
  - Audio and image processing.
  - Digital receivers.
  - Medical applications: EEG, ECG.
  - ADSL.
Fast Fourier Transform (FFT): efficient algorithms for the computation of the DFT. Most common: Cooley-Tukey.
DFT
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j\frac{2\pi}{N}kn}, \qquad k = 0, \ldots, N-1$$
A filter for each of the frequencies.
Order of operations: O(N^2)
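The DFT sum can be evaluated directly, which makes the O(N^2) cost visible: N output frequencies, each needing an N-term sum. The following is only a minimal sketch in Python; the function name `dft` is chosen for illustration and is not from the slides.

```python
import cmath

def dft(x):
    """Direct DFT: one N-term sum (one "filter") per output frequency k."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

# A unit impulse has a flat spectrum: every X[k] is 1.
X = dft([1, 0, 0, 0])
```

The two nested loops over k and n are the N x N operations that the FFT algorithms reduce.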
FFT (DIT)
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j\frac{2\pi}{N}kn}, \qquad k = 0, \ldots, N-1$$
Decimation in time (DIT):
$$X[k] = \sum_{i=0}^{N/2-1} x[2i]\, e^{-j\frac{2\pi}{N/2}ik} + e^{-j\frac{2\pi}{N}k} \sum_{i=0}^{N/2-1} x[2i+1]\, e^{-j\frac{2\pi}{N/2}ik}$$
Twiddle factors: $e^{-j\frac{2\pi}{N}k}$
O(N · log N)
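The DIT split maps naturally onto a recursion: two half-size transforms of the even and odd samples, combined with the twiddle factors. The sketch below assumes N is a power of two and the input is a Python list; the name `fft_dit` is illustrative only.

```python
import cmath

def fft_dit(x):
    """Recursive radix-2 DIT FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft_dit(x[0::2])   # N/2-point transform of x[2i]
    odd = fft_dit(x[1::2])    # N/2-point transform of x[2i+1]
    X = [0j] * N
    for k in range(N // 2):
        # Twiddle factor e^{-j 2 pi k / N} applied to the odd part.
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        X[k] = even[k] + t
        # The half-size transforms are periodic in k, and the twiddle
        # factor changes sign for k + N/2.
        X[k + N // 2] = even[k] - t
    return X
```

Each recursion level does N/2 multiplications and N additions, and there are log2(N) levels, which gives the O(N · log N) cost.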
FFT (DIF)
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j\frac{2\pi}{N}kn}, \qquad k = 0, \ldots, N-1$$
Decimation in frequency (DIF):
$$X[2r] = \sum_{n=0}^{N/2-1} \left(x[n] + x[n+N/2]\right) e^{-j\frac{2\pi}{N/2}rn}$$
$$X[2r+1] = \sum_{n=0}^{N/2-1} \left(x[n] - x[n+N/2]\right) e^{-j\frac{2\pi}{N}n}\, e^{-j\frac{2\pi}{N/2}rn}, \qquad r = 0, \ldots, N/2-1$$
Twiddle factors: $e^{-j\frac{2\pi}{N}n}$
O(N · log N)
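In the DIF split the butterflies come first and the half-size transforms second: the sums feed the even outputs X[2r], and the rotated differences feed the odd outputs X[2r+1]. A minimal recursive sketch, assuming N is a power of two (the name `fft_dif` is illustrative):

```python
import cmath

def fft_dif(x):
    """Recursive radix-2 DIF FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    half = N // 2
    # Sums feed the even-indexed outputs.
    s = [x[n] + x[n + half] for n in range(half)]
    # Differences are rotated by the twiddle factor e^{-j 2 pi n / N}
    # and feed the odd-indexed outputs.
    d = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
         for n in range(half)]
    X = [0j] * N
    X[0::2] = fft_dif(s)   # X[2r]
    X[1::2] = fft_dif(d)   # X[2r+1]
    return X
```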
BUTTERFLIES & RADIX
Butterfly: basic element used for the FFT computation.
Radix-2 butterfly => 2-point FFT
Radix-4 butterfly => 4-point FFT
[Figure: radix-2 butterfly with inputs x[0], x[1] and outputs X[0], X[1]; radix-4 butterfly with inputs x[0] to x[3], outputs X[0] to X[3], and an internal rotation by e^{-jπ/2}.]
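As a rough illustration of the two butterflies, the sketch below writes them as plain Python functions (the names `butterfly_r2` and `butterfly_r4` are made up for this example). The radix-4 butterfly is built here from two columns of radix-2 butterflies with one trivial rotation by e^{-jπ/2} = -j, which is one common way to draw it.

```python
def butterfly_r2(x0, x1):
    """Radix-2 butterfly: a 2-point FFT."""
    return x0 + x1, x0 - x1

def butterfly_r4(x0, x1, x2, x3):
    """Radix-4 butterfly: a 4-point FFT, returning X[0], X[1], X[2], X[3]."""
    # First column of radix-2 butterflies.
    a0, a2 = x0 + x2, x0 - x2
    a1, a3 = x1 + x3, x1 - x3
    # Trivial rotation e^{-j pi/2} = -j on one branch.
    a3 = a3 * -1j
    # Second column of radix-2 butterflies.
    return a0 + a1, a2 + a3, a0 - a1, a2 - a3
```

Neither butterfly needs a general multiplier: the radix-2 one uses only additions and subtractions, and multiplying by -j only swaps real and imaginary parts with a sign change.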
FFT FLOW GRAPH
[Figure: flow graph of a 16-point radix-2 FFT, stages 1 to 4, with input x[n] and output X[k]. Annotations: N points; log_r N stages; butterflies (radix-2); rotations $e^{-j\frac{2\pi}{N}\phi}$.]
RADIX-4 FFT GRAPH
[Figure: flow graph of a 16-point radix-4 FFT, stages 1 and 2.]
DIF AND DIT FFTs
[Figure: flow graphs of the 16-point radix-2 FFT, stages 1 to 4, for the DIF and DIT decompositions.]
FFT ALGORITHM
Number of points: N
Number of stages: n = log_r N, where r is the radix.
Each stage consists of butterflies and rotations.
Decomposition:
- determines the algorithm.
- most common decompositions: DIF and DIT.
Radix:
- determines the size of the butterfly.
- determines the algorithm.
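The relation between N, the radix and the structure of the flow graph can be checked with a few lines of Python. This is only a sketch: `fft_structure` is a hypothetical helper, and it assumes N is an exact power of the radix r, so that each stage has N/r butterflies of size r.

```python
def fft_structure(N, r):
    """Return (number of stages, butterflies per stage) for an N-point radix-r FFT."""
    n, m = 0, N
    while m > 1:
        if m % r != 0:
            raise ValueError("N must be a power of the radix r")
        m //= r
        n += 1
    return n, N // r

# 16 points: radix-2 gives 4 stages of 8 butterflies,
# radix-4 gives 2 stages of 4 butterflies.
r2 = fft_structure(16, 2)
r4 = fft_structure(16, 4)
```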
FFT ARCHITECTURES
Butterflies -> adders.
Rotations -> rotators.
In hardware, a butterfly/rotator can calculate any number of butterflies/rotations of the algorithm.
Typical approaches:
- Direct implementation: each butterfly/rotation of the algorithm is calculated by its own butterfly/rotator in hardware.
- Pipelined architectures: all the butterflies of the same stage of the flow graph are calculated by the same butterfly in hardware.
- In-place or iterative architectures: all the butterflies and rotations of the flow graph are calculated by the same processing element.
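To get a feeling for the area difference between the three approaches, one can count hardware butterflies under the descriptions above. The helper below is a hypothetical sketch; it assumes N is a power of the radix r and exactly one hardware butterfly per stage in the pipelined case, which is a simplification of real designs.

```python
def hardware_butterflies(N, r):
    """Count hardware butterflies for the three typical approaches."""
    stages, m = 0, N
    while m > 1:
        if m % r != 0:
            raise ValueError("N must be a power of the radix r")
        m //= r
        stages += 1
    return {
        "direct": stages * (N // r),  # one per butterfly of the flow graph
        "pipelined": stages,          # one per stage of the flow graph
        "in_place": 1,                # a single processing element
    }

counts = hardware_butterflies(16, 2)
```

For N = 16 and radix 2 this gives 32, 4 and 1 butterflies respectively.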
PIPELINED ARCHITECTURES
[Figure: 16-point radix-2 flow graph, stages 1 to 4, mapped onto the pipeline stages ST1, ST2, ST3 and ST4.]
Process a continuous flow of data.
Suitable for high-throughput applications.
Types:
- feedback (FB or SDF, single-path delay feedback).
- feedforward (FF or MDC, multi-path delay commutator).
Each stage of the architecture calculates a whole stage of the algorithm.
FEEDBACK ARCHITECTURES
[Figure: 16-point feedback architectures with input x[i]. FB radix-2 and FB radix-2^2: four R2 stages with buffers of length 8, 4, 2 and 1. FB radix-4: two R4 stages with buffers of 3 x 4 and 3 x 1.]
Feedback loops.
Throughput = 1 sample/cycle.
Latency = N cycles.
AREA
Total memory = N.
Radix-2: fewer adders.
Radix-4: fewer rotators.
Radix-2^2: fewer adders and rotators (including trivial ones).
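The buffer lengths in the figure follow a simple pattern that can be sketched in Python. The helper `feedback_buffers` is hypothetical and assumes N is a power of the radix r, with r - 1 buffers per stage whose length is divided by r from one stage to the next, as in the 16-point examples.

```python
def feedback_buffers(N, r):
    """Return a list of (number of buffers, buffer length), one entry per stage."""
    stages = []
    length = N // r
    while length >= 1:
        stages.append((r - 1, length))
        length //= r
    return stages

radix2 = feedback_buffers(16, 2)   # lengths 8, 4, 2, 1
radix4 = feedback_buffers(16, 4)   # 3 x 4 and 3 x 1
total2 = sum(count * length for count, length in radix2)
total4 = sum(count * length for count, length in radix4)
```

Both totals come out as N - 1 = 15 samples, in line with a total memory of roughly N.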
FEEDFORWARD ARCHITECTURES
Processed data pass to the next stage.
High performance:
- Throughput > 1 sample/cycle.
- Latency = N/r (without input reordering).
Larger area.
[Figure: FF radix-2 (2-parallel) architecture built from R2 butterflies, buffers and switches; FF radix-4 (4-parallel) architecture built from R4 butterflies, buffers and switches.]
IN-PLACE ARCHITECTURES
All the stages of the FFT are calculated iteratively with the same processing element.
In general:
- lower area.
- lower throughput.
- not very suitable for processing a continuous flow.
[Figure: radix-2 and radix-4 in-place architectures, each built around a memory.]
FFT REQUIREMENTS
Number of points of the FFT (N).
Clock frequency and number of samples per clock cycle.
Available area.
Input and output wordlength + accuracy.
Maximum latency of the FFT.
How samples arrive and how they must be provided:
- Continuous flow, or a certain time between consecutive transforms.
- Order of the samples.
- Number of samples in parallel.
CHOOSING THE ARCHITECTURE
FFT ALGORITHM
IN-PLACE
- Little hardware.
- Lower performance.
- Iterative process using a memory -> more similar to a digital signal processor.
PIPELINED
- Larger area.
- Higher performance.
- Continuous flow of data.
- Feedback and feedforward.
[Figure: in-place architecture with a memory; pipelined feedback architecture with four R2 stages and buffers of length 8, 4, 2 and 1.]
DESIGNING AN IN-PLACE FFT
[Figure: 16-point radix-2 flow graph, stages 1 to 4, computed by an in-place architecture with a memory.]
DIFFICULTIES
- How to read and store samples in the memory.
- How to manage the rotation memory.
- How to manage sequences of data that arrive continuously.
SUGGESTIONS
- Store processed data in the address that has been read.
- Calculate the index and get the value from the memory.
- Use double buffering.
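The first two suggestions can be tried out in software before writing any hardware. The sketch below is one possible radix-2 DIF arrangement, not necessarily the one intended in the slides: `mem` plays the role of the data memory, `rot` the rotation memory, and both function names are made up. Each butterfly reads two addresses, writes its results back to the same two addresses, and fetches its rotation through a calculated index. Double buffering is not modeled.

```python
import cmath

def fft_in_place(mem):
    """Radix-2 DIF FFT that overwrites `mem`; len(mem) must be a power of two."""
    N = len(mem)
    # Rotation memory: the N/2 values e^{-j 2 pi phi / N}.
    rot = [cmath.exp(-2j * cmath.pi * phi / N) for phi in range(N // 2)]
    half = N // 2
    while half >= 1:                       # one pass per stage
        for start in range(0, N, 2 * half):
            for i in range(half):
                p, q = start + i, start + i + half   # read addresses
                a, b = mem[p], mem[q]
                phi = i * (N // (2 * half))          # index into rotation memory
                mem[p] = a + b                       # write back to the
                mem[q] = (a - b) * rot[phi]          # same two addresses
        half //= 2
    return mem                             # results are in bit-reversed order

def bit_reverse(k, bits):
    """Reverse the `bits` least significant bits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r
```

With this arrangement X[k] ends up at address `bit_reverse(k, log2(N))`, so the output order has to be handled when the results are read out.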
DESIGNING A PIPELINED FFT
DIFFICULTIES
- Each stage has buffers of different length, and the rotations are also different.
- General control.
- Wordlength and overflow.
SUGGESTIONS
- Program a stage without rotation memory and with a variable-size buffer by using “generic”. Replicate it with “generate”. Then, add the rotation memories.
- Simple control based on a counter.
- Either parameterize the wordlength of the stage or truncate.
[Figure: 16-point feedback pipeline with input x[i], four R2 stages and buffers of length 8, 4, 2 and 1; detail of a single R2 stage with a buffer of length L.]
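A behavioural model can help to check the control before moving to hardware. The Python sketch below imitates a radix-2 DIF feedback pipeline under several assumptions that are not in the slides: one generic stage function parameterized by its buffer length L, a shared sample counter as the only control, rotations computed on the fly instead of read from a rotation memory, and floating-point arithmetic (so wordlength and overflow are not modeled). The names `sdf_stage` and `sdf_fft` are illustrative.

```python
import cmath

def sdf_stage(stream, L):
    """One generic radix-2 DIF feedback stage with a buffer of length L."""
    buf = [0j] * L
    out = []
    for count, x in enumerate(stream):     # `count` is the control counter
        pos = count % L                    # position in the length-L buffer
        if (count // L) % 2 == 0:
            # Fill phase: output what the buffer holds, store the input.
            out.append(buf[pos])
            buf[pos] = x
        else:
            # Butterfly phase: the sum goes out, the rotated difference
            # is fed back into the buffer.
            a = buf[pos]
            out.append(a + x)
            buf[pos] = (a - x) * cmath.exp(-2j * cmath.pi * pos / (2 * L))
    return out

def sdf_fft(x):
    """Chain stages with L = N/2, N/4, ..., 1; len(x) must be a power of two."""
    N = len(x)
    stream = list(x) + [0j] * (N - 1)      # trailing zeros flush the pipeline
    L = N // 2
    while L >= 1:                          # replicate the generic stage
        stream = sdf_stage(stream, L)
        L //= 2
    return stream[N - 1:]                  # N outputs in bit-reversed order
```

In this model the first valid output appears after N - 1 input samples and the outputs come in bit-reversed order, so `sdf_fft(x)[i]` holds X[k] with k equal to the bit reversal of i.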
THANK YOU