Mapping the FFT Algorithm to the IBM Cell Processor
Andy Polidore
Advisors: Brendan Burns, Joseph Czechowski
Motivation
MRI Imaging
Fast Fourier Transform (FFT)
◦ Efficient algorithm for computing a Discrete Fourier Transform (DFT)
◦ The DFT converts a time-domain signal to its frequency-domain representation
2D FFT: perform a 1D FFT on each row of an image, then perform a 1D FFT on each resulting column
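The row-then-column recipe above can be sketched in a few lines of Python. This is only an illustration, not the Cell implementation: `fft_1d` is a plain recursive radix-2 FFT written for this example, and it assumes every row and column length is a power of two.

```python
import cmath

def fft_1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_1d(x[0::2])
    odd = fft_1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft_2d(image):
    """2D FFT: a 1D FFT on each row, then a 1D FFT on each resulting column."""
    rows = [fft_1d(row) for row in image]
    # zip(*rows) walks the row results column by column.
    cols = [fft_1d(list(col)) for col in zip(*rows)]
    # cols[c][r] holds output element (r, c); transpose back to row-major.
    return [list(row) for row in zip(*cols)]
```

For example, `fft_2d([[1, 0], [0, 0]])` transforms a single impulse, and every element of the result has magnitude 1, as expected for the spectrum of an impulse.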
The Cell
◦ Nine cores
◦ 1 Power Processing Unit (PPU)
◦ 8 Synergistic Processing Units (SPU)
Strategy
Cell comes with a 2D FFT routine
◦ Needs to be called twice
◦ First call organizes the data in contiguous column form
Striping
Limited SPU memory
◦ Quad buffering
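One way to read the "called twice" strategy is as a single pass that transforms every row and writes its result transposed, so that the columns become contiguous rows for the second call. The sketch below illustrates that reading only. Whether the Cell routine behaves this way is an assumption, the function names are hypothetical, and a naive DFT stands in for the optimized 1D kernel.

```python
import cmath

def dft_1d(x):
    """Naive O(n^2) DFT, standing in for an optimized 1D FFT kernel."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def row_pass_transposed(matrix):
    """Transform every row, then return the result transposed,
    so the former columns become contiguous rows."""
    transformed = [dft_1d(row) for row in matrix]
    return [list(col) for col in zip(*transformed)]

def fft_2d_two_calls(image):
    """Hypothetical driver: applying the same pass twice yields the 2D
    transform, back in the original row-major orientation."""
    return row_pass_transposed(row_pass_transposed(image))
```

The second transpose undoes the first, so the caller gets the output in the same layout as the input without a separate reordering step.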
[Figure sequence: data flow between the PPU (Input Buffer, Output Buffer) and the SPUs (Input and FFT out buffers), with the steps DMA In, FFT, and DMA Out. The sequence is shown first for SPU 0 alone, then for SPU 0, SPU 1, SPU 2 through SPU 7, ending at a Sync Point.]
Quad buffering
Why is it required?
◦ Space problems
◦ Maximizing processing power
Buffers
◦ IN handles incoming data
◦ FFTin and FFTout are used to process the data
◦ OUT stores the data ready to be DMA'ed back to main memory
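The four buffer roles above can be modelled as a small pipeline in which a block moves from IN to FFTin, is transformed into FFTout, and then leaves through OUT. The sketch below is a sequential simulation under assumed names (`quad_buffer_pipeline`, `transform`). On real hardware the DMA transfers would run asynchronously alongside the FFT, which is where the gain in processing power would come from.

```python
def quad_buffer_pipeline(blocks, transform):
    """Push blocks through the IN -> FFTin -> FFTout -> OUT roles.

    Assumes no block (and no transform result) is None, because None
    marks an empty buffer in this sketch.
    """
    results = []
    in_buf = fft_in = fft_out = out_buf = None
    source = iter(blocks)
    while True:
        # "DMA out": drain whatever the previous step left in OUT.
        if out_buf is not None:
            results.append(out_buf)
        # Advance every stage by one step, last stage first.
        out_buf = fft_out
        fft_out = transform(fft_in) if fft_in is not None else None
        fft_in = in_buf
        # "DMA in": fetch the next block, or None when the input is exhausted.
        in_buf = next(source, None)
        if all(b is None for b in (in_buf, fft_in, fft_out, out_buf)):
            break
    return results
```

A call such as `quad_buffer_pipeline([[1, 2], [3, 4]], sum)` returns the transformed blocks in their original order, which is the property a real implementation would also need to preserve when writing results back to main memory.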
Buffering
[Figure sequence: step-by-step animation of the quad-buffering scheme. The surviving labels are the steps 0 through 6, the buffers A through D, and the buffer roles FILL, FFT IN, FFT OUT, and OUT, which change from one step to the next.]
Striping
[Figure: Main Memory divided into stripes for SPU 0 through SPU 7.]
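A minimal sketch of how rows in main memory might be striped across the eight SPUs. The slide does not say whether stripes are interleaved or contiguous, so the round-robin assignment here is an assumption, and `stripe_rows` is a hypothetical helper.

```python
def stripe_rows(num_rows, num_spus=8):
    """Assign row indices to SPUs round-robin: row r goes to SPU r % num_spus."""
    stripes = [[] for _ in range(num_spus)]
    for r in range(num_rows):
        stripes[r % num_spus].append(r)
    return stripes
```

With 20 rows, `stripe_rows(20)[0]` is `[0, 8, 16]`. Every row is assigned to exactly one SPU, so the SPUs can work on their stripes independently.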
Challenges
Simulator
◦ Testing is slow
◦ Alignment
◦ Compiler
C coding
◦ Working with bytes
Parallel processing
◦ Data movement
◦ Debugging
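The alignment and byte-counting challenges usually come down to rounding sizes and addresses to a fixed boundary. The helpers below sketch that arithmetic, assuming a 16-byte boundary for DMA transfers. The constant and function names are illustrative and not taken from the project.

```python
DMA_ALIGN = 16  # bytes; assumed alignment boundary for DMA transfers

def align_up(n, alignment=DMA_ALIGN):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment

def is_aligned(address, alignment=DMA_ALIGN):
    """True if address sits exactly on an alignment boundary."""
    return address % alignment == 0
```

For instance, a 17-byte payload would need a 32-byte transfer under this assumption, since `align_up(17)` is 32.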
Knowledge Gained
Mastering Linux
C: makefiles, linking, etc.
Data movement strategies
Multi-core processing
Debugging!
Results and Conclusions
Success?
Future Work
◦ Arbitrary size input
Questions?