Mapping the FFT Algorithm to the IBM Cell Processor

Andy Polidore
Advisors: Brendan Burns, Joseph Czechowski

TRANSCRIPT

Page 1: Mapping the FFT Algorithm to the IBM Cell Processor

Mapping the FFT Algorithm to the IBM Cell Processor

Andy Polidore
Advisors: Brendan Burns, Joseph Czechowski

Page 2: Mapping the FFT Algorithm to the IBM Cell Processor

Motivation

MRI imaging
Fast Fourier Transforms
◦ Efficient algorithm for computing a Discrete Fourier Transform (DFT)
◦ The DFT converts time-domain data to the frequency domain
2D FFT: perform a 1D FFT on each row of an image, then a 1D FFT on each resulting column
The Cell
◦ Nine cores
◦ 1 PowerPC Processor Unit (PPU)
◦ 8 Synergistic Processor Units (SPUs)
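The row-then-column decomposition of the 2D FFT described on this slide can be sketched in plain Python. This is an illustration only, not the Cell implementation: the function names `fft`, `fft2d`, and `dft2d_direct` are hypothetical, the 1D transform is a textbook recursive radix-2 FFT (so lengths are assumed to be powers of two), and the direct 2D DFT is included only as a correctness check.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) is assumed to be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2d(image):
    """1D FFT on each row, then a 1D FFT on each resulting column."""
    rows = [fft(row) for row in image]
    # zip(*rows) walks the columns of the row-transformed data.
    cols = [fft(list(col)) for col in zip(*rows)]
    # cols[v][u] holds output element (u, v); transpose back to row-major.
    return [list(r) for r in zip(*cols)]

def dft2d_direct(image):
    """Direct O(N^4) 2D DFT, used only to check fft2d."""
    n, m = len(image), len(image[0])
    return [[sum(image[r][c] * cmath.exp(-2j * cmath.pi * (u * r / n + v * c / m))
                 for r in range(n) for c in range(m))
             for v in range(m)]
            for u in range(n)]
```

Calling `fft2d` on a small power-of-two image and comparing against `dft2d_direct` element by element is a quick way to confirm that the two passes together compute the full 2D transform.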

Page 3: Mapping the FFT Algorithm to the IBM Cell Processor

Strategy

The Cell comes with a 2D routine
◦ Needs to be called twice
◦ First call organizes the data in contiguous column form
Striping
Limited SPU memory
◦ Quad buffering
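One way to read "needs to be called twice" is that each call transforms every row and writes its result transposed, so that the columns become contiguous rows for the second call. The sketch below illustrates that reading in plain Python; it is an assumption about the idea, not the Cell routine itself. The names `dft`, `row_pass_transposed`, and `fft2d_two_calls` are hypothetical, and a naive DFT stands in for the per-row FFT to keep the example short.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT, standing in for the per-row FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def row_pass_transposed(data):
    """Transform every row, then store the result transposed."""
    transformed = [dft(row) for row in data]
    return [list(col) for col in zip(*transformed)]

def fft2d_two_calls(image):
    """Two identical passes yield the 2D transform in the original layout."""
    first = row_pass_transposed(image)   # former columns are now contiguous rows
    return row_pass_transposed(first)    # second transpose restores the orientation
```

Because the second pass transposes again, the output has the same shape as the input even for non-square data.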

Page 4: Mapping the FFT Algorithm to the IBM Cell Processor

[Diagram, two frames: data flow between the PPU and SPU 0. Labels: Input Buffer, Output Buffer, Input, DMA In, FFT, FFT out, DMA Out.]

Page 5: Mapping the FFT Algorithm to the IBM Cell Processor

[Diagram: the PPU's Input Buffer and Output Buffer feeding SPU 0, SPU 1, through SPU 7. Labels: Input, DMA In, FFT, FFT out.]

Page 6: Mapping the FFT Algorithm to the IBM Cell Processor

[Diagram, two frames: the PPU's Input Buffer and Output Buffer with SPU 0, SPU 1, and SPU 2, separated by a Sync Point. Labels: Input, DMA In, FFT, FFT out, DMA Out.]

Page 7: Mapping the FFT Algorithm to the IBM Cell Processor

Quad buffering

Why is it required?
◦ Space problems
◦ Maximizing processing power
Buffers
◦ IN handles incoming data
◦ FFTin and FFTout are used to process the data
◦ OUT stores the data ready to be DMA'ed back to main memory
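The four buffer roles listed above can be pictured as a rotation: at each step one buffer is being filled, one is read by the FFT, one receives the FFT output, and one is drained back to main memory. The sketch below generates one such schedule. The buffer names A through D, the phase offsets, and the start-up steps are taken as assumptions from the Buffering slides that follow; `ROLES`, `OFFSETS`, `FIRST_USE`, and `schedule` are hypothetical names.

```python
# Roles a buffer passes through, in order.
ROLES = ["FILL", "FFT IN", "FFT OUT", "OUT"]

# Assumed phase offset of each buffer within the four-step rotation.
OFFSETS = {"A": 0, "B": 1, "C": -1, "D": -2}

# Assumed first step at which each buffer is used; before that it is idle.
FIRST_USE = {"A": 0, "B": 1, "C": 1, "D": 2}

def schedule(steps):
    """Return one dict per step mapping buffer name to its role ('-' = idle)."""
    rows = []
    for t in range(steps):
        row = {}
        for name in "ABCD":
            if t < FIRST_USE[name]:
                row[name] = "-"
            else:
                row[name] = ROLES[(t + OFFSETS[name]) % len(ROLES)]
        rows.append(row)
    return rows

if __name__ == "__main__":
    for t, row in enumerate(schedule(7)):
        print(t, row["A"], row["B"], row["C"], row["D"], sep="\t")
```

From step 2 onward every role is held by exactly one buffer, which is what lets the incoming DMA, the FFT, and the outgoing DMA overlap.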

Page 8: Mapping the FFT Algorithm to the IBM Cell Processor

Buffering

Step  A        B        C        D
0     FILL     -        -        -
1
2
3
4
5
6

Page 9: Mapping the FFT Algorithm to the IBM Cell Processor

Buffering

Step  A        B        C        D
0     FILL     -        -        -
1     FFT IN   FFT OUT  FILL     -
2
3
4
5
6

Page 10: Mapping the FFT Algorithm to the IBM Cell Processor

Buffering

Step  A        B        C        D
0     FILL     -        -        -
1     FFT IN   FFT OUT  FILL     -
2     FFT OUT  OUT      FFT IN   FILL
3
4
5
6

Page 11: Mapping the FFT Algorithm to the IBM Cell Processor

Buffering

Step  A        B        C        D
0     FILL     -        -        -
1     FFT IN   FFT OUT  FILL     -
2     FFT OUT  OUT      FFT IN   FILL
3     OUT      FILL     FFT OUT  FFT IN
4
5
6

Page 12: Mapping the FFT Algorithm to the IBM Cell Processor

Buffering

Step  A        B        C        D
0     FILL     -        -        -
1     FFT IN   FFT OUT  FILL     -
2     FFT OUT  OUT      FFT IN   FILL
3     OUT      FILL     FFT OUT  FFT IN
4     FILL     FFT IN   OUT      FFT OUT
5     FFT IN   FFT OUT  FILL     OUT
6     FFT OUT  OUT      FFT IN   FILL

Page 13: Mapping the FFT Algorithm to the IBM Cell Processor

Striping

[Diagram: main memory striped across SPU 0 through SPU 7.]
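The slide does not say how the stripes are assigned. As one possible reading, the sketch below deals image rows out to the eight SPUs round-robin, so that row i goes to SPU i mod 8. Both the round-robin policy and the names `NUM_SPUS` and `stripe_rows` are assumptions made for illustration.

```python
NUM_SPUS = 8  # the Cell has eight SPUs

def stripe_rows(num_rows, num_spus=NUM_SPUS):
    """Assign row i to SPU i % num_spus (round-robin; an assumed policy)."""
    assignment = {spu: [] for spu in range(num_spus)}
    for row in range(num_rows):
        assignment[row % num_spus].append(row)
    return assignment

if __name__ == "__main__":
    for spu, rows in stripe_rows(16).items():
        print(f"SPU {spu}: rows {rows}")
```

With round-robin striping every SPU receives the same number of rows whenever the row count is a multiple of eight, which keeps the SPUs evenly loaded.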

Page 14: Mapping the FFT Algorithm to the IBM Cell Processor

Challenges

Simulator
◦ Testing is slow
◦ Alignment
◦ Compiler
C coding
◦ Working with bytes
Parallel processing
◦ Data movement
◦ Debugging
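To make the alignment and data-movement challenges concrete: on the Cell, larger DMA transfers move a multiple of 16 bytes between 16-byte-aligned addresses, and a single transfer is limited to 16 KB. The sketch below shows the byte arithmetic for splitting a buffer into transfers under those two constraints. It is an illustration in Python, not code from the project, and `DMA_ALIGN`, `DMA_MAX`, `round_up`, and `dma_chunks` are hypothetical names.

```python
DMA_ALIGN = 16        # assumed alignment and size granularity, in bytes
DMA_MAX = 16 * 1024   # assumed maximum size of one transfer, in bytes

def round_up(n, align=DMA_ALIGN):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align

def dma_chunks(address, size):
    """Split a buffer into (address, length) transfers that respect the limits."""
    if address % DMA_ALIGN:
        raise ValueError("buffer must start on a 16-byte boundary")
    size = round_up(size)          # pad the tail up to a full 16-byte unit
    chunks = []
    offset = 0
    while offset < size:
        length = min(DMA_MAX, size - offset)
        chunks.append((address + offset, length))
        offset += length
    return chunks
```

For example, a 40000-byte buffer at address 0 splits into two full 16 KB transfers and one 7232-byte remainder.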

Page 15: Mapping the FFT Algorithm to the IBM Cell Processor

Knowledge Gained

Mastering Linux
C makefiles, linking, etc.
Data movement strategies
Multi-core processing
Debugging!

Page 16: Mapping the FFT Algorithm to the IBM Cell Processor

Results and Conclusions

Success?
Future work
◦ Arbitrary size input

Page 17: Mapping the FFT Algorithm to the IBM Cell Processor

Questions?