Post on 17-Jan-2016

Reconfigurable HPC

Notes on datastream-based FFT

http://www.fpl.uni-kl.de/staff/hartenstein/Hartenstein-Kyushu04-FFT.ppsx

Reiner Hartenstein

TU Kaiserslautern

Baden-Baden, 12 June 2013

derived from: R. Hartenstein: Reconfigurable Technologies; seminar given at Kyushu University, Fukuoka, Japan, 23 July 2004
http://www.fpl.uni-kl.de/staff/hartenstein/HartensteinKyushu04p1.pdf
http://www.fpl.uni-kl.de/staff/hartenstein/HartensteinKyushu04p2.pdf

© 2004, reiner@hartenstein.de http://hartenstein.de

application-specific distributed memory*

• Application-specific memory: rapidly growing markets:
– IP cores
– Module generators
– EDA environments

• Optimization of memory bandwidth for application-specific distributed memory

*) see Herz et al., Proc. IEEE ICECS 2002

MoM anti machine: an Xputer architecture

[Figure: Multiple Scan Windows. Labels: data counter; memory bank; asM (repeated, forming a distributed memory); smart memory interface; rDPU; example: 4x4 scan windows.]

16-point CGFFT: mapped onto 2-D memory space
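As a rough illustration of what such a mapping could look like, the Python sketch below reads CGFFT as a constant-geometry FFT and runs a radix-2 decimation-in-frequency variant in which every stage reads from the same addresses (i and i + N/2) and writes to the same addresses (2i and 2i + 1). It keeps one column per stage, so a 16-point transform fills a 16 x 5 grid (input, three temporaries, output). The names `cg_fft` and `naive_dft`, the decimation-in-frequency form, and the column-per-stage layout are assumptions made for this example and are not taken from the slides.

```python
import cmath


def cg_fft(x):
    """Radix-2 constant-geometry DIF FFT (illustrative sketch).

    Returns (spectrum, columns). columns[0] is the input, columns[-1] the
    last stage in bit-reversed order, and the columns in between are the
    temporaries. Keeping every column is an assumption of this example.
    """
    n = len(x)
    m = n.bit_length() - 1
    if n < 2 or (1 << m) != n:
        raise ValueError("length must be a power of two, at least 2")
    half = n // 2
    columns = [[complex(v) for v in x]]
    for s in range(m):
        src = columns[-1]
        dst = [0j] * n
        for i in range(half):
            # Same read addresses in every stage: i and i + N/2.
            a, b = src[i], src[i + half]
            # Twiddle exponent: i with its s lowest bits cleared.
            w = cmath.exp(-2j * cmath.pi * ((i >> s) << s) / n)
            # Same write addresses in every stage: 2i and 2i + 1.
            dst[2 * i] = a + b
            dst[2 * i + 1] = (a - b) * w
        columns.append(dst)
    last = columns[-1]
    # The last column holds X[k] at the bit-reversed index of k.
    spectrum = [last[int(format(k, f"0{m}b")[::-1], 2)] for k in range(n)]
    return spectrum, columns


def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    data = [complex(t % 5, 0) for t in range(16)]
    spectrum, columns = cg_fft(data)
    reference = naive_dft(data)
    print(len(columns), "columns of", len(columns[0]), "points")
    print(max(abs(a - b) for a, b in zip(spectrum, reference)))
```

Because the read and write addresses do not depend on the stage, a single address pattern can be reused for all four stages; only the source column, the destination column and the coefficients change.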

CGFFT: Nested and Parallel Scan Pattern

[Figure: 2-D memory layout. Labels: input; temp (3x); output; coeff. (4x); scan window cells in_i, in_i+1, coeff., empty; MAC.]

CGFFT: Parallel Scan Pattern Animation

[Animation: one MAC unit. Labels: in_i, in_i+1, coeff., empty; out_j, out_k; 32 steps.]

CGFFT: Parallel Scan Pattern Animation

[Animation: two MAC units shown side by side. Labels: in_i, in_i+1, coeff., empty; in_i+2, in_i+3, coeff., empty; out_j, out_j+1, out_k, out_k+1.]

• 16 steps
• 4 MAC units in parallel: 8 steps
• 8 MAC units in parallel: 4 steps
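The step counts on these slides (32, 16, 8, 4) are consistent with a simple count of butterflies divided by the number of MAC units. The short Python sketch below reproduces that arithmetic for a radix-2 FFT. It assumes that each MAC unit completes one butterfly per step and that the 16-step case corresponds to two units; neither is stated explicitly on the slides, and the function name `butterfly_steps` is hypothetical.

```python
def butterfly_steps(n_points, n_units):
    """Steps needed if each unit does one radix-2 butterfly per step.

    Assumption for this sketch: log2(N) stages of N/2 butterflies each,
    shared evenly by n_units units working in parallel.
    """
    stages = n_points.bit_length() - 1
    if n_points < 2 or (1 << stages) != n_points:
        raise ValueError("n_points must be a power of two, at least 2")
    if n_units < 1:
        raise ValueError("n_units must be positive")
    butterflies = (n_points // 2) * stages
    # Ceiling division without floating point.
    return -(-butterflies // n_units)


if __name__ == "__main__":
    for units in (1, 2, 4, 8):
        print(units, "unit(s):", butterfly_steps(16, units), "steps")
```

For 16 points this gives 4 stages of 8 butterflies, that is 32 butterflies in total, so doubling the number of units halves the step count as long as the units can all be kept busy.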

CGFFT: Nested and Parallel Scan Pattern

• outer loop: scan pattern HLScan is 3 steps [2, 0]
• inner loop: compound scan patterns, 3 in parallel
– SP1 is 7 steps [0, 2]
– SP23 is 7 steps [0, 1]
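One way to picture a nested scan pattern is as two address generators, with the inner one restarted at every position of the outer one. The Python sketch below is a minimal model of that idea. It assumes that a bracketed pair such as [2, 0] is a step vector [dx, dy] in the 2-D memory space and that "n steps" means n moves, hence n + 1 visited positions. Both readings, the start address (0, 0), and the names `linear_scan` and `nested_scan` are assumptions made for this example.

```python
def linear_scan(start, step, n_steps):
    """Addresses of a linear scan: the start plus n_steps moves by step."""
    x, y = start
    dx, dy = step
    return [(x + k * dx, y + k * dy) for k in range(n_steps + 1)]


def nested_scan(start, outer, inner):
    """Run the inner scan once from every position of the outer scan.

    outer and inner are (step_vector, n_steps) pairs.
    """
    outer_step, outer_n = outer
    inner_step, inner_n = inner
    addresses = []
    for origin in linear_scan(start, outer_step, outer_n):
        addresses.extend(linear_scan(origin, inner_step, inner_n))
    return addresses


# Parameters as listed on the slide, under the [dx, dy] reading assumed above.
HLSCAN = ((2, 0), 3)
SP1 = ((0, 2), 7)
SP23 = ((0, 1), 7)

if __name__ == "__main__":
    sp1_addresses = nested_scan((0, 0), HLSCAN, SP1)
    sp23_addresses = nested_scan((0, 0), HLSCAN, SP23)
    print(len(sp1_addresses), sp1_addresses[:3])
    print(len(sp23_addresses), sp23_addresses[:3])
```

Under these assumptions each nested pattern visits 4 x 8 = 32 positions, which matches the 32 steps quoted for a single MAC unit. The stride of 2 in SP1 and the stride of 1 in SP23 would also fit a constant-geometry addressing scheme with writes at 2i and reads at i, but the slide does not say which pattern serves which operand.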
