Reconfigurable HPC
Notes on datastream-based FFT
http://www.fpl.uni-kl.de/staff/hartenstein/Hartenstein-Kyushu04-FFT.ppsx
Reiner Hartenstein
TU Kaiserslautern
Baden-Baden, 12 June 2013
derived from: R. Hartenstein: Reconfigurable Technologies; 23 July 2004, Seminar given at Kyushu University, Fukuoka, Japan
http://www.fpl.uni-kl.de/staff/hartenstein/HartensteinKyushu04p1.pdf
http://www.fpl.uni-kl.de/staff/hartenstein/HartensteinKyushu04p2.pdf
© 2004, [email protected] http://hartenstein.de
application-specific distributed memory*
• Application-specific memory: rapidly growing markets:
  – IP cores
  – Module generators
  – EDA environments
• Optimization of memory bandwidth for application-specific distributed memory
*) see Herz et al.: proc. IEEE ICECS 2002
MoM anti machine: an Xputer architecture
Multiple Scan Windows
[Figure: a distributed memory built from asM memory banks, each with a data counter, connected to the rDPU through a smart memory interface; example: 4x4 scan windows]
16-point CGFFT: mapped onto 2-D memory space
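The mapping onto a 2-D memory space can be illustrated with a small sketch. It assumes that CGFFT stands for a constant-geometry FFT, and it uses one common radix-2 decimation-in-time formulation in which every stage reads adjacent cells (2j, 2j+1) and writes cells (j, j+N/2); the slides do not state which variant is meant. Each stage result is kept as its own column, so a 16-point transform yields five columns (input, three temporaries, output). The function names `bit_reverse`, `cg_fft_columns` and `naive_dft` are hypothetical and chosen only for this illustration.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def cg_fft_columns(x):
    """Constant-geometry radix-2 DIT FFT (assumed variant).

    Returns a list of columns: columns[0] is the bit-reversed input,
    columns[-1] is the DFT in natural order, the rest are temporaries.
    Every stage uses the same access pattern: read (2j, 2j+1),
    write (j, j + N/2).
    """
    n = len(x)
    bits = n.bit_length() - 1
    assert n >= 2 and n == 1 << bits, "length must be a power of two"
    half = n // 2
    columns = [[complex(x[bit_reverse(r, bits)]) for r in range(n)]]
    for s in range(bits):
        src = columns[-1]
        dst = [0j] * n
        shift = bits - 1 - s
        for j in range(half):
            # Twiddle exponent: j with its lowest `shift` bits cleared.
            k = (j >> shift) << shift
            w = cmath.exp(-2j * cmath.pi * k / n)
            a = src[2 * j]
            b = src[2 * j + 1] * w      # the multiply of the butterfly
            dst[j] = a + b              # accumulate: sum output
            dst[j + half] = a - b       # accumulate: difference output
        columns.append(dst)
    return columns


def naive_dft(x):
    """O(N^2) reference DFT used only to check the sketch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


if __name__ == "__main__":
    data = [complex(i % 5, -i) for i in range(16)]
    cols = cg_fft_columns(data)
    err = max(abs(a - b) for a, b in zip(cols[-1], naive_dft(data)))
    print(len(cols), "columns of", len(cols[0]), "cells, max error", err)
```

Because the read and write addresses are identical in every stage, the same scan pattern can be reused from column to column; only the coefficient (twiddle) values differ per stage.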
CGFFT: Nested and Parallel Scan Pattern
[Figure: 2-D memory layout with columns labelled input, temp (3x), output and coeff. (4x); a scan window with cells in_i, in_i+1, coeff. and one empty cell feeds a MAC unit]
CGFFT: Parallel Scan Pattern Animation
[Figure: animation with one MAC unit; scan window cells in_i, in_i+1, coeff. and one empty cell; outputs out_j and out_k]
32 steps
CGFFT: Parallel Scan Pattern Animation
[Figure: animation with two MAC units; scan window cells in_i, in_i+1, coeff., empty and in_i+2, in_i+3, coeff., empty; outputs out_j, out_j+1, out_k, out_k+1]
4 MAC units in parallel
8 MAC units in parallel
16 steps, 8 steps, 4 steps
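The step counts on these slides are consistent with a simple count, sketched below under two assumptions that the slides do not state explicitly: one step is one butterfly per MAC unit, and the butterflies of one stage can run in parallel while the stages run one after another. The helper name `scan_steps` is hypothetical.

```python
def scan_steps(n_points, mac_units):
    """Steps needed for an n_points radix-2 FFT with mac_units butterflies
    per step (assumption: stages are sequential, butterflies within a
    stage are independent)."""
    stages = n_points.bit_length() - 1           # log2(n_points)
    per_stage = n_points // 2                    # butterflies per stage
    steps_per_stage = -(-per_stage // mac_units) # ceiling division
    return stages * steps_per_stage


if __name__ == "__main__":
    for units in (1, 2, 4, 8):
        print(units, "MAC unit(s):", scan_steps(16, units), "steps")
```

For 16 points this gives 32, 16, 8 and 4 steps for 1, 2, 4 and 8 MAC units, which matches the figures quoted on the animation slides if the 16-step case corresponds to two MAC units.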
CGFFT: Nested and Parallel Scan Pattern
scan pattern
outer loop: HLScan is 3 steps [2, 0]
inner loop (compound scan patterns, 3 in parallel):
SP1 is 7 steps [0, 2]
SP23 is 7 steps [0, 1]
goto
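The scan pattern declarations can be read as address generators. The sketch below is an interpretation, not the Xputer tooling: it assumes that "n steps [dx, dy]" moves a scan window n times by the vector [dx, dy], so that n+1 positions are visited, and that the inner patterns restart at every position of the outer pattern. Only the two named inner patterns are modelled, and the names `linear_scan` and `nested_scan` are hypothetical.

```python
def linear_scan(start, steps, vector):
    """Positions visited by a linear scan: the start plus `steps` moves."""
    x, y = start
    dx, dy = vector
    return [(x + k * dx, y + k * dy) for k in range(steps + 1)]


def nested_scan(outer, inner_patterns):
    """Yield, for every outer position, the inner window positions that
    advance in lockstep (assumption: inner scans start at the outer
    position and run in parallel)."""
    outer_steps, outer_vec = outer
    for origin in linear_scan((0, 0), outer_steps, outer_vec):
        tracks = [linear_scan(origin, steps, vec)
                  for steps, vec in inner_patterns.values()]
        for positions in zip(*tracks):
            yield dict(zip(inner_patterns, positions))


if __name__ == "__main__":
    trace = list(nested_scan(
        outer=(3, (2, 0)),                 # HLScan is 3 steps [2, 0]
        inner_patterns={
            "SP1": (7, (0, 2)),            # SP1 is 7 steps [0, 2]
            "SP23": (7, (0, 1)),           # SP23 is 7 steps [0, 1]
        },
    ))
    print(len(trace), "window placements")
    print(trace[0], trace[-1])
```

Under this reading the outer pattern visits 4 column positions and each inner pattern visits 8 row positions, giving 4 x 8 = 32 window placements, the same number as the 32 steps of the single-MAC animation.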