
A Flexible Interconnection Structure for Reconfigurable FPGA Dataflow Applications

Gianluca Durelli, Alessandro A. Nacci, Riccardo Cattaneo, Christian Pilato, Donatella Sciuto and Marco Domenico Santambrogio

Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria

Milano, IT

[durelli, nacci, rcattaneo, pilato, sciuto]@elet.polimi.it
marco.santambrogio@polimi.it


20th Reconfigurable Architectures Workshop May 20-21, 2013, Boston, USA

Rationale

• Strive for performance in compute-intensive applications

• Reconfigurable HW well suited for certain classes of applications:
  – Multimedia, computational biology, physical simulation

• FPGAs used in HPC systems
• High maintenance costs:
  – Need to share resources among users
• Need to dynamically share and reuse components on the FPGA among different users


Outline

• Goals
• State of the Art
• Proposed Solution
• Design and Evaluation
• Case Study
• Conclusions and Future Work


Goals

• Design an interconnection able to:
  – Create different pipelines reusing available components on the FPGA
  – Share resources between different applications
  – Not insert any stall into the pipeline
• Target: FPGAs for the HPC scenario


State of the Art

• Bus interconnection:
  – Congestion problems
  – Does not scale
• Network-on-Chip:
  – Possible congestion problems
  – Good scalability


• Both introduce unexpected delays in computation:
  – Cannot guarantee performance when sharing the device between different users

Proposed Solution

• Switch-based interconnection:
  – Core inputs connected to interconnection outputs
  – Core outputs connected to interconnection inputs
  – Fully pipelined point-to-point communication
• Data are read/written only when all the inputs are available
• Can be configured by setting, for each input and output channel:
  – Switching configuration:
    • Multiplexer configuration to route information
  – From which clock cycle the channel is active
  – How much data has to be read/written through that channel
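As an illustration, the per-channel configuration just described (multiplexer selection, activation cycle, and data count) can be modeled with a short Python sketch. The field names, the activity rule, and the `route` helper are assumptions for illustration only, not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass
class ChannelConfig:
    # Hypothetical parameter names mirroring the three settings above.
    source: int       # multiplexer selection: which input channel feeds this output
    start_cycle: int  # clock cycle from which the channel is active
    word_count: int   # how many data words pass through the channel

def route(configs, inputs, cycle):
    """Route one clock cycle's worth of data through the switch.

    inputs[i] is the word offered on input channel i (None if absent).
    An output channel forwards data only while it is active and its
    source has data, mirroring the rule that data is read/written
    only when the inputs are available.
    """
    outputs = {}
    for out_ch, cfg in configs.items():
        active = cfg.start_cycle <= cycle < cfg.start_cycle + cfg.word_count
        if active and inputs.get(cfg.source) is not None:
            outputs[out_ch] = inputs[cfg.source]
    return outputs

# Example: output 0 routed from input 1 starting at cycle 0;
# output 1 routed from input 0 starting at cycle 2; 4 words each.
cfg = {0: ChannelConfig(source=1, start_cycle=0, word_count=4),
       1: ChannelConfig(source=0, start_cycle=2, word_count=4)}
print(route(cfg, {0: 10, 1: 20}, cycle=0))  # only output 0 is active yet
```

In a real switch the selection would be a multiplexer and the activity window a counter; here both collapse into the `active` comparison.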

Proposed Solution

• Suited for dataflow/pipelined applications
• Parameters can be extracted from a high-level description of the application and its pipeline structure:
  – Possibility to automate the parameter extraction and interconnection design
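A minimal sketch of how such parameters might be derived from a linear pipeline description. The function name, the uniform-latency model, and the dictionary fields are illustrative assumptions, not the authors' extraction algorithm:

```python
def extract_configs(stages, words_per_stage, latency):
    """Derive per-channel parameters from an ordered list of pipeline stages.

    Stage i's output feeds stage i+1's input; each downstream channel is
    assumed to become active `latency` cycles after its predecessor.
    """
    configs = []
    for i in range(len(stages) - 1):
        configs.append({
            "from": stages[i],              # core driving the channel
            "to": stages[i + 1],            # core fed by the channel
            "start_cycle": (i + 1) * latency,  # when the channel activates
            "word_count": words_per_stage,     # data to transfer
        })
    return configs

# Gray scale -> Gaussian blur -> edge detection, as in the case study.
pipeline = ["GS", "GB", "ED"]
print(extract_configs(pipeline, words_per_stage=256, latency=8))
```

Automating this step is what makes it possible to regenerate the interconnection configuration for a new pipeline without redesigning the switch.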



Implementation


• Solution implemented with HLS:
  – HLS well suited for dataflow/stencil loop synthesis
  – Simplifies HW development
  – Generation of compatible interfaces
• Maxeler Technologies:
  – HPC dataflow computing exploiting FPGAs
  – Proprietary HLS starting from a Java-like description:
    • Proposed interconnection solution easily described in Java
• MaxWorkstation 3A:
  – Intel i7 quad-core
  – Xilinx Virtex-6 XC6VSX475T
  – PCIe communication:
    • Maximum of 8 channels/streams

Evaluation: Area Occupation


• Area increment (10-30%) due to the increase in switching logic
• The interconnection consumes up to 6% of the FPGA:
  – Plenty of space remains for user cores

Evaluation: Frequency


• Tested with pass-through cores to evaluate the maximum working frequency of the interconnection (300 MHz)
• For real-life applications (brain-network application with cores working at 200 MHz), the interconnection does not affect the critical path

Case Study

• Application:
  – Image processing pipeline (up to 4 stages):
    • Gray scale (GS), Gaussian blur (GB), and edge detection (ED) filters
    • Their combinations
• Tested architectures:
• Experiments:
  – Single execution of an N-stage pipeline
  – Batch execution of a workload of 100 random applications


[Figure: tested architectures (A), (B), (C), (D)]
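The filter combinations and the 100-application random workload can be sketched as follows, assuming each pipeline chains distinct filters drawn from the three named stages; the helper names and the seed are illustrative, not from the paper:

```python
import random
from itertools import permutations

# The three filter cores from the case study.
FILTERS = ["GS", "GB", "ED"]  # gray scale, Gaussian blur, edge detection

def pipelines(max_stages=3):
    """Enumerate every ordered chain of distinct filters (illustrative)."""
    result = []
    for n in range(1, max_stages + 1):
        result.extend(permutations(FILTERS, n))
    return result

def random_workload(size=100, seed=42):
    """Draw a batch of applications, as in the batch-execution experiment."""
    rng = random.Random(seed)      # fixed seed so the draw is reproducible
    choices = pipelines()
    return [rng.choice(choices) for _ in range(size)]

apps = random_workload()
print(len(apps), apps[0])
```

With three distinct filters this yields 3 + 6 + 6 = 15 possible pipelines, from which the batch experiment samples its 100 applications.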

Case Study: Single execution


Case Study: Batch execution


• The proposed solution (D) does not introduce overhead in the overall execution time w.r.t. the other two architectures

• Low system load:
  – Up to 30% reduction in the overall workload execution time

Case Study: Batch execution


• Low system load (1-2 applications):
  – The proposed solution (D) does not introduce delays in the execution of a single application of the workload
• Higher system loads (more than 2 applications):
  – 10%-30% reduction in single-application execution time

Conclusions and Future work

• Conclusions:
  – Design of an interconnection to support HW resource sharing in multi-application scenarios
  – Solution suited for dataflow/pipelined systems
  – Possibility to realize different pipeline configurations at run-time
• Future work:
  – Design of a mapping/reconfiguration strategy to allocate user cores and configure new core instances at run-time


