Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks, Kapre et al. RC Reading Group – 3/29/2006. Presenter: Ilya Tabakh

Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks

Kapre et al.

RC Reading Group – 3/29/2006

Presenter: Ilya Tabakh

Agenda

• Introduction
• Background
• Topology
• Packet Switched
• Time Multiplexed
• Application
• Methodology
• Results
• Conclusions
• Wrap-up
• Questions

Introduction

• Dedicated spatial interconnect links on a configured FPGA network can be inefficient for sparse communication patterns

• Overlaying virtual networks on top of the physical networks can help address this issue

Time-Multiplexed

Pros
– Can take advantage of global route information

Cons
– Offline computation can be compute intensive
– Must allocate resources for communication schedule and all possible communication between operators

Packet-Switched

Pros
– No offline setup and no resources needed for storing a communication schedule
– Routes are made only for operators that are actually communicating

Cons
– Switches are more complex
– Routes can be less efficient

Novel Contributions of work

• Demonstration of efficient and scalable static and dynamic FPGA overlay networks

• Quantification of difference between offline scheduling and online routing

• Quantification of performance impacts due to balancing interconnects and computing

• Characterization of area and performance tradeoffs between time-multiplexed and packet-switched

• Quantification of performance difference between time-multiplexed and packet-switched under varying application communication loads.

NoC

• Early days: on-chip buses
• Later it became necessary to investigate scalable, high-performance, low-overhead on-chip networks
• Networks are required since buses scale poorly
• As the number of PEs increases, communication increases and more bandwidth is needed

Communication Patterns

• Need to know in order to choose network to use
• Configured switching is inefficient for apps that underutilize links
• Circuit switching is efficient for larger messages on shorter networks
• Need to know characteristics in order to make appropriate choice

Packet Switched

How they improve on past work in FPGA-based overlay networks
– Allow arbitrary topologies
– Use real applications and realistic PE architectures to generate traffic payloads
– Network speed is much faster, running at 166 MHz as compared to most running at 25-50 MHz

Time Multiplexed

• Use a greedy router similar to the one used in the Virtual Wires project
• Virtual Wires overcame pin limitation by time sharing each physical wire among logical wires and pipelining
• This paper attempts to explore the entire design space as opposed to one system size or config

Performance Analysis

Several important quantities of the network have to be defined
• PE Input Serialization: a bound on cycle count for input
• PE Output Serialization: a bound on cycle count for output
• Network Bisection: maximum number of messages that can cross the network on a given cycle
• Network Latency: number of cycles required to cross the network
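
One way to read the four quantities above is as independent lower bounds on the cycle count of a communication phase, with the largest one dominating. The sketch below assumes that reading; the function name, its parameters, and the one-message-per-cycle PE rate are illustrative assumptions, not formulas taken from the paper.

```python
def cycle_lower_bound(max_msgs_into_pe, max_msgs_out_of_pe,
                      msgs_crossing_bisection, bisection_width,
                      network_latency):
    """Illustrative lower bound on cycles for one communication phase.

    Assumes each PE sends and receives at most one message per cycle and
    that at most `bisection_width` messages cross the bisection per cycle.
    """
    # Serialization bounds: the busiest PE needs one cycle per message.
    input_bound = max_msgs_into_pe
    output_bound = max_msgs_out_of_pe
    # Bisection bound: ceiling of (messages crossing / messages per cycle).
    bisection_bound = -(-msgs_crossing_bisection // bisection_width)
    # No phase can finish faster than its tightest constraint allows.
    return max(input_bound, output_bound, bisection_bound, network_latency)


print(cycle_lower_bound(128, 64, 1000, 8, 20))
```

With these made-up numbers the input serialization of the busiest PE is the binding constraint; shrinking that fan-in shifts the bottleneck to the bisection.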

Butterfly Fat Trees

• Most FPGA NoCs have focused on meshes
• BFTs achieve higher performance at equivalent chip size
• Routing functions programmed in the split primitives determine path
• Single address bit is used to make a routing decision at each switch
• Time-multiplexed merge contains a context memory which stores computed routing
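
The single-address-bit decision can be illustrated with a simplified model. The sketch below assumes a plain binary tree descent in which the switch at each level inspects one bit of the destination address, most significant bit first. The upward half of a BFT route and the choice among multiple parent switches are not modeled, and the function name and bit ordering are assumptions made for illustration.

```python
def downward_path(dest, levels):
    """Return the left/right choice made at each switch, top level first.

    Simplified model: a binary tree with `levels` switch levels above the
    PEs, where level k inspects bit k of the destination address.
    """
    path = []
    for level in reversed(range(levels)):
        bit = (dest >> level) & 1  # the single address bit used at this level
        path.append("right" if bit else "left")
    return path


print(downward_path(5, 3))  # destination PE 5 = 0b101
```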

Packet Switched

• Primitives have input queues
• Split primitives compute the routing decision in a single cycle based on the destination address
• Arbitration is done by selecting packets based on input queue occupancies
• Network with floorplanned and pipelined primitives can operate as high as 180 MHz
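
Occupancy-based arbitration can be sketched in software as follows. This is a minimal model that assumes the merge primitive forwards one packet per cycle from its fullest input queue and breaks ties toward the lowest-numbered input; the slide only says that selection is based on queue occupancies, so both the policy details and the function names are assumptions.

```python
from collections import deque


def arbitrate(queues):
    """Return the index of the fullest non-empty queue, or None if all are empty."""
    best = None
    for i, q in enumerate(queues):
        # Strict '>' keeps the lowest index on ties (assumed tie-break).
        if q and (best is None or len(q) > len(queues[best])):
            best = i
    return best


def merge_step(queues):
    """Model one cycle of a merge primitive: forward one packet or nothing."""
    winner = arbitrate(queues)
    return None if winner is None else queues[winner].popleft()


inputs = [deque(["a"]), deque(["b", "c"])]
print([merge_step(inputs) for _ in range(4)])
```

Favoring the fullest queue is one plausible way to relieve back-pressure where congestion is worst; a hardware implementation would compare occupancy counters rather than queue lengths.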

Time Multiplexed

• Statically scheduled prior to runtime
• Switching primitives contain context memory
• Context memory requires 1 bit of storage per cycle
• Network capable of operating at 166 MHz
• Greedy routing algorithm used
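
The relationship between offline scheduling and the context memory can be sketched for a single two-input merge primitive. The model below is a deliberate simplification: it greedily gives each message the earliest free cycle on one shared output link and records one select bit per cycle. The router described in the paper schedules across the whole network; the function name, the message format, and the use of 0 as a don't-care value on idle cycles are all assumptions.

```python
def greedy_schedule(messages, num_cycles):
    """Greedily schedule messages on one shared output link.

    messages: list of (ready_cycle, input_port) with input_port in {0, 1}.
    Returns (context, slots): context[c] is the 1-bit input select for
    cycle c, and slots[c] is the index of the message sent in cycle c
    (None when the link is idle).
    """
    slots = [None] * num_cycles
    context = [0] * num_cycles  # one bit of context memory per cycle
    # Greedy order: earliest-ready message first (stable for ties).
    for idx in sorted(range(len(messages)), key=lambda i: messages[i][0]):
        ready, port = messages[idx]
        cycle = ready
        # Take the first cycle at or after `ready` that is still free.
        while cycle < num_cycles and slots[cycle] is not None:
            cycle += 1
        if cycle >= num_cycles:
            raise ValueError("schedule does not fit in num_cycles")
        slots[cycle] = idx
        context[cycle] = port
    return context, slots


context, slots = greedy_schedule([(0, 0), (0, 1), (1, 0)], num_cycles=4)
print(context, slots)
```

Because the context holds one bit per scheduled cycle, its size grows with schedule length, which is consistent with the later observation that time multiplexing needs more context area at higher cycle counts.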

Area and Latency of Switching

Application

• A real-life application was mapped onto the networks
• ConceptNet: common-sense reasoning knowledge base represented as a graph
• Start with an initial set of nodes, send activation from each node to its neighbors along weighted edges
• Time multiplexed run at 100% activity; packet switched run between 1-100% activity level
• Limitations
– Nodes limited to 128 edges of fanout or fanin
– Can only process a single edge per cycle
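
The activation step described on this slide can be sketched as a small graph computation. The version below assumes that each active node sends its activation scaled by the edge weight and that a receiving node sums what arrives; the slide states only that activation travels along weighted edges, so the combination rule, the function name, and the toy graph are assumptions.

```python
def spread_activation(graph, initial, steps):
    """Propagate activation along weighted edges for a number of steps.

    graph: {node: [(neighbor, weight), ...]}
    initial: {node: activation} for the starting set of nodes
    Returns the accumulated activation per node.
    """
    activation = dict(initial)
    frontier = dict(initial)
    for _ in range(steps):
        next_frontier = {}
        for node, value in frontier.items():
            # Each outgoing edge corresponds to one message on the network.
            for neighbor, weight in graph.get(node, []):
                next_frontier[neighbor] = (
                    next_frontier.get(neighbor, 0.0) + value * weight
                )
        for node, value in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + value
        frontier = next_frontier
    return activation


toy_graph = {"a": [("b", 0.5), ("c", 0.25)], "b": [("c", 0.5)]}
print(spread_activation(toy_graph, {"a": 1.0}, steps=2))
```

Only nodes in the current frontier generate traffic, which is why the fraction of active edges (the activity level) matters when comparing the two networks.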

Methodology

• Java-based infrastructure
– simulates the packet switched network
– computes schedules for time multiplexed network
• Used smallest set of ConceptNet predicates
• Java infrastructure generates VHDL netlist
• Hand-coded VHDL for ConceptNet PEs
• Created custom multipliers instead of using onboard for speed

Methodology (cont)

• Synthesis and place and routing using Synplicity Compiler v8.0
• Xilinx ISE v8.1i to obtain operating frequency and slice count
• Long wires that constrain performance are further pipelined based on post place-and-route timing analysis
• Lots of intervention to prepare system

Results

Three quantitative comparisons are provided to characterize the tradeoffs between packet switched and time multiplexed networks
– Routing of identical topologies
– Impact of area with identical area constraints
– Examine performance while varying activity level (Activity Factors)

Routing identical topologies

• Small numbers of PEs induce a light communication load
• As the number of PEs increases, communication increases and offline routing starts to outperform online routing
• Online routing requires up to 63% more cycles than offline routing for larger networks

Impact of Area

• A couple of things to consider when talking about area
– PE vs. Interconnect Tradeoff
– Area-Time Tradeoff

PE vs. Interconnect Tradeoff

Sometimes the network performs better with fewer PEs but more capacity in the network.

Area-Time Tradeoff

• Packet switched and time multiplexed networks may use significantly different amounts of area due to differences in switch sizes
• At smaller areas time multiplexing requires more cycles
• At higher cycle counts time multiplexing requires more area for context
• Performance is limited by 128 edge fanin or fanout limit

Activity Factors

• Packet-switching takes 8x as many cycles to route
• At some activity factors less than 100% packet-switching should be able to outperform time-multiplexing for same area

Conclusions

• Demonstrated implementations of packet-switched and time-multiplexed FPGA overlay networks operating at 166 MHz
• Offline scheduling offers up to a 63% performance increase over online scheduling for equivalent topologies
• Packet-switching is up to 2x faster for small areas
• Time-multiplexing is up to 8x faster for large areas

Conclusions (cont.)

Packet switching offers better performance for activity factors below 30% at 32K slices and below 5% at 100K slices.

Future Work

• Mapping larger communication graphs with smaller fanout limitations to fully test networks
• Compress context memory for time-multiplexing
• Improve efficiency of packet switching
• Extend work to multiple-chip networks

Wrap-up

• Paper takes a look at trade-offs involved in FPGA networks

• Thought it was a good look at design decisions and gave actual guidance to the designer

• Describes interesting alternative to mesh network (BFTs)
