routing algorithms for fpgas with sparse intra-cluster routing crossbars

55
Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars Yehdhih Ould Mohammed Moctar 1 Guy G.F. Lemieux 2 Philip Brisk 1 1 University of California Riverside 2 University of British Columbia 22 nd International Conference on Field Programmable Logic and Applications Oslo, Norway, August 29-31, 2012

Upload: tave

Post on 24-Feb-2016

47 views

Category:

Documents


0 download

DESCRIPTION

Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars. 1 University of California Riverside 2 University of British Columbia. Yehdhih Ould Mohammed Moctar 1 Guy G.F. Lemieux 2 Philip Brisk 1. 22 nd International Conference on Field Programmable Logic and Applications - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

Yehdhih Ould Mohammed Moctar1 Guy G.F. Lemieux2 Philip Brisk1

1University of California Riverside2University of British Columbia

22nd International Conference on Field Programmable Logic and ApplicationsOslo, Norway, August 29-31, 2012

Page 2: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

2

Outline

• Background– FPGA architecture– FPGA routing

• FPGAs w/full intra-cluster routing crossbars– Implications for routing

• FPGAs w/sparse intra-cluster routing crossbars– Routing challenges– New routing algorithms

• Experimental Results• Conclusion

Page 3: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

3

FPGA Architecture (1/3)

Basic Logic Element (BLE)

Page 4: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

4

FPGA Architecture (2/3)

Versatile Place and Route (VPR) CLB Architecture

Page 5: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

5

FPGA Architecture (3/3)

Page 6: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

6

Routing Resource Graph (RRG)

www.eecg.toronto.edu/~aling/ece1718/project/fang/route_rr_graph.png

2-LUT

in1 in2

out

wire1

wire2

wire3 wire4source

sink

out

wire4

wire2

in2in1

wire1

wire3

Page 7: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

7

FPGA Routing• Disjoint-path problem (on the RRG); NP-complete

• Input:– Graph G(V, E)– Set of sources S = {s1, s2, …, sm}

– Set of sets of sinks T = {T1, T2, …, Tn}, Ti = {ti1, ti

2, …, tik}

• Solution

– Finds paths from each source si to all sinks in Ti

– Paths emanating from different sinks must be disjoint (cannot shared any vertices or edges)

• Objective(s)– Minimize delay, wirelength, etc.

Page 8: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

8

Disjoint and Non-disjoint Paths

s1

s2

t21

t11

Page 9: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

9

Disjoint and Non-disjoint Paths

This route is illegal!

s1

s2

t21

t11

Two nets are routed on the same FPGA segment(remember, RRG vertices represent wires and CLB I/Os)

Page 10: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

10

Disjoint and Non-disjoint Paths

This route is legal!

s1

s2

t21

t11

Page 11: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

11

LUT Input Equivalence (1/3)

s1

s2

2-LUTin1

in2

s1

s2

2-LUTin1

in2

LUT inputs are interchangeable

Page 12: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

12

LUT Input Equivalence (2/3)

s1

s2

2-LUTin1

in2

s1

s2

2-LUTin1

in2

s1

s2 t21 = in2

t11 = in1 t2

1 = in1

T11 = in2

s1

s2

Overly restrictive disjoint-path problem formulation

Page 13: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

13

LUT Input Equivalence (3/3)

s1

s2

2-LUTin1

in2

s1

s2

2-LUTin1

in2

Represent all inputs of a LUT as one RRG sink, t

ts1

s2

Page 14: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

14

Outline

• Background– FPGA architecture– FPGA routing

• FPGAs w/full intra-cluster routing crossbars– Implications for routing

• FPGAs w/sparse intra-cluster routing crossbars– Routing challenges– New routing algorithms

• Experimental Results• Conclusion

Page 15: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

15

CLB Architecture (Recap)

Versatile Place and Route (VPR) CLB Architecture

Page 16: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

16

Full Crossbar Intra-cluster Routing (1/6)2-LUT BLE

in1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

• Typical of FPGAs 10+ years ago

Page 17: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

17

Full Crossbar Intra-cluster Routing (2/6)2-LUT BLE

in1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

• Each CLB input connects to each BLE input

Page 18: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

18

Full Crossbar Intra-cluster Routing (3/6)2-LUT BLE

in1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

• CLB inputs are equivalent

Page 19: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

19

Full Crossbar Intra-cluster Routing (4/6)2-LUT BLE

in1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

• No need to model intra-cluster routing in the RRG

Page 20: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

20

RRG

Full Crossbar Intra-cluster Routing (5/6)

RRG

Page 21: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

21

Full Crossbar Intra-cluster Routing (6/6)

Don’t model the RRG inside each CLB!

Page 22: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

22

Outline

• Background– FPGA architecture– FPGA routing

• FPGAs w/full intra-cluster routing crossbars– Implications for routing

• FPGAs w/sparse Intra-cluster routing crossbars– Routing challenges– New routing algorithms

• Experimental Results• Conclusion

Page 23: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

23

Sparse Crossbar Intra-cluster Routing (1/6)

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

• Most FPGAs today

Page 24: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

24

Sparse Crossbar Intra-cluster Routing (2/6)

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

• CLB inputs are not equivalent!

Page 25: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

25

Sparse Crossbar Intra-cluster Routing (3/6)

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

• RRG must include the intra-cluster routing

Page 26: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

26

Sparse Crossbar Intra-cluster Routing (4/6)

RRG

Page 27: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

27

Sparse Crossbar Intra-cluster Routing (5/6)

RRG encompasses the entire FPGA

Page 28: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

28

Sparse Crossbar Intra-cluster Routing (6/6)

• Key Issues relating to RRG size– May run out of memory • If you are not routing in the cloud

– Long router runtimes• Large search space leads to slow convergence• Thrashing

• Objective– Route without expanding the whole RRG

Page 29: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

29

Routing Algorithms• Extensions to PathFinder routing algorithm

– Routes one net at a time via wavefront expansion

• Baseline– Extend the RRG with intra-cluster routing at each CLB

• Selective RRG Expansion (SERRGE)– Route nets to CLB inputs– Dynamically expand the RRG to finish the route

• Partial Pre-Routing (PPR)– Pre-route all nets within CLBs– Complete global routes to CLB inputs

Page 30: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

30

Baseline Router2-LUT BLE

in1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

• Inputs of the same LUT are equivalent!

Global RRG

Page 31: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

31

SERRGE in Action (1/8)

Start with a global RRG

s

t

Page 32: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

32

SERRGE in Action (2/8)

s

t

Route from s’s CLB to t’s CLB

Page 33: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

33

SERRGE in Action (3/8)

s

t

Locally expand the RRG with part of t’s CLB

Page 34: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

34

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Maintain one copy of the intra-cluster topology

SERRGE in Action (4/8)

t

Page 35: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

35

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Expand the fanout of the CLB input being routed

SERRGE in Action (5/8)

t

Page 36: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

36

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Complete the route if possible; if not, try again

SERRGE in Action (6/8)

t

Page 37: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

37

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Track used CLB and BLE inputs; remember the route

SERRGE in Action (7/8)

t

Page 38: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

38

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Deallocate unused RRG resources

SERRGE in Action (8/8)

t

Page 39: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

39

SERRGE Summary• Global routing uses standard PathFinder

• Selectively expand portions of the RRG to complete routes from CLB inputs to BLE inputs

• Storage Requirement– Global RRG (standard PathFinder requirement)– One copy of the intra-cluster routing topology– All intra-cluster routes computed thus far

• Fairly challenging to implement!

Page 40: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

40

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Process CLBs one at a time

PPR in Action (1/4)

t1

t2

Page 41: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

41

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Compute local routes for each BLE

PPR in Action (2/4)

t1

t2

Page 42: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

42

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

CLB inputs now form equivalence classes.

PPR in Action (3/4)

t1

t2

Page 43: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

43

PPR in Action (4/4)

Perform global routing w/equivalence class constraints on CLB inputs

s

t

Page 44: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

44

PPR Summary

• Pre-route each CLB

• Propagate LUT equivalences to CLB inputs– A baseline router on the full-blown RRG would still need to

satisfy BLE input-pin equivalence constraints

• Route as normal

• Much easier implementation than SERRGE

Page 45: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

45

Outline

• Background– FPGA architecture– FPGA routing

• FPGAs w/full intra-cluster routing crossbars– Implications for routing

• FPGAs w/sparse Intra-cluster routing crossbars– Routing challenges– New routing algorithms

• Experimental Results• Conclusion

Page 46: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

46

Experimental Setup• VPR 5.0, iFAR FPGA architecture files

– Crossbar population density, p = {25%, 50%, 75%, 100%}– LUT size = K = {4, 5, 6, 7}– BLEs per cluster, N = {4, 6, 8, 10}

• 10 largest IWLS benchmarks– Results are averaged

• VPR-PathFinder runs for 100 iterations, then stops– Baseline (full-blown RRG)– SERRGE– PPR– VPR 5.0 (when crossbar population density is 100%)

Page 47: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

47

Routability as a Function of LUT Size

N = 8 BLEs per CLBp = 50% population density

Page 48: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

48

Routability as a Function of CLB Size

K = 6-LUTsp = 50% population density

Page 49: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

49

Runtime as a Function of Population Density

K = 6-LUTsN=8 BLEs per cluster

Page 50: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

50

Runtime as a Function of LUT Size

p = 100% population densityN=8 BLEs per cluster

Page 51: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

51

PPR Intra-cluster Routing Runtime

K=6-LUTsN=8 BLEs per cluster

On average, total runtime was 100s of seconds!

Page 52: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

52

Static and Dynamic RRG Size

K=6-LUTsN=8 BLEs per cluster

VPR generates FPGAs that are sized to the application, and are smaller than commercial products

Page 53: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

53

Critical Path Delay

K=6-LUTsN=8 BLEs per cluster

Page 54: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

54

Outline

• Background– FPGA architecture– FPGA routing

• FPGAs w/full intra-cluster routing crossbars– Implications for routing

• FPGAs w/sparse Intra-cluster routing crossbars– Routing challenges– New routing algorithms

• Experimental Results• Conclusion

Page 55: Routing Algorithms for FPGAs with Sparse Intra-cluster Routing Crossbars

55

Conclusion

• Addressed FPGA routing algorithms where CLBs have sparse intra-cluster routing crossbars

• Minimal modifications to PathFinder required

• SERRGE vs. PPR– SERRGE achieves better routability– PPR converges faster– PPR easier to implement