![Page 1: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/1.jpg)
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
An Asynchronous Routing Algorithm for Clos Networks
Wei Song and Doug Edwards
The Advanced Processor Technologies Group (APT)
School of Computer Science
The University of Manchester
{songw, doug}@cs.man.ac.uk
1
![Page 2: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/2.jpg)
Outline
• Introduction of Clos networks and asynchronous circuits
• Classic Synchronous dispatching algorithms – Random dispatching (RD)
– Concurrent Round-Robin Dispatching (CRRD)
• Asynchronous Dispatching (AD) algorithm
• Implementation
• Outcome
– 32-port Clos network. Setup 6.2 ns, release 3.9 ns.
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
2
![Page 3: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/3.jpg)
Switching Networks
• Telephone networks
• Asynchronous transfer mode (ATM) and
IP networks
–ATLANTA chip (622Mb/s/port, 1997)
–Petastar Optical Switch (160Gb/s/port, 2003)
• Intra/inter chip interconnect
–High-radix router with many narrow ports
–SDM: delay guaranteed services
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
3
![Page 4: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/4.jpg)
Asynchronous Circuit
• Asynchronous vs. synchronous
–Low power (clockless)
–Possible performance boost (average latency)
–Tolerance to process variation
• Implementation style
–2-phase vs. 4-phase
–Bundled-data vs. quasi-delay insensitive (QDI)
–4-phase QDI
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
4
![Page 5: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/5.jpg)
Research Target
• An area-efficient and high-speed switching
network for on-chip networks
–High-radix router
–Low-power consumption
–Tolerance to process variation
–Area efficient
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
5
![Page 6: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/6.jpg)
Clos Switching Network
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
6
3 stages: IMs, CMs and OMs
C(n, k, m):k IM/OMsn ports per IM/OMm CMs
LI links between IM/CMLO links between CM/OM
SMS: buffer CMMSM: buffer IM/OMS3: no buffer
n´m
n´m
n´m
k´k
k´k
k´k
m´n
m´n
m´n
IM(0)
IM(k-1)
IM(i)
CM(0)
CM(m-1)
CM(r)
OM(0)
OM(k-1)
OM(j)
![Page 7: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/7.jpg)
Non-Blocking
• Blocking
– An available input/output pairs may not be connected due to internal blocking.
• Strict Non-Blocking (SNB)– All available input/output pairs are connectable.
–
• Rearrangeable Non-Blocking (RNB)– Connecting available input/output pairs may
require internal permutation.
–
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
7
12 nmn
12 nm
![Page 8: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/8.jpg)
Area Consumption
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
8
![Page 9: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/9.jpg)
Routing Algorithms
• Optimal algorithms
–Algorithms provide guaranteed results for all matches but with a higher complexity in time and implementation.
• Heuristic algorithms
–Algorithms provide all or partial connections in much lower time complexity.
• Most dynamically reconfigurable Clos
networks use heuristic algorithms
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
9
![Page 10: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/10.jpg)
Dispatching Algorithms
Every Input/output pair has mpaths. Packets are dispatched evenly to all CMs.
Dispatching algorithm: the algorithm used in IM to CM packet dispatching.
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
10
IM(0)
CM(0)
IM(1)
OM(0)
OM(1)
CM(1)
CM(2)
C(2,3,2)
![Page 11: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/11.jpg)
Random Dispatching (RD)
• Phase 1
– IPs send
requests to
LIs
– LIs select IPs
• Phase 2
– CMs select
LIs
– Configure
granted IPs
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
11
IM(0)
CM(0)
IM(1)
OM(0)
OM(1)
CM(1)
CM(2)
0
2
1
3
![Page 12: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/12.jpg)
Random Dispatching (RD)
• Phase 1
– IPs send
requests to
LIs
– LIs select IPs
• Phase 2
– CMs select
LIs
– Configure
granted IPs
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
12
IM(0)
CM(0)
IM(1)
OM(0)
OM(1)
CM(1)
CM(2)
0
2
1
3
![Page 13: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/13.jpg)
Random Dispatching (RD)
• Phase 1
– IPs send
requests to
LIs
– LIs select IPs
• Phase 2
– CMs select
LIs
– Configure
granted IPs
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
13
IM(0)
CM(0)
IM(1)
OM(0)
OM(1)
CM(1)
CM(2)
0
2
1
3
RD cannot resolve contention in IMs.
![Page 14: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/14.jpg)
Concurrent Round-Robin Dispatching (CRRD)
• Phase 1
– IPs request
– LIs select
– IPs select
– Go back
(iteration)
• Phase 2
– Same as RD
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
14
IM(0)
CM(0)
IM(1)
OM(0)
OM(1)
CM(1)
CM(2)
0
2
1
3
![Page 15: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/15.jpg)
Concurrent Round-Robin Dispatching (CRRD)
• Phase 1
– IPs request
– LIs select
– IPs select
– Go back
(iteration)
• Phase 2
– Same as RD
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
15
IM(0)
CM(0)
IM(1)
OM(0)
OM(1)
CM(1)
CM(2)
0
2
1
3
![Page 16: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/16.jpg)
Concurrent Round-Robin Dispatching (CRRD)
• Phase 1
– IPs request
– LIs select
– IPs select
– Go back
(iteration)
• Phase 2
– Same as RD
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
16
IM(0)
CM(0)
IM(1)
OM(0)
OM(1)
CM(1)
CM(2)
0
2
1
3
![Page 17: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/17.jpg)
Concurrent Round-Robin Dispatching (CRRD)
• Phase 1
– IPs request
– LIs select
– IPs select
– Go back
(iteration)
• Phase 2
– Same as RD
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
17
IM(0)
CM(0)
IM(1)
OM(0)
OM(1)
CM(1)
CM(2)
0
2
1
3
![Page 18: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/18.jpg)
Concurrent Round-Robin Dispatching (CRRD)
• Phase 1
– IPs request
– LIs select
– IPs select
– Go back
(iteration)
• Phase 2
– Same as RD
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
18
IM(0)
CM(0)
IM(1)
OM(0)
OM(1)
CM(1)
CM(2)
0
2
1
3
![Page 19: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/19.jpg)
Concurrent Round-Robin Dispatching (CRRD)
• Phase 1
– IPs request
– LIs select
– IPs select
– Go back
(iteration)
• Phase 2
– Same as RD
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
19
IM(0)
CM(0)
IM(1)
OM(0)
OM(1)
CM(1)
CM(2)
0
2
1
3
CRRD cannot resolve contention in CMs.
![Page 20: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/20.jpg)
Asynchronous Dispatching (AD)
• Difficulties
– Packets arrive asynchronously
– Modules are event-driven
– Orphans are generated by failed requests
• Solution
– Independent algorithms (allocators) run in IMs and CMs
– Failed requests use status feedback from CMs as acknowledge.
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
20
![Page 21: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/21.jpg)
Asynchronous Dispatching (AD)
• IM alg.
– IPs request
– LIs select
– OK? Send data
– Fail? Go back
• CM alg.
– CMs grant LIs
– Update status
feedback
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
21
IM(0)
CM(0)
IM(1)
OM(0)
OM(1)
CM(1)
CM(2)
0
2
1
3
![Page 22: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/22.jpg)
Asynchronous Dispatching (AD)
• IM alg.
– IPs request
– LIs select
– OK? Send data
– Fail? Go back
• CM alg.
– CMs grant LIs
– Update status
feedback
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
22
IM(0)
CM(0)
IM(1)
OM(0)
OM(1)
CM(1)
CM(2)
0
2
1
3
![Page 23: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/23.jpg)
Asynchronous Dispatching (AD)
• IM alg.
– IPs request
– LIs select
– OK? Send data
– Fail? Go back
• CM alg.
– CMs grant LIs
– Update status
feedback
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
23
IM(0)
CM(0)
IM(1)
OM(0)
OM(1)
CM(1)
CM(2)
0
2
1
3
![Page 24: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/24.jpg)
Comparison of Algorithms
• Random Dispatching (AD)
– Cannot resolve contention in IMs or CMs.
• Concurrent Round-Robin Dispatching (CRRD)– Handle contention in IMs using iterations.
– Cannot resolve contention in CMs.
• Asynchronous Dispatching (AD)– Contention in IMs is handled by asynchronous
arbiter naturally.
– Handle contention in CMs using status feedback.
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
24
![Page 25: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/25.jpg)
Scheduler Architecture
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
25
OM_Sch
(0)
IM_Sch(i)
IM_Sch(7)
CM_Sch(r)
CM_Sch(3)
OM_Sch
(7)
OM_Sch
(j)
RIM
IM_Dis
InpG0
InpG1
InpG2
InpG3
OMr
IMr
OM
r
IMr
RCM
CM_Dis
CM_Sch(0)IM_Sch(0)
req(0,0)
req(0,1)
req(0,2)
req(0,3)
req(i,0)
req(i,1)
req(i,2)
req(i,3)
req(7,0)
req(7,1)
req(7,2)
req(7,3)
C(4,8,4)
![Page 26: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/26.jpg)
CM Dispatcher/OM Scheduler
CMa: CM ack
CMs: CM status feedback
CMa and CMs are
mutually exclusive
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
26
Tre
e A
rbite
r
CMr( i, j)
CMr( 0, j)
CMr( 7, j)
CMa( i, j)
CMa( 0, j)
CMa( 7, j)
CMa( i', j)
i' = 0 ~ 7, i' ≠ i
CMa( i, j')
j' = 0 ~ 7, j' ≠ j
CMs( i, j)
C(4,8,4)
OM_Sch
(0)
IM_Sch(i)
IM_Sch(7)
CM_Sch(r)
CM_Sch(3)
OM_Sch
(7)
OM_Sch
(j)
RIM
IM_Dis
InpG0
InpG1
InpG2
InpG3
OMr
IMr
OM
r
IMr
RCM
CM_Dis
CM_Sch(0)IM_Sch(0)
req(0,0)
req(0,1)
req(0,2)
req(0,3)
req(i,0)
req(i,1)
req(i,2)
req(i,3)
req(7,0)
req(7,1)
req(7,2)
req(7,3)
![Page 27: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/27.jpg)
IM Dispatcher
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
27
IM/CM
Match
ArbiterIM/CM
Match
Arbiter
LI Arbiter
LI Arbiter
IMcfg
IMr(0)
IMr(1)
IMr(2)
IMr(3)
ImCmA(7)
ImCmA(0)
CMs(3)
CMs(0)
CMr(3)
CMr(0)
IMa
8
4*44
8
8
CfgG
CMa(0)
CMa(3)8
C(4,8,4)OM_Sch
(0)
IM_Sch(i)
IM_Sch(7)
CM_Sch(r)
CM_Sch(3)
OM_Sch
(7)
OM_Sch
(j)
RIM
IM_Dis
InpG0
InpG1
InpG2
InpG3
OMr
IMr
OM
r
IMr
RCM
CM_Dis
CM_Sch(0)IM_Sch(0)
req(0,0)
req(0,1)
req(0,2)
req(0,3)
req(i,0)
req(i,1)
req(i,2)
req(i,3)
req(7,0)
req(7,1)
req(7,2)
req(7,3)
![Page 28: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/28.jpg)
IM/CM Match Arbiter(Multi-resource Arbiter)
• S. Golubcovs, D. Shang, F. Xia, A. Mokhov, and A. Yakovlev, “Modular approach to
multi-resource arbiter design,” ASYNC 2009.
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
28
Tree Arbiter
Tre
e A
rbite
r
ri cb
ocb
o
ri
rbo
rbi
rbi
rbocbi
cb
o
riri cb
o
rbi
rbocbi
rbo
rbi
cbi
cbi
ci
ci ci
ci
IMr(0, j)
IMr(3, j)
IMrb(0, j)
IMrb(3, j)
CMs(0, j)
CMsb(0, j)
CMs(3, j)
CMsb(3, j)
h
h( j,3
,0)
h
h( j,0
,0)
hh( j,0,3)
hh( j,3,3)
C(4,8,4)
![Page 29: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/29.jpg)
Request Generation
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
29
h( j, h, r)
IMr( h, j)
CMs( r, j)
CMrMx( r, j, h)
CMrMx( r, j, 0)
CMrMx( r, j, 3)
CMrME( r, j)
LI A
rbite
rCMrME( r, 0)
CMrME( r, 7)
CMr( r, 0)
CMr( r, j)
CMr( r, 7)
CMsb( r, j)
CMrMx(0, j, h)
CMrMx(3, j, h)
IMrb( h, j)
C(4,8,4)
![Page 30: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/30.jpg)
Configuration Generation
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
30
CMa( r, j)
CMrMx( r, j, h)
CMr( r, j)
cfgMx( r, h, j)
cfgMx( r, h, 0)
cfgMx( r, h, 7)
IMcfg( r, h)
IMa(h)
IMcfg(0, h)
IMcfg(3, h)
C(4,8,4)
![Page 31: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/31.jpg)
Area and Speed
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
31
Dispatching:setup 4.6 ns release 2.9 ns
OM Scheduler:setup 1.6 ns release 1.0 ns
Total: setup 6.2 ns release 3.9 ns
C(4,8,4)
![Page 32: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/32.jpg)
MATLAB Simulation Model
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
32
![Page 33: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/33.jpg)
Throughput Analysis
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
33
C(4,8,4) non-blocking loadRD 55%CRRD (i=4) 70%AD (i=3) 76%
C(4,8,m) non-blocking loadRD (m=7) 74%CRRD (m=7) 82%AD (m=7) 96%
![Page 34: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/34.jpg)
Conclusions
• The first asynchronous routing algorithm for general 3-stage Clos networks.
• Better throughput performance than RD and CRRD
• Future works
–Optimize the Clos structure
–Reduce area and latency
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
34
![Page 35: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/35.jpg)
Thank you!
Questions?
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
35
![Page 36: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/36.jpg)
Iterations
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
36
![Page 37: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/37.jpg)
Uniform Traffic
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
37
![Page 38: An Asynchronous Routing Algorithm for Clos Networks](https://reader030.vdocuments.site/reader030/viewer/2022012420/6174fe65e9054d10f47f6518/html5/thumbnails/38.jpg)
Results after Optimization
• Async
– Setup 3.7 ns
– Reset 3.2 ns
– Period ~ 7 ns (140M)
– 28K Gates
• Sync
– 350 MHz
– Cell time = Iter+1 =5 (70M)
– 22K Gates
• Optimization
– Replace multi-resource arbiter
– Eager request (InpG)
– MUTEX arbiter
– Optimized tree arbiter
June 23rd 2010Advanced Processor Technologies Group
The School of Computer Science
38