An Operand Routing Network for an SFQ Reconfigurable Data-Paths Processor
I. Kataeva (1), H. Akaike (1), A. Fujimaki (1), N. Yoshikawa (2), N. Takagi (3), K. Murakami (4)
(1) Dept. of Quantum Engineering, Nagoya University
(2) Dept. of Electrical and Computer Engineering, Yokohama National University
(3) Dept. of Information Engineering, Nagoya University
(4) Research Institute for Information Technology, Kyushu University
Outline
Architectures of the Operand Routing Network: NDRO-based ORN, crossbar-based ORN
A crossbar switch with a multicasting function
Experimental results: crossbar switches, a 1-to-2 ORN prototype
Conclusion
Reconfigurable Data-Path processor: an architecture suitable for SFQ implementation
[Figure: block diagram of the processor, with blocks labeled LM, SMAC, SB, ORN and FPU]
a two-dimensional array of floating-point units (FPUs)
the FPUs are connected using operand routing networks (ORNs)
the ORN is reconfigured to fit a data-flow graph (DFG)
Requirements for the Operand Routing Network
each FPU can be connected to one or more FPUs in the next row
connections are established between FPUs in the immediate vicinity of each other
the number of connections, N, is odd
one FPU's output can be connected to either or both inputs of an FPU in the next stage
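The requirements above can be sketched as a small validity check on a routing table. This is only an illustration: the function name, the routing-table layout, the reading of "immediate vicinity" as a column offset of at most (N-1)/2, and the rule that each FPU input has a single driver are all assumptions, not part of the presented design.

```python
def is_valid_routing(routes, m, n):
    """Check a hypothetical routing table against the ORN requirements.

    routes: dict mapping a source FPU column to a list of
            (destination column, operand input) pairs, operand input 0 or 1.
    m: number of FPUs per row; n: number of connections per FPU (odd).
    """
    if n % 2 == 0:
        return False  # the requirements state that N is odd
    reach = (n - 1) // 2  # assumed meaning of "immediate vicinity"
    driven = set()  # (destination column, operand input) pairs already in use
    for src, dests in routes.items():
        if not 0 <= src < m:
            return False
        for dst, operand in dests:
            if not 0 <= dst < m or operand not in (0, 1):
                return False
            if abs(dst - src) > reach:
                return False  # destination lies outside the neighborhood
            if (dst, operand) in driven:
                return False  # assumed: one driver per FPU input
            driven.add((dst, operand))
    return True


# FPU 0 feeds both inputs of FPU 0 and one input of FPU 1 in the next row.
print(is_valid_routing({0: [(0, 0), (0, 1), (1, 0)], 2: [(1, 1)]}, m=4, n=3))  # True
# With N=3 a jump of two columns is rejected.
print(is_valid_routing({0: [(2, 0)]}, m=4, n=3))  # False
```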
[Figure: array of FPU blocks illustrating the requirements above]
NDRO-based Operand Routing Network
“+”: 2×M×N NDRO cells; small number of Josephson junctions
[Figure: NDRO-based ORN, with NDRO cells placed between rows of FPUs]
“–”: irregular, non-pipelined structure; becomes cumbersome as the complexity increases
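The 2×M×N cell count can be turned into a quick size estimate. The sketch below is hypothetical: the function name is made up, and the figure of two control lines per NDRO cell is an assumption chosen because it reproduces the control-line counts of 32 (N=2, M=4) and 96 (N=3, M=8) given in the comparison of the ORN architectures.

```python
def ndro_orn_size(n, m, control_lines_per_cell=2):
    """Return (NDRO cells, control lines) for an NDRO-based ORN.

    The cell count 2*M*N is taken from the slide; the number of control
    lines per cell is an assumption that matches the comparison table.
    """
    cells = 2 * m * n
    return cells, cells * control_lines_per_cell


for n, m in [(2, 4), (3, 8)]:
    print(n, m, ndro_orn_size(n, m))
# 2 4 (16, 32)
# 3 8 (48, 96)
```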
Crossbar-based Operand Routing Network
“+”: scalable; pipelined; easily re-designed for any N and M
[Figure: crossbar-based ORN, with full crossbar switches (CB) and half crossbar switches (½CB) placed between rows of FPUs]
“–”: large number of Josephson junctions: M half crossbars (½CB) and (2×M+1)(N-1)/2 full crossbars (CB)
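The switch counts for the crossbar-based ORN follow directly from the two expressions on the slide. In the sketch below the function name is hypothetical, and the figure of four control lines per switch (cross0, bar0, cross1, bar1) is an assumption; it is used because it reproduces the control-line counts of 100, 208 and 1168 listed for the odd-N configurations in the comparison of the ORN architectures.

```python
def crossbar_orn_size(n, m, control_lines_per_switch=4):
    """Return (half crossbars, full crossbars, control lines).

    Uses M half crossbars and (2*M+1)*(N-1)/2 full crossbars, as stated
    on the slide. Four control lines per switch is an assumption.
    """
    if n % 2 == 0:
        raise ValueError("the count formula assumes an odd N")
    half = m
    full = (2 * m + 1) * (n - 1) // 2  # exact, since N-1 is even
    lines = (half + full) * control_lines_per_switch
    return half, full, lines


for n, m in [(3, 8), (5, 10), (9, 32)]:
    print(n, m, crossbar_orn_size(n, m))
# 3 8 (8, 17, 100)
# 5 10 (10, 42, 208)
# 9 32 (32, 260, 1168)
```

For N=2 the expression (2×M+1)(N-1)/2 is not an integer, so the sketch rejects even N rather than guessing how that configuration was counted.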
Comparison of the ORN architectures

NDRO-based ORN
ORN complexity | latency, ps | minimum interval | number of control lines | bias current, A | power, mW | number of JJ (including control block)
N=2, M=4 | 200 | nf+200ps | 32 | 0.2 | 0.5 | 1710
N=3, M=8 | 288 | nf+288ps | 96 | 0.7 | 1.75 | 6840
N=5, M=10 | | | 200 | | |
N=9, M=32 | | | 1152 | | |

Crossbar-based ORN
ORN complexity | latency, clocks | minimum interval | number of control lines | bias current, A | power, mW | number of JJ | number of JJ (including control block)
N=2, M=4 | 6 | nf | 48 | 0.3 | 0.75 | 3000 | 4020
N=3, M=8 | 6 | nf | 100 | 0.63 | 1.575 | 6230 | 7684
N=5, M=10 | 10 | nf | 208 | 1.41 | 3.525 | 13930 | 14954
N=9, M=32 | 18 | nf | 1168 | 8.28 | 20.7 | 77440 | 99088

The JJ counts for the NDRO-based ORN in the table are estimates based on the design of the switch for the RDP prototype (N=3, M=4), which consisted of 3420 JJs (Iwasaki, not yet published).
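The power column in the comparison appears to be the bias current multiplied by a fixed supply voltage. The sketch below assumes a bias voltage of 2.5 mV; that value is not stated on the slides and is inferred only from the ratio of the listed power to the listed bias current. The function name is hypothetical.

```python
BIAS_VOLTAGE_MV = 2.5  # assumption: inferred from the table, not stated


def static_power_mw(bias_current_a, bias_voltage_mv=BIAS_VOLTAGE_MV):
    """Static power in mW for a bias current in A (A times mV gives mW)."""
    return bias_current_a * bias_voltage_mv


# (bias current in A, power in mW) pairs as listed in the comparison
rows = [(0.2, 0.5), (0.7, 1.75), (0.3, 0.75),
        (0.63, 1.575), (1.41, 3.525), (8.28, 20.7)]
for current, listed in rows:
    print(current, round(static_power_mw(current), 3), listed)
```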
A crossbar switch with broadcasting function: 296 JJs
Crossbar switch with a multicasting function, ver.1
[Figure: schematic of the ver.1 crossbar switch. Signals: clkin, clkout, din0, din1, dout0, dout1, cross0, bar0, cross1, bar1. Cells: DFF, NDRO, NOT, and AND gates labeled 00, 01, 10 and 11. Control patterns: 00, 01, 10, 11.]
788 JJs
area: 1.28 mm × 1 mm
bias current: 87 mA
clock frequency: 27 GHz
4 pipeline stages
Crossbar switch with a multicasting function, ver.2
[Figure: schematic of the ver.2 crossbar switch. Signals: clkin, clkout, din0, din1, dout0, dout1, cross0, bar0, cross1, bar1. Cells: NDROC02, DFF, AND. Control patterns: 00, 01, 10, 11.]
crossbar switch: 296 JJs; 30 mA; 62% reduction of the number of JJs; 2 pipeline stages instead of 4
[Figure: schematic of the ver.2 ½ crossbar switch. Signals: clkin, clkout, din0, dout0, dout1, cross0, bar0, cross1, bar1. Cells: NDRO, DFF, AND.]
½ crossbar switch: 141 JJs; 14 mA; 70% reduction of the number of JJs; 2 pipeline stages instead of 4
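The quoted reductions can be checked with a one-line calculation. The 788 JJ and 296 JJ figures are the ver.1 and ver.2 crossbar counts from the slides. Using the 455 JJ ver.1 ½ crossbar (the count quoted with the test results) as the baseline for the 141 JJ circuit is an assumption; it gives about 69%, which the slide appears to round to 70%.

```python
def reduction_percent(old_jj, new_jj):
    """Percentage reduction in Josephson junction count."""
    return 100.0 * (old_jj - new_jj) / old_jj


print(f"{reduction_percent(788, 296):.1f}")  # 62.4
print(f"{reduction_percent(455, 141):.1f}")  # 69.0 (baseline is an assumption)
```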
Experimental results
[Figure: test setup with the circuit under test and a ladder block. Signals: data_in, data_out, clkin_lfin, clkin_lfout, clkin_hf.]
Circuits tested:
crossbar switch with a multicasting function, ver.1
½ crossbar switch with a multicasting function, ver.1
1-to-2 ORN prototype
Crossbar switch with a multicasting function, ver.1
[Figure: test results for control patterns 00, 01, 10 and 11. Signal labels: din0, din1, cross0, bar0, cross1, bar1, clkin_lfin, clkin_lfout, clkin_hf, dout0, dout1, clkout, clkout1.]

Pattern | Bias current | Lower margin, % | Upper margin, %
00 | bias1 | -13.450 | 4.213
00 | bias2 | -4.702 | 15.014
01 | bias1 | -13.450 | 7.745
01 | bias2 | -7.989 | 5.156
10 | bias1 | -15.217 | 7.745
10 | bias2 | -4.702 | 5.156
11 | bias1 | -13.450 | 4.213
11 | bias2 | -4.702 | 11.728

788 JJs; area: 1.28 mm × 1 mm; bias current: 87 mA; clock frequency: 27 GHz
Total: 1484 JJs; bias current: 172 mA
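One way to read the per-pattern margins is to intersect them, giving the bias range in which every control pattern works without retuning. The slides do not report such a combined window, so the sketch below is an illustration only; the function name and the data layout are hypothetical, while the numbers are copied from the margin table for the ver.1 crossbar switch.

```python
# (pattern, bias line) -> (lower margin %, upper margin %), from the table
MARGINS = {
    ("00", "bias1"): (-13.450, 4.213), ("00", "bias2"): (-4.702, 15.014),
    ("01", "bias1"): (-13.450, 7.745), ("01", "bias2"): (-7.989, 5.156),
    ("10", "bias1"): (-15.217, 7.745), ("10", "bias2"): (-4.702, 5.156),
    ("11", "bias1"): (-13.450, 4.213), ("11", "bias2"): (-4.702, 11.728),
}


def common_window(margins, bias_line):
    """Intersect the margins of all patterns for one bias line."""
    selected = [v for (_, line), v in margins.items() if line == bias_line]
    lower = max(lo for lo, _ in selected)  # tightest lower bound
    upper = min(hi for _, hi in selected)  # tightest upper bound
    return lower, upper


print(common_window(MARGINS, "bias1"))
print(common_window(MARGINS, "bias2"))
```

With the tabulated values this gives -13.450/4.213 % for bias1 and -4.702/5.156 % for bias2.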
½ crossbar switch with a multicasting function, ver.1
[Figure: test results for control patterns 00, 01 and 10. Signal labels: din0, cross0, bar0, cross1, bar1, clkin_lfin, clkin_lfout, dout0, dout1, clkout, clkout1.]

Pattern | Bias current lower margin, % | Bias current upper margin, %
00 | 2.402 | 15.512
01 | 2.7 | 17.797
10 | 1.011 | 18.293

Tested at frequencies of 22 to 36 GHz
455 JJs; area: 1 mm × 0.52 mm; bias current: 50 mA; clock frequency: 27 GHz
Total: 1066 JJs; bias current: 127 mA
1-to-2 ORN: low frequency test
completely functional, exhaustive test
bias_kern0 = -14.6/5.3 %, does not depend on the pattern
minimum: bias_kern1 = -16.1/18.3 % for din0 -> dout11, dout12; bias_kern2 = -20.7/12.6 % for din0 -> dout11, dout12
maximum: bias_kern1 = -40.3/17.2 % for din1 -> dout01; bias_kern2 = -38/12.6 % for din2 -> dout02, dout12
Total: 2031 JJs; total bias current: 224.4 mA
open466, no. 4 chip F2
example: din0 -> dout01, dout02, dout12; control pattern: CBT0 “10”, CBT1 “01”, CBT2 “10”
[Figure: test results for the example. Signal labels: din0, cross10, bar00, cross11, cross01, bar02, bar12, clkin_lfin, dout01, dout11, dout02, dout12, clkout, clkout1, clkout2.]
bias_kern0 = -14.6/5.3 %; bias_kern1 = -23/17.2 %; bias_kern2 = -23/12.6 %
1-to-2 ORN: high frequency test and bias margins frequency dependence
open466, no. 4 chip F2
example: din0 -> dout11; control pattern: CBT0 “00”, CBT1 “00”
[Figure: test results for the example. Signal labels: din0, clkin_lfin, clkin_lfout1, clkin_hf, dout01, dout11, dout02, dout12, clkout, clkout1, clkout2, bar00, bar10, bar01, bar11.]
bias_kern0 margins were frequency independent, measured at frequencies up to 23.5 GHz
[Plot: bias_kern1 margins for din0 -> dout11 routing. Curves: upper margin, lower margin. Horizontal axis ticks from 10.842 to 23.480; vertical axis ticks from -30 to 20.]
[Plot: bias_kern1 margins for din0 -> dout01 routing. Curves: upper margin, lower margin. Horizontal axis ticks from 10.842 to 21.854; vertical axis ticks from -30 to 20.]
[Plot: bias_kern2 margins for din0 -> dout02, dout12 routing. Curves: upper margin, lower margin. Horizontal axis ticks from 10.842 to 20.345; vertical axis ticks from -30 to 15.]
Conclusion
we have proposed two different architectures of an ORN: NDRO-based and crossbar-based
a complexity comparison has been made, and the crossbar-based ORN is considered the better option due to its scalability and pipelined structure
a new version of the crossbar switch has been designed
two versions of a crossbar switch with a multicasting function have been designed for the 2.5 kA/cm² process and successfully tested at frequencies up to 28 GHz
a 1-to-2 ORN has been designed and successfully tested at frequencies up to 23.5 GHz