embedding-based placement of processing element networks on fpgas for physical model simulation
DESCRIPTION
Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation. Intern at SpaceX. Bailey Miller*, Frank Vahid*, Tony Givargis** *Univ. of California, Riverside **Univ. of California, Irvine. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/1.jpg)
Frank Vahid, 1
Embedding-Based Placement of Processing element Networks on
FPGAs for Physical Model Simulation
Bailey Miller*, Frank Vahid*, Tony Givargis**
*Univ. of California, Riverside**Univ. of California, Irvine
Supported by the NSF (CNS1016792, CPS1136146), and Semiconductor Research Corporation (GRC 2143.001)
Intern at SpaceX
![Page 2: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/2.jpg)
Bailey Miller, UCR 2
![Page 3: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/3.jpg)
Frank Vahid, UCR 3
Models of physical world that run in real-time
http://www.nhlbi.nih.gov/
Test cyber-physical systems
Physical mockup
Transducermodels
Environmentmodel
Digital mockup
![Page 4: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/4.jpg)
Issue: Real-time achieved via inaccuracy
Frank Vahid, UCR 4
Weibel lung complexity4 gen: 32 ODEs6 gen: 128 ODEs8 gen: 512 ODEs10 gen: 2048 ODEs
“2-3 minutes to simulate one breath accurately”
V[1],R[1]
V[2],R[2]V[7],R[7]
![Page 5: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/5.jpg)
Frank Vahid, UCR 5
0100200300400500600700800900
1000
Weib
el
Neuron
Weib
el + g
as
Hemod
ynam
ic
Weib
el + h
emo
Per
form
ance
(ms)
PC(1)PC(4)GPUHLS / FPGAs
Speedup vs real-time
PC(1): 0.8xPC(4): 3.1xGPU: 1.6xHLS/FPGA: 3.2x
Earlier results14301490 1522
1184
Parallel computations + • Neighbor communication Seem like great match for FPGAs
![Page 6: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/6.jpg)
Frank Vahid, UCR 6
Network of synchronized PEs on FPGAs
FPGA
Digital mockup
General Processing Element• Iterative ODE solver (Euler/RK4) 0.1 ms / 0.01 ms timestep
PEPE
PE 1 PE: 300 MHz
![Page 7: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/7.jpg)
Frank Vahid, UCR 7
0100200300400500600700800900
1000
Weib
el
Neuron
Weib
el + g
as
Hemod
ynam
ic
weibel
+ hem
o
Per
form
ance
(ms)
PC(1)PC(4)GPUHLSGeneral PEs
Speedup vs real-timePC(1): 0.8xPC(4): 3.1xGPU: 1.6xHLS: 3.2xGeneral PE: 4.9x
Earlier results1430
1490 1522
1184
![Page 8: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/8.jpg)
Frank Vahid, UCR 8
Current work Problem: More PEs Lower frequency
FPGA
Inter-PE critical path
FPGA
DSP
INST MEM
DATA MEM
Internal PE critical path
0100200300400
0 200 400 600Freq
uenc
y (M
Hz)
# PEs
Frequency vs. PE network size
11-gen Weibel model, Virtex6 240T FPGA, general PEs
0
500
1000
1500
2000
1 10 20 50 100 200 400
# PEs
kODE
s/se
c
Real ODEs/sec
Lost ODEs/sec due to freq drop
![Page 9: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/9.jpg)
Earlier approach
Frank Vahid, UCR 9
10K iterations
150K iterations
Convert virtual PEs to physical circuits using
FPGA place-route
1
2
Phase
Maps ODEs to virtual PEs using
simulated annealing
![Page 10: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/10.jpg)
Frank Vahid, UCR 10
FPGA
This work: Main idea
Graph embedding: Map guest graph to host graph, minim. max wire length
Guest
Host
Virtual PEs
Physical PEs
Use physical model structure to avoid using FPGA placement (Phase 2)
![Page 11: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/11.jpg)
Frank Vahid, UCR 11
FPGA
12 3
…
FPGA FPGA
Phase 2 – Map virtual PEs to physical PEs
Embedding algorithm
H-tree embedding Linear embedding Direct map embedding
Guest
Host
[1] Zienicke, P. 1990. Embeddings of Treelike Graphs into 2-Dimensional Meshes. (WG '90).[2] Aleliunas, R., and Rosenberg, A.L. 1982. On Embedding Rectangular Grids in Square Grids. (Computers ‘82).[3] Berman, F., and Snyder, L. 1987. On mapping parallel algorithms into parallel architectures, (PDC, ‘87).
![Page 12: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/12.jpg)
2D grid of physical PEs
Frank Vahid, UCR 12
EqP1EqV1
EqP2EqV2
EqP3 EqV3
EqP4EqV4
EqP7EqV7
EqP5EqV5
EqP6EqV6
FPGA
Bypass FPGA placement
EqP1EqV1
EqP4EqV4
EqP2EqV2
EqP5EqV5
EqP2EqV2
EqP6EqV6
EqP7EqV7
(Phase 1: May require "graph folding" first to reduce #PEs)
![Page 13: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/13.jpg)
FPGA
Compare/backup: Simulated annealing
Frank Vahid, UCR 13
Cost function:
C = w1*sum + w2*max + w3*gapsSum = sum of wire distancesMax = max wire length (Euclidean dist.)Gaps = wires across architectural features
FPGA
P1 P1
P2
Neighbor function: Swap PEs based on distance to neighbors
![Page 14: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/14.jpg)
Results
Frank Vahid, UCR 14
No placement strategy
Simulated annealing placement
Embedding placement
4 generations shown 5 generations shown 5 generations shown
![Page 15: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/15.jpg)
Results
Frank Vahid, UCR 15
0
100
200
300
400
Model and #PEs
Circ
uit f
requ
ency
(MH
z)
No placement strategy Simulated Annealing Embedding
Not routable
Strategy # LUTS # BRAM #DSP Equivalent LUTs
None 58362 512 256 306682
SA 58567 512 256 306887
Embed 58569 512 256 306889
No impact on size
2D Neuron model - 256PE – Xilinx Virtex6Strategy Total
power (mW)
Dynamic power (mW)
Static power (mW)
None 15525 8744 6481
SA 16604 10013 6590
Embed 19859 12999 6859
20% more powerUsing Xilinx XPower Analyzer
0100200300400
0 200 400 600Freq
uenc
y (M
Hz)
# PEs
Frequency vs. PE network size
![Page 16: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/16.jpg)
Results/Conclusions
Frank Vahid, UCR 16http://www.meti.com/
Speedup vs real-time (avg)
PC(1): 0.8xPC(4): 3.1xGPU: 1.6xHLS: 3.2xGeneral PE: 4.9xGrph emb(GPE): 11.2xCustom PE: 6.1xHeterog PE: 34.5x(Grph emb(HPE): 48.5x)
• Graph emb. gives additional speedup– FPGAs: Fastest cost-effective
execution of physical models–
http://www.youtube.com/watch?v=ThUKVhqoA3Q • Future
– Manycore device– Beyond testing CPS
• Implement end-productsSpeedup / dollar
Heterog PE • 3.0x better than
PC(4)• 4.5x better than GPUCPU (I7-950 + Intel X58 board): $480
GPU(GTX460 + I3-540 + H55 board): $380FPGA (Xilinx Virtex6 240T-2 board): $1800
![Page 17: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation](https://reader036.vdocuments.site/reader036/viewer/2022062520/56816122550346895dd07a28/html5/thumbnails/17.jpg)
Questions?
Frank Vahid, UCR 17