Programming FPGAs · 2020/04/02

Post on 16-Jul-2020

The Open Source Way

Programming FPGAs

Ahmed Sanaullah, Senior Data Scientist, Office of the CTO

Ulrich Drepper, Distinguished Engineer, Office of the CTO

Hugh Brock, Research Director, Office of the CTO

Productivity for FPGAs: A Simplified Model

Simply put, being productive means getting all required functions on the FPGA with low effort and high performance

Stages Affecting Productivity in an FPGA Toolchain

High Level Synthesis Compiler

Verilog/VHDL Code

IP Block Library

High Level Language Code

Custom HDL Library

Synthesis

Logic Optimizer

RTL Simulation

Netlist File

Place and Route

Bitstream File

Programmer

FPGA

Software Runtime

RHOS / Shell

FPGA Database (Layout)

FPGA Toolchains have largely been proprietary -> Reduced productivity

Reduced Productivity due to Proprietary Tooling

Lack of Customizability

Cost of Individual Licenses

Rigidity of Algorithms

Security

Overview of Some Open Source Efforts

Verilog/VHDL Code

Yosys (Synthesis), Berkeley ABC (Logic Optimizer)

Netlist File

Nextpnr (Place & Route)

Bitstream File

OpenOCD (Programmer)

FPGA

OPAE (Software Runtime)

Project IceStorm, Project Trellis, Project X-Ray, RapidWright (FPGA Database)

Verilator, Icarus Verilog (Simulation)

Morpheus (RHOS)

BU RH Collab (HLS Compiler)

High Level Language Code

Open Cores (Custom HDL Library)
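
To make the stage-to-tool mapping above more concrete, here is a minimal sketch of how the synthesis, place-and-route, and bitstream stages might be chained. It assumes a Lattice iCE40 target (the family covered by Project IceStorm); the function name `open_source_flow`, the file names, and the default device and package are hypothetical and not taken from the slides. The sketch only assembles the command lines and does not run them. The programming step is left out because it depends on the board.

```python
import shlex
from typing import List


def open_source_flow(top: str, verilog: str, pcf: str,
                     device: str = "hx8k",
                     package: str = "ct256") -> List[List[str]]:
    """Build (but do not execute) the commands for an assumed iCE40 flow."""
    netlist = f"{top}.json"   # netlist file produced by synthesis
    asc = f"{top}.asc"        # textual bitstream produced by place and route
    bitstream = f"{top}.bin"  # binary bitstream file
    return [
        # Synthesis; Yosys invokes ABC internally for logic optimization.
        ["yosys", "-p", f"synth_ice40 -top {top} -json {netlist}", verilog],
        # Place and route with nextpnr, using a pin constraint file.
        ["nextpnr-ice40", f"--{device}", "--package", package,
         "--json", netlist, "--pcf", pcf, "--asc", asc],
        # Bitstream packing with icepack from Project IceStorm.
        ["icepack", asc, bitstream],
    ]


if __name__ == "__main__":
    # Hypothetical design files, for illustration only.
    for cmd in open_source_flow("blinky", "blinky.v", "blinky.pcf"):
        print(shlex.join(cmd))
```

Returning the commands as lists keeps each stage inspectable, so a single stage could be swapped for another tool without touching the rest of the flow.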

Example: Hacking the Intel OpenCL SDK for FPGAs

Out-of-box OpenCL performance is really bad!

Using documented best practices

Experiments for nearest neighbor computations

6x - 64x worse performance than Verilog

Up to 7x more resource usage than Verilog

Yang, Chen, et al. "OpenCL for HPC with FPGAs: Case study in molecular electrostatics." 2017 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2017.

Example: Hacking the Intel OpenCL SDK for FPGAs

https://llvm-hpc3-workshop.github.io/slides/Denisenko.pdf

Example: Hacking the Intel OpenCL SDK for FPGAs

[Diagram: compilation flow, with steps marked "For Every Compilation" or "Once Per Compiler"]

Stages: Preprocessor, Front-End Compiler, Static Profiler, Code Transform, Full Compiler

Artifacts: App HLL, Probes, IR, Report, HDL

Sanaullah, A. (2019). Towards hardware as a reconfigurable, elastic, and specialized service (Doctoral dissertation).

Example: Hacking the Intel OpenCL SDK for FPGAs

Performance Evaluation for Packet Processing Workloads

* Bojie Li et al. "Flexible and High Performance Network Processing with Reconfigurable Hardware." In Proceedings of the 2016 ACM SIGCOMM Conference, pages 1–14. ACM, 2016.

[Charts: AES-256 Comparison; SHA-1 Comparison]

Example: Hacking the Intel OpenCL SDK for FPGAs

Performance Evaluation for Parallel Computing Dwarfs

Sanaullah, Ahmed, Rushi Patel, and Martin Herbordt. "An Empirically Guided Optimization Framework for FPGA OpenCL." 2018 International Conference on Field-Programmable Technology (FPT). IEEE, 2018.

linkedin.com/company/red-hat

youtube.com/user/RedHatVideos

facebook.com/redhatinc

twitter.com/RedHat

Thank you
