Javier Duarte, Burt Holzmann, Sergo Jindariani, Benjamin Kreis, Kevin Pedro, Ryan Rivera, Nhan Tran Fermilab + collaborators from UIC, MIT, CERN and industry Machine learning in FPGAs


Page 1: Machine learning in FPGAs

Javier Duarte, Burt Holzmann, Sergo Jindariani, Benjamin Kreis, Kevin Pedro, Ryan Rivera, Nhan Tran
Fermilab

+ collaborators from UIC, MIT, CERN and industry

Machine learning in FPGAs

Page 2: Machine learning in FPGAs

LATENCY LANDSCAPE

[Figure: latency scale from 1 ns to 1 s, marking CMS L1 Trigger → HLT → Offline, DUNE DAQ → Offline, and LHCb Trigger → Offline]

Machine learning is being used to solve a wide array of problems across a large range of latency constraints

Assumptions:
- most algorithms can be formulated as machine learning problems
- new specialized hardware will be optimized for machine learning

Page 3: Machine learning in FPGAs

FIRST COMPLETE, GENERAL TRIGGER STUDY


hls4ml

https://hls-fpga-machine-learning.github.io/hls4ml/

https://arxiv.org/abs/1804.06913

SEE RESEARCH TECHNIQUES SEMINAR BY JAVIER DUARTE, APRIL 24 @ 10:30 AM

FERMILAB-PUB-18-089-E

Prepared for submission to JINST

Fast inference of deep neural networks in FPGAs for particle physics

Javier Duarte^a, Song Han^b, Philip Harris^b, Sergo Jindariani^a, Edward Kreinar^c, Benjamin Kreis^a, Jennifer Ngadiuba^d, Maurizio Pierini^d, Ryan Rivera^a, Nhan Tran^a, Zhenbin Wu^e

^a Fermi National Accelerator Laboratory, Batavia, IL 60510, USA
^b Massachusetts Institute of Technology, Cambridge, MA 02139, USA
^c HawkEye360, Herndon, VA 20170, USA
^d CERN, CH-1211 Geneva 23, Switzerland
^e University of Illinois at Chicago, Chicago, IL 60607, USA

E-mail: [email protected]

Abstract: Recent results at the Large Hadron Collider (LHC) have pointed to enhanced physics capabilities through the improvement of the real-time event processing techniques. Machine learning methods are ubiquitous and have proven to be very powerful in LHC physics, and particle physics as a whole. However, exploration of the use of such techniques in low-latency, low-power FPGA hardware has only just begun. FPGA-based trigger and data acquisition (DAQ) systems have extremely low, sub-microsecond latency requirements that are unique to particle physics. We present a case study for neural network inference in FPGAs focusing on a classifier for jet substructure which would enable, among many other physics scenarios, searches for new dark sector particles and novel measurements of the Higgs boson. While we focus on a specific example, the lessons are far-reaching. We develop a package based on High-Level Synthesis (HLS) called hls4ml to build machine learning models in FPGAs. The use of HLS increases accessibility across a broad user community and allows for a drastic decrease in firmware development time. We map out FPGA resource usage and latency versus neural network hyperparameters to identify the problems in particle physics that would benefit from performing neural network inference with FPGAs. For our example jet substructure model, we fit well within the available resources of modern FPGAs with a latency on the scale of 100 ns.
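The abstract's 100 ns latency rests on doing all arithmetic in fixed-point rather than floating-point on the FPGA. As a rough illustration only (this is not the hls4ml implementation; the helpers `to_fixed` and `dense_fixed` are invented here), a NumPy sketch of how a dense layer behaves when inputs, weights, and biases are quantized to an ap_fixed<16,6>-style format:

```python
import numpy as np

def to_fixed(x, total_bits=16, int_bits=6):
    """Quantize to a signed fixed-point grid, like HLS ap_fixed<total_bits, int_bits>:
    round to the nearest multiple of 2^-frac_bits, then saturate to the signed range."""
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))          # most negative representable integer code
    hi = (1 << (total_bits - 1)) - 1       # most positive representable integer code
    q = np.clip(np.round(np.asarray(x, dtype=float) * scale), lo, hi)
    return q / scale

def dense_fixed(x, w, b, total_bits=16, int_bits=6):
    """Dense layer with every operand and the result quantized,
    mimicking a fully fixed-point datapath."""
    xq = to_fixed(x, total_bits, int_bits)
    wq = to_fixed(w, total_bits, int_bits)
    bq = to_fixed(b, total_bits, int_bits)
    return to_fixed(xq @ wq + bq, total_bits, int_bits)
```

Narrowing `total_bits` trades accuracy for DSP/LUT usage on the device, which is exactly the resource-versus-precision scan the paper maps out.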

Page 4: Machine learning in FPGAs

CO-PROCESSORS

Intel · Google (ASIC) · Xilinx/AWS/Huawei

SEE MACHINE LEARNING FORUM TALK, ANDREW PUTNAM (MICROSOFT RESEARCH), MAY 14 @ 1PM

LARGE SPEED/ENERGY GAINS OVER CPU/GPU!

How to efficiently interface between CMSSW and accelerator hardware?
- Some pilot projects in progress with: Microsoft Azure + MSR, Amazon Web Services
- Exploring connections through CERN OpenLab with: Intel
- Academia?
- These systems will only improve (higher throughput, more DSPs/memory)

Goals: benchmark performance of co-processor hardware against other computing architectures (CPU, GPU, etc.)
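Benchmarking co-processors against CPU/GPU baselines ultimately means timing the same inference workload on each backend. A minimal sketch of such a harness (all names here are hypothetical, and `cpu_infer` is a tiny CPU-only stand-in for a real accelerated inference call):

```python
import time
import numpy as np

def benchmark(fn, batch, n_warmup=3, n_runs=20):
    """Time fn(batch) and report mean per-call latency and inferences/second."""
    for _ in range(n_warmup):   # warm caches before timing
        fn(batch)
    t0 = time.perf_counter()
    for _ in range(n_runs):
        fn(batch)
    latency = (time.perf_counter() - t0) / n_runs
    return {"latency_s": latency, "throughput": len(batch) / latency}

# Hypothetical baseline: a tiny two-layer dense network on a batch of inputs.
rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(16, 64)), rng.normal(size=(64, 5))

def cpu_infer(batch):
    return np.maximum(batch @ w1, 0.0) @ w2   # dense + ReLU + dense

stats = benchmark(cpu_infer, np.ones((256, 16)))
```

The same `benchmark` call could wrap a GPU or FPGA-service client function, so latency and throughput are measured identically across architectures; for remote co-processors the batch size also probes how well transfer overhead amortizes.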

Microsoft (Wired)