leveraging openpower and fpgas to accelerate machine

18
© 2016 OpenPOWER Foundation Leveraging OpenPOWER and FPGAs to Accelerate Machine Vision and Learning Lisa Liu, Michaela Blott, Giulio Gambardella, Ken O’Brien Xilinx Research

Upload: others

Post on 28-Nov-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Leveraging OpenPOWER and FPGAs to Accelerate Machine

© 2016 OpenPOWER Foundation

Leveraging OpenPOWER and FPGAs to Accelerate Machine Vision and Learning

Lisa Liu, Michaela Blott, Giulio Gambardella, Ken O’BrienXilinx Research

Page 2: Leveraging OpenPOWER and FPGAs to Accelerate Machine

Agenda

• Introduction

• Performance characterization & estimation

• Machine vision & learning

• Summary

© 2016 OpenPOWER Foundation 2

Page 3: Leveraging OpenPOWER and FPGAs to Accelerate Machine

© 2016 OpenPOWER Foundation

Introduction

3

Page 4: Leveraging OpenPOWER and FPGAs to Accelerate Machine

Xilinx Research - Ireland

Page 4

- 7 researchers + students & visiting scholars

- 2 in university program

- Est. 10 years ago

Applications & Architectures:Through application-driven technology development with customers, partners, and engineering & marketing

Page 5: Leveraging OpenPOWER and FPGAs to Accelerate Machine

Architecture Trends: Heterogeneous Computing

• End of Moore’s law and Dennard scaling• Technology node scaling becomes increasingly difficult

• Power cost eclipses CAPEX and heat dissipation is problematic

• Resulting economics are questionable (performance/$)

• Heterogeneous computing• Provides further performance scaling

• Reduces power consumption

• CAPI• Provides a coherent way to integrate accelerators into POWER8

• Enables users to build POWER-based heterogeneous platforms

© 2016 OpenPOWER Foundation 5

Page 6: Leveraging OpenPOWER and FPGAs to Accelerate Machine

Ease of Use:New Design Environments

© 2016 OpenPOWER Foundation 6

• ISE, RTL-based design entry with IP library

Legacy

• Vivado HLS

• SDNet (DSL PX)

• Block stitching and manual integration in platform in RTL

Raised abstraction for accelerators

• SDx

• SNAP

• Predefined methods for data transfer & automated implementation

Simplified host integration & automated infrastructure creation

Time

Ab

straction

Page 7: Leveraging OpenPOWER and FPGAs to Accelerate Machine

© 2016 OpenPOWER Foundation

Performance Characterization & Estimation

7

Page 8: Leveraging OpenPOWER and FPGAs to Accelerate Machine

Rooflines for Machine Vision & Learning

© 2016 OpenPOWER Foundation 8

compute bound

Page 9: Leveraging OpenPOWER and FPGAs to Accelerate Machine

Rooflines for Machine Vision & Learning

© 2016 OpenPOWER Foundation 9

Page 10: Leveraging OpenPOWER and FPGAs to Accelerate Machine

© 2016 OpenPOWER Foundation

Machine Vision & Learning

10

Page 11: Leveraging OpenPOWER and FPGAs to Accelerate Machine

Video Scaler

• A functionality to resize video or image via interpolation

• A basic and popular functionality used in data centers

11

Page 12: Leveraging OpenPOWER and FPGAs to Accelerate Machine

CAPI Based Video Scaler for Power8

© 2016 OpenPOWER Foundation 12

• Acceleration of video scaling task with Xilinx IP

• Fully integrated with FFMpeg on Power8

• Prototype results• 1280x720 1920x720

• Video scaler implemented in SW: 3.8 frames/sec

• Video scaler implemented in FPGA: 59 frames/sec

• Speedup: ~15X

PCIEPOWER8 CAPP PSL SHIMXilinx Polyphase

Video Scaler

ADM-PCIE-7V3

Implemented in C/C++

scale up

Page 13: Leveraging OpenPOWER and FPGAs to Accelerate Machine

• Convolutional neural networks are the predominant algorithm for machine learning• Achieving superhuman accuracy

• CNNs have high compute and memory requirements• AlexNet requires 1.456 GOps/frame and 244MB of storage

• Reduced precision networks emerge• 50x reduction in modelsize for 6b at no loss in accuracy [1,2]

• Binary networks show a small loss for large networks [3]

Page 13

Convolutional Neural Networks

Source:

[1] Iandola et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 1MB model size." (2016).

[2] Forrest N. Iandola, Khalid Ashraf, Matthew W. Moskewicz, Kurt Keutzer “FireCaffe: near-linear acceleration of deep neural network training on compute

clusters”

[3] Sung et al., “Resiliency of Deep Neural Networks Under Quantization”, ICLR’16 (fully connected network layers for phoneme recognition)

Page 14: Leveraging OpenPOWER and FPGAs to Accelerate Machine

CAPI-Based Neural Network Accelerator

© 2016 OpenPOWER Foundation 14

ADM-PCIE-7V3

POWER8 SHIM

AFU

Streaming

Threshold Threshold Threshold

PSL

Implemented in C/C++

CAPP PCIE

Page 15: Leveraging OpenPOWER and FPGAs to Accelerate Machine

Prototype Results

© 2016 OpenPOWER Foundation 15

CIFAR10 BNN• Input images: 32x32 pixels, RGB image

• 2 (3x3) Conv + Max Pool + 2 (3x3) Conv + Max Pool + 2 Conv + Max Pool + 3 FC

MNIST BNN• Input images: 28x28 pixels, black and white

• Topology: 3 FC layers, 256 neurons each

First prototype results for BNN image classification• Demonstrate order of magnitude improvement at classification rates

Network # OPS / Image FPS - HW TOPS/sec

CIFAR10 BNN 1.2G 6.3k 7.8

MNIST BNN 668k 2.21M 1.4

Page 16: Leveraging OpenPOWER and FPGAs to Accelerate Machine

© 2016 OpenPOWER Foundation

Summary

16

Page 17: Leveraging OpenPOWER and FPGAs to Accelerate Machine

Summary

• Industry trends in computing platforms• Heterogeneous

• Tightly integrated accelerators

• CAPI provides an easy to integrate FPGAs with POWER8 to build heterogeneous computing platforms

• Platform characterization and benchmarking show• Heterogeneous computing platforms can bring significant acceleration for machine vision &

learning algorithms

• Prototype results demonstrate:• 15x speed-up for video scaler

• Orders of magnitude improvements in classification rates through the use of custom data types and architectures on FPGAs

© 2016 OpenPOWER Foundation 17

Page 18: Leveraging OpenPOWER and FPGAs to Accelerate Machine

Q&A

© 2016 OpenPOWER Foundation 18