2017. 10. 10.

Page 1

All Programmable FPGA, hardware efficiency to software programmers ecosystem

陆佳华,[email protected]

Page 2

© Copyright 2017 Xilinx.

Traditional Compute Architectures Are Not Scalable

Page 3

Heterogeneous Domain-Optimized Computing Platforms

Page 4

FPGA Silicon Architecture

[Block diagram: Standard FPGA. Labels: Logic Array, DSP Array, BRAM, Distr RAM, GTs, IOs, PCIe, Config, Configuration Network, Programmable Interconnect]

Page 5

All Programmable FPGA Silicon Architecture

[Block diagram: Zynq MPSoC (16nm). Labels: four A53 cores each with L1$, L2$, two R5 cores, GPU, System MMU, OCM, CC & SVM, P-States, Security, Config, DDR, Flash, USB, GigE, 100 GigE, PCIe, Interlaken, Video Enc, 16Gbps and 30Gbps GTs, IOs, ADC, DAC, CCIX, HBM, Logic Array, DSP Array, BRAM, URAM, Distr RAM, Configuration Network, Programmable Interconnect]

Page 7

High-Level Hardware Design Environment

Typical productivity improvements come from:

Creation of HW-optimized functions in C/C++

Accelerated verification (>1000x faster than RTL)

Automated, intelligent assembly (15x faster than manual)

[Diagram: Design Methodology. Labels: C Libraries, HLS, High-Level IP Creation, IP Subsystems, Automated IP Assembly]

Page 8

Domain Specific Compute Platform

Shell : Domain specific infrastructure (gray)

Role : ‘Donut holes’ in FPGA or on CPU executing programmable functions (white)

One Board, Multiple Shells

[Diagram: Networking Shell, Video Shell, Wireless Shell. Labels: PR Region, Host Bridge, EMAC, DMA, MEMC, AXI, CPU, PPP, JESD, CPRI, SDI, HDMI, In, Out]

Page 9

Personas for All Programmable FPGA

Layers: End application, SW platform, HW platform, Silicon and boards

Personas: HW designer, Software designer, Application developer

Page 10

Heterogeneous Compute Platforms in Datacenter

Machine Learning

ASICs (TPU by Google)

FPGA (Xilinx, Intel)

GPU (NVIDIA Tesla P4)

[Chart: How Server Components Change with Machine Learning. Categories: CPU, Accelerators, Memory, Networking, Storage, Other]

Page 11

Enabling Industry Growth Drivers: The ‘Megatrends’

Autonomous Cars: Enabling Cars and Machines to See it All

Cloud Computing: Accelerating Storage, Networking, and Compute

Industrial IoT: Enabling Safe and Secure Connected Machines

5G Wireless: Connecting Any Band, Any Standard, Any Network

Machine Learning: Edge, Cloud

Page 12

Page 13

All Programmable FPGA Opportunities in the Cloud

In-Memory Databases: Key-values dataflow; Mix of DRAM and NV; Software Defined Storage

Video Transcode: Moving window dataflow; Bit level processing (7bit, 8bit); Massive compute (MACs)

Machine Learning: 2D Parallel; Distributed memory; Broadcast interconnect

NFV: Wireline speed; Bit level protocols; String search; Crypto & security; Deep Packet Inspection

SmartNIC: www.netfpga.org

Software Defined SSD: Fidus Sidewinder-100

caffe

Compute Acceleration: www.tul.com.tw/ProductsFPGA.html
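
The "moving window dataflow" listed on this slide can be illustrated in software. The sketch below is not taken from the slides: it is a minimal Python model, under the assumption that the window is a k x k pixel neighbourhood and that rows arrive as a stream. It buffers only the k most recent rows, which is roughly what line buffers in on-chip memory would do in a hardware pipeline. The names `sliding_windows` and `box_filter` are hypothetical.

```python
from collections import deque


def sliding_windows(rows, k=3):
    """Yield (row, col, window) for each k x k window over a stream of rows.

    Only the k most recent rows are kept, mimicking hardware line buffers.
    """
    line_buffer = deque(maxlen=k)  # the oldest row drops out automatically
    for r, row in enumerate(rows):
        line_buffer.append(list(row))
        if len(line_buffer) < k:
            continue  # not enough rows buffered yet
        width = len(line_buffer[-1])
        for c in range(width - k + 1):
            window = [buf_row[c:c + k] for buf_row in line_buffer]
            yield r - k + 1, c, window


def box_filter(rows, k=3):
    """Sum every k x k window (a box filter with unit weights)."""
    out = {}
    for r, c, window in sliding_windows(rows, k):
        out[(r, c)] = sum(sum(w_row) for w_row in window)
    return out


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
# The image is passed as an iterator to stress that rows are consumed once.
result = box_filter(iter(image))
```

Each output pixel depends only on the buffered rows, so the input never needs to be stored in full. That property is what makes this access pattern a natural fit for a streaming pipeline.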

Page 14

Moving up the stack : Machine Learning Stack

Software Kernel development

APIs & Libraries

CNN Network Design Tools & Training Frameworks (e.g. CAFFE)

Reference Networks & Custom Networks

HW Platform development

CNN Compiler & Runtime

[Legend: Xilinx, Customers, OpenSource]

Page 15

FaaS : FPGA as a Service

Amazon EC2 F1 is a compute instance with Xilinx FPGAs for hardware acceleration

F1 instances can be programmed using SDAccel

An FPGA bitstream can be registered as an Amazon FPGA Image (AFI)

[Diagram: CPU, FPGA]

Page 16

AWS F1 Platform Model

[Diagram: AWS F1 Platform. x86 Host CPU: User Application Code, OpenCL API, OpenCL Runtime, Drivers. Xilinx FPGA: Custom Kernels, AXI Interfaces, DMA, DDR. Link between host and FPGA: PCIe. Other label: Custom Application]

Page 17

From Cloud to Edge Computing

[Diagram labels: CLOUD, Data Center, Embedded, Desktop, Mobile, Server, ……, IOT, The Edge/Fog. Attributes: Performance, Scale, Transactional, Real-time, Deterministic, Resourceful]

Page 18

From Cloud to Edge Computing

Cloud

Page 19

Back to the Future

Page 20

Fog Computing in the Edge

Cloud

Edge

Page 21

Autonomous Driving

25+ sensors: Cameras, Radar, LIDAR, Ultrasound, Sound

100s of TOPS

Page 22

From ADAS to Autonomous Driving

Page 23

Unique Value Prop FPGA

CPU/GPU

FPGA

1. Dataflow

2. Distributed Memory

3. Broadcast Interconnect

4. Reduced Precision Arithmetic
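
The fourth point, reduced precision arithmetic, can be made concrete with a small example. The sketch below is an illustration rather than anything from the slides: it assumes symmetric linear quantization to signed 8-bit integers and compares an integer multiply-accumulate (MAC) loop against the floating-point dot product. The function names `quantize` and `int_dot` and the sample values are hypothetical.

```python
def quantize(values, bits=8):
    """Symmetric linear quantization to signed integers of the given width.

    Returns the integer codes and the scale that maps them back to floats.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    return [round(v / scale) for v in values], scale


def int_dot(a_q, b_q):
    """Integer multiply-accumulate, the operation a DSP slice would perform."""
    acc = 0
    for x, y in zip(a_q, b_q):
        acc += x * y
    return acc


weights = [0.5, -0.25, 0.125, 1.0]
activations = [1.0, 2.0, 4.0, 0.5]

w_q, w_scale = quantize(weights)
a_q, a_scale = quantize(activations)

# Rescale the integer accumulator once, after all MACs are done.
approx = int_dot(w_q, a_q) * w_scale * a_scale
exact = sum(w * a for w, a in zip(weights, activations))
error = abs(approx - exact)
```

The inner loop touches only small integers, and the two scale factors are applied a single time at the end. Narrow integer multipliers are much cheaper in logic than floating-point units, which is the trade this slide point refers to. The price is a small quantization error.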

Page 24

Application Development

Algorithm Development

Platform Development

DNN, CNN, GoogLeNet, SSD, FCN …

Page 25

Xilinx Foundation at the Edge

Vision Customers Using Xilinx

>80 ProAV & Broadcast Suppliers

>60 Smart Camera & Visualization Suppliers

>50 Industrial Vision Equipment Makers

>10 Medical Diagnostic Suppliers

>5 VR/AR Equipment Makers

>80 ADAS Models From 23 Makers

>5 Drone Suppliers

Page 26

Breakouts in Integration & Programming

Breakout in System Integration: ALL Programmable Devices

Zynq MPSoCs

Zynq RFSoCs

UltraScale+ HBM/CCIX Enhanced FPGAs

Breakout in Programming Models: ALL Programmable Models

SDx Software Defined Environments

Page 27

Setting the Course for the Next 5 Years

Goal: 5X Potential Users in 5 Years

1000s of SoC Teams with SW and HW Engineers

>250,000 SW Engineers

>50,000 HW Engineers

For Highest Growth Megatrends

Page 28

Inspiration from the software community

Page 29

Programming for the Edge

PYTHON Productivity for ZYNQ MPSoC

Porting SW Methods from the Cloud to Physical Reality of IOT Devices in the Edge

Leverage Python Open Source Community

Interactive & Modern User Interface: Jupyter Notebooks

Unleash unique capabilities of All Programmable via curated list of Overlay Libraries

Page 30

The rise of productivity languages

Systems/Efficiency Languages: C, C++, OpenCL

Productivity Languages: Python, Scala, ..

[Stack layers: Hardware Systems; OS, hypervisor, drivers; Applications, Programming Frameworks]

Program ZYNQ in a productivity language

Page 31

Hundreds of people maintain the Linux kernel; tens of millions of people use it!

Linux

Users: Tens of millions

Apps Programmers: Hundreds of thousands

Programmers (embedded): Tens of thousands

Device driver writers: Thousands

Kernel developers: Hundreds

How can we support hundreds of thousands of Zynq programmers with hundreds of bitstream developers?

Page 32

Jupyter on Zynq

Jupyter infrastructure consists of:

a suite of web browser tools

IPython kernel & Tornado web server

On top of Ubuntu

All Native SW on Zynq CPU subsystem

[Diagram: Browser tools (JavaScript, HTML, CSS): Dashboard, Terminal, Editor, Notebooks. Server tools (Python): Tornado Server, IPython Kernel. Ubuntu Server on ARM A9s. Other labels: Jupyter, Overlays, Pyzynq Libraries, FPGA]
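
What programming the FPGA from a notebook cell might look like can be sketched with the open-source `pynq` Python package. This rests on several assumptions: that the "Pyzynq Libraries" in the diagram correspond to that package, that the code runs on a board image where `pynq` is installed, and that a bitstream named `base.bit` exists (the name is a placeholder). `Overlay` and its `ip_dict` attribute are part of the `pynq` package; the helper functions are hypothetical.

```python
def summarize_ips(ip_dict):
    """Return the sorted names of the IP blocks described by an overlay."""
    return sorted(ip_dict)


def load_and_summarize(bitfile="base.bit"):
    """Load a bitstream on a PYNQ board and list the IP blocks it exposes.

    Assumption: this only works on a board image with `pynq` installed,
    and "base.bit" is a placeholder bitstream name.
    """
    from pynq import Overlay  # imported lazily: only available on the board

    # Recent PYNQ releases program the fabric when the Overlay is constructed.
    overlay = Overlay(bitfile)
    return summarize_ips(overlay.ip_dict)
```

In a notebook on the board, calling `load_and_summarize()` would return the IP block names found in the overlay. `summarize_ips` can also be tried on any plain dictionary without hardware.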

Page 33

PYNQ community

Highlight third party overlays and development

– BNN
– Apache Spark
– Japan User Group
– Soft GPU
– Video processing
– DSP FIR
– Tutorial
– CNN

Open Hardware examples

Page 34

Find out more at www.xilinx.com/university

www.pynq.io

Access Data Analytics, ML, Instrumentation, IO programming on FPGA in the Edge

Run FPGAs in the Cloud

Page 35

All Programmable Systems and Networks

HARDWARE OPTIMIZED

SOFTWARE DEFINED

Talent Development Ecosystem

[email protected]