fpgas and the cloud – an endless tale of virtualization ...specs/events/wrc2019... · main...

32
Member of the Helmholtz Association Page Dr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de Dr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de FPGAs and the Cloud – An Endless Tale of Virtualization, Elasticity and Efficiency 13th HiPEAC Workshop on Reconfigurable Computing (WRC) // València // January 21st, 2019 Dr.-Ing. Oliver Knodel // [email protected]

Upload: others

Post on 09-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.deDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

FPGAs and the Cloud – An Endless Tale of Virtualization, Elasticity and Efficiency13th HiPEAC Workshop on Reconfigurable Computing (WRC) // València // January 21st, 2019 Dr.-Ing. Oliver Knodel // [email protected]

Page 2: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

HW Development Prototyping & CI

Our Background @ HZDRMain Challenge: Pre-/post-processing and archiving of research data — Filter and compress measured or simulated data at the edge of the

datacenter (Cloud). — Accelerate compute-intensive tasks with dedicated low-latency,

high-bandwidth hardware.

HPC and hardware acceleration (OpenCL) — Many research questions require compute intensive

deep learning approaches suitable for GPUs and FPGAs. — The research data is located in the data centre anyway.

Prototyping and Continuous Integration (CI) — The custom FPGA designs have to be tested and verified

with every development cycle to meet high requirements. — FPGAs in the datacenter can be used for that during

idle times.

Generate Data

Post-Process Data Deep learning & Analyse

Store Data Filter & Compression

T-Elbe: ~35 GB/s (24/7)

PCI-Express: ~6 GByte/s

I/O per node: ~30 MByte/s

data

cent

er

!2

Page 3: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Structure

1. Introduction and motivation 2. Requirements analysis and system design

— System architecture of the cloud — RC3E — Virtualization of the FPGAs — RC2F

3. Results of the prototypical implementation — Required FPGA-Resources — Behavior simulation of a FPGA-Cloud

4. Outlook: The future FPGA-Cloud at HZDR

!3

Page 4: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

!4

II.Systemarchitectureofthecloud—RC3E

I.Requirementsandstakeholderanalysis

III.VirtualizationoftheFPGAs—RC2F

Page 5: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Requirements and Stakeholder — Provision, Security, Service Models

Approach — The user groups require a different level of

visibility and virtualization. — With division into service models, this abstraction

can be integrated into a resource management system (RC3E).

Service-Models 1. Reconfigurable Silicon as a Service (RSaaS). 2. Reconfigurable Accelerator as a Service

(RAaaS). 3. Background Acceleration as a Service (BAaaS).

!5

Visibility of the hardware

FPGA-Prototyping Produktive cloud foracceleration- and security-tasks

Hardware accessibility

physicalFPGA

physicalFPGA

Compute Node

CPU user-modifiable

static

FPGA

Bitstream

UserApp

Design Tools

Driver

RC2F Host-Hypervisor

User VM

RSaaS

vFPGApartial

Bitstream

UserApp

Design Flow

RC2F Driver

RC2F Host-Hypervisor

Compute VM

RAaaS

vFPGApartial

Bitstream

Service

RC2F Driver

RC2F Host-Hypervisor

Service VM

BAaaS

BitstreamDB

Flexibility

RC2F-Sec: ✘ !✔

Page 6: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Requirements and stakeholder — The user in the context of Cloud-FPGAs

Objectives — Flexible deployment for different groups of people with different needs. — The FPGA is not visible to the enduser and only virtual resources are used by the service providers.

!6

FPGA

Virtualization

PCIe

Passthrough

Datacenter Operator

physical architecture

CPU FPGA

visibility of the hardware

RSaaS

virtual architecture

Cloud Provider

VM vFPGA

RAaaS / BAaaS

Developer

access-rights

abstraction from the hardware

Service Provider

virtual infrastructure

VM vRAI

SaaS BAaaS

application/

service

End User

execution

Page 7: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

!7

II.Systemarchitectureofthecloud—RC3E

I.Requirementsandstakeholderanalysis

III.VirtualizationoftheFPGAs—RC2F

Page 8: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

II. RC3E System architecture — Architecture of the RC3E prototype

— View of the hardware architecture: • The cloud consists of nodes with

one CPU each and (two) PCIe-coupled FPGAs.

• Each physical FPGA can contain multiple virtual FPGAs.

— Manage resources (vFPGA and VMs) across all cloud nodes according to service models.

— Required components: • Local node-management, • RC2F host-hypervisor and • FPGA-hypervisor (located on the

FPGA device)

!8

RC3E Database

User-Statistics

vFPGA- Images/Instances

System- Images

Compute-Node

Local Node-Management — RC3E• Local VM-Management• (v)FPGA Assignment / Configuration• Configuration of virtual devices

Operating System• Network• …

Block- Images

Local Management - Dom0

VMs (DomU)

RC2F Host-Hypervisor• Monitoring• Assignment of devices• FPGA Configuration

vFPGA• Configuration• States• vCS

Physical FPGA

User-VM • Application• Tools / Datasets• vFPGA-API

Local Management - Dom0

System-Hypervisor• Provision of VMs• Virtualization-API• Physical devices

Cloud-Management — RC3E• System Preferences• Resource-Management• Management of Resources• User-API (for user requests)

Load Distribution and Scheduling• Global Resource-Assignment• BAaaS Job Queue• Global Resource-Monitoring

RC3E-Components System-ComponentsUser-Resources

FPGA-Hypervisor• States• Configuration

Client-System

• VM and vFPGA Management• System-Configuration• …

Client-Terminal• Available Resources • User Statistics/Preferences

Management-Node

DistributedFilesystem

Page 9: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

II. RC3E System architecture — FPGA Images (vRAIs)

— All necessary files for a background accelerator are encapsulated in virtual Reconfigurable Acceleration Images (vRAIs).

— The vFPGA itself is described in the form of a Reconfigurable (Device) Configuration (RCFG).

— A simple VM with hardware accelerator can be easily described:

!9

VM FPGA

(v)FPGA

Node

Internet

User

ExampleofaRCFGforavFPGAinmodelBAaaS.

service = 'ba' #Service Model BAaaS

name = ['vfpga-kmeans'] #vFPGA/User Design Name

vm = ['vm1-pvm'] #VM-Instance Name

vfpga = [1] #Number of vFPGAs

size = [4] #vFPGA-Slots

memory = [4000] #DDR-Memory Size

boot= [‘idle'] #Initial vFPGA-State

design = ['kmeans-quad.vrai'] #Initial Design

Host Application

vRAI

RCFG

vFPGA

vFPGA Bereich

vFPGA

vFPGA Bereich

vFPGA-Image

vFPGA-Slots(FARs)

Bitmask(for Context relocation)

Page 10: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

II. RC3E System architecture — Description of a more complex vRAI

!10

FPGAVM 1 VM 2

Node 1 Node 2

Internet

User

vFPGA vFPGA

ExampleofaRCFGdescribingavFPGAclusterwithtwohostVMs.

service = 'rs' #Service Model RSaaS

name = ['cluster'] #vFPGA/User Design Name

vm = [‘vm1’, 'vm2'] #VM-Instance Name

vfpga = [2, 2] #Number of vFPGAs

size = [6, 6] #vFPGA-Slots

memory = [2000] #DDR-Memory Size

debug = 'csp' #Debug Interface

vif = [‘ip=10.0.0.42’; ‘ip=10.0.0.45’] #IP-Adress-Range

boot= [‘idle'] #Initial vFPGA-State

Host Application

vRAI

RCFG

vFPGA

vFPGA Bereich

vFPGA

vFPGA Bereich

vFPGA-Image

vFPGA-Slots(FARs)

Bitmask(for Context relocation)

Page 11: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

II. RC3E System architecture — Interactions between components

!11

Allocation eines vFPGAs mit VM

Starten eines vRAI Paketes

vFPGA von VM entfernen

FPGAHypervisor

Cloud-Management User-VMMiddleware vFPGA

Local Node-Management

FreigabevFPGA Freigabe

vFPGA Freigeben

vFPGA entfernen

vFPGA löschen

vFPGA stoppen

Kanäle entfernen

Belegung DB aktualisieren

Zuordnungentfernen

Status

allocatevFPGA

Cloud-Management NodeA

Compute-Node withvFPGA-ResourcesB Physical

FPGA-Resource C

Anwendung starten

Datenkanal

Start vRAI

vFPGAkonfigurieren

vFPGA status

vCS initialisieren

vFPGA status

vFPGA konfigurieren

Status

configure vFPGA

initializevFPGA

configure HCS with vFPGA data

channelset up

assign occupancyin database

system.rcfg

assignment VM/vFPGAsystem.rcfg

resources available?

system.rcfg

register in database

start VM

startvFPGA

configure vFPGA

vFPGA readyVM ready

Status, VM-IP

(I) Allocation of a vFPGA resource with additional VM

(II) Starten eines vRAI-Paketes

(III) Freigabe einer vFPGA-Ressource

startingpointforRC2Fvirtualization

Page 12: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

!12

II.Systemarchitectureofthecloud—RC3E

I.Requirementsandstakeholderanalysis

III.VirtualizationoftheFPGAs—RC2F

Page 13: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

III. RC2F Virtualization — Virtualization of the FPGAConcept for the virtualization of FPGAs (approach based on classical System-VMs): — The user core is located directly (bare-metal) on the FPGA-resources within a dynamically

reconfigurable region, the virtual FPGA (vFPGA). — The static part of the FPGA contains management structures and the physical interfaces. — The concept is equivalent to traditional virtual machines:

!13

Interface-Virtualization:

— ExternaldevicesareaccessedthroughparavirtualizationwithintheFPGAhypervisor(VMM).

— TheinterfacesofthevFPGAsarethefrontends,whichareconnectedtothebackendinthestaticregion.

static region partial reconfigurable region

vFP

GA

0

DomU

vFP

GA

1

DomU

vFP

GA

N

DomU

…Management /Configuration

Dom0

FrontendInterface

FrontendInterface

FrontendInterface

Hypervisor

BackendInterface

HardwareInterface

Physical FPGA (FPGA-Resources, PCI-Endpoint, …)

NativeTyp1System-/Bare-Metal-VirtualizationwithsameISA

Page 14: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

III. RC2F Virtualization — Definitions— A vFPGA is perceived by the user as a stand-alone resource with a dynamic

number of hardware resources (slices, LUTs, registers, etc.).

— A vFPGA is mapped to vFPGA-Slots — smallest possible physical regions with a fixed number of hardware resources.

— A vFPGA-Design is the hardware design within a vFPGA.

— A partial Bitstream is called vFPGA-Image and

— if this is assigned to a user this becomes a vFPGA-Instance.

Distinctions in the term “Hypervisor”:

— The management structure for the vFPGAs on their host system is called the RC2F Host-Hypervisor.

— The FPGA-Hypervisor is the management structure on the FPGA (Dom0).

vFPGA-Slot 0

vFPGA-Slot 1

vFPGA-Slot 2

vFPGA-Slot 3

Frontend

0

Frontend

1

Frontend

2

Frontend

3

vFP

GA

0

vFPGA 1

PPRPPR PPR PPR

RC2F-Infrastructure

PPR

vFPGA-Instance

vFPGA-Design vFPGA-Image

!14

Page 15: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Reconfig Area: vFPGA-Slot 0 Reconfig Area: vFPGA-Slot N-1

vFPGA-Design— DomU vFPGA-Design — DomU…||||||||||

||||||||||

vControl Unit- vCU -

Config Space (vCS)

||||||||||

||||||||||

hoststream

netpacket

hoststream

netpacket

vControl Unit- vCU -

Config Space (vCS)

ICAP Config

EncryptionUnit

AES, RSA, SHA

Provision of vFPGAs on the physical FPGA-device: — Multiple vFPGAs within vFPGA slots.

— The ICAP is used for dynamic partial reconfiguration.

— Direct management of the states of the vFPGAs on the FPGA within the vControl Unit (vCU).

Components of the infrastructure (FPGA-Hypervisor): — Internal system bus used for the Paravirtualization.

— Hypervisor Control Unit (HCU) to monitor the physical FPGA infrastructure.

— PCIe and Ethernet-Interfaces.

— Access to DDR3 memory on the FPGA board via page tables.

JTAGFlashBBRAM

PCIe Core Dom0

FPGA-Hypervisor

Reconfig Area: vFPGA-Slot 0 Reconfig Area: vFPGA-Slot N-1

PCI-Express EndpointHypervisor Control Unit

- HCU -

Config Space (HCS)System StatusvFPGA-Control

InfrastructureManagement

ClockingFPGA Monitoring

vFPGA-Design— DomU vFPGA-Design — DomU…||||||||||

Ethernet

Eth Core

PCI Virtualization VLAN/IP

||||||||||

128

vControl Unit- vCU -

Config Space (vCS)

||||||||||

||||||||||

32

hoststream

netpacket

3232 128 32

hoststream

netpacket

3232

… …

vControl Unit- vCU -

Config Space (vCS)

Memory Core

Memory-Controller

Memory-Virtualization (Pagetable)

256256

256

ICAP Config

EncryptionUnit

AES, RSA, SHA

DDR3 RAM

FPGA Board

Network

Host

Channel-Virtualization……

Asymmetrisch

Symmetrisch

JTAGFlashBBRAM

PCIe Core Dom0

FPGA-Hypervisor

Reconfig Area: vFPGA-Slot 0 Reconfig Area: vFPGA-Slot N-1

PCI-Express EndpointHypervisor Control Unit

- HCU -

Config Space (HCS)System StatusvFPGA-Control

InfrastructureManagement

ClockingFPGA Monitoring

vFPGA-Design— DomU vFPGA-Design — DomU…||||||||||

Ethernet

Eth Core

PCI Virtualization VLAN/IP

||||||||||

128

vControl Unit- vCU -

Config Space (vCS)

||||||||||

||||||||||

32

hoststream

netpacket

3232 128 32

hoststream

netpacket

3232

… …

vControl Unit- vCU -

Config Space (vCS)

Memory Core

Memory-Controller

Memory-Virtualization (Pagetable)

256256

256

ICAP Config

EncryptionUnit

AES, RSA, SHA

DDR3 RAM

FPGA Board

Network

Host

Backend-Interface

Frontend-Interface

Hardware-Interface

Channel-Interface

Backend-Interface

Frontend-Interface

Hardware-Interface

Channel-Virtualization……

Asymmetrisch

SymmetrischIII. RC2F Virtualization — RTL

!15

Page 16: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

III. RC2F Virtualization — Homogeneous vFPGA-Slots

RC

2F-In

fras

truc

ture

(FP

GA-H

yper

viso

r, P

CIe

, DD

R)

RC2F-Infrastructure (Ethernet + ICAP)

Fron

tend

0Fr

onte

nd 1

Fron

tend

2Fr

onte

nd 3

Fron

tend

4Fr

onte

nd 5

vFPGA-Slot 0

vFPGA-Slot 1

vFPGA-Slot 2

vFPGA-Slot 3

vFPGA-Slot 4

vFPGA-Slot 5

Inhomogenous columns

FPG

A-In

fras

truk

tur

PC

IeP

CIe

Partition Pin Regions (PPR)

Inhomogenous columns

RC

2F-In

fras

truc

ture

(FP

GA-H

yper

viso

r, P

CIe

, DD

R)

RC2F-Infrastructure (Ethernet + ICAP)

Fron

tend

0Fr

onte

nd 1

Fron

tend

2Fr

onte

nd 3

Fron

tend

4Fr

onte

nd 5

vFPGA-Slot 0 vFPGA-Slot 0

vFPGA-Slot 1 vFPGA-Slot 1

vFPGA-Slot 2 vFPGA-Slot 2

vFPGA-Slot 3 vFPGA-Slot 3

vFPGA-Slot 4 vFPGA-Slot 4

vFPGA-Slot 5 vFPGA-Slot 5

— Establishment of vFPGA-Slots within the clock regions. — Realization of homogeneous vFPGA-Slots on the

physical FPGA to enable migration of vFPGA-Instances. — The distribution is dictated by the inhomogeneous

FPGA architecture.

Spe

iche

rbed

arf

Sei

tent

abel

le (k

Byt

e)

0

150

300

450

600

Seitengröße (MByte)

8 4 2 1 0,5 0,25

Speichergröße 4 GByte Speichergröße 8 GByte Speichergröße 16 GByte Speichergröße 32 GByte

Util

izat

ion

of t

he d

iffer

ent

vFP

GA-S

lots

co

mpa

red

to t

he c

ompl

ete

vFP

GA-S

lot

0

80 %

84 %

88 %

92 %

96 %

100 %

vFPGA-Slot 0 vFPGA-Slot 1 vFPGA-Slot 2 vFPGA-Slot 3 vFPGA-Slot 4 vFPGA-Slot 5 Homogener vFPGA-Slot

Slice Registers/LUTs Block-Ram Tiles

Aus

last

ung

der

unte

rsch

iedl

iche

n vF

PG

A-S

lots

ge

genü

ber

des

volls

tänd

igen

vFP

GA-S

lots

0

80 %

84 %

88 %

92 %

96 %

100 %

vFPGA-Slot 0 vFPGA-Slot 1 vFPGA-Slot 2 vFPGA-Slot 3 vFPGA-Slot 4 vFPGA-Slot 5 Homogener vFPGA-Slot

Slice Registers/LUTs Block-Ram Tiles

�2

BRAMs

Slice Registers / LUTs

DSPs

(Virtex-7 XC7VX485T)

!16

Page 17: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

III. RC2F Virtualization — Application scenariovFPGA-Slot 0: I0 - BSMC8-vFPGA

vFPGA-Slot 4: I 4 - Crypto-vFPGA

vFPGA-Slot 2: I 2 - BSMC8-vFPGA

vFPGA-Slot 3: I 3 - Crypto-vFPGA

vFPGA-Slot 1: I 1 - Crypto-vFPGA

vFPGA-Slot 5: I 5 - BSMC8-vFPGAR

C2F

-Infr

astr

uktu

r(F

PG

A-H

yper

viso

r, P

CIe

, DD

R)

RC2F-Infrastruktur (Ethernet + ICAP)

Fron

tend

0Fr

onte

nd 1

Fron

tend

2Fr

onte

nd 3

Fron

tend

4Fr

onte

nd 5

Example assignment within a simple scenario — Full utilization of the physical FPGA by six single vFPGA-Instances. — Release of individual instances by users. — Migration to provide a larger contiguous area. — Placement of a larger vFPGA-Instance (Triple). — Provision of the entire FPGA for a maximum size (hexa) vFPGA-Instance.

vFPGA 0: —

vFPGA 4: —

vFPGA-Slot 2: I 2 - BSMC8-vFPGA

vFPGA 3: —

vFPGA-Slot 1: I 1 - Crypto-vFPGA

vFPGA-Slot 5: I 5 - BSMC8-vFPGA

RC

2F-In

fras

truk

tur

(FP

GA-H

yper

viso

r, P

CIe

, DD

R)

RC2F-Infrastruktur (Ethernet + ICAP)

Fron

tend

0Fr

onte

nd 1

Fron

tend

2Fr

onte

nd 3

Fron

tend

4Fr

onte

nd 5

vFPGA-Slot 0: I 5 - BSMC8-vFPGA

vFPGA 4: —

vFPGA-Slot 2: I 2 - BSMC8-vFPGA

vFPGA 3: —

vFPGA-Slot 1: I 1 - Crypto-vFPGA

vFPGA 5: —

RC

2F-In

fras

truk

tur

(FP

GA-H

yper

viso

r, P

CIe

, DD

R)

RC2F-Infrastruktur (Ethernet + ICAP)

Fron

tend

0Fr

onte

nd 1

Fron

tend

2Fr

onte

nd 3

Fron

tend

4Fr

onte

nd 5

vFPGA-Slot 0: I 5 - BSMC8-vFPGA

vFPGA-Slot 2: I 2 - BSMC8-vFPGA

vFPGA-Slot 1: I 1 - Crypto-vFPGA

vFPGA-Slot 3: I 6 - 32x32 Matrix-vFPGA

RC

2F-In

fras

truk

tur

(FP

GA-H

yper

viso

r, P

CIe

, DD

R)

RC2F-Infrastruktur (Ethernet + ICAP)

Fron

tend

0Fr

onte

nd 1

Fron

tend

2Fr

onte

nd 3

Fron

tend

4Fr

onte

nd 5

vFPGA-Slot 3: I 7 - k-Means-vFPGA

RC

2F-In

fras

truk

tur

(FP

GA-H

yper

viso

r, P

CIe

, DD

R)

RC2F-Infrastruktur (Ethernet + ICAP)

Fron

tend

0Fr

onte

nd 1

Fron

tend

2Fr

onte

nd 3

Fron

tend

4Fr

onte

nd 5

vFPGA-Slot 0: I 5 - BSMC8-vFPGA

vFPGA-Slot 2: I 2 - BSMC8-vFPGA

vFPGA-Slot 1: I 1 - Crypto-vFPGA

vFPGA-Slot 3: I 6 - 32x32 Matrix-vFPGA

RC

2F-In

fras

truk

tur

(FP

GA-H

yper

viso

r, P

CIe

, DD

R)

RC2F-Infrastruktur (Ethernet + ICAP)

Fron

tend

0Fr

onte

nd 1

Fron

tend

2Fr

onte

nd 3

Fron

tend

4Fr

onte

nd 5

!18

Page 18: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

!19

Results of the prototypical implementationof the RC3E & RC2F

Page 19: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Hardware prototype @ TU DresdenCompute-Node 0

CPUIntel Core i7-2600K @3.40GHz

PCIeGen2x4

PCIe Gen2x8

Virtex7 (VC707)

Virtex-6 (ML605)

Network (Gigabit-Ethernet)

Management-Node

XEN Hypervisor 4.6.5

Compute-Node 1

CPUIntel Core i5-4690 @3.50GHz

PCIeGen2x1

PCIeGen2x4

Kintex7 (KC705)

Virtex5 (ML505)

XEN Hypervisor 4.6.5

Local Node-Management Ubuntu 16.04.2 (Dom0)

Kernel 4.10.0,

Local Node-ManagementUbuntu 16.04.2 (Dom0)

Kernel 4.10.0,

VM(DomU)

… VM(DomU)

— Structure of the cloud prototype includes different FPGAs. — Used as a system for development (RSaaS) and teaching (RAaaS). — For the evaluation of a larger system (BAaaS) a RC3E-Simulation

was realized.

!20

Page 20: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Results of the prototypical implementation — Migration

Relocate Mask (MByte) 4,8 9 13 17,3 21,3 25,3 20,3

Relocate Time (s) 0,0528 0,078 0,102 0,1278 0,1518 0,1758 0

Migration 1,72 3,15 4,51 5,98 7,34 8,70 0

single dual triple quad quint hex fullMByte

Spe

iche

rbed

arf

in M

Byt

e

1

10

100

1.000

Größe der (aggregierten) vFPGAs

Single Dual Triple Quad Quint Hexa

vFPGA-Image/Instanz vRAI (homogene vFPGAs)

Zeitd

auer

in S

ekun

den

(s)

0,01

0,1

1

10

Größe der (aggregierten) vFPGAs

Single Dual Triple Quad Quint Hexa

Konfiguration vFPGA-Image

Readback vFPGA-Instanz

Migration vFPGA-Instanz

Relocation vFPGA-Instanz

Demonstratoren

(I) a (I) a (II) (III) (IV) (V) Mean

slots 1 3 1 4 1 6 2,66666666666667

Szenario

I II III IV V

vFPGA-Image 0,264 0,132 1,7656 0,085 4,20 6,44931666666667

Laufzeit 60 60 60 60 60 300

Reg

LUT

BRAM

DSP

(Re-)Konfiguration, Readback und Migration vFPGAs aktiv

�2

Tim

e in

sec

onds

(s)

Configuration

!21

(Virtex-7 XC7VX485T)

Page 21: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Results of the prototypical implementation — Utilization

Table 1

---inhomogen--- homogen leer 15 x vFPGA infra komplett

CLB LUTs 68.160 63.360 4.800 950.400 188.640 1.182.240

LUT as Logic 68.160 63.360 4.800 950.400 188.640 1.182.240

LUT as Memory 35.040 30.240 4.800 453.600 95.040 591.840

CLB Registers 136.320 126.720 9.600 1.900.800 377.280 2.364.480

Register as Flip Flop 136.320 126.720 9.600 1.900.800 377.280 2.364.480

Register as Latch 136.320 126.720 9.600 1.900.800 377.280 2.364.480

CARRY8 8.520 7.920 600 118.800 23.580 147.780

F7 Muxes 34.080 31.680 2.400 475.200 94.320 591.120

F8 Muxes 17.040 15.840 1.200 237.600 47.160 295.560

F9 Muxes 8.520 7.920 600 118.800 23.580 147.780

CLB 8.520 7.920 600 118.800 23.580 147.780

CLBL 4.140 4.140 0 62.100 11.700 73.800

CLBM 4.380 3.780 600 56.700 11.880 73.980

LUT Flip Flop Pairs 68.160 63.360 4.800 950.400 188.640 1.182.240

Block RAM Tile 120 120 0 1.800 360 2.160

RAMB36/FIFO 120 120 0 1.800 360 2.160

RAMB18 240 240 0 3.600 720 4.320

URAM 64 64 0 960 720 960

DSPs 408 408 0 6.120 360 6.840

Bonded IOB 52 52 0 780 360 832

HPIOB_M 24 24 0 360 28 384

HPIOB_S 24 24 0 360 7 384

HPIOB_SNGL 4 4 0 60 6 64

HPIOBDIFFINBUF 48 48 0 720 14 720

HPIOBDIFFOUTBUF 48 48 0 720 14 720

BITSLICE_CONTROL 16 16 0 240 3 240

BITSLICE_RX_TX 104 104 0 1.560 3 1.560

BITSLICE_TX 16 16 0 240 11.520 240

RIU_OR 8 8 0 120 12 120

GLOBAL CLOCK BUFFERs

80 80 0 1.200 3 1.560

BUFGCE 48 48 0 720 3 720

BUFGCE_DIV 8 8 0 120 6 120

BUFG_GT 24 24 0 360 3 720

0

Summe 735.136 680.632 54.000 10.209.480 2.031.120 12.726.240

0,004243201448346090,802238524497416 0,159600950477124

V7-485 2.794.220

Diff 5

Statischer Bereich der RC2F-Infrastruktur und FrontendsAufgrund der Inhomogenität nicht nutzbare BereicheRekonfigurierbarer Bereiche für homogene vFPGAs

30.191 6.70992.424

RC2F-Prototyp Virtex-7 XC7VX485T

Prognostische Abschätzung für die Cloud Virtex-7 UltraScale+ XCVU9P

x 4,55

Table 2

infra inhomogen homogen

Slice LUTs 90560 32600 28400

LUT as Logic 90560 32600 28400

LUT as Memory 36744 14400 13200

Slice Registers 181120 65200 56800

Register as Flip Flop

181120 65200 56800

Register as Latch 181120 65200 56800

F7 Muxes 45280 16300 14200

F8 Muxes 22640 8150 7100

Slice 22640 8150 7100

SLICEL 13454 4550 3800

SLICEM 9186 3600 3300

LUT Flip Flop Pairs 90560 32600 28400

Block RAM Tile 358 100 100

RAMB36/FIFO 358 100 100

RAMB18 716 210 200

DSPs 692 340 340

GTXE2_COMMON 2

GTXE2_CHANNEL 8

IBUFDS_GTE2 1

BUFGCTRL 7

MMCME2_ADV 2

ICAPE2 1

PCIE_2_1 1

utili

zatio

n of

the

FP

GA

res

ourc

es

0 %

20 %

40 %

60 %

80 %

partial reconfigurable areas containing the vFPGA-Slots

static area containing the RC2F-infrastructure and 6 vFPGA-Frontends

areas not useable due to inhomogeneity

slice registers Block-RAM tiles slice LUTs DSPs

2,43 %

24,71 %

72,86 %

11,09 %

32,78 %

58,50 %

4,08 %

34,76 %

61,17 %

8,92 %

32,78 %

58,30 %

62,34 %6,59 %

31,07 %

static area containing the RC2F-infrastructure and frontendsareas not useable due to inhomogeneitypartial reconfigurable areas containing the vFPGA-Slots

static area containing the RC2F-infrastructure and frontendsareas not useable due to inhomogeneitypartial reconfigurable areas containing the vFPGA-Slots

RC2F-Prototype Virtex-7 XC7VX485T

estimation for a productive cloud Virtex-7 UltraScale+ XCVU9P

x 4,55

Proz

entu

aler

Ant

eil d

er H

ardw

are-

Res

sour

cen

an d

en R

esso

urce

n de

s g

esam

ten

FPG

As

0 %

20 %

40 %

60 %

80 %

Dynamisch rekonfigurierbareBereiche für vFPGA-Slots

Statischer Bereich für RC2F-Infrastruktur und 6 vFPGA-Frontends

Aufgrund der Inhomogenitätnicht nutzbare Bereiche

Slice Registers Block-RAM Tiles Slice LUTs DSPs

Statischer Bereich der RC2F-Infrastruktur und FrontendsAufgrund der Inhomogenität nicht nutzbare BereicheRekonfigurierbarer Bereiche für homogene vFPGAs

�1

!22

(Virtex-7 XC7VX485T)

Page 22: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Results of the prototypical implementation — Virtex-7 UltraScale+

Table 1

---inhomogen--- homogen leer 15 x vFPGA infra komplett

CLB LUTs 68.160 63.360 4.800 950.400 188.640 1.182.240

LUT as Logic 68.160 63.360 4.800 950.400 188.640 1.182.240

LUT as Memory 35.040 30.240 4.800 453.600 95.040 591.840

CLB Registers 136.320 126.720 9.600 1.900.800 377.280 2.364.480

Register as Flip Flop 136.320 126.720 9.600 1.900.800 377.280 2.364.480

Register as Latch 136.320 126.720 9.600 1.900.800 377.280 2.364.480

CARRY8 8.520 7.920 600 118.800 23.580 147.780

F7 Muxes 34.080 31.680 2.400 475.200 94.320 591.120

F8 Muxes 17.040 15.840 1.200 237.600 47.160 295.560

F9 Muxes 8.520 7.920 600 118.800 23.580 147.780

CLB 8.520 7.920 600 118.800 23.580 147.780

CLBL 4.140 4.140 0 62.100 11.700 73.800

CLBM 4.380 3.780 600 56.700 11.880 73.980

LUT Flip Flop Pairs 68.160 63.360 4.800 950.400 188.640 1.182.240

Block RAM Tile 120 120 0 1.800 360 2.160

RAMB36/FIFO 120 120 0 1.800 360 2.160

RAMB18 240 240 0 3.600 720 4.320

URAM 64 64 0 960 720 960

DSPs 408 408 0 6.120 360 6.840

Bonded IOB 52 52 0 780 360 832

HPIOB_M 24 24 0 360 28 384

HPIOB_S 24 24 0 360 7 384

HPIOB_SNGL 4 4 0 60 6 64

HPIOBDIFFINBUF 48 48 0 720 14 720

HPIOBDIFFOUTBUF 48 48 0 720 14 720

BITSLICE_CONTROL 16 16 0 240 3 240

BITSLICE_RX_TX 104 104 0 1.560 3 1.560

BITSLICE_TX 16 16 0 240 11.520 240

RIU_OR 8 8 0 120 12 120

GLOBAL CLOCK BUFFERs

80 80 0 1.200 3 1.560

BUFGCE 48 48 0 720 3 720

BUFGCE_DIV 8 8 0 120 6 120

BUFG_GT 24 24 0 360 3 720

0

Summe 735.136 680.632 54.000 10.209.480 2.031.120 12.726.240

0,004243201448346090,802238524497416 0,159600950477124

V7-485 2.794.220

Diff 5

Statischer Bereich der RC2F-Infrastruktur und FrontendsAufgrund der Inhomogenität nicht nutzbare BereicheRekonfigurierbarer Bereiche für homogene vFPGAs

30.191 6.70992.424

RC2F-Prototyp Virtex-7 XC7VX485T

Prognostische Abschätzung für die Cloud Virtex-7 UltraScale+ XCVU9P

x 4,55

Table 2

infra inhomogen homogen

Slice LUTs 90560 32600 28400

LUT as Logic 90560 32600 28400

LUT as Memory 36744 14400 13200

Slice Registers 181120 65200 56800

Register as Flip Flop

181120 65200 56800

Register as Latch 181120 65200 56800

F7 Muxes 45280 16300 14200

F8 Muxes 22640 8150 7100

Slice 22640 8150 7100

SLICEL 13454 4550 3800

SLICEM 9186 3600 3300

LUT Flip Flop Pairs 90560 32600 28400

Block RAM Tile 358 100 100

RAMB36/FIFO 358 100 100

RAMB18 716 210 200

DSPs 692 340 340

GTXE2_COMMON 2

GTXE2_CHANNEL 8

IBUFDS_GTE2 1

BUFGCTRL 7

MMCME2_ADV 2

ICAPE2 1

PCIE_2_1 1

utili

zatio

n of

the

FP

GA

res

ourc

es

0 %

20 %

40 %

60 %

80 %

partial reconfigurable areas containing the vFPGA-Slots

static area containing the RC2F-infrastructure and 6 vFPGA-Frontends

areas not useable due to inhomogeneity

slice registers Block-RAM tiles slice LUTs DSPs

static area containing the RC2F-infrastructure and frontendsareas not useable due to inhomogeneitypartial reconfigurable areas containing the vFPGA-Slots

16,52 %

83,04 %

0,44 %

62,34 %6,59 %

31,07 %

static area containing the RC2F-infrastructure and frontendsareas not useable due to inhomogeneitypartial reconfigurable areas containing the vFPGA-Slots

RC2F-Prototype Virtex-7 XC7VX485T

estimation for a productive cloud Virtex-7 UltraScale+ XCVU9P

x 4,55

Proz

entu

aler

Ant

eil d

er H

ardw

are-

Res

sour

cen

an d

en R

esso

urce

n de

s g

esam

ten

FPG

As

0 %

20 %

40 %

60 %

80 %

Dynamisch rekonfigurierbare Bereiche für vFPGA-Slots

Statischer Bereich für RC2F-Infrastruktur und 6 vFPGA-Frontends

Aufgrund der Inhomogenität nicht nutzbare Bereiche

Slice Registers Block-RAM Tiles Slice LUTs DSPs

Statischer Bereich der RC2F-Infrastruktur und FrontendsAufgrund der Inhomogenität nicht nutzbare BereicheRekonfigurierbarer Bereiche für homogene vFPGAs

�1

!23

Page 23: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Results of the RC3E simulation with a synthetic workload (BAaaS)

a

b

c

Synthetic workload with 4,981 requests over 150 minutes.time in minutes

inco

min

g re

ques

ts

!24

a

b

c

d

Assignment of vFPGA-Slots in an simulation with additional migration to reduce defragmentation (+FPGAs +RC2F +migration).

time in minutes

num

ber

of o

ccup

ied

vFPG

A-S

lots

Number of vFPGA-Slots Allocated compute nodes Occupied vFPGA-Slots vFPGA-migration

Page 24: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Results of the RC3E Simulation (BAaaS)

— By using two Virtex-7 FPGAs per node, the energy demand of the cloud can be reduced by 30.65 %.

— RC2F virtualization reduces the energy demand of the cloud system to 24.43 %.

— An additional migration of vFPGAs adds an extra 1.45 % compared to a simple RC2F virtualization.

Scenario I (synthetic) — 4,981 requests

Basis (without FPGAs) +FPGAs +RC2F +Migration

Compute Nodes 357 132 26 24

Utilization of the FPGAs — 27.34 % 78.14 % 85.07 %

Energy demand (kWh) 35.37 kWh 24.53 kWh 8.64 kWh 8.13 kWh

Energy demand (%) 100 % 69.35 % 24.43 % 22.91 %

!25

Page 25: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Final considerations, summary and outlook

!26

Page 26: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Outlook: Aspects for our Future FPGA Cloud (with virtualization?)

— Provision of reconfigurable Hardware in a Cloud requires an abstraction from the physical hardware and a flexible environment (RC3E & RC2F).

— A fine-grained virtualization with multiple users on the same physical device is possible and can be reasonable for the background acceleration of suitable services in a productive cloud (BAaaS),

— … but inefficient for most of the typical scenarios in our research context at the HZDR.

HW Development Prototyping & CIPost-Process Data

Deep learning & Analyse

Store Data Filter & Compression

!27

Page 27: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

!28

https://www.wired.com/2016/09/microsoft-bets-future-chip-reprogram-fly/

https://fudzilla.com

/news/memory-and-storage/46

283-alibaba-gets-xi

linx-fpga-for-its-f3-c

loud

https://techcrunch.com/2018/04/18/facebook-has-a-new-job-posting-calling-for-chip-

designers/

https://www.forbes.co

m/sites/moorinsights/2017/09/2

7/amazon-and-xilinx-

deliver-new-fpga-solut

ions/#3f880862370a

https://thenewstack.io/developers-fpgas-cloud/

https://www.computerworld.com.au/article/640970/microsoft-takes-step-towards-delivering-fpgas

Page 28: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

References

[KG+18] Oliver Knodel, Paul Genßler,Fredo Erxleben and Rainer Spallek. “An Endless Tale of Virtualization, Elasticity and Efficiency“. In: International Journal on Advances in Systems and Measurements, vol 11 no 3&4. IARIA. 2016.

[KGS17] Oliver Knodel, Paul R. Genssler and Rainer G. Spallek. „Virtualizing Reconfigurable Hardware to Provide Scalability in Cloud Architectures“. In: Reconfigurable Architectures, Tools and Applications, RECATA 2017, Rome, Italy, September 10 - 14, ISBN: 978-1-61208-585-2, CENICS 2017 Best Paper Award. IARIA. 2017, S. 33–38.

[KGS16] Oliver Knodel, Paul Genßler and Rainer Spallek. “Migration of long-running Tasks between Reconfigurable Resources using Virtualization”. In: ACM SIGARCH Computer Architecture News Volume 44, HEART 2016, July 25-27, Hongkong. ACM. 2016.

[KLS16] Oliver Knodel, Patrick Lehmann and Rainer G Spallek. “RC3E: Reconfigurable Accelerators in Data Centres and their Provision by Adapted Service Models”. In: 9th Int’l Conf. on Cloud Computing, Cloud 2016, June 27 - July 2, San Francisco, CA, USA. IEEE. 2016.

[Kno14] Oliver Knodel. “Integration von FPGA-Ressourcen als Hardwarebeschleuniger und deren Bereitstellung sowie Verwaltung in einem Mehrbenutzersystem”. In: Dresdner Arbeitsta- gung Schaltungs- und Systementwurf, DASS 2014, Dresden, Germany. 2014.

[KS13] Oliver Knodel and Rainer Spallek. “FPGAs in der Cloud: Integration und Bereitstellung von rekonfigurierbaren Hardware-Ressourcen in einer Cloud-Infrastruktur”. In: 5. Workshop für Grid-, Cloud- und Big-Data-Technologien für Systementwurf und -analyse, Grid4Sys 2013, Dresden, Germany. 2013.

[KS15] Oliver Knodel and Rainer G. Spallek. “Computing Framework for Dynamic Integration of Reconfigurable Resources in a Cloud”. In: 2015 Euromicro Conference on Digital System Design, DSD 2015, Madeira, Portugal, August 26-28. IEEE. 2015, S. 337–344.

!29

Page 29: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Appendix

!30

Page 30: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Outlook: Aspects for Future FPGA Architectures

— SoC-FPGA with a static RC2F-like Infrastructure:Development of a static area for the administration with all necessary interfaces to the outside world.

— Structural changes within the FPGA architecture: Establishment of homogeneous and areas with own clock regions on the FPGAs.

— Implementation of a security concept: Security concept based on a trusted authority to provide verifiable RC2F infrastructure.

!31

https://www.wired.com/2016/09/microsoft-bets-future-chip-

reprogram-fly/

Page 31: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

vFPGAs and the HostFPGAHost

Hos

t H

yper

viso

r

FPG

A H

yper

viso

r

Har

dwar

e In

terf

ace

Bac

kend

Inte

rfac

eFr

onte

nd

Inte

rfac

e

||||||||

Har

dwar

e Tr

eibe

rDom0 — Verwaltung |||||||| Dom0 — VerwaltungvFPGA Konfiguration,Nutzerzuordnung und

Zugriffsverwaltung

Fron

tend

In

terf

ace

Ger

äte-

D

atei

en

Fron

tend

In

terf

ace

Ger

äte-

D

atei

en

DomU — VM 0

Bac

kend

Inte

rfac

eInter-Domain

Channelrc2f_system_readrc2f_system_writerc2f_config_readrc2f_config_write

/dev

rc2f_readrc2f_writerc2f_vcs

/dev

DomU — VM N

rc2f_readrc2f_writerc2f_vcs

/dev

Dev

ice

files

Zuor

dnun

g de

r G

erät

e

Bac

kend

In

terf

ace

||||||||

Mem

|||||||| DomU — vFPGA 0

||||||||

Mem

||||||||

||||||||||||||||

rc2f_vcs

rc2f_readrc2f_write

rc2f_hcs

rc2f_hcs

rc2f_system_readrc2f_system_write

rc2f_config_readrc2f_config_write

Fron

tend

In

terf

ace

||||||||

Mem

|||||||| DomU — vFPGA Nrc2f_vcs

rc2f_readrc2f_write

physische Verbindung

virtuelle Verbindung

|||||||| FIFO-Speicher

Ringspeicher

!32

Page 32: FPGAs and the Cloud – An Endless Tale of Virtualization ...specs/events/wrc2019... · Main Challenge: Pre-/post-processing and archiving of research data — Filter and compress

Member of the Helmholtz AssociationPageDr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de

Virtex-7 Ultrascale+

!35