Member of the Helmholtz Association
Dr.-Ing. Oliver Knodel | Department of Information Services and Computing | Computational Science Group | www.hzdr.de
FPGAs and the Cloud – An Endless Tale of Virtualization, Elasticity and Efficiency
13th HiPEAC Workshop on Reconfigurable Computing (WRC) // València // January 21st, 2019
Dr.-Ing. Oliver Knodel // [email protected]
Our Background @ HZDR
Main challenge: pre-/post-processing and archiving of research data
- Filter and compress measured or simulated data at the edge of the datacenter (cloud).
- Accelerate compute-intensive tasks with dedicated low-latency, high-bandwidth hardware.
HPC and hardware acceleration (OpenCL)
- Many research questions require compute-intensive deep learning approaches suitable for GPUs and FPGAs.
- The research data is located in the data centre anyway.
Prototyping and Continuous Integration (CI)
- The custom FPGA designs have to be tested and verified with every development cycle to meet high requirements.
- FPGAs in the datacenter can be used for that during idle times.
[Figure: data path into the data center. Stages: Generate Data; Store Data (Filter & Compression); Post-Process Data (Deep Learning & Analysis); HW Development (Prototyping & CI). Annotated rates: T-Elbe ~35 GB/s (24/7); PCI-Express ~6 GByte/s; I/O per node ~30 MByte/s.]
Structure
1. Introduction and motivation
2. Requirements analysis and system design
   - System architecture of the cloud: RC3E
   - Virtualization of the FPGAs: RC2F
3. Results of the prototypical implementation
   - Required FPGA resources
   - Behavior simulation of an FPGA cloud
4. Outlook: The future FPGA cloud at HZDR
I. Requirements and stakeholder analysis
II. System architecture of the cloud: RC3E
III. Virtualization of the FPGAs: RC2F
Requirements and Stakeholders: Provision, Security, Service Models
Approach
- The user groups require different levels of visibility and virtualization.
- With a division into service models, this abstraction can be integrated into a resource management system (RC3E).
Service models
1. Reconfigurable Silicon as a Service (RSaaS).
2. Reconfigurable Accelerator as a Service (RAaaS).
3. Background Acceleration as a Service (BAaaS).
[Figure: the three service models, arranged along the axes "visibility of the hardware", "hardware accessibility" and "flexibility", ranging from FPGA prototyping to a productive cloud for acceleration and security tasks. All three run on a compute node with CPU and FPGA under the RC2F Host-Hypervisor.
- RSaaS: User VM with user app, design tools and driver; bitstream on a physical FPGA.
- RAaaS: Compute VM with user app, design flow and RC2F driver; partial bitstream on a vFPGA.
- BAaaS: Service VM with service and RC2F driver; partial bitstream from a bitstream database on a vFPGA.
Further labels: user-modifiable, static, RC2F-Sec.]
Requirements and stakeholders: the user in the context of cloud FPGAs
Objectives
- Flexible deployment for different groups of people with different needs.
- The FPGA is not visible to the end user, and the service providers use only virtual resources.
[Figure: stakeholders and their view of the system, ordered from full visibility of the hardware to full abstraction from the hardware. Labels recovered from the figure:
- Datacenter Operator: physical architecture (CPU, FPGA).
- Cloud Provider: virtual architecture (VM, vFPGA); FPGA virtualization, PCIe passthrough; RSaaS, RAaaS / BAaaS.
- Developer: access rights.
- Service Provider: virtual infrastructure (VM, vRAI); SaaS, BAaaS.
- End User: execution of the application/service.]
II. RC3E System architecture: Architecture of the RC3E prototype
- View of the hardware architecture:
  • The cloud consists of nodes with one CPU each and (two) PCIe-coupled FPGAs.
  • Each physical FPGA can contain multiple virtual FPGAs.
- Manage resources (vFPGAs and VMs) across all cloud nodes according to service models.
- Required components:
  • Local node management,
  • RC2F host-hypervisor and
  • FPGA-hypervisor (located on the FPGA device)
[Figure: components of the RC3E prototype on the management node, the compute nodes and the client system, connected by a distributed filesystem. The legend distinguishes RC3E components, system components and user resources. Components and their listed functions:
- Cloud Management (RC3E): system preferences, resource management, user API (for user requests).
- Load Distribution and Scheduling: global resource assignment, BAaaS job queue, global resource monitoring.
- RC3E Database: user statistics, vFPGA images/instances, system images, block images.
- System-Hypervisor: provision of VMs, virtualization API, physical devices.
- Local Node Management (RC3E, Dom0): local VM management, (v)FPGA assignment/configuration, configuration of virtual devices.
- Operating system: network, ...
- RC2F Host-Hypervisor: monitoring, assignment of devices, FPGA configuration.
- User VM (DomU): application, tools/datasets, vFPGA API.
- Physical FPGA: FPGA-Hypervisor (states, configuration) and vFPGAs (configuration, states, vCS).
- Client terminal: available resources, user statistics/preferences, VM and vFPGA management, system configuration, ...]
II. RC3E System architecture: FPGA Images (vRAIs)
- All necessary files for a background accelerator are encapsulated in virtual Reconfigurable Acceleration Images (vRAIs).
- The vFPGA itself is described in the form of a Reconfigurable (Device) Configuration (RCFG).
- A simple VM with a hardware accelerator can be described easily:
[Figure: a user connects via the Internet to a node that hosts a VM and an FPGA with a (v)FPGA.]
Example of an RCFG for a vFPGA in model BAaaS.
service = 'ba' #Service Model BAaaS
name = ['vfpga-kmeans'] #vFPGA/User Design Name
vm = ['vm1-pvm'] #VM-Instance Name
vfpga = [1] #Number of vFPGAs
size = [4] #vFPGA-Slots
memory = [4000] #DDR-Memory Size
boot = ['idle'] #Initial vFPGA-State
design = ['kmeans-quad.vrai'] #Initial Design
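The RCFG above happens to be valid Python syntax (plain assignments of strings and lists with `#` comments). As a minimal sketch under that assumption, the following snippet reads such a description into a dictionary and runs a few plausibility checks. The helper names `parse_rcfg` and `check_rcfg`, the accepted service codes (only 'ba' and 'rs' appear in the slides, 'ra' is a guess) and the default limit of six slots taken from the prototype are illustrative and not part of RC3E.

```python
import ast

# Example RCFG copied from the slide (BAaaS model).
RCFG_TEXT = """
service = 'ba'                 # Service Model BAaaS
name    = ['vfpga-kmeans']     # vFPGA/User Design Name
vm      = ['vm1-pvm']          # VM-Instance Name
vfpga   = [1]                  # Number of vFPGAs
size    = [4]                  # vFPGA-Slots
memory  = [4000]               # DDR-Memory Size
boot    = ['idle']             # Initial vFPGA-State
design  = ['kmeans-quad.vrai'] # Initial Design
"""

# Assumption: 'ba' and 'rs' are taken from the slides, 'ra' is a guess for RAaaS.
KNOWN_SERVICES = {"rs", "ra", "ba"}


def parse_rcfg(text):
    """Parse 'key = literal' lines into a dict without executing any code."""
    config = {}
    for node in ast.parse(text).body:
        is_simple_assignment = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        if not is_simple_assignment:
            raise ValueError("only simple 'key = value' lines are supported")
        # literal_eval accepts only literals (strings, numbers, lists, ...).
        config[node.targets[0].id] = ast.literal_eval(node.value)
    return config


def check_rcfg(config, slots_per_fpga=6):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if config.get("service") not in KNOWN_SERVICES:
        problems.append("unknown service model")
    for size in config.get("size", []):
        if not 1 <= size <= slots_per_fpga:
            problems.append(f"vFPGA size {size} is outside 1..{slots_per_fpga}")
    return problems


config = parse_rcfg(RCFG_TEXT)
problems = check_rcfg(config)
```

Parsing with `ast` instead of `exec` keeps the description declarative: anything that is not a simple literal assignment is rejected rather than executed.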
[Figure: contents of a vRAI. Labels recovered from the figure: Host Application, vRAI, RCFG, vFPGA, vFPGA region, vFPGA-Image, vFPGA-Slots (FARs), Bitmask (for context relocation).]
II. RC3E System architecture: Description of a more complex vRAI
[Figure: a user connects via the Internet to VM 1 on Node 1 and VM 2 on Node 2; the VMs use vFPGAs on an FPGA.]
Example of an RCFG describing a vFPGA cluster with two host VMs.
service = 'rs' #Service Model RSaaS
name = ['cluster'] #vFPGA/User Design Name
vm = ['vm1', 'vm2'] #VM-Instance Name
vfpga = [2, 2] #Number of vFPGAs
size = [6, 6] #vFPGA-Slots
memory = [2000] #DDR-Memory Size
debug = 'csp' #Debug Interface
vif = ['ip=10.0.0.42', 'ip=10.0.0.45'] #IP-Address-Range
boot = ['idle'] #Initial vFPGA-State
II. RC3E System architecture: Interactions between components
[Figure: three sequence diagrams between the Cloud-Management node (A), a compute node with vFPGA resources (B) and the physical FPGA resource (C). Participants: Cloud-Management, Local Node-Management, Middleware, User-VM, FPGA-Hypervisor, vFPGA. Message labels recovered from the diagrams:
(I) Allocation of a vFPGA resource with additional VM: allocate vFPGA; system.rcfg; resources available?; register in database; assignment VM/vFPGA; start VM; start vFPGA; configure vFPGA; vFPGA ready; VM ready; status, VM-IP.
(II) Starting a vRAI package: start vRAI; start application; configure vFPGA; initialize vFPGA; initialize vCS; vFPGA status; configure HCS with vFPGA data; channel set-up; data channel; assign occupancy in database; status.
(III) Release of a vFPGA resource: release vFPGA; remove vFPGA; stop vFPGA; delete vFPGA; remove channels; remove assignment; update occupancy database; status.
An annotation marks the starting point for the RC2F virtualization.]
III. RC2F Virtualization: Virtualization of the FPGA
Concept for the virtualization of FPGAs (approach based on classical System-VMs):
- The user core is located directly (bare-metal) on the FPGA resources within a dynamically reconfigurable region, the virtual FPGA (vFPGA).
- The static part of the FPGA contains management structures and the physical interfaces.
- The concept is equivalent to traditional virtual machines:
Interface virtualization:
- External devices are accessed through paravirtualization within the FPGA hypervisor (VMM).
- The interfaces of the vFPGAs are the frontends, which are connected to the backend in the static region.
[Figure: native type-1 system (bare-metal) virtualization with the same ISA. The partially reconfigurable region holds vFPGA 0 to vFPGA N (DomU), each with a frontend interface. The static region holds the management/configuration (Dom0) and the hypervisor with backend interface and hardware interface, on top of the physical FPGA (FPGA resources, PCI endpoint, ...).]
III. RC2F Virtualization: Definitions
- A vFPGA is perceived by the user as a stand-alone resource with a dynamic number of hardware resources (slices, LUTs, registers, etc.).
- A vFPGA is mapped to vFPGA-Slots, the smallest possible physical regions with a fixed number of hardware resources.
- A vFPGA-Design is the hardware design within a vFPGA.
- A partial bitstream is called a vFPGA-Image; once it is assigned to a user, it becomes a vFPGA-Instance.
Distinctions in the term "Hypervisor":
- The management structure for the vFPGAs on their host system is called the RC2F Host-Hypervisor.
- The FPGA-Hypervisor is the management structure on the FPGA (Dom0).
[Figure: four vFPGA-Slots (0 to 3), each with its own frontend (0 to 3) connected to the RC2F infrastructure through partition pin regions (PPR). vFPGA 0 and vFPGA 1 are mapped onto the slots; the figure also marks the vFPGA-Design, the vFPGA-Image and the vFPGA-Instance.]
Provision of vFPGAs on the physical FPGA device:
- Multiple vFPGAs within vFPGA-Slots.
- The ICAP is used for dynamic partial reconfiguration.
- Direct management of the states of the vFPGAs on the FPGA within the vControl Unit (vCU).
Components of the infrastructure (FPGA-Hypervisor):
- Internal system bus used for the paravirtualization.
- Hypervisor Control Unit (HCU) to monitor the physical FPGA infrastructure.
- PCIe and Ethernet interfaces.
- Access to DDR3 memory on the FPGA board via page tables.
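The slides state only that DDR3 access goes through page tables. To illustrate the idea, here is a small sketch of a per-vFPGA page table that maps virtual page numbers to physical frames, so that two vFPGAs can use the same virtual addresses without touching each other's memory. The 1 MiB page size, the class `PageTable` and the helper `entries_needed` are assumptions for illustration and do not describe the RC2F memory core itself.

```python
PAGE_SIZE = 1 << 20  # assumed page size of 1 MiB, chosen for illustration


class PageTable:
    """Hypothetical per-vFPGA mapping of virtual pages to physical frames."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.frames = {}  # virtual page number -> physical frame number

    def map_page(self, vpn, pfn):
        self.frames[vpn] = pfn

    def translate(self, vaddr):
        """Translate a virtual byte address into a physical byte address."""
        vpn, offset = divmod(vaddr, self.page_size)
        if vpn not in self.frames:
            raise KeyError(f"virtual page {vpn} is not mapped")
        return self.frames[vpn] * self.page_size + offset


def entries_needed(memory_bytes, page_size=PAGE_SIZE):
    """Number of page-table entries needed to cover the memory (rounded up)."""
    return -(-memory_bytes // page_size)


# Two vFPGAs use the same virtual address but land in different frames.
vfpga0 = PageTable()
vfpga0.map_page(0, 4)
vfpga1 = PageTable()
vfpga1.map_page(0, 9)

addr0 = vfpga0.translate(0x10)
addr1 = vfpga1.translate(0x10)
entries_4gib = entries_needed(4 * 1024**3)
```

Smaller pages give finer-grained allocation but need more entries, which is the trade-off between page size and page-table memory.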
[Figure, slide "III. RC2F Virtualization: RTL": block diagram of the FPGA-Hypervisor (Dom0) on the FPGA board, connected to the host, the network and the DDR3 RAM.
- PCIe core: PCI-Express endpoint, PCI virtualization, Hypervisor Control Unit (HCU) with config space (HCS), system status, vFPGA control, infrastructure management, clocking and FPGA monitoring.
- Ethernet core: VLAN/IP.
- Memory core: memory controller and memory virtualization (page table).
- ICAP configuration with an encryption unit (AES, RSA, SHA); JTAG, flash, BBRAM.
- Reconfigurable areas vFPGA-Slot 0 to vFPGA-Slot N-1, each holding a vFPGA-Design (DomU) with a vControl Unit (vCU), a config space (vCS), and host stream and network packet channels.
- Widths labelled on the connections: 32, 128 and 256.
- Interface layers: hardware interface, backend interface, channel interface, frontend interface. The channel virtualization is marked as asymmetric or symmetric.]
III. RC2F Virtualization: Homogeneous vFPGA-Slots
- Establishment of vFPGA-Slots within the clock regions.
- Realization of homogeneous vFPGA-Slots on the physical FPGA to enable migration of vFPGA-Instances.
- The distribution is dictated by the inhomogeneous FPGA architecture.
[Figure: floorplan of the Virtex-7 XC7VX485T with six vFPGA-Slots (0 to 5) and their frontends (0 to 5), the RC2F infrastructure (FPGA-Hypervisor, PCIe, DDR; Ethernet + ICAP), the partition pin regions (PPR) and the inhomogeneous columns (BRAMs, slice registers / LUTs, DSPs).]
[Figure: utilization of the different vFPGA-Slots compared to the complete vFPGA-Slot. Y-axis: 0 and 80 % to 100 %; x-axis: vFPGA-Slot 0 to vFPGA-Slot 5 and the homogeneous vFPGA-Slot; series: slice registers/LUTs, block RAM tiles.]
[Figure: memory demand of the page table (kByte, 0 to 600) over the page size (MByte: 8, 4, 2, 1, 0.5, 0.25) for memory sizes of 4, 8, 16 and 32 GByte.]
III. RC2F Virtualization: Application scenario
[Figure, step 1: all six vFPGA-Slots are occupied. Slot 0: I 0 (BSMC8-vFPGA); slot 1: I 1 (Crypto-vFPGA); slot 2: I 2 (BSMC8-vFPGA); slot 3: I 3 (Crypto-vFPGA); slot 4: I 4 (Crypto-vFPGA); slot 5: I 5 (BSMC8-vFPGA).]
Example assignment within a simple scenario
- Full utilization of the physical FPGA by six single vFPGA-Instances.
- Release of individual instances by users.
- Migration to provide a larger contiguous area.
- Placement of a larger vFPGA-Instance (triple).
- Provision of the entire FPGA for a maximum-size (hexa) vFPGA-Instance.
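The scenario above can be replayed with a small occupancy model. This is only a sketch: the class `SlotMap`, the first-fit placement and the compaction strategy (move the highest single-slot instance into the lowest free slot) are assumptions chosen to reproduce the steps on the slide, not the placement algorithm used by RC3E/RC2F.

```python
from typing import List, Optional, Tuple


class SlotMap:
    """Hypothetical occupancy model of the vFPGA-Slots of one physical FPGA."""

    def __init__(self, n_slots: int = 6) -> None:
        self.slots: List[Optional[str]] = [None] * n_slots

    def find_free_run(self, size: int) -> Optional[int]:
        """Return the start index of the first run of `size` free slots."""
        run = 0
        for i, owner in enumerate(self.slots):
            run = run + 1 if owner is None else 0
            if run == size:
                return i - size + 1
        return None

    def place(self, instance: str, size: int) -> bool:
        """First-fit placement into contiguous slots; False if nothing fits."""
        start = self.find_free_run(size)
        if start is None:
            return False
        for i in range(start, start + size):
            self.slots[i] = instance
        return True

    def release(self, instance: str) -> None:
        self.slots = [None if o == instance else o for o in self.slots]

    def compact(self) -> List[Tuple[str, int, int]]:
        """Migrate single-slot instances downwards to close gaps.

        Returns the performed migrations as (instance, from_slot, to_slot).
        Multi-slot instances are left in place to keep the sketch simple.
        """
        moves = []
        while True:
            free = [i for i, o in enumerate(self.slots) if o is None]
            singles = [i for i, o in enumerate(self.slots)
                       if o is not None and self.slots.count(o) == 1]
            if not free or not singles or free[0] > singles[-1]:
                return moves
            src, dst = singles[-1], free[0]
            self.slots[dst], self.slots[src] = self.slots[src], None
            moves.append((self.slots[dst], src, dst))


fpga = SlotMap(6)
for name in ["I0", "I1", "I2", "I3", "I4", "I5"]:
    fpga.place(name, 1)                 # step 1: six single instances
for name in ["I0", "I3", "I4"]:
    fpga.release(name)                  # step 2: users release instances
fits_before = fpga.place("I6", 3)       # no contiguous triple region yet
migrations = fpga.compact()             # step 3: migration
fits_after = fpga.place("I6", 3)        # step 4: triple instance fits
```

In this run a single migration (I5 from slot 5 to slot 0) is enough to free three contiguous slots, which matches the sequence shown on the slide.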
[Figure, step 2: instances I 0, I 3 and I 4 have been released. Slot 0: free; slot 1: I 1 (Crypto-vFPGA); slot 2: I 2 (BSMC8-vFPGA); slot 3: free; slot 4: free; slot 5: I 5 (BSMC8-vFPGA).]
[Figure, step 3: I 5 has been migrated to slot 0. Slot 0: I 5 (BSMC8-vFPGA); slot 1: I 1 (Crypto-vFPGA); slot 2: I 2 (BSMC8-vFPGA); slots 3 to 5: free.]
[Figure, step 4: the larger instance I 6 (32x32 Matrix-vFPGA) is placed starting at slot 3; slots 0 to 2 are unchanged.]
[Figure, step 5: instance I 7 (k-Means-vFPGA) is shown as the maximum-size vFPGA-Instance.]
Results of the prototypical implementation of the RC3E & RC2F
Hardware prototype @ TU Dresden
[Figure: prototype setup with a management node and two compute nodes connected by a Gigabit Ethernet network.
- Compute-Node 0: Intel Core i7-2600K @ 3.40 GHz; Virtex-7 (VC707) and Virtex-6 (ML605); links labelled PCIe Gen2 x4 and PCIe Gen2 x8; XEN Hypervisor 4.6.5; local node management on Ubuntu 16.04.2 (Dom0), kernel 4.10.0; VMs (DomU).
- Compute-Node 1: Intel Core i5-4690 @ 3.50 GHz; Kintex-7 (KC705) and Virtex-5 (ML505); links labelled PCIe Gen2 x1 and PCIe Gen2 x4; XEN Hypervisor 4.6.5; local node management on Ubuntu 16.04.2 (Dom0), kernel 4.10.0; VMs (DomU).]
- The cloud prototype includes different FPGAs.
- It is used as a system for development (RSaaS) and teaching (RAaaS).
- For the evaluation of a larger system (BAaaS), an RC3E simulation was realized.
Results of the prototypical implementation: Migration (Virtex-7 XC7VX485T)
Chart data by size of the (aggregated) vFPGAs:
                        Single   Dual    Triple  Quad    Quint   Hexa    Full
Relocate mask (MByte)   4.8      9       13      17.3    21.3    25.3    20.3
Relocate time (s)       0.0528   0.078   0.102   0.1278  0.1518  0.1758  0
Migration               1.72     3.15    4.51    5.98    7.34    8.70    0
[Figure: memory demand in MByte (log scale, 1 to 1,000) over the size of the (aggregated) vFPGAs (Single to Hexa). Series: vFPGA-Image/Instance; vRAI (homogeneous vFPGAs).]
[Figure: (re-)configuration, readback and migration. Time in seconds (log scale, 0.01 to 10) over the size of the (aggregated) vFPGAs (Single to Hexa). Series: configuration of a vFPGA-Image; readback, migration and relocation of a vFPGA-Instance.]
Further chart data (labels: Reg, LUT, BRAM, DSP; vFPGAs active):
Demonstrators   (I) a   (I) a   (II)    (III)   (IV)   (V)   Mean
Slots           1       3       1       4       1      6     2.66666666666667
Scenario        I       II      III     IV      V
vFPGA-Image     0.264   0.132   1.7656  0.085   4.20   6.44931666666667
Runtime         60      60      60      60      60     300
Results of the prototypical implementation: Utilization (Virtex-7 XC7VX485T)
[Figure: utilization of the FPGA resources (0 % to 80 %) for slice registers, Block-RAM tiles, slice LUTs and DSPs, split into three categories: partially reconfigurable areas containing the vFPGA-Slots; static area containing the RC2F infrastructure and 6 vFPGA-Frontends; areas not usable due to inhomogeneity. Percentage labels in extraction order: 2.43 %, 24.71 %, 72.86 %; 11.09 %, 32.78 %, 58.50 %; 4.08 %, 34.76 %, 61.17 %; 8.92 %, 32.78 %, 58.30 %.]
[Figure: overall split of the RC2F prototype (Virtex-7 XC7VX485T) into the same three categories, with the labels 62.34 %, 6.59 % and 31.07 %. The estimation for a productive cloud (Virtex UltraScale+ XCVU9P) is annotated with "x 4.55".]
Table 2: resources of the RC2F prototype (Virtex-7 XC7VX485T)
Resource                infra    inhomogeneous  homogeneous
Slice LUTs              90560    32600          28400
LUT as Logic            90560    32600          28400
LUT as Memory           36744    14400          13200
Slice Registers         181120   65200          56800
Register as Flip Flop   181120   65200          56800
Register as Latch       181120   65200          56800
F7 Muxes                45280    16300          14200
F8 Muxes                22640    8150           7100
Slice                   22640    8150           7100
SLICEL                  13454    4550           3800
SLICEM                  9186     3600           3300
LUT Flip Flop Pairs     90560    32600          28400
Block RAM Tile          358      100            100
RAMB36/FIFO             358      100            100
RAMB18                  716      210            200
DSPs                    692      340            340
GTXE2_COMMON            2
GTXE2_CHANNEL           8
IBUFDS_GTE2             1
BUFGCTRL                7
MMCME2_ADV              2
ICAPE2                  1
PCIE_2_1                1
Results of the prototypical implementation: Virtex UltraScale+
[Figure: split of the FPGA resources into static area containing the RC2F infrastructure and frontends, areas not usable due to inhomogeneity, and partially reconfigurable areas containing the vFPGA-Slots. The labels 62.34 %, 6.59 % and 31.07 % appear with the RC2F prototype (Virtex-7 XC7VX485T); the labels 16.52 %, 83.04 % and 0.44 % appear with the estimation for a productive cloud (Virtex UltraScale+ XCVU9P, x 4.55). Labels are given in extraction order.]
Table 1: estimation for the Virtex UltraScale+ XCVU9P
Resource                inhomogeneous  homogeneous  unused   15 x vFPGA   infra       complete
CLB LUTs                68,160         63,360       4,800    950,400      188,640     1,182,240
LUT as Logic            68,160         63,360       4,800    950,400      188,640     1,182,240
LUT as Memory           35,040         30,240       4,800    453,600      95,040      591,840
CLB Registers           136,320        126,720      9,600    1,900,800    377,280     2,364,480
Register as Flip Flop   136,320        126,720      9,600    1,900,800    377,280     2,364,480
Register as Latch       136,320        126,720      9,600    1,900,800    377,280     2,364,480
CARRY8                  8,520          7,920        600      118,800      23,580      147,780
F7 Muxes                34,080         31,680       2,400    475,200      94,320      591,120
F8 Muxes                17,040         15,840       1,200    237,600      47,160      295,560
F9 Muxes                8,520          7,920        600      118,800      23,580      147,780
CLB                     8,520          7,920        600      118,800      23,580      147,780
CLBL                    4,140          4,140        0        62,100       11,700      73,800
CLBM                    4,380          3,780        600      56,700       11,880      73,980
LUT Flip Flop Pairs     68,160         63,360       4,800    950,400      188,640     1,182,240
Block RAM Tile          120            120          0        1,800        360         2,160
RAMB36/FIFO             120            120          0        1,800        360         2,160
RAMB18                  240            240          0        3,600        720         4,320
URAM                    64             64           0        960          720         960
DSPs                    408            408          0        6,120        360         6,840
Bonded IOB              52             52           0        780          360         832
HPIOB_M                 24             24           0        360          28          384
HPIOB_S                 24             24           0        360          7           384
HPIOB_SNGL              4              4            0        60           6           64
HPIOBDIFFINBUF          48             48           0        720          14          720
HPIOBDIFFOUTBUF         48             48           0        720          14          720
BITSLICE_CONTROL        16             16           0        240          3           240
BITSLICE_RX_TX          104            104          0        1,560        3           1,560
BITSLICE_TX             16             16           0        240          11,520      240
RIU_OR                  8              8            0        120          12          120
GLOBAL CLOCK BUFFERs    80             80           0        1,200        3           1,560
BUFGCE                  48             48           0        720          3           720
BUFGCE_DIV              8              8            0        120          6           120
BUFG_GT                 24             24           0        360          3           720
Sum                     735,136        680,632      54,000   10,209,480   2,031,120   12,726,240
Share of the complete sum: unused 0.004243, 15 x vFPGA 0.802239, infra 0.159601.
V7-485: 2,794,220. Diff: 5. Unlabelled chart values: 30,191; 6,709; 92,424.
Results of the RC3E simulation with a synthetic workload (BAaaS)
[Figure: synthetic workload with 4,981 requests over 150 minutes, with the phases a, b and c marked. X-axis: time in minutes; y-axis: incoming requests.]
[Figure: assignment of vFPGA-Slots in a simulation with additional migration to reduce fragmentation (+FPGAs +RC2F +migration), with the phases a to d marked. X-axis: time in minutes; y-axis: number of occupied vFPGA-Slots. Series: number of vFPGA-Slots, allocated compute nodes, occupied vFPGA-Slots, vFPGA migration.]
Results of the RC3E Simulation (BAaaS)
- By using two Virtex-7 FPGAs per node, the energy demand of the cloud can be reduced by 30.65 %.
- RC2F virtualization reduces the energy demand of the cloud system to 24.43 % of the baseline.
- An additional migration of vFPGAs saves a further 1.45 % compared to a simple RC2F virtualization.
Scenario I (synthetic): 4,981 requests
                          Basis (without FPGAs)  +FPGAs    +RC2F     +Migration
Compute nodes             357                    132       26        24
Utilization of the FPGAs  -                      27.34 %   78.14 %   85.07 %
Energy demand (kWh)       35.37                  24.53     8.64      8.13
Energy demand (%)         100 %                  69.35 %   24.43 %   22.91 %
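The relative energy figures can be rechecked from the kWh row of the table. The following sketch does only that arithmetic; the names `energy_kwh` and `relative_demand` are illustrative. Because it starts from the rounded kWh values, the recomputed numbers need not match the table exactly: the first three columns agree, while the last column comes out at about 22.99 % instead of the listed 22.91 %, a difference the slides do not explain.

```python
# Energy demand per scenario in kWh, copied from the table above.
energy_kwh = {
    "Basis (without FPGAs)": 35.37,
    "+FPGAs": 24.53,
    "+RC2F": 8.64,
    "+Migration": 8.13,
}


def relative_demand(values, baseline="Basis (without FPGAs)"):
    """Express each energy value as a percentage of the baseline."""
    base = values[baseline]
    return {name: round(100.0 * v / base, 2) for name, v in values.items()}


relative = relative_demand(energy_kwh)

# Saving of the "+FPGAs" configuration compared to the baseline, in percent.
saving_fpgas = round(100.0 - relative["+FPGAs"], 2)

for name, percent in relative.items():
    print(f"{name:24s} {percent:6.2f} %")
```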
Final considerations, summary and outlook
Outlook: Aspects for our Future FPGA Cloud (with virtualization?)
- Provision of reconfigurable hardware in a cloud requires an abstraction from the physical hardware and a flexible environment (RC3E & RC2F).
- A fine-grained virtualization with multiple users on the same physical device is possible and can be reasonable for the background acceleration of suitable services in a productive cloud (BAaaS),
- ... but it is inefficient for most of the typical scenarios in our research context at the HZDR.
[Figure: the use cases from the introduction. Store Data (Filter & Compression); Post-Process Data (Deep Learning & Analysis); HW Development (Prototyping & CI).]
https://www.wired.com/2016/09/microsoft-bets-future-chip-reprogram-fly/
https://fudzilla.com/news/memory-and-storage/46283-alibaba-gets-xilinx-fpga-for-its-f3-cloud
https://techcrunch.com/2018/04/18/facebook-has-a-new-job-posting-calling-for-chip-designers/
https://www.forbes.com/sites/moorinsights/2017/09/27/amazon-and-xilinx-deliver-new-fpga-solutions/#3f880862370a
https://thenewstack.io/developers-fpgas-cloud/
https://www.computerworld.com.au/article/640970/microsoft-takes-step-towards-delivering-fpgas
Appendix
Outlook: Aspects for Future FPGA Architectures
- SoC-FPGA with a static RC2F-like infrastructure: development of a static area for the administration with all necessary interfaces to the outside world.
- Structural changes within the FPGA architecture: establishment of homogeneous areas with their own clock regions on the FPGAs.
- Implementation of a security concept: a security concept based on a trusted authority to provide a verifiable RC2F infrastructure.
https://www.wired.com/2016/09/microsoft-bets-future-chip-reprogram-fly/
vFPGAs and the Host
[Figure: connection between the host and the FPGA. Labels recovered from the figure:
- Host, host hypervisor: Dom0 (management: vFPGA configuration, user assignment and access management) with hardware driver and backend interface; device files under /dev: rc2f_system_read, rc2f_system_write, rc2f_config_read, rc2f_config_write, rc2f_hcs. DomU (VM 0 to VM N), each with a frontend interface and the device files rc2f_read, rc2f_write and rc2f_vcs under /dev. The devices are assigned through an inter-domain channel.
- FPGA, FPGA hypervisor: Dom0 (management) with hardware interface and backend interface. DomU (vFPGA 0 to vFPGA N), each with a frontend interface, memory and the channels rc2f_vcs, rc2f_read and rc2f_write.
- Legend: physical connection; virtual connection; FIFO memory; ring buffer.]
Virtex UltraScale+