l fine-grained reconfigurable computing:...

20
6.888 PARALLEL AND HETEROGENEOUS COMPUTER ARCHITECTURE SPRING 2013 LECTURE 16 FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAS JOEL EMER AND DANIEL SANCHEZ

Upload: others

Post on 24-Apr-2020

14 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

6.888 PARALLEL AND HETEROGENEOUS COMPUTER ARCHITECTURE SPRING 2013

LECTURE 16 FINE-GRAINED RECONFIGURABLE

COMPUTING: FPGAS

JOEL EMER AND DANIEL SANCHEZ

Page 2: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

Field Programmable Gate Arrays (FPGA)

And

00 0

01 0

10 0

11 1

LUT  Latch  

RAM   .....

Or

00 0

01 0

10 1

11 1

Page 3: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

Evolution of FPGA applications 3

¨  Logic Replacement ¤ Low design cost and effort ¤ Low volume applications ¤ Often replaced with ASIC as volume increases

¨  Algorithmic computation ¤ Offloads a general purpose processor ¤ Used for multiple algorithms ¤ ASIC replacement not expected

6.888 Spring 2013 - Sanchez and Emer – L16

Page 4: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

Benefits of FPGA computation 4

¨ Custom operations/data types – custom operations/data types

¨  Flexible flow control - control flow based on arbitrary state machines ¨  Local state access - local state elements allows parallel state access ¨  Fine grain parallelism - replicated logic permits easy parallelism

¨  Custom communication - explicit direct inter-module communication ¨  Reduced memory references – more direct reuse of data

¨  Better power efficiency – more activity directly applied to computation ¨  Better area efficiency – more area directly applied to computation

6.888 Spring 2013 - Sanchez and Emer – L16

Page 5: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

QPI-attached FPGA platform Intel QuickAssist QPI-based FPGA Accelerator Platform (QAP)

Four Socket QuickAssist Platform Topology

QPI Links

P M

P M

M

P M

CS

I/O I/O

CS

I/O I/O

QPI Links

Altera Stratix IV Module

Xilinx Virtex 6 Module

Intel® Xeon processor 7000 series

Accelerator Hardware Module (AHM)

FPGA

QuickAssist FPGA

PCIe attached FPGA

6.888 Spring 2013 - Sanchez and Emer – L16

5

Page 6: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

Reed Solomon Results

Xilinx IP Catapult-C

Bluespec

Equivalent Gate Count 297,409 596,730 267,741

Frequency (MHz) 145.3 91.2 108.5

Steady State (Cycles/Block)

660 2073 276

Data rate (Mbps) 392.8 89.7 701.3

Lower is better Higher is better

WiMAX requirement is to support a throughput of 134Mbps

Source: MIT, Abhinav Agarwal, Alfred Ng – CSG

Page 7: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

BORPH

¨  Berkeley Operating system for ReProgrammable Hardware

¨  OS for reconfigurable computers ¤  Treats reconfigurable hardware as computational

resources ¨  UNIX interface to HW designs

¤  Familiar to both software and hardware engineers ¤ Design language independent

¨  Goal:

Make FPGA-based reconfigurable computers easy to use

6.888 Spring 2013 - Sanchez and Emer – L16

7

Page 8: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

6.888 Spring 2013 - Sanchez and Emer – L16

Conventional View of FPGA Systems

User Process (SW)

OS Kernel

User Process (SW)

User Process (SW)

User Library

Hardware Platform (Network, UART, HD…)

file IPC

Device Driver

Softw

are

Har

dwar

e

socket pipe

FPGA “coprocessor”

Master-Slave Relationship

6.888 Spring 2013 - Sanchez and Emer – L16

8

Page 9: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

6.888 Spring 2013 - Sanchez and Emer – L16

BORPH Layers User Process

(SW) User Process

(SW) User Process

(SW)

Hardware Platform (Network, UART, HD…)

Device Driver

User Process (HW)

User Process (HW)

Hardware User Library

BORPH Kernel

Softw

are

Har

dwar

e

User Library

file IPC socket pipe

ioreg virtual

file

Peer-to-Peer Relationship

6.888 Spring 2013 - Sanchez and Emer – L16

9

Page 10: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

Overview of BORPH Concepts

¨  Hardware process ¨  Hardware syscall interface

¨  Interacting with an FPGA ¤  ioreg virtual file interface ¤  Hardware file I/O

SW SW SW

Hardware Platform (Network, UART, HD…)

Device Driver

HW HW

Hardware User Library

BORPH Kernel Softw

are

Har

dwar

e

User Library file IPC socket pipe

ioreg

6.888 Spring 2013 - Sanchez and Emer – L16

Page 11: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

Hardware Process (1)

¨  An executing instance of a hardware design ¤  SW: An executing instance of a

program

¨  Normal UNIX process ¤  Has pid, check status with ps, kill, etc

¨  Unit of management ¨  Created when a BORPH Object

File (BOF) file is exec-ed ¤  Kernel selects and configure

hardware region automatically

6.888 Spring 2013 - Sanchez and Emer – L16

SW SW SW

Hardware Platform (Network, UART, HD…)

Device Driver

HW HW

Hardware User Library

BORPH Kernel Softw

are

Har

dwar

e

User Library file IPC socket pipe

ioreg

11

Page 12: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

Benefits of UNIX Process Model

¨  Very easy for user to reason about ¨  Enable FPGA designs to become active component of the

system ¤  e.g. an FIR filter: ¤ Conventional: a passive entity where software sends/receives data ¤  BORPH: an active entity that pulls/pushes data as needed

¨  Enable multiple instances of the same FPGA design running in the system ¤ No more fixed accelerator concept ¤ Works well in true reconfigurable computing systems

6.888 Spring 2013 - Sanchez and Emer – L16

12

Page 13: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

SW SW SW

Hardware Platform (Network, UART, HD…)

Device Driver

HW HW

Hardware User Library

BORPH Kernel Softw

are

Har

dwar

e

User Library file IPC socket pipe

ioreg

HW Processes I/O ¨  I/O managed by kernel

¤  Similar to SW

¨  Hide details from users ¤  e.g. HW-SW, HW-HW UNIX

file pipe

¨  Standard UNIX I/O mechanism ¤  File I/O, pipe, signal

¨  HW specific service ¤  ioreg virtual file system

Don’t ask “How do I … in HW”. Think: “What if it were SW?”

6.888 Spring 2013 - Sanchez and Emer – L16 13

Page 14: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

ioreg Virtual File System ¨  Maps user defined hardware constructs as virtual files under the process’s /

proc/<pid>/hw/ioreg/ directory ¤  Single word register ¤  Memory: On-chip + Off-chip ¤  FIFO

¨  Example: ¤  /proc/123/hw/ioreg/COUNTERVAL

¨  ioreg information embedded in the executing BOF file ¨  read and write system calls translated to message packet by the kernel

¤  Any UNIX program can communicate with hardware processes n  Shell: echo 1 > /proc/123/hw/ioreg/enable n  C: MEM_FILE = n  fopen(“/proc/123/hw/ioreg/MyMemory”, “r”); n  fread(swbuf, 1, MEM_SIZE, MEM_FILE); n  Python, Java, etc…

6.888 Spring 2013 - Sanchez and Emer – L16

14

Page 15: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

Hardware File I/O ¨  Access to the general file system from hardware processes ¨  Debug by printing

¤  printf ¨  Read test vectors, record output

¨  SW/HW processes chained by file pipe

Baseband Process A/D Analog

Frontend Upper Layer

Decode Resize Edge

Detect Encode

video.in video.out

bash$ decode video.in | resize | edgdet.bof | encode > video.out

bash$ receiver.bof < file.in > file.out

6.888 Spring 2013 - Sanchez and Emer – L16

15

Page 16: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

Latency-Insensitive Design: A Higher Semantic

¨  Inter-module communication by latency insensitive channels ¤  Changing the timing behavior of a module does not affect functional correctness of

the program

¨  Many HW designs use this methodology ¤  Improved modularity ¤  Simplified design-space exploration

¨  Implemented with guarded FIFOs in current RTLs

Control  

Timing Partition

Exe  Decode  Fetch  

FPGA

Functional Partition

Exe  Decode  Fetch  

Control Partition

16

Page 17: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

FPGA

FPGA1

FPGA0

Latency-Insensitive Design: A Higher Semantic

Timing Partition

Exe  Decode  Fetch  

Functional Partition

Exe  Decode  Fetch  

Control  

Control Partition

17 Behavior of LI channels does not affect functional correctness.

Page 18: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

Latency-Insensitive Design: A Higher Semantic

¨  There are many FIFOs in the design ¤  It may not be safe to modify some of them

¨  Compilers see only wires and registers ¤  Reasoning about cycle accuracy is difficult

Control  

Timing Partition

Exe  Decode  Fetch  

FPGA

Functional Partition

Exe  Decode  Fetch  

Control Partition

18 But the programmer knows about the LI property…

Page 19: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

A Syntax for LI Design

¨  Programmer needs to differentiate LI channels from normal FIFOs

¨  Latency-Insensitive Send/Recv endpoints ¤  Implementation chosen by

compiler ¤  FIFO order ¤  Guaranteed delivery

¨  Explicit programmer contract ¤  Unspecified buffering &

unspecified latency ¤  Programmer responsible for

correct annotation

module mkTimeP; Send#(Inst) send <- mkSend(“Decode”); endmodule

module mkFuncP; Recv#(Inst) recv <- mkRecv(“Decode”); endmodule

mkFuncP RTL

mkTimeP RTL

19 Easy to use – often a textual substitution!

Page 20: L FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAScourses.csail.mit.edu/6.888/spring13/lectures/L16-fpgas.pdf · Benefits of FPGA computation 4 ! Custom operations/data types – custom

Pla8orm_1  

Pla8orm_1  Comms  

Pla8orm_0  

Pla8orm_0  Comms  

Connected User Application

mkApplicaAon  

mkB_Stub

mkC

mkA_Stub

mkApplicaAon_Stub  

mkB

mkC_Stub

mkA

Library Generated User Key: 6.888 Spring 2013 - Sanchez and Emer – L16

20