l16-fpgas

Upload: omairakhtar12345

Post on 14-Apr-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/30/2019 L16-fpgas

    1/20

    6.888 PARALLELAND HETEROGENEOUS COMPUTER ARCHITECTURE

    SPRING 2013

    LECTURE 16FINE-GRAINED RECONFIGURABLE

    COMPUTING: FPGAS

    JOEL EMERAND DANIEL SANCHEZ

  • 7/30/2019 L16-fpgas

    2/20

    Field Programmable Gate Arrays (FPGA)

    And

    00 0

    01 0

    10 0

    11 1

    LUTLatch

    RAM

    .....

    Or

    00 0

    01 0

    10 1

    11 1

  • 7/30/2019 L16-fpgas

    3/20

    Evolution of FPGA applications3

    Logic ReplacementLow design cost and effortLow volume applicationsOften replaced with ASIC as volume increases

    Algorithmic computationOffloads a general purpose processorUsed for multiple algorithmsASIC replacement not expected

    6.888 Spring 2013 - Sanchez and Emer L16

  • 7/30/2019 L16-fpgas

    4/20

    Benefits of FPGA computation4

    Custom operations/data typescustom operations/data types Flexible flow control- control flow based on arbitrary state machines Local state access- local state elements allows parallel state access Fine grain parallelism- replicated logic permits easy parallelism Custom communication- explicit direct inter-module communication Reduced memory references more direct reuse of data Better power efficiencymore activity directly applied to computation Better area efficiencymore area directly applied to computation

    6.888 Spring 2013 - Sanchez and Emer L16

  • 7/30/2019 L16-fpgas

    5/20

    QPI-attached FPGA platformIntel QuickAssist QPI-based FPGA AcceleratorPlatform (QAP)

    Four Socket QuickAssist Platform

    Topology

    QPI

    Links

    PM

    PM

    M

    P M

    CS

    I/O I/O

    CS

    I/O I/O

    QPI

    Links

    Altera Stratix IVModule

    Xilinx Virtex 6 Module

    Intel Xeon processor 7000 series

    Accelerator HardwareModule (AHM)

    FPGA

    QuickAssist FPGA

    PCIe attached FPGA

    6.888 Spring 2013 - Sanchez and Emer L16

    5

  • 7/30/2019 L16-fpgas

    6/20

    Reed Solomon Results

    Xilinx IP Catapult-C

    Bluespec

    EquivalentGate Count 297,409 596,730 267,741

    Frequency(MHz)

    145.3 91.2 108.5

    Steady State(Cycles/Block)

    660 2073 276

    Data rate(Mbps)

    392.8 89.7 701.3

    Lower is better Higher is better

    WiMAX requirement is to support a throughput of 134Mbps

    Source: MIT, Abhinav Agarwal, Alfred Ng CSG

  • 7/30/2019 L16-fpgas

    7/20

    BORPH

    Berkeley Operating system for ReProgrammableHardware

    OS for reconfigurable computers Treats reconfigurable hardware as computational

    resources

    UNIX interface to HW designs Familiar to both software and hardware engineers Design language independent

    Goal:

    Make FPGA-based reconfigurable computerseasy to use

    6.888 Spring 2013 - Sanchez and Emer L16

    7

  • 7/30/2019 L16-fpgas

    8/20

    6.888 Spring 2013 - Sanchez and Emer L16

    Conventional View of FPGA Systems

    User Process(SW)

    OS Kernel

    User Process(SW)

    User Process(SW)

    User Library

    Hardware Platform

    (Network, UART, HD)

    file IPC

    Device Driver

    Softw

    are

    Hardware

    socketpipe

    FPGA coprocessor

    Master-Slave

    Relationship

    6.888 Spring 2013 - Sanchez and Emer L16

    8

  • 7/30/2019 L16-fpgas

    9/20

    6.888 Spring 2013 - Sanchez and Emer L16

    BORPH LayersUser Process

    (SW)User Process

    (SW)User Process

    (SW)

    Hardware Platform(Network, UART, HD)

    Device Driver

    User Process(HW)

    User Process(HW)

    Hardware User Library

    BORPH KernelSoftw

    are

    Hardware

    User Library

    fileIPC socketpipe

    ioregvirtual

    file

    Peer-to-Peer

    Relationship

    6.888 Spring 2013 - Sanchez and Emer L16

    9

  • 7/30/2019 L16-fpgas

    10/20

    Overview of BORPH Concepts

    Hardware process Hardware syscall interface

    Interacting with an FPGA ioreg virtual file interface Hardware file I/O

    SW SW SW

    Hardware Platform(Network, UART, HD)

    Device Driver

    HW HW

    Hardware User Library

    BORPH KernelSoftware

    Hardware

    User Library

    fileIPC socketpipe

    ioreg

    6.888 Spring 2013 - Sanchez and Emer L16

  • 7/30/2019 L16-fpgas

    11/20

    Hardware Process (1)

    An executing instance of ahardware design

    SW: An executing instance of aprogram

    Normal UNIX process Has pid, check status withps,kill, etc

    Unit of management Created when a BORPH Object

    File (BOF) file is

    exec-ed Kernel selects and configure

    hardware region automatically

    6.888 Spring 2013 - Sanchez and Emer L16

    SW SW SW

    Hardware Platform(Network, UART, HD)

    Device Driver

    HW HW

    Hardware User Library

    BORPH KernelSoftware

    Hardware

    User Library

    fileIPC socketpipe

    ioreg

    11

  • 7/30/2019 L16-fpgas

    12/20

    Benefits of UNIX Process Model

    Very easy for user to reason about Enable FPGA designs to become active component of the

    system

    e.g. an FIR filter:

    Conventional: apassive entity where software sends/receives data BORPH: an active entity that pulls/pushes data as needed

    Enable multiple instances of the same FPGA design runningin the system No more fixed accelerator concept Works well in true reconfigurable computing systems

    6.888 Spring 2013 - Sanchez and Emer L16

    12

  • 7/30/2019 L16-fpgas

    13/20

    SW SW SW

    Hardware Platform(Network, UART, HD)

    Device Driver

    HW HW

    Hardware User Library

    BORPH KernelSoftware

    Hardware

    User Library

    fileIPC socketpipe

    ioreg

    HW Processes I/O

    I/O managed by kernel Similar to SW

    Hide details from users e.g. HW-SW, HW-HW UNIX

    file pipe

    Standard UNIX I/Omechanism

    File I/O, pipe, signal HW specific service

    ioreg virtual file system

    Dont ask How do I in HW.Think: What if it were SW?

    6.888 Spring 2013 - Sanchez and Emer L16 13

  • 7/30/2019 L16-fpgas

    14/20

    ioreg Virtual File System

    Maps user defined hardware constructs as virtual files under the processs /proc//hw/ioreg/ directory Single word register Memory: On-chip + Off-chip FIFO

    Example: /proc/123/hw/ioreg/COUNTERVAL

    ioreg information embedded in the executing BOF file readandwrite system calls translated to message packet by the kernel

    Any UNIX program can communicate with hardware processesn Shell: echo 1 > /proc/123/hw/ioreg/enablen

    C:MEM_FILE =n fopen(/proc/123/hw/ioreg/MyMemory, r);n fread(swbuf, 1, MEM_SIZE, MEM_FILE);n Python, Java, etc

    6.888 Spring 2013 - Sanchez and Emer L16

    14

  • 7/30/2019 L16-fpgas

    15/20

    Hardware File I/O

    Access to the general file system from hardware processes Debug by printing

    printf Read test vectors, record output

    SW/HW processes chained by file pipe

    BasebandProcess

    A/DAnalogFrontendUpperLayer

    Decode ResizeEdge

    DetectEncode

    video.in video.out

    bash$decode video.in | resize | edgdet.bof | encode > video.out

    bash$receiver.bof < file.in > file.out

    6.888 Spring 2013 - Sanchez and Emer L16

    15

  • 7/30/2019 L16-fpgas

    16/20

    Latency-Insensitive Design: A Higher Semantic

    Inter-module communication by latency insensitive channels Changing the timing behavior of a module does not affect functional correctness of

    the program

    Many HW designs use this methodology Improved modularity Simplified design-space exploration

    Implemented with guarded FIFOs in current RTLs

    Control

    Timing Partition

    ExeDecodeFetch

    FPGA

    Functional Partition

    ExeDecodeFetch

    Control Partition

    16

  • 7/30/2019 L16-fpgas

    17/20

    FPGA

    FPGA1

    FPGA0

    Latency-Insensitive Design: A Higher Semantic

    Timing Partition

    ExeDecodeFetch

    Functional Partition

    ExeDecodeFetch

    Control

    Control Partition

    17Behavior of LI channels does not affect functional correctness.

  • 7/30/2019 L16-fpgas

    18/20

    Latency-Insensitive Design: A Higher Semantic

    There are many FIFOs in the design It may not be safe to modify some of them

    Compilers see only wires and registers Reasoning about cycle accuracy is difficult

    Control

    Timing Partition

    ExeDecodeFetch

    FPGA

    Functional Partition

    ExeDecodeFetch

    Control Partition

    18But the programmer knows about the LI property

  • 7/30/2019 L16-fpgas

    19/20

    A Syntax for LI Design

    Programmer needs todifferentiate LI channels fromnormal FIFOs

    Latency-Insensitive Send/Recvendpoints Implementation chosen by

    compiler

    FIFO order Guaranteed delivery

    Explicit programmer contract Unspecified buffering &

    unspecified latency

    Programmer responsible forcorrect annotation

    module mkTimeP;

    Send#(Inst) send

  • 7/30/2019 L16-fpgas

    20/20

    Pla8orm_1

    Pla8orm_1

    Comms

    Pla8orm_0

    Pla8orm_0

    Comms

    Connected User Application

    mkApplicaon

    mkB_Stub

    mkC

    mkA_Stub

    mkApplicaon_Stub

    mkB

    mkC_Stub

    mkA

    Library GeneratedUserKey: 6.888 Spring 2013 - Sanchez and Emer L16

    20