creating an agile hardware flo€¦ · agile hardware design is possible! we produced 2 chips with...

32
Creating An Agile Hardware Flow Stanford AHA! Agile Hardware Center August 18, 2019 Stanford AHA! 1/29

Upload: others

Post on 02-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Creating An Agile Hardware Flow

Stanford AHA! Agile Hardware Center

August 18, 2019

Stanford AHA! 1/29

Page 2: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

How hardware design is done today

Study APP

and DSE

Design

hardware

Write

software

Stanford AHA! 2/29

Page 3: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

How hardware design is done today

Study APP

and DSE

Design

hardware

Write

software

Verilog, VHDL, SystemVerilog...

Stanford AHA! 3/29

Page 4: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

How hardware design is done today

Study APP

and DSE

Design

hardware

Write

software

Stanford AHA! 4/29

Page 5: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

This is a waterfall approach to design

We have a linear staged design flow:

1. create a specification,

2. hardware design,

3. software design.

Problems:

◮ Change of application requirements.

◮ Incomplete knowledge/understanding of the problem.

Stanford AHA! 5/29

Page 6: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Is there an agile approach to hardware design?

Page 7: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Requirements of an agile hardware/software design approach

◮ An end-to-end system supporting continuous integration/deployment

◮ From applications to a running hardware/software system.

◮ Evolve that system to make it more efficient.

Applications

HW + SW

Compiler

Execution

Bug Fixes and

Evaluation

CI/CD

Stanford AHA! 7/29

Page 8: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Requirements of an agile hardware/software design approach

◮ An end-to-end system supporting continuous integration/deployment

◮ From applications to a running hardware/software system.

◮ Evolve that system to make it more efficient.

Applications

HW + SW

Compiler

Execution

Bug Fixes and

Evaluation

CI/CD

Stanford AHA! 8/29

Page 9: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Requirements of an agile hardware/software design approach

◮ An end-to-end system supporting continuous integration/deployment

◮ From applications to a running hardware/software system.

◮ Evolve that system to make it more efficient.

Applications

HW + SW

Compiler

Execution

Bug Fixes and

Evaluation

CI/CD

Stanford AHA! 8/29

Page 10: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Requirements of an agile hardware/software design approach

◮ An end-to-end system supporting continuous integration/deployment

◮ From applications to a running hardware/software system.

◮ Evolve that system to make it more efficient.

Applications

HW + SW

Compiler

Execution

Bug Fixes and

Evaluation

CI/CD

Stanford AHA! 8/29

Page 11: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Requirements of an agile hardware/software design approach

◮ An end-to-end system supporting continuous integration/deployment

◮ From applications to a running hardware/software system.

◮ Evolve that system to make it more efficient.

Applications

HW + SW

Compiler

Execution

Bug Fixes and

Evaluation

CI/CD

Stanford AHA! 8/29

Page 12: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Evolving software applications

◮ With a fixed hardware, we can continuously optimize the applications

◮ And the compiler.

◮ Problem: we don’t have fixed hardware when designing accelerators in an agileapproach.

Applications

Compiler

System Stack

Bug Fixes and

Evaluation

X86/GPU

Stable

Stanford AHA! 9/29

Page 13: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Design space exploration

◮ If we have a fixed set of applications.

◮ Extract key application parameters.

◮ Problem: applications and compilers are always changing to be more performant.

Applications

Design Space

Exploration

Bug Fixes and

Evaluation

Stable

Stanford AHA! 10/29

Page 14: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

A new agile hardware/software design: stable interface

Applications

CoreIR

Execution

Bug Fixes and

Evaluation

Halide

Compiler

Stanford AHA! 11/29

Page 15: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

A new agile hardware/software design: CGRA

Applications

CoreIR

CGRA

Bug Fixes and

Evaluation

Halide

Compiler

MEM

MEM

MEM

PE

PE

PE

PE

PE

PE

Coarse-Grain ReconfigurableArchitecture (CGRA).

Stanford AHA! 12/29

Page 16: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Jade: Our first generation of CGRA

CGRA Programming

Toolchain

Halide Compiler

Application in Halide

Mapper

Place and Route

Configured CGRA

CoreIR Graph

Configuration Bitstream

1. Finished in a year with a group of grad students.2. Built with Genesis2, a hardware generation framework

that uses Perl to meta-program hardware moduleswritten in SystemVerilog.

3. Taped out in Summer 2018 and we received packagedparts in January.

Stanford AHA! 13/29

Page 17: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Problem: changes affect many tools

PEPE

New Application comes

and we need to change

the PE

PEPEPEPE

PEPE

PEPE

PEPE

SB

SB

SB

SB

SB

SB

SB

SB

SB

Interconnect

FF000101 00AF000B

00000101 0000000C

00000201 0000000A

FF0000201 00AF00C

Bitstream

Configuration

Verilog/RTL

CGRA Generator

PnR

Stanford AHA! 14/29

Page 18: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Maintaining the flow and collateral generation becomes more difficult

assign mode = config_mem[1:0];

assign tile_en = config_mem[2];

assign depth = config_mem[15:3];

assign almost_count = config_mem[19:16];

assign enable_chain = config_mem[19];

// ... 216 lines later ...

//;my $filename = "MEM".$self->mname();

//;open(MEMINFO, ">$filename") or die "Couldn't open file $filename, $!";

//;print MEMINFO " <mode bith='1' bitl='0'>00</mode>\n";

//;print MEMINFO " <tile_en bith='2' bitl='2'>0</tile_en>\n";

//;print MEMINFO \n";

//;print MEMINFO " <almost_count bith='19' bitl='16'>0</almost_count>\n";

//;print MEMINFO " <chain_enable bith='20' bitl='20'>0</chain_enable>\n";

//;close MEMINFO;

memory_core.vp

16

Stanford AHA! 15/29

Page 19: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Domain-specific languages (DSLs) to the rescue

Why choose DSLs

◮ We want generators to be the spec of a design that produce both RTL and aconsistent API for downstream tools to query. This is the single source of truth.

◮ Generating arbitrary hardware from arbitrary higher-level specifications is anextremely difficult problem.

◮ Dividing the problem into generating specific types of hardware, such as ALU andinterconnect, makes the problem more tractable.

Stanford AHA! 16/29

Page 20: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

PEak: DSL for Processing Elements (PEs)

◮ Python-embedded DSL for describingthe ISA and functional model of a PE

◮ Has precise formal semantics

Stanford AHA! 17/29

Page 21: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Multiple interpretations of the same PEak object

Python

Context

Functional

Model

PEak

Program

Magma

Context

PEak

Program

RTL

SMT

Context

PEak

Program

Symbolic

Representation

PEak Program

Stanford AHA! 18/29

Page 22: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Overview of DSL designs in our new tool chain

RTL

Gemstone

PeakProgram

Peak DSL

PE

PE Mapper

LakeProgram

Lake DSL

MEM

MEM Mapper

CanalProgram

Canal DSL

Interconnect

PnR Graph

Separation of concerns and stages

Naming convention

Stanford AHA! 19/29

Page 23: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Next-generation generators

Page 24: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Physical implementation issues

Due to the requirement of physical design, sometimes we have to change the logicalimplementation.

Common solution:

◮ Change how the generator produces RTL.

Result:

◮ The generator becomes brittle and dependent on a particular technology.

◮ Changes not re-usable.

Stanford AHA! 21/29

Page 25: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Lessons learned: Generate design in stages

◮ Logical design goes through many “passes” to create the final design.

◮ Re-usable passes for other designs.

Global Controller

Tile

Tile

Tile

Tile

Tile

Tile

Tile

Tile

Tile

What if we want to change how the global signals are wired during physical design?

Stanford AHA! 22/29

Page 26: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Gemstone: A staged generator

Gemstone is a structural staged generator that allows multiple passes to change theRTL design.

◮ Well-defined primitives on circuit objects, such as add/remove ports andinstantiate generators/circuits.

◮ Allows multiple passes to change the RTL and generate non-RTL collateral

Logical

Description

Logical

Description’

Logical

Description’’

Logical

Description’’’

Physical

Design

RTL with power

domains

RTL with

configuration

registers

distributed

Initial RTL from

logical designer

Pass

RTL with global

signals river

routed

Pass PassVerilog to

physical

design

Floorplanning

Constraints

Timing

Constraints

Stanford AHA! 23/29

Page 27: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Passes to modify Gemstone generator objects

Global Controller

Tile

Tile

Tile

Tile

Tile

Tile

Tile

Tile

Tile

Global Controller

Tile

Tile

Tile

Tile

Tile

Tile

Tile

Tile

Tile

An example pass that changes the fanout global signals into a daisy chain.

Stanford AHA! 24/29

Page 28: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Architecture diagram for Garnet

Garnet is more complex and has more component dependencies than the firstgeneration Jade chip.

CGRA

Global BufferDMA

Engine

DMA

Engine

TLX

CoreLink

Control Processor

(ARM Cortex M3)

Ne

ste

d V

ect

or

Inte

rru

pt

Co

ntr

oll

er

(NV

IC)

Wa

ke

-up

In

terr

up

t

Co

ntr

oll

er

(WIC

)

Memory Protection

Unit (MPU)

I-Bus D-Bus

De

bu

g J

TA

G

TLX

DRAM

DRAM

Controller

Application

Processor

FPGA

SRAM

AXI

AXIAXIAXIAXI

AHB AHB

SRAM

AXI

SoC

Stanford AHA! 25/29

Page 29: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Putting everything together

Garnet SoC layout diagram

◮ Re-implemented the entire tool-chainfrom scratch in 7 months.

◮ Produced a tape-out ready design.◮ Full-sized SoC working in RTL

simulation.

Chip Spec:◮ 32x16 CGRA Array◮ 16-bit integer and BFloat PEs◮ 4 MB second-level memory◮ ARM Cortex M3 with CoreLink

Stanford AHA! 26/29

Page 30: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Future plan

◮ Use DSE feedback to automatically optimize the design.

◮ Continuously harden the CGRA design to make it more area and power efficient.

◮ A Linux-capable SoC design.

◮ More polished tool-chain flow that tests and evaluates the system all the waythrough to physical design.

Stanford AHA! 27/29

Page 31: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Agile hardware design is possible!

We produced 2 chips with complete software stacks in 2 years and we continue toevolve the hardware and software stack.

1. Clean interface between software and hardware allows both of them to beimproved together.

2. DSLs for hardware generators make hardware design more tractable and easier tochange.

3. Staged generators allow for a separation of concerns between logical descriptionand physical implementation, making the system more robust to changes.

Stanford AHA! 28/29

Page 32: Creating An Agile Hardware Flo€¦ · Agile hardware design is possible! We produced 2 chips with complete software stacks in 2 years and we continue to evolve the hardware and software

Stanford AHA! Agile Hardware Center

Contributors: Rick Bahr, Clark Barrett, Nikhil Bhagdikar, Alex Carsello, Nate Chizgi,Ross G Daly, Caleb Donovick, David Durst, Kayvon Fatahalian, Kathleen Feng, PatHanrahan, Teguh Hofstee, Mark Horowitz, Dillon Huff, Taeyoung Kong, Zheng Liang,Qiaoyi Liu, Makai Mann, Zachary Alexander Myers, Ankita Nayak, Aina Niemetz,Gedeon Nyengele, Priyanka Raina, Stephen Richardson, Raj Setaluri, Jeff Setter,Daniel Stanley, Maxwell Strange, Charles Tsao, James Thomas, Leonard Truong, XuanYang, Keyi Zhang

Special thanks to our sponsors: DARPA DSSoC program, and Intel, Facebook, Google,Amazon, and NVIDIA.

All our tools are open sourced at https://github.com/StanfordAHA and we would behappy to collaborate.

Stanford AHA! 29/29