HAsim FPGA-Based Processor Models: Fast, Accurate and Flexible

Michael Adler
Elliott Fleming
Michael Pellauer
Joel Emer

Outline

• Problem & goals
• Basic model structure
• Modeling a pipelined microarchitecture
• Modeling memory hierarchies
• Modeling multiprocessors
• FPGA implementation details

Standard Scaling Problem Slide

• Single core targets: model performance scaled with processor speed
• Multi-core targets: problem size grows with each generation
• Solutions:
  – Reduce fidelity:
    • Shorter runs
    • Subset of available cores
    • Lightweight model
  – Structural simulator change:
    • Parallelize it
    • Find a new method

Dependence Problems in Parallel Software Models

Option 1: Target CPUs ➞ Simulator Threads
– Uncore causes dependence between simulator threads
– High performance models (e.g. Graphite) relax the dependence

Option 2: Target Pipeline Stages ➞ Simulator Threads
– Lots of data movement
– Cyclic pipelines impose complex dependence

[Figure labels: Core 0, Core 1, Uncore; Fetch, Decode, Execute]

Why is Hardware Difficult to Model in Software?

• Constant data movement through pipelines
• Many points of dependence between “parallel” regions
• Large, irregular memory footprint
• Difficult to vectorize
• Branchy

Software Model Compromises

• Speed: Detailed model
  – Slow
  – Studies limited by run-time (e.g. large cache replacement policy)
• Accuracy: Simplified model
  – Model writer makes decisions about fidelity, hoping not to affect predictions
  – Multi-core interactions remain difficult to parallelize

Find a new method?

FPGAs

• Share the same properties as the target machine
  – Abundant wires
  – Abundant parallelism
  – Abundant registers
• Obvious mapping of pipelines
• Already ubiquitous for RTL verification
• Fast

Detailed FPGA models are often faster than simple models!

Aggregate Simulator Throughput (Parsec Black-Scholes)

Classification of FPGA-Based Designs

Prototype

• Final RTL, mapped to a different technology
  – E.g. an ASIC emulated on an FPGA
• This is what most people imagine for FPGA-based models

Characteristics:
• Useful for verification before producing final hardware
  – Shorter debugging loop
  – Internal state is more visible than final hardware
  – Masks are expensive
• Too late to make big micro-architectural decisions
• Often too large to fit on a single FPGA
• Often too late or too slow to be useful for software development

Functional Emulator

• Model architectural semantics
• No prediction of run-time

Characteristics:
• Can be written faster than prototypes
• Potentially more FPGA-area efficient
  – Use FPGA-friendly structures (e.g. no big CAMs)
  – Multiplex functional pipelines (like SMT)
• Useful as a software development platform
• Not useful for microarchitectural research

Model

• Project metrics of interest (e.g. timing, power, reliability)
• Emulate functional behavior as needed to compute metrics

Characteristics:
• Metric may be computed algorithmically (even time)
• An extension of functional emulators: function + metrics
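The "function + metrics" idea can be illustrated in software. The following is a minimal Python sketch, not HAsim code: a functional step executes a toy instruction set, while a separate timing step computes model cycles algorithmically from a latency table. The instruction set, the latencies and every name here are assumptions made for illustration only.

```python
# Hypothetical toy ISA: (opcode, dest, operand_a, operand_b)
PROGRAM = [
    ("li", "r1", 5, None),      # r1 = 5
    ("li", "r2", 7, None),      # r2 = 7
    ("add", "r3", "r1", "r2"),  # r3 = r1 + r2
    ("mul", "r4", "r3", "r3"),  # r4 = r3 * r3
]

# Assumed latencies in model cycles; a real timing model is far richer.
LATENCY = {"li": 1, "add": 1, "mul": 3}


def functional_step(regs, inst):
    """Functional model: architectural semantics only, no notion of time."""
    op, dest, a, b = inst
    if op == "li":
        regs[dest] = a
    elif op == "add":
        regs[dest] = regs[a] + regs[b]
    elif op == "mul":
        regs[dest] = regs[a] * regs[b]
    else:
        raise ValueError(f"unknown opcode {op}")


def timing_step(inst):
    """Timing model: computes the metric (cycles) from the opcode alone."""
    return LATENCY[inst[0]]


def run(program):
    """Run both models side by side; return final registers and cycle count."""
    regs, cycles = {}, 0
    for inst in program:
        functional_step(regs, inst)  # what the target computes
        cycles += timing_step(inst)  # how long the target would take
    return regs, cycles


regs, cycles = run(PROGRAM)
```

The cycle count never depends on how long the host takes to execute `functional_step`, which is the sense in which time is computed rather than measured.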

Model Terminology

Modeling hardware on hardware leads to terminology confusion:
– Both have caches, pipelines, memories…

• Target machine means the microarchitecture being studied
• FPGA, functional-model and timing-model all refer to implementation details. (E.g. functional memory cache is an FPGA structure.)
• Host is the general purpose machine to which FPGAs are connected

Why isn’t everyone building timing models with FPGAs?

Fast, Accurate or Now?

[Figure labels: Accuracy, Development Time, Model Speed]

FPGA Picture is Different

[Figure labels: Accuracy, Development Time, Model Speed]

Reducing Development Time: Managing Complexity

How do I:
• Use FPGAs while focusing on my algorithm? ➞ HAsim, LEAP
• Model time? ➞ A-Ports
• Re-use components? ➞ Split functional / timing models, AWB
• Fit a large problem on FPGAs? ➞ Multiplexing, Latency Insensitivity, Multiple FPGAs

STDIO on General Purpose Machines

FILE *f = fopen(path, "w");
const char *name = "Kenneth";
fprintf(f, "%s, what is the frequency?\n", name);

I/O in Hardware Description Languages (SystemVerilog)

integer f = $fopen(path, "w");
string name = "Kenneth";
$fwrite(f, "%s, what is the frequency?\n", name);

Nothing Comes from Nothing

FPGAs have:
• No standard physical device
• No standard device model
• No standard system interface
• No standard API

What Makes Hardware General Purpose?

The software!

• Compilers and library APIs make code “universal”
• Hardware standards (ACPI, PCIe) make OS development and compiler writing easier. Little impact on user programs.
• ISA matters if you want to avoid recompiling. ISA is part of the software API, along with standard libraries.

LEAP Platform

[Figure: LEAP platform block diagram spanning an FPGA side and a software side. Labels: Timing Partition (Fetch, Decode, Exe), Functional Partition, Platform Interface, STDIO, Scratchpad Memory, Remote Memory, Control, RRR, Channel, Virtual Platform, FPGA Physical Platform, Software Physical Platform, Software Services (Streams, Memory, State, Emulate)]

Hello World in LEAP

module [CONNECTED_MODULE] mkConnectedApplication ();

    STDIO#(Bit#(32)) stdio <- mkStdIO();
    let msg <- getGlobalStringUID("Hello, World!\n");

    Reg#(STATE) state <- mkReg(STATE_start);

    rule hello (state == STATE_start);
        stdio.printf(msg, List::nil);
        state <= STATE_finish;
    endrule

endmodule

Bluespec on One Foot

• Functional language derived from Haskell
• Generates Verilog
• Modules – the analog of C++ classes
  – May be polymorphic (types are abstract)
• Methods are the callable routines exposed by modules
  – Inlined statically at compile time into a calling rule
• Rules are:
  – Executed atomically
  – Guarded (predicated)
• Guard is both explicit (user specified) and implicit
• Implicit guards come from guards on methods called in a rule
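Guarded atomic rules can be mimicked in ordinary software. The following Python sketch is only an analogy and makes several assumptions: the `Rule` class, the `step` function and the one-entry output buffer are all hypothetical, and unlike Bluespec hardware, which can fire many non-conflicting rules in the same cycle, this sketch fires at most one rule per step. It shows an explicit guard (the phase test) combined with an implicit guard (the method being ready), and an action whose updates are committed together.

```python
class Rule:
    """A named rule: a guard predicate plus an action producing updates."""

    def __init__(self, name, guard, action):
        self.name = name
        self.guard = guard    # callable(state) -> bool
        self.action = action  # callable(state) -> dict of register updates


def step(state, rules):
    """Fire the first rule whose guard holds; return its name or None."""
    for rule in rules:
        if rule.guard(state):
            updates = rule.action(state)  # computed from the old state
            state.update(updates)         # committed together: atomic
            return rule.name
    return None


def can_print(state):
    """Implicit guard: the print method is ready only if the buffer has room."""
    return len(state["out"]) < state["out_capacity"]


state = {"phase": "start", "out": [], "out_capacity": 1}

hello = Rule(
    "hello",
    # Explicit guard (phase test) and implicit guard (method readiness).
    guard=lambda s: s["phase"] == "start" and can_print(s),
    action=lambda s: {"out": s["out"] + ["Hello, World!\n"],
                      "phase": "finish"},
)

# The rule fires once, then its explicit guard keeps it from firing again.
fired = [step(state, [hello]) for _ in range(3)]
```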

Hello World in LEAP: Callouts

The same code, with three parts called out in turn:

• main(): the mkConnectedApplication module, the application's entry point
• Control Logic: the state register and the guarded rule hello
• STDIO: the stdio instance and its printf call

LEAP Gives FPGAs Key “General Purpose” Properties

Virtual Platform
– I/O
– Virtual memory abstraction (scratchpads)

Topology
– Named channels (FIFOs) instead of hard-coded wires
– Host/FPGA remote procedure calls
– Automated mapping to multiple FPGAs

Debugging Aids
– Deadlock detection
– Automated scan chains
– User scan chains

LEAP Platform Users

• HAsim timing models
• Prototypes
  – SSD Functional Model
  – AirBlue wireless network stack
• Algorithmic accelerators
  – H.264 decoder
  – Matrix multiplication
  – …

Key Concept: Latency Insensitivity

Latency Insensitive Channel Semantics

• Guaranteed:
  – FIFO
  – Accurate
  – Always allow at least one message to be in flight
• Not guaranteed:
  – Latency

Why?
– Allows for replacement of algorithms – even to software
– Permits use of hierarchical memories (caches)
– Simplifies communication – especially off-chip

This is a common software strategy (pipes, TCP/IP, pthread mutex)
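These channel semantics can be sketched in a few lines of Python. This is a hypothetical software model, not the LEAP implementation: `LIChannel`, its methods and the random delay are assumptions chosen to show that a consumer which only reads when data is available gets the same result regardless of latency.

```python
import random
from collections import deque


class LIChannel:
    """FIFO channel with unspecified, variable delivery latency."""

    def __init__(self, max_latency, seed=0):
        self._rng = random.Random(seed)
        self._max_latency = max_latency
        self._in_flight = deque()  # (ready_time, message) in send order
        self._now = 0

    def tick(self):
        """Advance the channel's notion of time by one step."""
        self._now += 1

    def send(self, msg):
        """Enqueue a message with a random delay of 1..max_latency steps."""
        delay = self._rng.randint(1, self._max_latency)
        self._in_flight.append((self._now + delay, msg))

    def not_empty(self):
        # Only the head is ever visible, so order is preserved even when
        # a later message happens to draw a shorter delay.
        return bool(self._in_flight) and self._in_flight[0][0] <= self._now

    def receive(self):
        if not self.not_empty():
            raise RuntimeError("receive on empty channel")
        return self._in_flight.popleft()[1]


def run_consumer(max_latency):
    """Send 0..4 through a channel and collect whatever arrives."""
    chan = LIChannel(max_latency)
    to_send = list(range(5))
    received = []
    while len(received) < 5:
        if to_send:
            chan.send(to_send.pop(0))
        chan.tick()
        while chan.not_empty():  # guard every read on availability
            received.append(chan.receive())
    return received


fast = run_consumer(max_latency=1)
slow = run_consumer(max_latency=10)
```

Because the consumer never assumes when a message arrives, the producer behind the channel could be swapped for a slower implementation, including one in software, without changing the result.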

Named Channels

• Name both endpoints of a FIFO
• Software builds the connection
• Replaces user’s hand-routed Verilog channels
• Automatically routed, even across FPGAs

Common in software:
– Named ports in software timing models
– UUCP has been dead for a long time (for a reason)
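The idea of naming endpoints and letting software build the connection can be sketched as follows. This Python example is illustrative only: `Send`, `Recv`, `connect` and the channel name `fetch_to_decode` are hypothetical and are not LEAP's API. Modules declare endpoints by name without knowing about each other, and a separate pass pairs them up and reports any that are left dangling.

```python
from collections import deque


class Endpoint:
    """One end of a named channel; the FIFO is attached by connect()."""

    def __init__(self, name):
        self.name = name
        self.fifo = None


class Send(Endpoint):
    def send(self, msg):
        self.fifo.append(msg)


class Recv(Endpoint):
    def receive(self):
        return self.fifo.popleft()


def connect(endpoints):
    """Pair each Send with the Recv of the same name and attach a FIFO."""
    sends, recvs = {}, {}
    for ep in endpoints:
        table = sends if isinstance(ep, Send) else recvs
        if ep.name in table:
            raise ValueError(f"duplicate endpoint {ep.name!r}")
        table[ep.name] = ep
    unmatched = set(sends) ^ set(recvs)
    if unmatched:
        raise ValueError(f"unmatched endpoints: {sorted(unmatched)}")
    for name, send_ep in sends.items():
        fifo = deque()
        send_ep.fifo = fifo
        recvs[name].fifo = fifo


# Two "modules" declare their ends independently, then get wired by name.
fetch_out = Send("fetch_to_decode")
decode_in = Recv("fetch_to_decode")
connect([fetch_out, decode_in])

fetch_out.send("inst0")
fetch_out.send("inst1")
first, second = decode_in.receive(), decode_in.receive()

# A dangling endpoint is caught when connections are built.
try:
    connect([Send("orphan")])
    dangling_error = None
except ValueError as err:
    dangling_error = str(err)
```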

Finally, an Explanation of our Project’s Name

LINC: Latency-Insensitive Named Channel

LEAP: LINC-based Environment for Application Programming

HAsim: Hardware-based micro-Architecture Simulator

http://asim.csail.mit.edu/redmine
