CS294-3 Reconfigurable Computing, J. Wawrzynek, 1/21/04



Course Info

My info: John Wawrzynek, 631 Soda Hall, [email protected]
Office hours: W 1-3, or by appointment

Class page: http://inst.eecs.berkeley.edu/~cs294-3/

Check often for reading assignments, announcements, links, and other info.


Course Structure

Run like a seminar:
A few lectures by me and visitors.
Lots of paper reading.
• I will ask you to write a few paragraphs on each paper and to be prepared to discuss it in class.

In-class oral project reviews.
• Proposal
• Periodic status reports
• Final report

Final grade based on submitted paper reports, class participation, and project.


Project Info

Many possible themes:
• Evaluation of new RC idea through analysis and simulation.
• Novel implementation and evaluation of important application on RC platform (Calinx board, BEE, SCORE simulator).
• Develop a new tool, or enhance an existing one.

Goal is conference quality research.
I’ll put together a list of project ideas.
Groups ok.
Early start. Start thinking about it now.


Processor Efficiency

[Figure: Computational Density Trend. Y-axis: MOPS/MHz/Million Transistors, from 0 to 0.45. X-axis: Pentium MMX (P55C), Celeron (Mendocino), Pentium III EB, Pentium III-S, Pentium 4 (Willamette), Pentium 4 (Northwood).]


Challenging our Assumptions

“Are we making copies in sub-micron CMOS VLSI of copies in NMOS of copies in TTL of early vacuum tube computer designs?”

A. DeHon

10000x increase in single-chip silicon capacity changes the underlying design costs.

Von Neumann architectures designed to heavily time-multiplex the expensive ALU resource.

General-purpose computing machines don’t have to look like processors.


What and Why?

What is “reconfigurable computing (RC)”?
Many definitions.
Our “standard” definition:
Computing via a post-fabrication and spatially programmed connection of processing elements.
• FPGA implementation of a processor core to run a program is excluded - not spatial mapping of problem.
• ASIC implementations excluded - not post-fabrication programmable.

Does this include arrays of processors?
Often the definition restricts RC to mapping to “fine-grained” devices (such as FPGAs).


Spatial Computation

Example:

grade = 0.2 × mt1 + 0.2 × mt2 + 0.2 × mt3 + 0.4 × project;

A hardware resource (multiplier or adder) is allocated for each operator in the compute graph.
The abstract computation graph becomes the implementation template.

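[Figure: two compute graphs producing grade. First: four multipliers (0.2 × mt1, 0.2 × mt2, 0.2 × mt3, 0.4 × proj) feeding three adders. Second: factored form, with two adders summing mt1, mt2, and mt3, one multiplier by 0.2, one multiplier for 0.4 × proj, and a final adder.]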
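The spatial mapping of the grade example can be sketched in a few lines of Python. This is only an illustration, not course material: the `Op` class and the `evaluate` and `count_resources` helpers are hypothetical names, and the sample scores are made up. The sketch builds both graphs for the grade expression and counts one hardware resource per operator node.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Op:
    """One operator in the compute graph; spatially, one hardware resource."""
    kind: str          # "mul" or "add"
    left: "Node"
    right: "Node"


Node = Union[Op, str, float]   # str = named input, float = constant


def evaluate(node: Node, inputs: Dict[str, float]) -> float:
    """Compute the value at a node by walking the graph."""
    if isinstance(node, Op):
        a = evaluate(node.left, inputs)
        b = evaluate(node.right, inputs)
        if node.kind == "mul":
            return a * b
        if node.kind == "add":
            return a + b
        raise ValueError(f"unknown operator {node.kind!r}")
    if isinstance(node, str):
        return inputs[node]
    return node


def count_resources(node: Node) -> Dict[str, int]:
    """Count multipliers and adders, one per operator node (tree-shaped graphs)."""
    if not isinstance(node, Op):
        return {"mul": 0, "add": 0}
    left = count_resources(node.left)
    right = count_resources(node.right)
    total = {kind: left[kind] + right[kind] for kind in left}
    total[node.kind] += 1
    return total


# Direct form: grade = 0.2*mt1 + 0.2*mt2 + 0.2*mt3 + 0.4*proj
direct = Op("add",
            Op("add", Op("mul", 0.2, "mt1"), Op("mul", 0.2, "mt2")),
            Op("add", Op("mul", 0.2, "mt3"), Op("mul", 0.4, "proj")))

# Factored form: grade = 0.2*(mt1 + mt2 + mt3) + 0.4*proj
factored = Op("add",
              Op("mul", 0.2, Op("add", Op("add", "mt1", "mt2"), "mt3")),
              Op("mul", 0.4, "proj"))

scores = {"mt1": 80.0, "mt2": 90.0, "mt3": 70.0, "proj": 100.0}  # made-up inputs
print(count_resources(direct), count_resources(factored))
print(evaluate(direct, scores), evaluate(factored, scores))
```

Both graphs compute the same grade, but the factored graph needs two multipliers instead of four, which is why the choice of graph matters when every operator costs silicon.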


Temporal Computation

A hardware resource is time-multiplexed to implement the actions of the operators in the compute graph.
Close to a sequential processor/software solution.
Many in-between cases exist.
Do we want to include these in our definition?
When would this be a valid model?

acc1 = mt1 + mt2;
acc1 = acc1 + mt3;
acc1 = 0.2 × acc1;
acc2 = 0.4 × proj;
grade = acc1 + acc2;

[Figure: abstract computation graph (the factored graph for grade) next to its implementation: a controller sequencing a single ALU that reads mt1, mt2, mt3, and proj and uses the registers acc1 and acc2.]
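The temporal version of the same computation can be sketched as a controller stepping one shared ALU through the five-statement sequence from the slide. This is a minimal sketch under stated assumptions: one ALU operation per cycle, a hypothetical tuple encoding for instructions, and made-up input scores. The names `PROGRAM`, `alu`, and `run` are illustrative only.

```python
from typing import Dict, List, Tuple, Union

Operand = Union[str, float]                     # register/input name or constant
Instruction = Tuple[str, str, Operand, Operand]  # (dest, op, src_a, src_b)

# The statement sequence from the slide, one ALU operation per entry.
PROGRAM: List[Instruction] = [
    ("acc1", "add", "mt1", "mt2"),
    ("acc1", "add", "acc1", "mt3"),
    ("acc1", "mul", 0.2, "acc1"),
    ("acc2", "mul", 0.4, "proj"),
    ("grade", "add", "acc1", "acc2"),
]


def alu(op: str, a: float, b: float) -> float:
    """The single shared function unit."""
    if op == "add":
        return a + b
    if op == "mul":
        return a * b
    raise ValueError(f"unknown operation {op!r}")


def run(program: List[Instruction], inputs: Dict[str, float]):
    """Controller: issue one instruction per cycle to the shared ALU."""
    regs = dict(inputs)
    cycles = 0
    for dest, op, src_a, src_b in program:
        a = regs[src_a] if isinstance(src_a, str) else src_a
        b = regs[src_b] if isinstance(src_b, str) else src_b
        regs[dest] = alu(op, a, b)
        cycles += 1          # assumption: every operation takes one cycle
    return regs, cycles


# Made-up inputs for illustration.
regs, cycles = run(PROGRAM, {"mt1": 80.0, "mt2": 90.0, "mt3": 70.0, "proj": 100.0})
print(regs["grade"], cycles)
```

Here one ALU serves all five operators at the cost of five sequential steps, whereas the spatial mapping spends one resource per operator and lets independent operators work at the same time.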


RC, Processors, & ASIC


An Important Distinction

Instruction Binding Time
When do we decide what operation needs to be performed?

General Principle
The earlier the decision is bound, the less area & delay required for the implementation.


RC Strategy

Exploit cases where operation can be bound and then reused a large number of times.
Customize for operator type, width, and interconnect.
Low-overhead exploitation of application parallelism.


Advantages of RC

Conventional processors have three major sources of inefficiency:

Heavy time-multiplexing of Function Units (ALUs).
Instruction issue overhead.
Memory hierarchy to deal with memory latency.

[Figure: Peak (raw) performance; dimensions labeled λ.]


Advantages of RC

Relative to microprocessors: on average a higher percentage of peak (or raw) computational density is achieved with reconfigurable devices.
Fine-grain flexibility leads to exploitation of problem specific parallelism at many levels.
Also, many different computation models (or patterns) can be supported.
In general, it is possible to match problem characteristics to hardware, through the use of problem specific architectures and low-level circuit specialization.


Advantages of RC

Spatial mapping of computation, versus multiplexing of function units (as in processors), relieves pressure on memory capacity, bandwidth, and latency, and favors local communication patterns.


Advantages of RC

Modern FPGAs make good system-level components:

Relatively large number of IOs (many parallel memory ports).
High-BW communications.
Machines based on these components can easily scale peak performance by riding Moore’s curve (FPGAs are process drivers).
Low-level redundancy permits fault-tolerance and great cost savings.
Built-in microprocessors.

Is there still room for research in novel devices for RC?


Advantages of RC

Even in an application with fixed algorithms, reconfigurable devices may offer advantages over a full-custom or ASIC approach:

FPGAs are process drivers, therefore a generation ahead of ASIC.
Increasing NREs for ASIC and full-custom have pushed the "cross-over" point way out.
Time to market advantage.
Programmability leads to:
• project risk management
• extended product life-times


FPGAs vs. ASIC Cost-argument

ASIC: High NRE costs ($2M for 0.35um chip). Relatively low cost per die.
FPGAs: Very low NRE costs. Relatively low silicon efficiency ⇒ high cost per part.
Cross-over volume from cost effective FPGA design to ASIC in the 10K range.

[Figure: total cost vs. volume for FPGA and ASIC. FPGAs are cost effective below the cross-over volume, ASICs above it.]
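The cost argument can be made concrete with a simple linear model: total cost = NRE + per-part cost × volume. The sketch below is illustrative only. The $2M ASIC NRE is the slide's figure; the per-part costs and the zero FPGA NRE are hypothetical values chosen so that the cross-over lands in the 10K range the slide mentions. The function names are likewise assumptions.

```python
def total_cost(nre: float, unit_cost: float, volume: int) -> float:
    """Linear cost model: one-time NRE plus a fixed cost per part."""
    return nre + unit_cost * volume


def crossover_volume(nre_asic: float, unit_asic: float,
                     nre_fpga: float, unit_fpga: float) -> float:
    """Volume at which ASIC and FPGA total costs are equal."""
    if unit_fpga <= unit_asic:
        raise ValueError("no cross-over: FPGA part is not more expensive")
    return (nre_asic - nre_fpga) / (unit_fpga - unit_asic)


ASIC_NRE = 2_000_000.0   # from the slide ($2M for a 0.35um chip)
ASIC_UNIT = 10.0         # hypothetical cost per die
FPGA_NRE = 0.0           # hypothetical stand-in for "very low NRE"
FPGA_UNIT = 210.0        # hypothetical cost per part

v_cross = crossover_volume(ASIC_NRE, ASIC_UNIT, FPGA_NRE, FPGA_UNIT)
print(f"cross-over volume: {v_cross:.0f} parts")

for volume in (5_000, 20_000):
    fpga = total_cost(FPGA_NRE, FPGA_UNIT, volume)
    asic = total_cost(ASIC_NRE, ASIC_UNIT, volume)
    winner = "FPGA" if fpga < asic else "ASIC"
    print(f"{volume:>6} parts: FPGA ${fpga:,.0f}, ASIC ${asic:,.0f} -> {winner}")
```

In this model the cross-over volume scales linearly with the NRE gap, so any increase in ASIC NRE with unchanged part costs moves the cross-over to higher volumes.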


Cross-over Point is Moving Right

ASIC: Increasing NRE costs ($40M for 90nm chip [1]) (mask costs [2], verification, etc.)
Limited number of standard silicon designs becomes inevitable.
FPGAs: Obvious candidate for one of those designs; furthermore, FPGAs better able to follow Moore’s Law, relatively cheaper to test.

[Figure: total cost vs. volume for FPGA and ASIC, marking the regions where FPGAs and ASICs are cost effective.]

[1] Vahid Manian, VP manufacturing and operations, Broadcom Corp.
[2] Roger Minear, Agere Systems Inc.: 30- to 35-layer mask set ≈ $650,000 for 130nm and $1.4M for 90nm.


Post-fabrication Customization

Gate Array like devices return to fill the gap. Post-fab customization with limited mask layers or direct-write e-beam.
LSI Logic Rapid-Chip, CMU/VPGA, e-beam programmable devices.
Lower NREs than ASICs, more silicon efficiency than FPGAs.
So, why bother with FPGAs?

[Figure: total cost vs. volume for FPGA, gate arrays, and ASIC.]


FPGAs are Reconfigurable

1. Commercial applications have not taken advantage of reconfigurability.
• Xilinx/Altera haven’t done much to help.
• Methodologies/tools nearly nonexistent.

2. Volume/cost graphs don’t accurately capture the potential real costs and other advantages.

Reconfiguration uses:
Field upgrades ⇒ product life extension, changing requirements.
In-system board-level testing and field diagnostics.
Tolerance to manufacturing faults.
Risk-management in system development.
Runtime reconfiguration ⇒ higher silicon efficiency.
• Time-multiplexed pre-designed circuits make maximum use of resources.
• Runtime specialized circuit generation.

Seemingly obvious point but …


Advantages of RC

Dynamic reconfiguration might permit even higher efficiency through hardware sharing (multiplexing) and on the fly circuit specialization.

Largely unexploited (unproven) to date.
A few research projects have explored this idea.


Multi-modal Computing Tasks

Mini/Micro-UAVs
One piece of silicon for all of sensor processing, navigation, communications, planning, logging, etc. At different times different tasks take priority and consume a higher percentage of resources.

Other example: hand-held multi-function device with GPS, smart image capture/analysis, communications.

The premier applications for reconfigurable devices are those with constrained size/weight that need multiple functions at near-ASIC performance.

Multiple ASICs too expensive/big. Processor too slow.
Fine-grained reconfigurable devices have the flexibility to efficiently match task parallelism over a wide variety of tasks, deployed as needed.

Mars-rover Rocky4


Fine-grained Reconfigurable Fabrics

Some work in RC has evolved to coarse-grain (processor based) arrays to achieve higher compute densities:

Broadcom Calisto/silicon-spice/matrix. MIT/RAW, UCB/MIT DSA.

Will homogeneous fine-grained (CLB based) arrays be more important in the future?
1. Unlikely that a single array-of-processors type architecture will be efficient for more than a small class of apps (the general purpose parallel machine architecture problem).
• Parallel machines designed for one application would yield low computational density on other problems.


Fine-grained Reconfigurable Fabrics

2. Homogeneous fine-grained arrays are maximally flexible:
a. Admit a wide variety of computational architecture models: arrays of processors, hybrid approaches, hard-wired dataflow, systolic processing, vector processing, etc.
b. Admit a wide variety of parallelism modes: SIMD, MIMD, bit-level, etc. Resources can be deployed for low latency when required for tight feedback loops (not possible with many parallel architectures that optimize for throughput).
c. Support many compilation/resource management models: statically compiled, dynamically mapped.

3. Fine-grained redundancy provides an opportunity to efficiently map around manufacturing faults.

Safe bet as a standard device.


Fine-grained Reconfigurable Fabrics

Xilinx and Altera doing a great job on current vector, but:

Tools are weak:
• No convenient programming model (still circuit design).
• No architecture synthesis, retiming.
• No runtime support or operating system.

No dynamic reconfigurability.
No fault tolerance.
Power consumption much higher than it needs to be.


Dynamic Reconfiguration

1. Time-multiplexing resources allows more efficient use of silicon (in ways ASICs typically do not):

a. Low-duty cycle or “off critical path” computations time share fabric while critical path stays mapped in:

Why dynamic reconfiguration?

[Figure: total runtime vs. amount of fabric, marking the size of maximum efficiency.]


Dynamic Reconfiguration

b. Coarse data-dependent control flow maps in only useful dataflow:

[Figure: if-then-else dataflow.]

c. Allowable task foot-print may change as other tasks come and go or faults occur.

2. Fabric virtualization allows automatic migration up and down in device sizes and eases application development.


Dynamic Reconfiguration

3. Runtime Circuit Specialization:
• Example: fixed coefficient multipliers in adaptive filter changing value at low rate.
• Aggressive constant propagation (based perhaps on runtime profiling), reduces circuit size and delay.
• Could use “branch/value/range prediction” to map most common case and fault in exceptional cases.
• Can be template based – “fill in the blanks”, but better if we put PPR in runtime loop!
• New work using array assisted place and route may make it possible.
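The fixed-coefficient multiplier example can be illustrated in software. The sketch below is an analogy, not a circuit generator: it assumes a non-negative integer coefficient and uses plain shift-and-add decomposition as a stand-in for constant propagation. The names `shift_positions` and `specialize_multiplier` are hypothetical. Once the coefficient is known, only its set bits contribute a shifted term, so the specialized multiplier needs one adder fewer than the number of set bits.

```python
from typing import Callable, List, Tuple


def shift_positions(coefficient: int) -> List[int]:
    """Bit positions set in the coefficient; each becomes a hard-wired shift."""
    if coefficient < 0:
        raise ValueError("sketch handles non-negative coefficients only")
    return [i for i in range(coefficient.bit_length()) if (coefficient >> i) & 1]


def specialize_multiplier(coefficient: int) -> Tuple[Callable[[int], int], int]:
    """Return a multiplier bound to one coefficient and its adder count."""
    shifts = shift_positions(coefficient)

    def multiply(x: int) -> int:
        total = 0
        for s in shifts:          # only the set bits contribute a term
            total += x << s
        return total

    adders = max(len(shifts) - 1, 0)
    return multiply, adders


# Coefficient 10 = 0b1010: two shifted terms, one adder.
times_10, adders_10 = specialize_multiplier(10)
print(times_10(7), adders_10)

# When the adaptive filter updates the coefficient (at a low rate),
# a new specialized circuit is generated to replace the old one.
times_12, adders_12 = specialize_multiplier(12)
print(times_12(7), adders_12)
```

The adder count depends on the coefficient value, which is why binding the coefficient early shrinks the circuit, and why a coefficient change requires regenerating it.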


Garp – Hybrid Processor Model

Function                Speedup
strlen (len 16)         1.77
strlen (len 1024)       14
sort                    2.1
image median filter     26.9
DES (ECB mode)          19.6
image dithering         16.3

Speedups over 4-way superscalar UltraSparc on same process and comparable die size and memory system.

“Garp: A MIPS Processor with a Reconfigurable Coprocessor”, In Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM ‘97, April 16-18, 1997)

• Pre-generated circuits for common program kernels cached within reconfigurable array and used to accelerate MIPS programs.
• nSec configuration swap time.
• Limited speedup – tied to single execution thread.


SCORE – Virtualized Fabric Model

If-else

High silicon efficiency:
♦ Only active parts of data-flow consume resources.
♦ High-duty cycle critical path of computation stays mapped and remaining resources are shared by lower duty cycle paths.
♦ Particularly effective for multi-tasking environment with time-varying task requirements.
♦ Fabric virtualization with demand paging:
• Get most out of available resources by automatically time-multiplexing.
• Automatic migration up and down in device sizes.
• Eases application development.
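The demand-paging idea can be illustrated with a toy simulation. This is not SCORE's actual scheduler. It is a hypothetical sketch in which virtual pages of a design are loaded into a fixed number of physical fabric slots on demand, with LRU replacement as a stand-in policy and optional pinning to model a critical path that stays mapped. The function name, page names, and trace are all made up.

```python
from collections import OrderedDict
from typing import Iterable, Sequence


def count_reconfigurations(trace: Sequence[str], slots: int,
                           pinned: Iterable[str] = ()) -> int:
    """Count page loads needed to run `trace` on a fabric with `slots` pages.

    Pinned pages are assumed pre-loaded and never evicted; all other pages
    share the remaining slots under LRU replacement.
    """
    pinned_set = set(pinned)
    capacity = slots - len(pinned_set)
    if capacity < 0:
        raise ValueError("more pinned pages than fabric slots")
    resident: "OrderedDict[str, bool]" = OrderedDict()  # oldest first
    loads = 0
    for page in trace:
        if page in pinned_set:
            continue                          # always mapped
        if page in resident:
            resident.move_to_end(page)        # mark as recently used
            continue
        if capacity == 0:
            raise ValueError("no slot available for unpinned pages")
        if len(resident) >= capacity:
            resident.popitem(last=False)      # evict least recently used
        resident[page] = True
        loads += 1                            # one reconfiguration
    return loads


# Made-up trace: "fir" is the high-duty-cycle path, "ctl" and "log" run rarely.
trace = ["fir", "ctl", "fir", "log", "fir", "ctl", "fir", "log"]

for slots in (1, 2, 3):
    print(slots, "slots:", count_reconfigurations(trace, slots), "loads")
print("2 slots, fir pinned:", count_reconfigurations(trace, 2, pinned=["fir"]), "loads")
```

The same trace runs unchanged on one, two, or three slots; only the number of reconfigurations changes, which is the sense in which virtualization allows migration up and down in device sizes.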


Topics this Semester

Fundamentals:
• Comparisons to traditional approaches.

Hardware platforms:
• FPGAs.
• FPGA-based machines.
• Novel architectures.

Applications.

Computation and execution models.

Programming models and languages.
• HDLs, Parallel Programming.

Low-level mapping tools.

Other?


Reading Assignment

Read and write short report. Due 9:30am day of class.
Detailed instructions linked to web page calendar (next to “Reading” header).
For Monday:

Read only:
• Reconfigurable Computing: What, Why, and Implications for Design Automation.

Read and review:
• The Density Advantage of Configurable Computing,
• FPGAs vs. CPUs: Trends in Peak Floating-Point Performance,
• A Quantitative Analysis of the Speedup Factors of FPGAs over Processors.