combinational circuits in bluespec arvind computer science & artificial intelligence lab

24
September 8, 2009 http:// csg.csail.mit.edu/korea L03-1 Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

Upload: astro

Post on 23-Jan-2016

35 views

Category:

Documents


0 download

DESCRIPTION

Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology. Now we call this Guarded Atomic Actions. Bluespec: Two-Level Compilation. Bluespec (Objects, Types, Higher-order functions). Lennart Augustsson - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 http://csg.csail.mit.edu/korea L03-1

Combinational Circuits in Bluespec

Arvind Computer Science & Artificial Intelligence LabMassachusetts Institute of Technology

Page 2: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-2http://csg.csail.mit.edu/korea

Object code(Verilog/C)

Bluespec: Two-Level Compilation

Rules and Actions(Term Rewriting System)

• Rule conflict analysis• Rule scheduling

James Hoe & Arvind@MIT 1997-2000

Bluespec(Objects, Types,

Higher-order functions)

Level 1 compilation• Type checking• Massive partial evaluation and static elaboration

Level 2 synthesis

Lennart Augustsson@Sandburst 2000-2002

Now we call this Guarded Atomic Actions

Page 3: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-3http://csg.csail.mit.edu/korea

Static Elaboration

.exe

compile

design2 design3design1

elaborate w/params

run1run1run2.1…

run1run1run3.1…

run1run1run1.1…

run w/params

run w/params

run1run1…

run

At compile time Inline function calls and unroll loops Instantiate modules with specific parameters Resolve polymorphism/overloading, perform most

data structure operations

sourceSoftwareToolflow: source

HardwareToolflow:

Page 4: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-4http://csg.csail.mit.edu/korea

Combinational IFFT

in0

in1

in2

in63

in3

in4

Bfly4

Bfly4

Bfly4

x16

Bfly4

Bfly4

Bfly4

Bfly4

Bfly4

Bfly4

out0

out1

out2

out63

out3

out4

Perm

ute

Perm

ute

Perm

ute

All numbers are complex and represented as two sixteen bit quantities. Fixed-point arithmetic is used to reduce area, power, ...

*

*

*

*

+

-

-

+

+

-

-

+

*jt2

t0

t3

t1

Page 5: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-5http://csg.csail.mit.edu/korea

4-way Butterfly Node

function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k);

BSV has a very strong notion of types Every expression has a type. Either it is declared by

the user or automatically deduced by the compiler The compiler verifies that the type declarations are

compatible

*

*

*

*

+

-

-

+

+

-

-

+

*i

t0

t1

t2

t3

k0

k1

k2

k3

Page 6: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-6http://csg.csail.mit.edu/korea

BSV code: 4-way Butterflyfunction Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k);

Vector#(4,Complex) m, y, z;

m[0] = k[0] * t[0]; m[1] = k[1] * t[1]; m[2] = k[2] * t[2]; m[3] = k[3] * t[3];

y[0] = m[0] + m[2]; y[1] = m[0] – m[2]; y[2] = m[1] + m[3]; y[3] = i*(m[1] – m[3]);

z[0] = y[0] + y[2]; z[1] = y[1] + y[3]; z[2] = y[0] – y[2]; z[3] = y[1] – y[3];

return(z);endfunction

Polymorphic code: works on any type of numbers for which *, + and - have been defined

*

*

*

*

+

-

-

+

+

-

-

+

*i

m y z

Note: Vector does not mean storage

Page 7: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-7http://csg.csail.mit.edu/korea

Complex ArithmeticAddition

zR = xR + yR

zI = xI + yI

Multiplication zR = xR * yR - xI * yI

zR = xR * yI + xI * yR

The actual arithmetic for FFT is different because we use a non-standard fixed point representations

Page 8: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-8http://csg.csail.mit.edu/korea

BSV code for Addition

typedef struct{ Int#(t) r; Int#(t) i;} Complex#(numeric type t) deriving (Eq,Bits); function Complex#(t) \+ (Complex#(t) x, Complex#(t) y); Int#(t) real = x.r + y.r; Int#(t) imag = x.i + y.i; return(Complex{r:real, i:imag});endfunction

Page 9: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-9http://csg.csail.mit.edu/korea

Combinational IFFT

in0

in1

in2

in63

in3

in4

Bfly4

Bfly4

Bfly4

x16

Bfly4

Bfly4

Bfly4

Bfly4

Bfly4

Bfly4

out0

out1

out2

out63

out3

out4

Perm

ute

Perm

ute

Perm

ute

stage_f function

repeat stage_f three times

function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in);

function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data);

Page 10: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-10http://csg.csail.mit.edu/korea

BSV Code: Combinational IFFTfunction Vector#(64, Complex) ifft

(Vector#(64, Complex) in_data);//Declare vectors Vector#(4,Vector#(64, Complex)) stage_data; stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]);return(stage_data[3]);

The for loop is unfolded and stage_f is inlined during static elaboration

Note: no notion of loops or procedures during execution

Page 11: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-11http://csg.csail.mit.edu/korea

BSV Code: Combinational IFFT- Unfoldedfunction Vector#(64, Complex) ifft

(Vector#(64, Complex) in_data);//Declare vectors Vector#(4,Vector#(64, Complex)) stage_data;

stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]); return(stage_data[3]);

Stage_f can be inlined now; it could have been inlined before loop unfolding also.

Does the order matter?

stage_data[1] = stage_f(0,stage_data[0]);stage_data[2] = stage_f(1,stage_data[1]);stage_data[3] = stage_f(2,stage_data[2]);

Page 12: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-12http://csg.csail.mit.edu/korea

Bluespec Code for stage_ffunction Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in); begin for (Integer i = 0; i < 16; i = i + 1) begin Integer idx = i * 4; let twid = getTwiddle(stage, fromInteger(i)); let y = bfly4(twid, stage_in[idx:idx+3]); stage_temp[idx] = y[0]; stage_temp[idx+1] = y[1]; stage_temp[idx+2] = y[2]; stage_temp[idx+3] = y[3]; end //Permutation for (Integer i = 0; i < 64; i = i + 1) stage_out[i] = stage_temp[permute[i]]; endreturn(stage_out); twid’s are

mathematically derivable constants

Page 13: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-13http://csg.csail.mit.edu/korea

Higher-order functions:Stage functions f1, f2 and f3function f1(x); return (stage_f(1,x)); endfunction

function f2(x); return (stage_f(2,x)); endfunction

function f3(x); return (stage_f(3,x)); endfunction

What is the type of f1(x) ?

Page 14: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-14http://csg.csail.mit.edu/korea

Suppose we want to reuse some part of the circuit ...

in0

in1

in2

in63

in3

in4

Bfly4

Bfly4

Bfly4

x16

Bfly4

Bfly4

Bfly4

Bfly4

Bfly4

Bfly4

out0

out1

out2

out63

out3

out4

Perm

ute

Perm

ute

Perm

ute

Reuse the same circuit three times to reduce area

But why?

Page 15: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 http://csg.csail.mit.edu/korea L03-15

Architectural Exploration:Area-Performance tradeoff in 802.11a Transmitter

Page 16: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-16http://csg.csail.mit.edu/korea

802.11a Transmitter Overview

Controller Scrambler Encoder

Interleaver Mapper

IFFTCyclicExtend

headers

data

IFFT Transforms 64 (frequency domain) complex numbers into 64 (time domain)

complex numbers accounts for 85% area

24 Uncoded

bits

One OFDM symbol (64 Complex Numbers)

Must produce one OFDM symbol every 4 sec

Depending upon the transmission rate, consumes 1, 2 or 4 tokens to produce one OFDM symbol

Page 17: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-17http://csg.csail.mit.edu/korea

Preliminary results[MEMOCODE 2006] Dave, Gerding, Pellauer, Arvind

Design Lines of RelativeBlock Code (BSV) AreaController 49 0%Scrambler 40 0%Conv. Encoder 113 0%Interleaver 76 1%Mapper 112 11%IFFT 95 85%Cyc. Extender 23 3%

Complex arithmetic libraries constitute another 200 lines of code

Page 18: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-18http://csg.csail.mit.edu/korea

Combinational IFFT

in0

in1

in2

in63

in3

in4

Bfly4

Bfly4

Bfly4

x16

Bfly4

Bfly4

Bfly4

Bfly4

Bfly4

Bfly4

out0

out1

out2

out63

out3

out4

Perm

ute

Perm

ute

Perm

ute

Reuse the same circuit three times to reduce area

Page 19: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-19http://csg.csail.mit.edu/korea

f g

Design AlternativesReuse a block over multiple cycles

we expect:

Throughput to

Area to

ff g

decrease – less parallelism

The clock needs to run faster for the same throughput hyper-linear increase in energy

decrease – reusing a block

Page 20: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-20http://csg.csail.mit.edu/korea

Circular pipeline: Reusing the Pipeline Stage

in0

in1

in2

in63

in3

in4

out0

out1

out2

out63

out3

out4

Bfly4

Bfly4

Perm

ute

Stage Counter

Page 21: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-21http://csg.csail.mit.edu/korea

Superfolded circular pipeline: Just one Bfly-4 node!in0

in1

in2

in63

in3

in4

out0

out1

out2

out63

out3

out4

Bfly4

Perm

ute

Index == 15?

Index: 0 to 15

64

, 2-w

ay

Muxes

4, 1

6-w

ay

Muxes

4, 1

6-w

ay

DeM

uxes

Stage 0 to 2

Page 22: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-22http://csg.csail.mit.edu/korea

Pipelining a block

inQ outQ

f2f1 f3Combinational

C

inQ outQ

f2f1 f3Pipeline

P

inQ outQ

f Folded Pipeline

FP

Clock? Area? Throughput?Clock: C < P FP Area: FP < C < P Throughput: FP < C < P

Page 23: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-23http://csg.csail.mit.edu/korea

Syntax: Vector of Registers

Register suppose x and y are both of type Reg. Then

x <= y means x._write(y._read())

Vector of Int x[i] means sel(x,i) x[i] = y[j] means x = update(x,i, sel(y,j))

Vector of Registers x[i] <= y[j] does not work. The parser thinks it means

(sel(x,i)._read)._write(sel(y,j)._read), which will not type check

(x[i]) <= y[j] parses as sel(x,i)._write(sel(y,j)._read), and works correctly

Don’t ask me why

Page 24: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

September 8, 2009 L03-24http://csg.csail.mit.edu/korea

Static vs dynamic expressions

Expressions that can be evaluated at compile time will be evaluated at compile-time 3+4 7

Some expressions do not have run-time representations and must be evaluated away at compile time; an error will occur if the compile-time evaluation does not succeed Integers, reals, loops, lists, functions, …