Auto-pipe: An introduction
by Jonathan Beard ([email protected])
Thursday, January 26, 12



Big-Data

1986: 2.6 exabytes
1993: 15.8 exabytes
2000: 54.5 exabytes
2007: 295 exabytes


We’ve moved from a world with very little data (relatively) in 1986 to a world beginning to drown in data (2007).


Real-time

1986 1993 2000 2007


Humans naturally want to make sense of the world. Give us data and we’ll want to process it, find patterns, etc. Computers, of course, do this much better than humans do, so we designed algorithms to find patterns for us. Now that we have exabytes of data to deal with, we still want results as fast as possible, in real time even. How do we do that? We could use one of the Top 500 supercomputers; they’re reconfigurable and can probably do what we want if we can come up with an algorithm for what we need. Great, we’ve solved our problem. The high-frequency traders, gene assemblers, planet hunters and pharmaceutical industry can rejoice.

New problem, actually two of them. How do we pay the electricity bill (upwards of $20 million USD a year), and how do we find programmers skilled enough to turn our algorithm into something that can utilize 80,000 compute nodes? From 1986 to 2007 the fastest computers gained enormously in compute capability, going from roughly a gigaflop to hundreds of teraflops, but power usage has increased along with compute capability. Using the dogs on the previous slide as an example, and assuming our ability to provide power is relatively fixed, we can only provide incrementally more power (dog food) for our now massive dog (2007). What do we do? It looks like our dog is going to go hungry. Let’s try a different way of thinking about computing.


Data-flow

for i←0 through N do
    a[i] ← (b[i] + c[i]) ÷ 2
    i++
end do

[Flowchart of the loop: a,b,c,i; i <= N; a[i] ← (b[i] + c[i]) ÷ 2; i++; exit]

[Data-flow graph: read b,c → out ← b+c → out ← in/2 → write a]


As a simple example, let’s look at the algorithm in green above. It’s a simple for loop that takes one element from each of two arrays, adds them together, divides the sum by two and then assigns the result to the corresponding index in a third array. For a load/store architecture this loop is fairly efficient, but imagine how much simpler it can be with a data-flow architecture. We begin by looking at each operation as a function, with FIFO queues transmitting data between the functions. At right we can see a read function which supplies data, an add function which sends the sum of b and c, a division function which divides its input value by two, and a write function. Conceptually this allows pipelining of the application. It also provides an easy way to expose instruction-level parallelism that can be exploited on load/store architectures and in hardware (i.e. in hardware the b+c can be performed on as many elements as we have available, whereas on a load/store architecture the limit is currently eight 32-bit elements at a time).
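The four-stage graph described above can be sketched in plain C. This is only an illustration, not Auto-pipe code: the names fifo_t, fifo_push, fifo_pop, FIFO_CAP and dataflow_average are made up for this example, and the stages are fired in turn by a single loop instead of running on separate resources.

```c
#include <stddef.h>
#include <stdint.h>

#define FIFO_CAP 8  /* hypothetical queue depth chosen for this sketch */

/* A tiny single-threaded FIFO of 32-bit values (illustrative only). */
typedef struct {
    uint32_t data[FIFO_CAP];
    size_t head;   /* index of the oldest element */
    size_t count;  /* number of stored elements   */
} fifo_t;

static int fifo_push(fifo_t *q, uint32_t v)
{
    if (q->count == FIFO_CAP) return 0;            /* full */
    q->data[(q->head + q->count) % FIFO_CAP] = v;
    q->count++;
    return 1;
}

static int fifo_pop(fifo_t *q, uint32_t *v)
{
    if (q->count == 0) return 0;                   /* empty */
    *v = q->data[q->head];
    q->head = (q->head + 1) % FIFO_CAP;
    q->count--;
    return 1;
}

/* Computes a[i] = (b[i] + c[i]) / 2 by moving values through
 * read -> add -> divide -> write stages joined by FIFOs.
 * Stages fire sink-first, so a value advances one stage per loop pass. */
void dataflow_average(const uint32_t *b, const uint32_t *c,
                      uint32_t *a, size_t n)
{
    fifo_t q_b = {0}, q_c = {0}, q_sum = {0}, q_half = {0};
    size_t next_read = 0, next_write = 0;

    while (next_write < n) {
        uint32_t x, y;

        /* write stage: store one finished result */
        if (fifo_pop(&q_half, &x)) a[next_write++] = x;

        /* divide stage: out <- in / 2 */
        if (q_half.count < FIFO_CAP && fifo_pop(&q_sum, &x))
            fifo_push(&q_half, x / 2);

        /* add stage: out <- b + c (wraps on 32-bit overflow) */
        if (q_sum.count < FIFO_CAP && q_b.count > 0 && q_c.count > 0) {
            fifo_pop(&q_b, &x);
            fifo_pop(&q_c, &y);
            fifo_push(&q_sum, x + y);
        }

        /* read stage: supply the next pair of inputs */
        if (next_read < n && q_b.count < FIFO_CAP && q_c.count < FIFO_CAP) {
            fifo_push(&q_b, b[next_read]);
            fifo_push(&q_c, c[next_read]);
            next_read++;
        }
    }
}
```

Calling dataflow_average with b = {2, 4, 6} and c = {4, 8, 10} fills a with {3, 6, 8}. Once the pipeline is full, every stage has work on every pass, which is the pipelining the slide refers to.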


What is Auto-pipe
Software stream coordination language & runtime

[Diagram: Kernel A, Kernel B, Kernel C and Kernel D connected by FIFO queues]


So, what is Auto-pipe? It provides two things: 1) a means to describe the data-flow application topology, and 2) a runtime infrastructure to hook the application topology together via FIFO queues. In this example the blue compute kernels are the transfer functions, or combinations of functions on a single resource, and the green arrows represent FIFO queues between resources. These queues are resizable via configuration parameters (in AP we call these configs) within the coordination language (we’ll get to that in a few slides). The FIFOs are implemented both in shared main memory and synthesized on the FPGA.


What goes in an AP Compute Kernel

C Kernel

• ap_<BlockName>_init - everything for this kernel that needs to be initialized

• ap_<BlockName>_destroy - everything that needs to be released

• ap_<BlockName>_push - called when data is available on an input port

• ap_<BlockName>_go - called continuously until the return value is non-zero

• ap_<BlockName>_push_signal - called when a signal is received


We’ll see several examples of C AP kernels. In order to be of use, most AP applications must have some sort of data pushed to them, or receive data from the FPGA, on a CPU. Within the C source (.c) and header (.h) files we must implement and declare (respectively) the functions listed above. They are fairly self-explanatory; however, you should know that you can rely on init being called before all the other functions and destroy being called after them. In between, go will be called continuously until it returns a non-zero value, and push will be called by an upstream node when data is available. If more than one port “enters” this kernel then you must check to see which port has data and take appropriate action when push is called.
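To make the call order concrete, here is a minimal sketch of a C kernel and a stand-in driver. The slides list only the function names, so everything else is an assumption made for this example: the block name accum, the argument lists and return types, the accum_state_t struct, and the demo_run_accum driver that plays the role of the runtime.

```c
#include <stdint.h>

#define ACCUM_NUM_PORTS 2  /* hypothetical: this block has two input ports */

/* Hypothetical per-block state; the real runtime's types are not shown. */
typedef struct {
    int      initialized;
    uint32_t sum;                      /* running sum of all pushed values */
    uint32_t pushes[ACCUM_NUM_PORTS];  /* how many values arrived per port */
    int      go_calls;
    int      signals;
} accum_state_t;

static accum_state_t state;

/* Called once before any other function of the block. */
void ap_accum_init(void)
{
    state = (accum_state_t){0};
    state.initialized = 1;
}

/* Called once after all other functions; release resources here. */
void ap_accum_destroy(void)
{
    state.initialized = 0;
}

/* Called when data is available; the port argument (assumed) says where. */
void ap_accum_push(int port, uint32_t value)
{
    if (port < 0 || port >= ACCUM_NUM_PORTS) return;  /* unknown port */
    state.pushes[port]++;
    state.sum += value;
}

/* Called repeatedly; a non-zero return value stops further calls. */
int ap_accum_go(void)
{
    state.go_calls++;
    return state.go_calls >= 3;   /* pretend the work is done after 3 calls */
}

/* Called when a signal is received (argument list assumed). */
void ap_accum_push_signal(int port, int signal)
{
    (void)port;
    (void)signal;
    state.signals++;
}

/* Stand-in for the runtime: init, pushes, go until non-zero, destroy. */
uint32_t demo_run_accum(void)
{
    ap_accum_init();
    ap_accum_push(0, 5);          /* data arrives on port 0 */
    ap_accum_push(1, 7);          /* data arrives on port 1 */
    ap_accum_push_signal(0, 1);
    while (ap_accum_go() == 0) {
        /* the runtime keeps calling go until it returns non-zero */
    }
    uint32_t total = state.sum;
    ap_accum_destroy();
    return total;
}
```

In this sketch demo_run_accum returns 12, and go is called three times before the driver stops. In a real application the runtime, not user code, makes these calls.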


What goes in an AP Compute Kernel

HDL Kernel

Signal               Dir   Type
clk                  in    std_logic
rst                  in    std_logic
avail_<in_name>      in    std_logic
input_<in_name>      in    <X_TYPE>
read_<in_name>       out   std_logic
output_<out_name>    out   <X_TYPE>
write_<out_name>     out   std_logic
afull_<out_name>     in    std_logic


For kernels with HDL implementations we must of course define clk (clock) and rst (reset) signals. Then, for every port “entering” this kernel, we must define the signals highlighted in green. The dark green signals are for input. avail goes high when input is available; otherwise it is low. Once the data has been read, the kernel sets read high (you must do this in your logic) to signal the FIFO feeding this kernel to release the data. Implied with the FIFO releasing the data is that you have registered your input within your block (you must implement this within your kernel). The light green signals define the output ports of your kernel. The output value is driven on output; when it is valid (i.e. you have output you want to write) the kernel should set write high (you must do this in your logic). The input afull comes from the FIFO downstream of this kernel. It goes high if there is no more room to write to; you must decide what logic to include in your kernel to handle this (i.e. drop data, stop processing, store then stop, etc.).
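The handshake can be modeled in software as a function evaluated once per clock cycle. The sketch below is a C model rather than HDL, under stated assumptions: the kernel_in_t and kernel_out_t structs, the kernel_clock function and the "add one" computation are invented for illustration, and the kernel uses the "stop processing" policy when afull is high.

```c
#include <stdint.h>

/* Signals the kernel samples on a clock edge (one input port, one output). */
typedef struct {
    int      avail;   /* avail_<in_name>: input FIFO has data       */
    uint32_t input;   /* input_<in_name>: the data word             */
    int      afull;   /* afull_<out_name>: downstream FIFO is full  */
} kernel_in_t;

/* Signals the kernel drives for the same cycle. */
typedef struct {
    int      read;    /* read_<in_name>: release the word from the FIFO */
    int      write;   /* write_<out_name>: output word is valid         */
    uint32_t output;  /* output_<out_name>: the result word             */
} kernel_out_t;

/* One clock cycle of a toy kernel that outputs input + 1.
 * Policy when afull is high: stop processing (consume nothing). */
kernel_out_t kernel_clock(kernel_in_t in)
{
    kernel_out_t out = {0, 0, 0};

    if (in.avail && !in.afull) {
        uint32_t registered = in.input;  /* register the input first ...   */
        out.output = registered + 1;
        out.write  = 1;                  /* ... then mark the output valid  */
        out.read   = 1;                  /* ... and let the FIFO release it */
    }
    return out;
}
```

With avail high, input 41 and afull low, the model returns read = 1, write = 1 and output = 42. With afull high, or with avail low, it asserts neither read nor write, so no input is consumed and nothing is written.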


HW Kernel Layout

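[Diagram: an input FIFO feeds the Kernel through avail_x0, input_x0 and read_x0; the Kernel feeds an output FIFO through output_y0, write_y0 and afull_y0; clk and rst enter the Kernel.]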


X, quick and dirty

Eric [email protected]

These slides

• HDL block interface

• HDL block example

• C block interface

• Wiring in the X language

HDL interface

• You already saw this

• Additional signal: afull_f for each output, used for backpressure. You can output ONE more element after seeing afull go high
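One way to read the "ONE more element" rule is that afull goes high while a single slot is still free, so a write already in flight when the producer sees the flag is not lost. The C model below sketches that reading. The threshold of depth minus one is an assumption, as are the names out_fifo_t, out_fifo_afull, out_fifo_write and produce_until_afull.

```c
#include <stddef.h>
#include <stdint.h>

#define OUT_FIFO_DEPTH 4u  /* hypothetical depth for this sketch */

typedef struct {
    uint32_t slots[OUT_FIFO_DEPTH];
    size_t   count;
} out_fifo_t;

/* Assumed policy: afull is raised while one slot is still free. */
int out_fifo_afull(const out_fifo_t *f)
{
    return f->count >= OUT_FIFO_DEPTH - 1u;
}

/* Returns 1 if the word was stored, 0 if it was dropped (FIFO truly full). */
int out_fifo_write(out_fifo_t *f, uint32_t v)
{
    if (f->count == OUT_FIFO_DEPTH) return 0;
    f->slots[f->count++] = v;
    return 1;
}

/* A producer that samples afull at the start of each cycle and still
 * completes the write of that cycle, then stops.
 * Returns how many words were actually stored. */
size_t produce_until_afull(out_fifo_t *f, size_t max_items)
{
    size_t written = 0;
    int afull_seen = 0;

    while (written < max_items && !afull_seen) {
        afull_seen = out_fifo_afull(f);        /* flag as seen this cycle  */
        if (out_fifo_write(f, (uint32_t)written))
            written++;                         /* the in-flight write lands */
    }
    return written;
}
```

Starting from an empty FIFO of depth 4, the producer stores four words: afull is first seen high when three are stored, and the one extra write fills the last slot without dropping data.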

HDL example: FFT

entity X_fft is
  generic(
    width : in X_UNSIGNED16 := x"00c0"
  );
  port(
    clk : in std_logic;
    rst : in std_logic;

    -- input side
    input_in : in X_array(16*192-1 downto 0);
    avail_in : in std_logic;
    read_in  : out std_logic;

    -- output side
    output_out_real : out X_array(16*128-1 downto 0);
    write_out_real  : out std_logic;
    afull_out_real  : in std_logic;

    output_out_imag : out X_array(16*128-1 downto 0);
    write_out_imag  : out std_logic;
    afull_out_imag  : in std_logic
  );
end X_fft;

HDL example: FFT

architecture arch of X_fft is

  -- from coregen
  component fft_s16_256 ... end component;

  -- convert from flat vector to arrays of values
  type inputl_in_t is array(0 to 191) of X_unsigned16;
  signal inputl_in : inputl_in_t;
  type outputl_out_t is array(0 to 127) of X_unsigned16;
  signal outputl_out_real, outputl_out_imag : outputl_out_t;

  -- this will eventually be the sort of thing that gets autogen'd
  fixinp: for i in 0 to 191 generate
    inputl_in(i) <= x_unsigned16(input_in(i*16+15 downto i*16));
  end generate;

  fixoutp: for i in 0 to 127 generate
    output_out_real(i*16+15 downto i*16) <= std_logic_vector(outputl_out_real(i));
    output_out_imag(i*16+15 downto i*16) <= std_logic_vector(outputl_out_imag(i));
  end generate;

  ... begin ...

HDL example: FFT

doinputclk: process(clk,rst) is
begin
  if clk'event and clk='1' and rst='1' then
    fft_start <= '0';
    loading <= '0';
    read_in <= '0';
  elsif clk'event and clk='1' then
    fft_start <= '0';
    read_in <= '0';
    loading <= loading;

    if avail_in='1' and loading='0' and fft_busy='0' and afull_out_real='0' and afull_out_imag='0' then
      loading <= '1';
      fft_start <= '1';
    end if;
    if loading='1' then
      -- if fft_xn_index = x"ff" then
      if fft_xn_index = x"7f" then
        loading <= '0';
        read_in <= '1';
      end if;
    end if;
  end if;
end process doinputclk;


X-coordination language

<kernName>.x

Describes ports for a kernel and the implementations available, i.e. C, VHDL, etc. (note C & C_x86 reduce to the same thing)

• Input port type. These are simply type definitions.

• Source file name, minus the file extension.

• Block name.

• Platform type.

• Must match block name, i.e. this is an implementation for block sum2U32.


For each kernel (we usually call them blocks; you’ll notice we use these terms interchangeably) we have a .x file that defines its ports and implementations. The HDL and C blocks must have the same port input types (notice we only define the input types; we abstract away the HDL signals since they have fixed types). We then describe the implementation (impl in the X syntax) of each block. In this case we have a C platform and a VHDL platform, each defined with a source file base name (denoted by the base="" syntax) of sum2U32. We leave the file extension off; the X compiler assumes .vhd for VHDL files and .c for C source files.


X-coordination language

algo.x

Algorithm mapping file: names included kernels, describes edges, allows per-kernel static config params.

• All block .x definition files are included here.

• Configuration parameters go here. These are defined within the block .x file and within the C header file. VHDL files can use generics for these.

• We declare and instantiate blocks here; genU32 is a type of block and gen1 is the instance name.

• Edges between blocks are defined here; each port must have an edge connecting it.

• This simply says which block defines the overall application.


X-coordination language

map.x

Describes resource usage; must include the definition of each resource (algo.x describes the app).

• You can use any type of C-style #defines or macros you want; the C preprocessor (CPP) will expand them.

• Files to include; "std.x" defines the C, C_x86 and VHDL platform types.

• An X array syntax. If you had two processes this would be:

  resource proc[2] is C_x86 {(file="proc_1_.cpp", cxx="true", xsim="true"),
                             (file="proc_2_.cpp", cxx="true", xsim="true")};

• The map statement defines the resource on which a portion of the application will run. In this example we’re running the entire application on process 1.


Platform Definitions

• ipc.x - Shared Memory Link Definition

• std.x - x86 process definitions, basic VHDL types

• tcp.x - TCP connection definitions


There are quite a few functions described in the autopipe port interface: http://goo.gl/5o6K5. There will be separate instructions on how to use the build system.