
Post on 03-Jan-2016


HW/SW Co-design

Lecture 4: Lab 2 – Passive HW Accelerator Design

Course material designed by Professor Yarsun Hsu, EE Dept, NTHU
RA: Yi-Chiun Fang, EE Dept, NTHU

Outline

Introduction to AMBA Bus System
Passive Hardware Design
Interrupt Service Routine
Environment Configuration
Co-designed System with GHDL Simulation
Co-designed System on FPGA

INTRODUCTION TO AMBA BUS SYSTEM

AMBA 2.0 Bus System (1/7)

Established by ARM

Advanced High-performance Bus (AHB)
  For high-performance, high clock frequency system modules such as embedded processor, DMA controller, and memory controller

Advanced Peripheral Bus (APB)
  Optimized for minimal power consumption and reduced interface complexity to support peripheral functions

For more details, please refer to the following documents
  AMBA 2.0 Specification
  Introduction to AMBA Bus System
  GRLIB AHBCTRL - AMBA AHB controller with plug&play support

AMBA 2.0 Bus System (2/7)

[Figure: AMBA 2.0 system diagram. Callouts on the AHB-to-APB bridge: slave on AHB; the only master on APB]

AMBA 2.0 Bus System (3/7)

AMBA AHB is designed to be used with a central multiplexor interconnection scheme
  Avoids tri-state bus

AMBA 2.0 Bus System (4/7)

An AHB transfer consists of two distinct sections
  The address phase, which lasts only a single cycle
  The data phase, which may require several cycles
    This is achieved using the HREADY signal

AMBA 2.0 Bus System (5/7)

A slave may insert wait states into any transfer
  For write operations, the bus master will hold the data stable throughout the extended cycles
  For read transfers, the slave does not have to provide valid data until the transfer is about to complete

[Figure: AHB transfer timing diagram with the wait states marked]
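
The timing rule on these slides can be turned into a small cycle-count model. This is only a sketch under assumptions: transfers are issued back to back, so each address phase overlaps the previous data phase, and the name ahb_burst_cycles and its wait-state array are hypothetical helpers, not part of AMBA or GRLIB.

```c
#include <stddef.h>

/* Hypothetical helper: bus clock cycles needed for n back-to-back AHB
 * transfers, where wait_states[i] is the number of cycles the slave holds
 * HREADY low during the data phase of transfer i.
 *
 * Assumption: the address phase of transfer i+1 overlaps the data phase of
 * transfer i, so only the first address phase costs a cycle of its own. */
unsigned ahb_burst_cycles(const unsigned *wait_states, size_t n)
{
    unsigned cycles;
    size_t i;

    if (n == 0)
        return 0;

    cycles = 1;                          /* first address phase */
    for (i = 0; i < n; i++)
        cycles += 1 + wait_states[i];    /* data phase plus its wait states */
    return cycles;
}
```

Under this model a slave that never inserts wait states completes four back-to-back transfers in five cycles, while two wait states on a single transfer add two cycles to the total.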

AMBA 2.0 Bus System (6/7)

GRLIB implements AMBA AHB with slight modifications
  Please refer to the GRLIB User's Manual and GRLIB IP Cores Manual for detailed information

AMBA 2.0 Bus System (7/7)

The GRLIB implementation of AHB includes a mechanism to provide plug&play support
  The implementation is located at grlib-gpl-1.0.19-b3188/lib/grlib/amba/

The configuration record from each AHB unit is sent to the AHB bus controller via the HCONFIG signal
  identification of attached units
  address mapping of slaves
  interrupt routing

type ahb_config_type is array (0 to NAHBCFG-1) of amba_config_word;
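
To make the three roles of the configuration record concrete, the sketch below decodes two configuration words in C. The bit positions reflect my reading of the GRLIB User's Manual (identification word and bank address register) and should be checked against the manual for this GRLIB release; the type and function names (pnp_id_t, pnp_decode_id, pnp_bar_matches) are hypothetical.

```c
#include <stdint.h>

/* Fields of the identification word of a plug&play configuration record.
 * Assumed layout, to be verified against the GRLIB User's Manual. */
typedef struct {
    unsigned vendor;   /* bits 31..24: vendor ID  */
    unsigned device;   /* bits 23..12: device ID  */
    unsigned version;  /* bits  9..5 : version    */
    unsigned irq;      /* bits  4..0 : interrupt routing */
} pnp_id_t;

pnp_id_t pnp_decode_id(uint32_t word)
{
    pnp_id_t id;
    id.vendor  = (word >> 24) & 0xFFu;
    id.device  = (word >> 12) & 0xFFFu;
    id.version = (word >> 5)  & 0x1Fu;
    id.irq     =  word        & 0x1Fu;
    return id;
}

/* Returns 1 if address haddr falls in the range described by a bank address
 * register (assumed layout: ADDR in bits 31..20, MASK in bits 15..4). */
int pnp_bar_matches(uint32_t bar, uint32_t haddr)
{
    uint32_t addr = (bar >> 20) & 0xFFFu;
    uint32_t mask = (bar >> 4)  & 0xFFFu;
    return mask != 0 && (((haddr >> 20) ^ addr) & mask) == 0;
}
```

For example, under the assumed layout the word 0xFF123045 decodes to vendor 0xFF, device 0x123, version 2 and interrupt 5, and a bank address register of 0xB000FFF2 selects addresses whose top 12 bits are 0xB00.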

PASSIVE HARDWARE DESIGN

Passive HW Accelerators

The accelerator (bus slave) does not actively send signals to the bus
  It only responds to the master
The master gives commands to the slave via its control registers and probes its status registers

[Figure: master and slave connected through the bus]

Passive 1-D IDCT HW Acc. (1/4)

A simple 2-stage design
  Gate delay
    Stage 1: ~1 mult
    Stage 2: ~3 add
Action register
  Write ‘1’ to start, reset to 0 automatically by the accelerator when done
Mode register
  Row/column mode
No wait states
  Immediate response

[Figure: accelerator block diagram showing the action and mode registers]

Passive 1-D IDCT HW Acc. (2/4)

Data packing
  Since the 8x8 blocks are of type short (16-bit), each value occupies only half of the data bus (32-bit)
  We pack two values together to increase data bus utilization and reduce the communication overhead
  The action bit and mode bit are also packed together

Data register layout (32 bits, MSB on the left)
  bits 31..16: Y2n, x2n
  bits 15..0:  Y2n+1, x2n+1

Control register layout
  bits 31..2: UNUSED
  bit 1:      mode
  bit 0:      action
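
The packing scheme can be written out as plain C helpers. This is a host-side sketch, not the lab's code: the function names pack_pair, unpack_pair and make_control are hypothetical, and the layout assumed is the one on this slide (even element in the upper half, action in bit 0, mode in bit 1).

```c
#include <stdint.h>

/* Pack two 16-bit samples into one 32-bit bus word: element 2n goes in
 * bits 31..16 and element 2n+1 in bits 15..0. */
uint32_t pack_pair(int16_t even, int16_t odd)
{
    return ((uint32_t)(uint16_t)even << 16) | (uint16_t)odd;
}

/* Recover the two signed samples from a packed word. Converting a value
 * above INT16_MAX back to int16_t is implementation-defined in C; common
 * compilers wrap it as two's complement. */
void unpack_pair(uint32_t word, int16_t *even, int16_t *odd)
{
    *even = (int16_t)(uint16_t)(word >> 16);
    *odd  = (int16_t)(uint16_t)(word & 0xFFFFu);
}

/* Build a control word: action in bit 0, mode in bit 1, other bits unused. */
uint32_t make_control(unsigned mode, unsigned action)
{
    return ((uint32_t)(mode & 1u) << 1) | (action & 1u);
}
```

With this layout an 8-element row needs four bus writes instead of eight, which matches the four transfers used to load the Y registers.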

Passive 1-D IDCT HW Acc. (3/4)

1-D IDCT calculation
  STEP1: Write Y registers (4 transfers)
  STEP2: Write mode bit & action bit
  STEP3: Poll the action bit
  STEP4: Read x registers after action bit reset

Passive 1-D IDCT HW Acc. (4/4)

static void
hw_idct_1d(short *dst, short *src, unsigned int mode)
{
    /* View the 16-bit inputs as 32-bit words: two values per bus write */
    long *long_ptr = (long *)src;

    /* STEP1: write the Y registers */
    Y_array_base[0] = long_ptr[0];
    Y_array_base[1] = long_ptr[1];
    ...

    /* STEP2: set the mode bit (bit 1) and the action bit (bit 0) */
    *c_reg = (long)((mode << 1) | 0x1);

    /* STEP3: wait until the accelerator clears the action bit */
    while (*c_reg & 0x1) { /* busy waiting loop */ }

    /* STEP4: read back the x registers */
    dst[ 0] = ((short *)x_array_base)[0];
    dst[ 8] = ((short *)x_array_base)[1];
    ...
}
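
The busy-waiting loop never terminates if the accelerator fails to clear the action bit, for instance when the hardware is missing from the design. A bounded variant of STEP3 is sketched below; the names wait_until_done and CTRL_ACTION and the poll limit are hypothetical and not part of the lab package.

```c
#include <stdint.h>

#define CTRL_ACTION 0x1u   /* bit 0: written as 1 to start, cleared when done */

/* Poll a control register until the action bit clears or max_polls reads
 * have been made. Returns 0 on completion and -1 on timeout.
 *
 * The pointer is volatile so that the compiler re-reads the register on
 * every iteration instead of keeping the first value in a CPU register. */
int wait_until_done(volatile uint32_t *ctrl, unsigned long max_polls)
{
    while (max_polls-- > 0) {
        if ((*ctrl & CTRL_ACTION) == 0)
            return 0;
    }
    return -1;
}
```

The same reasoning applies to the original loop: it only works as intended if c_reg points to a volatile object, which the slide does not show.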

INTERRUPT SERVICE ROUTINE

GRLIB GPTIMER (1/2)

General Purpose Timer Unit
  Timers are present in almost any electronic device which needs timing functions (e.g. timekeeping & time measurement)
  Acts as a slave on AMBA APB
  Provides a common decrementing prescaler (clocked by the system clock) and decrementing timers
  Capable of asserting interrupt on timer underflow
  We initialize timer 2 for 1ms resolution (i.e. an interrupt will be asserted every 1ms)
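
The 1ms resolution follows from two reload values, one for the prescaler and one for the timer. The sketch below computes them under an assumption that should be checked against the GRLIB IP Cores Manual: a counter loaded with N underflows every N + 1 input events. The type timer_cfg_t, the function timer_cfg_for_period and the 40 MHz clock used in the example are hypothetical.

```c
#include <stdint.h>

/* Reload values for a prescaler and timer pair. */
typedef struct {
    uint32_t scaler_reload;  /* system clocks per prescaler tick, minus 1 */
    uint32_t timer_reload;   /* prescaler ticks per interrupt, minus 1    */
} timer_cfg_t;

/* sys_clk_hz: system clock in Hz (assumed to be a multiple of 1 MHz)
 * period_us:  desired interrupt period in microseconds (at least 1)
 *
 * The prescaler is set up to tick once per microsecond, and the timer
 * then counts the requested number of microseconds. */
timer_cfg_t timer_cfg_for_period(uint32_t sys_clk_hz, uint32_t period_us)
{
    timer_cfg_t cfg;
    cfg.scaler_reload = sys_clk_hz / 1000000u - 1u;
    cfg.timer_reload  = period_us - 1u;
    return cfg;
}
```

For a hypothetical 40 MHz system clock and a 1000 microsecond period this gives a prescaler reload of 39 and a timer reload of 999.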

GRLIB GPTIMER (2/2)

Please refer to the GRLIB IP Cores Manual for detailed information

eCos ISR (1/3)

When an interrupt occurs, the processor jumps to a specific address for execution of the Interrupt Service Routine (ISR)
One of the key concerns in embedded systems with respect to interrupts is latency, which is the interval of time from when an interrupt occurs until the ISR begins to execute

[Figure: timeline marking the interrupt latency]

eCos ISR (2/3)

Basic API for implementing ISR
  Please refer to the eCos Reference Manual for detailed information

#include <cyg/kernel/kapi.h>

void cyg_interrupt_create(cyg_vector_t vector, cyg_priority_t priority, cyg_addrword_t data, cyg_ISR_t* isr, cyg_DSR_t* dsr, cyg_handle_t* handle, cyg_interrupt* intr);
void cyg_interrupt_delete(cyg_handle_t interrupt);
void cyg_interrupt_attach(cyg_handle_t interrupt);
void cyg_interrupt_detach(cyg_handle_t interrupt);
void cyg_interrupt_acknowledge(cyg_vector_t vector);
void cyg_interrupt_mask(cyg_vector_t vector);
void cyg_interrupt_unmask(cyg_vector_t vector);

eCos ISR (3/3)

An ISR is a C function which takes the following form
An ISR should complete as soon as possible

cyg_uint32
isr_function(cyg_vector_t vector, cyg_addrword_t data)
{
    ... /* do the service routine */
    return CYG_ISR_HANDLED;
}

Program Profiling (1/2)

We use GPTIMER for time measurement
Every time the timer asserts an interrupt, the timer ISR will increment a global variable time_tick

cyg_uint32
timer_isr(cyg_vector_t vector, cyg_addrword_t data)
{
    unsigned long *time_tick = (unsigned long *) data;

    (*time_tick)++;

    cyg_interrupt_acknowledge(vector);
    return CYG_ISR_HANDLED;
}

Program Profiling (2/2)

We record the latency of every function block by monitoring the time_tick variable

void
func()
{
    unsigned long local_timer = time_tick;

    ...

    time_elapsed += (time_tick - local_timer);
}
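
The subtraction in this pattern stays correct even when the tick counter wraps around, because unsigned arithmetic in C is modulo 2^N. The sketch below isolates that step so it can be checked on a host machine; ticks_elapsed and profile_add are hypothetical helper names, not functions from the lab sources.

```c
/* Ticks elapsed between two readings of a free-running unsigned counter.
 * Unsigned subtraction wraps modulo 2^N, so the result is still correct
 * if the counter overflowed once between the two readings. */
unsigned long ticks_elapsed(unsigned long start, unsigned long now)
{
    return now - start;
}

/* Accumulate the time spent in one profiled block into a running total. */
void profile_add(unsigned long *total, unsigned long start, unsigned long now)
{
    *total += ticks_elapsed(start, now);
}
```

Two practical points follow from the setup described on these slides. Since the ISR modifies time_tick behind the compiler's back, the variable should be declared volatile in the profiled code. And with a 1ms tick, the measurement granularity is one millisecond, so a block that finishes between two ticks contributes zero to the total.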

ENVIRONMENT CONFIGURATION

Build SW Application

Copy the files in lab_pkg/lab2/sw to your original Lab 1 directory
  Replace the Makefile and modify the path for ECOSDIR in Makefile
Type “make” to build
  -D_HW_ACC_ flag will link the co-designed version of hw_idct_2d() in idct_hw.c with the testbench
    Without this flag, hw_idct_2d() will be identical to sw_idct_2d()
  -D_PROFILING_ flag will enable profiling using timer interrupt, and report the results at the end
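
Flags such as these are usually consumed through the C preprocessor. The sketch below shows the general mechanism only; idct_impl and profiling_enabled are hypothetical helpers, and the actual organization of idct_hw.c in the lab package may differ.

```c
/* Reports which IDCT implementation this file was compiled to use.
 * Compiling with -D_HW_ACC_ defines the macro and selects the first branch. */
const char *idct_impl(void)
{
#ifdef _HW_ACC_
    return "hw";   /* co-designed version using the accelerator */
#else
    return "sw";   /* pure software version */
#endif
}

/* Reports whether profiling code was compiled in (-D_PROFILING_). */
int profiling_enabled(void)
{
#ifdef _PROFILING_
    return 1;
#else
    return 0;
#endif
}
```

Because the choice is made at compile time, switching between the two versions requires rebuilding the program with the flag added to or removed from the Makefile.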

Install IDCT Accelerator

Copy lab_pkg/lab2/hw/devices.vhd to grlib-gpl-1.0.19-b3188/lib/grlib/amba/ and replace the original file
Copy lab_pkg/lab2/hw/libs.txt and the whole lab_pkg/lab2/hw/esw folder to grlib-gpl-1.0.19-b3188/lib/
  The 1-D IDCT passive accelerator is located at lab_pkg/lab2/hw/esw/idct_acc/idct_1x8.vhd
Copy lab_pkg/lab2/hw/leon3mp.vhd to grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/ and replace the original file

CO-DESIGNED SYSTEM WITH GHDL SIMULATION

GHDL Simulation (1/6)

We compile our program as a virtual SDRAM for LEON3 processor
LEON3 will fetch the instructions and perform the corresponding operations
All the hardware signals can be recorded and dumped by GHDL

GHDL Simulation (2/6)

In order to perform GHDL simulation, we build our program without linking it with eCos
  Remove -D__ECOS & -I$(ECOSDIR)/include from CFLAGS
  Remove -Ttarget.ld, -nostdlib, & -L$(ECOSDIR)/lib from LFLAGS
  Remove -D_PROFILING_ flag
You can remove -D_VERBOSE_ for faster simulation
You can modify the NUM_BLKS macro in idct_test.c to reduce the number of testbench iterations
Type “make” to build
  You should see a file named sdram.srec

GHDL Simulation (3/6)

Start Cygwin
cd grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/
make distclean
make soft
Copy the sdram.srec we built into this directory and replace the original one
make ghdl
  You can check for syntax errors through GHDL

GHDL Simulation (4/6)

Type “./testbench.exe --vcd=waveform.vcd” after compilation to begin simulation
You should see an AHB slave with “Unknown vendor” appear, which is our IDCT accelerator

GHDL Simulation (5/6)

The dump file waveform.vcd can be viewed on-the-fly using GTKWave
  Drag waveform.vcd and drop it over the gtkwave.exe icon to open
  You can also use Windows cmd to open
  “File → Reload Waveform” in GTKWave to update the dump file

GHDL Simulation (6/6)

[Figure: GHDL waveform annotated with the address phase, data phase, stage 1, stage 2, and the probe of the control register]

CO-DESIGNED SYSTEM ON FPGA

Build FPGA Bitstream (1/2)

Type “make ise | tee ise_log” under grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/ after you install the accelerator
  It is strongly suggested that you verify the hardware with GHDL simulation first
  It is also suggested that you take a look at ise_log for more information
Configure your FPGA with leon3mp.bit after generating the bitstream

Build FPGA Bitstream (2/2)

After entering GRMON, check the system configuration using “info sys”
  You should see a device with “Unknown vendor” appear

Profiling Results

Build the program with -D_PROFILING_ flag on
Compare the computation results of sw_idct_2d() and hw_idct_2d()
Compare the computation results with and without -D_VERBOSE_ flag