![Page 1: HW/SW Co-design Lecture 4: Lab 2 – Passive HW Accelerator Design Course material designed by Professor Yarsun Hsu, EE Dept, NTHU RA: Yi-Chiun Fang, EE](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef45503460f94c069f6/html5/thumbnails/1.jpg)
HW/SW Co-design
Lecture 4: Lab 2 – Passive HW Accelerator Design
Course material designed by Professor Yarsun Hsu, EE Dept, NTHU
RA: Yi-Chiun Fang, EE Dept, NTHU

Outline
- Introduction to AMBA Bus System
- Passive Hardware Design
- Interrupt Service Routine
- Environment Configuration
- Co-designed System with GHDL Simulation
- Co-designed System on FPGA

INTRODUCTION TO AMBA BUS SYSTEM

AMBA 2.0 Bus System (1/7)
- Established by ARM
- Advanced High-performance Bus (AHB)
  - For high-performance, high clock frequency system modules such as embedded processors, DMA controllers, and memory controllers
- Advanced Peripheral Bus (APB)
  - Optimized for minimal power consumption and reduced interface complexity to support peripheral functions
- For more details, please refer to the following documents
  - AMBA 2.0 Specification
  - Introduction to AMBA Bus System
  - GRLIB AHBCTRL - AMBA AHB controller with plug&play support

AMBA 2.0 Bus System (2/7)
[Figure: AMBA system diagram. Callouts on the AHB-to-APB bridge: "Slave on AHB", "The only master on APB"]

AMBA 2.0 Bus System (3/7)
- AMBA AHB is designed to be used with a central multiplexor interconnection scheme
  - Avoids a tri-state bus

AMBA 2.0 Bus System (4/7)
- An AHB transfer consists of two distinct sections
  - The address phase, which lasts only a single cycle
  - The data phase, which may require several cycles
    - This is achieved using the HREADY signal

AMBA 2.0 Bus System (5/7)
- A slave may insert wait states into any transfer
  - For write operations, the bus master will hold the data stable throughout the extended cycles
  - For read transfers, the slave does not have to provide valid data until the transfer is about to complete
[Figure: transfer timing diagram with the wait states marked]
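
As a rough illustration of the two-phase timing and wait states described above, the following C sketch counts bus cycles for a run of back-to-back transfers. It is a simplified model and not taken from the slides: it assumes each address phase overlaps the previous transfer's data phase, and that every wait state (a cycle in which the slave holds HREADY low) extends the data phase by one cycle. The function name `ahb_total_cycles` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified cycle count for back-to-back AHB transfers (a model, not RTL).
 * wait_states[i] is the number of cycles the slave holds HREADY low during
 * the data phase of transfer i. Only the first address phase costs a cycle
 * of its own; every later address phase is assumed to overlap the previous
 * transfer's data phase. */
static unsigned ahb_total_cycles(const unsigned *wait_states, size_t n)
{
    unsigned cycles = (n > 0) ? 1u : 0u;   /* first address phase */
    for (size_t i = 0; i < n; i++)
        cycles += 1u + wait_states[i];     /* data phase plus inserted wait states */
    return cycles;
}
```

Under this model, three transfers without wait states take 4 cycles, and a single transfer with two wait states also takes 4 cycles (1 address cycle plus a 3-cycle data phase).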

AMBA 2.0 Bus System (6/7)
- GRLIB implements AMBA AHB with slight modifications
  - Please refer to the GRLIB User's Manual and GRLIB IP Cores Manual for detailed information

AMBA 2.0 Bus System (7/7)
- The GRLIB implementation of AHB includes a mechanism to provide plug&play support
  - The implementation is located at grlib-gpl-1.0.19-b3188/lib/grlib/amba/
- The configuration record from each AHB unit is sent to the AHB bus controller via the HCONFIG signal
  - identification of attached units
  - address mapping of slaves
  - interrupt routing

    type ahb_config_type is array (0 to NAHBCFG-1) of amba_config_word;
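
To make the idea of a configuration record more concrete, here is a small C sketch that decodes the first (identification) word of such a record. The bit layout used here is an assumption based on how GRLIB plug&play is commonly documented (vendor ID in bits 31..24, device ID in bits 23..12, version in bits 9..5, IRQ in bits 4..0). It is not stated in these slides, so please check it against the GRLIB IP Cores Manual. The type `pnp_id_t` and the function `pnp_decode_id` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the identification word of a plug&play configuration record.
 * ASSUMED layout (verify against the GRLIB IP Cores Manual):
 *   [31:24] vendor ID, [23:12] device ID, [9:5] version, [4:0] IRQ. */
typedef struct {
    unsigned vendor;
    unsigned device;
    unsigned version;
    unsigned irq;
} pnp_id_t;

/* Split one 32-bit configuration word into its assumed fields. */
static pnp_id_t pnp_decode_id(uint32_t word)
{
    pnp_id_t id;
    id.vendor  = (word >> 24) & 0xFFu;
    id.device  = (word >> 12) & 0xFFFu;
    id.version = (word >> 5)  & 0x1Fu;
    id.irq     =  word        & 0x1Fu;
    return id;
}
```

A tool that lists attached units would look the vendor and device IDs up in a table of known cores; an ID that is not in the table would presumably be reported as unknown.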

PASSIVE HARDWARE DESIGN

Passive HW Accelerators
- The accelerator (bus slave) does not actively send signals to the bus
  - It only responds to the master
- The master gives commands to the slave via its control registers and probes its status registers
[Figure: master and slave connected over the bus]

Passive 1-D IDCT HW Acc. (1/4)
- A simple 2-stage design
  - Gate delay
    - Stage 1: ~1 mult
    - Stage 2: ~3 add
- Action register
  - Write ‘1’ to start; reset to 0 automatically by the accelerator when done
- Mode register
  - Row/column mode
- No wait states
  - Immediate response
[Figure: accelerator block diagram with the action and mode registers labelled]

Passive 1-D IDCT HW Acc. (2/4)
- Data packing
  - Since the 8x8 blocks are of type short (16-bit), each value occupies only half of the data bus (32-bit)
  - We pack two values together to increase data bus utilization and reduce the communication overhead
  - The action bit and mode bit are also packed together
[Figure: register layouts]
  Data word (32 bits):    bits 31..16 (MSB side) = Y2n, x2n | bits 15..0 = Y2n+1, x2n+1
  Control word (32 bits): bits 31..2 = UNUSED | bit 1 = mode | bit 0 = action
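
The packing described above can be sketched in C as follows. This is a minimal illustration, assuming the even-indexed value occupies the upper 16 bits of the data word and that the control word carries the mode in bit 1 and the action bit in bit 0. The helper names (`pack_pair`, `unpack_even`, `unpack_odd`, `make_control`) are hypothetical and do not come from the lab package.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 16-bit samples into one 32-bit bus word.
 * ASSUMPTION: the even-indexed sample sits in bits 31..16,
 * the odd-indexed sample in bits 15..0. */
static uint32_t pack_pair(int16_t even, int16_t odd)
{
    return ((uint32_t)(uint16_t)even << 16) | (uint16_t)odd;
}

/* Recover the two samples from a packed word. */
static int16_t unpack_even(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static int16_t unpack_odd(uint32_t w)  { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Build a control word: mode in bit 1, action bit (bit 0) set to start. */
static uint32_t make_control(unsigned mode)
{
    return ((mode & 1u) << 1) | 1u;
}
```

With this packing, an 8-element row or column needs 4 bus writes instead of 8, which matches the "4 transfers" figure used for writing the Y registers.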

Passive 1-D IDCT HW Acc. (3/4)
- 1-D IDCT calculation
  - STEP1: Write Y registers (4 transfers)
  - STEP2: Write mode bit & action bit
  - STEP3: Poll the action bit
  - STEP4: Read x registers after action bit reset
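
Passive 1-D IDCT HW Acc. (4/4)

    static void
    hw_idct_1d(short *dst, short *src, unsigned int mode)
    {
        /* View the 16-bit inputs as packed 32-bit words (two values per word) */
        long *long_ptr = (long *)src;

        /* STEP1: write the Y registers */
        Y_array_base[0] = long_ptr[0];
        Y_array_base[1] = long_ptr[1];
        ...

        /* STEP2: write the mode bit (bit 1) and set the action bit (bit 0) */
        *c_reg = (long)((mode << 1) | 0x1);

        /* STEP3: poll until the accelerator resets the action bit.
           c_reg is assumed to be a volatile pointer so that every
           iteration re-reads the register. */
        while (*c_reg & 0x1) { /* busy waiting loop */ }

        /* STEP4: read the x registers */
        dst[ 0] = ((short *)x_array_base)[0];
        dst[ 8] = ((short *)x_array_base)[1];
        ...
    }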
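
The busy-waiting loop above spins forever if the accelerator never resets the action bit (for example, when the hardware is missing from the bitstream). A bounded variant is sketched below as one possible safeguard. It is not part of the lab code, and the names `wait_for_done`, `ACTION_BIT`, and the `max_polls` parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ACTION_BIT 0x1u   /* bit 0 of the control register */

/* Poll the control register until the action bit reads 0.
 * Returns 0 once the accelerator is idle, or -1 if the bit was still set
 * after max_polls reads. The volatile qualifier forces a fresh read of
 * the register on every iteration. */
static int wait_for_done(volatile uint32_t *c_reg, unsigned long max_polls)
{
    while (max_polls-- > 0) {
        if ((*c_reg & ACTION_BIT) == 0)
            return 0;
    }
    return -1;
}
```

On a host machine the function can be exercised with an ordinary variable standing in for the register; on the target, `c_reg` would point at the accelerator's control register.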

INTERRUPT SERVICE ROUTINE

GRLIB GPTIMER (1/2)
- General Purpose Timer Unit
  - Timers are present in almost any electronic device which needs timing functions (e.g. timekeeping & time measurement)
  - Acts as a slave on AMBA APB
  - Provides a common decrementing prescaler (clocked by the system clock) and decrementing timers
  - Capable of asserting an interrupt on timer underflow
- We initialize timer 2 for 1 ms resolution (i.e. an interrupt will be asserted every 1 ms)
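
The arithmetic behind a 1 ms timer can be sketched as follows. The sketch assumes that the prescaler produces one tick every (reload + 1) system clock cycles and that a timer underflows every (reload + 1) ticks, and it chooses a 1 MHz tick rate for convenience. The 40 MHz clock in the usage note is only an example value, not a figure from the slides, and `timer_cfg_t` and `timer_cfg_for` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t scaler_reload;   /* value for the prescaler reload register */
    uint32_t timer_reload;    /* value for the timer reload register */
} timer_cfg_t;

/* Compute reload values for a 1 MHz tick and an interrupt every period_us
 * microseconds. ASSUMES clk_hz is a non-zero multiple of 1 MHz, that
 * period_us >= 1, and that each counter underflows every (reload + 1) steps. */
static timer_cfg_t timer_cfg_for(uint32_t clk_hz, uint32_t period_us)
{
    timer_cfg_t cfg;
    cfg.scaler_reload = clk_hz / 1000000u - 1u;   /* one tick per microsecond */
    cfg.timer_reload  = period_us - 1u;           /* underflow every period_us ticks */
    return cfg;
}
```

For an assumed 40 MHz system clock and a 1000 us period, this gives a prescaler reload of 39 and a timer reload of 999.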

GRLIB GPTIMER (2/2)
Please refer to the GRLIB IP Cores Manual for detailed information

eCos ISR (1/3)
- When an interrupt occurs, the processor jumps to a specific address for execution of the Interrupt Service Routine (ISR)
- One of the key concerns in embedded systems with respect to interrupts is latency, which is the interval of time from when an interrupt occurs until the ISR begins to execute
[Figure: timeline with the interrupt latency marked]

eCos ISR (2/3)
- Basic API for implementing ISR
  - Please refer to the eCos Reference Manual for detailed information

    #include <cyg/kernel/kapi.h>

    void cyg_interrupt_create(cyg_vector_t vector, cyg_priority_t priority,
                              cyg_addrword_t data, cyg_ISR_t* isr, cyg_DSR_t* dsr,
                              cyg_handle_t* handle, cyg_interrupt* intr);
    void cyg_interrupt_delete(cyg_handle_t interrupt);
    void cyg_interrupt_attach(cyg_handle_t interrupt);
    void cyg_interrupt_detach(cyg_handle_t interrupt);
    void cyg_interrupt_acknowledge(cyg_vector_t vector);
    void cyg_interrupt_mask(cyg_vector_t vector);
    void cyg_interrupt_unmask(cyg_vector_t vector);
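
A typical way to use these calls is to create the interrupt object, attach it, and then unmask the vector. The sketch below shows that order for a tick-counting ISR. Several things in it are assumptions rather than lab code: the vector number `TIMER_VECTOR`, the priority 0, the null DSR, and the `USE_REAL_ECOS` switch are all hypothetical. Without that switch the eCos types and functions are replaced by host-only stand-ins, so the sketch compiles on a PC; these stand-ins only count calls and do not behave like the real kernel.

```c
#include <assert.h>
#include <stdint.h>

#ifdef USE_REAL_ECOS                 /* hypothetical switch for a real eCos build */
#include <cyg/kernel/kapi.h>
#else
/* Host-only stand-ins so the sketch compiles without eCos.
 * They approximate the declarations and only count calls. */
typedef uint32_t  cyg_uint32;
typedef uint32_t  cyg_vector_t;
typedef uint32_t  cyg_priority_t;
typedef uintptr_t cyg_addrword_t;
typedef uintptr_t cyg_handle_t;
typedef struct { int unused; } cyg_interrupt;
typedef cyg_uint32 cyg_ISR_t(cyg_vector_t vector, cyg_addrword_t data);
typedef void cyg_DSR_t(cyg_vector_t vector, cyg_uint32 count, cyg_addrword_t data);
#define CYG_ISR_HANDLED 1
static int stub_calls;               /* number of stand-in API calls made */
static void cyg_interrupt_create(cyg_vector_t v, cyg_priority_t p, cyg_addrword_t d,
                                 cyg_ISR_t *isr, cyg_DSR_t *dsr,
                                 cyg_handle_t *handle, cyg_interrupt *intr)
{
    (void)v; (void)p; (void)d; (void)isr; (void)dsr;
    *handle = (cyg_handle_t)intr;
    stub_calls++;
}
static void cyg_interrupt_attach(cyg_handle_t h)      { (void)h; stub_calls++; }
static void cyg_interrupt_unmask(cyg_vector_t v)      { (void)v; stub_calls++; }
static void cyg_interrupt_acknowledge(cyg_vector_t v) { (void)v; }
#endif

#define TIMER_VECTOR 9               /* hypothetical vector; use your platform's value */

static volatile unsigned long time_tick;   /* incremented by the ISR */
static cyg_handle_t  timer_handle;
static cyg_interrupt timer_intr;

/* ISR: bump the counter passed in through the data word, then acknowledge. */
static cyg_uint32 timer_isr(cyg_vector_t vector, cyg_addrword_t data)
{
    (*(volatile unsigned long *)data)++;
    cyg_interrupt_acknowledge(vector);
    return CYG_ISR_HANDLED;
}

/* Register the ISR: create the interrupt object, attach it, unmask the vector. */
static void install_timer_isr(void)
{
    cyg_interrupt_create(TIMER_VECTOR, 0, (cyg_addrword_t)&time_tick,
                         timer_isr, 0 /* no DSR in this sketch */,
                         &timer_handle, &timer_intr);
    cyg_interrupt_attach(timer_handle);
    cyg_interrupt_unmask(TIMER_VECTOR);
}
```

Passing the address of `time_tick` as the data word keeps the ISR free of hard-coded globals, which is the same pattern the profiling ISR later in this lecture uses.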

eCos ISR (3/3)
- An ISR is a C function which takes the following form
- An ISR should complete as soon as possible

    cyg_uint32
    isr_function(cyg_vector_t vector, cyg_addrword_t data)
    {
        ... /* do the service routine */
        return CYG_ISR_HANDLED;
    }

Program Profiling (1/2)
- We use GPTIMER for time measurement
- Every time the timer asserts an interrupt, the timer ISR will increment a global variable time_tick

    cyg_uint32
    timer_isr(cyg_vector_t vector, cyg_addrword_t data)
    {
        unsigned long *time_tick = (unsigned long *) data;

        (*time_tick)++;

        cyg_interrupt_acknowledge(vector);
        return CYG_ISR_HANDLED;
    }

Program Profiling (2/2)
- We record the latency of every function block by monitoring the time_tick variable

    void
    func()
    {
        unsigned long local_timer = time_tick;

        ...

        time_elapsed += (time_tick - local_timer);
    }
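
The measurement pattern above can be tried on a host machine by simulating the timer interrupt with an ordinary function call. This is only a sketch: `fake_timer_interrupt` and the `simulated_ticks` parameter are hypothetical stand-ins for the real 1 ms interrupt and for the work done inside the function.

```c
#include <assert.h>

static volatile unsigned long time_tick;   /* incremented once per timer interrupt */
static unsigned long time_elapsed;         /* accumulated ticks spent inside func() */

/* Host stand-in for the timer ISR: one call represents one tick. */
static void fake_timer_interrupt(void)
{
    time_tick++;
}

/* Profiled function: timestamp on entry, add the tick difference on exit. */
static void func(int simulated_ticks)
{
    unsigned long local_timer = time_tick;

    for (int i = 0; i < simulated_ticks; i++)
        fake_timer_interrupt();            /* pretend the body runs this many ticks */

    time_elapsed += (time_tick - local_timer);
}
```

Because the counters are unsigned, the subtraction still gives the right difference if `time_tick` wraps around once between the two reads. Note that the resolution is one tick, so a block that finishes between two interrupts contributes 0 to `time_elapsed`.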

ENVIRONMENT CONFIGURATION

Build SW Application
- Copy the files in lab_pkg/lab2/sw to your original Lab 1 directory
  - Replace the Makefile and modify the path for ECOSDIR in Makefile
- Type “make” to build
  - The -D_HW_ACC_ flag will link the co-designed version of hw_idct_2d() in idct_hw.c with the testbench
    - Without this flag, hw_idct_2d() will be identical to sw_idct_2d()
  - The -D_PROFILING_ flag will enable profiling using the timer interrupt, and report the results at the end

Install IDCT Accelerator
- Copy lab_pkg/lab2/hw/devices.vhd to grlib-gpl-1.0.19-b3188/lib/grlib/amba/ and replace the original file
- Copy lab_pkg/lab2/hw/libs.txt and the whole lab_pkg/lab2/hw/esw folder to grlib-gpl-1.0.19-b3188/lib/
  - The 1-D IDCT passive accelerator is located at lab_pkg/lab2/hw/esw/idct_acc/idct_1x8.vhd
- Copy lab_pkg/lab2/hw/leon3mp.vhd to grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/ and replace the original file

CO-DESIGNED SYSTEM WITH GHDL SIMULATION

GHDL Simulation (1/6)
- We compile our program as a virtual SDRAM for the LEON3 processor
- LEON3 will fetch the instructions and perform the corresponding operations
- All the hardware signals can be recorded and dumped by GHDL

GHDL Simulation (2/6)
- In order to perform GHDL simulation, we must not link our program with eCos
  - Remove -D__ECOS & -I$(ECOSDIR)/include from CFLAGS
  - Remove -Ttarget.ld, -nostdlib, & -L$(ECOSDIR)/lib from LFLAGS
  - Remove the -D_PROFILING_ flag
- You can remove -D_VERBOSE_ for faster simulation
- You can modify the NUM_BLKS macro in idct_test.c to reduce the number of testbench iterations
- Type “make” to build
  - You should see a file named sdram.srec

GHDL Simulation (3/6)
- Start Cygwin
- cd grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/
- make distclean
- make soft
- Copy the sdram.srec we built into this directory and replace the original one
- make ghdl
  - You can check for syntax errors through GHDL

GHDL Simulation (4/6)
- Type “./testbench.exe --vcd=waveform.vcd” after compilation to begin simulation
- You should see an AHB slave with “Unknown vendor” appear, which is our IDCT accelerator

GHDL Simulation (5/6)
- The dump file waveform.vcd can be viewed on-the-fly using GTKWave
  - Drag waveform.vcd and drop it over the gtkwave.exe icon to open
    - You can also use Windows cmd to open
  - Use “File → Reload Waveform” in GTKWave to update the dump file

GHDL Simulation (6/6)
[Figure: GTKWave waveform. Labels: addr phase, data phase, stage 1, stage 2, probe control reg]

CO-DESIGNED SYSTEM ON FPGA

Build FPGA Bitstream (1/2)
- Type “make ise | tee ise_log” under grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/ after you install the accelerator
  - It is strongly suggested that you verify the hardware with GHDL simulation first
  - It is also suggested that you take a look at ise_log for more information
- Configure your FPGA with leon3mp.bit after generating the bitstream

Build FPGA Bitstream (2/2)
- After entering GRMON, check the system configuration using “info sys”
  - You should see a device with “Unknown vendor” appear

Profiling Results
- Build the program with the -D_PROFILING_ flag on
- Compare the computation results of sw_idct_2d() and hw_idct_2d()
- Compare the computation results with and without the -D_VERBOSE_ flag