Address generation unit for multimedia applications
on application specific instruction set processors
Marc Moreno Berengue, Guillermo Talavera Velilla, Aitor Rodriguez Alsina, Jordi Carrabina
Universitat Autònoma de Barcelona (Spain)
IECON 2010, 7–10 November, Phoenix, AZ, USA
Motivation
➢ Design a custom Address Generation Unit (AGU)
➢ Connected to an ASIP datapath
➢ Benefits of a custom AGU design
➢ Previous software optimizations
➢ Multimedia applications
Structure
➢ Introduction
➢ Design
➢ Work Flow
➢ Results
➢ Conclusions
Multimedia application features
➢ Multimedia applications involve:
  ● Complex index manipulation
  ● A large number of data accesses
➢ They require:
  ● High performance
  ● Low energy consumption
It is crucial to reduce these data accesses and the related address computations effectively.
SW optimizations
The Data Transfer and Storage Exploration (DTSE)* methodology aims to:
➢ Reduce data transfers between memories and processor
➢ Improve the energy efficiency
➢ Reduce the execution time
However, these SW transformations create a high overhead in address generation and control flow.
*Methodology developed at the IMEC research center
SW optimizations...
Initial code:
for (x=1; x<=N-2; ++x)
  for (y=1; y<=N-2; ++y) {
    for (k=-1; k<=1; ++k)
      A[x][y] += B[x+k][y]*C[abs(k)];
    A[x][y] /= tot;
  }
...
After DTSE transformations:
...
for (y=0; y<=M+2; ++y) {
  for (x=0; x<=N+2; ++x) {
    if (x>=0 && x<N && y>=1 && y<=M-2)
      D[x%3] = B[(y*N+x)%8704 + (y*N+x)/8704*16384 + 7680];
    if (x-1>=1 && x-1<=N-2 && y>=1 && y<=M-2) {
      for (k=-1; k<=1; ++k)
        acc += D[(x-1+k)%3]*C[abs(k)];
    }
    acc /= tot;
  }
}
...
The address computations and conditions introduced by these transformations need to be optimized.
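In the transformed loop, `D[x%3]` acts as a three-entry rotating buffer: each element of `B` is fetched from memory once and then reused for the three filter taps, at the price of extra modulo indexing. The C sketch below isolates that idea on a single row; the function names, the integer types and the 1-D simplification are assumptions made for illustration, not the authors' code.

```c
#include <assert.h>
#include <stdlib.h>  /* abs */

/* Reference version: reads b[x-1], b[x], b[x+1] from memory for every output. */
void filter_direct(const int *b, int *out, int n, const int c[2], int tot)
{
    assert(n >= 3 && tot != 0);
    for (int x = 1; x <= n - 2; ++x) {
        int acc = 0;
        for (int k = -1; k <= 1; ++k)
            acc += b[x + k] * c[abs(k)];
        out[x] = acc / tot;
    }
}

/* DTSE-style version: each b[x] is loaded once into the 3-entry window d[]
 * at slot x % 3, and the filter taps are read back from that window. */
void filter_window(const int *b, int *out, int n, const int c[2], int tot)
{
    int d[3];
    assert(n >= 3 && tot != 0);
    for (int x = 0; x < n; ++x) {
        d[x % 3] = b[x];              /* single read of b[x] */
        int xc = x - 1;               /* centre of the window just completed */
        if (xc >= 1 && xc <= n - 2) {
            int acc = 0;
            for (int k = -1; k <= 1; ++k)
                acc += d[(xc + k) % 3] * c[abs(k)];
            out[xc] = acc / tot;
        }
    }
}
```

Both functions produce the same output, but `filter_window` reads each `b[x]` only once; the `% 3` index arithmetic it pays for this is the kind of address overhead that motivates the AGU.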
Address Generation Unit
The Address Generation Unit (AGU) is a coprocessor that uses the address equation (AE) to generate the address sequence (AS):
&X[AE] = AS
Example:
B[(y*N+x)%8704+(y*N+x)/8704*16384+7680]
AE = (y*N+x) % 8704 + (y*N+x) / 8704*16384+7680
AS = 7680,7681,7682,7683, ...
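In the transformed loop, consecutive accesses to `B` visit consecutive values of `y*N+x`, so this AE can be produced without any division or modulo: a counter that wraps at 8704 plus a block offset that grows by 16384 is enough. The C sketch below compares direct evaluation with such a counter-based generator; it illustrates one way address generation hardware can avoid expensive operations and is an assumption, not the AGU's documented datapath. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The slide's AE evaluated directly for linear index l = y*N + x
 * (one division and one modulo per address). */
uint32_t ae_direct(uint32_t l)
{
    return l % 8704u + l / 8704u * 16384u + 7680u;
}

/* Hypothetical counter-based generator: yields AE(l0), AE(l0+1), ...
 * using only an increment, a compare and additions per address. */
typedef struct {
    uint32_t rem;   /* l % 8704           */
    uint32_t base;  /* (l / 8704) * 16384 */
} ae_gen;

void ae_gen_init(ae_gen *g, uint32_t l0)
{
    g->rem = l0 % 8704u;            /* one-time setup cost */
    g->base = l0 / 8704u * 16384u;
}

uint32_t ae_gen_next(ae_gen *g)
{
    assert(g->rem < 8704u);         /* invariant: rem is a valid remainder */
    uint32_t addr = g->rem + g->base + 7680u;
    if (++g->rem == 8704u) {        /* wrap: move to the next 16384-word block */
        g->rem = 0;
        g->base += 16384u;
    }
    return addr;
}
```

Initialized with `l0 = 0`, the generator yields 7680, 7681, 7682, ..., matching the AS above.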
Application specific instruction set processor (ASIP)
➢ Extends its instruction set
➢ Fast interface to read/write data from/to specific hardware
  ● 1 instruction
  ● 1 cycle
AGU design
➢ Attaching the AGU to the ASIP datapath saves execution time; each AS value is read with:
  ● 1 instruction
  ● 1 cycle
AGU skeleton
The AGU has one control unit, one processing unit and one FIFO.
[Figure: AGU block diagram showing the Custom Instruction interface, the CI unit and the CO unit, with the operations Read AS values, Change AE values and AS generation]
➢ CI (custom instruction) unit
  • AE configuration and FIFO reads
➢ CO (coprocessor) unit
  • Evaluates the AE to generate the AS and stores all values in the FIFO
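The division of work between the two units can be modelled in software. The sketch below is a behavioural C model, assuming a FIFO depth of 8 and a simple modular AE of the form `(a*i + b) % m + c` with non-negative `a*i + b`; the AE form, the depth and every name in it are hypothetical, chosen only to show how the CI unit (configuration and FIFO reads) and the CO unit (AS generation into the FIFO) interact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AGU_FIFO_DEPTH 8   /* hypothetical FIFO depth */

/* Hypothetical AE form for this sketch: AS[i] = (a*i + b) % m + c */
typedef struct { int32_t a, b, m, c; } agu_ae;

typedef struct {
    agu_ae ae;                       /* AE parameters written by the CI unit */
    int32_t i;                       /* CO unit iteration counter            */
    int32_t fifo[AGU_FIFO_DEPTH];    /* precomputed AS values                */
    int head, count;
} agu;

/* CI unit: change the AE values and restart the sequence (flushes the FIFO). */
void agu_ci_config(agu *u, agu_ae ae)
{
    assert(ae.m > 0);
    u->ae = ae;
    u->i = 0;
    u->head = 0;
    u->count = 0;
}

/* CO unit: one step of AS generation; it runs whenever the FIFO has a free
 * slot, independently of the processor. Returns false when the FIFO is full. */
bool agu_co_step(agu *u)
{
    if (u->count == AGU_FIFO_DEPTH)
        return false;
    int32_t v = (u->ae.a * u->i + u->ae.b) % u->ae.m + u->ae.c;
    u->fifo[(u->head + u->count) % AGU_FIFO_DEPTH] = v;
    u->count++;
    u->i++;
    return true;
}

/* CI unit: read the next AS value from the FIFO (one custom instruction). */
int32_t agu_ci_read(agu *u)
{
    assert(u->count > 0);            /* real hardware would stall instead */
    int32_t v = u->fifo[u->head];
    u->head = (u->head + 1) % AGU_FIFO_DEPTH;
    u->count--;
    return v;
}
```

In hardware the CO unit would perform the equivalent of `agu_co_step` every cycle in parallel with the processor, so an `agu_ci_read` normally finds the next address already waiting in the FIFO.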
AGU Creator
Web-based application
Work Flow
int A[70],B[70],C=0;
...
for (i=7; i<70; i++)
{
  B[i]=A[i-7]+B[i-7];
  A[i]=i;
  C+=B[i];
}
...
int A[7],B[7],C=0;
...
for (i=7; i<70; i++)
{
  B[i%7]=A[(i-7)%7]+B[(i-7)%7];
  A[i%7]=i;
  C+=B[i%7];
}
...
int A[7],B[7],C=0,ix,x;
initAGU(); initAGU2();
...
for (i=7; i<70; i++)
{
  x=readAGU();
  ix=readAGU2();
  B[x]=A[ix]+B[ix];
  A[x]=i;
  C+=B[x];
}
...
Work flow: Init.c (first listing) → SW Opt. (DTSE) → Opt.c (second listing) → AGUs → CI_code.c (third listing)
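The three listings can be checked against each other on a host machine. In the C sketch below, the elided initialization (`...`) is replaced by explicit start values for `A[0..6]` and `B[0..6]`, and `initAGU`, `initAGU2`, `readAGU` and `readAGU2` are hypothetical software stand-ins that simply return the sequences `i % 7` and `(i - 7) % 7`; on the NIOS II target they would instead be custom instructions reading the AGU FIFO.

```c
#include <assert.h>

/* Init.c: full-size arrays; the elided initialization is passed in. */
int run_init(const int a0[7], const int b0[7])
{
    int A[70], B[70], C = 0;
    for (int k = 0; k < 7; ++k) { A[k] = a0[k]; B[k] = b0[k]; }
    for (int i = 7; i < 70; i++) {
        B[i] = A[i - 7] + B[i - 7];
        A[i] = i;
        C += B[i];
    }
    return C;
}

/* Opt.c: after DTSE only a 7-element window of A and B is kept. */
int run_opt(const int a0[7], const int b0[7])
{
    int A[7], B[7], C = 0;
    for (int k = 0; k < 7; ++k) { A[k] = a0[k]; B[k] = b0[k]; }
    for (int i = 7; i < 70; i++) {
        B[i % 7] = A[(i - 7) % 7] + B[(i - 7) % 7];
        A[i % 7] = i;
        C += B[i % 7];
    }
    return C;
}

/* Hypothetical host stand-ins for the AGU custom instructions. */
static int agu1_i, agu2_i;
void initAGU(void)  { agu1_i = 7; }                 /* AE: i % 7       */
void initAGU2(void) { agu2_i = 7; }                 /* AE: (i - 7) % 7 */
int  readAGU(void)  { return agu1_i++ % 7; }
int  readAGU2(void) { return (agu2_i++ - 7) % 7; }

/* CI_code.c: the modulo index arithmetic is moved into the AGUs. */
int run_ci(const int a0[7], const int b0[7])
{
    int A[7], B[7], C = 0, ix, x;
    for (int k = 0; k < 7; ++k) { A[k] = a0[k]; B[k] = b0[k]; }
    initAGU(); initAGU2();
    for (int i = 7; i < 70; i++) {
        x = readAGU();
        ix = readAGU2();
        assert(x >= 0 && x < 7 && ix >= 0 && ix < 7);  /* stay in the window */
        B[x] = A[ix] + B[ix];
        A[x] = i;
        C += B[x];
    }
    return C;
}
```

With `A[k] = k` and `B[k] = 0` as start values, all three versions return `C = 6825`.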
Test environment
➢ NIOS II softcore processor (Altera)
  ● 32-bit RISC processor
  ● Harvard memory architecture
  ● Data and instruction caches
  ● 256 custom instructions (fast datapath interface)
➢ Cyclone II EP2C35 Altera FPGA
Test Applications
➢ Cavity Detector
Medical imaging application to detect cavities on tomography scans
➢ Quadtree Structured Difference Pulse Code Modulation (QSDPCM)
An interframe compression technique for video imaging.
Speedup
[Figure: speedup per optimization step (Init, DTSE, AGU inclusion), Cavity and QSDPCM panels]
➢ Cavity Detector: speedup 1.26
➢ QSDPCM: speedup 1.19
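A speedup S means the new execution time is 1/S of the old one, so the fraction of time saved is 1 - 1/S. A minimal C helper (the name is hypothetical) makes that conversion explicit:

```c
#include <assert.h>

/* Speedup S = t_before / t_after, so t_after / t_before = 1 / S and the
 * fraction of execution time saved is 1 - 1/S. */
double time_saving(double speedup)
{
    assert(speedup > 0.0);
    return 1.0 - 1.0 / speedup;
}
```

`time_saving(1.26)` is about 0.206 and `time_saving(1.19)` about 0.160, i.e. roughly 21% and 16% less execution time for Cavity and QSDPCM relative to the baseline used in the chart.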
Energy improvements
[Figure: normalized energy per optimization step (Init, DTSE, AGU inclusion), Cavity and QSDPCM panels]
➢ Cavity Detector: 27% energy reduction
➢ QSDPCM: 21% energy reduction
Area penalties
Configuration    Cavity (LEs)   QSDPCM (LEs)
NIOS-F           2644           2644
NIOS-F + AGU     3596           3592
Adding the AGU to the NIOS II architecture uses 2.9% of the total FPGA resources (33,216 LEs).
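The 2.9% figure follows from the table: the AGU adds 3596 - 2644 = 952 LEs (Cavity) or 3592 - 2644 = 948 LEs (QSDPCM) to the NIOS-F core, out of the 33,216 LEs of the EP2C35. The C helpers below reproduce that arithmetic; their names are hypothetical.

```c
#include <assert.h>

#define FPGA_TOTAL_LES 33216  /* Cyclone II EP2C35 logic elements (slide value) */
#define NIOS_F_LES     2644   /* NIOS-F core without AGU (table value)        */

/* Logic elements added by the AGU for a given "NIOS-F + AGU" total. */
int agu_les(int with_agu)
{
    assert(with_agu >= NIOS_F_LES);
    return with_agu - NIOS_F_LES;
}

/* AGU cost as a percentage of the whole FPGA. */
double pct_of_device(int with_agu)
{
    return 100.0 * agu_les(with_agu) / FPGA_TOTAL_LES;
}

/* AGU cost relative to the bare NIOS-F core. */
double pct_of_core(int with_agu)
{
    return 100.0 * agu_les(with_agu) / NIOS_F_LES;
}
```

`pct_of_device(3596)` is about 2.87 and `pct_of_device(3592)` about 2.85, both rounding to the 2.9% above; relative to the NIOS-F core alone, the AGU adds about 36% more logic.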
Conclusions
➢ Extending an ASIP with AGUs is an efficient way to meet the performance and energy requirements of multimedia applications after SW optimizations (DTSE)
➢ The key innovation is connecting the AGU to the processor datapath: working in parallel with the main processor, it computes a wide range of address values before the processor needs them
➢ Using an AGU skeleton and a wizard decreases design and implementation time
Future Work
➢ Improve the AGU wizard in order to:
  ● Automatically detect the AEs in a given C file and show relevant information about each one
  ● Generate the appropriate AGU for a specific set of AEs
  ● Generate AGUs for more than one ASIP
➢ Extend the set of applications used in this work
Thank you!!
Questions?