
Address generation unit for multimedia applications on application specific instruction set processors

Marc Moreno-Berengue, Guillermo Talavera Velilla, Aitor Rodriguez-Alsina, Jordi Carrabina

Universitat Autònoma de Barcelona (Spain)

IECON 2010, 7-10 November, Phoenix, AZ, USA

Motivation

➢ Design a custom Address Generation Unit (AGU)
  ➢ Connected to an ASIP datapath

➢ Benefits of custom AGU design
  ➢ Previous software optimizations
  ➢ Multimedia applications


Structure

➢ Introduction

➢ Design

➢ Work Flow

➢ Results

➢ Conclusions


Multimedia applications features


➢ Multimedia applications
  ➢ Complex index manipulation
  ➢ Large number of data accesses

➢ They require
  ➢ High performance
  ➢ Low energy consumption

It is crucial to reduce these data accesses and the related address computations effectively.

SW optimizations


The Data Transfer and Storage Exploration (DTSE)* methodology is oriented to:

➢ Reduce data transfers between memories and processor

➢ Improve the energy efficiency

➢ Reduce the execution time

However, these SW transformations introduce a high overhead in address generation and control flow.

*Methodology developed at IMEC research center

SW optimizations...

Before SW optimizations:

    for (x=1; x<=N-2; ++x)
        for (y=1; y<=N-2; ++y) {
            for (k=-1; k<=1; ++k)
                A[x][y] += B[x+k][y]*C[abs(k)];
            A[x][y] /= tot;
        }
    ...

After SW optimizations:

    ...
    for (y=0; y<=M+2; ++y) {
        for (x=0; x<=N+2; ++x) {
            if (x>=0 && x<N && y>=1 && y<=M-2)
                D[x%3] = B[(y*N+x)%8704 + (y*N+x)/8704*16384 + 7680];
            if (x-1>=1 && x-1<=N-2 && y>=1 && y<=M-2) {
                for (k=-1; k<=1; ++k)
                    acc += D[(x-1+k)%3]*C[abs(k)];
                acc /= tot;
            }
        }
    }
    ...


The resulting address computations need to be optimized.


Address Generation Unit

The Address Generation Unit (AGU) is a coprocessor that uses the address equation (AE) to generate the address sequence (AS).

&X[AE]=AS 

Example:

    B[(y*N+x)%8704+(y*N+x)/8704*16384+7680]

    AE = (y*N+x)%8704 + (y*N+x)/8704*16384 + 7680

    AS = 7680, 7681, 7682, 7683, ...


Application specific instruction set processor

An application specific instruction set processor (ASIP) can:

➢ Extend its instruction set
➢ Provide a fast interface to read/write data from/to specific hardware
  ➢ 1 instruction
  ➢ 1 cycle


AGU design

➢ Attaching the AGU to the ASIP datapath saves execution time
  ● 1 instruction
  ● 1 cycle


AGU skeleton

The AGU has one control unit, one process unit and one FIFO.

[Figure: AGU skeleton showing the custom instruction interface, the CI unit (change AE values, read AS values) and the CO unit (AS generation)]


➢ CI (custom instruction) unit
  • AE configuration and FIFO reads


➢ CO (coprocessor) unit
  • Evaluates the AE to generate the AS and stores all values in the FIFO


AGU Creator

Web-based application


Work Flow


Init.c (initial code):

    int A[70], B[70], C = 0;
    ...
    for (i = 7; i < 70; i++)
    {
        B[i] = A[i-7] + B[i-7];
        A[i] = i;
        C += B[i];
    }
    ...

Opt.c (after SW Opt. (DTSE)):

    int A[7], B[7], C = 0;
    ...
    for (i = 7; i < 70; i++)
    {
        B[i%7] = A[(i-7)%7] + B[(i-7)%7];
        A[i%7] = i;
        C += B[i%7];
    }
    ...

CI_code.c (address computations moved to the AGUs):

    int A[7], B[7], C = 0, ix, x;
    initAGU(); initAGU2();
    ...
    for (i = 7; i < 70; i++)
    {
        x = readAGU();
        ix = readAGU2();
        B[x] = A[ix] + B[ix];
        A[x] = i;
        C += B[x];
    }
    ...


Test environment

➢ NIOS II soft-core processor (Altera)
  ● 32-bit RISC processor
  ● Harvard memory architecture
  ● Data and instruction caches
  ● 256 custom instructions (fast datapath interface)

➢ Cyclone II EP2C35 Altera FPGA

Test Applications

➢ Cavity Detector

Medical imaging application to detect cavities on tomography scans

➢ Quad-tree Structured Difference Pulse Code Modulation (QSDPCM)

An inter-frame compression technique for video imaging.

Speedup

[Figure: bar charts "Speedup (Cavity)" and "Speedup (QSDPCM)", y-axis 0 to 1.4, with bars for the Init, DTSE and AGU inclusion versions]

Cavity speedup: 1.26
QSDPCM speedup: 1.19
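Assuming speedup is defined in the usual way, as the baseline execution time divided by the execution time with the AGU, the reported speedups of 1.26 and 1.19 can be restated as execution-time reductions (derived arithmetic, not values given on the slides):

```latex
S = \frac{T_{\text{baseline}}}{T_{\text{AGU}}}
\quad\Longrightarrow\quad
1 - \frac{T_{\text{AGU}}}{T_{\text{baseline}}} = 1 - \frac{1}{S},
\qquad
1 - \frac{1}{1.26} \approx 0.206,
\qquad
1 - \frac{1}{1.19} \approx 0.160
```

That is, roughly 21% and 16% less execution time, respectively.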

Energy improvements 

[Figure: bar charts "Energy (Cavity)" and "Energy (QSDPCM)", y-axis 0 to 1, with bars for the Init, DTSE and AGU inclusion versions]

Cavity energy reduction: 27%
QSDPCM energy reduction: 21%

Area penalties

                  Cavity (LEs)   QSDPCM (LEs)
    NIOS-F            2644           2644
    NIOS-F + AGU      3596           3592

Adding the AGU to the NIOS II architecture uses about 2.9% of the total FPGA resources (33,216 LEs).
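As a consistency check, the LE counts in the table reproduce the 2.9% figure when the AGU's increment over the plain NIOS-F core is divided by the device's 33,216 LEs:

```latex
\begin{aligned}
\text{Cavity:}\quad & 3596 - 2644 = 952~\text{LEs}, & \frac{952}{33216} &\approx 2.87\% \\
\text{QSDPCM:}\quad & 3592 - 2644 = 948~\text{LEs}, & \frac{948}{33216} &\approx 2.85\%
\end{aligned}
```

Both round to 2.9%.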


Conclusions

➢ Extending an ASIP with AGUs is an efficient way to meet the performance/energy requirements of multimedia applications after applying SW optimizations

➢ The key innovation is connecting the AGU to the processor datapath: working in parallel with the main processor, it can calculate a wide range of address values before the processor needs them

➢ Using an AGU skeleton and a wizard reduces design and implementation time


Future Work

➢ Improve the AGU wizard in order to:

● Automatically detect the AEs in a given C file and show relevant information about each AE

● Generate the appropriate AGU for a specific set of AEs

● Generate AGUs for more than one ASIP

➢ Extend the set of applications used in this work


Thank you!!

Questions?
