Address generation unit for multimedia applications on application specific instruction set processors  Marc Moreno-Berengue,  Guillermo Talavera Velilla, Aitor Rodriguez-Alsina,  Jordi Carrabina Universitat Autònoma de Barcelona (Spain) IECON 2010 7–10 November – Phoenix, AZ, USA


Page 1: Iecon slides

Address generation unit for multimedia applications

on application specific instruction set processors

Marc Moreno-Berengue, Guillermo Talavera Velilla, Aitor Rodriguez-Alsina, Jordi Carrabina

Universitat Autònoma de Barcelona (Spain)

IECON 2010, 7–10 November, Phoenix, AZ, USA

Page 2: Iecon slides

Motivation

➢ Design a custom Address Generation Unit (AGU)
   ➢ Connected to an ASIP data-path

➢ Benefits of custom AGU design
   ➢ Previous software optimizations
   ➢ Multimedia applications

Page 3: Iecon slides

Structure

➢ Introduction

➢ Design

➢ Work Flow

➢ Results

➢ Conclusions


Page 4: Iecon slides

➢ Introduction

➢ Design

➢ Work Flow

➢ Results

➢ Conclusions

Page 5: Iecon slides

Multimedia applications features

➢ Multimedia applications
   ➢ Complex index manipulation
   ➢ Large number of data accesses

➢ Require
   ➢ High performance
   ➢ Low energy consumption

It is crucial to reduce these data accesses and the related address computations effectively.

Page 6: Iecon slides

SW optimizations

The Data Transfer and Storage Exploration (DTSE)* methodology is oriented to:

➢ Reduce data transfers between memories and processor

➢ Improve the energy efficiency

➢ Reduce the execution time

However, these SW transformations create a high overhead in address generation and control flow.

*Methodology developed at the IMEC research center

Page 7: Iecon slides

SW optimizations...

for (x=1; x<=N-2; ++x)
  for (y=1; y<=N-2; ++y) {
    for (k=-1; k<=1; ++k)
      A[x][y] += B[x+k][y]*C[abs(k)];
    A[x][y] /= tot;
  }
...

...
for (y=0; y<=M+2; ++y) {
  for (x=0; x<=N+2; ++x) {
    if (x>=0 && x<N && y>=1 && y<=M-2)
      D[x%3] = B[(y*N+x)%8704 + (y*N+x)/8704*16384 + 7680];
    if (x-1>=1 && x-1<=N-2 && y>=1 && y<=M-2) {
      for (k=-1; k<=1; ++k)
        acc += D[(x-1+k)%3]*C[abs(k)];
      acc /= tot;
    }
  }
}
...

Page 8: Iecon slides

SW optimizations...

(Same code as on Page 7.)

The address generation and control flow of the transformed code need to be optimized.

Page 9: Iecon slides

Address Generation Unit

The Address Generation Unit (AGU) is a coprocessor that uses the address equation (AE) to generate the address sequence (AS).

&X[AE]=AS 

Example:

B[(y*N+x)%8704+(y*N+x)/8704*16384+7680]

AE = (y*N+x)%8704 + (y*N+x)/8704*16384 + 7680

AS = 7680, 7681, 7682, 7683, ...


Page 10: Iecon slides

➢ Introduction

➢ Design

➢ Work Flow

➢ Results

➢ Conclusions

Page 11: Iecon slides

Application specific instruction set processor

Application specific instruction set processor (ASIP)

➢ Extends its instruction set
➢ Fast interface to read/write data from/to specific hardware
   ➢ 1 instruction
   ➢ 1 cycle

Page 12: Iecon slides

AGU design

➢ Attaching the AGU to the ASIP data-path saves execution time

● 1 instruction
● 1 cycle

Page 13: Iecon slides

AGU skeleton

The AGU has one control unit, one process unit and one FIFO.

[Figure: AGU skeleton block diagram; blocks: Custom Instruction interface, CI unit, CO unit; labels: Change AE values, AS generation, Read AS values]

Page 14: Iecon slides

➢ CI (custom instruction) unit
   • AE configuration & FIFO reading

Page 15: Iecon slides

➢ CO (coprocessor) unit
   • Evaluates the AE to generate the AS and stores all values in the FIFO


Page 16: Iecon slides

AGU Creator

Web-based application

Page 17: Iecon slides

➢ Introduction

➢ Design

➢ Work Flow

➢ Results

➢ Conclusions

Page 18: Iecon slides

Work Flow


Page 19: Iecon slides

Work Flow

Init.c

int A[70],B[70],C=0;
...
for (i=7; i<70; i++)
{
  B[i]=A[i-7]+B[i-7];
  A[i]=i;
  C+=B[i];
}
...

Opt.c (after SW Opt., DTSE)

int A[7],B[7],C=0;
...
for (i=7; i<70; i++)
{
  B[i%7]=A[(i-7)%7]+B[(i-7)%7];
  A[i%7]=i;
  C+=B[i%7];
}
...

CI_code.c (after AGU inclusion)

int A[7],B[7],C=0,ix,x;
initAGU(); initAGU2();
...
for (i=7; i<70; i++)
{
  x=readAGU();
  ix=readAGU2();
  B[x]=A[ix]+B[ix];
  A[x]=i;
  C+=B[x];
}
...

Page 20: Iecon slides

➢ Introduction

➢ Design

➢ Work Flow

➢ Results

➢ Conclusions

Page 21: Iecon slides

Test environment

➢ NIOS II soft-core processor (Altera)
   ● 32-bit RISC processor
   ● Harvard memory architecture
   ● Data/instruction caches
   ● 256 custom instructions (fast data-path interface)

➢ Cyclone II EP2C35 Altera FPGA

Page 22: Iecon slides

Test Applications

➢ Cavity Detector

Medical imaging application to detect cavities on tomography scans

➢ Quad-tree Structured Difference Pulse Code Modulation (QSDPCM)

An inter-frame compression technique for video imaging.

Page 23: Iecon slides

Speedup

[Figure: speedup charts for Cavity and QSDPCM; x-axis: Init, DTSE, HW AGU inclusion; y-axis: speedup, 0 to 1.4]

Speedup: 1.26 (Cavity), 1.19 (QSDPCM)

Page 24: Iecon slides

Energy improvements 

[Figure: energy charts for Cavity and QSDPCM; x-axis: Init, DTSE, HW AGU inclusion; y-axis: energy, 0 to 1]

Energy reduction: 27% (Cavity), 21% (QSDPCM)

Page 25: Iecon slides

Area penalties

                Cavity (LEs)   QSDPCM (LEs)
NIOS-F              2644           2644
NIOS-F + AGU        3596           3592

The AGU inclusion in the NIOS II architecture uses 2.9% of the total FPGA resources (33216 LEs).

Page 26: Iecon slides

➢ Introduction

➢ Design

➢ Work Flow

➢ Results

➢ Conclusions

Page 27: Iecon slides

Conclusions

➢ Extending an ASIP with AGUs is an efficient way to meet the performance/energy requirements of multimedia applications after SW optimizations such as DTSE.

➢ The innovation of connecting the AGU to the processor data-path, working in parallel with the main processor, allows a wide range of values to be calculated before the processor needs them.

➢ Using an AGU skeleton and a wizard decreases the design and implementation time.

Page 28: Iecon slides

Future Work

➢ Improve the AGU wizard in order to:
   ● Automatically detect AEs and show relevant information about each AE in a given C file
   ● Generate the appropriate AGU for a specific set of AEs
   ● Generate AGUs for more than one ASIP

➢ Extend the set of applications used in this work

Page 29: Iecon slides

Thank you!!

Questions?