automated design flow for coarse-grained reconfigurable

87
Automated Design Flow for Coarse-Grained Reconfigurable Platforms: an RVC-CAL Multi-Standard Decoder Use-Case XIV International Conference on Embedded Computer and Systems: Architectures, MOdeling and Simulation SAMOS XIV - 2014 July 14 th - Samos Island (Greece) Endri Bezati, Simone Casale-Brunet, Marco Mattavelli SCI STI MM École Politecnique Fédérale de Lausanne Carlo Sau, Luigi Raffo DIEE Università degli Studi di Cagliari Francesca Palumbo POLCOMING Università degli Studi di Sassari

Upload: others

Post on 25-Apr-2022

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Automated Design Flow for Coarse-Grained Reconfigurable

Automated Design Flow for Coarse-Grained

Reconfigurable Platforms: an RVC-CAL

Multi-Standard Decoder Use-Case

XIV International Conference on Embedded Computer and Systems: Architectures, MOdeling and Simulation

SAMOS XIV - 2014 July 14th - Samos Island (Greece)

Endri Bezati, Simone Casale-Brunet, Marco Mattavelli

SCI STI MM

École Politecnique Fédérale de Lausanne

Carlo Sau, Luigi Raffo

DIEE

Università degli Studi di Cagliari

Francesca Palumbo

POLCOMING

Università degli Studi di Sassari

Page 2: Automated Design Flow for Coarse-Grained Reconfigurable

Outline • Introduction

– Problem Statement

– Two Step Problem Solving • Step 1: Coarse-Grained Reconfigurability

• Step 2: Dataflow Model of Computation and RVC-CAL

– Generation of Multi-Dataflow Graphs

• Automated Design Flow

– Composition: the Multi-Dataflow Composer

– Optimization: TURNUS Co-Exploration Framework

– RTL Generation: Xronos High Level Synthesis

– Tools Integration

• Experimental Results

– An MPEG-4 SP Decoder Use Case

– Synthesis Results

– Performance Results

• Conclusions

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 1

Page 3: Automated Design Flow for Coarse-Grained Reconfigurable

Outline • Introduction

– Problem Statement

– Two Step Problem Solving • Step 1: Coarse-Grained Reconfigurability

• Step 2: Dataflow Model of Computation and RVC-CAL

– Generation of Multi-Dataflow Graphs

• Automated Design Flow

– Composition: the Multi-Dataflow Composer

– Optimization: TURNUS Co-Exploration Framework

– RTL Generation: Xronos High Level Synthesis

– Tools Integration

• Experimental Results

– An MPEG-4 SP Decoder Use Case

– Synthesis Results

– Performance Results

• Conclusions

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 2

Page 4: Automated Design Flow for Coarse-Grained Reconfigurable

Problem Statement

.

• portable: dimensions and

battery life limits have to be

taken into consideration.

• multimedial: different

applications have to be

executed, very often at the

same time.

• higly efficient: real time

response is needed in many

cases.

• easily evolvable: technology

and algorithms fast evolution

have to be followed by the

devices.

Electronic devices are converging to platforms that are:

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 3 SOURCE: keynote @ Merging Media 2013 in Vancouver Canada by Gary Hayes

Page 5: Automated Design Flow for Coarse-Grained Reconfigurable

Problem Statement

.

• portable: dimensions and

battery life limits have to be

taken into consideration.

• multimedial: different

applications have to be

executed, very often at the

same time.

• higly efficient: real time

response is needed in many

cases.

• easily evolvable: technology

and algorithms fast evolution

have to be followed by the

devices.

Electronic devices are converging to platforms that are:

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 3

SOURCE: keynote @ Merging Media 2013 in Vancouver Canada by Gary Hayes

Page 6: Automated Design Flow for Coarse-Grained Reconfigurable

Problem Statement

.

• portable: dimensions and

battery life limits have to be

taken into consideration.

• multimedial: different

applications have to be

executed, very often at the

same time.

• higly efficient: real time

response is needed in many

cases.

• easily evolvable: technology

and algorithms fast evolution

have to be followed by the

devices.

need of new design flows to address this complex scenario

Electronic devices are converging to platforms that are:

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 3

SOURCE: keynote @ Merging Media 2013 in Vancouver Canada by Gary Hayes

Page 7: Automated Design Flow for Coarse-Grained Reconfigurable

Two Step Problem Solving Step 1: Coarse-Grained Reconfigurability

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau

• Systems are required to be flexible and efficient.

4

Page 8: Automated Design Flow for Coarse-Grained Reconfigurable

Two Step Problem Solving Step 1: Coarse-Grained Reconfigurability

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau

ASIC

GPP

DSP

RP

Performance

Fle

xib

ilit

y

• Systems are required to be flexible and efficient.

• Reconfigurable Paradigm (RP) to hardware design: specialized

computing platforms, capable of changing configuration to serve the

targeted computations.

4

Page 9: Automated Design Flow for Coarse-Grained Reconfigurable

Two Step Problem Solving Step 1: Coarse-Grained Reconfigurability

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau

FINE-

GRAINED

(FG)

COARSE-

GRAINED

(CG)

Bit-level Word-level

Flexibility

Reconf.

Speed

Config.

Storage

ASIC

GPP

DSP

RP

Performance

Fle

xib

ilit

y

• Systems are required to be flexible and efficient.

• Reconfigurable Paradigm (RP) to hardware design: specialized

computing platforms, capable of changing configuration to serve the

targeted computations.

• The more the hardware is specialized, the more it is difficult to

program.

4

Page 10: Automated Design Flow for Coarse-Grained Reconfigurable

A dataflow program is a directed graph of functional units (actors)

exchanging sequences of data (tokens) through dedicated channels.

actions

state

Actors encapsulate their state and communicate exclusively by sending

and receiving tokens. Such an absence of race conditions, along with

the intrinsic modularity of the dataflow graphs, make it possible to

explicit the algorithmic parallelism of the programs.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 5

Two Step Problem Solving Step 2: Dataflow Model of Computation and RVC-CAL (1)

Page 11: Automated Design Flow for Coarse-Grained Reconfigurable

Standard RVC Model

Decoder

Composition

Mechanism

VTL [2]

(RVC-CAL FUs)

Abstract Decoder

Model

(XDF + RVC-Cal)

Proprietary Decoder

Decoder

Implementation Proprietary

Tool Library

Decoder

Solution

[1] MPEG-B part 4 (2009), [2] MPEG-C part 4 (2010)

CAL[1] = Cal Actor Language, high-level programming language for

describing dataflow actors.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 6

Two Step Problem Solving Step 2: Dataflow Model of Computation and RVC-CAL (2)

Page 12: Automated Design Flow for Coarse-Grained Reconfigurable

Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications

A

B

C D

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 7

Page 13: Automated Design Flow for Coarse-Grained Reconfigurable

Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications

1:1

HW platform

A

B

C D

A

B

C D

clock

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 7

Page 14: Automated Design Flow for Coarse-Grained Reconfigurable

Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications

1:1

HW platform

A

B

C D

A

B

C D

clock

A

B

C D

B E D

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 7

Page 15: Automated Design Flow for Coarse-Grained Reconfigurable

Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications

1:1

HW platform

HW CG reconfigurable platform

A

B

C D

A

B

C D

clock

A

B

C D

B E D

A

B

C

D

clock

E

2:1

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 7

Page 16: Automated Design Flow for Coarse-Grained Reconfigurable

Generation of Multi-Dataflow Graphs (2)

• low area and power (portability)

• flexibility (multimediality)

• performances (high efficiency)

• code reusability (fast evolution)

Challenges of the modern electronic devices design:

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 8

Page 17: Automated Design Flow for Coarse-Grained Reconfigurable

Generation of Multi-Dataflow Graphs (2)

• low area and power (portability)

• flexibility (multimediality)

• performances (high efficiency)

• code reusability (fast evolution)

resource sharing

CG reconfiguration

dataflow parallelism highlighting

dataflow modularity (VTL)

Challenges of the modern electronic devices design:

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 8

Page 18: Automated Design Flow for Coarse-Grained Reconfigurable

Generation of Multi-Dataflow Graphs (2)

• low area and power (portability)

• flexibility (multimediality)

• performances (high efficiency)

• code reusability (fast evolution)

resource sharing

CG reconfiguration

dataflow parallelism highlighting

dataflow modularity (VTL)

Challenges of the modern electronic devices design:

perfect matching, but…

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 8

Page 19: Automated Design Flow for Coarse-Grained Reconfigurable

Generation of Multi-Dataflow Graphs (2)

• low area and power (portability)

• flexibility (multimediality)

• performances (high efficiency)

• code reusability (fast evolution)

resource sharing

CG reconfiguration

dataflow parallelism highlighting

dataflow modularity (VTL)

Challenges of the modern electronic devices design:

perfect matching, but… what happens if the design complexity grows?

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 8

Page 20: Automated Design Flow for Coarse-Grained Reconfigurable

Generation of Multi-Dataflow Graphs (2)

• low area and power (portability)

• flexibility (multimediality)

• performances (high efficiency)

• code reusability (fast evolution)

resource sharing

CG reconfiguration

dataflow parallelism highlighting

dataflow modularity (VTL)

Challenges of the modern electronic devices design:

perfect matching, but… what happens if the design complexity grows?

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 8

Page 21: Automated Design Flow for Coarse-Grained Reconfigurable

Generation of Multi-Dataflow Graphs (2)

• low area and power (portability)

• flexibility (multimediality)

• performances (high efficiency)

• code reusability (fast evolution)

resource sharing

CG reconfiguration

dataflow parallelism highlighting

dataflow modularity (VTL)

Challenges of the modern electronic devices design:

perfect matching, but… what happens if the design complexity grows?

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 8

need of an automated design flow

Page 22: Automated Design Flow for Coarse-Grained Reconfigurable

Outline • Introduction

– Problem Statement

– Two Step Problem Solving • Step 1: Coarse-Grained Reconfigurability

• Step 2: Dataflow Model of Computation and RVC-CAL

– Generation of Multi-Dataflow Graphs

• Automated Design Flow

– Composition: the Multi-Dataflow Composer

– Optimization: TURNUS Co-Exploration Framework

– RTL Generation: Xronos High Level Synthesis

– Tools Integration

• Experimental Results

– An MPEG-4 SP Decoder Use Case

– Synthesis Results

– Performance Results

• Conclusions

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 9

Page 23: Automated Design Flow for Coarse-Grained Reconfigurable

Automated Design Flow Functionalities required by the automated design flow:

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 10

Page 24: Automated Design Flow for Coarse-Grained Reconfigurable

Automated Design Flow

multi-dataflow network composition

Functionalities required by the automated design flow:

A

B C D

B E D

A

B

C

E

D

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 10

Page 25: Automated Design Flow for Coarse-Grained Reconfigurable

Automated Design Flow

multi-dataflow network composition

Functionalities required by the automated design flow:

dataflow network buffers optimization

A

B C D

B E D

A

B

C

E

D

A

B

C

E

D

A

B

C

E

D

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 10

Page 26: Automated Design Flow for Coarse-Grained Reconfigurable

Automated Design Flow

multi-dataflow network composition

Functionalities required by the automated design flow:

dataflow network buffers optimization

multi-dataflow corresponding RTL generation

A

B C D

B E D

A

B

C

E

D

A

B

C

E

D

A

B

C

E

D

A

B

C

E

D

A

B

C

D

clock

E

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 10

Page 27: Automated Design Flow for Coarse-Grained Reconfigurable

Open RVC-CAL

Compiler (ORCC) is a

framework able to

generate, from an

RVC-CAL specification,

the corresponding

source code for

different target

platforms (hardware,

software or mixed).

front-end

back-ends

C Java LLVM …

ORCC

Composition: the Multi-Dataflow

Composer (1)

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11

Page 28: Automated Design Flow for Coarse-Grained Reconfigurable

Open RVC-CAL

Compiler (ORCC) is a

framework able to

generate, from an

RVC-CAL specification,

the corresponding

source code for

different target

platforms (hardware,

software or mixed).

front-end

back-ends

C Java LLVM …

ORCC

Composition: the Multi-Dataflow

Composer (1)

RVC-CAL

specification

CAL

Actors

XDF

Graph

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11

Page 29: Automated Design Flow for Coarse-Grained Reconfigurable

Open RVC-CAL

Compiler (ORCC) is a

framework able to

generate, from an

RVC-CAL specification,

the corresponding

source code for

different target

platforms (hardware,

software or mixed).

front-end

back-ends

C Java LLVM …

ORCC

Composition: the Multi-Dataflow

Composer (1)

RVC-CAL

specification

Intermediate

Representation

CAL

Actors

XDF

Graph

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11

Page 30: Automated Design Flow for Coarse-Grained Reconfigurable

Open RVC-CAL

Compiler (ORCC) is a

framework able to

generate, from an

RVC-CAL specification,

the corresponding

source code for

different target

platforms (hardware,

software or mixed).

front-end

back-ends

C Java LLVM …

ORCC

Composition: the Multi-Dataflow

Composer (1)

RVC-CAL

specification

Intermediate

Representation

CAL

Actors

XDF

Graph

Source

Code

1:1

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11

Page 31: Automated Design Flow for Coarse-Grained Reconfigurable

front-end

back-ends

C Java LLVM …

ORCC

Composition: the Multi-Dataflow

Composer (1)

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11

Page 32: Automated Design Flow for Coarse-Grained Reconfigurable

front-end

back-ends

C Java LLVM …

ORCC

Composition: the Multi-Dataflow

Composer (1)

fro

nt-

en

d

ba

ck-e

nd

MDC

MDC exploits

the ORCC front-

end to generate,

from a set of

RVC-CAL

specifications,

the hardware

description of a

reconfigurable

platform.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11

Page 33: Automated Design Flow for Coarse-Grained Reconfigurable

front-end

back-ends

C Java LLVM …

ORCC

Composition: the Multi-Dataflow

Composer (1)

fro

nt-

en

d

ba

ck-e

nd

MDC

RVC-CAL

specifications

CAL

Actors XDF

Graphs

MDC exploits

the ORCC front-

end to generate,

from a set of

RVC-CAL

specifications,

the hardware

description of a

reconfigurable

platform.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11

Page 34: Automated Design Flow for Coarse-Grained Reconfigurable

front-end

back-ends

C Java LLVM …

ORCC

Composition: the Multi-Dataflow

Composer (1)

fro

nt-

en

d

ba

ck-e

nd

MDC

RVC-CAL

specifications

CAL

Actors XDF

Graphs

Intermediate

Representation

MDC exploits

the ORCC front-

end to generate,

from a set of

RVC-CAL

specifications,

the hardware

description of a

reconfigurable

platform.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11

Page 35: Automated Design Flow for Coarse-Grained Reconfigurable

front-end

back-ends

C Java LLVM …

ORCC

Composition: the Multi-Dataflow

Composer (1)

fro

nt-

en

d

ba

ck-e

nd

Multi-

Dataflow

XDF Graph

MDC

RVC-CAL

specifications

CAL

Actors XDF

Graphs

Intermediate

Representation

MDC exploits

the ORCC front-

end to generate,

from a set of

RVC-CAL

specifications,

the hardware

description of a

reconfigurable

platform.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11

Page 36: Automated Design Flow for Coarse-Grained Reconfigurable

front-end

back-ends

C Java LLVM …

ORCC

Composition: the Multi-Dataflow

Composer (1)

fro

nt-

en

d

ba

ck-e

nd

Multi-

Dataflow

XDF Graph

HDL

Components

Library

HW

Reconfigurable

Platform (HDL)

MDC

RVC-CAL

specifications

CAL

Actors XDF

Graphs

Intermediate

Representation

MDC exploits

the ORCC front-

end to generate,

from a set of

RVC-CAL

specifications,

the hardware

description of a

reconfigurable

platform.

n:1

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11

Page 37: Automated Design Flow for Coarse-Grained Reconfigurable

Composition: the Multi-Dataflow

Composer (2)

A

B

C D

B E D

RVC-CAL dataflow specifications

dataflow α

dataflow β

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 12

Page 38: Automated Design Flow for Coarse-Grained Reconfigurable

Composition: the Multi-Dataflow

Composer (2)

A

B

C D

B E D

RVC-CAL dataflow specifications

MD

C fro

nt-

en

d

A

B

C

D s0

E

s1

multi-dataflow specification

dataflow α

dataflow β

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 12

Page 39: Automated Design Flow for Coarse-Grained Reconfigurable

Composition: the Multi-Dataflow

Composer (2)

HW CG reconfigurable platform

A

B

C D

B E D

A

B

C

D

clock

E

s0 s1

RVC-CAL dataflow specifications

MD

C fro

nt-

en

d

A

B

C

D s0

E

s1

multi-dataflow specification

MDC back-end

config

LUT α β

s1 0 1

s2 0 1

dataflow α

dataflow β

0

1

0

1

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 12

Page 40: Automated Design Flow for Coarse-Grained Reconfigurable

TURNUS is a design space

exploration framework for

heterogeneous parallel

systems. It provides high-level

modeling and simulation

methods and tools

for system level performances

estimation and optimization.

front-end

TURNUS

Optimization: TURNUS Co-Exploration

Framework

back-end

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13

Page 41: Automated Design Flow for Coarse-Grained Reconfigurable

front-end

TURNUS

Optimization: TURNUS Co-Exploration

Framework

back-end

It provides the execution

causation traces for a given

dataflow specification in relation

to a particular input stimulus.

The causation traces are

graphs describing the

dependencies among the

actions executed during the

input stimulus processing.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13

Page 42: Automated Design Flow for Coarse-Grained Reconfigurable

front-end

TURNUS

Optimization: TURNUS Co-Exploration

Framework

RVC-CAL

specification

CAL

Actors

XDF

Graph

back-end

Input

stimulus

It provides the execution

causation traces for a given

dataflow specification in relation

to a particular input stimulus.

The causation traces are

graphs describing the

dependencies among the

actions executed during the

input stimulus processing.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13

Page 43: Automated Design Flow for Coarse-Grained Reconfigurable

front-end

TURNUS

Optimization: TURNUS Co-Exploration

Framework

RVC-CAL

specification

Simulation

CAL

Actors

XDF

Graph

back-end

Input

stimulus

It provides the execution

causation traces for a given

dataflow specification in relation

to a particular input stimulus.

The causation traces are

graphs describing the

dependencies among the

actions executed during the

input stimulus processing.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13

Page 44: Automated Design Flow for Coarse-Grained Reconfigurable

front-end

TURNUS

Optimization: TURNUS Co-Exploration

Framework

RVC-CAL

specification

Simulation

CAL

Actors

XDF

Graph

back-end

Input

stimulus

Execution

Causation

Traces

It provides the execution

causation traces for a given

dataflow specification in relation

to a particular input stimulus.

The causation traces are

graphs describing the

dependencies among the

actions executed during the

input stimulus processing.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13

Page 45: Automated Design Flow for Coarse-Grained Reconfigurable

front-end

TURNUS

Optimization: TURNUS Co-Exploration

Framework

back-end

Execution

Causation

Traces

The causation traces post-

processing analyses the

causation traces along with

given action weights (latency of

the actions for a specific target

platform) in order to optimize

the size of the FIFO memories

in the dataflow network through

a Design Space Exploration.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13

Page 46: Automated Design Flow for Coarse-Grained Reconfigurable

front-end

TURNUS

Optimization: TURNUS Co-Exploration

Framework

back-end

Execution

Causation

Traces

Actions

weights

The causation traces post-

processing analyses the

causation traces along with

given action weights (latency of

the actions for a specific target

platform) in order to optimize

the size of the FIFO memories

in the dataflow network through

a Design Space Exploration.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13

Page 47: Automated Design Flow for Coarse-Grained Reconfigurable

front-end

TURNUS

Optimization: TURNUS Co-Exploration

Framework

back-end

Execution

Causation

Traces

Actions

weights

The causation traces post-

processing analyses the

causation traces along with

given action weights (latency of

the actions for a specific target

platform) in order to optimize

the size of the FIFO memories

in the dataflow network through

a Design Space Exploration.

Design

Space

Exploration

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13

Page 48: Automated Design Flow for Coarse-Grained Reconfigurable

front-end

TURNUS

Optimization: TURNUS Co-Exploration

Framework

back-end

Execution

Causation

Traces

Actions

weights

The causation traces post-

processing analyses the

causation traces along with

given action weights (latency of

the actions for a specific target

platform) in order to optimize

the size of the FIFO memories

in the dataflow network through

a Design Space Exploration.

Design

Space

Exploration

Optimal

FIFO

Sizes

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13

Page 49: Automated Design Flow for Coarse-Grained Reconfigurable

Xronos is a framework for the

generation of RTL descriptions

from dataflow or sequential

applications.

It supports three basic data

types (int, uint and bool), flow

control statements (branches,

loops, parallel blocks) and

procedural abstractions

(sequences of statements)

called tasks.

front-end

Xronos

RTL Generation: Xronos High

Level Synthesis (1)

back-end

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 14

Page 50: Automated Design Flow for Coarse-Grained Reconfigurable

Xronos is a framework for the

generation of RTL descriptions

from dataflow or sequential

applications.

It supports three basic data

types (int, uint and bool), flow

control statements (branches,

loops, parallel blocks) and

procedural abstractions

(sequences of statements)

called tasks.

front-end

Xronos

RTL Generation: Xronos High

Level Synthesis (1)

RVC-CAL

specification

CAL

Actors

XDF

Graph

back-end

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 14

Page 51: Automated Design Flow for Coarse-Grained Reconfigurable

Xronos is a framework for the

generation of RTL descriptions

from dataflow or sequential

applications.

It supports three basic data

types (int, uint and bool), flow

control statements (branches,

loops, parallel blocks) and

procedural abstractions

(sequences of statements)

called tasks.

front-end

Xronos

RTL Generation: Xronos High

Level Synthesis (1)

RVC-CAL

specification

Intermediate

Representation

CAL

Actors

XDF

Graph

back-end

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 14

Page 52: Automated Design Flow for Coarse-Grained Reconfigurable

Xronos is a framework for the

generation of RTL descriptions

from dataflow or sequential

applications.

It supports three basic data

types (int, uint and bool), flow

control statements (branches,

loops, parallel blocks) and

procedural abstractions

(sequences of statements)

called tasks.

front-end

Xronos

RTL Generation: Xronos High

Level Synthesis (1)

RVC-CAL

specification

Intermediate

Representation

CAL

Actors

XDF

Graph

Actors

.v

1:1

back-end

Top

.vhd Test

.v

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 14

Page 53: Automated Design Flow for Coarse-Grained Reconfigurable

RTL Generation: Xronos High

Level Synthesis (2)

A

B

C D

A

C D

B clock

RVC-CAL dataflow specification HW platform

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 15

Page 54: Automated Design Flow for Coarse-Grained Reconfigurable

RTL Generation: Xronos High

Level Synthesis (2)

A

B

C D

A

C D

B clock

RVC-CAL dataflow specification HW platform

Actor_X queue_Actor_X_in1 in_DATA in_SEND in_ACK in_RDY in_COUNT

out_DATA out_SEND

out_ACK out_COUNT

in1_DATA in1_SEND in1_ACK in1_COUNT

out1_DATA out1_SEND

out1_ACK

out1_COUNT out1_RDY

fanout_Actor_X_out1 in_DATA in_SEND in_ACK in_RDY in_COUNT

fanout_Actor_X_outM in_DATA in_SEND in_ACK in_RDY in_COUNT

out_DATA out_SEND

out_ACK

out_COUNT out_RDY

out_DATA out_SEND

out_ACK

out_COUNT out_RDY

outM_DATA outM_SEND

outM_ACK

outM_COUNT outtM_RDY

inN_DATA inN_SEND inN_ACK inN_COUNT

queue_Actor_X_inN in_DATA in_SEND in_ACK in_RDY in_COUNT

out_DATA out_SEND

out_ACK out_COUNT

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 15

Page 55: Automated Design Flow for Coarse-Grained Reconfigurable

START: MDC compeses the multi-dataflow

network

Tools Integration

front-end

back-ends

ORCC

CAL

XDF

front-

end

back-e

nd

MDC

RVC-CAL

specifications

IR XDF multi

dataflow

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 16

Page 56: Automated Design Flow for Coarse-Grained Reconfigurable

Tools Integration

front-end

back-ends

ORCC

CAL

XDF

front-

end

back-e

nd

MDC

RVC-CAL

specifications

IR XDF multi

dataflow

TURNUS

Co-

Exploration

Framework

STEP 1: TURNUS generates the execution

causation traces

TRACES

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 16

Page 57: Automated Design Flow for Coarse-Grained Reconfigurable

Tools Integration

front-end

back-ends

ORCC

CAL

XDF

front-

end

back-e

nd

MDC

RVC-CAL

specifications

IR

XRONOS

High Level

Synthesis

XDF multi

dataflow

WEIGHTS TURNUS

Co-

Exploration

Framework

TRACES

STEP 2: Xronos generates the actions weights

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 16

Page 58: Automated Design Flow for Coarse-Grained Reconfigurable

Tools Integration

front-end

back-ends

ORCC

CAL

XDF

front-

end

back-e

nd

MDC

RVC-CAL

specifications

IR

XRONOS

High Level

Synthesis

XDF multi

dataflow

WEIGHTS

FIFO

SIZES

TURNUS

Co-

Exploration

Framework

TRACES

STEP 3: TURNUS generates the FIFO optimal

sizes

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 16

Page 59: Automated Design Flow for Coarse-Grained Reconfigurable

Tools Integration

front-end

back-ends

ORCC

CAL

XDF

front-

end

back-e

nd

HDL

Components

Library

MDC

RVC-CAL

specifications

IR

XRONOS

High Level

Synthesis

XDF multi

dataflow

WEIGHTS

FIFO

SIZES

TURNUS

Co-

Exploration

Framework

HW CG

Reconfigurable

Platform

(with SBox

queues/fanouts) TRACES

STEP 4: Xronos generates the RVC compliant

reconfigurable hardware platform

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 16

Page 60: Automated Design Flow for Coarse-Grained Reconfigurable

Tools Integration

front-end

back-ends

ORCC

CAL

XDF

front-

end

back-e

nd

HDL

Components

Library

opt HW CG

Reconfigurable

Platform

(without Sbox

queues/fanouts)

MDC

RVC-CAL

specifications

IR

XRONOS

High Level

Synthesis

XDF multi

dataflow

WEIGHTS

FIFO

SIZES

TURNUS

Co-

Exploration

Framework

HW CG

Reconfigurable

Platform

(with SBox

queues/fanouts) TRACES

STEP 5: MDC generates the optimized

reconfigurable hardware platform

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 16

Page 61: Automated Design Flow for Coarse-Grained Reconfigurable

Outline • Introduction

– Problem Statement

– Two Step Problem Solving • Step 1: Coarse-Grained Reconfigurability

• Step 2: Dataflow Model of Computation and RVC-CAL

– Generation of Multi-Dataflow Graphs

• Automated Design Flow

– Composition: the Multi-Dataflow Composer

– Optimization: TURNUS Co-Exploration Framework

– RTL Generation: Xronos High Level Synthesis

– Tools Integration

• Experimental Results

– An MPEG-4 SP Decoder Use Case

– Synthesis Results

– Performance Results

• Conclusions

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 17

Page 62: Automated Design Flow for Coarse-Grained Reconfigurable

An MPEG-4 SP Decoder Use Case (1)

PARSER

input

stream

TEX Y

TEX U

TEX V

MOT Y

MOT U

MOT V

MERGER

420

video

output

intra

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 18

Page 63: Automated Design Flow for Coarse-Grained Reconfigurable

An MPEG-4 SP Decoder Use Case (1)

PARSER

input

stream

TEX Y

TEX U

TEX V

MOT Y

MOT U

MOT V

MERGER

420

video

output

intra PARSER:

• syntactical parser

• variable length coding

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 18

Page 64: Automated Design Flow for Coarse-Grained Reconfigurable

An MPEG-4 SP Decoder Use Case (1)

PARSER

input

stream

TEX Y

TEX U

TEX V

MOT Y

MOT U

MOT V

MERGER

420

video

output

intra

TEX (residual decoding):

• AC-DC prediction

• inverse scan

• inverse quantization

• IDCT

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 18

Page 65: Automated Design Flow for Coarse-Grained Reconfigurable

An MPEG-4 SP Decoder Use Case (1)

PARSER

input

stream

TEX Y

TEX U

TEX V

MOT Y

MOT U

MOT V

MERGER

420

video

output

intra

MOT (motion compensation):

• framebuffer

• interpolation

• residual error addition

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 18

Page 66: Automated Design Flow for Coarse-Grained Reconfigurable

An MPEG-4 SP Decoder Use Case (1)

PARSER

input

stream

TEX Y

TEX U

TEX V

MOT Y

MOT U

MOT V

MERGER

420

video

output

intra

MERGER 420:

• 420 decoded

components merging

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 18

Page 67: Automated Design Flow for Coarse-Grained Reconfigurable

An MPEG-4 SP Decoder Use Case (2)

design

number of actors

ordinary* SBox not

shared** shared** overall

intra 32 0 32 0 32

full 38 0 38 0 38

parallel 70 0 70 0 70

reconf 45 45 20 25 90

opt_reconf 45 45 20 25 90

* computational actors (not SBoxes).

** referred only to the computational actors (not to SBoxes).

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 19

Different designs composition in terms of functionality and role of the

involved actors.

Page 68: Automated Design Flow for Coarse-Grained Reconfigurable

An MPEG-4 SP Decoder Use Case (2)

design

number of actors

ordinary* SBox not

shared** shared** overall

intra 32 0 32 0 32

full 38 0 38 0 38

parallel 70 0 70 0 70

reconf 45 45 20 25 90

opt_reconf 45 45 20 25 90

* computational actors (not SBoxes).

** referred only to the computational actors (not to SBoxes).

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 19

Different designs composition in terms of functionality and role of the

involved actors.

Page 69: Automated Design Flow for Coarse-Grained Reconfigurable

An MPEG-4 SP Decoder Use Case (2)

design

number of actors

ordinary* SBox not

shared** shared** overall

intra 32 0 32 0 32

full 38 0 38 0 38

parallel 70 0 70 0 70

reconf 45 45 20 25 90

opt_reconf 45 45 20 25 90

* computational actors (not SBoxes).

** referred only to the computational actors (not to SBoxes).

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 19

Different designs composition in terms of functionality and role of the

involved actors.

Page 70: Automated Design Flow for Coarse-Grained Reconfigurable

An MPEG-4 SP Decoder Use Case (2)

design

number of actors

ordinary* SBox not

shared** shared** overall

intra 32 0 32 0 32

full 38 0 38 0 38

parallel 70 0 70 0 70

reconf 45 45 20 25 90

opt_reconf 45 45 20 25 90

* computational actors (not SBoxes).

** referred only to the computational actors (not to SBoxes).

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 19

Different designs composition in terms of functionality and role of the

involved actors.

Page 71: Automated Design Flow for Coarse-Grained Reconfigurable

An MPEG-4 SP Decoder Use Case (2)

design

number of actors

ordinary* SBox not

shared** shared** overall

intra 32 0 32 0 32

full 38 0 38 0 38

parallel 70 0 70 0 70

reconf 45 45 20 25 90

opt_reconf 45 45 20 25 90

* computational actors (not SBoxes).

** referred only to the computational actors (not to SBoxes).

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 19

Different designs composition in terms of functionality and role of the

involved actors.

Page 72: Automated Design Flow for Coarse-Grained Reconfigurable

Synthesis Results (1)

resource design

parallel reconf Δ% opt_reconf Δ%

Slices 22495 18447 -19 16383 -27

Slice Regs 27567 23034 -16 23674 -14

Slice LUTs 67445 52836 -22 48545 -28

LUT-FF pairs 71011 55946 -21 52518 -26

BRAMs 148 154 +4 112 -24

DSPs 36 18 -50 18 -50

Max freq

[MHz] 25,43 25,22 -1 25,38 0

Δ% percentage increment/reduction between the parallel and the reconfigurable designs.

Results retrieved from the Xilinx Synthesis Technology tool targeting a

Xilinx Virtex 5 330 FPGA board.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 20

Page 73: Automated Design Flow for Coarse-Grained Reconfigurable

Synthesis Results (1)

resource design

parallel reconf Δ% opt_reconf Δ%

Slices 22495 18447 -19 16383 -27

Slice Regs 27567 23034 -16 23674 -14

Slice LUTs 67445 52836 -22 48545 -28

LUT-FF pairs 71011 55946 -21 52518 -26

BRAMs 148 154 +4 112 -24

DSPs 36 18 -50 18 -50

Max freq

[MHz] 25,43 25,22 -1 25,38 0

Δ% percentage increment/reduction between the parallel and the reconfigurable designs.

Results retrieved from the Xilinx Synthesis Technology tool targeting a

Xilinx Virtex 5 330 FPGA board.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 20

Page 74: Automated Design Flow for Coarse-Grained Reconfigurable

Synthesis Results (1)

resource design

parallel reconf Δ% opt_reconf Δ%

Slices 22495 18447 -19 16383 -27

Slice Regs 27567 23034 -16 23674 -14

Slice LUTs 67445 52836 -22 48545 -28

LUT-FF pairs 71011 55946 -21 52518 -26

BRAMs 148 154 +4 112 -24

DSPs 36 18 -50 18 -50

Max freq

[MHz] 25,43 25,22 -1 25,38 0

Δ% percentage increment/reduction between the parallel and the reconfigurable designs.

Results retrieved from the Xilinx Synthesis Technology tool targeting a

Xilinx Virtex 5 330 FPGA board.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 20

Page 75: Automated Design Flow for Coarse-Grained Reconfigurable

Synthesis Results (1)

resource design

parallel reconf Δ% opt_reconf Δ%

Slices 22495 18447 -19 16383 -27

Slice Regs 27567 23034 -16 23674 -14

Slice LUTs 67445 52836 -22 48545 -28

LUT-FF pairs 71011 55946 -21 52518 -26

BRAMs 148 154 +4 112 -24

DSPs 36 18 -50 18 -50

Max freq

[MHz] 25,43 25,22 -1 25,38 0

Δ% percentage increment/reduction between the parallel and the reconfigurable designs.

Results retrieved from the Xilinx Synthesis Technology tool targeting a

Xilinx Virtex 5 330 FPGA board.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 20

Page 76: Automated Design Flow for Coarse-Grained Reconfigurable

Synthesis Results (2)

resource

power

design

parallel reconf Δ% opt_reconf Δ%

Clock 0,382 0,347 -9 0,323 -15

Logic 0,045 0,025 -44 0,024 -47

Signals 0,050 0,031 -38 0,031 -38

BRAMs 0,090 0,090 0 0,059 -34

TOT 0,567 0,493 -13 0,437 -23

Δ% percentage increment/reduction between the parallel and the reconfigurable designs.

Results retrieved from the XPower Analyzer tool targeting a Xilinx Virtex

5 330 FPGA board for a QCIF video sequence decoding.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 21

Page 77: Automated Design Flow for Coarse-Grained Reconfigurable

Synthesis Results (2)

resource

power

design

parallel reconf Δ% opt_reconf Δ%

Clock 0,382 0,347 -9 0,323 -15

Logic 0,045 0,025 -44 0,024 -47

Signals 0,050 0,031 -38 0,031 -38

BRAMs 0,090 0,090 0 0,059 -34

TOT 0,567 0,493 -13 0,437 -23

Δ% percentage increment/reduction between the parallel and the reconfigurable designs.

Results retrieved from the XPower Analyzer tool targeting a Xilinx Virtex

5 330 FPGA board for a QCIF video sequence decoding.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 21

Page 78: Automated Design Flow for Coarse-Grained Reconfigurable

Synthesis Results (2)

resource

power

design

parallel reconf Δ% opt_reconf Δ%

Clock 0,382 0,347 -9 0,323 -15

Logic 0,045 0,025 -44 0,024 -47

Signals 0,050 0,031 -38 0,031 -38

BRAMs 0,090 0,090 0 0,059 -34

TOT 0,567 0,493 -13 0,437 -23

Δ% percentage increment/reduction between the parallel and the reconfigurable designs.

Results retrieved from the XPower Analyzer tool targeting a Xilinx Virtex

5 330 FPGA board for a QCIF video sequence decoding.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 21

Page 79: Automated Design Flow for Coarse-Grained Reconfigurable

Performance Results

video

sequence

resolution

design

parallel reconf Δ% opt_reconf Δ%

QCIF 110 105 -5 105 -5

CIF 28 26 -6 26 -6

Δ% percentage increment/reduction between the parallel and the reconfigurable designs.

Results are given in frames per second (fps) and are obtained as the

average of fps for the intra and the full decoder configurations.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 22

Page 80: Automated Design Flow for Coarse-Grained Reconfigurable

Performance Results

video

sequence

resolution

design

parallel reconf Δ% opt_reconf Δ%

QCIF 110 105 -5 105 -5

CIF 28 26 -6 26 -6

Δ% percentage increment/reduction between the parallel and the reconfigurable designs.

Results are given in frames per second (fps) and are obtained as the

average of fps for the intra and the full decoder configurations.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 22

Page 81: Automated Design Flow for Coarse-Grained Reconfigurable

Outline • Introduction

– Problem Statement

– Two Step Problem Solving • Step 1: Coarse-Grained Reconfigurability

• Step 2: Dataflow Model of Computation and RVC-CAL

– Generation of Multi-Dataflow Graphs

• Automated Design Flow

– Composition: the Multi-Dataflow Composer

– Optimization: TURNUS Co-Exploration Framework

– RTL Generation: Xronos High Level Synthesis

– Tools Integration

• Experimental Results

– An MPEG-4 SP Decoder Use Case

– Synthesis Results

– Performance Results

• Conclusions

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 23

Page 82: Automated Design Flow for Coarse-Grained Reconfigurable

Conclusions • Challenge of modern electronic systems development: coupling

portability, flexibility and efficiency with the minimum design effort.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 24

Page 83: Automated Design Flow for Coarse-Grained Reconfigurable

Conclusions • Challenge of modern electronic systems development: coupling

portability, flexibility and efficiency with the minimum design effort.

• A solution is possible by exploiting the dataflow programming

paradigm along with the coarse-grained reconfigurability.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 24

Page 84: Automated Design Flow for Coarse-Grained Reconfigurable

Conclusions • Challenge of modern electronic systems development: coupling

portability, flexibility and efficiency with the minimum design effort.

• A solution is possible by exploiting the dataflow programming

paradigm along with the coarse-grained reconfigurability.

• An automated design flow has been assembled, by the integration of

different RVC-CAL tools (MDC, TURNUS, Xronos).

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 24

Page 85: Automated Design Flow for Coarse-Grained Reconfigurable

Conclusions • Challenge of modern electronic systems development: coupling

portability, flexibility and efficiency with the minimum design effort.

• A solution is possible by exploiting the dataflow programming

paradigm along with the coarse-grained reconfigurability.

• An automated design flow has been assembled, by the integration of

different RVC-CAL tools (MDC, TURNUS, Xronos).

• The proposed design flow has been validated on a real use case

involving two configurations of the MPEG-4 Simple Profile decoder.

• Results highlighted the effectiveness of the approach by achieving

more than 25% of saving in terms of resource utilization and more

than 20% of saving in terms power consumption.

• The generated reconfigurable designs present a very small

performance penalty (5-6%) with respect to the original decoder

designs.

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 24

Page 86: Automated Design Flow for Coarse-Grained Reconfigurable

Acknowledgements

SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 25

The research leading to these results has received funding from:

• the Region of Sardinia L.R.7/2007

under grant agreement CRP-18324

[RPCT Project].

• the Region of Sardinia, Young

Researchers Grant, POR Sardegna FSE

2007-2013, L.R.7/2007 “Promotion of the

scientific research and technological

innovation in Sardinia”

Page 87: Automated Design Flow for Coarse-Grained Reconfigurable

Automated Design Flow for Coarse-Grained

Reconfigurable Platforms: an RVC-CAL

Multi-Standard Decoder Use-Case

XIV International Conference on Embedded Computer and Systems: Architectures, MOdeling and Simulation

SAMOS XIV - 2014 July 14th - Samos Island (Greece)

Carlo Sau

DIEE

Università degli Studi di Cagliari

[email protected]