automated design flow for coarse-grained reconfigurable
TRANSCRIPT
Automated Design Flow for Coarse-Grained
Reconfigurable Platforms: an RVC-CAL
Multi-Standard Decoder Use-Case
XIV International Conference on Embedded Computer and Systems: Architectures, MOdeling and Simulation
SAMOS XIV - 2014 July 14th - Samos Island (Greece)
Endri Bezati, Simone Casale-Brunet, Marco Mattavelli
SCI STI MM
École Politecnique Fédérale de Lausanne
Carlo Sau, Luigi Raffo
DIEE
Università degli Studi di Cagliari
Francesca Palumbo
POLCOMING
Università degli Studi di Sassari
Outline • Introduction
– Problem Statement
– Two Step Problem Solving • Step 1: Coarse-Grained Reconfigurability
• Step 2: Dataflow Model of Computation and RVC-CAL
– Generation of Multi-Dataflow Graphs
• Automated Design Flow
– Composition: the Multi-Dataflow Composer
– Optimization: TURNUS Co-Exploration Framework
– RTL Generation: Xronos High Level Synthesis
– Tools Integration
• Experimental Results
– An MPEG-4 SP Decoder Use Case
– Synthesis Results
– Performance Results
• Conclusions
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 1
Outline • Introduction
– Problem Statement
– Two Step Problem Solving • Step 1: Coarse-Grained Reconfigurability
• Step 2: Dataflow Model of Computation and RVC-CAL
– Generation of Multi-Dataflow Graphs
• Automated Design Flow
– Composition: the Multi-Dataflow Composer
– Optimization: TURNUS Co-Exploration Framework
– RTL Generation: Xronos High Level Synthesis
– Tools Integration
• Experimental Results
– An MPEG-4 SP Decoder Use Case
– Synthesis Results
– Performance Results
• Conclusions
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 2
Problem Statement
.
• portable: dimensions and
battery life limits have to be
taken into consideration.
• multimedial: different
applications have to be
executed, very often at the
same time.
• higly efficient: real time
response is needed in many
cases.
• easily evolvable: technology
and algorithms fast evolution
have to be followed by the
devices.
Electronic devices are converging to platforms that are:
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 3 SOURCE: keynote @ Merging Media 2013 in Vancouver Canada by Gary Hayes
Problem Statement
.
• portable: dimensions and
battery life limits have to be
taken into consideration.
• multimedial: different
applications have to be
executed, very often at the
same time.
• higly efficient: real time
response is needed in many
cases.
• easily evolvable: technology
and algorithms fast evolution
have to be followed by the
devices.
Electronic devices are converging to platforms that are:
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 3
SOURCE: keynote @ Merging Media 2013 in Vancouver Canada by Gary Hayes
Problem Statement
.
• portable: dimensions and
battery life limits have to be
taken into consideration.
• multimedial: different
applications have to be
executed, very often at the
same time.
• higly efficient: real time
response is needed in many
cases.
• easily evolvable: technology
and algorithms fast evolution
have to be followed by the
devices.
need of new design flows to address this complex scenario
Electronic devices are converging to platforms that are:
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 3
SOURCE: keynote @ Merging Media 2013 in Vancouver Canada by Gary Hayes
Two Step Problem Solving Step 1: Coarse-Grained Reconfigurability
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau
• Systems are required to be flexible and efficient.
4
Two Step Problem Solving Step 1: Coarse-Grained Reconfigurability
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau
ASIC
GPP
DSP
RP
Performance
Fle
xib
ilit
y
• Systems are required to be flexible and efficient.
• Reconfigurable Paradigm (RP) to hardware design: specialized
computing platforms, capable of changing configuration to serve the
targeted computations.
4
Two Step Problem Solving Step 1: Coarse-Grained Reconfigurability
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau
FINE-
GRAINED
(FG)
COARSE-
GRAINED
(CG)
Bit-level Word-level
Flexibility
Reconf.
Speed
Config.
Storage
ASIC
GPP
DSP
RP
Performance
Fle
xib
ilit
y
• Systems are required to be flexible and efficient.
• Reconfigurable Paradigm (RP) to hardware design: specialized
computing platforms, capable of changing configuration to serve the
targeted computations.
• The more the hardware is specialized, the more it is difficult to
program.
4
A dataflow program is a directed graph of functional units (actors)
exchanging sequences of data (tokens) through dedicated channels.
actions
state
Actors encapsulate their state and communicate exclusively by sending
and receiving tokens. Such an absence of race conditions, along with
the intrinsic modularity of the dataflow graphs, make it possible to
explicit the algorithmic parallelism of the programs.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 5
Two Step Problem Solving Step 2: Dataflow Model of Computation and RVC-CAL (1)
Standard RVC Model
Decoder
Composition
Mechanism
VTL [2]
(RVC-CAL FUs)
Abstract Decoder
Model
(XDF + RVC-Cal)
Proprietary Decoder
Decoder
Implementation Proprietary
Tool Library
Decoder
Solution
[1] MPEG-B part 4 (2009), [2] MPEG-C part 4 (2010)
CAL[1] = Cal Actor Language, high-level programming language for
describing dataflow actors.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 6
Two Step Problem Solving Step 2: Dataflow Model of Computation and RVC-CAL (2)
Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications
A
B
C D
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 7
Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications
1:1
HW platform
A
B
C D
A
B
C D
clock
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 7
Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications
1:1
HW platform
A
B
C D
A
B
C D
clock
A
B
C D
B E D
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 7
Generation of Multi-Dataflow Graphs (1) RVC-CAL dataflow specifications
1:1
HW platform
HW CG reconfigurable platform
A
B
C D
A
B
C D
clock
A
B
C D
B E D
A
B
C
D
clock
E
2:1
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 7
Generation of Multi-Dataflow Graphs (2)
• low area and power (portability)
• flexibility (multimediality)
• performances (high efficiency)
• code reusability (fast evolution)
Challenges of the modern electronic devices design:
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 8
Generation of Multi-Dataflow Graphs (2)
• low area and power (portability)
• flexibility (multimediality)
• performances (high efficiency)
• code reusability (fast evolution)
resource sharing
CG reconfiguration
dataflow parallelism highlighting
dataflow modularity (VTL)
Challenges of the modern electronic devices design:
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 8
Generation of Multi-Dataflow Graphs (2)
• low area and power (portability)
• flexibility (multimediality)
• performances (high efficiency)
• code reusability (fast evolution)
resource sharing
CG reconfiguration
dataflow parallelism highlighting
dataflow modularity (VTL)
Challenges of the modern electronic devices design:
perfect matching, but…
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 8
Generation of Multi-Dataflow Graphs (2)
• low area and power (portability)
• flexibility (multimediality)
• performances (high efficiency)
• code reusability (fast evolution)
resource sharing
CG reconfiguration
dataflow parallelism highlighting
dataflow modularity (VTL)
Challenges of the modern electronic devices design:
perfect matching, but… what happens if the design complexity grows?
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 8
Generation of Multi-Dataflow Graphs (2)
• low area and power (portability)
• flexibility (multimediality)
• performances (high efficiency)
• code reusability (fast evolution)
resource sharing
CG reconfiguration
dataflow parallelism highlighting
dataflow modularity (VTL)
Challenges of the modern electronic devices design:
perfect matching, but… what happens if the design complexity grows?
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 8
Generation of Multi-Dataflow Graphs (2)
• low area and power (portability)
• flexibility (multimediality)
• performances (high efficiency)
• code reusability (fast evolution)
resource sharing
CG reconfiguration
dataflow parallelism highlighting
dataflow modularity (VTL)
Challenges of the modern electronic devices design:
perfect matching, but… what happens if the design complexity grows?
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 8
need of an automated design flow
Outline • Introduction
– Problem Statement
– Two Step Problem Solving • Step 1: Coarse-Grained Reconfigurability
• Step 2: Dataflow Model of Computation and RVC-CAL
– Generation of Multi-Dataflow Graphs
• Automated Design Flow
– Composition: the Multi-Dataflow Composer
– Optimization: TURNUS Co-Exploration Framework
– RTL Generation: Xronos High Level Synthesis
– Tools Integration
• Experimental Results
– An MPEG-4 SP Decoder Use Case
– Synthesis Results
– Performance Results
• Conclusions
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 9
Automated Design Flow Functionalities required by the automated design flow:
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 10
Automated Design Flow
multi-dataflow network composition
Functionalities required by the automated design flow:
A
B C D
B E D
A
B
C
E
D
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 10
Automated Design Flow
multi-dataflow network composition
Functionalities required by the automated design flow:
dataflow network buffers optimization
A
B C D
B E D
A
B
C
E
D
A
B
C
E
D
A
B
C
E
D
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 10
Automated Design Flow
multi-dataflow network composition
Functionalities required by the automated design flow:
dataflow network buffers optimization
multi-dataflow corresponding RTL generation
A
B C D
B E D
A
B
C
E
D
A
B
C
E
D
A
B
C
E
D
A
B
C
E
D
A
B
C
D
clock
E
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 10
Open RVC-CAL
Compiler (ORCC) is a
framework able to
generate, from an
RVC-CAL specification,
the corresponding
source code for
different target
platforms (hardware,
software or mixed).
front-end
back-ends
C Java LLVM …
ORCC
Composition: the Multi-Dataflow
Composer (1)
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11
Open RVC-CAL
Compiler (ORCC) is a
framework able to
generate, from an
RVC-CAL specification,
the corresponding
source code for
different target
platforms (hardware,
software or mixed).
front-end
back-ends
C Java LLVM …
ORCC
Composition: the Multi-Dataflow
Composer (1)
RVC-CAL
specification
CAL
Actors
XDF
Graph
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11
Open RVC-CAL
Compiler (ORCC) is a
framework able to
generate, from an
RVC-CAL specification,
the corresponding
source code for
different target
platforms (hardware,
software or mixed).
front-end
back-ends
C Java LLVM …
ORCC
Composition: the Multi-Dataflow
Composer (1)
RVC-CAL
specification
Intermediate
Representation
CAL
Actors
XDF
Graph
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11
Open RVC-CAL
Compiler (ORCC) is a
framework able to
generate, from an
RVC-CAL specification,
the corresponding
source code for
different target
platforms (hardware,
software or mixed).
front-end
back-ends
C Java LLVM …
ORCC
Composition: the Multi-Dataflow
Composer (1)
RVC-CAL
specification
Intermediate
Representation
CAL
Actors
XDF
Graph
Source
Code
1:1
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11
front-end
back-ends
C Java LLVM …
ORCC
Composition: the Multi-Dataflow
Composer (1)
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11
front-end
back-ends
C Java LLVM …
ORCC
Composition: the Multi-Dataflow
Composer (1)
fro
nt-
en
d
ba
ck-e
nd
MDC
MDC exploits
the ORCC front-
end to generate,
from a set of
RVC-CAL
specifications,
the hardware
description of a
reconfigurable
platform.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11
front-end
back-ends
C Java LLVM …
ORCC
Composition: the Multi-Dataflow
Composer (1)
fro
nt-
en
d
ba
ck-e
nd
MDC
RVC-CAL
specifications
CAL
Actors XDF
Graphs
MDC exploits
the ORCC front-
end to generate,
from a set of
RVC-CAL
specifications,
the hardware
description of a
reconfigurable
platform.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11
front-end
back-ends
C Java LLVM …
ORCC
Composition: the Multi-Dataflow
Composer (1)
fro
nt-
en
d
ba
ck-e
nd
MDC
RVC-CAL
specifications
CAL
Actors XDF
Graphs
Intermediate
Representation
MDC exploits
the ORCC front-
end to generate,
from a set of
RVC-CAL
specifications,
the hardware
description of a
reconfigurable
platform.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11
front-end
back-ends
C Java LLVM …
ORCC
Composition: the Multi-Dataflow
Composer (1)
fro
nt-
en
d
ba
ck-e
nd
Multi-
Dataflow
XDF Graph
MDC
RVC-CAL
specifications
CAL
Actors XDF
Graphs
Intermediate
Representation
MDC exploits
the ORCC front-
end to generate,
from a set of
RVC-CAL
specifications,
the hardware
description of a
reconfigurable
platform.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11
front-end
back-ends
C Java LLVM …
ORCC
Composition: the Multi-Dataflow
Composer (1)
fro
nt-
en
d
ba
ck-e
nd
Multi-
Dataflow
XDF Graph
HDL
Components
Library
HW
Reconfigurable
Platform (HDL)
MDC
RVC-CAL
specifications
CAL
Actors XDF
Graphs
Intermediate
Representation
MDC exploits
the ORCC front-
end to generate,
from a set of
RVC-CAL
specifications,
the hardware
description of a
reconfigurable
platform.
n:1
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 11
Composition: the Multi-Dataflow
Composer (2)
A
B
C D
B E D
RVC-CAL dataflow specifications
dataflow α
dataflow β
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 12
Composition: the Multi-Dataflow
Composer (2)
A
B
C D
B E D
RVC-CAL dataflow specifications
MD
C fro
nt-
en
d
A
B
C
D s0
E
s1
multi-dataflow specification
dataflow α
dataflow β
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 12
Composition: the Multi-Dataflow
Composer (2)
HW CG reconfigurable platform
A
B
C D
B E D
A
B
C
D
clock
E
s0 s1
RVC-CAL dataflow specifications
MD
C fro
nt-
en
d
A
B
C
D s0
E
s1
multi-dataflow specification
MDC back-end
config
LUT α β
s1 0 1
s2 0 1
dataflow α
dataflow β
0
1
0
1
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 12
TURNUS is a design space
exploration framework for
heterogeneous parallel
systems. It provides high-level
modeling and simulation
methods and tools
for system level performances
estimation and optimization.
front-end
TURNUS
Optimization: TURNUS Co-Exploration
Framework
back-end
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13
front-end
TURNUS
Optimization: TURNUS Co-Exploration
Framework
back-end
It provides the execution
causation traces for a given
dataflow specification in relation
to a particular input stimulus.
The causation traces are
graphs describing the
dependencies among the
actions executed during the
input stimulus processing.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13
front-end
TURNUS
Optimization: TURNUS Co-Exploration
Framework
RVC-CAL
specification
CAL
Actors
XDF
Graph
back-end
Input
stimulus
It provides the execution
causation traces for a given
dataflow specification in relation
to a particular input stimulus.
The causation traces are
graphs describing the
dependencies among the
actions executed during the
input stimulus processing.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13
front-end
TURNUS
Optimization: TURNUS Co-Exploration
Framework
RVC-CAL
specification
Simulation
CAL
Actors
XDF
Graph
back-end
Input
stimulus
It provides the execution
causation traces for a given
dataflow specification in relation
to a particular input stimulus.
The causation traces are
graphs describing the
dependencies among the
actions executed during the
input stimulus processing.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13
front-end
TURNUS
Optimization: TURNUS Co-Exploration
Framework
RVC-CAL
specification
Simulation
CAL
Actors
XDF
Graph
back-end
Input
stimulus
Execution
Causation
Traces
It provides the execution
causation traces for a given
dataflow specification in relation
to a particular input stimulus.
The causation traces are
graphs describing the
dependencies among the
actions executed during the
input stimulus processing.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13
front-end
TURNUS
Optimization: TURNUS Co-Exploration
Framework
back-end
Execution
Causation
Traces
The causation traces post-
processing analyses the
causation traces along with
given action weights (latency of
the actions for a specific target
platform) in order to optimize
the size of the FIFO memories
in the dataflow network through
a Design Space Exploration.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13
front-end
TURNUS
Optimization: TURNUS Co-Exploration
Framework
back-end
Execution
Causation
Traces
Actions
weights
The causation traces post-
processing analyses the
causation traces along with
given action weights (latency of
the actions for a specific target
platform) in order to optimize
the size of the FIFO memories
in the dataflow network through
a Design Space Exploration.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13
front-end
TURNUS
Optimization: TURNUS Co-Exploration
Framework
back-end
Execution
Causation
Traces
Actions
weights
The causation traces post-
processing analyses the
causation traces along with
given action weights (latency of
the actions for a specific target
platform) in order to optimize
the size of the FIFO memories
in the dataflow network through
a Design Space Exploration.
Design
Space
Exploration
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13
front-end
TURNUS
Optimization: TURNUS Co-Exploration
Framework
back-end
Execution
Causation
Traces
Actions
weights
The causation traces post-
processing analyses the
causation traces along with
given action weights (latency of
the actions for a specific target
platform) in order to optimize
the size of the FIFO memories
in the dataflow network through
a Design Space Exploration.
Design
Space
Exploration
Optimal
FIFO
Sizes
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 13
Xronos is a framework for the
generation of RTL descriptions
from dataflow or sequential
applications.
It supports three basic data
types (int, uint and bool), flow
control statements (branches,
loops, parallel blocks) and
procedural abstractions
(sequences of statements)
called tasks.
front-end
Xronos
RTL Generation: Xronos High
Level Synthesis (1)
back-end
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 14
Xronos is a framework for the
generation of RTL descriptions
from dataflow or sequential
applications.
It supports three basic data
types (int, uint and bool), flow
control statements (branches,
loops, parallel blocks) and
procedural abstractions
(sequences of statements)
called tasks.
front-end
Xronos
RTL Generation: Xronos High
Level Synthesis (1)
RVC-CAL
specification
CAL
Actors
XDF
Graph
back-end
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 14
Xronos is a framework for the
generation of RTL descriptions
from dataflow or sequential
applications.
It supports three basic data
types (int, uint and bool), flow
control statements (branches,
loops, parallel blocks) and
procedural abstractions
(sequences of statements)
called tasks.
front-end
Xronos
RTL Generation: Xronos High
Level Synthesis (1)
RVC-CAL
specification
Intermediate
Representation
CAL
Actors
XDF
Graph
back-end
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 14
Xronos is a framework for the
generation of RTL descriptions
from dataflow or sequential
applications.
It supports three basic data
types (int, uint and bool), flow
control statements (branches,
loops, parallel blocks) and
procedural abstractions
(sequences of statements)
called tasks.
front-end
Xronos
RTL Generation: Xronos High
Level Synthesis (1)
RVC-CAL
specification
Intermediate
Representation
CAL
Actors
XDF
Graph
Actors
.v
1:1
back-end
Top
.vhd Test
.v
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 14
RTL Generation: Xronos High
Level Synthesis (2)
A
B
C D
A
C D
B clock
RVC-CAL dataflow specification HW platform
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 15
RTL Generation: Xronos High
Level Synthesis (2)
A
B
C D
A
C D
B clock
RVC-CAL dataflow specification HW platform
Actor_X queue_Actor_X_in1 in_DATA in_SEND in_ACK in_RDY in_COUNT
out_DATA out_SEND
out_ACK out_COUNT
in1_DATA in1_SEND in1_ACK in1_COUNT
out1_DATA out1_SEND
out1_ACK
out1_COUNT out1_RDY
fanout_Actor_X_out1 in_DATA in_SEND in_ACK in_RDY in_COUNT
fanout_Actor_X_outM in_DATA in_SEND in_ACK in_RDY in_COUNT
out_DATA out_SEND
out_ACK
out_COUNT out_RDY
out_DATA out_SEND
out_ACK
out_COUNT out_RDY
outM_DATA outM_SEND
outM_ACK
outM_COUNT outtM_RDY
inN_DATA inN_SEND inN_ACK inN_COUNT
queue_Actor_X_inN in_DATA in_SEND in_ACK in_RDY in_COUNT
out_DATA out_SEND
out_ACK out_COUNT
…
…
…
…
…
…
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 15
START: MDC compeses the multi-dataflow
network
Tools Integration
front-end
back-ends
ORCC
CAL
XDF
front-
end
back-e
nd
MDC
RVC-CAL
specifications
IR XDF multi
dataflow
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 16
Tools Integration
front-end
back-ends
ORCC
CAL
XDF
front-
end
back-e
nd
MDC
RVC-CAL
specifications
IR XDF multi
dataflow
TURNUS
Co-
Exploration
Framework
STEP 1: TURNUS generates the execution
causation traces
TRACES
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 16
Tools Integration
front-end
back-ends
ORCC
CAL
XDF
front-
end
back-e
nd
MDC
RVC-CAL
specifications
IR
XRONOS
High Level
Synthesis
XDF multi
dataflow
WEIGHTS TURNUS
Co-
Exploration
Framework
TRACES
STEP 2: Xronos generates the actions weights
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 16
Tools Integration
front-end
back-ends
ORCC
CAL
XDF
front-
end
back-e
nd
MDC
RVC-CAL
specifications
IR
XRONOS
High Level
Synthesis
XDF multi
dataflow
WEIGHTS
FIFO
SIZES
TURNUS
Co-
Exploration
Framework
TRACES
STEP 3: TURNUS generates the FIFO optimal
sizes
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 16
Tools Integration
front-end
back-ends
ORCC
CAL
XDF
front-
end
back-e
nd
HDL
Components
Library
MDC
RVC-CAL
specifications
IR
XRONOS
High Level
Synthesis
XDF multi
dataflow
WEIGHTS
FIFO
SIZES
TURNUS
Co-
Exploration
Framework
HW CG
Reconfigurable
Platform
(with SBox
queues/fanouts) TRACES
STEP 4: Xronos generates the RVC compliant
reconfigurable hardware platform
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 16
Tools Integration
front-end
back-ends
ORCC
CAL
XDF
front-
end
back-e
nd
HDL
Components
Library
opt HW CG
Reconfigurable
Platform
(without Sbox
queues/fanouts)
MDC
RVC-CAL
specifications
IR
XRONOS
High Level
Synthesis
XDF multi
dataflow
WEIGHTS
FIFO
SIZES
TURNUS
Co-
Exploration
Framework
HW CG
Reconfigurable
Platform
(with SBox
queues/fanouts) TRACES
STEP 5: MDC generates the optimized
reconfigurable hardware platform
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 16
Outline • Introduction
– Problem Statement
– Two Step Problem Solving • Step 1: Coarse-Grained Reconfigurability
• Step 2: Dataflow Model of Computation and RVC-CAL
– Generation of Multi-Dataflow Graphs
• Automated Design Flow
– Composition: the Multi-Dataflow Composer
– Optimization: TURNUS Co-Exploration Framework
– RTL Generation: Xronos High Level Synthesis
– Tools Integration
• Experimental Results
– An MPEG-4 SP Decoder Use Case
– Synthesis Results
– Performance Results
• Conclusions
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 17
An MPEG-4 SP Decoder Use Case (1)
PARSER
input
stream
TEX Y
TEX U
TEX V
MOT Y
MOT U
MOT V
MERGER
420
video
output
intra
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 18
An MPEG-4 SP Decoder Use Case (1)
PARSER
input
stream
TEX Y
TEX U
TEX V
MOT Y
MOT U
MOT V
MERGER
420
video
output
intra PARSER:
• syntactical parser
• variable length coding
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 18
An MPEG-4 SP Decoder Use Case (1)
PARSER
input
stream
TEX Y
TEX U
TEX V
MOT Y
MOT U
MOT V
MERGER
420
video
output
intra
TEX (residual decoding):
• AC-DC prediction
• inverse scan
• inverse quantization
• IDCT
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 18
An MPEG-4 SP Decoder Use Case (1)
PARSER
input
stream
TEX Y
TEX U
TEX V
MOT Y
MOT U
MOT V
MERGER
420
video
output
intra
MOT (motion compensation):
• framebuffer
• interpolation
• residual error addition
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 18
An MPEG-4 SP Decoder Use Case (1)
PARSER
input
stream
TEX Y
TEX U
TEX V
MOT Y
MOT U
MOT V
MERGER
420
video
output
intra
MERGER 420:
• 420 decoded
components merging
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 18
An MPEG-4 SP Decoder Use Case (2)
design
number of actors
ordinary* SBox not
shared** shared** overall
intra 32 0 32 0 32
full 38 0 38 0 38
parallel 70 0 70 0 70
reconf 45 45 20 25 90
opt_reconf 45 45 20 25 90
* computational actors (not SBoxes).
** referred only to the computational actors (not to SBoxes).
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 19
Different designs composition in terms of functionality and role of the
involved actors.
An MPEG-4 SP Decoder Use Case (2)
design
number of actors
ordinary* SBox not
shared** shared** overall
intra 32 0 32 0 32
full 38 0 38 0 38
parallel 70 0 70 0 70
reconf 45 45 20 25 90
opt_reconf 45 45 20 25 90
* computational actors (not SBoxes).
** referred only to the computational actors (not to SBoxes).
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 19
Different designs composition in terms of functionality and role of the
involved actors.
An MPEG-4 SP Decoder Use Case (2)
design
number of actors
ordinary* SBox not
shared** shared** overall
intra 32 0 32 0 32
full 38 0 38 0 38
parallel 70 0 70 0 70
reconf 45 45 20 25 90
opt_reconf 45 45 20 25 90
* computational actors (not SBoxes).
** referred only to the computational actors (not to SBoxes).
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 19
Different designs composition in terms of functionality and role of the
involved actors.
An MPEG-4 SP Decoder Use Case (2)
design
number of actors
ordinary* SBox not
shared** shared** overall
intra 32 0 32 0 32
full 38 0 38 0 38
parallel 70 0 70 0 70
reconf 45 45 20 25 90
opt_reconf 45 45 20 25 90
* computational actors (not SBoxes).
** referred only to the computational actors (not to SBoxes).
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 19
Different designs composition in terms of functionality and role of the
involved actors.
An MPEG-4 SP Decoder Use Case (2)
design
number of actors
ordinary* SBox not
shared** shared** overall
intra 32 0 32 0 32
full 38 0 38 0 38
parallel 70 0 70 0 70
reconf 45 45 20 25 90
opt_reconf 45 45 20 25 90
* computational actors (not SBoxes).
** referred only to the computational actors (not to SBoxes).
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 19
Different designs composition in terms of functionality and role of the
involved actors.
Synthesis Results (1)
resource design
parallel reconf Δ% opt_reconf Δ%
Slices 22495 18447 -19 16383 -27
Slice Regs 27567 23034 -16 23674 -14
Slice LUTs 67445 52836 -22 48545 -28
LUT-FF pairs 71011 55946 -21 52518 -26
BRAMs 148 154 +4 112 -24
DSPs 36 18 -50 18 -50
Max freq
[MHz] 25,43 25,22 -1 25,38 0
Δ% percentage increment/reduction between the parallel and the reconfigurable designs.
Results retrieved from the Xilinx Synthesis Technology tool targeting a
Xilinx Virtex 5 330 FPGA board.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 20
Synthesis Results (1)
resource design
parallel reconf Δ% opt_reconf Δ%
Slices 22495 18447 -19 16383 -27
Slice Regs 27567 23034 -16 23674 -14
Slice LUTs 67445 52836 -22 48545 -28
LUT-FF pairs 71011 55946 -21 52518 -26
BRAMs 148 154 +4 112 -24
DSPs 36 18 -50 18 -50
Max freq
[MHz] 25,43 25,22 -1 25,38 0
Δ% percentage increment/reduction between the parallel and the reconfigurable designs.
Results retrieved from the Xilinx Synthesis Technology tool targeting a
Xilinx Virtex 5 330 FPGA board.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 20
Synthesis Results (1)
resource design
parallel reconf Δ% opt_reconf Δ%
Slices 22495 18447 -19 16383 -27
Slice Regs 27567 23034 -16 23674 -14
Slice LUTs 67445 52836 -22 48545 -28
LUT-FF pairs 71011 55946 -21 52518 -26
BRAMs 148 154 +4 112 -24
DSPs 36 18 -50 18 -50
Max freq
[MHz] 25,43 25,22 -1 25,38 0
Δ% percentage increment/reduction between the parallel and the reconfigurable designs.
Results retrieved from the Xilinx Synthesis Technology tool targeting a
Xilinx Virtex 5 330 FPGA board.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 20
Synthesis Results (1)
resource design
parallel reconf Δ% opt_reconf Δ%
Slices 22495 18447 -19 16383 -27
Slice Regs 27567 23034 -16 23674 -14
Slice LUTs 67445 52836 -22 48545 -28
LUT-FF pairs 71011 55946 -21 52518 -26
BRAMs 148 154 +4 112 -24
DSPs 36 18 -50 18 -50
Max freq
[MHz] 25,43 25,22 -1 25,38 0
Δ% percentage increment/reduction between the parallel and the reconfigurable designs.
Results retrieved from the Xilinx Synthesis Technology tool targeting a
Xilinx Virtex 5 330 FPGA board.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 20
Synthesis Results (2)
resource
power
design
parallel reconf Δ% opt_reconf Δ%
Clock 0,382 0,347 -9 0,323 -15
Logic 0,045 0,025 -44 0,024 -47
Signals 0,050 0,031 -38 0,031 -38
BRAMs 0,090 0,090 0 0,059 -34
TOT 0,567 0,493 -13 0,437 -23
Δ% percentage increment/reduction between the parallel and the reconfigurable designs.
Results retrieved from the XPower Analyzer tool targeting a Xilinx Virtex
5 330 FPGA board for a QCIF video sequence decoding.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 21
Synthesis Results (2)
resource
power
design
parallel reconf Δ% opt_reconf Δ%
Clock 0,382 0,347 -9 0,323 -15
Logic 0,045 0,025 -44 0,024 -47
Signals 0,050 0,031 -38 0,031 -38
BRAMs 0,090 0,090 0 0,059 -34
TOT 0,567 0,493 -13 0,437 -23
Δ% percentage increment/reduction between the parallel and the reconfigurable designs.
Results retrieved from the XPower Analyzer tool targeting a Xilinx Virtex
5 330 FPGA board for a QCIF video sequence decoding.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 21
Synthesis Results (2)
resource
power
design
parallel reconf Δ% opt_reconf Δ%
Clock 0,382 0,347 -9 0,323 -15
Logic 0,045 0,025 -44 0,024 -47
Signals 0,050 0,031 -38 0,031 -38
BRAMs 0,090 0,090 0 0,059 -34
TOT 0,567 0,493 -13 0,437 -23
Δ% percentage increment/reduction between the parallel and the reconfigurable designs.
Results retrieved from the XPower Analyzer tool targeting a Xilinx Virtex
5 330 FPGA board for a QCIF video sequence decoding.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 21
Performance Results
video
sequence
resolution
design
parallel reconf Δ% opt_reconf Δ%
QCIF 110 105 -5 105 -5
CIF 28 26 -6 26 -6
Δ% percentage increment/reduction between the parallel and the reconfigurable designs.
Results are given in frames per second (fps) and are obtained as the
average of fps for the intra and the full decoder configurations.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 22
Performance Results
video
sequence
resolution
design
parallel reconf Δ% opt_reconf Δ%
QCIF 110 105 -5 105 -5
CIF 28 26 -6 26 -6
Δ% percentage increment/reduction between the parallel and the reconfigurable designs.
Results are given in frames per second (fps) and are obtained as the
average of fps for the intra and the full decoder configurations.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 22
Outline • Introduction
– Problem Statement
– Two Step Problem Solving • Step 1: Coarse-Grained Reconfigurability
• Step 2: Dataflow Model of Computation and RVC-CAL
– Generation of Multi-Dataflow Graphs
• Automated Design Flow
– Composition: the Multi-Dataflow Composer
– Optimization: TURNUS Co-Exploration Framework
– RTL Generation: Xronos High Level Synthesis
– Tools Integration
• Experimental Results
– An MPEG-4 SP Decoder Use Case
– Synthesis Results
– Performance Results
• Conclusions
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 23
Conclusions • Challenge of modern electronic systems development: coupling
portability, flexibility and efficiency with the minimum design effort.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 24
Conclusions • Challenge of modern electronic systems development: coupling
portability, flexibility and efficiency with the minimum design effort.
• A solution is possible by exploiting the dataflow programming
paradigm along with the coarse-grained reconfigurability.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 24
Conclusions • Challenge of modern electronic systems development: coupling
portability, flexibility and efficiency with the minimum design effort.
• A solution is possible by exploiting the dataflow programming
paradigm along with the coarse-grained reconfigurability.
• An automated design flow has been assembled, by the integration of
different RVC-CAL tools (MDC, TURNUS, Xronos).
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 24
Conclusions • Challenge of modern electronic systems development: coupling
portability, flexibility and efficiency with the minimum design effort.
• A solution is possible by exploiting the dataflow programming
paradigm along with the coarse-grained reconfigurability.
• An automated design flow has been assembled, by the integration of
different RVC-CAL tools (MDC, TURNUS, Xronos).
• The proposed design flow has been validated on a real use case
involving two configurations of the MPEG-4 Simple Profile decoder.
• Results highlighted the effectiveness of the approach by achieving
more than 25% of saving in terms of resource utilization and more
than 20% of saving in terms power consumption.
• The generated reconfigurable designs present a very small
performance penalty (5-6%) with respect to the original decoder
designs.
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 24
Acknowledgements
SAMOS XIV - 2014 July 14th - Samos Island (Greece) - Carlo Sau 25
The research leading to these results has received funding from:
• the Region of Sardinia L.R.7/2007
under grant agreement CRP-18324
[RPCT Project].
• the Region of Sardinia, Young
Researchers Grant, POR Sardegna FSE
2007-2013, L.R.7/2007 “Promotion of the
scientific research and technological
innovation in Sardinia”
Automated Design Flow for Coarse-Grained
Reconfigurable Platforms: an RVC-CAL
Multi-Standard Decoder Use-Case
XIV International Conference on Embedded Computer and Systems: Architectures, MOdeling and Simulation
SAMOS XIV - 2014 July 14th - Samos Island (Greece)
Carlo Sau
DIEE
Università degli Studi di Cagliari