6. Application mapping (part 2)

Uploaded by gore on 06-Jan-2016.


TRANSCRIPT

Page 1: 6. Application mapping

6. Application mapping (part 2)

6. APPLICATION MAPPING
6.3 HW/SW partitioning
6.4 Mapping to heterogeneous multi-processors

Page 2: 6. Application mapping

6.3 HW/SW PARTITIONING

6.3.1 Introduction
By hardware/software partitioning we mean the mapping of task graph nodes to either hardware or software. Applying hardware/software partitioning, we can decide which parts of an application must be implemented in hardware and which in software.

6.3.2 COOL (COdesign toOL)
For COOL, the input consists of three parts:

① Target technology: This part of the input to COOL comprises information about the available hardware platform components. The type of the processors used must be included in this part of the input.

② Design constraints: The second part of the input comprises design constraints such as the required throughput, latency, maximum memory size, or maximum area for application-specific hardware.

③ Behavior: The third part of the input describes the required overall behavior. Hierarchical task graphs are used for this. COOL uses two kinds of edges: communication edges and timing edges.

Page 3: 6. Application mapping

For partitioning, COOL uses the following steps:

① Translation of the behavior into an internal graph model.
② Translation of the behavior of each node from VHDL into C.
③ Compilation of all C programs for the selected target processor type, computation of the resulting program size, and estimation of the resulting execution time.

④ Synthesis of hardware components: For each leaf node, application-specific hardware is synthesized.

⑤ Flattening the hierarchy: The next step is to extract a flat graph from the hierarchical flow graph.

⑥ Generating and solving a mathematical model of the optimization problem: COOL uses integer linear programming (ILP) to solve the optimization problem.

⑦ Iterative improvements: In order to work with good estimates of the communication time, adjacent nodes mapped to the same hardware component are now merged.

⑧ Interface synthesis: After partitioning, the glue logic required for interfacing processors, application-specific hardware and memories is created.
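Step ⑤ (flattening the hierarchy) can be sketched in a few lines. The nested-list representation and the function name are illustrative assumptions, not COOL's actual data structures:

```python
# Sketch of flattening a hierarchical task graph (step 5).
# A hierarchical node is modeled as a nested list; leaves are task names.
def flatten(node):
    """Return the flat list of leaf tasks below a (possibly nested) node."""
    if isinstance(node, list):           # hierarchical node: recurse
        leaves = []
        for child in node:
            leaves.extend(flatten(child))
        return leaves
    return [node]                        # leaf task

hierarchy = ["T1", ["T2", ["T3", "T4"]], "T5"]
print(flatten(hierarchy))  # ['T1', 'T2', 'T3', 'T4', 'T5']
```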

Page 4: 6. Application mapping

The following index sets will be used in the description of the ILP model:
Index set V denotes task graph nodes. Each v ∈ V corresponds to one task graph node.
Index set L denotes task graph node types. Each l ∈ L corresponds to one task graph node type.
Index set M denotes hardware component types. Each m ∈ M corresponds to one hardware component type. For each of the hardware components, there may be multiple copies, or "instances". Each instance is identified by an index j ∈ J.
Index set KP denotes processors. Each k ∈ KP identifies one of the processors.
The following decision variables are required by the model:

X_{v,m}: this variable will be 1 if node v is mapped to hardware component type m ∈ M, and 0 otherwise.

Y_{v,k}: this variable will be 1 if node v is mapped to processor k ∈ KP, and 0 otherwise.

NY_{l,k}: this variable will be 1 if at least one node of type l is mapped to processor k ∈ KP, and 0 otherwise.

Type is a mapping V → L from task graph nodes to their corresponding types.
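As a minimal sketch (the sets and values below are illustrative, not from a real design), the decision variables can be stored in dictionaries and the operation assignment constraint checked directly:

```python
# Decision variables X[v, m] and Y[v, k] as 0/1 dictionaries for a
# two-node example; H1 and P1 are made-up component/processor names.
V = ["v1", "v2"]   # task graph nodes
M = ["H1"]         # hardware component types
KP = ["P1"]        # processors

X = {("v1", "H1"): 1, ("v2", "H1"): 0}   # v1 in hardware
Y = {("v1", "P1"): 0, ("v2", "P1"): 1}   # v2 in software

def assignment_ok(V, M, KP, X, Y):
    """Operation assignment constraint: every node is mapped exactly
    once, either to a hardware component type or to a processor."""
    return all(
        sum(X[v, m] for m in M) + sum(Y[v, k] for k in KP) == 1
        for v in V
    )

print(assignment_ok(V, M, KP, X, Y))  # True
```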

Page 5: 6. Application mapping

The cost function accumulates the total cost of all hardware units:

C = processor costs + memory costs + cost of application-specific hardware

We can now present a brief description of some of the constraints of the ILP model:
Operation assignment constraints: These constraints guarantee that each operation is implemented either in hardware or in software.

∀v ∈ V: Σ_{m∈M} X_{v,m} + Σ_{k∈KP} Y_{v,k} = 1 (6.10)

Additional constraints ensure that decision variables X_{v,m} and Y_{v,k} have 1 as an upper bound and, hence, are in fact 0/1-valued variables:

X_{v,m} ∈ ℕ₀,  Y_{v,k} ∈ ℕ₀

∀v ∈ V: ∀m ∈ M: X_{v,m} ≤ 1 (6.13)
∀v ∈ V: ∀k ∈ KP: Y_{v,k} ≤ 1 (6.14)

If the functionality of a certain node of type l is mapped to some processor k, then the processor's instruction memory must include a copy of the software for this function:

Page 6: 6. Application mapping

∀l ∈ L, ∀v: Type(v) = c_l, ∀k ∈ KP: NY_{l,k} ≥ Y_{v,k} (6.15)

Additional constraints ensure that the decision variables NY_{l,k} are also 0/1-valued variables:

∀l ∈ L: ∀k ∈ KP: NY_{l,k} ≤ 1 (6.16)

Further constraint classes of the model include:
• Resource constraints
• Precedence constraints
• Design constraints
• Timing constraints

Page 7: 6. Application mapping

Example: In the following, we will show how these constraints can be generated for the task graph in Fig. 6.29.

Suppose that we have a hardware component library containing three component types H1, H2 and H3 with costs of 20, 25 and 30 cost units, respectively. Furthermore, suppose that we can also use a processor P of cost 5.

[Fig. 6.29: task graph with nodes T1 ... T5]

Execution times of tasks T1 to T5 on components:

T  | H1 | H2 | H3 | P
T1 | 20 |  - |  - | 100
T2 |  - | 20 |  - | 100
T3 |  - |  - | 12 |  10
T4 |  - |  - | 12 |  10
T5 | 20 |  - |  - | 100

Page 8: 6. Application mapping

The following operation assignment constraints must be generated, assuming that a maximum of one processor (P1) is to be used:

X_{1,1} + Y_{1,1} = 1 (Task 1 either mapped to H1 or to P1)
X_{2,2} + Y_{2,1} = 1 (Task 2 either mapped to H2 or to P1)
X_{3,3} + Y_{3,1} = 1 (Task 3 either mapped to H3 or to P1)
X_{4,3} + Y_{4,1} = 1 (Task 4 either mapped to H3 or to P1)
X_{5,1} + Y_{5,1} = 1 (Task 5 either mapped to H1 or to P1)

Furthermore, assume that the types of tasks T1 to T5 are l = 1, 2, 3, 3 and 1, respectively. Then, the following additional resource constraints are required:

NY_{1,1} ≥ Y_{1,1} (6.17)
NY_{2,1} ≥ Y_{2,1}
NY_{3,1} ≥ Y_{3,1}
NY_{3,1} ≥ Y_{4,1}
NY_{1,1} ≥ Y_{5,1} (6.18)
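These resource constraints can be checked mechanically. The sketch below encodes the task types from the example and one candidate software mapping (only T3 and T4 on P1), then verifies NY_{l,1} ≥ Y_{v,1} for every task; the dictionary layout is an assumption for illustration:

```python
# Resource-constraint check: if any task of type l runs on processor P1
# (Y = 1), then NY[l] must be 1 for P1.
types = {"T1": 1, "T2": 2, "T3": 3, "T4": 3, "T5": 1}  # task types (from the text)
Y  = {"T1": 0, "T2": 0, "T3": 1, "T4": 1, "T5": 0}     # only T3, T4 on P1
NY = {1: 0, 2: 0, 3: 1}                                # type 3 present on P1

ok = all(NY[types[task]] >= y for task, y in Y.items())
print(ok)  # True
```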

The total cost function is:

C = 20·#(H1) + 25·#(H2) + 30·#(H3) + 5·#(P)

where #() denotes the number of instances of hardware components. This number can be computed from the variables introduced so far if the schedule is also taken into account.

Page 9: 6. Application mapping

For a timing constraint of 100 time units, the minimum cost design comprises components H1, H2 and P. This means that tasks T3 and T4 are implemented in software and all others in hardware.
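This result can be reproduced by brute force over all 2^5 hardware/software choices. The sketch below uses a deliberately simplified timing model (all tasks execute sequentially, so latency is the sum of the chosen execution times) and assumes one instance per hardware component type; both are assumptions for illustration, not part of the ILP model above:

```python
from itertools import product

# Each task either runs on its hardware component (from the table) or
# on processor P. Component costs: H1 = 20, H2 = 25, H3 = 30, P = 5.
options = {                  # task: (hw component, hw time, sw time)
    "T1": ("H1", 20, 100),
    "T2": ("H2", 20, 100),
    "T3": ("H3", 12, 10),
    "T4": ("H3", 12, 10),
    "T5": ("H1", 20, 100),
}
cost = {"H1": 20, "H2": 25, "H3": 30, "P": 5}
LIMIT = 100                  # timing constraint

best = None
for choice in product(["HW", "SW"], repeat=len(options)):
    used, time, sw = set(), 0, []
    for (task, (comp, t_hw, t_sw)), where in zip(options.items(), choice):
        if where == "HW":
            used.add(comp)   # one instance per component type (assumed)
            time += t_hw
        else:
            used.add("P")
            time += t_sw
            sw.append(task)
    if time <= LIMIT:        # sequential-execution timing model
        c = sum(cost[u] for u in used)
        if best is None or c < best[0]:
            best = (c, sorted(sw), sorted(used))

print(best)  # (50, ['T3', 'T4'], ['H1', 'H2', 'P'])
```

Under these assumptions the cheapest feasible design costs 50 units and puts exactly T3 and T4 in software, matching the result stated above.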

6.4 MAPPING TO HETEROGENEOUS MULTI-PROCESSORS
The different approaches for this mapping can be classified by two criteria: mapping tools may either assume a fixed execution platform or design such a platform during the mapping, and they may or may not include automatic parallelization of the source code.

The DOL tools from ETH incorporate:
• Automatic selection of computation templates
• Automatic selection of communication techniques
• Automatic selection of scheduling and arbitration

The input to DOL consists of a set of tasks together with use cases. The output describes the execution platform and the mapping of tasks to processors, together with task schedules. This output is expected to meet the constraints and to maximize the objectives.

Page 10: 6. Application mapping

• DOL problem graph
• DOL architecture graph

[Figure: the architecture graph contains a RISC processor, hardware modules HWM1 and HWM2, a point-to-point (PTP) bus, and a shared bus.]

Page 11: 6. Application mapping


• DOL specification graph

Page 12: 6. Application mapping


• DOL implementation

An allocation is a subset of the architecture graph, representing the hardware components allocated (selected) for a particular design.

A binding is a selected subset of the edges between specification and architecture that identifies a relation between the two. Selected edges are called bindings.

A schedule assigns start times to each node v in the problem graph.
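A minimal sketch of these three notions for the graphs above (the data layout and the checking logic are assumptions for illustration, not DOL's actual representation):

```python
# Allocation, binding and schedule for a two-task toy problem; node
# names follow the slides (RISC, HWM1, ...), everything else is made up.
architecture = ["RISC", "HWM1", "HWM2", "PTP bus", "shared bus"]

allocation = {"RISC", "HWM1", "shared bus"}   # selected architecture subset
binding = {"T1": "RISC", "T2": "HWM1"}        # problem node -> resource
schedule = {"T1": 0, "T2": 5}                 # start times of problem nodes

def valid(binding, allocation, schedule):
    """A binding may only target allocated resources, and every bound
    node needs a start time in the schedule."""
    return (all(res in allocation for res in binding.values())
            and all(node in schedule for node in binding))

print(valid(binding, allocation, schedule))  # True
```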