1
SYNTHESIS of PIPELINED SYSTEMS for the
CONTEMPORANEOUS EXECUTION of PERIODIC
and APERIODIC TASKS with HARD REAL-TIME CONSTRAINTS
Paolo Palazzari, Luca Baldini, Moreno Coli
ENEA – Computing and Modeling Unit
University “La Sapienza” – Electronic Engineering Dep’t
2
Outline of Presentation
- Problem statement
- Asynchronous events
- Mapping methodology
- Searching space
- Optimization by RT-PSA Algorithm
- Results
- Conclusions
4
Problem Statement
We want to synthesize a synchronous pipelined system which executes both the task PSy, sustaining its throughput, and m mutually exclusive tasks PAs1, PAs2, …, PAsm whose activations are randomly triggered and whose results must be produced within a prefixed time.
5
Problem Statement
We represent the tasks as Control Data Flow Graphs (CDFG) G = (N, E):
- N = {n1, n2, …, nN}: operations of the task
- E: data and control dependencies,
$E = \{(n_i, n_j) \mid n_i, n_j \in N,\ n_j \text{ is data/ctrl dependent on } n_i\}$
6
Problem Statement
Aperiodic tasks, characterized by random execution requests and called asynchronous to mark the difference with the synchronous nature of periodic tasks, are subject to Real-Time constraints (RTC), collected in the set RTCAs = {RTCAs1, RTCAs2, ..., RTCAsm}, where RTCAsi contains the RTC on the i-th aperiodic task.
Input data for the synchronous task PSy arrive with frequency fi = 1/t, where t is the period characterizing PSy.
7
Problem Statement
We present a method to determine:
- The target architecture: a (nearly) minimum set of HW devices to execute all the tasks (synchronous and asynchronous);
- The feasible mapping onto the architecture: the allocation and the scheduling on the HW resources so that PSy is executed sustaining its prefixed throughput and all the mutually exclusive asynchronous tasks PAs1, PAs2, …, PAsm satisfy the constraints in RTCAs.
8
Problem Statement
- The adoption of a parallel system can be mandatory when Real-Time Constraints are computationally demanding.
- The iterative arrival of input data makes pipelined systems a very suitable solution for the problem.
9
Problem Statement
Example of a pipeline serving the synchronous task PSy.
[Figure: ten-stage pipeline (S1 to S10) with Iterations 1 to 6 overlapped; Data Introduction Interval DII = 2; time axis t (ut) from 0 to 1000; Tck = 50 ut.]
10
Problem Statement
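Sk denotes the k-th time step (pipeline stage): the interval between t = (k-1)Tck and t = kTck.
In a pipeline with L stages, SL denotes the last stage.
DII = t/Tck, i.e. the input period expressed in clock cycles.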
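As a small illustration of these definitions, the following Python sketch computes DII from the input period and the clock cycle, and the time interval covered by a stage. The function names are hypothetical, and requiring t to be a whole multiple of Tck is an assumption of the sketch rather than something stated on the slide.

```python
def data_introduction_interval(period_ut: int, tck_ut: int) -> int:
    """Return DII = t / Tck (assumption: t is a whole number of clock cycles)."""
    if period_ut % tck_ut != 0:
        raise ValueError("period must be a multiple of the clock cycle")
    return period_ut // tck_ut


def stage_interval(k: int, tck_ut: int) -> tuple[int, int]:
    """Return the interval ((k-1)*Tck, k*Tck) covered by time step S_k, k >= 1."""
    if k < 1:
        raise ValueError("stages are numbered from 1")
    return ((k - 1) * tck_ut, k * tck_ut)


# Values read from the example figure: t = 100 ut, Tck = 50 ut.
dii = data_introduction_interval(100, 50)   # 2
last_stage = stage_interval(10, 50)         # (450, 500)
```

With the figure's values, a new iteration enters the pipeline every 2 clock cycles and the tenth stage of the first iteration ends at 500 ut.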
11
Problem Statement
We assume the absence of synchronization delays due to control or data dependencies.
Under this assumption, the throughput of the pipeline system is 1/DII (results per clock cycle).
13
Asynchronous events
We assume the asynchronous tasks to be mutually exclusive, i.e. the activation of only one asynchronous task can be requested between two successive activations of the periodic task
14
Asynchronous events
Asynchronous service requests (shown in red in the figure) in a pipelined system.
[Figure: ten-stage pipeline (S1 to S10), Iterations 1 to 6, DII = 2; time axis t (ut) from 0 to 1000, with asynchronous requests marked at t0A1, t0A2, t0A3, t0A4, t0A5 (labels I0A1 to I0A5).]
15
Asynchronous events
Like the synchronous events, we represent the asynchronous events
{PAs1, PAs2, ..., PAsm}
through a set of CDFG
ASG = {AsG1(NAs1, EAs1), ..., AsGm(NAsm, EAsm)}
16
Asynchronous events
We consider a unique CDFG made up by composing the graph of the periodic task with the m graphs of the aperiodic tasks:
G(N, E) = SyG(NSy, ESy) ∪ AsG1(NAs1, EAs1) ∪ AsG2(NAs2, EAs2) ∪ … ∪ AsGm(NAsm, EAsm)
17
Asynchronous events
Aperiodic tasks are subject to Real-Time constraints (RTC).
As all RTC must be respected, the mapping function M has to define a scheduling such that, for every RTCAsi ∈ RTCAs, the execution of PAsi, which ends with its last time step $S_L^{As_i}$, finishes by the deadline RTCAsi.
19
Mapping methodology
In order to develop a pipeline system implementing G,
- a HW resource rj = D(ni) and
- a time step Sk
must be associated to each ni ∈ N.
20
Mapping methodology
We must determine the mapping function
M: N → UR × S
UR is the set of the used HW resources (each rj is replicated kj times):
$UR = \bigcup_{j=1}^{p} UR_j = \{r_1^1, r_1^2, \dots, r_1^{k_1},\; r_2^1, r_2^2, \dots, r_2^{k_2},\; \dots,\; r_p^1, r_p^2, \dots, r_p^{k_p}\}$
21
Mapping methodology
rj = D(ni) is the HW resource on which ni will be executed
S(ni) is the stage of the pipeline, or the time step, in which ni will be executed
22
Mapping methodology
We search for the mapping function M’ which, for a given DII:
- respects all the RTC;
- uses a minimum number ur of resources;
- gives the minimum pipeline length for the periodic task.
23
Mapping methodology
The mapping is determined by solving the following minimization problem:
$C(M') = \min_{M} C(M), \qquad C(M) = C_1(M) + C_2(M) + C_3(M)$
- C1(M) accounts for the fulfillment of all the RTC;
- C2(M) minimizes the used silicon area;
- C3(M) minimizes the length of the pipeline.
24
Mapping methodology
While searching for a mapping of G, we force the response to aperiodic tasks to be synchronous with the periodic task
The execution of an aperiodic task, requested at a generic time instant t0, is delayed till the next start of the pipeline of the periodic task.
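The delay rule can be made concrete with a short Python sketch. It assumes that pipeline starts occur at multiples of DII·Tck counted from t = 0, and that a request arriving exactly at a start instant is served at that same start; both are assumptions of the sketch, and the function name is hypothetical.

```python
import math


def aperiodic_start_time(t0_ut: float, dii: int, tck_ut: int) -> int:
    """Return the first pipeline start at or after the request instant t0.

    Assumption: the pipeline of the periodic task starts at 0, DII*Tck,
    2*DII*Tck, ... so the request waits less than DII*Tck in every case.
    """
    start_period = dii * tck_ut
    return math.ceil(t0_ut / start_period) * start_period


# With DII = 2 and Tck = 50 ut the pipeline starts every 100 ut.
start = aperiodic_start_time(130, 2, 50)   # request at 130 ut is served at 200 ut
waiting = start - 130                      # 70 ut, below DII*Tck = 100 ut
```

The waiting time approaches DII·Tck when the request arrives just after a pipeline start, which is the worst case for the real-time constraints.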
25
Mapping methodology
[Figure: ten-stage pipeline (S1 to S10), Iterations 1 to 6, DII = 2; time axis t (ut) from 0 to 1000, with asynchronous requests marked at t0A1, t0A2, t0A3, t0A4, t0A5 (labels I0A1 to I0A5).]
27
Mapping methodology
In a pipelined system with DII = 1:
- the used resource set is maximum;
- the execution time of each AsGi on the pipeline is minimum.
A lower bound for the execution time of AsGi is given by the lowest execution time of the longest path of AsGi.
LpAsi is such a lower bound, expressed in number of clock cycles.
28
Mapping methodology
Maximum value allowed for DII, compatible with all the RTCAsi ∈ RTCAs:
- LpAsi·Tck gives the minimal execution time for AsGi.
- The deadline associated with AsGi is δi.
29
Mapping methodology
Maximum value allowed for DII, compatible with all the RTCAsi ∈ RTCAs (continued):
In the worst case the request for the aperiodic task is sensed immediately after the pipeline start; the aperiodic task then begins to be executed DII·Tck time units after the request, at the next start of the pipeline.
30
Mapping methodology
Maximum value allowed for DII, compatible with all the RTCAsi ∈ RTCAs (continued):
A necessary condition to match all the RTCAsi ∈ RTCAs is that the lower bound of the execution time of each asynchronous task must not exceed the associated deadline δi diminished by DII·Tck, i.e.
$\delta_i \ge DII \cdot T_{ck} + Lp^{As}_i \cdot T_{ck}, \quad i = 1, 2, \dots, m$
31
Mapping methodology
Combining the previous relations with a congruence condition between the period of the synchronous task (t) and the clock period (Tck), we obtain the set DIIp, which contains all the admissible DII values.
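A minimal Python sketch of how such a set could be enumerated follows. It uses the necessary condition deadline ≥ DII·Tck + Lp·Tck for every aperiodic task, and it assumes that the congruence condition means "DII divides t/Tck"; that reading is an assumption, since the slides do not spell the condition out. The function name and the longest-path values in the usage lines are hypothetical.

```python
def admissible_dii(period_ut, tck_ut, deadlines_ut, longest_paths_cycles):
    """Enumerate candidate DII values.

    deadlines_ut[i] is the deadline of the i-th aperiodic task and
    longest_paths_cycles[i] its longest-path lower bound in clock cycles.
    Assumption: a DII is admissible when it divides t/Tck and satisfies
    deadline >= DII*Tck + Lp*Tck for every aperiodic task.
    """
    if period_ut % tck_ut != 0:
        raise ValueError("period must be a multiple of the clock cycle")
    cycles_per_period = period_ut // tck_ut
    # Largest DII allowed by each deadline: DII <= deadline/Tck - Lp.
    max_dii = min(d // tck_ut - lp
                  for d, lp in zip(deadlines_ut, longest_paths_cycles))
    return [dii for dii in range(1, cycles_per_period + 1)
            if cycles_per_period % dii == 0 and dii <= max_dii]


# t, Tck and the deadlines are those of the Results section;
# the longest-path lengths (2, 2, 3 cycles) are made up for illustration.
dii_p = admissible_dii(150, 50, [300, 250, 350], [2, 2, 3])
```

With these inputs the sketch returns the values 1 and 3; tighter deadlines or longer paths shrink the list, possibly to an empty one.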
32
Mapping methodology
Steps of the Mapping methodology:
- A set of allowed values of DII is determined.
- A sufficient HW resource set UR0 is determined. At the end of the optimization process the number of used resources ur could be less than ur0 if mutually exclusive nodes are contained in the graph.
33
Mapping methodology
Steps of the Mapping methodology (continued):
- An initial feasible mapping M0 is determined; SL0 is the last time step needed to execute P by using M0.
- Starting from M0, we use the Simulated Annealing algorithm to solve the minimization problem
$C(M') = \min_{M} C(M), \qquad C(M) = C_1(M) + C_2(M) + C_3(M)$
35
Searching space
- In order to represent a mapping function M we adopt the formalism based on the Allocation Tables t(M).
- t(M) is a table with ur horizontal lines and DII vertical sectors OSi, with i = 1, 2, ..., DII.
- Each OSi contains the time steps Si+kDII (k = 0, 1, 2, ...), which will be overlapped during the execution of P.
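The grouping of time steps into sectors can be sketched in a few lines of Python. The function names are hypothetical; the rule itself (step Sk belongs to the sector whose index is ((k-1) mod DII) + 1) follows directly from the definition of OSi as the set of steps Si+kDII.

```python
def sector_of(step: int, dii: int) -> int:
    """Return the index i of the sector OS_i that contains time step S_step."""
    return (step - 1) % dii + 1


def sectors(num_steps: int, dii: int) -> dict[int, list[int]]:
    """Group time steps 1..num_steps by sector: {i: [i, i+DII, i+2*DII, ...]}."""
    table = {i: [] for i in range(1, dii + 1)}
    for step in range(1, num_steps + 1):
        table[sector_of(step, dii)].append(step)
    return table


# For DII = 3 and nine time steps: OS1 = S1, S4, S7; OS2 = S2, S5, S8; OS3 = S3, S6, S9.
grouping = sectors(9, 3)
```

Steps in the same sector are active at the same time in steady state, which is why they share one column group of the allocation table.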
36
Searching space
- Each node is assigned to a cell of t(M), i.e. it is associated to an HW resource and to a time step.
- For example, we consider the 23-node graph AsG1.
37
Searching space
[Figure: graph AsG1, 23 nodes; nodes 1 to 8 and 17 to 23 are operations of type A, nodes 9 to 16 are operations of type C.]
38
Searching space
For DII = 3, a possible mapping M is described through the following t(M).
Sectors: OS1 = {S1, S4, S7}, OS2 = {S2, S5, S8}, OS3 = {S3, S6, S9}.
Nodes assigned to each resource:
A1: n1, n6, n17
A2: n2, n7, n18
A3: n3, n8, n21
A4: n4, n19, n22
A5: n5, n20, n23
C1: n15, n9, n12
C2: n16, n10, n13
C3: n11, n14
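As a quick consistency check on this example, the rows of the table can be written down as a Python dictionary and inspected. This is only a sketch: the dictionary layout and helper names are hypothetical, and the time step of each node is not modelled, only the resource it is assigned to.

```python
# Rows of t(M) for AsG1 with DII = 3: resource instance -> assigned nodes.
allocation = {
    "A1": [1, 6, 17],
    "A2": [2, 7, 18],
    "A3": [3, 8, 21],
    "A4": [4, 19, 22],
    "A5": [5, 20, 23],
    "C1": [15, 9, 12],
    "C2": [16, 10, 13],
    "C3": [11, 14],
}


def mapped_nodes(table):
    """Return the sorted list of all nodes that appear in the table."""
    return sorted(node for nodes in table.values() for node in nodes)


def nodes_by_type(table):
    """Group nodes by resource type (first letter of the resource name)."""
    grouped = {}
    for resource, nodes in table.items():
        grouped.setdefault(resource[0], []).extend(nodes)
    return {kind: sorted(nodes) for kind, nodes in grouped.items()}


ur = len(allocation)              # number of used resource instances
all_nodes = mapped_nodes(allocation)
by_type = nodes_by_type(allocation)
```

The check confirms that each of the 23 nodes of AsG1 appears exactly once, on 8 resource instances (5 of type A and 3 of type C).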
39
Searching space
An allocation table t(M) must respect both
1. the causality condition, and
2. the overlapping condition.
40
Searching space
- We define the searching space Ω over which the minimization of C(M) must be performed.
- Ω is the space containing all the feasible allocation tables:
$\Omega = \{t(M) \mid t(M) \text{ is a feasible mapping}\}; \qquad t(M) \notin \Omega \iff t(M) \text{ is not feasible}$
41
Searching space
We can write the minimization problem in terms of the cost associated to the mapping M represented by the allocation table:
$C[t(M')] = \min_{t(M) \in \Omega} C[t(M)]$
42
Searching space
- We solve the problem by using a Simulated Annealing (SA) algorithm.
- SA requires the generation of a sequence of points belonging to the searching space; each point of the sequence must be close, according to a given measure criterion, to its predecessor and to its successor.
43
Searching space
As Ω consists of allocation tables, we have to generate a sequence of allocation tables
t(Mi) ∈ Neigh[t(Mi-1)]
where Neigh[t(M)] is the set of the allocation tables adjacent to t(M) according to some adjacency criteria.
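The role of the neighbourhood in the search can be illustrated with a generic, sequential Simulated Annealing loop in Python. This is a textbook sketch, not the RT-PSA algorithm of the presentation: the cooling schedule, the parameter values and the toy cost and neighbour functions are all assumptions chosen for illustration. In the mapping problem, the state would be an allocation table and `neighbour` would return a table in Neigh[t(M)].

```python
import math
import random


def simulated_annealing(initial, cost, neighbour, t_start=10.0, t_end=1e-3,
                        alpha=0.95, moves_per_temp=50, rng=None):
    """Generic SA: each new point is drawn from the neighbourhood of the current one."""
    rng = rng or random.Random(0)          # fixed seed for repeatable runs
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            candidate = neighbour(current, rng)
            delta = cost(candidate) - current_cost
            # Always accept improvements; accept worsening moves with
            # probability exp(-delta / temp) (Metropolis criterion).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temp *= alpha                      # geometric cooling
    return best, best_cost


# Toy problem standing in for the allocation tables: integers with +/-1 moves.
def toy_cost(x):
    return (x - 7) ** 2


def toy_neighbour(x, rng):
    return x + rng.choice((-1, 1))


best, best_cost = simulated_annealing(0, toy_cost, toy_neighbour)
```

Because every candidate is taken from the neighbourhood of the current point, the search can reach the optimum from any starting point only if the searching space is connected under the chosen adjacency criteria.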
44
Searching space
Searching space connection:
Theorem 2. The searching space Ω is connected adopting the adjacency conditions.
46
Optimization by RT-PSA Algorithm
We start from a feasible allocation table t(M0).
We rely on the optimization algorithm to find the desired mapping M’.
47
Optimization by RT-PSA Algorithm
We iterate over all the allowed values of DII
The final result of the whole optimization process will be the allocation table characterized by minimal cost.
49
Results
In order to illustrate the results achievable through the presented RT-PSA algorithm, we consider the following graphs
50
Results
(1,A), (2,B), (3,B), (4,A), (5,B),
(6,C), (7,A), (8,C), (9,E), (10,A),
(11,C), (12,E), (13,B), (14,C), (15,E),
(16,A), (17,B), (18,E), (19,C), (20,A),
(21,C), (22,E), (23,A), (24,B), (25,C),
(26,B), (27,B), (28,E), (29,A), (30,B),
(31,E), (32,C), (33,B), (34,A), (35,C),
(36,E), (37,B), (38,A), (39,E), (40,A),
(41,B), (42,A), (43,C), (44,B), (45,E),
(46,E), (47,B), (48,B), (49,A), (50,C).
SyG
51
Results
[Figure: graph AsG1, 23 nodes; nodes 1 to 8 and 17 to 23 are operations of type A, nodes 9 to 16 are operations of type C.]
52
Results
[Figure: graph AsG2, 25 nodes (numbered 1 to 25) with node labels A, B, D and J.]
53
Results
54
Results
We have
N = NSy + NAs1 + NAs2 + NAs3 = 50 + 23 + 25 + 28 = 126
r1 = A, r2 = B, r3 = C, r4 = E
The execution times and resource areas are:
Resource   Execution time   Area
A          T(A) = 10 ut     Ar(A) = 10 us
B          T(B) = 20 ut     Ar(B) = 10 us
C          T(C) = 30 ut     Ar(C) = 13 us
E          T(E) = 40 ut     Ar(E) = 15 us
Tr = 5 ut, Ar(mr) = 1 us
55
Results
- The input data interarrival period is t = 150 ut.
- We fix the pipeline clock cycle Tck = 50 ut.
- The RTC are:
RTCAs1 = 300 ut
RTCAs2 = 250 ut
RTCAs3 = 350 ut
- The set of possible DII values is DIIp = {1, 3}.
56
Results
Results for DII = 3
DII = 3    Cost function   ur   LSy   Fulfilled RTC
Starting   2667.942692     37   12    1
Final      3.999681        29   20    3
57
Results
Results for DII = 1
DII = 1    Cost function   ur    LSy   Fulfilled RTC
Starting   9.554314        104   10    3
Final      7.799348        79    18    3
59
Conclusions
We presented an algorithm to optimize the mapping, onto a dedicated pipeline system, of a periodic task PSy and m mutually exclusive aperiodic tasks PAs1, PAs2, …, PAsm subject to real-time (RT) constraints.
60
Conclusions
- The algorithm, while searching for a mapping which satisfies all RT constraints of the aperiodic tasks, tries to minimize the number of HW resources needed to implement the system as well as the length of the schedule.
- The mapping optimization is formulated as a minimization problem that has been solved through the Simulated Annealing algorithm.
61
Conclusions
- Mappings are represented through allocation tables.
- The searching space, the adjacency criteria on it, and a cost function evaluating the quality of a mapping have been defined.
We demonstrated that the searching space containing all the feasible mappings is connected.