
France Section

PÔLE SYSTÈMES EMBARQUÉS HAUTES PERFORMANCES, GDR ASR

Proceedings

5th Junior Researcher Workshop on Real-Time Computing

JRWRTC 2011

29-30 September 2011, Nantes, France

Contents

Foreword 6
Workshop Organization 7

Papers 9

Gang Fixed Priority Scheduling of Periodic Moldable Real-Time Tasks
    Vandy Berten, Pierre Courbin and Joel Goossens 9
Signal Path Scheduling for Reconfigurable SDR RF Hardware
    Sami Kiminki and Vesa Hirvisalo 13
Resource Management in Multicore Automotive Embedded Systems
    Sylvain Cotard 17
Towards a Weakly-Hard Approach for Real-Time Simulation
    Abir Ben Khaled, Mongi Ben Gaid and Daniel Simon 21
Preemptive Multiprocessor Real-Time Scheduling with Exact Preemption Cost
    Falou Ndoye and Yves Sorel 25
Performance Analysis for Segment Stretch Transformation of Parallel Real-time Tasks
    Manar Qamhieh, Frederic Fauberteau and Serge Midonnet 29
Towards removing tail preemptions to save energy
    Cristian Maxim, Liliana Cucu-Grosjean and Olivier Zendra 33
Improved Feasibility Regions: a Probabilistic Approach for Real-Time Systems
    Luca Santinelli, Liliana Cucu-Grosjean and Laurent George 37

Authors index 41


6 JRWRTC 2011

Foreword

This volume contains the papers presented at the 5th Junior Researcher Workshop on Real-Time Computing (JRWRTC 2011), held in conjunction with the 19th International Conference on Real-Time and Network Systems (RTNS 2011) on September 29-30th in Nantes, France.

As with the former editions, the goal of the workshop is to bring together junior researchers, mostly PhD students and post-docs, who work on real-time issues, so as to have a forum in which to present and discuss new ideas, to review current trends and to introduce new research directions in this area. Through short presentations and posters, the participants will be able to discuss their current work with all conference attendees.

This year we received eight submissions, each of which was reviewed by four members of the program committee. The papers encompass various themes, from priority scheduling, energy saving and multimode/multicore analysis to the specification, modeling and verification of real-time systems. The committee decided to accept all of them.

I want to congratulate the authors for the quality of their submissions to JRWRTC 2011. The program committee members have also done a great job in a limited amount of time. For this, and for the quality of the reviews, I would definitely recommend the work of any of them.

September 2011
Luca Santinelli


Workshop Organization

Program chair

Luca Santinelli, INRIA Nancy-Grand Est, France

Program Committee

Sebastian Altmeyer, U. of Saarland, Germany
Moris Behnam, U. of Porto, Portugal
Vandy Berten, ULB, Belgium
Roman Bourgarde, IRIT, France
Bjorn B. Brandenburg, U. of North Carolina, USA
Fabio Checconi, IBM T.J. Watson Research Center, USA
Frederic Fauberteau, Université Paris-Est Marne-la-Vallée, France
Christian Fotsing, LISI Poitiers, France
Damien Hardy, U. of Cyprus, Cyprus
Kai Huang, ETHZ, Switzerland
Leonidas Kosmidis, BSC, Spain
Kai Lampka, ETHZ, Switzerland
Xiaoting Li, IRIT, France
Mauro Marinoni, SSSA, Italy
Mohamed Marouf, INRIA Rocquencourt, France
Dorin Maxim, INRIA Nancy-Grand Est, France
Patrick Meumeu Yomsi, INRIA Nancy-Grand Est, France
Aurelien Monot, INRIA Nancy-Grand Est, France
Vincent Nelis, ISEP/IPP, Portugal
Marco Paolieri, BSC, Spain
Chuan-Yue Yang, National Taiwan U., Taiwan

Special Thanks

Liliana Cucu-Grosjean
Sebastien Faucou
Laurent George


Gang Fixed Priority Scheduling of Periodic Moldable Real-Time Tasks

Vandy BERTEN, ULB (Brussels, Belgium)
Pierre COURBIN, ECE (Paris, France)
Joel GOOSSENS, ULB (Brussels, Belgium)

1 Introduction

In the past decade, the development of multicore and multiprocessor platforms has attracted a lot of attention in the real-time community. At first, researchers considered several ways to take advantage of this kind of platform to process sequential tasks. Over the past few years, people have started considering parallel tasks in order to exploit this architecture even better, or to save energy.

There are several families of parallel tasks. One of them is the "Thread" model, where threads can run (rather) independently [4]. Another one is the "Gang" family, requiring that all threads start and stop using the processors synchronously. Tasks can then be seen as rectangles, where the width represents the time and the height the number of processors. As presented in [2], the Gang task family can be split into three sub-families: rigid [2, 3], malleable [1] and moldable tasks. In the latter, considered in this paper, the scheduler chooses the number of processors for each job¹, and this number does not vary with time for a given job (but may differ between two jobs of the same task). We propose algorithms and sufficient schedulability conditions for this model. As far as we know, this is the first paper studying moldable tasks for real-time systems.

2 Model

In this paper, we consider a set of m identical CPUs on which we schedule a set of n (strictly periodic) preemptive parallel tasks τ = {τ_1, ..., τ_n}. A task τ_i is a tuple (O_i, C_i, D_i, T_i), where: O_i ∈ R+ is the arrival time of the first instance of τ_i; C_i is a function C_i : [1, ..., m] → R+, where C_i(v) is the time that an instance of τ_i needs on v processors; D_i ∈ R+ is the relative deadline (we consider constrained deadlines, i.e., D_i ≤ T_i); T_i ∈ N is the (strict) period.

We consider that (1) adding CPUs does not increase the execution time: v < w ⇒ C_i(v) ≥ C_i(w), and (2) there is a parallelization cost, i.e., the area of a task increases with the parallelism: v < w ⇒ C_i(v) × v ≤ C_i(w) × w. The latter implies that the configuration using the smallest total amount of CPU is the sequential one (v = 1).

As we consider moldable tasks, the scheduler has to choose, for each job of τ_i, its height (the number of CPUs). The height is then fixed up to the end of the job.

The jth job (or task instance) of task τ_i, denoted J_i^j, is a set of values (r_i^j, v_i^j, e_i^j, d_i^j) (or (r_i^j, ∅, ∅, d_i^j) if the job has not started yet), where r_i^j := O_i + (j − 1) × T_i is the release time, v_i^j is the number of processors chosen by the scheduler (or the height of the job), e_i^j := C_i(v_i^j) is the execution time, and d_i^j := r_i^j + D_i is the (absolute) deadline.

¹ Task instance.
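To make the notation concrete, the task and job parameters above can be sketched in Python. The class and function names here are our own illustration, not from the paper.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class Task:
    """Moldable task (O_i, C_i, D_i, T_i); C[v-1] encodes C_i(v)."""
    O: float            # arrival time of the first instance
    C: Sequence[float]  # C[v-1] = execution time on v processors
    D: float            # relative (constrained) deadline, D <= T
    T: int              # strict period

    def check_moldable(self) -> bool:
        # (1) adding CPUs never increases the execution time, and
        # (2) the area C_i(v) * v never decreases with the parallelism.
        return all(self.C[v] >= self.C[v + 1]
                   for v in range(len(self.C) - 1)) \
           and all(self.C[v] * (v + 1) <= self.C[v + 1] * (v + 2)
                   for v in range(len(self.C) - 1))

def job_params(tau: Task, j: int, v: int):
    """Release time, execution time and absolute deadline of the j-th job
    (j >= 1) once the scheduler has fixed its height v."""
    r = tau.O + (j - 1) * tau.T   # r_i^j = O_i + (j-1) * T_i
    e = tau.C[v - 1]              # e_i^j = C_i(v_i^j)
    d = r + tau.D                 # d_i^j = r_i^j + D_i
    return r, e, d
```

For instance, with τ_1 = (0, (3, 2), 3, 3) from Section 3.1, the second job, if given two processors, has release time 3, execution time 2 and absolute deadline 6.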

2.1 Trivial cases

Given this model, there are a few cases that are equivalent to simpler models, for which there are already known results. For instance, as using one processor is the configuration using the smallest area, we have:

∑_i C_i(1) / T_i > m ⇒ infeasible

We can also show that the system is schedulable if all the tasks use the m processors:

∑_i (C_i(m) × m) / T_i ≤ m ≡ ∑_i C_i(m) / T_i ≤ 1 ⇒ feasible

In this case, we always give all m CPUs to all the jobs (i.e., only one job is running at any time), and the schedule is then equivalent to the uniprocessor schedule of the set (O_i, C_i(m), D_i, T_i), which can be scheduled by EDF if and only if the above condition holds.

A third simple case is when the system is schedulable with all the tasks using a single processor:

∑_i C_i(1) / T_i ≤ m ∧ ∀i : C_i(1) < D_i ⇒ feasible

In this case, we can use any of the sequential multiprocessor optimal techniques (PFair, LLREF, ...).
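These three checks are direct to implement. The sketch below uses our own dict-based task encoding, and our reading of the third condition (total sequential utilization at most m, plus the per-task deadline check); it is an illustration, not the paper's code.

```python
def trivial_checks(tasks, m):
    """Apply the three trivial cases of Section 2.1.
    tasks: list of dicts with keys 'C' (tuple, C[v-1] = C_i(v)), 'D', 'T'.
    Returns 'infeasible', 'feasible', or 'unknown' when no case applies."""
    C1 = [t['C'][0] for t in tasks]       # sequential execution times C_i(1)
    Cm = [t['C'][m - 1] for t in tasks]   # execution times on all m CPUs
    # Case 1: even the smallest-area (sequential) configuration overloads.
    if sum(c / t['T'] for c, t in zip(C1, tasks)) > m:
        return 'infeasible'
    # Case 2: every job gets all m CPUs; EDF schedules the equivalent
    # uniprocessor set (O_i, C_i(m), D_i, T_i) iff its utilization is <= 1.
    if sum(c / t['T'] for c, t in zip(Cm, tasks)) <= 1:
        return 'feasible'
    # Case 3: purely sequential jobs; total utilization <= m is already
    # guaranteed by case 1 failing, so only the deadlines remain to check.
    if all(c < t['D'] for c, t in zip(C1, tasks)):
        return 'feasible'
    return 'unknown'
```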

2.2 Harmonic case and complexity

In the case of harmonic periods with RMS priority policies (i.e., T_i < T_j ⇒ T_j mod T_i = 0 ∀i > j, and O_i = 0 ∀i), we can prove (but leave the proof for a longer paper) that there is no benefit in changing the height of jobs inside a task. In other words, the scheduler has to choose one number for each task, and all jobs of this task will use this number of processors.

Furthermore, we believe that we can transform any such system into an equivalent frame-based system, i.e., one where all tasks share the same period. We also believe that it will be easy to prove that computing the height of each task in this frame-based problem is NP-hard, which would imply, by reduction, that computing optimal task heights in our model is also NP-hard.


3 Online algorithm “in the dark”

In the following, we consider Fixed Task Priority assignment and, without loss of generality, that i < j implies that the priority of τ_i is higher than the priority of τ_j.

Basically, apart from the traditional decisions any real-time scheduler must take, a scheduler for moldable tasks must also choose the height of all jobs.

Algorithm 1 works as follows. When the scheduler is called (i.e., when a job arrives or finishes), it considers all the active jobs. For those that have already started, the height is already chosen, and there is nothing new compared to a scheduler for rigid tasks. For those that have not started yet, we compute whether, given the current number of available processors, there is a possibility to finish by the deadline. If yes, we choose the minimal height satisfying this constraint. Otherwise, we do not start the job now, in the hope that enough processors will be freed later, allowing the job to finish on time.

Algorithm 1: Basic "in the dark"
Data: current time t
avail ← m
foreach active job J_i^j (ordered by priority) do
    if J_i^j already started with v_i^j CPUs then
        if v_i^j ≤ avail then
            avail ← avail − v_i^j; Resume J_i^j
    else
        if ∃w : t + C_i(w) ≤ d_i^j then
            v ← min{w : t + C_i(w) ≤ d_i^j}
            if v ≤ avail then
                v_i^j ← v
                avail ← avail − v_i^j; Start J_i^j
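Algorithm 1's decision step can be sketched in runnable Python as follows. The job representation (dicts with keys 'v', 'C', 'd') is our own, with C[w-1] standing for C_i(w); this is an illustration of the idea, not the authors' implementation.

```python
def in_the_dark(active_jobs, m, t):
    """One invocation of the 'in the dark' scheduler (Algorithm 1).
    active_jobs: priority-ordered list of dicts with keys
      'v' (chosen height or None), 'C' (tuple, C[w-1] = time on w CPUs),
      'd' (absolute deadline). Returns the jobs to run until the next
    scheduling event, setting 'v' for newly started jobs."""
    avail = m
    running = []
    for job in active_jobs:
        if job['v'] is not None:             # already started: height is fixed
            if job['v'] <= avail:
                avail -= job['v']
                running.append(job)          # resume
        else:
            # smallest height that still meets the deadline, if any
            ok = [w for w in range(1, len(job['C']) + 1)
                  if t + job['C'][w - 1] <= job['d']]
            if ok:
                v = min(ok)
                if v <= avail:
                    job['v'] = v
                    avail -= v
                    running.append(job)      # start
            # else: wait, hoping processors free up in time
    return running
```

On the example of Section 3.1 at t = 0 with 2 processors, τ_1 and τ_3 each start on one CPU while τ_2 (which would need both CPUs to meet its deadline) waits.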

3.1 Predictability

When there is no easy known way to evaluate the schedulability of a task system (such as the computation of a closed formula), a common way to proceed is to simulate the system using the worst-case execution time (WCET) of each task, during a period of time which is shown to be sufficient (see for instance [2]). But this requires the scheduling algorithm to be predictable, which means that if no task misses its deadline when each task uses its WCET, then no task will ever miss its deadline when some tasks run for less than their WCET. However, Algorithm 1 does not have this property. Let us consider a simple example with 2 processors and 3 tasks, where τ_1 = (0, (3, 2), 3, 3), τ_2 = (0, (8, 4.5), 7.5, 7.5) and τ_3 = (0, (2, 1.1), 2, 2). Figure 1 shows that this algorithm is not predictable.

Figure 1. Predictability. Left: τ_1 uses its WCET, no deadline miss. Right: τ_1 uses less than its WCET, deadline miss for τ_3.

3.2 Making the system predictable

In order to use a schedulability test such as the one presented in Section 3.1, we have to make the system predictable. We can apply some of the techniques presented in [2], which slightly modify the scheduling algorithm, such as the Idling scheduler, the Limited Gang scheduler, or Limited Slack Reclaiming. We leave the details for future work.

3.3 Periodicity

Using a slightly modified version of the proof presented in [2], we can show that Algorithm 1 provides a periodic schedule, as long as the choice of job heights is deterministic and memoryless. Furthermore, we can show that the schedule of τ_1, ..., τ_i is periodic with period P_i := lcm{T_1, ..., T_i} from instant S_i, inductively defined by S_1 := O_1 and S_i := max{O_i, O_i + ⌈(S_{i−1} − O_i) / T_i⌉ × T_i}.
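The instants S_i and periods P_i are direct to compute; a sketch using Python's math.lcm (available from Python 3.9), with tasks encoded as (O_i, T_i) pairs of our own choosing:

```python
import math

def schedule_periodicity(tasks):
    """tasks: priority-ordered list of (O_i, T_i) pairs, T_i integer.
    Returns (S, P): S[i] is the instant from which the schedule of
    tau_1..tau_{i+1} is periodic, P[i] the corresponding period."""
    S, P = [], []
    for O, T in tasks:
        P.append(T if not P else math.lcm(P[-1], T))  # P_i = lcm{T_1..T_i}
        if not S:
            S.append(O)                               # S_1 = O_1
        else:
            # S_i = max{O_i, O_i + ceil((S_{i-1} - O_i) / T_i) * T_i}
            S.append(max(O, O + math.ceil((S[-1] - O) / T) * T))
    return S, P
```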

4 Clairvoyant offline algorithm

4.1 Processor availability profile

Let F_i(t) be the (predicted) number of processors available at time t once tasks τ_1, ..., τ_i have been scheduled. It represents the availability profile of the platform for the tasks τ_{i+1}, ..., τ_n. By definition, we have F_0(t) := m ∀t.

Let α_i(t_1, t_2, w) be the duration during which at least w processors are available in the profile F_i() on the interval [t_1, t_2], i.e., α_i(t_1, t_2, w) := ∫_{t_1}^{t_2} δ_{F_i(x) ≥ w} dx, where δ_cond := 1 if cond is true, 0 otherwise.

Let E_i(t, w) be the (worst-case) completion time of a job of τ_i using w CPUs and arriving at time t (assuming a work-conserving schedule).

By definition of the work-conserving schedule:

α_{i−1}(t, E_i(t, w), w) = C_i(w)    (1)

Once F_{i−1} is known, E_i(t, w) can be computed iteratively, as the result E_i(t, w) = e_∞ of the following iteration:

e_0 := C_i(w)
e_{k+1} := C_i(w) + (e_k − α_{i−1}(t, e_k, w))

Note that as soon as e_k = E_i(t, w) for some k, α_{i−1}(t, e_k, w) = C_i(w) (cf. Equation (1)), and the iteration then reaches a steady state. The iteration stops when e_k = e_{k−1} or e_k ≥ d_i(t), where d_i(t) is the deadline of the job of τ_i starting at time t; the latter case means that the job will miss its deadline if it is started on w processors at time t.
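Assuming the profile F_{i−1} is stored as a step function, the fixed-point iteration above can be sketched as follows; the profile representation and the function names are our own, not the paper's.

```python
def alpha(profile, t1, t2, w):
    """Duration within [t1, t2] during which at least w processors are
    available. profile: sorted list of (start_time, available_cpus)
    breakpoints of the step function; the last value holds forever."""
    total = 0.0
    for k, (s, avail) in enumerate(profile):
        e = profile[k + 1][0] if k + 1 < len(profile) else float('inf')
        lo, hi = max(s, t1), min(e, t2)
        if avail >= w and hi > lo:
            total += hi - lo
    return total

def completion_time(profile, C_w, t, deadline, w):
    """Fixed point of e_{k+1} = C_i(w) + (e_k - alpha(t, e_k, w)), with e
    kept as an absolute instant: worst-case completion time of a job
    needing C_w time units on w CPUs, released at t, or None if the
    iteration overshoots the deadline (the job would miss it)."""
    e = t + C_w                                          # e_0
    while True:
        nxt = t + C_w + ((e - t) - alpha(profile, t, e, w))
        if nxt > deadline:
            return None                                  # deadline miss
        if nxt == e:
            return e          # steady state: alpha(t, e, w) == C_i(w)
        e = nxt
```

For example, with no CPU available before time 5 and two afterwards, a job released at 0 needing 3 time units on one CPU completes at 8.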

4.2 Algorithm

The aim of Algorithm 2 is to compute the utilization profile of the system during the feasibility interval [0, S_n + P) (see Section 3.3). It does not compute the full schedule (only the CPU utilization profile), but it reports either that the system is schedulable or that it failed while trying to build the profile (which does NOT mean that the system is not feasible). This algorithm is performed offline.

Algorithm 2: Clairvoyant offline algorithm
F_0(t) ← m ∀t
foreach task τ_i (ordered by priority) do
    F_i ← F_{i−1}
    foreach job J_i^j such that r_i^j < S_n + P do
        if ∃w : E_i(r_i^j, w) ≤ d_i^j then
            v_i^j ← min{w : E_i(r_i^j, w) ≤ d_i^j}
            foreach t ∈ [r_i^j, E_i(r_i^j, v_i^j)] : F_i(t) ≥ v_i^j do
                F_i(t) ← F_i(t) − v_i^j
        else
            exit(FAIL)
exit(SCHEDULABLE)

In order to build this profile, the algorithm goes through the n tasks (ordered by priority) and then, for each task, considers each job. For each job J_i^j, it considers the various heights allowing the job to finish on time; E_i(r_i^j, w) gives the completion time of the job J_i^j using w CPUs. If we cannot find such a height, the algorithm has failed to place this job and returns FAIL (which means that the system is not schedulable with this algorithm, but does not mean that the system is not feasible).

If there are correct heights, we choose the minimum one, as a smaller height implies a smaller area. But we could make another choice, as long as E_i(r_i^j, w) ≤ d_i^j.

In this basic version, this algorithm is still not predictable. To make it predictable, we can apply the same techniques as above.

Notice that F(t) is a step function. Each job adds at most two points. We then have at most 2 × ∑_i ⌈(S_n + P) / T_i⌉ points in F. However, in many practical cases, the periods are not all prime to each other and P is then not very high.

4.3 Size pattern

The (partial) profile F_1 (only task τ_1) is strictly periodic, with a period of P_1 = T_1 starting at S_1 = O_1, which means that all the jobs of τ_1 will run using the same number of processors v_1^1. There is then no need to remember all the v_1's, but just that the pattern will be {w}^∞ for some w, where {...}^∞ means that the sequence is repeated ad vitam. More generally, the size pattern has the structure

w'_1, w'_2, ..., w'_{h'_i}, {w_1, w_2, ..., w_{h_i}}^∞

where h_i := P_i / T_i and h'_i := (S_i − O_i) / T_i, which are both integers by definition of P_i and S_i.

5 Semi-clairvoyant online algorithm

In the previous sections we presented a basic scheduling algorithm (Algorithm 1) which also chooses the height of all jobs. This decision is taken without considering the past or the future of the schedule. Our clairvoyant algorithm (Algorithm 2) builds the profile of the schedule and provides two results: a pattern giving the height of each job, and a sufficient schedulability test.

In this section, we aim to adapt these algorithms in order to improve the decision taken by our basic algorithm for the height of a job, without building the complete profile as Algorithm 2 does.

5.1 Reduce the number of jobs studied in the profile

We need to find a smaller interval on which to compute the profile of the scheduler in order to take the same decision as Algorithm 2 would take. Let J_k^ℓ be the newly activated job (current time t = r_k^ℓ) for which we want to define the height v_k^ℓ. To take the same decision as Algorithm 2, we need to know the profile of all higher-priority tasks on the interval [r_k^ℓ, d_k^ℓ]. The main idea is, for all higher-priority tasks, to find the last job activated before the end of the studied interval and to compute the profile of all jobs until this one. As presented in Figure 2, to compute the profile of higher-priority tasks in the studied interval [r_k^ℓ, d_k^ℓ], we first need to compute the profile of all jobs of task τ_{k−1} activated in the interval [r_k^ℓ, d_k^ℓ]. The last job is activated at time r_{k−1} = O_{k−1} + (⌈(d_k^ℓ − O_{k−1}) / T_{k−1}⌉ − 1) × T_{k−1}. Since all these jobs have their deadline in the interval (because r_{k−1} + D_{k−1} ≤ d_k^ℓ), the studied interval remains [r_k^ℓ, d_k^ℓ]. Then we have to compute the profile of all jobs of task τ_{k−2} activated in the interval [r_k^ℓ, d_k^ℓ]. In Figure 2, we find a job with an activation in the interval and a deadline after the interval (because r_{k−2} + D_{k−2} ≥ d_k^ℓ). To define the profile of this job, we need to extend the interval studied for higher-priority tasks. The new studied interval is then [r_k^ℓ, r_{k−2} + D_{k−2}]. We finally compute the profile of all jobs of task τ_{k−3} activated in the interval [r_k^ℓ, r_{k−2} + D_{k−2}], and so on.

Let χ_k^ℓ(τ_i) be the function giving the end of the interval to study for task τ_i in order to define the height v_k^ℓ of job J_k^ℓ. The function χ_k^ℓ() is defined for all tasks with higher priority than τ_k and is iteratively computed using:

χ_k^ℓ(τ_k) = d_k^ℓ
χ_k^ℓ(τ_i) = max(χ_k^ℓ(τ_{i+1}), α_k^ℓ(τ_i)) if O_i ≤ χ_k^ℓ(τ_{i+1}), and χ_k^ℓ(τ_i) = χ_k^ℓ(τ_{i+1}) otherwise    (2)

with α_k^ℓ(τ_i) = O_i + (⌈(χ_k^ℓ(τ_{i+1}) − O_i) / T_i⌉ − 1) × T_i + D_i.

Note that we have to compute all jobs of τ_i up to the J_{i,k,ℓ}-th one, with J_{i,k,ℓ} = (⌈(χ_k^ℓ(τ_i) − D_i − O_i) / T_i⌉ + 1)^+.
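Equation (2) can be evaluated bottom-up, from τ_k towards τ_1. A sketch, with tasks encoded as (O_i, T_i, D_i) triples of our own choosing:

```python
import math

def chi(tasks, k, d_kl):
    """End of the studied interval chi_k^l(tau_i) for each task of higher
    or equal priority, per Equation (2). tasks: list of (O_i, T_i, D_i)
    in priority order (tau_1 first); k is 1-based; d_kl is d_k^l.
    Returns {i: chi_k^l(tau_i)} for i = k down to 1."""
    chi_vals = {k: d_kl}                  # chi_k^l(tau_k) = d_k^l
    for i in range(k - 1, 0, -1):         # towards higher priorities
        O, T, D = tasks[i - 1]
        nxt = chi_vals[i + 1]
        if O <= nxt:
            # alpha_k^l(tau_i): deadline of the last job of tau_i
            # activated strictly before nxt
            a = O + (math.ceil((nxt - O) / T) - 1) * T + D
            chi_vals[i] = max(nxt, a)
        else:
            chi_vals[i] = nxt
    return chi_vals
```

On a small example with three tasks and d_k^ℓ = 10, the interval end grows as lower-priority deadlines push past each boundary.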

Figure 2. Last jobs of higher-priority tasks needed in order to take a decision for J_k^ℓ.

Maximum length of the studied interval. As may be seen in Figure 2, the worst length occurs when, at each iteration k, the end of the interval for τ_k arrives just before the arrival of a job of τ_{k−1}, extending the interval by D_{k−1}. We can easily show (we leave the formal proof for future work) that the maximum length of the interval studied for a task τ_j, in order to determine the profile of a lower-priority job of task τ_k, is ∑_{i=k}^{j} D_i.

Maximum number of studied jobs. The maximum number of jobs of a task τ_j studied in order to determine the profile of a lower-priority job of task τ_k is given by ⌈(∑_{i=k}^{j} D_i) / T_j⌉. The maximum number of jobs of all tasks studied in order to determine the profile of a lower-priority job of task τ_k is then given by ∑_{j=1}^{k} ⌈(∑_{i=k}^{j} D_i) / T_j⌉.

5.2 Putting the pieces together

According to Equation (2), we can compute the partial profile needed to choose v_k^ℓ. Function setNbProcessors gives the procedure, based on Algorithm 2.

Function setNbProcessors(J_k^ℓ)
Input: J_k^ℓ, the job for which we choose v_k^ℓ
Output: true if it succeeded, false otherwise
for i = 1 to k do
    F_i ← F_{i−1}
    foreach job J_i^j s.t. r_i^j ≤ χ_k^ℓ(τ_i) and v_i^j = ∅ do
        if ∃w : E_i(r_i^j, w) ≤ d_i^j then
            v_i^j ← min{w : E_i(r_i^j, w) ≤ d_i^j}
            foreach t ∈ [r_i^j, E_i(r_i^j, v_i^j)] : F_i(t) ≥ v_i^j do
                F_i(t) ← F_i(t) − v_i^j
        else
            return false
return true

Finally, Algorithm 3 extends Algorithm 1 with a more accurate decision for the job heights. Note that a schedulability test for this scheduler is given by Algorithm 2.

Algorithm 3: Improved scheduler
Input: current time t
Note: global functions F_i() ∀i, initialized to F_i(t) = m ∀i, t before the system starts
avail ← m
foreach active job J_i^j (ordered by priority) do
    if v_i^j is already defined (≠ ∅) then
        if v_i^j ≤ avail then
            avail ← avail − v_i^j; Resume J_i^j
    else
        if setNbProcessors(J_i^j) then
            if v_i^j ≤ avail then
                avail ← avail − v_i^j; Start J_i^j
        else
            exit(FAIL)

6 Open questions and future work

In this short paper, we present the very first algorithm (to the best of our knowledge) for moldable real-time periodic parallel tasks, i.e., tasks for which the scheduler itself chooses the number of processors of each job. We present three versions of this algorithm. The first one is very naive, runs online, and does not take into account any information about the future. The second one is clairvoyant and builds offline the full utilization profile of the processors. The third one is a hybrid online version, building the utilization profile of the next few arrivals.

We plan to extend this promising work in several directions. First, we plan to conduct experiments to evaluate how efficiently our algorithms perform and to gain better insight into the way moldable tasks behave. Moreover, we plan (and have actually already started) to study the theoretical behavior of our algorithms further: to better characterize their complexity and, more interestingly, to analyze their competitive ratio.

We also plan to look for other necessary schedulability tests, and to extend our task model, for instance to sporadic tasks.

References

[1] S. Collette, L. Cucu, and J. Goossens. Integrating job parallelism in real-time scheduling theory. Information Processing Letters, 106(5):180–187, May 2008. Extended version.

[2] J. Goossens and V. Berten. Gang FTP scheduling of periodic and parallel rigid real-time tasks. In Real-Time and Network Systems, pages 189–196, November 2010.

[3] S. Kato and Y. Ishikawa. Gang EDF scheduling of parallel task systems. In 30th IEEE Real-Time Systems Symposium, pages 459–468. IEEE Computer Society, 2009.

[4] K. Lakshmanan, S. Kato, and R. Rajkumar. Scheduling parallel real-time tasks on multi-core processors. In 31st IEEE Real-Time Systems Symposium, pages 259–268, 2010.



Signal Path Scheduling for Reconfigurable SDR RF Hardware

Sami Kiminki
Aalto University
Computer Science and Engineering
P.O. Box 15400, FI-00076 Aalto
[email protected]

Vesa Hirvisalo
Aalto University
Computer Science and Engineering
P.O. Box 15400, FI-00076 Aalto
[email protected]

Abstract

Software-defined radios (SDR) have received great attention recently. We believe that the SDR architecture will become an important design approach for multi-standard radio systems. We study the scheduling of analog resources in reconfigurable parallel SDR systems. We formulate the problem as the Signal Path Scheduling Problem: given a graph of heterogeneous resources and a set of concurrent RF operations that are produced on-line, a set of time-dependent paths is to be found that can be used to execute the operations. We also describe a scheduler for the problem and demonstrate its feasibility.

1 Introduction

Modern wireless devices are required to support multiple standards, such as LTE [5] and WLAN [2]. The trend is toward more complex radio systems, as advanced techniques, such as MIMO and multiband aggregation, are being included in the standards, especially with the advent of cognitive radio standards [1]. One approach to prevent unmanageable growth of RF hardware is to use software-defined radios (SDR). We study an important subproblem of SDR control, namely, the scheduling of hardware resources for radio protocols.

Conventionally, dedicated chips are used for each protocol family. In this approach, the increased complexity requirements and the number of protocols to support translate directly into a greater number of hardware blocks. Because analog RF processing elements do not scale down like digital elements, this creates design issues for small portable devices such as smart phones. Yet, in such devices, all radio systems almost never operate simultaneously at peak capacity (being bottlenecked by internal buses). This means suboptimal utilization of the hardware.

SDR architecture (e.g., [3]) is an approach to reuse the hardware between protocols. The RF processing requirements for the major protocols are more or less the same, except for the exact frequency bands. Instead of wide-range SDRs, we have studied coarse-grain reconfigurable RF platforms (RF-CGRA) [6], which use conventional RF elements in a switch matrix.

We thank Aarno Parssinen and Antti Immonen for the analog platform architecture proposal used in the experiment (Fig. 4).

Dynamic reuse of hardware for multiple concurrently active radio protocols presents a control problem. This is arguably the main technical obstacle to commercial use. Radio operations, such as frame transmission, require µs-level precision in timing and ms-level response times for decisions. Further, radio protocols are not designed for resource sharing, and therefore they do not cope well with resource contention. However, the time-domain in-device interference avoidance mechanisms that are currently discussed in the standardization committees should provide the required flexibility. When dynamic resource sharing can be used, the advantages are significant [7].

In this paper, we study the problem of on-line resource assignment for analog RF processing. We formulate the related scheduling problem as the Signal Path Scheduling Problem (Sec. 2) and propose a solution, the Fixed-Job Path Scheduler (Sec. 3). The scheduler is applicable to resource assignment for wide-range SDRs and RF-CGRAs. We run an experiment on the latter with three concurrent protocols (GSM, WLAN, DVB-H) using resource-constrained and non-constrained platforms (Sec. 4).

Scheduling reconfigurable analog RF processors differs in some important aspects from scheduling reconfigurable digital processors. First, the analog RF tasks are often rigid, i.e., their timing cannot be changed. Second, analog signals cannot be buffered, which implies that all the processing elements in the RF signal path must be allocated simultaneously. Third, some analog elements, e.g., amplifiers and synthesizers, can be simultaneously shared by multiple signal paths when the conditions are favorable. These differences have a profound effect on the scheduling problem and its solutions. To our knowledge, this scheduling problem has not been studied in academia.

2 Signal Path Scheduling Problem

The Signal Path Scheduling Problem consists of a model of the RF processor and the requests for signal paths by the protocols. The task is to find the time-dependent processor configurations that satisfy the requests with minimum cost.

Figure 1. An N:M switch with at most K concurrent connections may be modeled by two full crossbar switches and K dummy resources.

The processor is modeled as a set of resources, ports, and switches (see Fig. 5 for an example). The resources represent logical elements of the processor, and the ports their inputs and outputs. The switches dynamically connect the ports of different resources. A subset of the resources are called border resources; these are the starting or ending points of signal paths, so every signal path contains at least one of them. In concrete designs, border resources usually provide external connectivity.

Switches have two sides, left and right. A port on the left side can be connected only to a port on the right side. Switches in the model are full crossbars, and thus there are no restrictions on the number of concurrent connections. A single port on one side may also be connected to multiple ports on the other. Restricted switch topologies can be modeled by adding dummy resources and switches. For example, an N:M switch with at most K connections can be modeled by an N:K switch, K dummy resources, and a K:M switch, as illustrated in Fig. 1.

A signal path consists of a set of resources. A signal path is complete when every port of a border resource is connected to a resource and, recursively, the ports of these resources are connected. Usually, there are conditions on signal paths, which are represented by resource-specific cost and shareability functions.

A resource-specific cost for a signal path request is numeric, in the range [0, ∞]. The cost is used in resource selection. Generally, more specialized resources (e.g., less dynamic range, narrower bandwidth) have a lower cost, for better schedulability. More specialized resources are also often less power-hungry. Resources that cannot be used in the signal path have infinite cost. We consider only context-insensitive costs in this paper.

Resources may be concurrently shared by multiple signal paths. Shareability is determined by resource-specific Boolean-valued functions. For example, a synthesizer can usually be shared by signal paths using the same carrier frequency, whereas an amplifier cannot be shared when the signal frequencies overlap.
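The cost and shareability model might be encoded as follows. The class, the example band limits, and the sharing rules below are our own illustration of the idea, not the paper's implementation.

```python
import math

class Resource:
    """A processing element with a request-specific cost in [0, inf] and a
    Boolean shareability test against an existing allocation."""
    def __init__(self, name, cost_fn, share_fn):
        self.name = name
        self.cost = cost_fn        # request -> float (math.inf = unusable)
        self.shareable = share_fn  # (request, existing_request) -> bool

# A synthesizer is shareable between paths using the same carrier frequency.
synth = Resource(
    'SX1',
    cost_fn=lambda req: 1.0,
    share_fn=lambda a, b: a['carrier'] == b['carrier'],
)

# A narrowband LNA: infinite cost outside its band (hence never selected),
# shareable only when the signal frequency ranges do not overlap.
lna = Resource(
    'LNA1',
    cost_fn=lambda req: 0.5 if 470e6 <= req['freq'] <= 900e6 else math.inf,
    share_fn=lambda a, b: not (a['lo'] < b['hi'] and b['lo'] < a['hi']),
)
```

A greedy path resolver would then try resources in increasing cost order and accept a shared allocation only when the shareability test passes.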

The requests for signal paths are often rigid, which means that they have fixed start and end times. The requests arrive on-line. When a signal path is granted, the requesting protocol may use it for its operations. The requests may also be denied, which is signalled back. Protocols may change their behavior in response to denied requests.

Figure 2. The state machine representing the life cycle of a job in the Fixed-Job Path Scheduler.

3 Scheduler Design

The Fixed-Job Path Scheduler is an optimizing on-line scheduler for rigid jobs. The scheduler uses a multi-stage approach and either an eager or a lazy allocation strategy. The application interface of the scheduler contains functionality for requesting signal path allocations and callbacks for tracking updates in the allocations. The allocation requests are called jobs.

The scheduler contains two main components: job life-cycle management and signal path resolution. The life cycle of a job follows the state machine of Fig. 2. A job is initially UNSCHEDULED. When the signal path is resolved for the job and the associated resources are allocated, the state becomes SCHEDULED. A period before the job is due to start, the job is FROZEN. This prevents priority overrides by other applications; this period is referred to as the freeze period. Frozen jobs may still be modified by global optimizers, which can rearrange and combine signal paths for a lower global cost. Finally, when the job execution begins, the signal path is activated and the job state is set to IN EXECUTION. After completion, the job is revoked from the system.
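The life cycle can be encoded as a small state machine. The event names and the exact transition set below are our reading of the text and of Fig. 2, not an exact specification.

```python
from enum import Enum, auto

class JobState(Enum):
    UNSCHEDULED = auto()
    SCHEDULED = auto()
    FROZEN = auto()
    IN_EXECUTION = auto()

# Our reading of the main transitions: a job is scheduled once its path is
# resolved, frozen shortly before start, then executed; a SCHEDULED job may
# be pushed back by higher-priority jobs. Cancel/revoke remove the job.
TRANSITIONS = {
    (JobState.UNSCHEDULED, 'schedule'): JobState.SCHEDULED,
    (JobState.SCHEDULED, 'freeze'): JobState.FROZEN,
    (JobState.FROZEN, 'to_execution'): JobState.IN_EXECUTION,
    (JobState.SCHEDULED, 'reschedule'): JobState.UNSCHEDULED,
}

def step(state, event):
    """Apply one life-cycle event; raises KeyError on an illegal transition."""
    return TRANSITIONS[(state, event)]
```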

The path resolution algorithm is greedy. Border resources are the starting point of the search. The lowest-cost suitable resource is always tried first. A suitable resource has finite cost for the request and is either unallocated for the duration of the job or shareable with its existing allocations. Then, suitable resources are sought for its ports, recursively with possible backtracking, until a signal path is found or all alternatives have been exhausted. If no alternative is found, lower-priority jobs in the SCHEDULED state may be rescheduled or canceled.

To speed up path resolution, a precomputed decision tree and dynamic programming are used. The decision tree is also partitioned by job classes, which speeds up resource selection. For example, resources that can be used only for 470-900 MHz RF operations need not be considered for other jobs. Dynamic programming stores the costs of resolved sub-paths, as the same sub-paths are likely to be considered again in case of backtracking.

For each resource, an allocation table is assigned. The allocation table is implemented as a map of points in time to sets of jobs in the SCHEDULED, FROZEN and IN EXECUTION states. A map entry represents an allocation to a set of jobs from the key (inclusive) to the following key (exclusive). See Fig. 3 for illustration.

Figure 3. A resource allocation table is implemented using a map of points in time to a set of jobs, ordered by time.
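The map-of-time-points-to-job-sets structure can be sketched as follows; only that idea comes from the text, while the class name, the splitting helper, and the sorted-list representation are our assumptions.

```python
import bisect

class AllocationTable:
    """Entry at time t holds the jobs allocated from t (inclusive)
    up to the next key (exclusive), as in Fig. 3."""
    def __init__(self):
        self.times = [0]        # sorted keys
        self.jobsets = [set()]  # job set per interval

    def _split(self, t):
        # Ensure t is a key, duplicating the interval it falls into.
        i = bisect.bisect_right(self.times, t) - 1
        if self.times[i] != t:
            self.times.insert(i + 1, t)
            self.jobsets.insert(i + 1, set(self.jobsets[i]))

    def allocate(self, job, start, end):
        self._split(start)
        self._split(end)
        for i, t in enumerate(self.times):
            if start <= t < end:
                self.jobsets[i].add(job)

    def jobs_at(self, t):
        return self.jobsets[bisect.bisect_right(self.times, t) - 1]

table = AllocationTable()
table.allocate("j1", 0, 10)
table.allocate("j2", 5, 15)
```

Overlapping allocations simply split the affected intervals, so a lookup such as `table.jobs_at(7)` returns `{"j1", "j2"}`.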

The eager allocation strategy resolves signal paths and allocates the associated resources immediately at job arrival. The lazy strategy is the opposite: allocation is postponed to the latest possible point, i.e., the freezing.

Frozen jobs present a global optimization window, which has the size of the freeze period. The window can be exploited by optimizer passes. We describe two: the localization enhancer and the job grouping pass. The localization enhancer looks into consecutive jobs of a protocol. If two consecutive jobs use different resources, the enhancer tries to rearrange the signal paths to use the same resources. The job grouping pass attempts to promote simultaneous resource sharing of different signal paths. The path resolution algorithm is greedy, and therefore, it may not always be able to find signal paths with shared resources.

4 Experiment

The experiment explores scheduling using a realistic RF-CGRA multiprocessor design and plausible protocol models. The experiment models a scenario of multiple active radio protocols on a mobile internet device.

The receiver (RX) design for the test platform is illustrated in Fig. 4. The front-end area (FE) consists of antennas and bandpass filters. The RF area consists of low-noise amplifiers (LNA) and mixers. Additionally, there are analog baseband units (BB) and synthesizers (SX). The transmitter (TX) incorporates a similar architecture but with the opposite signal direction.

The design comprises three parallel receiver pipes for DVB-H, four-band GSM and WLAN 802.11g. The RF pipes are merged such that the synthesizer and the receiver baseband elements may be connected freely to any pipe. Additionally, the GSM receiver may share an LNA simultaneously with either the DVB-H or the WLAN receiver when the signal conditions are favorable. The operating ranges of the shared resources are well within reported capabilities [3]. Concurrent LNA sharing is discussed in, e.g., [4].

Fig. 5 presents the model for the concrete RF processor design. The front-end blocks RX-FE-DVB, RX-FE-GSM, TX-FE-GSM, RX-FE-WLAN, and TX-FE-WLAN have been chosen as the border resources, as one of them must belong to any signal path.

Figure 4. The receiver part of the RF-CGRA processor design for experimentation. An RF pipe is formed dynamically by activating switches.

Figure 5. The resource graph for the experiment. The number of RX-BB and SX elements is varied.

For the receiver, cost functions are set up such that a specific protocol is required by the FE and MIXER elements. The LNA-GSM elements may be allocated only to GSM. LNA-DVB may be allocated by DVB or GSM, and LNA-WLAN to WLAN or GSM, respectively. LNA elements also require transmission frequencies within limits (see Fig. 4). For the transmitter, cost functions are set up similarly. Amplifier and mixer elements of the transmitters are combined as logical RF elements and they may not be shared between protocols. Additionally, for each element except SXs, a signal direction constraint (i.e., RX or TX) is applied.

The sharing rules are set up for LNA and SX resources as follows. LNA elements are always shareable (we assume favorable signal conditions). An SX resource is shareable between jobs if the jobs have a common carrier frequency.
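These sharing rules reduce to simple predicates; the dictionary representation of a job below is an invented illustration, while the rules themselves (LNA always shareable under favorable signal conditions, SX shareable only on a common carrier frequency) come from the text.

```python
def sx_shareable(job_a, job_b):
    # An SX synthesizer is shareable iff both jobs use a common
    # carrier frequency.
    return job_a["carrier_hz"] == job_b["carrier_hz"]

def lna_shareable(job_a, job_b):
    # LNA elements are always shareable under the favorable-signal
    # assumption made in the experiment.
    return True

# Illustrative jobs (frequencies are plausible, not from the paper).
gsm = {"protocol": "GSM", "carrier_hz": 1_800_000_000}
dvb = {"protocol": "DVB-H", "carrier_hz": 650_000_000}
```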

We consider 5 cases in the experiment. Cases 1–4 differ in using either the unconstrained or the constrained platform and the eager or lazy scheduling strategy. In case 5 we disable decision tree partitioning. Table 1 summarizes the cases. In the constrained platform, only two RX-BB and two SX elements are used, which inflicts resource contention when all three protocols attempt simultaneous operation. In the unconstrained platform, three RX-BB and three SX elements are used.

Table 1. The measurement cases

Case  Platform       Scheduling strategy  Decision tree partitioning
1     unconstrained  eager                yes
2     constrained    eager                yes
3     unconstrained  lazy                 yes
4     constrained    lazy                 yes
5     unconstrained  eager                no

The workload for all cases is as follows, in priority order, highest first:

• GSM: 1800-MHz band, 1 TX, 1 RX, and 1 monitor slot per frame

• DVB-H: one 120-ms frame per second

• WLAN 802.11g: power-save mode, beacon per 102.4 ms, 26.0 ms of data traffic + 1.5 ms allocation overhead per beacon; 0.5 ms allocation granularity.

The processor allocation is per-RF-operation for GSM and DVB-H, as the transmission and reception timings for these are predictable. For WLAN, PS-Poll processing [2] is assumed. When a beacon frame is received, RX and TX resources are allocated until there is no more data buffered locally or at the access point. For each combination the run time was 1 second of simulated time.

Table 2 summarizes the results of the simulation runs. For the unconstrained platforms, the eager (case 1) and lazy (case 3) allocation strategies perform equally well and are able to schedule all requests successfully. The main difference is in the maximum size of the allocation table, because the entries in the eager scheduler may live significantly longer.

In the constrained platforms, the eager allocation strategy (case 2) performs better than the lazy one (case 4). This is because the eager scheduler is able to give immediate feedback on failed signal path allocations on most occasions for 802.11g. When the 802.11g protocol fails to allocate RX resources for data traffic, it does not attempt to allocate TX resources, and thus, a number of job requests and cancels are avoided. With the lazy scheduler, this feedback comes later at job freeze, and allocation jobs for both RX and TX resources are generated.

Decision tree partitioning eliminates more than half of the resource evaluations in this experiment, as seen by comparing the cost function evaluation counts in cases 1 and 5.

Table 2. Simulation results

                       Case 1  Case 2  Case 3  Case 4  Case 5
Job requests             1831    1836    1831    1949    1831
Executed jobs            1759    1650    1759    1650    1759
Rescheduled jobs            0       2       0       1       0
Canceled jobs               0     114       0     227       0
Path alloc attempts      1831    1839    1761    1983    1831
Failed alloc attempts       0     114       0     227       0
Cost evals/path alloc    8.02    6.36    7.98    6.18   16.57
Alloc table lookups     14681   11697   14053   12263   14681
Max alloc table size       62      62       7       8      62
GSM RX jobs total         420     420     420     420     420
GSM RX failed               0       0       0       0       0
GSM TX jobs total         209     209     209     209     209
GSM TX failed               0       0       0       0       0
DVB-H RX jobs total        60      60      60      60      60
DVB-H RX failed             0       0       0       3       0
802.11g RX jobs total     540     599     540     599     540
802.11g RX failed           0     114       0     114       0
802.11g TX jobs total     530     476     530     589     530
802.11g TX failed           0       0       0     113       0

5 Conclusions

In this paper, we considered the scheduling of SDR RF resources for multiple concurrent protocols. We formulated the underlying scheduling problem and proposed a concrete solution. We also demonstrated the feasibility of the solution by an experiment with a realistic reconfigurable hardware design and protocol models.

We believe that software-configurable RF hardware in one form or another is the future for multi-standard radio platforms. This is underlined by the emerging cognitive radio standards, which are often the secondary users of the spectrum, e.g., 802.22 in the TV bands. For the secondary users, it is only logical that, in addition to the spectrum, the hardware resources are also shared to avoid needless redundancy.

References

[1] L. Berlemann and S. Mangold. Cognitive Radio and Dynamic Spectrum Access. John Wiley & Sons, 2009.

[2] M. S. Gast. 802.11 Wireless Networks: The Definitive Guide. O'Reilly, 2nd edition, 2005.

[3] V. Giannini et al. A 2mm2 0.1-to-5GHz SDR receiver in 45nm digital CMOS. ISSCC Digest of Technical Papers, pages 408–409, Feb. 2009.

[4] H. Hashemi and A. Hajimiri. Concurrent multiband low-noise amplifiers – theory, design, and applications. IEEE Trans. Microw. Theory Tech., 50(1):288–301, Jan. 2002.

[5] H. Holma and A. Toskala, editors. LTE for UMTS — OFDMA and SC-FDMA Based Radio Access. John Wiley & Sons, 2009.

[6] A. Immonen, A. Parssinen, T. Zetterman, M. Talonen, J. Ryynanen, S. Kiminki, and V. Hirvisalo. A reconfigurable multi-standard radio platform. In 1st International Workshop on Energy Efficient and Reconfigurable Transceivers, 2010.

[7] S. Kiminki, V. Saari, A. Parssinen, V. Hirvisalo, A. Immonen, J. Ryynanen, and T. Zetterman. Design and performance trade-offs in parallelized RF SDR architecture. In Proc. 6th Intl. ICST Conf. on Cognitive Radio Oriented Wireless Networks, 2011.


Resource Management in Multicore Automotive Embedded Systems∗

Sylvain Cotard
LUNAM Université, Université de Nantes and Renault S.A.S.

IRCCyN UMR CNRS 6597 (Institut de recherche en Communications et Cybernétique de Nantes)
1, rue de la Noë, BP 91101, F-44321 Nantes, France

[email protected]

Abstract

AUTOSAR (AUTomotive Open System ARchitecture) is an international development partnership created in 2003 between car manufacturers, suppliers and companies specialised in electronics and information technology. It aims at developing and establishing an open standardised architecture for E/E (Electrical/Electronic) development.

Nowadays, car manufacturers have to cope with the increase of heterogeneous functionalities while ensuring the dependability of time-critical systems. In order to make sure that all critical services will be able to start and finish within constrained time windows, their real-time behaviours have to be understood and some analysis techniques have to be introduced.

In this paper, we propose to analyse the AUTOSAR multicore OS specification from a real-time analysis point of view.

1. Introduction

AUTOSAR (AUTomotive Open System ARchitecture) [4] is an international development partnership created in 2003 between car manufacturers, suppliers and companies of electronics and information technology. It aims at developing and establishing an open standardised architecture for E/E (Electrical/Electronic) systems.

During the past 15 years, the increasing number of services provided in vehicles caused the evolution of E/E systems from federated architectures (one function per Electronic Control Unit (ECU)) to integrated architectures (several functions per ECU). The new organisation proposed by AUTOSAR illustrates this trend. In order to pursue this evolution, multicore ECU architectures (composed of two or more cores on the same die) appear to be the best trade-off between price, performance and the ability to develop time-critical systems. However, we have to cope with new challenges concerning the development of such systems. For example, car manufacturers need to be able to perform real-time analysis in order to make sure that critical subsystems (e.g. X-by-wire) will always execute correctly within their timing requirements.

∗This work has been supported by Renault S.A.S., 1 Avenue du Golf, 78280 Guyancourt, France.

Our work aims at understanding and mastering the AUTOSAR multicore OS specification. More precisely, this paper focuses on the predictability of such systems, in particular their real-time behaviour.

Other works are already available. In [1], it is shown how AUTOSAR can be analysed with scheduling theory techniques using the MAST tools. That work deals with distributed applications but does not take the multicore concept into consideration. In [7], it is shown that the global scheduling problem in AUTOSAR multicore OS can be divided into two independent sub-problems: partitioning a set of runnables, and then building the schedule on each core. The authors propose a set of algorithms in order to directly find a schedulable scheme, whereas we are interested here in the analysis of a given configuration. Finally, [6] presents the adequacy of the AUTOSAR OS specification with real-time scheduling theory in uniprocessor systems.

The outline of the paper is as follows. Section 2 gives an overview of the AUTOSAR multicore OS specification. Section 3 describes a schedulability analysis for AUTOSAR multicore OS applications. Finally, section 4 concludes the paper.

2. The AUTOSAR multicore OS specification

The AUTOSAR OS specification [2] is based on the OSEK/VDX operating system v2.2.2 [8]. The AUTOSAR multicore OS specification (release 4.0) [3], used in order to perform the work presented in this section, is derived from the existing AUTOSAR OS specification.

In the AUTOSAR software architecture, the OS (either uniprocessor or multicore) is mainly responsible for scheduling tasks and ISRs (Interrupt Service Routines) hosted by an ECU.

According to AUTOSAR multicore OS, all cores shall use the same instruction set and provide access to a shared memory. The number of cores that can be controlled by the AUTOSAR OS shall be configured offline, and it is forbidden to either restart cores or insert additional ones.

2.1 Task scheduling approach

AUTOSAR OS defines an OS-Application as a collection of OS entities forming a cohesive functional unit. Mainly, this includes tasks, ISRs, alarms, hooks and schedule tables. All OS objects within the same OS-Application can access each other using dedicated APIs (e.g. it is mandatory to use a synchronisation mechanism to handle inter-core communication). In order to comply with the specification, an OS-Application is statically assigned to a core, which leads to a partitioned system.

The scheduling strategy defined in AUTOSAR OS applies independently to each individual core. A fixed priority is assigned to each task off-line. When the scheduler is invoked, it checks among all the ready tasks and selects the one with the highest priority for execution. If more than one task share the same priority, FIFO ordering is used as a second criterion to break ties. It is also possible to choose whether the preemption of a task is allowed or not. This leads to a mixed preemptive policy.
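The selection rule (fixed priority, FIFO tie-breaking) can be sketched as follows; the queue layout, function names and task names are illustrative, not the AUTOSAR implementation.

```python
from collections import deque

# Ready queues per priority level; tasks of equal priority queue FIFO.
ready = {}  # priority -> deque of task names

def make_ready(task, priority):
    ready.setdefault(priority, deque()).append(task)

def schedule():
    """Pick the highest-priority ready task; FIFO breaks ties."""
    if not ready:
        return None
    top = max(ready)                 # highest fixed priority level
    task = ready[top].popleft()      # FIFO within the priority level
    if not ready[top]:
        del ready[top]
    return task

make_ready("tauA", 2)
make_ready("tauB", 5)   # tauB and tauC share priority 5
make_ready("tauC", 5)
```

Successive calls to `schedule()` then dispatch `tauB`, `tauC` (same priority, FIFO order) and finally `tauA`.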

2.2 Synchronization strategy

In uniprocessor AUTOSAR systems, shared resource management is handled by the IPCP (Immediate Priority Ceiling Protocol) [6], which limits the blocking time due to the execution of lower priority tasks. When a task gets a shared resource, its priority is immediately raised to the resource ceiling priority. The ceiling priority of the resource must be greater than the base priority of all tasks that can access this resource, so that the scheduling policy enforces mutual exclusion.

For multicore systems, AUTOSAR allows the sharing of resources among cores, but IPCP is not efficient in this context. To illustrate the inefficiency of IPCP, let us consider the case of a resource shared by two tasks on different cores. The first one enters its critical section and its priority is raised to the resource ceiling priority. However, this will not prevent the other task from entering its critical section, because the schedulers on both cores are independent.

In multicore AUTOSAR systems, synchronization is done using the spinlock mechanism. Spinlock refers to a busy waiting technique that polls a shared lock variable until it becomes available. Usually, this technique relies on HW facilities such as test-and-set or compare-&-swap instructions, but it can also be implemented in software using, for instance, the Dekker algorithm [5] or the Peterson algorithm [9].
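A busy-waiting lock over a test-and-set primitive can be sketched as follows. Here a `threading.Lock` merely models the atomicity of the hardware instruction, so this is an illustration of the polling behaviour rather than a realistic implementation.

```python
import threading

class SpinLock:
    def __init__(self):
        self._flag = False
        self._atomic = threading.Lock()

    def _test_and_set(self):
        # Models one atomic HW instruction: read the flag and set it.
        with self._atomic:
            old, self._flag = self._flag, True
            return old

    def acquire(self):
        while self._test_and_set():   # spin (poll) until the flag was free
            pass

    def release(self):
        self._flag = False

lock, counter = SpinLock(), [0]

def worker():
    for _ in range(1000):
        lock.acquire()
        counter[0] += 1               # critical section
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the spinlock protecting the increment, the four workers leave `counter[0] == 4000`; note the busy waiting itself is exactly what makes the mechanism problematic for time-critical systems, as the next paragraphs discuss.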

As synchronization based on spinlocks has not been designed for time-critical systems, we can observe deadlock and starvation situations, as discussed in section 3.1. To face these problems, other protocols such as MPCP or MSRP (multicore extensions of the Priority Ceiling Protocol and the Stack Resource Policy) can be used, as proposed in [11].

3. Schedulability Analysis of AUTOSAR multicore OS applications

3.1 Recommendation for using spinlocks

Let us consider the case where five tasks are distributed on two cores as illustrated in Figure 1(a). Tasks τ¹₁, τ¹₂, τ²₁ and τ²₂ share the resource R, whereas τ¹₃ never takes R. A task has a priority πi (πi > πj denotes that τi has a higher priority than τj).

Scenario leading to a deadlock (Figure 1(b)): On core 1, τ¹₁ enters its critical section. Then, it is preempted by τ¹₂, which has a higher priority. During its execution, τ¹₂ tries to enter its critical section. This leads to a deadlock situation on core 1.

Scenario leading to a starvation (Figure 1(c)): On core 2, τ²₂ is executing and enters its critical section. Then, τ¹₁ starts its execution on core 1 and tries to enter its critical section. As the resource is locked, τ¹₁ enters a busy waiting state. Then, τ¹₁ is preempted by τ¹₃. During that time, τ²₂ releases the resource. τ²₁ is scheduled and enters its critical section. So, when τ¹₁ is scheduled again, it stays in the busy waiting state. Starvation occurs when this scheme repeats indefinitely.

As presented in [3], another problematic situation corresponds to the use of nested spinlocks. In that document, it is even recommended never to use nested spinlocks.

A partial solution could be to disable all interrupts before getting the spinlock. This solution prevents deadlock but cannot prevent all starvation situations, mainly in multicore architectures composed of more than two cores. First, let us consider the case of two cores. If a task is busy-waiting to enter a critical section, it will automatically win the lock as soon as it is released. Indeed, any other task of the same core cannot take the lock (interrupts are disabled, and no other task can be scheduled) and no task of the other core will try to get the lock immediately after it has been released (a context switch, which is not instantaneous, is required before a task on the other core can try to get the lock). Let us now consider two tasks on two cores trying to lock a resource taken by a third task on another core. In that case, we cannot predict exactly which of the two waiting tasks will enter its critical section when the resource is freed. Thus, the execution scheme cannot guarantee that starvation will not occur. This leads to a non-deterministic behavior.

3.2 Classical response time analysis (RTA)

The purpose of schedulability analysis is to make sure that no time constraint will be violated during the entire life of the system. In other words, it must be guaranteed that each task will always complete within its deadline. To do so, an off-line analysis has to be performed during the application design stage. The RTA technique can be used for that purpose. This technique consists in computing the worst-case response time of each task and comparing this value with its deadline.

Figure 1. Example of a problematic situation observed using the spinlock mechanism: (a) tasks distribution; (b) deadlock illustration; (c) starvation illustration.

More formally, a system is composed of a set of tasks S = {τi}, 1 ≤ i ≤ s. For the sake of simplicity, we will consider the following assumptions:

• All tasks are periodic, preemptive and activated for the first time at system startup.

• Tasks on the same core can share resources whose accesses are controlled via the IPCP protocol. Tasks on different cores can share a resource using the spinlock mechanism. For this, as explained in section 3.1, we will consider only two cores so as to have a deterministic behavior. Moreover, we will assume that interrupts are disabled (resp. enabled) before (resp. after) the lock is taken (resp. freed). This is illustrated by the programming scheme of Figure 2.

GetSpinLock

DisableAllInterrupt

<Critical Section Code>

ReleaseSpinLock

EnableAllInterrupt

Figure 2. Management of mixed resources

• Each task is assigned a fixed priority which is unique relative to the core upon which it is executed. As we use the IPCP synchronization protocol for local resources, the priority level of a task may vary during its execution.

Every task will generate an infinite number of jobs τi(q). We define the response time of a job q as the amount of time elapsed between the instant where the job is released for execution and the instant of its completion. In multicore environments where tasks share resources, we need to consider all situations that involve a blocking time in order to compute r_i(q). The response time of the qth job is:

    r_i(q) = C_i + b^l_i(q) + b^r_i(q) + p_i(q, r_i(q))    (1)

where:

• C_i is the worst-case execution time (WCET) of τi.

• b^l_i(q) is the local blocking time caused by tasks that are located on the same core. This blocking time represents the time during which a task can interfere by executing a critical section of ceiling priority greater than π_i or a critical section protected by a disable/enable interrupt.

• b^r_i(q) is the remote blocking time caused by tasks of the other core when trying to get a global resource.

• p_i(q, r_i(q)) is the interference due to preemption, i.e. the amount of time a task can be delayed because of the execution of higher priority tasks on the same core.

The worst-case response time R_i of τi is defined as the maximum value of r_i(q) over the infinite number of jobs q. Consequently, the value of R_i is:

    R_i = max_{q>0} { r_i(q) }    (2)

To detail the computation of R_i, we will consider the following task model (illustrated in Figure 3):

    τi = (T_i, D_i ≤ T_i, π_i, {C^nc_i,j}, {[C^c_i,j, S_i,j, Π_i,j]})    (3)

Figure 3. Task model considered for RTA

A job of a task τi is defined as a set of critical and non-critical sections. Each critical section corresponds to a resource S_i,j that can be shared either with tasks on the same core only, or on either core. For local resources, the ceiling priority is denoted Π_i,j and corresponds to the priority given by IPCP. For global resources, the ceiling priority Π_i,j corresponds to an "infinite" value (to capture the effect of interrupt masking). The task τi is periodic with period T_i and has a priority π_i defined off-line (π_i > π_j denotes that τi has a higher priority than τj). We denote the execution times of critical and non-critical sections C^c_i,j and C^nc_i,j respectively. Finally, c(i) and nc(i) respectively denote the number of critical and non-critical sections in τi.



We can use the definition given in [10] for the critical instant of τi: when τi and the tasks of higher priority on the same core are activated at the same time and the task that contributes the maximum local blocking time has just started its execution. The remote blocking is taken into account in the local blocking time (section 3.2.1).

3.2.1 Local blocking

Figure 4. Local blocking situation

As shown in Figure 4, a task can be delayed by a critical section of a task of lower priority on the same core if the resource ceiling priority is higher than π_i (filled section). This critical section can also be delayed by another one, protected by a disable/enable interrupt (shaded section). This can occur only once, between the task release and the instant it starts. This reduces to the following equation:

    ∀q > 0,  b^l_i(q) ≤ max_{k: π_k < π_i, core_k = core_i}  max_{ℓ: 1 ≤ ℓ ≤ c(k), Π_k,ℓ ≥ π_i}  ( C^c_k,ℓ + max_{m: core_m ≠ core_i}  max_{n: 1 ≤ n ≤ c(m), S_m,n = S_k,ℓ}  C^c_m,n )    (4)

3.2.2 Remote spinning blocking

Each critical section of τi can potentially be delayed by a task of the other core using the same resource. In the worst case, we consider the longest critical section among this set of tasks. This is expressed by:

    ∀q > 0,  b^r_i(q) ≤ Σ_{1 ≤ j ≤ c(i)}  max_{k: core_k ≠ core_i}  max_{ℓ: 1 ≤ ℓ ≤ c(k), S_k,ℓ = S_i,j}  C^c_k,ℓ    (5)

3.2.3 Interference

The interference term due to preemptions by higher priority tasks on the same core is given by:

    ∀q > 0,  p_i(q, r_i(q)) ≤ Σ_{k: π_k > π_i, core_k = core_i}  ⌈ r_i(q) / T_k ⌉ · C′_k    (6)

with:

    C′_k = C_k + Σ_{1 ≤ ℓ ≤ c(k)}  max_{m: core_m ≠ core_k}  max_{n: 1 ≤ n ≤ c(m), S_m,n = S_k,ℓ}  C^c_m,n    (7)

In (7), we have to take into consideration the WCET of each higher priority task, including the impact of its busy waiting periods.
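The recurrence (1) is typically solved by fixed-point iteration. The sketch below is a simplified variant that lumps the local and remote blocking terms into one constant B and takes the inflated WCETs C′_k of Eq. (7) for the interfering tasks as given inputs; this simplification is our assumption, not the paper's full per-job analysis.

```python
import math

def response_time(C, B, higher, deadline):
    """Fixed-point iteration for r = C + B + sum(ceil(r/Tk) * C'k),
    where C is the task's WCET, B lumps the blocking terms of Eq. (1),
    and `higher` lists (Tk, C'k) pairs for same-core higher-priority
    tasks, C'k being the WCET inflated by remote spinning (Eq. (7))."""
    r = C + B
    while True:
        nxt = C + B + sum(math.ceil(r / Tk) * Ck for Tk, Ck in higher)
        if nxt == r:
            return r            # converged: worst-case response time bound
        if nxt > deadline:
            return None         # iteration exceeds the deadline: unschedulable
        r = nxt

# Illustrative task: C=2, total blocking 1, two higher-priority tasks.
print(response_time(2, 1, [(10, 2), (20, 3)], deadline=30))  # prints 8
```

The iteration starts from C + B and converges because the right-hand side is monotone in r; comparing the returned bound against D_i ≤ T_i gives the schedulability verdict.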

4. Conclusion

The goal of this paper was to give some results concerning the real-time analysis of systems using the AUTOSAR multicore OS specification.

The analysis does not assume strong hypotheses on the application. Therefore, it is very pessimistic. Indeed, for global resources, we always considered the worst case by taking the maximal value of the remote blocking time (in equations (4), (5) and (6)). In a real scenario, we cannot determine whether this pessimistic case will be encountered. To overcome this limitation, our future work will be to find how the AUTOSAR specifications could be used in a predictable way.

In order to limit the non-determinism induced by spinlocks, we had to define some restrictions on the HW by considering only two cores. For now, only dual-core architectures are available, but in the future, we will use dies with more cores. In those cases, we will have to study other protocols such as MSRP or MPCP [11].

References

[1] S. Anssi, S. Tucci-Piergiovanni, S. Kuntz, S. Gérard, and F. Terrier. Enabling scheduling analysis for AUTOSAR systems. In 14th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, pages 152–159, 2011.

[2] AUTOSAR. AUTOSAR - Specification of operating system. Technical Report v4.0, AUTOSAR GbR, 2011.

[3] AUTOSAR. AUTOSAR - Specification of operating system for multicore. Technical Report v4.0, AUTOSAR GbR, 2011.

[4] AUTOSAR. http://www.autosar.org. Technical report, AUTOSAR GbR, June 2011.

[5] E. W. Dijkstra. Solution of a problem in concurrent programming control. Commun. ACM, 8(9):569, 1965.

[6] P.-E. Hladik, A.-M. Deplanche, S. Faucou, and Y. Trinquet. Adequacy between AUTOSAR OS specification and real-time scheduling theory. In International Symposium on Industrial Embedded Systems, 2003.

[7] N. Navet, A. Monot, B. Bavoux, and F. Simonot-Lion. Multi-source and multicore automotive ECUs - OS protection mechanisms and scheduling. In International Symposium on Industrial Electronics - ISIE, 2010.

[8] OSEK/VDX. OSEK/VDX - Operating system. Technical Report v2.2.3, OSEK Group, 2005.

[9] G. L. Peterson. Myths about the mutual exclusion problem. Inf. Process. Lett., 12(3):115–116, 1981.

[10] Y. Wang and M. Saksena. Scheduling fixed-priority tasks with preemption threshold. In Proceedings of the Sixth International Conference on Real-Time Computing Systems and Applications, 1999.

[11] H. Zeng and M. Di Natale. Mechanisms for guaranteeing data consistency and flow preservation in AUTOSAR software on multi-core platforms. In 6th IEEE International Symposium on Industrial Embedded Systems, 2011.



Towards a Weakly-Hard Approach for Real-Time Simulation

Abir Ben Khaled, Mongi Ben Gaid
IFP Energies nouvelles
1-4 avenue de Bois-Preau, 92852 Rueil-Malmaison, France
[email protected] [email protected]

Daniel Simon
INRIA
Inovallee Montbonnot, 38334 St Ismier Cedex, France
[email protected]

Abstract

New regulations for vehicles are increasingly demanding on fuel consumption and pollutant emissions reduction. This requires designing new engine concepts and related control strategies. Control design, validation and calibration on test beds is costly and time consuming. To reduce development costs and time-to-market, Electronic Control Units (ECUs) have to be validated at an early stage using a real-time simulation platform based on continuous engine models.

To allow real-time simulation of high-fidelity engine models, different techniques have to be applied in order to fulfill the real-time constraints. Real-time simulation involves trade-offs between several aspects, such as real-time constraints, model computational complexity and integration accuracy. This paper mainly focuses on numerical integration aspects, by introducing its different methods, such as numerical solvers and partitioning/scheduling, and then considering the trade-off between accuracy and simulation speed.

1. Introduction

Designing complex systems like Cyber-Physical Systems, which include both physical and computational models, is time consuming and implies the knowledge and cooperation of different disciplines [7]. In order to reduce the design, development and validation phases, a global simulation is needed at an early stage. The main purpose of numerical simulation, when an analytical solution cannot be derived, is to approximate the system's behavior as faithfully as possible. In other words, bounding and minimizing the simulation errors is the main aim of numerical simulation.

Besides, in a hardware-in-the-loop (HIL) architecture, real components (e.g. an ECU) are linked with simulated models, therefore inducing real-time constraints to make the different time scales consistent. In order to allow the real-time execution of numerical simulation models, computation times have to satisfy a well-defined model of real-time constraints.

Considering the numerical integration of Ordinary Differential Equations (ODEs), the real-time constraints can be seen as computing fast enough to be compliant with the system's dynamics, to ensure that the deviations w.r.t. the ideal (continuous) plant's behavior can be counted as negligible. For example, in an engine model, the combustion in the cylinders may require tight needs in terms of processor resources to cope with the fast transients of this particular phase.

In our point of view, real-time simulators for HIL validation may be considered as weakly-hard real-time systems [3] that may miss a specified number of deadlines in a specified way, in order to guarantee a given level of quality of service. Man-in-the-loop simulators, on the other hand, may be considered as soft real-time systems where deadlines can be missed in a non-predictable way.

This paper considers weakly-hard real-time HIL systems and focuses on computation time aspects, where small enough computation times are necessary to comply with the real-time constraints. First, engine models are introduced. Then, the characteristics of numerical solvers are presented through benchmark test results to show their influence on real-time simulation. Finally, model splitting and parallel computing are introduced to improve efficiency so as to reach the real-time constraints.

2. Engine Model

The considered cyber-physical system involves an engine and its controllers. The engine represents the physical system part; it is modeled in the continuous-time domain using ODEs. It belongs to the hybrid systems category because of some discontinuous behaviors, which correspond to events triggered when a given threshold is crossed. The controllers, which supervise the physical parts, represent the computational models. They are modeled in the discrete-time domain, and sampling is a mixture of time-driven and event-driven features. For this case study, the real-time system is defined such that the controllers interact with the engine model as though it was in a real physical system, which means that the behavior of the model is expected to be similar (or close enough) to the one of the real continuous system. In other words, some deadlines may be relaxed in a controllable way by synchronizing the controllers and the engine model only at some meeting points. Eventually, this must not disturb the test results.

The considered engine is a four-cylinder diesel engine with a fixed geometry turbocharger, where the combustion process is modeled based on Wiebe's law. The model contains 87 continuous states and 420 event indicators (of discontinuities). Each cylinder requires four strokes of its piston (two revolutions of the crankshaft) to complete the sequence of events which produces one power stroke. It comprises [5]:

• An intake stroke, which draws up fresh mixture into the cylinder from the inlet valve.

• A compression stroke, when both valves are closed and the mixture inside the cylinder is compressed to a small fraction of its initial volume. Toward the end of the compression stroke, combustion is initiated and the cylinder pressure rises very fast.

• A power or expansion stroke (combustion phase), where the high-temperature and high-pressure gases push the piston down and force the crank to rotate.

• An exhaust stroke, where the remaining burned gases exit the cylinder through the exhaust valve.

3. Numerical solvers

The need for deterministic computation times has made fixed-step solvers the prerequisite solution for real-time simulation. This paper aims to revisit this assumption. This section discusses solver characteristics, especially their step management, in order to characterize their influence on weakly-hard real-time constraints. The main differences between numerical schemes are:

• Order, which represents the accuracy of the numerical solution due to the truncation of the Taylor series.

• Explicit/implicit method: implicit schemes are often less accurate at the beginning compared to explicit schemes. They are also more complicated to implement and require more computation at each time step because they need the resolution of a nonlinear system; still, they remain stable even for stiff systems, whereas explicit schemes need very short time steps to ensure stability.

• Fixed/variable (adaptive) time step: in general, the time step must be much smaller than, or even negligible compared to, the system dynamics in order to preserve stability. Stiff problems arise when the system has several very different (distant) time constants. For fixed-step methods, the step size must be chosen small enough to cope with the fastest transient of the system for the whole simulation time. However, the number of integrations per simulation step, according to their order, is bounded, so the execution time is predictable. For adaptive methods, the integration step is driven by the integration error, iterating until it reaches a predefined bound on the error, thus following the variations of the system's dynamics along the simulation.

• One-step/multi-step method: one-step methods (e.g. Runge-Kutta) only use the current values of the solution y and its derivatives f = dy/dt. Multi-step methods use several past values of y (e.g. Backward Differentiation Formulas (BDFs)) and of f = dy/dt (e.g. Adams) to achieve a higher order of accuracy. BDF is among the most effective multi-step methods for stiff systems, whereas the Adams method is among the best-known multi-step methods for solving general non-stiff systems.

The LSODAR [6] solver is based on the BDF and Adams algorithms and switches automatically between methods when the system varies from stiff to non-stiff and conversely. It also has a root-finding capability to detect the events occurring during the simulation and to cleanly stop/restart the integration when an event occurs.
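To make the fixed/variable step distinction concrete, here is a minimal adaptive-step integrator in the spirit of the description above (an illustrative sketch, not LSODAR itself: an explicit Euler step embedded in a Heun step provides the local error estimate that drives the step size):

```python
import math

def adaptive_heun(f, t0, y0, t_end, tol=1e-4, h0=0.1):
    """Variable-step integration driven by a local error estimate:
    the gap between an order-1 (Euler) and an order-2 (Heun) step.
    The step shrinks near fast transients and grows again when the
    dynamics relax, as variable-step solvers do."""
    t, y, h = t0, y0, h0
    accepted = 0
    while t < t_end:
        h = min(h, t_end - t)
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y_euler = y + h * k1                  # order-1 estimate
        y_heun = y + 0.5 * h * (k1 + k2)      # order-2 estimate
        err = abs(y_heun - y_euler)           # local error indicator
        if err <= tol or h < 1e-12:
            t, y = t + h, y_heun              # accept the step
            accepted += 1
        # standard step-size update with a 0.9 safety factor
        h *= min(2.0, max(0.1, 0.9 * math.sqrt(tol / max(err, 1e-16))))
    return y, accepted
```

On a stiff problem the accepted step collapses and the cost explodes, which is exactly the situation where LSODAR switches to its BDF mode.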

4. Benchmark tests and results

In the following benchmark tests, simulations are done with the xMOD tool. xMOD [1] is a platform that combines an integration environment for heterogeneous models with a virtual test laboratory. The considered engine model was developed in Dymola using the ModEngine library [2] and was imported into xMOD as a Functional Mock-up Unit (FMU)1.

The engine model is linked to its controller, developed in Simulink, thanks to the integration capabilities of xMOD. The data exchange between the model and the controller is performed periodically. This period, denoted the communication time step, is chosen equal to 500 µs. For fixed-step solvers, there is no control on accuracy; an integration step of 50 µs is chosen in order to achieve correct simulation results. For variable-step solvers, the communication time step is different from the integration step and can even be much larger (see Figure 1). The solver automatically increases the time step when there are no discontinuities and decreases it when a discontinuity occurs. This distinctive feature makes it faster than fixed-step solvers for regular continuous systems with few discontinuities. Besides, variable-step solvers are known for their accuracy and their capability to better manage numerical stability.

Figure 1. Simulator architecture (the engine simulator and the controller exchange data at initialization and at each communication step; integration steps and control computations take place between exchanges)

The following tests, which constitute a work in progress, are done without controllers and do not currently reach real time (0.5 s of simulated time corresponds to 10.5 s with Euler) due to the model complexity. Model optimization is currently being undertaken, and the simulation with Euler (the fastest solver) will be considered as the reference. Tests with LSODAR show that refining the solver tolerance causes an increase in execution time: varying it from 10^-1 to 10^-3 doubles the execution time (see Figure 2). Moreover, setting the tolerance to 10^-1 preserves the correctness of the results, so it is better to use it, especially as its execution time is close to that of the well-known 4th-order Runge-Kutta solver (RK4) (see Figure 2).

1 http://modelisar.org/

Figure 2. Execution time for different solvers and tolerances

Other results using LSODAR show that the engine model presents many discontinuities; Figure 3 represents one engine cycle (2500 rpm). Each discontinuity causes an interruption of the solver. For this model, the interruptions due to events represent 57.7% of all the integration stops. So, even when the system's dynamics would allow LSODAR to choose the maximum step size (500 µs), the solver cannot take it and is forced to select a smaller one to fit the discontinuity (see Figure 4). This shows that the step size is not only related to the system's dynamics but also to the frequent events processed during the integration. We can conclude that the speed advantage of variable-step solvers decreases when the system emits a large number of high-frequency events.

Figure 3. Impact of discontinuities on step size

Figure 5 plots the LSODAR, Euler and RK4 execution times for each communication time step (500 µs). LSODAR presents several peaks due to fast transients, but its execution times are often smaller than those of the fixed-step solvers (Euler and RK4). Besides, the LSODAR execution times vary from one communication step to another, which highlights the step adaptation with respect to the evolution of the system's dynamics.

The previous tests show that LSODAR is an effective variable-step solver: it can speed up when the system dynamics allow it, while keeping the integration tolerance under control.

Figure 4. Frequency of events with a variable step

Figure 5. Execution time per communication step

However, it is not efficient when the system presents many discontinuities. On the other hand, it often appears that, in complex systems, events are only directly related to the evolution of a subset of the state vector, and that the corresponding discontinuities are independent from a physical point of view. Thus the idea is to divide the system into subsystems, to minimize useless integration interrupts due to unrelated events.

5. From Model Splitting to Parallel Computing: Waveform Relaxation

Even with sophisticated techniques, numerical solvers have reached their limits in terms of computation speed. In order to reach real time, the next step is to further shorten the integration time on a multi-core platform, by parallelizing the engine model to simplify the computation given to the solver.

5.1. Engine model splitting

The engine model was partitioned by separating the combustion cycle of the four cylinders from everything else, called the airpath, then by isolating the combustion cycle of each cylinder. This kind of splitting is interesting for a variable-step solver because it relaxes the event constraints, i.e. decreases the number of events. In fact, the combustion phase presents many events, and it is executed from cylinder 1 to cylinder 4 in a sequential way. Then, at the end of the fourth cylinder's combustion, the cycle is repeated relentlessly. By splitting the model, the solver can treat the events locally during the combustion cycle of a single cylinder and then relax the time step until the next cycle.

With the engine model thus partitioned, the aim is to apply a solver to it using the Waveform Relaxation (WR) method.

5.2. Waveform Relaxation

The Waveform Relaxation method was introduced in [8]. It is an iterative process originating from the Picard theorem, which makes it possible to solve coupled subsystems simultaneously, in parallel, over successive time windows. Each subsystem is characterized by its waveform (i.e. its solution over a given time interval). The purpose is to find the waveform of a subsystem, considering all the waveforms of the other subsystems constant during one iteration. For practical results ([4]), a sequential Gauss-Seidel and a parallel Gauss-Jacobi WR code have been developed. The second implementation is considered in the sequel.

Given a system partitioned into two subsystems, with x1 and x2 the two waveforms:

dx1/dt = f1(t, x1, x2),  x1_0 = x1(t = 0)    (1)
dx2/dt = f2(t, x1, x2),  x2_0 = x2(t = 0)    (2)

The initialization is done by freezing, during the whole simulation time, x2_0 for the first WR and x1_0 for the second WR. Then the first iteration is done by integrating (1) and (2) simultaneously and saving in memory the trajectories (x1^1, x1^2, ..., x1^n) and (x2^1, x2^2, ..., x2^n) at the communication time step. Each subsystem can be integrated with a variable-step or a fixed-step solver, but all must use the same communication time step.

If the convergence tests are satisfied at iteration it, i.e. |x1^i,it − x1^i,it−1| ≤ ε and |x2^i,it − x2^i,it−1| ≤ ε for i = 1..n, then the integration is successful. Otherwise another integration is restarted, using the already computed state trajectories updated from the other subsystems, until convergence (see Figure 6).
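A minimal Gauss-Jacobi WR sketch for this two-subsystem case (illustrative assumptions: an explicit-Euler inner solver, right-hand sides f1 and f2, and a single window covering the whole horizon):

```python
def gauss_jacobi_wr(f1, f2, x1_0, x2_0, t_end, dt, eps=1e-6, max_it=50):
    """Each subsystem is integrated over the whole window against the
    other subsystem's waveform from the previous iteration; iteration
    stops when neither waveform changes by more than eps."""
    n = int(round(t_end / dt))
    # iteration 0: freeze the other subsystem at its initial value
    w1 = [x1_0] * (n + 1)
    w2 = [x2_0] * (n + 1)
    for it in range(max_it):
        new1, new2 = [x1_0], [x2_0]
        for i in range(n):
            # both subsystems can be integrated in parallel: each one
            # only reads the other's waveform from the previous iteration
            new1.append(new1[i] + dt * f1(i * dt, new1[i], w2[i]))
            new2.append(new2[i] + dt * f2(i * dt, w1[i], new2[i]))
        delta = max(max(abs(a - b) for a, b in zip(new1, w1)),
                    max(abs(a - b) for a, b in zip(new2, w2)))
        w1, w2 = new1, new2
        if delta <= eps:
            break
    return w1, w2, it + 1
```

At convergence the fixed point coincides with the monolithic discretization of the coupled system; the slow convergence over long windows is what motivates the use of time windows.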

Figure 6. Waveform relaxation technique (the two subsystems exchange their waveform components x1 and x2 at each communication time step dt over [0, Tstop], iterating until convergence)

The purpose of splitting the system and using the WR method is to reduce the execution time and to reach the real-time constraints. With the original WR method this is not possible, because of the slowness of convergence: it is shown in [4] that the longer the simulation time between communications of the WR, the slower the convergence. In this work in progress, several parameters are studied in order to reduce the number of iterations:

• Control of the solver tolerance depending on the WR iteration. In fact, for the first WR iterations, the results are not accurate because of the lack of data coming from the other subsystems. The idea is then to relax the solver tolerance for the first iterations, then progressively tighten it.

• Initialization with an infinitesimal disturbance of the equilibrium position. Indeed, with an initialization close to the solution, few iterations are needed to converge.

• Using time windows for the WR. In fact, integrating until a specified time window instead of until the end of the simulation eases the convergence and decreases the number of iterations.

6. Future Work: Waveform Relaxation with Adaptive Windowing

After implementing the techniques described above, the WR windowing approach could be further improved for real-time simulation by dynamically adapting the window. In fact, it is not obvious to select an optimal window size that takes into account the (variable) dynamic behaviors of the system. Hence the idea, derived from the variable-step solver principle, is to compute on-line non-uniform windows whose sizes depend on the development of the waveforms. The concept is to adaptively determine the size of the next time interval based on the previously computed solution; for example, the windows can be rescaled using the ratio between the convergence tolerances of two successive iterations [4]. The window size should be bounded to keep the number of WR iterations small. The corresponding feedback scheduler remains to be designed.

References

[1] M. Ben Gaïd, G. Corde, A. Chasse, B. Lety, R. De La Rubia, and M. Ould Abdellahi. Heterogeneous model integration and virtual experimentation using xMOD: Application to hybrid powertrain design and validation. In 7th EUROSIM Congress on Modeling and Simulation, Prague, Czech Republic, Sep. 2010.

[2] Z. Benjelloun-Touimi, M. Ben Gaïd, J. Bohbot, A. Dutoya, H. Hadj-Amor, P. Moulin, H. Saafi, and N. Pernet. From physical modeling to real-time simulation: Feedback on the use of Modelica in the engine control development toolchain. In 8th International Modelica Conference, Germany, March 2011.

[3] G. Bernat, A. Burns, and A. Llamosi. Weakly hard real-time systems. IEEE Trans. Comput., 50:308–321, April 2001.

[4] K. Burrage, C. Dyke, and B. Pohl. On the performance of parallel waveform relaxations for differential systems. Applied Numerical Mathematics, 20:39–55, February 1996.

[5] J. Heywood. Internal Combustion Engine Fundamentals. McGraw-Hill Series in Mechanical Engineering. McGraw-Hill, 1988.

[6] A. C. Hindmarsh and L. R. Petzold. Algorithms and software for ordinary differential equations and differential-algebraic equations, part II: higher-order methods and software packages. Comput. Phys., 9:148–155, March 1995.

[7] E. A. Lee. Computing foundations and practice for cyber-physical systems: A preliminary report. Technical Report UCB/EECS-2007-72, Univ. of California, Berkeley, May 2007.

[8] E. Lelarasmee, A. Ruehli, and A. Sangiovanni-Vincentelli. The waveform relaxation method for time-domain analysis of large scale integrated circuits. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 1(3):131–145, Jul 1982.


Preemptive Multiprocessor Real-Time Scheduling with Exact Preemption Cost

Falou Ndoye

INRIA Paris-Rocquencourt

Domaine de Voluceau BP 105

78153 Le Chesnay Cedex - France

[email protected]

Yves Sorel

INRIA Paris-Rocquencourt

Domaine de Voluceau BP 105

78153 Le Chesnay Cedex - France

[email protected]

Abstract

We propose a greedy heuristic to solve the real-time scheduling problem of periodic preemptive tasks on a multiprocessor architecture while taking into account the exact preemption cost. In the framework of partitioned scheduling, this is achieved by combining an allocation heuristic whose cost function minimizes the makespan, and a schedulability condition based on the ⊕ operation which takes into account the exact preemption cost.

1. Introduction

For computation power and modularity issues, multiprocessor architectures are necessary to tackle complex applications found in domains such as avionics, automotive systems, mobile robotics, etc. Some of these applications are safety critical, leading to hard real-time task systems whose constraints must necessarily be satisfied in order to avoid catastrophic consequences. Although preemptive real-time scheduling allows a better success ratio than non-preemptive real-time scheduling, preemption has a cost. That cost is usually approximated in the WCET (Worst Case Execution Time), as assumed, explicitly, by Liu and Layland in their pioneering article [12]. However, such an approximation is dangerous in a safety-critical context, since an application may miss some deadlines during its real-time execution even though the schedulability conditions were satisfied. In order to tackle this problem, A. Burns et al. in [3] presented an analysis that enables the global cost due to preemptions to be factored into the standard equations for calculating the worst-case response time of any task, but they achieved that by considering the maximum number of preemptions. Other works aim at bounding the number of preemptions [8, 7]. In both cases the exact number of preemptions is not considered, leading to a waste of resources in time and memory. However, that exact number of preemptions is difficult to determine, since it may vary with every instance of a task, whereas it is not difficult to determine the constant cost of every preemption, which includes the context switch necessary to perform the preemption and the choice of the task with the highest priority. This is the reason why it is necessary to consider the exact preemption cost. In this paper we address all the previous issues together.

The remainder of the paper is organized as follows: Section 2 presents the related work on multiprocessor real-time scheduling and preemption cost; in Section 3 we describe the model and the schedulability analysis used; Section 4 presents the proposed heuristic. Finally, Section 5 concludes and gives some directions for future work.

2. Related work

2.1. State of the art in multiprocessor scheduling

The scheduling of real-time tasks on multiprocessor architectures can be achieved according to three main approaches: partitioned scheduling, global scheduling, and semi-partitioned scheduling.

In the partitioned scheduling approach [5, 17] the set of tasks is divided into a number of disjoint subsets, no greater than the number of processors in the multiprocessor architecture, and each of these subsets is allocated to one processor. All the instances (or jobs) of a task are executed on the same processor and no migration is permitted. In this approach it is necessary to choose a scheduling algorithm for every processor, possibly the same algorithm, and also an allocation algorithm. On the other hand, the allocation problem has been proven NP-hard [9]. This complexity is the main drawback of the partitioned scheduling approach.

There exist two classes of methods to solve allocation problems, and more generally NP-hard problems: the exact methods [15, 16], which examine all possible solutions and give the optimal solution (the best solution according to given criteria) but have a very large execution time, and the approximate methods [16], which give solutions very quickly compared to the exact methods, but these solutions are only near-optimal. Among the approximate methods we distinguish heuristics and metaheuristics [16, 11]. Metaheuristics are methods inspired by domains such as biology, chemistry, artificial intelligence, etc. They give near-optimal solutions but have a larger execution time than heuristics. The heuristic methods are inspired by the considered domain, here real-time scheduling, but the solutions produced are generally less close to the optimum than those obtained with metaheuristics. Since the allocation problem is NP-hard, heuristics are considered to be the best suited solutions when the execution time is crucial, as in the rapid prototyping phase of the design process. Dhall and Liu were the first to propose, in [6], two preemptive scheduling algorithms, RM-FF (Rate Monotonic First Fit) and RM-NF (Rate Monotonic Next Fit), to solve the fixed-priority multiprocessor real-time scheduling problem. In the proposed algorithms, the uniprocessor RM algorithm [12] is used to verify whether a task is schedulable on a processor, and the first-fit and next-fit bin-packing heuristics [4], respectively, are used to achieve the allocation to the different processors. In both allocation heuristics the tasks are sorted in decreasing order of their periods before the allocation starts. RM-NF tries to allocate a task to a processor, called the current processor, until the RM schedulability condition is violated. In this case the current processor is marked "full" and a new processor is selected. RM-FF tries first to allocate a task to the marked processors before allocating it to a new processor. In [11] a greedy heuristic is proposed to solve the problem of allocating tasks on a multiprocessor architecture while reducing the makespan, but the scheduling algorithm is non-preemptive.

In the global scheduling approach [5, 17] a unique scheduling algorithm is applied globally for every processor of the multiprocessor architecture. All the ready tasks are kept in a unique queue shared by all the processors. In this queue the m tasks with the highest priorities are selected to be executed on the m available processors. Besides preemptions, task migrations are permitted. The advantage of the global scheduling approach is that it allows a better use of the processors. Its main drawback is that each migration has a prohibitive cost.

In the semi-partitioned scheduling approach [10, 2], derived from the partitioned scheduling approach, each task is allocated to a specific processor as long as the total utilization of the processor does not exceed its schedulable bound. In this approach some tasks can be split for execution among multiple processors. During run-time scheduling, a split task is permitted to migrate among its allocated processors, while the other tasks are executed on specific processors without any migration. The semi-partitioned scheduling approach reduces the number of migrations. However, the remaining migrations still have a prohibitive cost.

2.2. Our choices

The migration cost in the global and semi-partitioned scheduling approaches leads us to choose partitioned scheduling. Moreover, since partitioned scheduling transforms the multiprocessor scheduling problem into several uniprocessor scheduling problems, we can take advantage of the numerous research results obtained for the uniprocessor scheduling problem. Because we aim at rapid prototyping, we propose an allocation heuristic rather than a metaheuristic or an exact method, together with a schedulability test to verify whether a task is schedulable on a specific processor. Bin-packing heuristics try to reduce the number of processors, which involves an increase of the makespan, i.e. the global response time of the tasks on all the processors. On the other hand, the multiprocessor architectures used in the industrial applications we are interested in have a fixed number of processors. This number of processors may be minimized later on, but this is not the primary goal. That is the reason why we propose a greedy heuristic similar to the one given in [11]. This heuristic allocates the tasks to the processors and, in addition, minimizes the makespan. This latter optimization is important when feedback control is intended, as in avionics, automotive and mobile robotics applications.

Although preemptive scheduling algorithms are able to successfully schedule some task systems that cannot be scheduled by non-preemptive scheduling algorithms, preemption has a cost. Indeed, Liu and Layland in [12] assume that the preemption cost is approximated in the WCET. Thus, there are two possible cases: either the approximation in time and memory space is high and leads to wasted resources, or the approximation is low and then a task system declared schedulable by, say, RM may miss some deadlines during its real-time execution. Consequently, we propose to use the ⊕ operation [13, 14]. It is an algebraic operation that verifies whether two tasks are schedulable or not, taking into account the exact preemption cost.

3. Model and Schedulability analysis

Let Γn = {τ1, τ2, ..., τn} be a system of n periodic real-time tasks where τi = (r1i, Ci, Di, Ti) and Ci ≤ Di ≤ Ti. Following the typical characteristics of a periodic task, r1i is the date of first activation, Ci is the WCET without any approximation of the preemption cost [14], Di is the relative deadline, and Ti is the period of τi. We want to schedule the system of tasks Γn on m identical processors.1

We use the scheduling operation ⊕ [14] to verify whether a task is schedulable or not on a processor. This operation, applied to a pair of tasks (x, y) such that x has the highest priority, gives as result a task r, that is r = x ⊕ y. As mentioned before, ⊕ takes into account the exact preemption cost suffered by task y. Here, briefly, is the principle of this operation, which is explained in detail in [14]. It consists in replacing the available time units of the highest priority task x with the time units of the lowest priority task y. In order to do that, both tasks are initially referenced to the same time origin. Then, task x is rewritten according to the number of instances of task y in the LCM (Least Common Multiple) of both task periods. This latter operation allows the identification of the available time units in task x, but also the verification that task y does not miss its deadline for any instance. Since task y can be preempted by task x, the exact number of preemptions is counted for each instance of y. For each instance of y, using a fixed-point algorithm, the preemption cost is added to the WCET, without any approximation, in order to obtain the PET (Preemption Execution Time). If the amount of PET time units fits in the available time units of task x, then task y is schedulable, giving as result the task r; otherwise it is not schedulable. Since ⊕ is an internal operation, i.e. the result given by ⊕ is also a task, that result may in turn be used as the highest priority task in another ⊕ operation. Thanks to this property it is possible to consider more than two tasks.

1 All the processors have the same computation power.
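The role of the exact preemption count can be illustrated with a small simulation sketch (not the ⊕ algebra of [14]): replay the fixed-priority schedule time unit by time unit over the hyperperiod, count every preemption, and charge each one a hypothetical constant cost alpha:

```python
from math import gcd

def preemption_count(tasks, alpha):
    """Simulation sketch: fixed-priority preemptive scheduling, one time
    unit at a time, over the hyperperiod; every preemption adds the
    constant cost alpha to the preempted job's remaining work.
    tasks: list of (r1, C, D, T) in decreasing priority order (r1 = 0)."""
    horizon = 1
    for (_, _, _, T) in tasks:
        horizon = horizon * T // gcd(horizon, T)
    n = len(tasks)
    remaining = [0] * n            # remaining work of the current job
    deadline = [0] * n
    running_prev = None
    preemptions = 0
    for t in range(horizon):
        for i, (r1, C, D, T) in enumerate(tasks):
            if t >= r1 and (t - r1) % T == 0:
                if remaining[i] > 0:
                    return None    # previous job still pending: miss
                remaining[i], deadline[i] = C, t + D
        ready = [i for i in range(n) if remaining[i] > 0]
        running = min(ready) if ready else None  # highest priority first
        if (running is not None and running_prev is not None
                and running != running_prev and remaining[running_prev] > 0):
            preemptions += 1               # exact preemption: add its cost
            remaining[running_prev] += alpha
        if running is not None:
            remaining[running] -= 1
            if remaining[running] == 0 and deadline[running] < t + 1:
                return None        # job completed after its deadline
        running_prev = running
    if any(r > 0 for r in remaining):
        return None                # pending work at the hyperperiod end
    return preemptions
```

With alpha = 0 the two-task example in the test below is schedulable with exactly one preemption per hyperperiod; raising alpha shows at which preemption cost the same set stops being schedulable, which a WCET-level approximation would not reveal.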

4. Heuristic

The heuristic presented in Algorithm 1 is a greedy heuristic. The solution is built step by step: in each step a decision is taken, and this decision is never questioned during the following steps (no backtracking). The effectiveness of such a greedy heuristic relies on the choice of the decision used to build a new element of the solution. In our case the decision is taken according to a cost function which aims at minimizing the makespan.

4.1. Cost function for allocation

The cost function allows the selection of the best processor pj to schedule a task τi. In our case this cost function is defined, for a processor pj and a task τi, as the response time on pj after the scheduling of τi, taking into account the exact preemption cost. The processor which minimizes this cost function for τi among all the processors is considered to be the best processor to schedule the task τi. The minimization of the response time on the set of processors has the advantage of reducing the makespan.

4.2. Principle of our allocation heuristic

We use a "list heuristic" [1] whose order drives the allocation of the tasks to the different processors. In our case, we initialize this list, called the "set of candidate tasks", with the set of tasks in decreasing order of their priorities. At each step of the heuristic, the task with the highest priority is selected from the set of candidate tasks, and we attempt to allocate it to its best processor according to the cost function presented previously. Of course, the task is actually scheduled on its best processor. Then this task is removed from the set of candidate tasks.

We use the scheduling operation ⊕ to verify whether a task is schedulable or not on a processor, taking into account the exact preemption cost. The date of first activation is crucial for the schedulability of a task τi: a non-schedulable task can become schedulable if its date of first activation is modified [14]. In the proposed heuristic we exploit this property to improve the success ratio of the heuristic, as follows. When we attempt to allocate the task τi to a processor pj, if that task is not schedulable with its initial date of first activation r1i, then before attempting to allocate it to another processor we delay its date of first activation by incrementing its value by one time unit (r1i = r1i + 1), and this until the task τi becomes schedulable or r1i exceeds the value of the makespan computed in the previous step. In the latter case the task τi is not schedulable on the processor pj. This principle allows a set of tasks to be schedulable when they were not with their initial dates of first activation. However, this iterative search for a date of first activation which makes the task schedulable increases the complexity of the heuristic.
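Stripped of the r1-shifting refinement, the greedy loop can be sketched as follows (the hypothetical callbacks `schedulable` and `response_time` stand in for the ⊕-based schedulability test and the cost function):

```python
def greedy_allocate(tasks, m, schedulable, response_time):
    """Greedy partitioning: tasks are taken in decreasing priority
    order; each task goes to the processor that minimizes the resulting
    response time (the cost function), among those where it fits."""
    allocation = [[] for _ in range(m)]
    for task in tasks:                       # decreasing priority order
        best, best_cost = None, None
        for j in range(m):
            trial = allocation[j] + [task]
            if schedulable(trial):           # stand-in for the ⊕ test
                cost = response_time(trial)  # cost function
                if best is None or cost < best_cost:
                    best, best_cost = j, cost
        if best is None:
            return None                      # declared unschedulable
        allocation[best].append(task)        # decision never questioned
    return allocation
```

With a utilization-based stand-in for the schedulability test, the heuristic spreads the load so as to keep the largest per-processor cost low, which is the makespan-minimizing behavior described above.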

5. Conclusion and future work

In this paper we have presented a greedy heuristic which allocates and schedules a set of real-time tasks on a multiprocessor architecture while reducing the makespan. In addition, this heuristic takes into account the exact preemption cost, which must be carefully considered in the safety-critical applications we are interested in. In future work we plan to study the complexity of the proposed heuristic, as well as its performance, by comparing it with an exact algorithm. In addition, we plan to study the case of dependent tasks.

References

[1] L. Adams, K. M. Chandy, and J. R. Dickson. A com-

parison of list schedules for parallel processing systems.

Commun. ACM, 17:685–690, December 1974.

[2] J. H. Anderson, V. Bud, and C. U. Devi. An edf-based

scheduling algorithm for multiprocessor soft real-time sys-

tems. In Proceedings of the 17th Euromicro Conference

on Real-Time Systems, pages 199–208, Washington, DC,

USA, 2005. IEEE Computer Society.

[3] A. Burns, K. Tindell, and A. Wellings. Effective analysis

for engineering real-time fixed priority schedulers. IEEE

Trans. Softw. Eng., 21:475–480, May 1995.

[4] E. Coffman, G. Galambos, S. Martello, and D. Vigo. Bin

packing approximation algorithms: Combinatorial analy-

sis. Handbook of combinatorial optimization, 1998.

[5] R. I. Davis and A. Burns. A survey of hard real-time

scheduling algorithms and schedulability analysis tech-

niques for multiprocessor systems. Technical Report YCS-

3

JRWRTC 2011 27

Algorithm 1 Greedy heuristic

1: Initialize the candidate tasks W with the set of tasks

in the decreasing order of their priorities, initialize the

boolean variable SystemTasksSchedulable to true2: while W is not empty and

SystemTasksSchedulable= true do

3: Select in W the highest priority task τi4: % We verify on each processor pj if task τi is

schedulable.

5: for j=1 to m (the number of processors) do

6: if with its initial date of activation r1i , the task τiis schedulable on pj with the exact preemption

cost (scheduling operation ⊕ [14]) then

7: Compute the cost function of task τi on the

processor pj , i.e. the response time of τi on pj8: else

9: while τi is not schedulable on pj and r1i do

not exceed the value of the makespan of the

previous step do

10: r1i = r1i + 111: end while

12: if τi is schedulable on pj then

13: Compute the cost function of τi on pj with

the new date of first activation of τi14: else

15: τi is not schedulable on pj with the exact

preemption cost

16: end if

17: end if

18: end for

19: % Now, using again the cost function, we choose

the best processor for τi among all the processors

on which τi is schedulable.

20: if τi is schedulable on one or several processors

then

21: Schedule the task τi on the processor which min-

imizes the cost function

22: Remove the task τi from W .

23: SystemTasksSchedulable= true24: else

25: SystemTasksSchedulable= false26: end if

27: end while

2009-443, University of York, Department of Computer

Science, 2009.

[6] S. K. Dhall and C. L. Liu. On a real-time scheduling problem. Operations Research, vol. 26(1), 1978.

[7] A. Easwaran, I. Shin, I. Lee, and O. Sokolsky. Bounding preemptions under EDF and RM schedulers. Technical Report MS-CIS-06-07, University of Pennsylvania, Department of Computer and Information Science.

[8] J. Echague, I. Ripoll, and A. Crespo. Hard real-time preemptively scheduling with high context switch cost. In Proceedings of the 7th Euromicro Workshop on Real-Time Systems, Los Alamitos, CA, USA, 1995. IEEE Computer Society.

[9] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.

[10] S. Kato and N. Yamasaki. Semi-partitioning technique for multiprocessor real-time scheduling. In Proceedings of the WIP Session of the 29th Real-Time Systems Symposium (RTSS), IEEE Computer Society, 2008.

[11] O. Kermia and Y. Sorel. A rapid heuristic for scheduling non-preemptive dependent periodic tasks onto multiprocessor. In Proceedings of the ISCA 20th International Conference on Parallel and Distributed Computing Systems, PDCS'07, Las Vegas, Nevada, USA, Sep. 2007.

[12] C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. JACM, vol. 20(1), Jan. 1973.

[13] P. Meumeu-Yomsi and Y. Sorel. Extending rate monotonic analysis with exact cost of preemptions for hard real-time systems. In Proceedings of the 19th Euromicro Conference on Real-Time Systems, ECRTS'07, Pisa, Italy, July 2007.

[14] P. Meumeu-Yomsi and Y. Sorel. An algebraic approach for fixed-priority scheduling of hard real-time systems with exact preemption cost. Research Report RR-7702, INRIA, Aug. 2011.

[15] J. E. Mitchell. Branch-and-cut algorithms for combinatorial optimization problems. pages 65–67, 2002.

[16] E.-G. Talbi. Metaheuristics. Wiley, 2009.

[17] O. U. P. Zapata and P. M. Alvarez. EDF and RM multiprocessor scheduling algorithms: Survey and performance evaluation. http://delta.cs.cinvestav.mx/ pmejia-multitechreport.pdf, Oct. 2005.


Performance Analysis for Segment Stretch Transformation of Parallel Real-time Tasks

Manar Qamhieh, Frederic Fauberteau, Serge Midonnet
Universite Paris-Est
{manar.qamhieh,frederic.fauberteau,serge.midonnet}@univ-paris-est.fr

Abstract

The Segment Stretch Transformation (SST) is an algorithm that transforms parallel Fork-Join (FJ) tasks into sequential tasks on multiprocessor systems when possible. SST is based on the Task Stretch Transformation (TST), a transformation for the same task model, but TST uses segment migrations while SST eliminates them. In this paper, we prove that both transformations have the same performance by providing a detailed analysis based on the Demand Bound Function (DBF) and by showing that SST has a resource augmentation bound of 3.42, the same as TST: if a taskset is feasible on m unit-speed processors, then it is schedulable using the transformation on m processors that are 3.42 times faster.

1. Introduction

Parallelism has long been used in general-purpose systems, notably to cope with the tendency of chip manufacturers to build multiprocessor systems. In real-time systems, however, integrating parallelism is more complicated than handling ordinary sequential tasks, even when those are executed on multiprocessor systems. Parallelism in real-time systems can be defined as the execution of the same task at the same time on multiple processors while respecting timing constraints such as a period or a deadline.

Parallelism has many models and theories that are applied in actual programming languages. One example is the MapReduce model, designed by Google to speed up the processing of massive data sets on multiple processors. Another is the fork-join (FJ) model, a common parallel computing model which is the basis of the OpenMP parallel programming library for C.

2 Fork-Join Model

A parallel real-time task of the fork-join model is a task in which certain parts are executed simultaneously on multiple processors. As shown in Figure 1, an FJ task consists of both sequential and parallel segments. The task always starts with a sequential segment executed on one processor; it then forks into a specific number of parallel segments, which later join back into a sequential segment, and so on. The number of segments is defined by the model, as well as the number of parallel processors, which means that all the parallel regions in the task share the same number of processors.

An implicit-deadline FJ task is described as follows: $\tau_i = ((C_i^1, P_i^2, C_i^3, \ldots, P_i^{s_i-1}, C_i^{s_i}), m_i, T_i)$ where:

- $s_i$ is the total number of segments (sequential and parallel), and is an odd number;
- $m_i$ is the number of parallel threads on which parallel segments will be executed ($m_i > 1$ for parallel segments, and equal to 1 for sequential segments);
- $C_i^k$ is the Worst Case Execution Time (WCET) of the k-th sequential segment, where k is an odd number;
- $P_i^k$ is the WCET of the parallel segments in the k-th parallel region, where k is an even number, with $P_i^{k,1} = P_i^{k,2} = \ldots = P_i^{k,m_i}$;
- $T_i$ is the period of the task ($D_i = T_i$).

Figure 1. Task of the FJ model.

Definition 2.1 (Master string) The master string of a parallel FJ task is the collection of segments that executes within the master thread of the task, starting with the first sequential segment, followed by the intermediate parallel and sequential segments, and ending with the last sequential segment. In numerical notation, the master string can be represented as:

$\tau_i^1, \tau_i^{2,1}, \tau_i^3, \ldots, \tau_i^{(s_i-1),1}, \tau_i^{s_i}$

Definition 2.2 (Parallel execution length) The parallel execution length $P_i$ is the sum of the WCETs of the parallel segments in the master string of $\tau_i$:

$P_i = \sum_{k=1}^{(s_i-1)/2} P_i^{2k,1}$    (1)

Definition 2.3 (Minimum execution length) The minimum execution length $\eta_i$ represents the minimum time an FJ task $\tau_i$ needs to execute when all parallel segments are executed in parallel. It is equal to the sum of the WCETs of all segments in the master string of task $\tau_i$:

$\eta_i = \left( \sum_{k=0}^{(s_i-1)/2} C_i^{2k+1} \right) + P_i$    (2)

Definition 2.4 (Maximum execution length) The maximum execution length $C_i$ is the sum of the WCETs of all sequential and parallel segments in task $\tau_i$:

$C_i = \left( \sum_{k=0}^{(s_i-1)/2} C_i^{2k+1} \right) + m_i \cdot P_i$    (3)

Definition 2.5 (Slack time) The slack time $L_i$ is the temporal difference between the deadline and the minimum execution length:

$L_i = D_i - \eta_i$    (4)

Definition 2.6 (Capacity) The capacity $f_i$ is the capacity of the master string to execute parallel segments from all parallel regions within itself without missing its deadline:

$f_i = L_i / P_i$    (5)
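Definitions 2.2–2.6 can be collected into a small helper; the function and variable names below are our own, chosen for illustration:

```python
def fj_metrics(C, P, m, T):
    """Characteristic lengths of an implicit-deadline fork-join task.

    C: WCETs of the sequential segments C^1, C^3, ..., C^{s_i};
    P: WCETs P^{2k,1} of one thread per parallel region;
    m: number of parallel threads per region; T: period (= deadline)."""
    Pi = sum(P)              # parallel execution length (Eq. 1)
    eta = sum(C) + Pi        # minimum execution length (Eq. 2)
    Ci = sum(C) + m * Pi     # maximum execution length (Eq. 3)
    L = T - eta              # slack time (Eq. 4)
    f = L / Pi               # capacity of the master string (Eq. 5)
    return Pi, eta, Ci, L, f
```

For instance, a task with sequential segments of WCETs 2, 1, 2, two parallel regions of per-thread WCETs 3 and 2, m = 4 threads per region and period 20 has η = 10, C = 25 and capacity f = 2.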

3 Related work

3.1 Task Stretch Transformation

According to Lakshmanan et al. [4], parallel real-time tasks of the FJ model on multiprocessors can have a schedulable utilization bound slightly greater than, and arbitrarily close to, the uniprocessor schedulable utilization bound. They therefore proposed an algorithm called TST.

The main objective of TST is to convert a parallel FJ task into a sequential one when possible, by creating a fully stretched master string whose execution time is equal to its deadline. Part of the parallel segments execute within the master string and the remaining ones are scheduled by a partitioned scheduling algorithm called FBB-FFD [3].

TST is proved to have a resource augmentation bound of 3.42, which means that any taskset that is feasible on m unit-speed processors is schedulable using TST on m processors that are 3.42 times faster.

3.2 Segment Stretch Transformation

Segment migrations are used heavily in TST, which limits its implementation to a special Linux kernel called Linux/RK that supports semi-partitioning. We therefore proposed earlier a modification of the algorithm, called SST, for the same FJ model of parallelism [2].

SST also tries to convert parallel tasks into sequential ones by creating a master string, which in this case may be either fully stretched or not, in order to provide a partitioned scheduling for the parallel FJ tasks while eliminating segment migration. As a result, the new transformation algorithm can be implemented directly on an RT Linux kernel with no specific patches.

For a parallel implicit-deadline task τ_i of the FJ model, with maximum execution length C_i and deadline D_i, SST can be divided into 3 cases:

1. C_i ≤ D_i: the task is fully converted from a parallel task into a sequential one;

2. C_i > D_i and the master string is fully stretched: the slack L_i of the master string is completely filled by parallel segments and its execution time equals the deadline;

3. C_i > D_i and the master string is not fully stretched: some of the slack time remains unfilled. This case cannot occur in the TST algorithm.

In this paper we prove that SST has the same resource augmentation bound as TST, by providing an analysis inspired by the analysis performed on the TST algorithm in [4].

4 Analysis

4.1 Demand Bound Function (DBF)

Definition 4.1 (DBF [1]) The DBF is defined as the largest cumulative execution requirement of all jobs that can be generated by τ_i to have both their arrival times and their deadlines within a contiguous interval of length t:

$DBF(\tau_i, t) = \max\left(0, \left(\left\lfloor \frac{t - D_i}{T_i} \right\rfloor + 1\right) C_i\right)$    (6)
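Equation (6) translates directly into code (a sketch with our own naming):

```python
from math import floor

def dbf(C, D, T, t):
    """Demand bound function of a task with WCET C, deadline D and
    period T over a contiguous interval of length t (Equation (6))."""
    return max(0, (floor((t - D) / T) + 1) * C)
```

For an implicit-deadline task with C = 4 and D = T = 8, the demand is 0 for any interval shorter than the first deadline and grows by C with every further period.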

Theorem 4.1 (DBF) The DBF of a task $\tau_i^{stretched}$ stretched using SST satisfies:

$DBF(\tau_i^{stretched}, t) \le \frac{C_i}{T_i - \eta_i}\, t$

where $0 \le \eta_i \le T_i$.

In order to generalize the DBF for the stretched task, the three cases of SST (Section 3.2) have to be analyzed.

1. In the first case, the parallel task is transformed entirely into a sequential one, the master string, with $D_i^{master} = T_i^{master}$ and $C_i^{master} = C_i$ (Equation (3)). The DBF is calculated as follows:

$DBF(\tau_i^{stretched}, t) = DBF(\tau_i^{master}, t)$

$DBF(\tau_i^{stretched}, t) = \max\left(0, \left(\left\lfloor \frac{t - D_i}{T_i} \right\rfloor + 1\right) C_i^{master}\right)$

$DBF(\tau_i^{stretched}, t) = \max\left(0, \left\lfloor \frac{t}{T_i} \right\rfloor C_i\right) \le \frac{C_i}{T_i}\, t$

$DBF(\tau_i^{stretched}, t) \le \frac{C_i}{T_i - \eta_i}\, t$

where $0 \le \eta_i \le T_i$.


2. The second case is when the master string is fully stretched but there exist parallel constrained-deadline segments that are not part of the master string and will be scheduled using FBB-FFD:

$\tau^{stretched} = \tau^{master} + \{\tau^{cd}\}$

The DBF is then:

$DBF(\tau_i^{stretched}, t) \le DBF(\tau_i^{master}, t) + DBF(\{\tau_i^{cd}\}, t)$

The DBF of the master string can be calculated knowing that the master string is fully stretched, so $C_i^{master} = D_i^{master}$:

$DBF(\tau_i^{master}, t) \le \frac{C_i^{master}}{D_i^{master}}\, t$

$DBF(\tau_i^{master}, t) \le t$

The group of segments $\{\tau_i^{cd}\}$ consists of segments from all the parallel regions in $\tau_i$, and only one parallel region is activated at a time instant t. The maximum number of parallel segments in each region is $(q_i - 1)$, where $q_i = m_i - \lfloor f_i \rfloor$. Figure 2 shows an example of a stretched task of the second case, with constrained-deadline parallel segments from 3 different parallel regions.

Therefore, the DBF can be calculated as follows:

$DBF(\{\tau_i^{cd}\}, t) \le \delta_i^{max} (q_i - 1)\, t$    (7)

The density of a constrained-deadline task $\tau_i$ is given by:

$\delta_i = C_i / D_i$

As also shown in Figure 2, the SST algorithm starts by filling the slack of the master string with $\lfloor f_i \rfloor$ parallel segments from each parallel region, and then adds further parallel segments if their WCET fits in the remaining slack. For the k-th parallel region, all segments of $\tau^{cd}$ have the same WCET $P_i^k$ and deadline $D_i^k$, where $D_i^k = (1 + n \lfloor f_i \rfloor) P_i^k$ and $1 \le n < m_i$, according to the number of parallel segments from that region which execute within the master string.

The maximum density of the parallel constrained-deadline tasks $\tau^{cd}$ can be calculated as follows:

$\delta^{max} = \max_{k=1}^{(s_i-1)/2} \frac{P_i^{2k}}{(1 + n \lfloor f_i \rfloor) P_i^{2k}}$

$\delta^{max} = \max_{k=1}^{(s_i-1)/2} \frac{1}{1 + n \lfloor f_i \rfloor}$

$\delta^{max} = \frac{1}{1 + \lfloor f_i \rfloor}$

Since there exists at least one parallel region where n = 1, this region is the one with the highest density.

Figure 2. Stretched task (example of the second case: the master string plus the constrained-deadline parallel segments $\{\tau^{cd}\}$ left outside it, from three parallel regions).

Using this result in Equation (7), the DBF becomes:

$DBF(\{\tau_i^{cd}\}, t) \le \frac{1}{1 + \lfloor f_i \rfloor} (q_i - 1)\, t$

In order to eliminate $\lfloor f_i \rfloor$ from our calculations, we use the following approximation:

$\delta_i^{max} = \frac{1}{1 + \lfloor f_i \rfloor}$, and since $f_i - \lfloor f_i \rfloor < 1 \implies 1 + \lfloor f_i \rfloor > f_i$, we get $\delta_i^{max} < \frac{1}{f_i}$

which leads, with $f_i = (T_i - \eta_i)/P_i$, to:

$DBF(\{\tau_i^{cd}\}, t) \le \frac{1}{f_i} (q_i - 1)\, t = \frac{(q_i - 1) P_i}{T_i - \eta_i}\, t$    (8)

The DBF of the whole stretched task is the sum of the contributions of the master string and of the group of constrained-deadline segments:

$DBF(\tau_i^{stretched}, t) \le DBF(\tau_i^{master}, t) + DBF(\{\tau_i^{cd}\}, t)$

$DBF(\tau_i^{stretched}, t) \le t + \frac{(q_i - 1) P_i}{T_i - \eta_i}\, t$

$DBF(\tau_i^{stretched}, t) \le \frac{T_i - \eta_i + (m_i - \lfloor f_i \rfloor - 1) P_i}{T_i - \eta_i}\, t$

$DBF(\tau_i^{stretched}, t) \le \frac{m_i P_i}{T_i - \eta_i}\, t \le \frac{C_i}{T_i - \eta_i}\, t$

3. In the third case, the stretched task $\tau_i^{stretched}$ consists of a collection of constrained-deadline tasks, including the master string, which is not fully stretched.

The execution time of the master string in this case satisfies:

$\eta_i + P_i \lfloor f_i \rfloor \le C_i^{master} < T_i$

$C_i^{master} < T_i \implies \frac{C_i^{master}}{T_i} < 1$

$DBF(\tau_i^{master}, t) = \max\left(0, \left\lfloor \frac{t}{T_i} \right\rfloor C_i^{master}\right)$

$DBF(\tau_i^{master}, t) \le \frac{C_i^{master}}{T_i}\, t \le t$

For the group of constrained-deadline segments $\tau^{cd}$, the situation is the same as in the second case, so we can reuse the previously calculated DBF (Equation (8)):

$DBF(\tau_i^{stretched}, t) \le DBF(\tau_i^{master}, t) + DBF(\{\tau_i^{cd}\}, t)$

$DBF(\tau_i^{stretched}, t) \le t + \frac{(q_i - 1) P_i}{T_i - \eta_i}\, t$

$DBF(\tau_i^{stretched}, t) \le \frac{C_i}{T_i - \eta_i}\, t$

To sum up, in all three cases the task $\tau^{stretched}$ produced by SST satisfies the same DBF bound.

4.2 Resource Augmentation Bound

Lakshmanan et al. analyzed the resource augmentation bound of their partitioned scheduling algorithm FJ-DMS (Fork-Join Deadline-Monotonic Scheduling) [4]. The partitioning of the transformed set of tasks is carried out by the FBB-FFD scheduling algorithm proposed by Fisher et al. [3]. The schedulability test for FBB-FFD is given by:

$m \ge \frac{\delta_{sum} + u_{sum} - \delta_{max}}{1 - \delta_{max}}$    (9)

For a processor that is v times faster, the following holds:

$\forall 1 \le i \le n,\ \eta_i^v \le \frac{T_i}{v} \implies (T_i - \eta_i^v) \ge T_i \left(1 - \frac{1}{v}\right)$

$C_i^v = \frac{C_i}{v}, \quad u_{sum}^v = \frac{u_{sum}}{v}, \quad \delta_{max}^v = \frac{\delta_{max}}{v}$

The feasibility of the taskset on m unit-speed processors implies:

$\sum_{i=1}^{n} \frac{C_i}{T_i} \le m$

$\delta_{sum}^v = \max_{t > 0} \left( \frac{\sum_{i=1}^{n} DBF(\tau_i^{stretched}, t)}{t} \right)$

$\delta_{sum}^v \le \sum_{i=1}^{n} \frac{C_i^v}{T_i - \eta_i^v}$

In order to simplify the equation, the following can be applied:

$u_{sum} = \sum_{i=1}^{n} \frac{C_i}{T_i}$

$\delta_{sum}^v \le \sum_{i=1}^{n} \frac{C_i^v}{T_i - \eta_i^v} \le \sum_{i=1}^{n} \frac{C_i^v}{T_i (1 - \frac{1}{v})} \le \frac{1}{v - 1}\, u_{sum}$

By substituting the previous equations into Equation (9), a taskset is schedulable if:

$m \ge \frac{\frac{m}{v-1} + \frac{m}{v} - \frac{\delta_{max}}{v}}{1 - \frac{\delta_{max}}{v}}$

This is an increasing function of $\delta_{max}$ for $m \ge \frac{v}{2}$.

The density of any parallel thread with a constrained deadline in SST satisfies:

$\forall f_i \ge 0: \delta_i^{max} = \frac{1}{1 + \lfloor f_i \rfloor}$, and since $0 \le \lfloor f_i \rfloor \implies 1 \le 1 + \lfloor f_i \rfloor$, we get $\delta_i^{max} \le \frac{1}{1 + \lfloor f_i \rfloor} \le 1$.

Using this in Equation (9), and when $m \ge \frac{v}{2}$, schedulability is ensured if:

$m \ge \frac{\frac{m}{v-1} + \frac{m}{v} - \frac{1}{v}}{1 - \frac{1}{v}}$

Applying the same calculations used on TST in [4], for all $m \ge \frac{v}{2}$, we obtain:

$v \ge (2 + \sqrt{2}) \approx 3.42$

This approximate resource augmentation bound for SST is the same as the one of TST: if a taskset is feasible on m unit-speed processors, then the transformation is guaranteed to schedule the same taskset on m processors that are 3.42 times faster.
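The value 2 + √2 can be checked numerically. Letting m grow large, the last schedulability condition reduces to 1 − 1/v ≥ 1/(v−1) + 1/v; this reduction, and the bisection below, are our own sanity check rather than part of the original analysis:

```python
def min_speedup(eps=1e-9):
    """Smallest v > 1 satisfying 1 - 1/v >= 1/(v - 1) + 1/v, the
    large-m limit of the schedulability condition. Clearing the
    denominators gives v^2 - 4v + 2 >= 0, whose positive root is
    2 + sqrt(2); we recover it by bisection."""
    ok = lambda v: (1 - 1 / v) >= 1 / (v - 1) + 1 / v
    lo, hi = 1.000001, 10.0   # ok() is monotone in v on this range
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if ok(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

The bisection converges to approximately 3.4142, matching the analytical bound.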

5 Conclusion

In a previous paper [2], we presented SST, an algorithm designed specifically for parallel real-time tasks of the FJ model, based on another transformation from the literature called TST. Both algorithms enhance the schedulability of parallel tasks, while the segment stretch approach has a practical implementation advantage gained by avoiding segment migration.

Our previous evaluation relied on extensive simulation to measure the performance of SST in comparison with TST and to observe the effects of the proposed modifications. In this paper, we instead provide a detailed analysis that calculates the demand bound function of SST and proves that it has the same approximate resource augmentation bound of 3.42 as TST. This analysis shows that SST does not only match the performance of TST, but also removes the cost of segment migration.

References

[1] S. K. Baruah, A. K.-L. Mok, and L. E. Rosier. Preemptively scheduling hard-real-time sporadic tasks on one processor. In Proc. of RTSS, pages 182–190, 1990.

[2] F. Fauberteau, S. Midonnet, and M. Qamhieh. Partitioned scheduling of parallel real-time tasks on multiprocessor systems. In Proc. of WiP ECRTS, 4 pp., 2011.

[3] N. W. Fisher, S. K. Baruah, and T. P. Baker. The partitioned scheduling of sporadic tasks according to static-priorities. In Proc. of ECRTS, pages 118–127, 2006.

[4] K. Lakshmanan, S. Kato, and R. Rajkumar. Scheduling parallel real-time tasks on multi-core processors. In Proc. of RTSS, pages 259–268, 2010.


Towards removing tail preemptions to save energy

Cristian Maxim, Liliana Cucu-Grosjean and Olivier Zendra
INRIA Nancy Grand Est
Villers-lès-Nancy, France
Email: [email protected]

Abstract

We propose an algorithm that improves energy consumption in real-time systems by combining Dynamic Voltage Scaling with a decrease in the number of preemptions. Our overall purpose is to focus on a specific part of the problem, namely selectively increasing the frequency to lower the number of preemptions of a task, so as to try to decrease the total energy consumption.

1 Introduction and related work

Many embedded real-time systems (ERTS) integrate battery-operated microprocessor systems with limited battery autonomy. Minimizing energy consumption is thus crucial. Significant research has been carried out on Dynamic Voltage and Frequency Scaling (DVFS), an effective technique to reduce energy consumption in ERTS. DVFS is available in a variety of modern processor technologies and allows switching between different frequency and voltage operating points at run time. Many DVFS algorithms have been proposed, supporting various task models for ERTS. For instance, Jejurikar and Gupta [1] used a DVFS method to determine the minimum optimal voltage at which a task set subject to both timing and minimum energy constraints must be executed.

Marinoni and Buttazzo [2] presented a novel DVFS management algorithm that integrates energy-awareness within elastic scheduling to cope with processors offering a limited number of operating modes.

Task preemption improves slack utilization and has been extensively used in DVFS schemes to minimize CPU energy consumption. However, task preemption increases the response time of the preempted task, which can result in an increased energy consumption of that task. This motivates us to study the reduction of energy consumption by decreasing the number of preemptions. Some authors investigated limited preemption models that can be used to bound the blocking due to non-preemptive regions. For instance, Yao et al. [4, 5] proposed such a preemption-point placement algorithm (called the InsertPP algorithm).

In this paper we combine the limited preemption method with DVFS. Our overall purpose is to focus on a specific part of the problem, namely selectively increasing the frequency to lower the number of preemptions of a task, so as to try to decrease the total energy consumption. We present a very first step in this direction, in which we focus on removing tail preemptions. Our method is iterative and works by trial and error, keeping a configuration only when it does improve energy usage; this guarantees that it can never increase global energy usage.

2 Problem statement and associated model

We deal with the preemptive fixed-priority scheduling of synchronous constrained periodic tasks on one processor. We consider $\tau = \{\tau_1, \tau_2, \cdots, \tau_n\}$, a set of n periodic tasks ordered according to their priority.

Each task τ_i is characterized by an exact inter-arrival time $T_i$, a relative deadline $D_i$, a worst-case execution time $C_i$, and $p_i$, the number of non-preemptive (elementary) chunks composing any job of τ_i. Thus any job of τ_i can be split by preemption into at most $p_i$ chunks.

We consider $D_i \le T_i, \forall i \le n$. Since the tasks have constrained deadlines, (0, H] is a feasibility interval, where H is the least common multiple of the task periods. We denote by $U_i = C_i / T_i$ the utilisation of τ_i and by $U_{total} = \sum_{i=1}^{n} U_i$ the total utilization of a task set τ.

We denote by $J_{i,j}$ the j-th job of τ_i, which is released at time instant $(j-1) T_i$ and must finish its execution by time instant $(j-1) T_i + D_i$.

For each job we consider that (initially) there are preemption points inserted according to the InsertPP algorithm presented in [5]. Therefore the k-th chunk of any task τ_i is denoted by $\delta_{i,k}$, $1 \le i \le n$, $1 \le k \le p_i$, and its worst-case execution time by $q_{i,k}$. For a task τ_i we denote by $q_i^{max}$ the largest $q_{i,k}$ among all $k \le p_i$.

We use a variable-frequency processor with different operating modes. Let $F_1, \ldots, F_s$ be the available frequencies, in increasing order. A slowdown factor η is defined as the normalized operating frequency, i.e., the ratio between the current frequency and the maximum frequency $F_s$ of the processor. We denote by $\eta_i$ the slowdown factor assigned to task τ_i; therefore $\exists l$ such that $\eta_i = F_l / F_s$. By default we consider that all chunks of any τ_i are executed with the slowdown factor $\eta_i$. Otherwise, we denote by $\eta_{i,k}$ the slowdown factor assigned to the k-th chunk of task τ_i.

We assume that the overhead incurred in changing the processor speed is incorporated in the task execution time. This overhead is constant and can, for instance, be incorporated in the worst-case execution time of a task.

For a preempted task τ_i, the associated total energy consumption is $E_{\tau_i} = \sum_{k=1}^{p_i} q_{i,k} \cdot w_{i,k}$, where $w_{i,k}$ is the power used by the processor when it executes the chunk $\delta_{i,k}$ at slowdown factor $\eta_{i,k}$, $\forall k \le p_i$. Preemption costs are included in the task costs.
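The energy model above is a plain weighted sum; a one-line sketch (names ours):

```python
def task_energy(q, w):
    """E_tau_i = sum over chunks of q_{i,k} * w_{i,k}: chunk execution
    times q paired with the power w drawn while each chunk executes."""
    return sum(qk * wk for qk, wk in zip(q, w))
```

For example, a task split into chunks of 4, 4, 4 and 3 time units, all run at a mode dissipating 50 mW, consumes 750 energy units.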

Problem statement. We are interested in solving the following problem:

$\min \{ p_i \mid E_{\tau_i} \le E_i^0 \text{ and } p_i \le p_i^0 \}, \forall 1 \le i \le n$    (1)

where $E_i^0$ is the energy consumption when τ_i is scheduled with the $p_i^0$ preemption points obtained by applying the InsertPP algorithm. We present the InsertPP algorithm as well as the associated feasibility analysis in Section 3.2.

3 Existing results that we use

3.1 Execution time model

In this section we introduce the execution time model proposed by Thekkilakattil et al. in [3]. This model correlates execution times and frequencies and can be used within methods applying the DVFS technique.

The execution time of a task is inversely proportional to the slowdown factor at which the task is run. We thus express the relation between any two execution times and the corresponding factors as:

$\frac{\eta^1}{\eta^2} = \frac{C_i^2}{C_i^1}$    (2)

where $\eta^k$ is the slowdown factor at which the task executes in $C_i^k$ time units. We prove Equation (2) from:

$C_i^1 = \frac{C_i^s}{F_1} \cdot F_s$    (3)

where $C_i^1$ is the execution time obtained at frequency $F_1$ and $C_i^s$ the execution time obtained at frequency $F_s$. This implies that:

$F_1 = \frac{C_i^s}{C_i^1} \cdot F_s$    (4)

By dividing Equation (4) by the similar equation for $F_2$, we obtain:

$\frac{F_1}{F_2} = \frac{C_i^2}{C_i^1}$    (5)

After dividing both terms of the left-hand fraction by $F_s$, we obtain Equation (2).

The same reasoning applies to the chunks:

$\frac{\eta^1}{\eta^2} = \frac{q_{i,k}^2}{q_{i,k}^1}$    (6)

where $\eta^l$ is the slowdown factor used by a chunk to execute in $q_{i,k}^l$ time units.

In the same way, we can substitute any of the pairs above with known execution times and corresponding frequencies. Given any three of the variables of Equation (6), we can obtain the fourth; this way, we can change execution times or find frequencies.

We use Equation (6) to obtain the frequencies needed to reduce the number of preemptions of certain tasks. This equation is derived from the model presented in [2].
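Equation (6) gives the new execution time of a chunk when its slowdown factor changes; a minimal sketch (function name ours):

```python
def scaled_time(q1, eta1, eta2):
    """Execution time at slowdown eta2 of a chunk that takes q1 time units
    at slowdown eta1 (Equation (6)): eta1/eta2 = q2/q1, so q2 = q1*eta1/eta2."""
    return q1 * eta1 / eta2
```

Raising a 7-time-unit workload from slowdown 30/60 to 40/60 shortens it to 5.25 time units, the value that reappears in the example of Section 5.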

3.2 Feasibility analysis under the limited preemption model

In this section we recall the existing feasibility analysis of fixed-priority (FP) scheduling under the limited preemption model. These results were introduced in [1] and [5]. We use the Fixed Preemption Points (FPP) approach to reduce the number of preemptions. According to this model, each task is divided into a number of non-preemptive chunks (aka. subjobs) by inserting predefined preemption points in the task code. If a higher priority task arrives between two preemption points of the running task, the preemption is deferred until the next preemption point.

For the feasibility analysis under FP, we use the request bound function $RBF_i(a)$ over a time duration a, defined as:

$RBF_i(a) = \left\lceil \frac{a}{T_i} \right\rceil C_i$

The largest blocking $B_i$ that a task τ_i might experience is given by the length of the largest non-preemptable chunk belonging to tasks with lower priority than τ_i:

$B_i = \max_{i < k \le n+1} q_k^{max}$    (7)

where $q_{n+1}^{max} = 0$ by definition.

We use the following two theorems (proved in [1]) that state schedulability conditions under limited preemptions for FP.

Theorem 1. [1] A task set τ is schedulable with limited-preemption FP if $\forall 1 \le i \le n$, $B_i \le \beta_i$, where $\beta_i$ is given by:

$\beta_i = \max_{a \in A \mid a \le D_i} \left\{ a - \sum_{j \le i} RBF_j(a) \right\}$    (8)

with $A = \{ k T_j, k \in \mathbb{N}, 1 \le j \le n \}$.

The following theorem presents a different schedulability condition, expressed in terms of a bound $Q_k$ on the longest non-preemptable region $q_k^{max}$ of each task τ_k.


Theorem 2. [1] A task set τ is schedulable with limited-preemption FP if $\forall k, 1 \le k \le n+1$:

$q_k^{max} \le Q_k$, where $Q_k = \min_{1 \le i < k} \{\beta_i\}$

and $\beta_i$ is given by Equation (8).

After applying the InsertPP algorithm (described below), each task τ_i is formed of $p_i$ chunks. The first $p_i - 1$ chunks have execution times equal to $q_i^{max}$ and the last chunk has execution time $r_i$, where $r_i \le q_i^{max}$:

$C_i = (p_i - 1) \cdot q_i^{max} + r_i$    (9)

Input: Task set τ
Output: The number of non-preempted chunks $p_i$ and the largest chunk $q_i^{max}$ for every τ_i.
Initialize:
for i = 1 to n do
  $q_i^{max} = C_i$
  $p_i = 1$
end for
$q_{n+1}^{max} = 0$, $Q_1 = \infty$
for i = 1 to n do
  Compute $\beta_i$ using Equation (8)
  $Q_{i+1} = \min\{Q_i, \beta_i\}$
  if ($q_{i+1}^{max} > Q_{i+1}$) then
    Place a PP in τ_{i+1} after every $Q_{i+1}$
    $p_{i+1} = \lceil C_{i+1} / Q_{i+1} \rceil$
    $q_{i+1}^{max} = Q_{i+1}$
  end if
end for

InsertPP Algorithm: places preemption points
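The InsertPP procedure above can be prototyped directly from Equation (8). The implementation below is our own sketch (it restricts the testing set A to points up to the largest deadline) and reproduces the chunking used in the example of Section 5:

```python
import math

def insert_pp(tasks):
    """tasks: list of (C, D, T) tuples ordered by decreasing priority.
    Returns (p, qmax): chunk count and largest chunk length per task."""
    n = len(tasks)
    C = [c for c, d, t in tasks]
    D = [d for c, d, t in tasks]
    T = [t for c, d, t in tasks]
    # testing set A = {k*T_j} restricted to points at or below the largest deadline
    A = sorted({k * T[j] for j in range(n)
                for k in range(1, max(D) // T[j] + 1)})

    def beta(i):
        # Equation (8): largest slack left by tasks of priority >= that of tau_i
        return max(a - sum(math.ceil(a / T[j]) * C[j] for j in range(i + 1))
                   for a in A if a <= D[i])

    p, qmax = [1] * n, C[:]   # initially every task is a single chunk
    Q = math.inf              # the highest priority task is never split
    for i in range(n - 1):
        Q = min(Q, beta(i))   # bound on the non-preemptive chunks of task i+1
        if qmax[i + 1] > Q:
            p[i + 1] = math.ceil(C[i + 1] / Q)   # a PP after every Q time units
            qmax[i + 1] = Q
    return p, qmax
```

On the example of Section 5 (τ_1 = (4, 8, 8), τ_2 = (15, 32, 32)) this yields p_2 = 4 and q_2^max = 4, i.e. three inserted preemption points.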

4 A first step towards reducing preemptions to save energy

In this section we propose a first (naive) solution based on the DVFS technique. For any task τ_i, after applying the InsertPP algorithm, we decrease the number of preemptions by searching both for the number of chunks that have to be executed at a higher frequency and for the associated frequency. Initially every chunk has the same slowdown factor as its task (η_i). After applying our algorithm, each chunk is assigned its own slowdown factor η_{i,k}, ∀1 ≤ k ≤ p_i. This allows us to eliminate the last preemption. More precisely, for a task τ_i the length r_i of the final chunk directly affects its response time. It is preferable that higher priority tasks arriving during the execution of the last chunk of J_{i,j} do not cause a preemption. One way to achieve this is to join the last chunk to the previous chunks and to forbid preemption. Another solution is to execute the last chunk within the execution time reserved for the previous chunk(s) by using a DVFS technique.

We force the execution of the last chunk right after the p_i − 1 preceding chunks by raising the frequency of one block. We propose an iterative algorithm, EliminateLP, that searches for the number of chunks that have to be executed at a higher frequency and for the needed frequency. We speed up only the chunks at the tail of the task, instead of uniformly speeding up every chunk, in order to reduce the frequency fluctuations as much as possible.

The EliminateLP algorithm uses two working variables: $\eta_i^{old} = \eta_i$, and mC, which gives the number of chunks that are modified (sped up) in order to eliminate the last chunk.

Input: Task τ_i
Output: Task τ_i with modified slowdown factors in order to eliminate the remaining chunk $r_i$
if ($p_i \ge 2$) then
  solved = false
  for (mC = 1; mC < $p_i$ and !solved; mC++) do
    find $\eta_0$ such that
      $q_i^{new} = \eta_i^{old} \cdot (r_i + mC \cdot q_i^{max}) / \eta_0 < q_i^{max} \cdot mC$
    and $E_{\tau_i}^{new} = \sum_{k=1}^{p_i-1-mC} q_{i,k}^{old} \cdot w_{i,k}^{old} + q_i^{new} \cdot w^{new} \le E_i^0$
    if such an $\eta_0$ exists then
      for k = $p_i - mC$ to $p_i - 1$ do
        $\eta_{i,k}^{new} = \eta_0$
      end for
      solved = true
    end if
  end for
  if (!solved) then
    The task execution time is not changed
  else
    task τ_i has mC chunks with slowdown factor $\eta_{i,k}^{new}$ and $p_i - mC - 1$ chunks with slowdown factor $\eta_i^{old}$
  end if
end if

EliminateLP Algorithm: eliminates the last preemption, when possible

The algorithm has a complexity of $O(p_i^2 \cdot m)$, where m is the number of operating modes, which is usually small for any processor.

5 Example

We illustrate the proposed algorithm with a simple example. The tasks are scheduled on a processor that has exactly the CPU operating modes of Table 1.

We consider a task set τ with tasks τ_1 = (4, 8, 8) and τ_2 = (15, 32, 32), and initial slowdown factors η_1 = 40/60, which corresponds to frequency mode 3, and η_2 = 30/60, which corresponds to frequency mode 2 (see Table 1).

Mode                    1    2    3    4    5
Frequency (MHz)         5   30   40   50   60
Power dissipated (mW)  20   50   50  200  300

Table 1. Example of CPU operating modes

Using the InsertPP algorithm we insert 3 preemption points in task τ_2, obtaining $p_2 = 4$ and $q_2^{max} = 4$. Since $C_2 = 15$, we have $r_2 = 3$.

The steps of the EliminateLP algorithm are:

1. mC = 1 and $\eta_{2,3}^{new} = \frac{40}{60}$ → $q_{2,3}^{new} = \frac{\eta_2 (r_2 + mC \cdot q_2^{max})}{\eta_{2,3}^{new}} = 5.25$, thus $q_{2,3}^{new} < q_2^{max} \cdot mC = 4$ is false;

2. $\eta_{2,3}^{new} = \frac{50}{60}$ → $q_{2,3}^{new} = 4.2 < 4$ false;

3. $\eta_{2,3}^{new} = \frac{60}{60}$ → $q_{2,3}^{new} = 3.5 < 4$ true.

The steps of the algorithm indicate that the chunk $\delta_{2,3}$ is executed at frequency mode 5 in order to eliminate the last preemption (see Figure 1).

Figure 1. Example 1: schedule before and after EliminateLP.

If τ2 initially uses frequency mode 3 instead of frequency mode 2, the EliminateLP algorithm steps are:

1. η^new_{2,3} = 50/60 → q^new_{2,3} = 5.6 < 4 is false;

2. η^new_{2,3} = 60/60 → q^new_{2,3} = 4.66 < 4 is false;

3. η^new_{2,3} = 50/60 and η^new_{2,2} = 50/60 → q^new_{2,3} + q^new_{2,2} = 8.8 < 8 is false;

4. η^new_{2,3} = 60/60 and η^new_{2,2} = 60/60 → q^new_{2,3} + q^new_{2,2} = 7.33 < 8 is true.

The steps of the algorithm indicate that chunks δ2,3 and δ2,2 are executed at frequency mode 5 in order to eliminate the last preemption (see Figure 2).
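The chunk lengths quoted in the two example runs can be recomputed directly from the update rule q^new = η2 · (r2 + mC · q^max_2) / η^new; this is a check we add, with our own variable names:

```python
r2, q_max2 = 3, 4                    # from InsertPP: C2 = 15 split into 4 chunks

def q_new(eta_old, eta_new, mC=1):
    """Total length of the mC sped-up tail chunks plus remainder at eta_new."""
    return eta_old * (r2 + mC * q_max2) / eta_new

# First run: tau2 starts in mode 2 (eta = 30/60), speeding up one tail chunk
run1 = [q_new(30/60, 40/60), q_new(30/60, 50/60), q_new(30/60, 60/60)]

# Second run: tau2 starts in mode 3 (eta = 40/60); one chunk, then two
run2 = [q_new(40/60, 50/60), q_new(40/60, 60/60),
        q_new(40/60, 50/60, mC=2), q_new(40/60, 60/60, mC=2)]
```

The first run reproduces 5.25, 4.2 and 3.5; the second reproduces 5.6, 4.67 (truncated to 4.66 in the text), 8.8 and 7.33.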

[Figure 2. Example 2 — schedules of τ1 and τ2 before and after applying EliminateLP.]

6 Conclusions and future work

In this paper we propose an algorithm that improves energy consumption by decreasing the number of preemptions. More precisely, after dividing the tasks into non-preemptable chunks, we increase the frequency of some chunks of a task such that the worst-case execution time of that task is decreased by the length of its smallest chunk. Estimating the energy gain provided by our algorithm when compared to existing work is left as future work.

This allows us to eliminate the last preemption, but it is not a complete solution to the problem stated above. An improvement of the algorithm should be based on estimating the energy of the solution before and after applying the algorithm.

References

[1] R. Jejurikar and R. Gupta. Dynamic voltage scaling for system-wide energy minimization in real-time embedded systems. In the 2004 International Symposium on Low Power Electronics and Design (ISLPED), 2004.

[2] M. Marinoni and G. Buttazzo. Elastic DVS management in processors with discrete voltage/frequency modes. IEEE Transactions on Industrial Informatics, 3(1):51–62, 2007.

[3] A. Thekkilakattil, A. S. Pillai, R. Dobrin, and S. Punnekkat. Reducing the number of preemptions in real-time systems scheduling by CPU frequency scaling. In the 18th International Conference on Real-Time and Network Systems (RTNS), 2010.

[4] G. Yao, G. Buttazzo, and M. Bertogna. Bounding the maximum length of non-preemptive regions under fixed priority scheduling. In the 15th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), 2009.

[5] G. Yao, G. Buttazzo, and M. Bertogna. Feasibility analysis under fixed priority scheduling with fixed preemption points. In the 16th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), 2010.


Improved Feasibility Regions: a Probabilistic Approach for Real-Time Systems

Luca Santinelli∗, Liliana Cucu-Grosjean∗ and Laurent George♯
∗ INRIA Nancy Grand Est, Nancy, France
♯ University of Paris 12, Paris, France
[email protected], [email protected], [email protected]

Abstract

Guaranteeing timing constraints is the main purpose of real-time system analyses. The satisfaction of such constraints can be verified deterministically using worst-case scenarios, which introduce a certain pessimism. This pessimism can be decreased by using statistical estimations of certain parameters, as is the case for worst-case execution times. In this paper, we address the problem of analyzing probabilistic real-time systems where both the period and the execution time of tasks may be described by statistical distributions. We apply the (α, ∆) abstraction to exploit the flexibility offered by the probabilistic model. The resulting probabilistic schedulability analysis aims to face all the possible degrees of accuracy required by complex real-time systems.

1 Introduction

Real-time systems are, generally, embedded and heavily interacting with the environment. The performance of such interactions is then analyzed not only from the point of view of their correctness, but also from the perspective of time. The timing analysis of such systems has been extensively studied by considering worst-case values, which induce a certain pessimism. Unfortunately, not all real-time systems can afford such pessimism, and for these cases other approaches should be used.

Another aspect to be considered is that the hardware and software elements composing real-time systems may experience or exhibit some randomness: for example, failures due to electromagnetic interference, aging of hardware components, probabilistic execution times, and choices in randomized algorithms.

For all these reasons, probabilistic real-time systems and probabilistic real-time analysis are becoming common practice in the real-time community [1]. Probabilistic approaches are promising because they answer questions or provide solutions that cannot be addressed in a deterministic manner, such as the distributions of response times. Moreover, they consider models that are more realistic, for instance regarding the message activation patterns or the expression of soft real-time constraints.

1.1 Related work

One way to analyze real-time systems is to make use of abstractions and alternative representations in order to point out and investigate specific aspects of interest [2, 3]. Among those commonly recognized by the real-time community, we recall the feasibility regions exploited in sensitivity analysis as well as in the (α, ∆) representation. Sensitivity analysis focuses on the C-space and the T-space, resulting in constraints on execution times C and periods T that guarantee the schedulability of the system [4, 5]. With the (α, ∆) representation, instead, the (α, ∆)-space is the subject of the analysis [6], focusing on the resource provisioned to the scheduling element. In that space, the computational resource given to tasks in order to execute is mapped into feasibility regions (in accordance with the scheduling paradigm applied): inside such regions the schedulability of the system is guaranteed.

Papers related to our work and to probabilistic real-time have used both the terms stochastic analysis [7] and probabilistic analysis [8]. Since the paper of Diaz et al. [9], the term stochastic analysis of real-time systems has been used regularly by the community regardless of the approach (probabilistic or statistical). While the word stochastic is often associated with unpredicted behavior, we use the word probabilistic to indicate that the work is based on the theory of probability. Moreover, by probabilistic real-time system we mean a real-time system with at least one parameter defined by a random variable [10, 11].

Contribution of the paper. We propose an improved schedulability analysis of real-time systems making use of the probabilistic approach. The probabilistic task model allows us to define probabilistic feasibility regions where, according to the desired probability level, schedulability conditions can be inferred. Indeed, with probabilities both hard and soft real-time constraints can be guaranteed, increasing the flexibility of the analysis as well as reducing the pessimism of its results.

Organization of the paper. Section 2 describes the models, with particular emphasis on the probabilistic model for real-time tasks. Section 3 details abstractions of tasks and resources based on bounding curves; it also explains the probabilistic bounding resulting from the probabilistic task model, together with the (α, ∆) representation and the feasibility regions. Examples of probabilistic feasibility regions are provided in Section 4.

2 Models

We consider a real-time system Γ composed of n tasks, Γ = {τ1, τ2, . . . , τn}, where each τi is modeled by a probabilistic period and/or probabilistic execution time, together with constrained deadlines. Tasks are characterized by three parameters (Ci, Ti, Di), where Ci is a random variable¹ representing the execution time, with a known Probability Function (PF) denoted by f_{Ci}(·), where f_{Ci}(c) = P(Ci = c); Ti is the random variable of the period, with a known PF denoted by f_{Ti}(·), where f_{Ti}(p) = P(Ti = p); and Di is the task relative deadline. The execution time of τi takes a value in [C^min_i, C^max_i], whereas the period takes a value in [T^min_i, T^max_i]. We assume that C^max_i ≤ Di and Di ≤ T^min_i.

The PFs of Ci and Ti are respectively represented as follows:

    Ci = ( C^0_i = C^max_i     C^1_i            · · ·   C^{ki}_i = C^min_i  )
         ( f_{Ci}(C^max_i)     f_{Ci}(C^1_i)    · · ·   f_{Ci}(C^min_i)     )    (1)

and

    Ti = ( T^0_i = T^min_i     T^1_i            · · ·   T^{ℓi}_i = T^max_i  )
         ( f_{Ti}(T^min_i)     f_{Ti}(T^1_i)    · · ·   f_{Ti}(T^max_i)     )    (2)

where Σ_{j=0}^{ki} f_{Ci}(C^j_i) = 1 and Σ_{j=0}^{ℓi} f_{Ti}(T^j_i) = 1. Here, (ki + 1) and (ℓi + 1) are respectively the number of computation times and periods representing task τi. The computation times are ordered in the opposite manner to the periods for the sake of readability and ease of representation of the mathematical expressions.

In this paper we assume that all the random variables Ci and Ti, ∀i ≤ n, are independent, a realistic assumption for real systems².
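As a concrete sketch (names are ours), a PF of this model can be held as a value → probability map and validated against the normalization condition of eqs. (1)–(2):

```python
# Execution-time PF of tau1 from Example 4.1; the period here is one-point,
# i.e. deterministic.
C1 = {1: 0.7, 2: 0.2, 3: 0.1}
T1 = {10: 1.0}

def is_pf(pf):
    """A valid PF has non-negative masses summing to 1 (up to float error)."""
    return min(pf.values()) >= 0 and abs(sum(pf.values()) - 1.0) < 1e-9

# The bounds [C^min, C^max] are just the extreme values of the support.
C1_min, C1_max = min(C1), max(C1)
```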

All the task parameters are given with the interpretation that task τi generates an infinite number of successive jobs τ_{i,j}, with j = 1, . . . , ∞. Each such job has an execution requirement described by Ci, where for each value C^k_i, f_{Ci}(C^k_i) is its probability of occurrence. The arrival of the jobs, instead, is described by Ti, e.g., the probability of having T^k_i as the arrival for the next task job is f_{Ti}(T^k_i). All the jobs are assumed to be independent of the other jobs of the same task and of those of other tasks.

¹ Everywhere in this paper we use a calligraphic typeface to denote random variables.
² For more details see http://www.proartis-project.eu/

Furthermore, in this work we consider schedulers that are either Fixed-Priority (FP) or dynamic-priority, such as Earliest Deadline First (EDF).

3 Bounding

The workload wi of a task τi is the amount of load τi requires in order to execute. Given the i-th task, the cumulative workload on the processor made by that task and its higher-priority tasks is given by wbf_i(t) = Ci + Σ_{τj ∈ hp(i)} ⌈t/Tj⌉ Cj [12], where hp(i) is the set of higher-priority tasks with respect to τi. The workload bound function wbf describes the total amount of time the processor is busy serving task τi and its higher-priority tasks. Within the probabilistic task model it is possible to associate a probability p to each workload and workload bound function: p stands for the probability that wi and wbf upper bound, respectively, the workload and the cumulative workload of the i-th task. The workload functions are applied in FP schedulability analysis.
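For the deterministic case, wbf_i(t) is straightforward to compute; a minimal sketch (names ours), with each higher-priority task given as a (C, T) pair:

```python
import math

def wbf(t, C_i, hp):
    """Cumulative workload bound at time t: task i's execution time plus
    the ceiling-released demand of its higher-priority tasks hp = [(C, T), ...]."""
    return C_i + sum(math.ceil(t / T_j) * C_j for C_j, T_j in hp)
```

For instance, a task with C_i = 15 and one higher-priority task (C, T) = (4, 8) gives wbf(8) = 15 + ⌈8/8⌉ · 4 = 19.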

The demand bound function dbf of a task is the minimum amount of resource (computational resource in this case) demanded by that task in order to execute and meet its timing constraints [13]. It describes the least resource demanded, which is equivalent to the workload shifted by the task deadline, and it is considered in EDF scheduling scenarios. With the probabilistic model it is possible to define several resource demands, each with an associated probability p, which comes with the interpretation of p being the probability that the curve upper bounds the resource demand.

dbf_{i,j}(t) = max{ 0, ( ⌊(t − Di)/Ti⌋ + 1 ) C^j_i },    (3)

where each dbf_{i,j} has a bounding probability p_{i,j} = 1 − Σ_{k : C^k_i > C^j_i} f_{Ci}(C^k_i) in case of probabilistic execution time. For probabilistic periods it is dbf_{i,j}(t) = max{ 0, ( ⌊(t − Di)/T^j_i⌋ + 1 ) Ci }, where each dbf_{i,j} has a bounding probability p_{i,j} = 1 − Σ_{k : T^k_i < T^j_i} f_{Ti}(T^k_i). The probabilistic curve is then represented by the tuple of bound and probability, ⟨dbf_{i,j}(t), p_{i,j}⟩.
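A sketch of the per-value curves and their bounding probabilities for the execution-time case (our names; the values are checked against Example 4.1 later in the paper):

```python
import math

def dbf(t, C, T, D):
    """Demand bound of eq. (3) for a single execution-time value C."""
    return max(0, (math.floor((t - D) / T) + 1) * C)

def bounding_prob(pf, C_j):
    """Probability that the curve built from C^j upper-bounds the demand,
    i.e. that the actual execution time does not exceed C^j."""
    return 1.0 - sum(p for c, p in pf.items() if c > C_j)

pf1 = {1: 0.7, 2: 0.2, 3: 0.1}       # tau1 of Example 4.1 (T = D = 10)
# The <curve-value, probability> pairs for each C^j in the support:
pairs = [(c, bounding_prob(pf1, c)) for c in sorted(pf1)]
```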

The computational demand of a task set Γ of periodic tasks synchronously activated at time t = 0 can be computed as the sum of the individual demand bound functions of each task. With probabilistic tasks there are several possibilities for obtaining the cumulative demand bound function by including the probabilities as parameters. One possibility is dbf_{Γ,j}(t) = Σ_{τi ∈ Γ} dbf_{i,j}(t), which has a probability of bounding the resource request p_j = min_i{p_{i,j}}. Other combinations can be obtained by imposing a certain bounding probability p and then selecting the curves (the indices j) such that the minimum of the probabilities is larger than p. The same reasoning applies to the workload bounding functions wbf. For a better insight into the bounding probabilities we refer to [14]. The probabilities associated to bounding curves (both workloads and resource demands) lead to sets of curves ⟨curve, probability⟩ associated to tasks and task sets.

The supply bound function sbf models the resource that a system element provides to tasks in order to let them execute and meet their timing constraints [15]. Such a resource, in terms of the sbf, can be bounded with a linear function, the bounded delay function bdf(t) = max{0, α(t − ∆)}, with

α = lim_{t→∞} sbf(t)/t,    ∆ = inf{ q | α(t − q) ≤ sbf(t) ∀t }.    (4)

Although this is not the case in our paper, both the sbf and its linear approximation bdf can have an associated probability p, representing the probability that sbf and bdf lower bound the resource amount provided.
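The parameters of eq. (4) can be estimated numerically for a sampled supply function; a coarse sketch on a finite grid (names and the staircase example are ours, not from the paper):

```python
def bounded_delay(sbf, horizon, step=1):
    """Estimate (alpha, Delta) of eq. (4): alpha as the long-run rate
    sbf(t)/t at the horizon, Delta as the largest shift t - sbf(t)/alpha
    observed on the grid. A grid estimate, not the exact infimum."""
    ts = [k * step for k in range(1, int(horizon / step) + 1)]
    alpha = sbf(ts[-1]) / ts[-1]
    delta = max(t - sbf(t) / alpha for t in ts)
    return alpha, delta

def stair(t):
    """Staircase supply: 2 units granted at the end of every 4-unit slot."""
    return (t // 4) * 2
```

For the staircase above the estimate is α = 0.5 with ∆ = 3 on an integer grid (the exact continuous-time ∆ approaches 4 at the slot boundaries).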

3.1 Schedulability Analysis

A real-time system is schedulable if all the tasks composing the system meet their deadlines while executing. Scheduling conditions are defined by comparing the resource request with the resource provisioning. With the EDF scheduling paradigm, a task set Γ receiving an amount of resource sbf can be guaranteed schedulable (its deadlines can be guaranteed) if and only if ∀t, dbf_Γ(t) ≤ sbf(t). Using the bounded-delay linear approximation, the feasibility condition becomes a sufficient condition: ∀t, dbf(t) ≤ bdf(t) [13].

Lehoczky et al. [12] derived a schedulability criterion by comparing, task by task, the workload with the total available resource in case of FP scheduling. Indeed, ∀τi ∃t ∈ schedP_i | wbf_i(t) ≤ sbf(t), with schedP_i the set of scheduling points as in [16].

Commonly, schedulability conditions relate to schedulability regions in representation spaces. For example, with the (α, ∆) abstraction one can define the (α, ∆)-space, where the computational resource is described in terms of its parameters α and ∆. In that space it is possible to define feasibility regions where the task set is schedulable with the specific (α, ∆) resource assignment [6]. Then, applying the (α, ∆) approximation, the schedulability condition translates into feasibility regions in which the system is feasible, hence schedulable, according to the scheduling policy applied.

In the case of EDF, the task set feasibility region is defined such that ∀t ∈ D : dbf(t) ≤ α(t − ∆), meaning that

∀t ∈ D : ∆ ≤ t − dbf(t)/α,    ∆ ≤ min_{t∈D} { t − dbf(t)/α },    (5)

where D is the set of deadlines at which the task set schedulability has to be checked.
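Condition (5) yields the largest tolerable ∆ for a given α as a minimum over the deadline set D. A sketch using the worst-case curves of the two tasks of Section 4 (the deadline list and all names are ours):

```python
import math

def dbf(t, C, T, D):
    """Demand bound of eq. (3) for one execution-time value C."""
    return max(0, (math.floor((t - D) / T) + 1) * C)

def delta_max(tasks, alpha, deadlines):
    """Largest Delta satisfying eq. (5): min over D of t - dbf(t)/alpha."""
    total = lambda t: sum(dbf(t, C, T, D) for C, T, D in tasks)
    return min(t - total(t) / alpha for t in deadlines)

# Worst-case (probability 1) curves of Example 4.1:
tasks = [(3, 10, 10), (3, 8, 8)]
D = [8, 10, 16, 20, 24, 30, 32, 40]   # absolute deadlines up to one hyperperiod
```

With α = 1 (a dedicated unit-speed processor) the binding point is t = 10, where ∆ ≤ 10 − 6 = 4.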

In the case of FP, the task set feasibility region comes from applying the workloads and the (α, ∆) approximation to the schedulability criterion [16]. It is

∀i ∃t ∈ schedP_i : wbf_i(t) ≤ α(t − ∆),

meaning that

∀i ∃t ∈ schedP_i : ∆ ≤ t − wbf_i(t)/α,    ∆ ≤ min_i max_{t ∈ schedP_i} { t − wbf_i(t)/α }.    (6)
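The FP counterpart (6) takes, per task, the best scheduling point and then the worst task. A sketch with our names, reusing the two worst-case tasks with τ1 at higher priority; the scheduling-point lists are our assumption:

```python
import math

def wbf(t, C_i, hp):
    """Workload bound: C_i plus demand of higher-priority tasks hp = [(C, T)]."""
    return C_i + sum(math.ceil(t / T_j) * C_j for C_j, T_j in hp)

def delta_max_fp(tasks, alpha, sched_points):
    """Largest Delta of eq. (6): min over tasks i of the max over task i's
    scheduling points of t - wbf_i(t)/alpha. tasks are (C, T) pairs in
    decreasing priority order; sched_points[i] lists task i's points."""
    per_task = (max(t - wbf(t, C_i, tasks[:i]) / alpha for t in pts)
                for i, ((C_i, _T_i), pts) in enumerate(zip(tasks, sched_points)))
    return min(per_task)

tasks = [(3, 10), (3, 8)]     # tau1 (high priority), tau2
points = [[10], [8]]          # candidate scheduling points within each deadline
```

With α = 1, τ1 allows ∆ ≤ 7 while τ2 allows ∆ ≤ 2, so the task set tolerates ∆ ≤ 2.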

So far the deterministic analysis. The probabilistic task model, on the other hand, increases flexibility both in the situations it can model and, mostly, in improving the real-time analysis. Indeed, it is possible to define intermediate scheduling conditions involving the probabilities, i.e., a task set is schedulable with a probability of 50% if 50% of the time the tasks meet their deadlines. The deterministic approach based on feasibility regions in the (α, ∆)-space, as outlined before, can be extended to the probabilistic case. As a consequence of having bounding curves with probabilities, we can define multiple regions in the (α, ∆)-space, each with an associated probability level; there are as many regions as the probability levels that can be defined. Any point within such a region is a resource assignment that makes the task set Γ schedulable with the probability given by the probability level of the region.

4 Simple Test Cases

Example 4.1. Consider a probabilistic task set Γ = {τ1, τ2} under an EDF scheduling policy, where

τ1 = ( 0, ( 1 2 3 ; 0.7 0.2 0.1 ), 10, 10 ).

The resulting probabilistic demand bound curves are

dbf_{1,1} = ⌊(t − 10)/10 + 1⌋ · 1,  dbf_{1,2} = ⌊(t − 10)/10 + 1⌋ · 2,  dbf_{1,3} = ⌊(t − 10)/10 + 1⌋ · 3,

respectively with bounding probabilities 0.7, 0.9 and 1. This means that dbf_{1,1} upper bounds the task resource request in 70% of the cases, namely those where the execution time is C = 1; dbf_{1,2} upper bounds the task resource request in 90% of the cases, resulting from the instances where the execution time is C = 1 or C = 2; finally, dbf_{1,3} upper bounds all the cases of execution time, so its probability is 100%. For τ2 it is

τ2 = ( 0, ( 2 3 ; 0.7 0.3 ), 8, 8 ),

resulting in

dbf_{2,1} = ⌊(t − 8)/8 + 1⌋ · 2,  dbf_{2,2} = ⌊(t − 8)/8 + 1⌋ · 3,

where dbf_{2,1} upper bounds the task resource request in 70% of the cases and dbf_{2,2} upper bounds it in all, i.e. 100%, of the cases. The schedulability analysis in the (α, ∆)-space results in different regions, Figure 1(a): as the bounding probability decreases, the area of the region increases.

Example 4.2. Consider the former probabilistic task set Γ = {τ1, τ2} under FP scheduling, with τ1 of higher priority than τ2. This results in a set of feasibility regions, as in Figure 1(b), where the two regions differ in area and in the associated probability. Less stringent conditions (a reduced probability) result in a larger region, hence a less constraining schedulability condition.

[(a) Case 1: EDF scheduling paradigm. (b) Case 2: FP scheduling paradigm.]

Figure 1. Probabilistic feasibility regions in the (α, ∆)-space for Γ = {τ1, τ2}: two different regions, at probabilities 0.7 and 1.

5 Conclusions

In this paper we have applied the probabilistic task model to schedulability analysis, and in particular to the (α, ∆)-space, exploiting a probabilistic version of the feasibility regions. In this way we have derived an entirely new set of feasibility regions, each with an associated probability, through which a probabilistic version of the classical schedulability analysis can be set up.

In the future we intend to apply such probabilistic feasibility regions to complex real-time analyses, showing the flexibility and the reduced pessimism offered by the probabilistic model.

References

[1] Burns, A., Bernat, G., Broster, I.: A probabilistic framework for schedulability analysis. In: Third International Embedded Software Conference (EMSOFT03). (2003) 1–15

[2] Thiele, L., Chakraborty, S., Naedele, M.: Real-time calculus for scheduling hard real-time systems. In: Proc. ISCAS'00. Volume 4. (2000) 101–104

[3] Mok, A.K., Feng, X.A., Chen, D.: Resource partition for real-time systems. In: Real-Time Systems. (2001)

[4] Hermant, J.F., George, L.: A C-space sensitivity analysis of earliest deadline first scheduling. In: Workshop on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA 2007). (2007)

[5] Bini, E., Di Natale, M., Buttazzo, G.C.: Sensitivity analysis for fixed priority real-time systems. In: 18th Euromicro Conference on Real-Time Systems (ECRTS 2006). (2006)

[6] Bini, E., Buttazzo, G.C.: Schedulability analysis of periodic fixed priority systems. IEEE Transactions on Computers 53(11) (2004) 1462–1473

[7] Kaczynski, G.A., Bello, L.L., Nolte, T.: Towards stochastic response-time of hierarchically scheduled real-time tasks. In: Proceedings of the 11th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA 2006). (2006) 453–456

[8] Tia, T., Deng, Z., Shankar, M., Storch, M., Sun, J., Wu, L., Liu, J.: Probabilistic performance guarantee for real-time tasks with varying computation times. In: IEEE Real-Time and Embedded Technology and Applications Symposium. (1995)

[9] Diaz, J., Garcia, D., Kim, K., Lee, C., Lo Bello, L., Lopez, J.M., Mirabella, O.: Stochastic analysis of periodic real-time systems. In: 23rd IEEE Real-Time Systems Symposium (RTSS02). (2002) 289–300

[10] Kaczynski, G., Lo Bello, L., Nolte, T.: Deriving exact stochastic response times of periodic tasks in hybrid priority-driven soft real-time systems. In: 12th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA'07), Greece. (2007)

[11] Cucu, L., Tovar, E.: A framework for response time analysis of fixed-priority tasks with stochastic inter-arrival times. ACM SIGBED Review 3(1) (2006)

[12] Lehoczky, J.P., Sha, L., Ding, Y.: The rate monotonic scheduling algorithm: Exact characterization and average case behavior. In: IEEE Real-Time Systems Symposium. (1989) 166–171

[13] Baruah, S.K., Howell, R.R., Rosier, L.E.: Algorithms and complexity concerning the preemptive scheduling of periodic, real-time tasks on one processor. In: Real-Time Systems. (1990) 301–324

[14] Santinelli, L., Meumeu, P.M., Maxim, D., Cucu-Grosjean, L.: A component-based framework for modeling and analyzing probabilistic real-time systems. In: 16th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA 2011). (2011)

[15] Feng, X., Mok, A.: A model of hierarchical real-time virtual resources. In: Proceedings of the 23rd IEEE Real-Time Systems Symposium (RTSS 2002). (December 2002) 26–35

[16] Bini, E., Buttazzo, G.C.: Schedulability analysis of periodic fixed priority systems. IEEE Transactions on Computers 53(11) (2004) 1462–1473


Authors index

Ben Gaid, Mongi, 21
Ben Khale, Abir, 21
Berten, Vandy, 9

Cotard, Sylvain, 17
Courbin, Pierre, 9
Cucu-Grosjean, Liliana, 33, 37

Fauberteau, Frederic, 29

George, Laurent, 37
Goossens, Joel, 9

Hirvisalo, Vesa, 13

Kiminki, Sami, 13

Maxim, Cristian, 33
Midonnet, Serge, 29

Ndoye, Falou, 25

Qamhieh, Manar, 29

Santinelli, Luca, 37
Simon, Daniel, 21
Sorel, Yves, 25

Zendra, Olivier, 33


Cover picture credit: Chateau des Ducs de Bretagne - FRANCE - Nantes (44) by Emmanuel Clement. Picture distributed under the Creative Commons Attribution-ShareAlike CC BY-SA 3.0, CC BY-SA 2.5, CC BY-SA 2.0 and CC BY-SA 1.0 licenses.
