Predictable Implementation of Real-Time Applications on Multiprocessor Systems-on-Chip Alexandru Andrei Embedded Systems Laboratory Linköping University, Sweden

Post on 20-Dec-2015


Predictable Implementation of Real-Time Applications on Multiprocessor Systems-on-Chip

Alexandru Andrei

Embedded Systems Laboratory, Linköping University, Sweden


GSM Phone: Search, Radio Link Control, Talking

MP3 player

Digital Camera: Take Photo, Restore Photo

...

High performance, low power, predictable


Design Flow

[Figure: design flow. Inputs: the hardware platform (CPU0, CPU1 and ASIC0 connected by a bus) and the software application(s). Steps: extract task graph, extract task parameters, optimize, formal analysis or simulation, implement. The task graph is annotated with deadlines (dl).]

Task parameters:

• Worst case execution times

• Task power

Example code, split into tasks:

for (i=0;i<99;i++) x=x+a[i];

for (j=0;j<100;j++) y=y+b[j];

if (x<y) z=y;


Application Model

[Figure: task graphs annotated with deadlines (dl).]


Hardware Architecture

[Figure: three CPUs, each with a cache and a private memory, connected by a bus to a shared memory, a semaphore device and an interrupt device.]


Execution Model

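[Figure: CPU1 and CPU2, each with a cache and a private memory (Private Mem1, Private Mem2), connected by the bus to the shared memory. Private Mem1 holds x and Instructions 1, Private Mem2 holds y and Instructions 2, and the shared memory holds s.]

Original TG: task 1, task 2

Task 1: comp(x), copy(x,s)

Task 2: copy(s,y), use(y)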


Task Model

Original TG: i -> j

Extended TG: i -> wi -> rj -> j

wi and rj: explicit communication


Motivational Example

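[Figure: task graph with tasks 1 and 2 and an explicit write. Platform: CPU1 with PMem1 and CPU2 with PMem2, connected by the bus to ShMem.]

WCET: task 1 = 60; task 2 = 25; w2 = 12

Tasks 1 and 2 have a deadline at time 63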


Motivational Example (2)

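[Figure: Gantt chart with rows CPU1 (task 1), CPU2 (task 2, then w2) and BUS. Implicit communication: cache misses M1, M3, M5 and M2, M4 on the bus, between the instruction segments I1 to I5. Explicit communication: w2. Latest time mark shown: 57.]

dl=63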


Motivational Example (3)

Using a FCFS bus arbiter

[Figure: Gantt chart with rows CPU1 (task 1), CPU2 (task 2, then w2) and BUS, showing cache misses M1 to M5 and instruction segments I1 to I5. Latest time mark shown: 67.]

dl=63

Deadline violation!


Motivational Example (4)

Using a bus schedule

[Figure: Gantt chart with rows CPU1 (task 1), CPU2 (task 2, then w2) and BUS, showing cache misses M1 to M4 and instruction segments I1 to I5. Latest time mark shown for the tasks: 57.]

dl=63


Motivational Example Message

In multiprocessor systems, the WCET depends on the bus load!

In multiprocessor systems, the WCET depends on the schedule!

In multiprocessor systems, the schedule depends on the WCET!


Implicit Communication

Benchmark    Bus Utilization    Impl. Communication
GSM (1)      12%                39%
MP3 (2)      26%                42%
MP3 (3)      49%                86%

Setup: ARM7 cores, ST bus protocol

(1) Icache: 4096b, Dcache: 1024b

(2) Icache: 4096b, Dcache: 1024b

(3) Icache: 16b, Dcache: 256b


WCET Analysis

Difficult for both single-processor and multiprocessor systems

Single-processor tools: SymTA/P, AbsInt aiT

Handle instruction and data caches

Basic idea: enumerate all possible paths of the program (CFG) and always consider the longest one
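The basic idea above can be sketched as a longest-path computation over the CFG. This is a minimal sketch under assumptions that are not in the slides: loops have already been unrolled or collapsed using their bounds, so the graph is acyclic; nodes are stored in topological order; block costs are non-negative. The names `CfgNode` and `wcet_longest_path` and the limit of 64 nodes are hypothetical. It replaces explicit path enumeration with dynamic programming, which gives the same longest path.

```c
#include <stddef.h>

#define MAX_SUCC 2   /* assumption: at most two successors per block */
#define MAX_NODES 64 /* assumption: small illustrative graphs only */

/* Hypothetical CFG node: a basic block with a worst-case cost in cycles. */
typedef struct {
    long cost;            /* worst-case cycles of this block (non-negative) */
    int  succ[MAX_SUCC];  /* successor indices, -1 if unused */
} CfgNode;

/*
 * Longest path (in cycles) from node 0 to any exit of an acyclic CFG.
 * Assumes every edge goes from a lower index to a higher one.
 * Returns -1 if n is 0 or larger than MAX_NODES.
 */
long wcet_longest_path(const CfgNode *cfg, size_t n)
{
    long longest[MAX_NODES]; /* longest[i]: worst-case cycles up to and including node i */
    long best = 0;

    if (n == 0 || n > MAX_NODES)
        return -1;

    for (size_t i = 0; i < n; i++)
        longest[i] = -1;     /* -1 marks "not reachable from the entry" */
    longest[0] = cfg[0].cost;

    for (size_t i = 0; i < n; i++) {
        if (longest[i] < 0)
            continue;        /* skip unreachable nodes */
        if (longest[i] > best)
            best = longest[i];
        for (int k = 0; k < MAX_SUCC; k++) {
            int s = cfg[i].succ[k];
            if (s < 0 || (size_t)s >= n)
                continue;    /* unused or invalid successor */
            if (longest[i] + cfg[s].cost > longest[s])
                longest[s] = longest[i] + cfg[s].cost;
        }
    }
    return best;
}
```

For a diamond-shaped graph with block costs 5, 10, 3 and 2, the two paths cost 5+10+2 and 5+3+2 cycles, and the function returns the larger one.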


WCET Analysis Flow

[Figure: analysis flow. Inputs: source files and the binary file. Steps shown: abstract syntax tree generation, data dependency analysis, data flow analysis, CFG construction, instruction address extraction, data address extraction, program segment simulation, instruction cache analysis, data cache analysis. Outputs: annotated CFG, WCET.]


WCET Analysis: Example

void foo() {
  int i, temp;
  for (i=0; i<100; i++) {
    temp=a[i];
    a[temp]=0;
  }
}


WCET Analysis: CFG

1: void foo() {
2:   int i, temp;
3:   for (i=0;
4:        i<N;
5:        i++) {
6:     temp=a[i];
7:     a[temp]=0;
8:   }
9: }

CFG nodes:

id: 2

id: 17, Lno: 3,4,9

id: 12, Lno: 3,4,6

id: 4

id: 13, Lno: 6,7,5,4,6

id: 16, Lno: 6,7,5,4,8

id: 11


WCET Analysis: CFG

CFG nodes: 2; 17 (Lno: 3,4); 12 (Lno: 3,4); 4; 13 (Lno: 6,7,5,4,6); 16 (Lno: 6,7,5,4,8); 11

Control nodes: 2, 4, 11

Basic blocks: 12, 17, 13, 16

Node 4: loop bound (for example N=100)


WCET Analysis with Instruction Cache

Generate the address traces for each program block

Always assume a miss at the beginning of each block

Use a cache simulator to get the hit/miss ratio for each block
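The per-block cache simulation described above can be sketched as follows. This is only an illustration: the slides do not say which simulator or cache organization is used, so the direct-mapped organization, the geometry (`NUM_LINES`, `LINE_BYTES`) and the function name `count_icache_misses` are all assumptions. The cache starts empty for every block, which guarantees a miss on the first access of the block.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES  8   /* assumption: 8 cache lines */
#define LINE_BYTES 16  /* assumption: 16-byte cache lines */

/*
 * Count the misses of one block's instruction address trace on a
 * direct-mapped cache that is empty (cold) at block entry.
 */
size_t count_icache_misses(const uint32_t *trace, size_t len)
{
    uint32_t tag[NUM_LINES];
    int valid[NUM_LINES] = {0};  /* no line is valid at block entry */
    size_t misses = 0;

    for (size_t i = 0; i < len; i++) {
        uint32_t line_addr = trace[i] / LINE_BYTES; /* memory line number */
        uint32_t index = line_addr % NUM_LINES;     /* cache line it maps to */
        uint32_t t = line_addr / NUM_LINES;         /* tag stored with the line */

        if (!valid[index] || tag[index] != t) {
            misses++;            /* cold or conflict miss: load the line */
            valid[index] = 1;
            tag[index] = t;
        }
    }
    return misses;
}
```

The miss ratio of a block is then `count_icache_misses(trace, len)` divided by `len`.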

We can do better


WCET Analysis with ICache: Unrolled CFG

1: void foo() {
2:   int i, temp;
3:   for (i=0;
4:        i<100;
5:        i++) {
6:     temp=a[i];
7:     a[temp]=0;
8:   }
9: }

Unrolled CFG nodes:

id: 2

id: 17, Lno: 3,4

id: 12, Lno: 3,4

id: 4

id: 13, Lno: 6,7,5,4,6

id: 16, Lno: 6,7,5,4,8

id: 11

id: 104

id: 13, Lno: 6,7,5,4,6


WCET Analysis with ICache: Unrolled CFG

[Figure: unrolled CFG with nodes 2; 17 (Lno: 3,4); 12 (Lno: 3,4); 4; 13 (Lno: 6,7,5,4,6); 16 (Lno: 6,7,5,4,8); 11; 104; 13 (Lno: 6,7,5,4,6), annotated with cache misses.]

Annotations (i = instruction cache miss, d = data cache miss):

miss lno 3 (i), miss lno 3 (d), lno 3, miss lno 4 (i), lno 4

miss lno 6 (d), miss lno 6 (i), lno 6, miss lno 7 (i), miss lno 7 (d), lno 7, miss lno 5 (i), lno 5, 4

miss lno 6 (d), lno 6, miss lno 7 (d), lno 7, 5, 4


WCET Analysis: Multiprocessor

The cache miss penalty is constant in the single-processor case

The cache miss penalty is variable in the multiprocessor case


Predictable MPSoC Bus Access

Partition the bus period into bus slots (TDMA)

Assign bus slots to the processors

The bus arbiter grants the bus to a processor only during its allocated slots

Eliminates bus interference

Not flexible: an idle bus slot cannot be used by another processor
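With such a slot table, the time at which a bus transfer completes can be computed from the request time alone, without knowing what the other processors do. The sketch below illustrates this under assumptions that are not in the slides: the type `BusSlot`, the function `tdma_transfer_end`, the requirement that a transfer fits entirely inside one slot, and non-negative request times are all hypothetical choices made for the example.

```c
#include <stddef.h>

/* Hypothetical slot table entry: the slot starts at `start` (relative to
 * the beginning of the bus period) and belongs to CPU `owner`. */
typedef struct {
    long start;
    int  owner;
} BusSlot;

/*
 * Earliest completion time of a bus transfer of `len` time units requested
 * by `cpu` at absolute time `t` (t >= 0). The table repeats every `period`
 * time units; slot i spans [slots[i].start, slots[i+1].start) and the last
 * slot ends at `period`. Assumption: a transfer must fit inside one slot.
 * Returns -1 if no slot owned by `cpu` is long enough.
 */
long tdma_transfer_end(const BusSlot *slots, size_t n, long period,
                       int cpu, long t, long len)
{
    long base = (t / period) * period;  /* start of the current period */

    /* Two periods suffice: every slot of the second one starts after t. */
    for (int round = 0; round < 2; round++, base += period) {
        for (size_t i = 0; i < n; i++) {
            long s = base + slots[i].start;
            long e = base + (i + 1 < n ? slots[i + 1].start : period);
            long begin = t > s ? t : s;  /* cannot start before the request */

            if (slots[i].owner == cpu && begin + len <= e)
                return begin + len;
        }
    }
    return -1;
}
```

The miss penalty seen by the WCET analysis is `tdma_transfer_end(...) - t`, so it depends on when the miss occurs relative to the bus schedule, but not on the other processors' behaviour.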


Analysis & Bus Access

[Figure: the unrolled CFG (nodes 2, 17, 12, 4, 13, 16, 11, 104, 13) with its cache-miss annotations, placed against the bus schedule.]

Bus schedule:

slot start   0      8      16     24     32     42
owner        CPU1   CPU2   CPU1   CPU2   CPU2   CPU1   ...

(the last slot shown ends at 52)


Multiprocessor Analysis and Optimization

In multiprocessor systems, the WCET depends on the schedule!

In multiprocessor systems, the schedule depends on the WCET!


Overall Approach

[Figure: task graph with tasks 1 to 5 and a Gantt chart with rows CPU1, CPU2, CPU3 and BUS.]

Mapping:

CPU1: 1, 4

CPU2: 2

CPU3: 3, 5


Overall Approach

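[Figure: flow chart of the overall approach, with the following steps.]

New task to schedule: schedule a new task at a time t that is not earlier than the earliest finishing time found in the previous iteration

Take the set of all tasks that are active at time t

Bus schedule optimization: select bus schedule B for the time interval starting at t

Determine the WCET of the tasks from this set

Find the earliest time at which a task from this set finishes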



Bus Schedule: BSA1

Slot table (over a period):

slot_start   owner
t0           CPU1
t1           CPU2
t2           CPU1
t3           CPU2
...          ...

[Figure: bus timeline with marks t0, t1, t2, t3, t4 and slots owned by CPU1, CPU2, CPU1, CPU2, ...]


Bus Schedule: BSA2

Segment table (over a period):

seg_start   owners   seg_size   slot sizes (owner: size)
t0          1, 2     12         CPU1: 1, CPU2: 3
t4          2, 1     7          CPU1: 2, CPU2: 5
...

[Figure: bus timeline with marks t0 to t6. Segment 1 (from t0): CPU1, CPU2, CPU1, CPU2. Segment 2 (from t4): CPU2, CPU1, ...]


Bus Schedule: BSA3

Segment table (over a period):

seg_start   owners   slot_size
t0          1, 2     3
t4          2, 1     6
...         ...      ...

[Figure: bus timeline with marks t0 to t6. Segment 1 (from t0): CPU1, CPU2, CPU1, CPU2. Segment 2 (from t4): CPU2, CPU1, ...]
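A table of this shape (segment start, owner order, one slot size per segment) can be queried for the bus owner at any time instant. The following is a minimal sketch, not the authors' implementation: the type `Bsa3Segment`, the function `bsa3_owner_at`, the `MAX_OWNERS` limit and the explicit `end` of the last segment are hypothetical, and the symbolic times t0 and t4 have to be replaced by concrete values chosen by the caller.

```c
#include <stddef.h>

#define MAX_OWNERS 4  /* assumption: at most four owners per segment */

/* Hypothetical in-memory form of one row of the segment table. */
typedef struct {
    long start;               /* seg_start */
    int  owners[MAX_OWNERS];  /* owner order within one round of slots */
    int  n_owners;            /* number of valid entries in owners[] */
    long slot_size;           /* every slot of the segment has this size */
} Bsa3Segment;

/*
 * Owner of the bus at time t. Segments are sorted by start; segment i ends
 * where segment i+1 starts, and the last one ends at `end`.
 * Returns -1 if t lies outside [seg[0].start, end) or the row is malformed.
 */
int bsa3_owner_at(const Bsa3Segment *seg, size_t n, long end, long t)
{
    size_t i = 0;
    long slot;

    if (n == 0 || t < seg[0].start || t >= end)
        return -1;

    /* Find the last segment that starts at or before t. */
    while (i + 1 < n && seg[i + 1].start <= t)
        i++;

    if (seg[i].slot_size <= 0 || seg[i].n_owners <= 0 ||
        seg[i].n_owners > MAX_OWNERS)
        return -1;

    /* Slots are handed out round-robin in the listed owner order. */
    slot = (t - seg[i].start) / seg[i].slot_size;
    return seg[i].owners[slot % seg[i].n_owners];
}
```

With the two rows of the table and the assumed values t0 = 0 and t4 = 12, the owner alternates 1, 2, 1, 2 in slots of size 3, then 2, 1 in slots of size 6.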


Experimental Results

[Figure: normalized schedule length (y axis, 1 to 4) versus number of CPUs (x axis, 2 to 20) for BSA1, BSA2, BSA3 and BSA4.]


Experimental Results

[Figure: normalized schedule length (y axis, 1 to 3.5) versus number of CPUs (x axis, 2 to 10). Curve labels: 5.0, 4.0, 3.0, 2.6, 2.2, 1.8, 1.2, 1.0.]


Real-life Example

Smart phone: GSM voice codec (encoder + decoder) and MP3 player

64 tasks, between 100 and 2000 lines of C code per task

4 ARM7 processors, interconnected via a bus


Real-life Example

BSA_1   BSA_2   BSA_3   BSA_4
1.17    1.33    1.31    1.62

GSM + MP3, 64 tasks, 4 ARM7 processors


Conclusions

Realistic model for MPSoC

WCET analysis must be integrated in the system scheduling

Tool for system level scheduling and WCET

Tested on real applications


ARTIST

LiU

TU Braunschweig

U. of Bologna

Original SymTA/P code

Bus controller implementation