Bus-Aware Multicore WCET Analysis through TDMA Offset Bounds
Timon Kelter, Heiko Falk, Peter Marwedel (TU Dortmund, Computer Science 12, Design Automation for Embedded Systems). Sudipta Chattopadhyay, Abhik Roychoudhury (National University of Singapore, School of Computing).
Computer Science 12, Design Automation for Embedded Systems
ECRTS 2011
Bus-Aware Multicore WCET Analysis through TDMA Offset Bounds
Timon Kelter, Heiko Falk, Peter Marwedel
TU Dortmund, Computer Science 12, Design Automation for Embedded Systems
Sudipta Chattopadhyay, Abhik Roychoudhury
National University of Singapore, School of Computing
© T. Kelter | 2011-07-06 ECRTS 2011
Outline
1. Introduction & Motivation
2. System Model
3. Analysis of TDMA Arbitration Delays
4. Results
5. Summary & Future Work
Worst-Case Execution Time (WCET) Analysis
Hard real-time systems and schedulability analysis require safe WCET values, obtained by static analysis (abstract interpretation).
State of the art: industrial-strength static single-core WCET analysis.
New scenario: multicore environments.
Main problem: shared resources (arbitration) introduce new dependencies for timing analysis.
Predictability Properties of TDMA arbitration
Various standard arbitration alternatives exist. Here: TDMA (time-slicing) scheduling.
Favorable predictability properties:
Central: all cores can be analyzed separately, because the delay depends only on the point in time of the access.
Cyclicity: the delay depends only on the offset in the TDMA schedule.
Trivial bound for delay:
[Figure: TDMA schedule over time with slots for Core 1, Core 2, Core 3, and Core 4]
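As a minimal sketch of the delay model behind these properties, assume `n_cores` equally sized slots of `slot` cycles, core `k` owning the k-th slot of every period, and an access of `acc` cycles that must fit entirely into the issuing core's slot (no split transactions). These names and the slot layout are illustrative assumptions, not taken from the slides. Under them the wait time depends only on the TDMA offset, and the maximum over all offsets is one possible trivial bound.

```python
def tdma_wait(offset, core, n_cores, slot, acc):
    """Cycles a bus request of `core` waits when issued at `offset`.

    Assumed model (not from the slides): core k owns the cycles
    [k*slot, (k+1)*slot) of every TDMA period, and an access of `acc`
    cycles must fit completely into the owner's slot.
    """
    assert 0 < acc <= slot
    period = n_cores * slot
    start = core * slot          # first cycle of the core's own slot
    end = start + slot           # one past the last cycle of that slot
    offset %= period
    if start <= offset and offset + acc <= end:
        return 0                 # the access still fits into the own slot
    return (start - offset) % period  # wait for the next own slot


def trivial_bound(n_cores, slot, acc):
    """Largest wait over all offsets: the request arrives one cycle too late."""
    return (n_cores - 1) * slot + acc - 1


n, s, a = 4, 10, 5
worst = max(tdma_wait(o, 0, n, s, a) for o in range(n * s))
print(worst, trivial_bound(n, s, a))  # prints "34 34"
```

In this model the worst case is a request that arrives one cycle after the last point at which it would still have fit into its own slot.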
Predictability Properties of TDMA arbitration
Goal: improve upon the trivial delay bound.
Idea: the TDMA offset determines the maximum access delay, so for each access, determine the possible TDMA offsets.
An access may be reached via different paths in the CFG, so sets of possible TDMA offsets are used (offset overapproximation).
[Figure: TDMA schedule over time with slots for Core 1 to Core 4 and the possible offsets of an access (ACC)]
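To illustrate why offset sets help, the following sketch bounds the delay of one access by the maximum wait over its possible offsets and compares it with the bound obtained without offset knowledge. It assumes a hypothetical two-core schedule with slot length 5, in which the analyzed core owns offsets 0 to 4 and an access occupies the bus for 2 cycles; none of these values come from the slides.

```python
PERIOD, SLOT, ACC = 10, 5, 2  # assumed schedule: the analyzed core owns offsets 0..4


def wait(offset):
    """Wait time of an access issued at `offset` (it must fit into the own slot)."""
    offset %= PERIOD
    return 0 if offset + ACC <= SLOT else PERIOD - offset


def delay_bound(offsets):
    """Safe delay bound for an access whose offset lies in `offsets`."""
    return max(wait(o) for o in offsets)


TRIVIAL = max(wait(o) for o in range(PERIOD))  # bound without offset knowledge

print(delay_bound({0, 3}), delay_bound({3, 4}), TRIVIAL)  # prints "0 6 6"
```

The set {0, 3} proves that the access never waits, while {3, 4} is no better than the trivial bound, so the precision of the offset sets directly drives the precision of the delay bound.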
System Model
In-order SimpleScalar cores
Per core: task graph with fixed-priority, non-preemptive scheduling
Loop bounds for all loops
Shared TDMA bus: TDMA slot size, no split transactions
[Figure: architecture with several cores, each with an L1 instruction cache and a data memory, a shared TDMA bus, an L2 instruction cache, and the instruction memory]
WCET Analysis Framework
Analysis stages (the L1 cache analysis runs per core):
L1 cache analysis: determines which instructions might access the bus
L2 cache analysis: determines possible interference in the shared L2 cache
Pipeline analysis: provides numerical parameters (instruction runtime)
WCET analysis: bus access delay analysis and WCET computation
Global Convergence Approach
Data-flow analysis / abstract interpretation computes the offset sets before and after each block until a fixpoint is reached. The fixpoint provides safe offset information, from which the WCET is derived.
[Figure: example loop (mark: add, mul, …, sll, …, sub, …, beq mark) running on Core 1 under a TDMA schedule with offsets 0 to 9 shared between Core 1 and Core 2; the annotated offset sets grow over the analysis iterations, for example from {0} to {0,3} to {0,3,6}]
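The fixpoint idea can be sketched for a single loop as follows. The schedule constants and the loop body (one cycle of computation followed by one bus access) are hypothetical values chosen for illustration, so the resulting sets differ from the ones on the slide.

```python
PERIOD, SLOT, ACC = 10, 5, 2   # assumed schedule: the analyzed core owns offsets 0..4
COMPUTE = 1                    # assumed cycles per iteration before the bus access


def wait(offset):
    """Wait time of an access issued at `offset` (it must fit into the own slot)."""
    offset %= PERIOD
    return 0 if offset + ACC <= SLOT else PERIOD - offset


def iteration(offset):
    """Return (duration, end offset) of one loop iteration started at `offset`."""
    time = COMPUTE + wait(offset + COMPUTE) + ACC
    return time, (offset + time) % PERIOD


def loop_head_fixpoint(entry_offsets):
    """Grow the loop-head offset set until one more iteration adds nothing."""
    head = set(entry_offsets)
    while True:
        grown = head | {iteration(o)[1] for o in head}
        if grown == head:
            return head
        head = grown


def loop_wcet(entry_offsets, bound):
    """Charge every iteration the worst duration over the fixpoint set."""
    head = loop_head_fixpoint(entry_offsets)
    return bound * max(iteration(o)[0] for o in head)


print(sorted(loop_head_fixpoint({0})), loop_wcet({0}, 4))  # prints "[0, 2, 3, 5] 36"
```

In this toy setting the four iterations actually take 3 + 9 + 3 + 7 = 22 cycles, so the fixpoint bound of 36 is safe but pessimistic, because it no longer knows in which order the offsets occur.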
Graph-Tracking Approach
Problem: global convergence cannot track cyclic offset progressions.
Loop head in the example: 0, 3, 6, 6, 6, 6, … (cyclic at 6).
Idea: capture this behavior with an offset graph.
Edges represent single loop iterations. Edge weight: iteration WCET for the start offset.
[Figure: offset graph with nodes v+, v-, and v0 to v9; the visible edge weights are 13, 13, and 10]
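A small sketch of how such an offset graph could be built: every reachable start offset becomes a node, and one abstract loop iteration yields its edge weight and successor offsets. The schedule constants and the two body paths in `PATHS` are hypothetical; the construction in the paper may differ.

```python
PERIOD, SLOT, ACC = 10, 5, 2  # assumed schedule: the analyzed core owns offsets 0..4
PATHS = (1, 3)                # assumed: two body paths with 1 or 3 cycles before the access


def wait(offset):
    """Wait time of an access issued at `offset` (it must fit into the own slot)."""
    offset %= PERIOD
    return 0 if offset + ACC <= SLOT else PERIOD - offset


def build_offset_graph(entry_offsets):
    """Map each reachable start offset to (iteration WCET, possible end offsets)."""
    graph, work = {}, list(entry_offsets)
    while work:
        o = work.pop()
        if o in graph:
            continue  # node already expanded
        runs = []
        for cycles in PATHS:
            time = cycles + wait(o + cycles) + ACC
            runs.append((time, (o + time) % PERIOD))
        # Edge weight: worst iteration time for this start offset.
        graph[o] = (max(t for t, _ in runs), {e for _, e in runs})
        work.extend(graph[o][1])
    return graph


for node, (weight, ends) in sorted(build_offset_graph({0}).items()):
    print(node, weight, sorted(ends))
```

Unlike a single merged offset set, the graph keeps the information which end offsets follow from which start offset.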
Graph-Tracking Approach
Build a special flow problem in the offset graph (see the paper for further details).
The solution to this flow problem (ILP) yields:
Full loop WCET (including bus delays)
Resulting TDMA offsets after the loop execution
Handling of nested loops (similar: function calls):
1) Order of analysis: from the innermost loops to the outermost loops
2) With the results: handle inner loops like single instructions (structural reduction / folding)
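The two results of the flow problem (full loop WCET and offsets after the loop) can be illustrated without an ILP solver. The sketch below swaps in a simple dynamic program that searches the longest walk of exactly `bound` edges in an offset graph; it is not the formulation from the paper, and `GRAPH` is a hypothetical example.

```python
def loop_wcet(graph, entry_offsets, bound):
    """Longest walk of exactly `bound` edges starting at an entry offset.

    `graph` maps a start offset to (iteration WCET, possible end offsets).
    Returns the loop WCET and the set of offsets possible after the loop.
    """
    best = {o: 0 for o in entry_offsets}  # worst accumulated time per offset
    for _ in range(bound):
        step = {}
        for o, total in best.items():
            weight, ends = graph[o]
            for e in ends:
                step[e] = max(step.get(e, 0), total + weight)
        best = step
    return max(best.values()), set(best)


# Hypothetical offset graph, for illustration only.
GRAPH = {0: (5, {3, 5}), 3: (9, {2}), 5: (7, {2}), 2: (10, {2, 5})}

wcet, end_offsets = loop_wcet(GRAPH, {0}, 3)
print(wcet, sorted(end_offsets))  # prints "24 [2, 5]"
```

For three iterations the walk 0, 3, 2 gives 5 + 9 + 10 = 24 cycles, whereas charging every iteration the largest edge weight would give 30. The returned offset set is what an enclosing loop would use when the inner loop is folded into a single instruction.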
Test Setup
Prototype implemented in the Chronos framework.
Includes: multi-level cache analysis, TDMA bus analysis
Missing features: pipeline analysis
Test cases:
Mälardalen WCET benchmarks (MRTC suite)
PapaBench (multitask UAV control software)
Debie (multitask space-debris monitoring software)
Test Setup
Xeon 2 GHz, 4 GB main memory, Debian
ILP solver: CPLEX
Manual task mapping
Standard cache configuration:
1 KB L1 (direct mapped, block size 32 bytes, 0-cycle access)
2 KB L2 (4-way associative, block size 64 bytes, 1-cycle access)
Main memory: 5 cycles access time
Debie cache configuration changes (1.6 MByte code):
2 KB L1 (2-way associative)
8 KB L2 (4-way associative)
Compared Approaches
Fully unroll all loops (known loop bound) ([6]): yields sequential code; precise and slow.
Assume all loop iterations start at offset 0 ([8]): add a penalty to compensate for possible underestimation; less precise and fast.
Global Convergence Approach
Graph-Tracking Approach
Always use the trivial bound
Experimental Results (Relative WCET)
Baseline: Fully Unrolling ([6])
[Figure: bar chart of the WCET relative to the baseline (axis from 0% to 800%) for the configurations <2,10>, <2,20>, <2,40>, <2,80>, <4,80>, and <2,160>; series: Trivial bound, Fixed Alignment ([8]), Global Convergence, Graph-Tracking]
Same precision as the reference approach (+0.14%)
Works for all tested configurations
Experimental Results (Relative Runtime)
Baseline: Fully Unrolling ([6])
[Figure: bar chart of the analysis runtime relative to the baseline (axis from 0% to 25%) for the configurations <2,10>, <2,20>, <2,40>, <2,80>, <2,160>, and <4,80>; series: Trivial bound, Fixed Alignment ([8]), Global Convergence, Graph-Tracking]
Much faster than the reference approach (-79% to -99%)
Absolute runtime: about 5 h for Fully Unrolling for all experiments
Summary & Future Work
TDMA offset analysis can provide useful static bounds for bus access times.
Comparison against the most precise known approach ([6]):
0.14% overestimation on average
13 times faster on average
Future work:
Extended prototype with pipeline analysis
Fine-tune the graph-tracking analysis (clustering, expansion)
Heuristics to combine the advantages of the existing methods
Experiments with different architectures
Thank you for your attention!
Worst-Case Execution Times (WCET)
The WCET is in general not computable (halting problem).
Upper timing bounds can be statically estimated (WCETest).
[Figure: runtime distribution over time; the possible execution times lie between BCET and WCET, the estimated execution times (overapproximation) between BCETest and WCETest]
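The relation between the actual and the estimated bounds can be stated compactly; the subscript notation is chosen here for readability:

```latex
\[
  \mathrm{BCET}_{\mathrm{est}} \;\le\; \mathrm{BCET} \;\le\; \mathrm{WCET} \;\le\; \mathrm{WCET}_{\mathrm{est}}
\]
```

An estimate is safe when the outer inequalities hold, and precise when the gaps to the actual values are small.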
Timing analysis of basic blocks
The new block definition yields 2 cases:
Block without bus access: pipeline analysis will produce the block WCET.
Single-bus-access block: compute offset sets before / after the block to bound the delay.
Data-flow analysis / abstract interpretation computes offset sets before / after each block until a fixpoint is reached, which yields safe offset information.
[Figure: the block mark: add, mul, …, lw, …, sw, …, beq mark, shown whole and split at its bus accesses lw and sw]
Abstract interpretation: Operators
Offset merge (at CFG joins)
Offset update (Abstract execution of basic block)
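The two operators can be sketched on plain Python sets. Merge is modeled as set union; update executes a block abstractly for every incoming offset. The block model (some compute cycles followed by at most one bus access) and the schedule constants are assumptions for illustration, since the operator definitions from the slide are not preserved in this transcript.

```python
PERIOD, SLOT, ACC = 10, 5, 2  # assumed schedule: the analyzed core owns offsets 0..4


def wait(offset):
    """Wait time of an access issued at `offset` (it must fit into the own slot)."""
    offset %= PERIOD
    return 0 if offset + ACC <= SLOT else PERIOD - offset


def merge(*offset_sets):
    """Offset merge at a CFG join: every incoming offset remains possible."""
    return set().union(*offset_sets)


def update(offsets, cycles, has_access):
    """Offset update: abstractly execute one block for every incoming offset.

    The block runs `cycles` cycles and then, if `has_access`, issues one bus
    access. Returns the outgoing offset set and the block WCET.
    """
    out, wcet = set(), 0
    for o in offsets:
        time = cycles
        if has_access:
            time += wait(o + cycles) + ACC
        out.add((o + time) % PERIOD)
        wcet = max(wcet, time)
    return out, wcet


after_then, _ = update({0}, 4, False)  # branch without bus access
after_else, _ = update({0}, 1, True)   # branch with one bus access
joined = merge(after_then, after_else)
print(sorted(joined), update(joined, 1, True))
```

Here the join produces the set {3, 4}, and the following single-access block maps both offsets to offset 2 with a block WCET of 9 cycles.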
Algorithm: AnalyzeBlock
Algorithm: AnalyzeLoopIteration
Global Convergence Approach
Base scenario: Single loop, no nesting, no function calls
For each BB in loop: Repeatedly compute resulting offsets for loop iterations, build overapproximation
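[Figure: example loop (mark: add, mul, …, lw, …, beq mark) on Core 1 under a TDMA schedule shared between Core 1 and Core 2; a table lists the offset sets of the highlighted basic block for the analysis iterations 1, 2, and 3, growing for example from {0} to {0,3} to {0,3,6}]
The analysis stops when the fixpoint is reached. The fixpoint is valid for all loop iterations and yields the loop WCET.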
Graph-Tracking Approach
Compute a dynamic flow through the graph to determine the loop WCET (a flow unit simulates the loop execution). The formulation consists of:
Flow function (iteration t starts at offset i and ends at offset j)
Flow conservation
Start / end constraints
Objective function
Graph-Tracking: WCET ILP
Variables:
Objective:
Subject to:
Graph-Tracking: Offset ILP
Variables:
Objective:
Subject to:
Extension for pipeline/branch pred. analysis
Global convergence: Build a global overapproximation of the hardware state
Graph-Tracking: Possible to build an approximation per offset node for better precision
Discussion: Timing anomalies
The presented results were derived under the assumption of a timing-anomaly-free system.
Timing anomaly: local worst-case behaviour does not lead to global worst-case behaviour, so the search space cannot be pruned.
Pruning in case of offsets: keep only a single worst-case offset when updating offset information (the offset which leads to maximum delay).
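A minimal sketch of such pruning, assuming the same hypothetical two-core schedule as a toy model (the analyzed core owns offsets 0 to 4, an access takes 2 cycles). It keeps only the offset with the largest wait, which is justified only if the system is free of timing anomalies.

```python
PERIOD, SLOT, ACC = 10, 5, 2  # assumed schedule: the analyzed core owns offsets 0..4


def wait(offset):
    """Wait time of an access issued at `offset` (it must fit into the own slot)."""
    offset %= PERIOD
    return 0 if offset + ACC <= SLOT else PERIOD - offset


def prune(offsets):
    """Keep only the offset with the largest wait (smallest offset on ties)."""
    return {max(sorted(offsets), key=wait)}


print(prune({1, 4, 6}))  # prints "{4}"
```

Offset 4 waits 6 cycles, more than offsets 1 (0 cycles) and 6 (4 cycles), so it is the only one carried forward.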
Benchmark Properties 1
Benchmark Properties 2
Result Details for n=2, s=80
[Figure: per-benchmark results in percent (axis from 75% to 300%) for the benchmarks adpcm, bs, bsort100, cnt, cover, crc, edn, fdct, fft, fir, insertsort, jfdcint, lms, ludcmp, matmult, mergesort, minver, ndes, nsichneu, qurt, select, sqrt, statemate, st, Debie, PapaBench, and the average; series: F-, OC+, OT+, OT-; off-scale bars are labeled 481%, 356%, 402%, 321%, 371%, and 358%]