CSCI1600: Embedded and Real Time Software
Lecture 33: Worst Case Execution Time
Steven Reiss, Fall 2015
Worst Case Execution Time
What is it? Longest time a task can take
Why do we need it? Scheduling algorithms assume it is known
Can’t say anything about real time without it
What is the goal? Manually check each task to get its maximum run time
Automatically get the run time of a task using a tool
What is the Problem
This should be easy: Knuth, Volume 1, does this for a variety of algorithms
Just count the number of instructions
What are the problems? The halting problem
Almost anything you want to know about a real program is undecidable
Need to understand and limit control flows
Need to understand the hardware
Need to understand the execution model
Control Flow
To compute WCET, the control flow must be limited
Control flow can be modeled as a graph: a graph of basic blocks
Basic block: straight-line code with no branches in or out except at its entry and exit
Once started, will execute to completion
Suppose we could compute the WCET of each block
How could we compute the run time of the program?
Control Flow Graphs
Loops have to be bounded: bounds can be fixed
Can be based on input
Need to determine the bounds
Nested loops: bounds can be fixed or based on input
Or based on the index of the outer loop
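As a small illustration of an inner bound that depends on the outer index, the sketch below counts inner-loop executions for a triangular loop nest. The function names and the choice of loop nest are assumptions made for this example, not part of the lecture. It shows why treating the inner loop as if it always ran to the outer bound overestimates the work.

```python
def inner_iterations_triangular(n):
    """Count inner-body executions when the inner bound is the outer index."""
    count = 0
    for i in range(n):          # outer loop: fixed bound n
        for _ in range(i + 1):  # inner loop: bound depends on i
            count += 1
    return count


def naive_bound(n):
    """Bound obtained by assuming the inner loop always runs n times."""
    return n * n


def exact_bound(n):
    """Closed form for the triangular nest: 1 + 2 + ... + n."""
    return n * (n + 1) // 2
```

For n = 10 the naive bound is 100 iterations while the exact count is 55, so an analysis that understands the dependence on the outer index gives a tighter WCET.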
Reducible Control Structures
Can you compute the time for an arbitrary graph? Can be difficult
But programs don’t produce arbitrary graphs
Clean programs produce reducible graphs
A reducible graph allows you to cluster nodes: the WCET of a cluster can be computed
The cluster can be replaced with a single node
Reducible
A graph is reducible iff repeated application of the following actions yields a graph with only one node:
Replace a self loop with a single node
Replace a sequence of nodes, in which all incoming edges go to the first node and all outgoing edges leave from the last node, with a single node
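A minimal sketch of a reducibility check follows. It uses the textbook T1/T2 transformations (remove a self loop; merge a node that has exactly one predecessor into that predecessor), which are a close relative of the two rules above rather than a literal transcription of them. The function name, the graph encoding as a dict of successor sets, and the assumption that every node is reachable from the entry are choices made for this example.

```python
def is_reducible(edges, entry):
    """Return True if T1/T2 reductions collapse the graph to one node.

    edges maps each node to the set of its successors. Every node is
    assumed to be reachable from entry.
    """
    # Work on a private copy so the caller's graph is untouched.
    succ = {n: set(s) for n, s in edges.items()}
    for targets in list(succ.values()):
        for t in targets:
            succ.setdefault(t, set())

    changed = True
    while changed and len(succ) > 1:
        changed = False
        # T1: drop self loops.
        for n in succ:
            if n in succ[n]:
                succ[n].discard(n)
                changed = True
        # T2: merge a node with exactly one predecessor into it.
        for n in list(succ):
            if n == entry:
                continue
            preds = [p for p in succ if n in succ[p]]
            if len(preds) == 1:
                p = preds[0]
                succ[p].discard(n)
                succ[p] |= succ.pop(n)  # p inherits n's successors
                changed = True
                break  # restart so new self loops are removed first
    return len(succ) == 1
```

A structured loop such as B1 -> B2 -> {B3, B4} -> B5 -> {B2, B6} collapses to one node, while a loop with two entry points (entry -> a, entry -> b, a -> b, b -> a) does not.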
Reducible Example
[Figure: reduction steps. Nodes B1 | B2,B3,B4,B5 | B6 are clustered into the single node B1,B2,B3,B4,B5,B6]
WCET On Reducible Graphs
Assume you have WCET for each block: this should be easy, since a block is a sequence of instructions
Can compute the WCET for each reduced block: loops are bounded
Self loop = WCET(block) * loop count
Others can’t have loops
Compute MAX(WCET for each path) from start to finish
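The computation described above can be sketched as a longest-path search over the reduced, acyclic graph. Each self loop has already been collapsed into a node whose cost is its body WCET times its loop bound. The function name, the dict-based graph encoding, and the cycle counts in the usage note are hypothetical. The sketch also assumes every node can reach the end node.

```python
from functools import lru_cache


def program_wcet(block_cycles, loop_bound, successors, start, end):
    """Longest start-to-end path, in cycles, through an acyclic block graph.

    block_cycles: node -> WCET of one execution of the block, in cycles
    loop_bound:   node -> iteration bound for collapsed self loops (default 1)
    successors:   node -> list of successor nodes (must be acyclic)
    """
    def node_cost(n):
        # Self loop = WCET(block) * loop count.
        return block_cycles[n] * loop_bound.get(n, 1)

    @lru_cache(maxsize=None)
    def longest_from(n):
        if n == end:
            return node_cost(n)
        # MAX over all paths leaving this node.
        return node_cost(n) + max(longest_from(s) for s in successors[n])

    return longest_from(start)
```

With blocks B1 (10 cycles), LOOP (25 cycles, bound 8), ALT (30 cycles) and B6 (5 cycles), where B1 branches to LOOP or ALT and both lead to B6, the result is 10 + 25 * 8 + 5 = 215 cycles.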
Basic Block WCET
Each instruction takes k cycles: count the number of cycles
Multiply by the clock period (equivalently, divide by the clock frequency)
If only it were that simple: processor timing can depend on many factors
Pipelining, out-of-order execution
Memory behavior needs to be considered
Caching
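The simple cycle-counting model can be written down directly. Total cycles divided by the clock frequency (equivalently, multiplied by the clock period) gives the block time. The function name and the cycle counts in the usage note are hypothetical. The sketch deliberately ignores pipelining, out-of-order execution, and caching, which is exactly why this model is too simple in practice.

```python
def block_time_seconds(cycles_per_instruction, clock_hz):
    """Time for one basic block under the naive fixed-cycles model."""
    total_cycles = sum(cycles_per_instruction)
    # cycles / (cycles per second) = seconds
    return total_cycles / clock_hz
```

For example, five instructions costing 1, 1, 2, 2 and 1 cycles on a 16 MHz clock take 7 / 16,000,000 s, or about 0.44 microseconds.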
Speculation-Based CPU Anomalies
Instruction A is a conditional branch followed by B or C: the CPU speculates B rather than C, but C is what executes
C is in the cache
If A is in the cache, there is time to prefetch B: B drives C out of the cache, so the run takes longer
If A is not in the cache, there is no time to prefetch B, so C stays cached and the overall time is shorter
Scheduling-Based CPU Anomalies
Instructions A-B-C-D-E
B depends on A, D depends on C, E depends on D
A, D, E use resource 1 (CPU unit)
B, C use resource 2
Resource 2 initially in use
A is run first: if A is quick, then B is run, followed by C, D, E
This is linear time, with no overlap
If A is slow, then C can start (resource 2 freed)
B and D can then overlap
Result is faster
Memory Behavior
Caching can change timings considerably
Both instruction and data caching
Why not just assume worst-case time per instruction? What is the cost of an I-cache miss?
Can be several orders of magnitude
Can’t afford to assume this for each instruction
Need to maintain a complex model of processor and cache state: assume start state is unknown
Determining worst case input can be difficult
Need to handle preemption: this could change the processor and cache states at any time
But the number of preemptions can be limited
Approaches to WCET
We need to compute WCET: to handle real time scheduling
To understand real time limits
What can we do with real problems? Measurement-based approaches
Code-analysis based approaches
Hybrid approaches
Measurement-Based Approaches
Why not just run the code? On multiple inputs, multiple times
Recording the time it takes
Get a graph of execution times: best, worst, distribution
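A minimal sketch of this measurement loop is shown below, using Python's `time.perf_counter_ns` as the clock. The function names `measure` and `summarize` and the summary fields are assumptions for this example. On an embedded target a cycle counter or hardware probe would likely replace the clock call. The observed worst time is only the largest value seen on the chosen inputs, not a safe upper bound.

```python
import time


def measure(task, inputs, repeats=5):
    """Run task on every input `repeats` times; return sorted times in ns."""
    samples = []
    for x in inputs:
        for _ in range(repeats):
            start = time.perf_counter_ns()
            task(x)
            samples.append(time.perf_counter_ns() - start)
    samples.sort()
    return samples


def summarize(samples):
    """Best, worst and median of a sorted, non-empty list of samples."""
    return {
        "best": samples[0],
        "worst": samples[-1],
        "median": samples[len(samples) // 2],
    }
```

Calling `summarize(measure(sorted, [list(range(100, 0, -1)), list(range(100))]))` gives the best, median and worst observed times for sorting two sample inputs. Plotting the full sample list gives the execution time distribution.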
Execution Time Distribution
Practical Measurements
Break the program into subtasks: input distribution can be better controlled
Get measurements of the time for each subtask
Put these together to get total time
This can be a bit better but still not safe
How to Get Measurements
Clock time, CPU cycle counters, etc. are available
On real hardware, probes might change processor states
Simulation
Assumes you know everything about the hardware
On real hardware using hardware probes
External triggers on hardware lines
Picking inputs: randomly (from what space, what distribution?)
From sample data (how representative)
Manually (can be difficult)
Static WCET Analysis
Compiler technology can be used: much of the same type of work that compilers do in the optimization process
Compilers need to understand control flow
Compilers want to understand loop bounds
Compilers need to understand processor state
Model the processor when generating instructions
We can use this to compute WCET
Static WCET Analysis
Static Analysis for WCET
Build the program model: control flow graph with connected basic blocks
Include information on path dependencies
Might require programmer annotations
Compute the loop bounds: have the programmer provide them for you
Deduce through symbolic execution and constraints
Hybrid approaches
Static Analysis for WCET
Estimate the time for each basic block
Using a model of the CPU/Memory/etc.
Tracking processor/cache states
Known X, Known not X, unknown
Produce a range instead of a single number
Typically take into account I-cache, not D-cache
Can be done using measurement
Put the result back together: using reducible control structures
Can be formulated as linear programming
Still have to handle calls, …
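The "known X, known not X, unknown" tracking can be sketched as a three-valued classification of each instruction fetch, which turns a block's time into a range rather than a single number. The enum, the function name, and the hit and miss costs below are hypothetical. The sketch also assumes per-fetch costs simply add up, which the timing anomalies discussed earlier show is not always a safe assumption on real processors.

```python
from enum import Enum


class CacheState(Enum):
    KNOWN_HIT = "known X"        # line is definitely in the I-cache
    KNOWN_MISS = "known not X"   # line is definitely not in the I-cache
    UNKNOWN = "unknown"          # analysis cannot tell


def block_time_range(fetch_states, base_cycles, hit_cycles, miss_cycles):
    """Return (best, worst) cycle counts for one basic block."""
    best = worst = base_cycles
    for state in fetch_states:
        if state is CacheState.KNOWN_HIT:
            best += hit_cycles
            worst += hit_cycles
        elif state is CacheState.KNOWN_MISS:
            best += miss_cycles
            worst += miss_cycles
        else:
            # Unknown: a hit in the best case, a miss in the worst case.
            best += hit_cycles
            worst += miss_cycles
    return best, worst
```

With a base cost of 10 cycles, a 1-cycle hit, a 50-cycle miss, and fetches classified as hit, miss and unknown, the range is (62, 111) cycles. The upper end of each block's range is what feeds the path-based or linear-programming combination step.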
Other Techniques for WCET
Partition the task into subtasks and analyze them: partitioning can be heuristic or programmer-defined
Generally, the smaller the unit, the easier it is to analyze
Hybrid approaches: use measurements for small units
Do both measurement and static analysis to get a better approximation
Use dynamics to determine possible initial states
State-of-the-Art Tools
Tools exist to do this work
Using programmer annotations and assistance
Tools aren’t perfect: they don’t handle preemption and scheduling
Don’t handle data caching
Don’t have the most accurate models of the CPU
Models aren’t necessarily correct
Other tools: languages, compilers and system design for time-prediction
Next Time
Guest Lecture on Security: Vasilis Kemerlis
Project Presentations Start FRIDAY
Mechanics: Order, volunteers, …