csci1600: embedded and real time software lecture 33: worst case execution time steven reiss, fall...

CSCI1600: Embedded and Real Time Software

Lecture 33: Worst Case Execution Time

Steven Reiss, Fall 2015

Worst Case Execution Time

What is it? Longest time a task can take

Why do we need it? Scheduling algorithms assume it is known

Can’t say anything about real time without it

What is the goal? Manually check each task to get its maximum run time

Automatically get the run time of a task using a tool

What is the Problem?

This should be easy

Knuth volume 1 does this for a variety of algorithms

Just count the number of instructions

What are the problems? The halting problem

Almost anything you want to know about a real program is undecidable

Need to understand and limit control flows

Need to understand the hardware

Need to understand the execution model

Control Flow

To compute WCET, the control flow must be limited

Control flow can be modeled as a graph

Graph of basic blocks

Basic block: code with no branches

Once started, will execute to completion

Suppose we could compute the WCET of each block

How could we compute the run time of the program?
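The basic-block definition above can be made concrete with a small sketch. It uses the standard "leader" rule from compiler construction: the first instruction, every branch target, and every instruction following a branch each start a new block. The instruction format (a list of `(opcode, target)` pairs, with `target` being an instruction index or `None`) and the opcode names are hypothetical, chosen only for illustration, and the list is assumed to be non-empty.

```python
def basic_blocks(instrs):
    """Split a non-empty instruction list into basic blocks.

    instrs is a list of (opcode, target) pairs; target is the index of the
    branch destination, or None for a non-branching instruction.
    Returns a list of blocks, each a list of instruction indices.
    """
    leaders = {0}                       # the first instruction starts a block
    for i, (_, target) in enumerate(instrs):
        if target is not None:          # a branch or jump
            leaders.add(target)         # its destination starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)      # so does the instruction after it
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


# Hypothetical six-instruction loop: indices 1-4 form the loop.
program = [
    ("load", None),   # 0
    ("cmp", None),    # 1  loop test
    ("beq", 5),       # 2  exit the loop when done
    ("add", None),    # 3
    ("jmp", 1),       # 4  back edge to the loop test
    ("ret", None),    # 5
]
blocks = basic_blocks(program)   # [[0], [1, 2], [3, 4], [5]]
```

Each resulting block has a single entry and a single exit, so once its WCET is known it can be treated as one node of the control flow graph.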

Control Flow Graphs

Loops have to be bounded

Bounds can be fixed

Can be based on input

Need to determine the bounds

Nested loops

Fixed, based on input

Based on index of outer loop
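A short sketch of why the kind of bound matters for nested loops. When the inner bound depends on the outer index, simply multiplying the two loop bounds overestimates the work. The function name and the choice of `n = 10` are illustrative assumptions.

```python
def inner_iterations(n, triangular):
    """Count how many times the inner loop body runs.

    triangular=False: inner loop always runs n times (fixed bound).
    triangular=True:  inner loop runs i times (bound is the outer index).
    """
    count = 0
    for i in range(n):
        bound = i if triangular else n
        for _ in range(bound):
            count += 1
    return count


n = 10
rectangular = inner_iterations(n, triangular=False)   # n * n = 100
triangular = inner_iterations(n, triangular=True)     # n * (n - 1) / 2 = 45
```

An analysis that only knows "outer bound n, inner bound at most n" reports 100 iterations for the triangular case, more than twice the true count of 45. That is safe but pessimistic.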

Reducible Control Structures

Can you compute the time for an arbitrary graph? Can be difficult

But programs don’t produce arbitrary graphs

Clean programs produce reducible graphs

A reducible graph allows you to cluster nodes

WCET of a cluster can be computed

The cluster can be replaced with a single node

Reducible

A graph is reducible iff repeated application of the following actions yields a graph with only one node:

Replace a self loop with a single node

Replace a sequence of nodes with a single node, provided all the incoming edges go to the first node and all the outgoing edges leave from the last node
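The reduction idea can be sketched in code. Note that this sketch does not implement the cluster rule exactly as worded above; it swaps in the standard T1/T2 transformations from the compiler literature (T1: delete a self loop; T2: merge a node that has exactly one predecessor into that predecessor). The graph encoding (a dict from node name to a set of successors) and the node names are assumptions, and every node is assumed to be reachable from the entry.

```python
def is_reducible(cfg, entry):
    """Return True if T1/T2 reductions collapse the graph to one node.

    cfg maps each node to a set of successor nodes. All nodes are assumed
    to be reachable from entry.
    """
    nodes = set(cfg) | {t for targets in cfg.values() for t in targets}
    succ = {n: set(cfg.get(n, ())) for n in nodes}

    changed = True
    while changed and len(succ) > 1:
        changed = False
        # T1: remove self loops.
        for n in succ:
            if n in succ[n]:
                succ[n].discard(n)
                changed = True
        # T2: merge one node that has a unique predecessor into it.
        for n in list(succ):
            if n == entry:
                continue
            preds = [p for p in succ if n in succ[p]]
            if len(preds) == 1:
                p = preds[0]
                succ[p].discard(n)
                succ[p] |= succ.pop(n)   # p inherits n's outgoing edges
                changed = True
                break
    return len(succ) == 1


# A while loop: reducible.
while_loop = {
    "entry": {"cond"},
    "cond": {"body", "exit"},
    "body": {"cond"},
    "exit": set(),
}

# Two nodes that jump into each other, both entered from outside: irreducible.
two_entry_loop = {"a": {"b", "c"}, "b": {"c"}, "c": {"b"}}

ok = is_reducible(while_loop, "entry")      # True
bad = is_reducible(two_entry_loop, "a")     # False
```

The second graph is the classic loop with two entry points, which structured code does not produce but unrestricted `goto` can.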

Reducible Example

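[Figure: a control flow graph with blocks B1 through B6 reduced in steps. B2, B3, B4, B5 are clustered into one node between B1 and B6, and the result then collapses into the single node B1,B2,B3,B4,B5,B6.]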

WCET On Reducible Graphs

Assume you have WCET for each block

This should be easy – sequence of instructions

Can compute the WCET for each reduced block

Loops are bounded

Self loop = WCET(block) * loop count

Others can’t have loops

Compute MAX(WCET for each path) from start to finish
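A minimal sketch of the two rules above: a bounded self loop costs its body WCET times the loop count, and a loop-free cluster costs the maximum over all paths from start to finish. The block names, cycle counts, and loop bound are made-up numbers for illustration.

```python
from functools import lru_cache


def loop_wcet(body_wcet, max_iterations):
    """Self loop = WCET(block) * loop count."""
    return body_wcet * max_iterations


def path_wcet(cost, succ, entry):
    """Longest (most expensive) path through an acyclic graph.

    cost maps node -> WCET of that node; succ maps node -> tuple of
    successors. The graph must be acyclic (loops already collapsed).
    """
    @lru_cache(maxsize=None)
    def longest_from(node):
        tails = [longest_from(s) for s in succ.get(node, ())]
        return cost[node] + max(tails, default=0)

    return longest_from(entry)


# Hypothetical program: B1, then a loop (body 10 cycles, at most 8 times),
# then an if/else, then B6.
cost = {"B1": 5, "loop": loop_wcet(10, 8), "then": 7, "else": 3, "B6": 4}
succ = {
    "B1": ("loop",),
    "loop": ("then", "else"),
    "then": ("B6",),
    "else": ("B6",),
}
total = path_wcet(cost, succ, "B1")   # 5 + 80 + 7 + 4 = 96 cycles
```

The memoization keeps the cost linear in the size of the graph even when the number of distinct paths grows exponentially.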

Basic Block WCET

Each instruction takes k cycles

Count the number of cycles

Multiply by the clock period (equivalently, divide by the clock frequency)

If only it were that simple

Processor timing can depend on many factors

Pipelining, out-of-order execution

Memory behavior needs to be considered

Caching
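For reference, the naive cycle-counting model from the top of this slide looks like the sketch below: sum the per-instruction cycle counts and divide by the clock frequency. The cycle counts and the 16 MHz clock are assumed values, and the model deliberately ignores pipelining, out-of-order execution, and caching, which is exactly why it is not sufficient on real processors.

```python
def block_time_ns(cycles_per_instruction, clock_hz):
    """Naive basic-block time: total cycles divided by clock frequency.

    Ignores pipelining, out-of-order execution, and cache effects.
    """
    total_cycles = sum(cycles_per_instruction)
    return total_cycles * 1e9 / clock_hz


# Five instructions totalling 7 cycles on an assumed 16 MHz clock.
t = block_time_ns([1, 1, 2, 1, 2], 16_000_000)   # 437.5 ns
```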

Speculation-Based CPU Anomalies

Instruction A does a conditional branch, followed by B or C

Speculate B rather than C, but execute C

C is in the cache

If A is in the cache, there is time to prefetch B

B drives C out of the cache => Longer time

If A is not in the cache, then the overall time is faster

Scheduling-Based CPU Anomalies

Instructions A-B-C-D-E

B depends on A, D depends on C, E depends on D

A, D, E use resource 1 (CPU unit)

B, C use resource 2

Resource 2 initially in use

A is run first

If A is quick, then B is run followed by C, D, E

This is linear time, with no overlap

If A is slow, then C can start (resource 2 freed)

B and D can then overlap

Result is faster

Memory Behavior

Caching can change timings considerably

Both instruction and data caching

Why not just assume the worst-case time for each instruction?

What is the cost of an I-cache miss?

Can be several orders of magnitude

Can’t afford to do this for each instruction

Need to maintain a complex model of processor and cache state

Assume start state is unknown

Determining worst case input can be difficult

Need to handle preemption

This could change the processor and cache states at any time

But the number of preemptions can be limited
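A rough sketch of why charging every instruction the cache-miss cost is unaffordable. It compares an "every fetch misses" bound with a model in which only the first pass through a loop misses. The hit and miss costs, the loop size, and the assumptions that the loop body fits in the I-cache and is never preempted are all illustrative, not measured values.

```python
HIT_CYCLES = 1      # assumed cost of an I-cache hit
MISS_CYCLES = 100   # assumed cost of an I-cache miss


def all_miss_bound(instrs_per_iter, iterations):
    """Pessimistic bound: every instruction fetch is a miss."""
    return instrs_per_iter * iterations * MISS_CYCLES


def cold_then_warm(instrs_per_iter, iterations):
    """First iteration misses, later iterations hit.

    Assumes the loop body fits in the cache and nothing evicts it.
    """
    cold = instrs_per_iter * MISS_CYCLES
    warm = instrs_per_iter * (iterations - 1) * HIT_CYCLES
    return cold + warm


pessimistic = all_miss_bound(10, 1000)   # 1,000,000 cycles
modeled = cold_then_warm(10, 1000)       # 1,000 + 9,990 = 10,990 cycles
```

With these numbers the all-miss bound is roughly 90 times larger than the modeled time, which would make most task sets look unschedulable. Getting the tighter figure safely is what requires tracking cache state.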

Approaches to WCET

We need to compute WCET

To handle real time scheduling

To understand real time limits

What can we do with real problems?

Measurement-based approaches

Code-analysis based approaches

Hybrid approaches

Measurement-Based Approaches

Why not just run the code?

On multiple inputs, multiple times

Recording the time it takes

Get a graph of execution times

Best, worst, distribution
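A minimal sketch of the measurement-based approach: run a task on several inputs, several times each, and record best, worst, and the full set of samples. The helper name `measure` and the sample task are assumptions. On a desktop OS the numbers also include scheduling and interpreter noise, so this only illustrates the method.

```python
import time


def measure(task, inputs, repeats=5):
    """Time task(x) for each input, repeats times, in nanoseconds."""
    samples = []
    for x in inputs:
        for _ in range(repeats):
            start = time.perf_counter_ns()
            task(x)
            samples.append(time.perf_counter_ns() - start)
    samples.sort()
    return {
        "best": samples[0],
        "worst": samples[-1],                  # worst OBSERVED, not the WCET
        "median": samples[len(samples) // 2],
        "samples": samples,                    # the distribution to plot
    }


# Hypothetical task: sorting lists of different sizes and orderings.
stats = measure(sorted, [[3, 1, 2], list(range(1000)), list(range(1000, 0, -1))])
```

The reported `worst` is only the largest time seen on the inputs tried. The true worst case may lie on an input or hardware state that was never exercised, so the observed maximum is not a safe bound.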

Execution Time Distribution

Practical Measurements

Break the program into subtasks

Input distribution can be better controlled

Get measurements of the time for each subtask

Put these together to get total time

This can be a bit better but still not safe

How to Get Measurements

Getting measurements

Clock time, CPU cycle counters, etc. are available

On real hardware, probes might change processor states

Simulation

Assumes you know everything about the hardware

On real hardware using hardware probes

External triggers on hardware lines

Picking inputs

Randomly (from what space, what distribution)

From sample data (how representative)

Manually (can be difficult)

Static WCET Analysis

Compiler technology can be used

Much of the same type of work that compilers do in the optimization process

Compilers need to understand control flow

Compilers want to understand loop bounds

Compilers need to understand processor state

Model the processor when generating instructions

We can use this to compute WCET

Static WCET Analysis

Static Analysis for WCET

Build the program model

Control flow graph with connected basic blocks

Include information on path dependencies

Might require programmer annotations

Compute the loop bounds

Have the programmer provide them for you

Deduce through symbolic execution and constraints

Hybrid approaches

Static Analysis for WCET

Estimate the time for each basic block

Using a model of the CPU/Memory/etc.

Tracking processor/cache states

Known X, Known not X, unknown

Produce a range instead of a single number

Typically take into account I-cache, not D-cache

Can be done using measurement

Put the result back together

Using reducible control structures

Can be formulated as linear programming

Still have to handle calls, …
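The three-valued cache tracking ("known X, known not X, unknown") and the resulting time range can be sketched as follows. Each instruction fetch in a block is classified, and the block gets a best-case and worst-case cycle count rather than a single number. The class name, the hit and miss costs, and the sample block are hypothetical.

```python
from enum import Enum


class Fetch(Enum):
    ALWAYS_HIT = "known to be in the cache"
    ALWAYS_MISS = "known not to be in the cache"
    UNKNOWN = "cache state unknown"


HIT_CYCLES = 1     # assumed
MISS_CYCLES = 20   # assumed


def block_time_range(fetches):
    """Return (best, worst) cycle counts for one basic block.

    Unknown fetches count as hits in the best case and as misses in the
    worst case, so the true time lies inside the returned range.
    """
    best = sum(MISS_CYCLES if f is Fetch.ALWAYS_MISS else HIT_CYCLES
               for f in fetches)
    worst = sum(HIT_CYCLES if f is Fetch.ALWAYS_HIT else MISS_CYCLES
                for f in fetches)
    return best, worst


block = [Fetch.ALWAYS_HIT, Fetch.ALWAYS_HIT, Fetch.ALWAYS_MISS, Fetch.UNKNOWN]
best, worst = block_time_range(block)   # (23, 42)
```

The fewer fetches the analysis has to leave as unknown, the narrower the range, which is why a precise model of the cache state pays off when the per-block results are combined over the control flow graph.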

Other Techniques for WCET

Partition the task into subtasks and analyze them

Partitioning can be heuristic or programmer-defined

Generally, the smaller the unit, the easier it is to analyze

Hybrid approaches

Use measurements for small units

Do both measurement and static analysis to get a better approximation

Use dynamics to determine possible initial states

State-of-the-Art Tools

Tools exist to do this work

Using programmer annotations and assistance

Tools aren’t perfect

Don’t handle preemption and scheduling

Don’t handle data caching

Don’t have the most accurate models of the CPU

Models aren’t necessarily correct

Other tools

Languages, compilers and system design for time-prediction

Next Time

Guest Lecture on Security: Vasilis Kemerlis

Project Presentations Start FRIDAY

Mechanics: Order, volunteers, …
