
Scheduling as a Learned Art*

Christopher Gill, William D. Smart, Terry Tidwell, and Robert Glaubius

{cdgill, wds, ttidwell, rlg1}@cse.wustl.edu

Department of Computer Science and Engineering

Washington University, St. Louis, MO, USA

Fourth International Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT 2008)

July 1, 2008, Prague, Czech Republic

*Research supported in part by NSF awards CNS-0716764 (Cybertrust) and CCF-0448562 (CAREER)

2 - Gill et al. – 04/21/23

Motivation: Systems with (some) Autonomy

Interact with variable environment
» Varying degrees of autonomy
» Performance is deadline sensitive

Many activities must run at once
» Device interrupt handling, computation
» Comm w/ other systems/operators

Need reliable activity execution
» Scheduling with shared resources and competing, variable execution times
» How to guarantee utilizations?

[Figure: Lewis, Media and Machines Lab, Washington University, St. Louis, MO, USA, with wireless communication to a remote operator station (for all but full autonomy)]


More Generally, Open Soft Real-Time Systems

Questions of interest are relevant well beyond mobile robotics
» Robotics is a good touchstone, though
» In many systems, platform features interact with the physical environment
» Especially with increased embedding of OS/RTOS platforms everywhere ;-)

Abstract view of the problem
» Diverse concurrent application tasks
» Task execution times are variable
» (Soft) deadlines on application tasks
» Resources shared among tasks
» Need methods to design and verify scheduling policies accordingly

What Other Kinds of Embedded Systems Have Similar Platform Constraints?


Current System Model

Threads of execution depend on a shared resource
» Require mutually exclusive access (e.g., to a CPU) to run

Each thread binds the resource when it runs
» A thread binds the resource for a duration, then releases it
» Model duration with integer variables: count time quanta

Variable execution times with known distributions
» We assume that each thread’s run-time distribution is known, bounded, and independent of the others

Non-preemptive scheduler (repeats perpetually)
» Scheduler chooses which thread to run (based on policy)
» Scheduler dispatches the thread, which runs until it yields
» Scheduler waits until the thread releases the resource
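The scheduling loop above can be sketched as a small Python simulation. Everything concrete here is an assumption for illustration: two threads, the hypothetical `RUN_TIME_DIST` table of integer run times, and a hypothetical `least_used_first` policy standing in for whatever policy the scheduler actually uses. Each iteration chooses a thread, lets it hold the resource for a sampled number of quanta (no preemption), and then records the observed run time.

```python
import random

# Hypothetical run-time distributions, in integer time quanta, for two threads.
# Each thread's distribution is known, bounded, and independent of the other's.
RUN_TIME_DIST = {
    0: ([1, 2], [0.5, 0.5]),    # thread 0 runs for 1 or 2 quanta
    1: ([2, 3], [0.25, 0.75]),  # thread 1 runs for 2 or 3 quanta
}


def sample_run_time(thread, rng):
    """Draw how many quanta a dispatched thread holds the resource."""
    times, weights = RUN_TIME_DIST[thread]
    return rng.choices(times, weights=weights)[0]


def least_used_first(usage):
    """Hypothetical policy: pick the thread that has consumed the fewest quanta."""
    return min(range(len(usage)), key=lambda t: usage[t])


def run_scheduler(policy, decisions, seed=0):
    """Simulate `decisions` iterations of the (normally perpetual) scheduler loop."""
    rng = random.Random(seed)
    usage = [0] * len(RUN_TIME_DIST)  # quanta consumed per thread so far
    for _ in range(decisions):
        thread = policy(tuple(usage))          # 1. choose a thread per the policy
        quanta = sample_run_time(thread, rng)  # 2. dispatch; it runs until it yields
        usage[thread] += quanta                # 3. resource released; record run time
    return usage


print(run_scheduler(least_used_first, decisions=100))
```

Because each dispatched thread runs to completion, the policy can act only between dispatches; with this particular policy the two threads' usage never differs by more than the longest possible run (3 quanta).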


Uncertainty (but with Observability Post-Hoc)

[Figure: two plots of probability vs. time (thread run-time distributions)]

We summarize system state as a vector of integers
» Represent thread utilizations

Threads’ run times come from known, bounded distributions

Scheduling a thread changes the system’s (utilization) state
» Utilization is observed after the thread runs, based on its run time
» State transition probabilities are based on the run time distributions

This forms a basis for policy design and optimization

From Tidwell et al., ATC 2008
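A minimal sketch of the state-transition idea, assuming hypothetical per-thread run-time distributions with exact `Fraction` probabilities: scheduling a thread advances only that thread's entry in the integer state vector by its run time, so the next-state distribution is that thread's run-time distribution shifted into the thread's coordinate. The names `transitions` and `utilization` are illustrative, not taken from the cited paper.

```python
from fractions import Fraction

# Hypothetical, bounded run-time distributions (quanta -> probability) per thread.
RUN_TIME_DIST = {
    0: {1: Fraction(1, 2), 2: Fraction(1, 2)},
    1: {2: Fraction(1, 4), 3: Fraction(3, 4)},
}


def transitions(state, thread):
    """Next-state distribution after scheduling `thread` from integer state `state`."""
    result = {}
    for quanta, prob in RUN_TIME_DIST[thread].items():
        nxt = list(state)
        nxt[thread] += quanta  # only the dispatched thread's usage grows
        key = tuple(nxt)
        result[key] = result.get(key, 0) + prob
    return result


def utilization(state):
    """Fraction of elapsed quanta consumed by each thread (all zero at time 0)."""
    total = sum(state)
    if total == 0:
        return tuple(Fraction(0) for _ in state)
    return tuple(Fraction(x, total) for x in state)


print(transitions((3, 4), 1))  # {(3, 6): Fraction(1, 4), (3, 7): Fraction(3, 4)}
print(utilization((3, 7)))     # (Fraction(3, 10), Fraction(7, 10))
```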


From Thread Run Times to a Scheduling Policy

We model thread scheduling decisions as a Markov Decision Process (MDP) based on thread run times

(From ATC ‘08) MDP is given by 4-tuple: (X, A, R, T)
» X: set of process states (i.e., thread utilization states)
» A: set of actions (i.e., scheduling a particular thread)
» R: reward function for taking an action in a state
   Expected utility of taking that action
   Distance of the next state(s) from a desired utilization (vector)
» T: transition function
   For each action, encodes the probability of moving from a given state to another state

Solve MDP: optimal (per accumulated reward) policy
Fold periodic states: smaller space (recent advance)
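Small instances of such an MDP can be solved exactly. The sketch below uses finite-horizon dynamic programming over the states reachable from a start state, which sidesteps the unbounded utilization state space; it does not implement the state folding mentioned above. The reward (negative L1 distance of the next state from the target utilization ray, measured at the same total elapsed time), the 50/50 `TARGET`, and the run-time table are all assumptions.

```python
from functools import lru_cache

# Hypothetical two-thread instance: run-time distributions and desired utilization.
RUN_TIME_DIST = {0: {1: 0.5, 2: 0.5}, 1: {2: 0.25, 3: 0.75}}
TARGET = (0.5, 0.5)
ACTIONS = tuple(RUN_TIME_DIST)


def transitions(state, thread):
    """T: next-state probabilities after running `thread` from `state`."""
    out = {}
    for quanta, prob in RUN_TIME_DIST[thread].items():
        nxt = list(state)
        nxt[thread] += quanta
        out[tuple(nxt)] = out.get(tuple(nxt), 0.0) + prob
    return out


def reward(state):
    """R: negative L1 distance of `state` from the target utilization ray (assumed metric)."""
    total = sum(state)
    return -sum(abs(x - u * total) for x, u in zip(state, TARGET))


def q_value(state, thread, horizon):
    """Expected reward of running `thread` now, then acting optimally for horizon - 1 steps."""
    return sum(p * (reward(nxt) + value(nxt, horizon - 1))
               for nxt, p in transitions(state, thread).items())


@lru_cache(maxsize=None)
def value(state, horizon):
    """Optimal expected accumulated reward over `horizon` more decisions."""
    if horizon == 0:
        return 0.0
    return max(q_value(state, a, horizon) for a in ACTIONS)


def best_action(state, horizon):
    """Greedy action with respect to the finite-horizon Q-values."""
    return max(ACTIONS, key=lambda a: q_value(state, a, horizon))


print(best_action((0, 5), horizon=1))  # 0: thread 1 is already over its share
print(best_action((5, 0), horizon=1))  # 1
```

With thread 1 holding all five elapsed quanta, running thread 0 has expected immediate reward -3.5 versus -7.75 for thread 1, so the one-step policy picks thread 0.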


Partial Observability

Local CPU usage is pretty easy to observe exactly
» E.g., using the Pentium tick counter or another good time source

However, other key properties are noisier
» E.g., robot location indoors
   No GPS “position sensor”; wheel slip etc. add noise during motion
» How does this relate to scheduling?
   What if we consider the robot’s progress along a navigation path as an activity that must compete for resources with others?
   Then the robot’s position becomes part of the scheduling state
   Similar issues may arise in other scheduling settings (e.g., in CPS)

Noise in observation produces partial observability
» E.g., multiple different positions can be equally likely
» Possible approach: Partially Observable MDPs (POMDPs)
   Reason over belief states to get an MDP transition function (a big space)
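One way to see what reasoning over belief states involves is a discrete Bayes filter, the belief update used in POMDPs: the belief over positions is pushed through a motion model, then reweighted by the observation likelihood and renormalized. The corridor size `N`, the slip probability `P_MOVE`, and the sensor model in `obs_likelihood` are hypothetical.

```python
# Hypothetical 1-D corridor discretized into N cells; belief[i] = P(robot in cell i).
N = 5
P_MOVE = 0.8  # assumed: a "move forward" command succeeds with probability 0.8;
              # wheel slip leaves the robot in place with probability 0.2


def predict(belief):
    """Push the belief through the (assumed) motion model for one forward command."""
    out = [0.0] * N
    for cell, b in enumerate(belief):
        out[min(cell + 1, N - 1)] += P_MOVE * b  # moved (the far wall stops the robot)
        out[cell] += (1.0 - P_MOVE) * b          # slipped in place
    return out


def obs_likelihood(obs, cell):
    """Assumed noisy position sensor: correct cell 0.6, each neighbor 0.2."""
    if obs == cell:
        return 0.6
    if abs(obs - cell) == 1:
        return 0.2
    return 0.0


def update(belief, obs):
    """Bayes filter step: b'(s') is proportional to O(obs | s') * sum_s T(s' | s, a) b(s)."""
    predicted = predict(belief)
    unnorm = [obs_likelihood(obs, c) * p for c, p in enumerate(predicted)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


belief = update([1.0, 0.0, 0.0, 0.0, 0.0], obs=1)
print([round(b, 3) for b in belief])  # [0.077, 0.923, 0.0, 0.0, 0.0]
```

A POMDP policy is then a function of `belief` rather than of the unobservable true position, which is why the effective state space becomes so large.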


Observation Lag

State observations may also incur temporal lag
» E.g., a detailed scan of an area with a range-finding laser
» However, time passes while the scan is in progress
» The robot or environment may move while the scan is being done

As with partial observability, a new extension to the basic MDP model is needed to address observation lag
» In Semi-MDPs (SMDPs), an action causes 1 state change, but may take a variable amount of time to do so
» SMDP extensions to MDPs exist for finding optimal policies
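In an SMDP each action also has a duration, so the value of what follows is discounted by γ raised to the number of elapsed steps. The sketch below runs value iteration on a deliberately tiny, hypothetical SMDP with one decision state and two observation actions: a quick look (1 step, reward 1) and a detailed laser scan (4 steps, reward 3). The rewards, durations, discount `GAMMA = 0.9`, and all names are illustrative assumptions, and each reward is assumed to arrive at the start of its action.

```python
# Hypothetical SMDP: OUTCOMES[(state, action)] lists
# (probability, next_state, duration_in_steps, reward) tuples.
GAMMA = 0.9
STATES = ("navigating",)
ACTIONS = ("quick_look", "laser_scan")
OUTCOMES = {
    ("navigating", "quick_look"): [(1.0, "navigating", 1, 1.0)],  # fast, coarse observation
    ("navigating", "laser_scan"): [(1.0, "navigating", 4, 3.0)],  # slow, detailed observation
}


def smdp_q(values, state, action, gamma=GAMMA):
    """SMDP backup: sum over outcomes of p * (r + gamma**tau * V(next))."""
    return sum(p * (r + gamma ** tau * values[nxt])
               for p, nxt, tau, r in OUTCOMES[(state, action)])


def value_iteration(iters=500):
    """Repeatedly apply the SMDP Bellman optimality backup to every state."""
    values = {s: 0.0 for s in STATES}
    for _ in range(iters):
        values = {s: max(smdp_q(values, s, a) for a in ACTIONS) for s in STATES}
    return values


v = value_iteration()
print(round(v["navigating"], 3))                                # 10.0
print(max(ACTIONS, key=lambda a: smdp_q(v, "navigating", a)))  # quick_look
```

The scan earns more per action but delays the next decision; at V = 10 its backup is 3 + 0.9^4 · 10 ≈ 9.56, so the quick look wins in this instance.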


Neglect Tolerance

Need to schedule >1 entire-system behavior at once
» Can transform into scheduling interim sub-tasks, as before
» However, a behavior has its own (possibly dynamic) structure
» E.g., navigating to cover a room while mapping its boundary

Resource contention, control/data dependence
» Scheduling becomes a multi-criteria optimization
» Sub-tasks may have (potentially hard) deadlines
» E.g., decide to turn or stop before hitting a wall

Spectrum: remote control to complete autonomy
» Higher neglect tolerance needs more on-board scheduling
» Uncertainty, observability, temporal lag issues as before
» Open problem: formalize tractably, model parametrically
» A multi-disciplinary (RT/ML) approach is still needed


Learning (aka “Good Scheduler, Bad Scheduler”)

We base scheduling decisions on a value function
» Captures a state-action notion of long-term utility
   Based on expected rewards from current and future actions
» But knowing complete distributions is daunting in practice

Reinforcement learning appears promising for this
» A stochastic variant of dynamic programming
» Control decisions learned from direct observation

Start by dividing time into discrete steps
» At each step, the system is in one of a discrete set of states
» Scheduler observes the state, chooses an action from a finite set
» Running the action changes the system state at the next time step
» Scheduler receives a reward for the immediate effect of the action
» Scheduler estimates the value function; the resulting model is exactly an MDP
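Tabular Q-learning is one standard reinforcement-learning method that fits the step-by-step loop above; it is named here as an example, not as the authors' chosen algorithm. The learner never reads the run-time distributions; it only sees sampled next states and rewards. The state abstraction (the difference in quanta consumed by two threads, clipped to ±`K`), the reward (negative imbalance, i.e., a 50/50 target), and all hyperparameters are hypothetical.

```python
import random

K = 6             # hypothetical clip bound on the imbalance state
ACTIONS = (0, 1)  # which thread to dispatch
# Assumed environment dynamics; used only to simulate, never read by the learner.
RUN_TIME_DIST = {0: ([1, 2], [0.5, 0.5]), 1: ([2, 3], [0.25, 0.75])}


def step(d, thread, rng):
    """Run `thread`; state d = quanta(thread 0) - quanta(thread 1), clipped to [-K, K]."""
    times, weights = RUN_TIME_DIST[thread]
    quanta = rng.choices(times, weights=weights)[0]
    nd = d + quanta if thread == 0 else d - quanta
    nd = max(-K, min(K, nd))
    return nd, -abs(nd)  # reward: negative imbalance after the run


def q_learning(episodes=10000, steps=10, alpha=0.1, gamma=0.9, epsilon=0.2, seed=1):
    """Tabular Q-learning from sampled transitions."""
    rng = random.Random(seed)
    q = {(d, a): 0.0 for d in range(-K, K + 1) for a in ACTIONS}
    for _ in range(episodes):
        d = rng.randint(-K, K)  # exploring starts: begin each episode in a random state
        for _ in range(steps):
            if rng.random() < epsilon:  # epsilon-greedy exploration
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(d, x)])
            nd, r = step(d, a, rng)
            target = r + gamma * max(q[(nd, x)] for x in ACTIONS)
            q[(d, a)] += alpha * (target - q[(d, a)])  # move estimate toward the target
            d = nd
    return q


def greedy(q, d):
    """Action with the highest learned Q-value in state d."""
    return max(ACTIONS, key=lambda a: q[(d, a)])


q = q_learning()
print([greedy(q, d) for d in range(-K, K + 1)])
```

When one thread is far ahead, the learned greedy policy should dispatch the other one; choices in intermediate states depend on the sampled experience and the assumed abstraction.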


Related Work

Reference monitor approaches
» Interposition architectures
   E.g., Ostia: user/kernel-level (Garfinkel et al.)
» Separation kernels
   E.g., ARINC-653, MILS (Vanfleet et al.)

Scheduling policy design
» Hierarchical scheduling
   E.g., HLS and its extensions (Regehr et al.)
   E.g., Group scheduling (Niehaus et al.)

State space construction and verification
» (Timed automata) model checking
   E.g., IF (Sifakis et al.)
» Quasi-cyclic state space reduction
   E.g., Bogor (Robby et al.)


Concluding Remarks

MDP approach maintains rational scheduling control
» Even when thread run times vary stochastically
» Encodes rather than presupposes utilizations
» Allows policy verification (e.g., over utilization states)

Ongoing and Future Work
» State space reduction via quasi-cyclic structure
» Verification over continuous/discrete states
» Kernel-level non-bypassable policy enforcement
» Automated learning to discover scheduling policies
   E.g., via RL for MDPs, POMDPs, SMDPs

Project web page
» Supported by NSF grant CNS-0716764
» http://www.cse.wustl.edu/~cdgill/Cybertrust/
