Scheduling as a Learned Art*
Christopher Gill, William D. Smart, Terry Tidwell, and Robert Glaubius
{cdgill, wds, ttidwell, rlg1}@cse.wustl.edu
Department of Computer Science and Engineering
Washington University, St. Louis, MO, USA
Fourth International Workshop on Operating Systems Platforms for Embedded
Real-Time Applications (OSPERT 2008)
July 1, 2008, Prague, Czech Republic
*Research supported in part by NSF awards CNS-0716764 (Cybertrust) and CCF-0448562 (CAREER)
2 - Gill et al. – 04/21/23
Motivation: Systems with (some) Autonomy
Interact with variable environment
» Varying degrees of autonomy
» Performance is deadline sensitive
Many activities must run at once
» Device interrupt handling, computation
» Communication with other systems/operators
Need reliable activity execution
» Scheduling with shared resources and competing, variable execution times
» How to guarantee utilizations?
[Figure labels: Remote Operator Station (for all but full autonomy); Wireless Communication; Lewis; Media and Machines Lab, Washington University, St. Louis, MO, USA]
More Generally, Open Soft Real-Time Systems
Questions of interest are relevant well beyond mobile robotics
» Robotics is a good touchstone, though
» In many systems, platform features interact with the physical environment
» Especially with increased embedding of OS/RTOS platforms everywhere ;-)
Abstract view of the problem
» Diverse concurrent application tasks
» Task execution times are variable
» (Soft) deadlines on application tasks
» Resources shared among tasks
» Need methods to design and verify scheduling policies accordingly
What Other Kinds of Embedded Systems Have Similar Platform Constraints?
Current System Model
Threads of execution depend on a shared resource
» Require mutually exclusive access (e.g., to a CPU) to run
Each thread binds the resource when it runs
» A thread binds the resource for a duration, then releases it
» Model duration with integer variables: count time quanta
Variable execution times with known distributions
» We assume that each thread's run-time distribution is known and bounded, and independent of the others
Non-preemptive scheduler (repeats perpetually)
» Scheduler chooses which thread to run (based on policy)
» Scheduler dispatches the thread, which runs until it yields
» Scheduler waits until the thread releases the resource
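A minimal Python sketch of the non-preemptive loop above, not the authors' implementation. The two threads' run-time distributions (in integer quanta), the `run_scheduler` driver, and the `least_used` policy are all hypothetical, chosen only to show choose / dispatch / wait-for-release with bounded, independent run times.

```python
import random

# Hypothetical run-time distributions in integer time quanta:
# each maps a duration to its probability (bounded, independent per thread).
RUN_TIMES = [
    {1: 0.5, 2: 0.5},   # thread 0
    {1: 0.8, 3: 0.2},   # thread 1
]

def sample_run_time(dist, rng):
    """Draw one duration (in quanta) from a discrete distribution."""
    durations = list(dist)
    weights = [dist[d] for d in durations]
    return rng.choices(durations, weights=weights, k=1)[0]

def run_scheduler(policy, steps, seed=0):
    """Non-preemptive loop: choose a thread, let it hold the resource
    until it yields, then record the quanta it consumed."""
    rng = random.Random(seed)
    usage = [0] * len(RUN_TIMES)                 # quanta used per thread
    for _ in range(steps):
        i = policy(usage)                        # 1. choose a thread (policy)
        d = sample_run_time(RUN_TIMES[i], rng)   # 2. dispatch: runs d quanta
        usage[i] += d                            # 3. released: account usage
    return usage

def least_used(usage):
    """Hypothetical policy: dispatch the thread with the fewest quanta so far."""
    return min(range(len(usage)), key=lambda i: usage[i])

print(run_scheduler(least_used, 100))
```

With this policy the two usage counts never differ by more than the largest run time (3 quanta), but the point of the talk is that such policies can instead be derived systematically.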
Uncertainty (but with Observability Post-Hoc)
[Figure: run-time distributions of two threads, each plotted as probability vs. time]
We summarize system state as a vector of integers
» Represent thread utilizations
Threads’ run times come from known, bounded distributions
Scheduling a thread changes the system's (utilization) state
» Utilization is observed after the thread runs, based on its run time
» State transition probabilities are based on the run-time distributions
This forms a basis for policy design and optimization
From Tidwell et al., ATC 2008
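To make the state transition concrete, a hypothetical sketch: the state is a tuple of per-thread quanta counts, and scheduling thread i yields a distribution over next states read directly off that thread's run-time distribution. The distributions and the `transitions` helper are illustrative assumptions; exact fractions keep the probabilities readable.

```python
from fractions import Fraction

# Hypothetical per-thread run-time distributions (quanta -> probability).
RUN_TIMES = [
    {1: Fraction(1, 2), 2: Fraction(1, 2)},   # thread 0
    {1: Fraction(4, 5), 3: Fraction(1, 5)},   # thread 1
]

def transitions(state, action):
    """Distribution over next utilization states after running `action`.

    `state` is a tuple of per-thread quanta counts; running thread `action`
    for d quanta adds d to its entry, with probability P(d)."""
    result = {}
    for d, p in RUN_TIMES[action].items():
        nxt = list(state)
        nxt[action] += d
        nxt = tuple(nxt)
        result[nxt] = result.get(nxt, 0) + p
    return result
```

For example, `transitions((2, 1), 0)` gives the states (3, 1) and (4, 1), each with probability 1/2.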
From Thread Run Times to a Scheduling Policy
We model thread scheduling decisions as a Markov Decision Process (MDP) based on thread run times
(From ATC '08) MDP is given by a 4-tuple: (X, A, R, T)
» X: set of process states (i.e., thread utilization states)
» A: set of actions (i.e., scheduling a particular thread)
» R: reward function for taking an action in a state
    Expected utility of taking that action
    Distance of the next state(s) from a desired utilization (vector)
» T: transition function
    For each action, encodes the probability of moving from a given state to another state
Solve MDP: optimal (per accumulated reward) policy
    Fold periodic states: smaller space (recent advance)
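The tuple (X, A, R, T) can be solved exactly over a short horizon by dynamic programming. This sketch assumes two hypothetical threads, equal target shares, and a reward equal to the negative L1 distance of the next state from the target utilization ray. It deliberately uses a finite horizon from a given start state rather than the infinite-horizon solution and periodic-state folding mentioned above, so it illustrates the idea only.

```python
from functools import lru_cache

# Hypothetical inputs: run-time distributions as (quanta, probability) pairs,
# and the desired utilization share of each thread.
RUN_TIMES = (
    ((1, 0.5), (2, 0.5)),   # thread 0
    ((1, 0.8), (3, 0.2)),   # thread 1
)
TARGET = (0.5, 0.5)

def reward(state):
    """Negative L1 distance of a state from the target utilization ray."""
    total = sum(state)
    return -sum(abs(x - t * total) for x, t in zip(state, TARGET))

def step(state, action, d):
    """Next state after thread `action` runs for d quanta."""
    nxt = list(state)
    nxt[action] += d
    return tuple(nxt)

@lru_cache(maxsize=None)
def value(state, horizon):
    """Best expected accumulated reward over `horizon` more decisions."""
    if horizon == 0:
        return 0.0
    return max(q(state, a, horizon) for a in range(len(RUN_TIMES)))

def q(state, action, horizon):
    """Expected reward of scheduling `action` now, then acting optimally."""
    total = 0.0
    for d, p in RUN_TIMES[action]:
        nxt = step(state, action, d)
        total += p * (reward(nxt) + value(nxt, horizon - 1))
    return total

def policy(state, horizon=6):
    """Action that maximizes expected accumulated reward over the horizon."""
    return max(range(len(RUN_TIMES)), key=lambda a: q(state, a, horizon))
```

For instance, `policy((5, 0))` returns 1: when thread 0 is ahead, scheduling thread 1 moves the state back toward the target.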
Partial Observability
Local CPU usage is pretty easy to observe exactly
» E.g., using the Pentium tick counter, or another good time source
However, other key properties are noisier
» E.g., robot location indoors
    No GPS "position sensor"; wheel slip etc. adds noise during motion
» How does this relate to scheduling?
    What if we consider the robot's progress along a navigation path …
    … as an activity that must compete for resources with others?
    Then, the robot's position becomes part of the scheduling state
    Similar issues may arise for other scheduling cases (e.g., in CPS)
Noise in observation produces partial observability
» E.g., multiple different positions can be equally likely
» Possible approach: Partially Observable MDPs (POMDPs)
    Reason over belief states to obtain an MDP transition function (over a big space)
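A POMDP-based scheduler would track a belief state (a probability over positions) instead of the true position. The sketch below is a standard discrete Bayes filter over a hypothetical 5-cell navigation path; the wheel-slip motion model and the noisy position-sensor probabilities are invented for illustration.

```python
# Hypothetical 1-D navigation path discretized into N cells.
N = 5

def motion_model(pos):
    """P(next cell | current cell) for one 'advance' command, with wheel
    slip: moves one cell w.p. 0.8, slips (stays put) w.p. 0.2."""
    if pos == N - 1:
        return {pos: 1.0}          # end of path: stay
    return {pos + 1: 0.8, pos: 0.2}

def sensor_model(obs, pos):
    """Likelihood P(obs | true cell): right cell w.p. 0.6, each neighbour
    w.p. 0.2 (readings off the path are not modelled)."""
    if obs == pos:
        return 0.6
    if abs(obs - pos) == 1:
        return 0.2
    return 0.0

def belief_update(belief, obs):
    """Bayes filter: predict with the motion model, correct with the
    observation likelihood, then renormalize."""
    predicted = [0.0] * N
    for pos, b in enumerate(belief):
        for nxt, p in motion_model(pos).items():
            predicted[nxt] += b * p
    posterior = [sensor_model(obs, s) * predicted[s] for s in range(N)]
    z = sum(posterior)
    return [p / z for p in posterior]

b = belief_update([1.0, 0.0, 0.0, 0.0, 0.0], obs=1)
```

Starting certain of cell 0 and observing cell 1, the belief splits between cells 0 and 1 (about 0.08 and 0.92), which is exactly the uncertainty a scheduler would then have to reason over.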
Observation Lag
State observations may also incur temporal lag
» E.g., a detailed scan of an area with a range-finding laser
» However, time passes while the scan is being taken
» The robot or environment may move while the scan is being done
As with partial observability, we need an extension to the basic MDP model to address observation lag
» In Semi-MDPs (SMDPs), an action causes one state change, but may take a variable amount of time to do so
» SMDP extensions to MDPs exist for finding optimal policies
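One common way to handle actions of variable duration is the SMDP form of the Q-learning update: the next state's value is discounted by gamma raised to the action's duration rather than by a single gamma. A minimal sketch; the state and action names and the parameter values in the usage lines are hypothetical.

```python
from collections import defaultdict

def smdp_q_update(Q, s, a, reward, duration, s_next, actions,
                  alpha=0.1, gamma=0.9):
    """One SMDP Q-learning backup.

    The action took `duration` time steps (e.g., while a laser scan
    completed), so the next state's value is discounted by
    gamma ** duration. `reward` is assumed to be the (already discounted)
    reward accumulated during that interval."""
    best_next = max(Q[(s_next, b)] for b in actions)
    target = reward + (gamma ** duration) * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

# Hypothetical usage: a 3-step scan leads to a state worth 10.
Q = defaultdict(float)
Q[("scanned", "move")] = 10.0
v = smdp_q_update(Q, "start", "scan", reward=1.0, duration=3,
                  s_next="scanned", actions=["scan", "move"],
                  alpha=0.5, gamma=0.5)
# v == 1.125: target = 1 + 0.5**3 * 10 = 2.25, halved by alpha = 0.5
```

With duration 1 this reduces to ordinary Q-learning, which is why it is described as an extension of the MDP case.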
Neglect Tolerance
Need to schedule >1 entire-system behavior at once
» Can transform into scheduling interim sub-tasks as before
» However, a behavior has its own (possibly dynamic) structure
    Navigation to cover a room, while mapping its boundary
Resource contention, control/data dependence
» Scheduling becomes a multi-criteria optimization
» Sub-tasks may have (potentially hard) deadlines
    E.g., deciding to turn or stop before hitting a wall
Spectrum: remote control to complete autonomy
» Higher neglect tolerance needs more on-board scheduling
» Uncertainty, observability, temporal lag issues as before
» Open problem: formalize tractably, model parametrically
» So far, a multi-disciplinary (RT/ML) approach is still needed
Learning (aka “Good Scheduler, Bad Scheduler”)
We base scheduling decisions on a value function
» Captures a state-action notion of long-term utility
    Based on expected rewards from current and future actions
» But knowing complete distributions is daunting in practice
Reinforcement learning appears promising for this
» A stochastic variant of dynamic programming
» Control decisions learned from direct observation
Start by dividing time into discrete steps
» At each step, the system is in one of a discrete set of states
» Scheduler observes the state and chooses an action from a finite set
» Running the action changes the system state at the next time step
» Scheduler receives a reward for the immediate effect of the action
» Scheduler estimates the value function; the resulting model is exactly an MDP
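The loop above can be sketched with tabular Q-learning, one standard RL method chosen here for illustration (not necessarily the authors' choice). Assumptions, all hypothetical: two threads with equal target shares, so the state folds to the utilization difference u0 - u1, clipped to keep the table finite; reward is the negative distance from the target; the run-time distributions live only inside the simulated environment and are never read by the learner.

```python
import random

# Hypothetical environment: the learner never sees these distributions.
RUN_TIMES = [{1: 0.5, 2: 0.5}, {1: 0.8, 3: 0.2}]
LIMIT = 6  # clip the difference u0 - u1 to keep the state space finite

def env_step(diff, action, rng):
    """Run thread `action`; return the new clipped difference u0 - u1."""
    dist = RUN_TIMES[action]
    d = rng.choices(list(dist), weights=list(dist.values()))[0]
    diff = diff + d if action == 0 else diff - d
    return max(-LIMIT, min(LIMIT, diff))

def q_learn(episodes=2000, steps=20, alpha=0.1, gamma=0.9, eps=0.1, seed=1):
    """Learn Q(state, action) purely from observed transitions and rewards."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(-LIMIT, LIMIT + 1) for a in (0, 1)}
    for _ in range(episodes):
        s = rng.randint(-LIMIT, LIMIT)            # random start state
        for _ in range(steps):
            if rng.random() < eps:
                a = rng.randint(0, 1)             # explore
            else:
                a = max((0, 1), key=lambda b: Q[(s, b)])  # exploit
            s2 = env_step(s, a, rng)              # observe next state
            r = -abs(s2)                          # distance from target share
            best = max(Q[(s2, 0)], Q[(s2, 1)])
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            s = s2
    return Q

def greedy(Q, s):
    """Learned scheduling policy: pick the action with the highest Q."""
    return max((0, 1), key=lambda a: Q[(s, a)])

Q = q_learn()
```

After training, `greedy(Q, 5)` should pick thread 1 (thread 0 is ahead), matching what the model-based solution would choose without the scheduler ever being told the distributions.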
Related Work
Reference monitor approaches
» Interposition architectures
    E.g., Ostia: user/kernel-level (Garfinkel et al.)
» Separation kernels
    E.g., ARINC-653, MILS (Vanfleet et al.)
Scheduling policy design
» Hierarchical scheduling
    E.g., HLS and its extensions (Regehr et al.)
    E.g., Group scheduling (Niehaus et al.)
State space construction and verification
» (Timed automata) model checking
    E.g., IF (Sifakis et al.)
» Quasi-cyclic state space reduction
    E.g., Bogor (Robby et al.)
Concluding Remarks
MDP approach maintains rational scheduling control
» Even when thread run times vary stochastically
» Encodes rather than presupposes utilizations
» Allows policy verification (e.g., over utilization states)
Ongoing and Future Work
» State space reduction via quasi-cyclic structure
» Verification over continuous/discrete states
» Kernel-level non-bypassable policy enforcement
» Automated learning to discover scheduling policies
    E.g., via RL for MDPs, POMDPs, SMDPs
Project web page
» Supported by NSF grant CNS-0716764
» http://www.cse.wustl.edu/~cdgill/Cybertrust/