Soft, Hard and Ruby Hard Real Time with Linux
Presentation slides for a talk about different approaches for achieving varying levels of Real Time performance with Linux
Soft, Hard and Ruby Hard Real Time Approaches with Linux
Gilad Ben-Yossef, CTO, Codefidence Ltd.
Real Time: A Definition
"A real-time system is one in which the correctness of the computations not only depends upon the logical correctness of the computation but also upon the time at which the result is produced. If the timing constraints of the system are not met, system failure is said to have occurred."
Donald Gillies, quoted on Usenet comp.realtime FAQ
The Path to Real Time in Linux
POSIX Real Time Scheduling Domains
Priority Inversion
Locking Memory
Timer Frequency
Sources Of Latency
Scheduling Latency
Interrupt Latency
Real Time Linux Benchmarks
Linux Priorities
Real time processes (SCHED_FIFO, SCHED_RR): real time priority 99, 98, 97 ... 4, 3, 2, 1, 0
Non real-time processes (SCHED_OTHER): nice level -20, -19, -18 ... 16, 17, 18, 19
Changing Real Time Priorities
int sched_setscheduler(pid_t pid, int policy, const struct sched_param *p);
struct sched_param { int sched_priority; };
sched_setscheduler sets the scheduling policy and priority of the process identified by pid.
Policy is SCHED_FIFO, SCHED_RR, SCHED_OTHER or SCHED_BATCH.
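As a rough illustration of the call above, the sketch below moves a process to SCHED_FIFO and reports the failure instead of assuming the caller has real time privileges. The helper name set_fifo_priority is an assumption made for this example and is not part of the talk.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: put process `pid` (0 means the caller) under
 * SCHED_FIFO with the given priority. Returns 0 on success or a negative
 * errno value, typically -EPERM when the caller lacks the privileges and
 * -EINVAL when the priority is out of range. */
int set_fifo_priority(pid_t pid, int priority)
{
    struct sched_param param;

    memset(&param, 0, sizeof(param));
    param.sched_priority = priority;

    if (sched_setscheduler(pid, SCHED_FIFO, &param) == -1)
        return -errno;
    return 0;
}
```

A caller would typically try something like set_fifo_priority(0, 50) early in main() and fall back to normal scheduling, or exit, when the result is negative.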
Thread Level Priorities
In Linux, each thread has a separate real time priority.
When creating threads with the pthread_create() call, an attribute structure can be passed.
Various thread attributes can be used to set scheduling domain and priority.
The relevant attributes can also be changed during run time.
Sched Policy
Thread Attribute: schedpolicy
Select the scheduling policy for the thread: one of SCHED_OTHER (regular, non-realtime scheduling), SCHED_RR (realtime, round-robin) or SCHED_FIFO (realtime, first-in first-out). Default value: SCHED_OTHER. The real time scheduling policies SCHED_RR and SCHED_FIFO are available only to processes with superuser privileges. The scheduling policy of a thread can be changed after creation with pthread_setschedparam(3).
Sched Param
Thread Attribute: schedparam
Contains the scheduling parameters (essentially, the scheduling priority) for the thread. Default value: priority is 0. This attribute is not significant if the scheduling policy is SCHED_OTHER; it only matters for the real time policies SCHED_RR and SCHED_FIFO. The scheduling priority of a thread can be changed after creation with pthread_setschedparam(3).
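Changing the parameters of a running thread could look roughly like the sketch below. The helper names set_self_sched and get_self_sched are assumptions for this example; only pthread_setschedparam() and pthread_getschedparam() are real library calls.

```c
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Hypothetical helper: change policy and priority of the calling thread.
 * Returns 0 on success or a positive error number (pthread functions
 * return the error instead of setting errno). */
int set_self_sched(int policy, int priority)
{
    struct sched_param param;

    memset(&param, 0, sizeof(param));
    param.sched_priority = priority;
    return pthread_setschedparam(pthread_self(), policy, &param);
}

/* Hypothetical helper: read back the calling thread's policy and priority. */
int get_self_sched(int *policy, int *priority)
{
    struct sched_param param;
    int err = pthread_getschedparam(pthread_self(), policy, &param);

    if (err == 0)
        *priority = param.sched_priority;
    return err;
}
```

Requesting SCHED_FIFO or SCHED_RR this way is expected to fail with EPERM for a thread without real time privileges, so the return value should always be checked.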
Inherit Sched
Thread Attribute: inheritsched
Indicates whether the scheduling policy and scheduling parameters for the newly created thread are determined by the values of the schedpolicy and schedparam attributes (PTHREAD_EXPLICIT_SCHED) or are inherited from the parent thread (value PTHREAD_INHERIT_SCHED). Default value: POSIX says PTHREAD_EXPLICIT_SCHED, but at least some versions of Linux use PTHREAD_INHERIT_SCHED.
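The three attributes can be combined when creating a thread, as in the sketch below. It sets inheritsched to PTHREAD_EXPLICIT_SCHED so that the policy and priority in the attribute object are actually used. The names spawn_with_sched and report_policy are assumptions for this example.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>

/* Thread body: report the policy this thread actually runs under. */
static void *report_policy(void *arg)
{
    int *observed = arg;
    struct sched_param param;

    if (pthread_getschedparam(pthread_self(), observed, &param) != 0)
        *observed = -1;
    return NULL;
}

/* Hypothetical helper: create a thread whose policy and priority come
 * from the attribute object, wait for it, and store the policy seen
 * inside the thread in *observed. Returns 0 or a positive error number. */
int spawn_with_sched(int policy, int priority, int *observed)
{
    pthread_attr_t attr;
    pthread_t tid;
    struct sched_param param;
    int err;

    memset(&param, 0, sizeof(param));
    param.sched_priority = priority;

    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    /* With PTHREAD_INHERIT_SCHED the two settings below would be ignored. */
    err = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (err == 0)
        err = pthread_attr_setschedpolicy(&attr, policy);
    if (err == 0)
        err = pthread_attr_setschedparam(&attr, &param);
    if (err == 0)
        err = pthread_create(&tid, &attr, report_policy, observed);
    if (err == 0)
        err = pthread_join(tid, NULL);

    pthread_attr_destroy(&attr);
    return err;
}
```

Calling spawn_with_sched(SCHED_FIFO, 10, &observed) without real time privileges is expected to return EPERM from pthread_create(); with SCHED_OTHER and priority 0 it should work for any user.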
Priority Inversion
1. Low priority task (priority 3) takes mutex.
2. High priority task (priority 99) preempts low priority task.
3. High priority task blocks on mutex.
4. Medium priority task (priority 50) preempts low priority task, and so also holds up the high priority task.
(Diagram: task priority plotted against time.)
Priority Inheritance
The common solution to priority inversion is called Priority Inheritance.
A task which holds a lock automatically inherits the priority of the highest priority task which contends for the lock.
Until the lock is released, of course.
Correct implementation is hard, the performance impact is non-trivial, and it is not a silver bullet.
Linux implementation was... difficult.
PI-Futex
Interface through the Fast User-space muTEX mechanism.
Linux 2.6 application mutex support code. Similar kernel functionality in PREEMPT_RT.
Supports user-space mutexes only.
Introduced in 2.6.18.
A patched Glibc is needed right now.
Patch merging into mainline Glibc right now.
How fork() seems to work
A complete copy of the parent process is created. Both parent and child continue execution from the same spot.
(Diagram: the parent process is copied to create the child process.)
How fork() really works
Child process gets a copy of stack and file descriptors. Copy On Write reference is used for the memory.
(Diagram: the memory that was RW in the parent process is referenced RO by both the parent process and the child process.)
What happens during write?
When a write is attempted, a single memory page is copied and references are updated. This is called: breaking the CoW.
(Diagram: the original memory stays RO; the copied page is RW; parent process and child process.)
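The visible effect of this can be seen with a small experiment: after fork(), a write in the child does not change what the parent sees. The sketch below shows only that observable behaviour, not the page copying itself; the names cow_demo and shared_value are assumptions for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 1;

/* Hypothetical demo: fork, let the child overwrite a variable, and return
 * the value the parent still sees afterwards. *child_saw receives the
 * value the child had after its write (passed back as exit status).
 * Returns -1 if fork() or waitpid() fails. */
int cow_demo(int *child_saw)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        shared_value = 42;      /* this write lands in the child's own copy */
        _exit(shared_value);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    *child_saw = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return shared_value;        /* the parent's page was never modified */
}
```

For a real time process the important point is the cost rather than the isolation: the first write to each shared page after fork() triggers a page fault and a copy.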
Locking Memory
int mlockall(int flags);
Disables paging for all pages mapped into the address space of the calling process. Flags are:
MCL_CURRENT: Lock all pages which are currently mapped into the address space of the process.
MCL_FUTURE: Lock all pages which will become mapped into the address space of the process in the future.
Use both for max. effect: MCL_CURRENT|MCL_FUTURE
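A minimal sketch of the call with both flags follows. The helper name lock_all_memory is an assumption for this example; the call can fail when the memory lock limit or the caller's privileges do not allow it, so the result is reported rather than ignored.

```c
#include <errno.h>
#include <sys/mman.h>

/* Hypothetical helper: lock all current and future pages of the calling
 * process into RAM. Returns 0 on success or a negative errno value, for
 * example -ENOMEM or -EPERM when limits or privileges do not allow it. */
int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
        return -errno;
    return 0;
}
```

This is usually called once during start-up, before the time-critical part of the program begins; munlockall() undoes it.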
Stack Pre-Faulting
Linux user space stacks are auto expanding.
Use more stack than allocated to a process? The kernel gets an exception and allocates more stack for you.
This can turn a stack access into multiple context switches and a memory allocation with non-deterministic latency.
We need to pre-fault stack pages.
Call a dummy function that allocates an automatic variable big enough for your entire future stack usage and writes to it, after you've mlocked memory.
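The dummy function could look roughly like this. The 64 KiB figure in MAX_STACK_BYTES is an arbitrary assumption; a real program would use its own worst-case stack depth, and it has to stay below the stack size limit of the thread. The name prefault_stack is also an assumption for this example.

```c
#include <stddef.h>
#include <unistd.h>

/* Assumed worst-case stack usage of the program; adjust to your needs. */
#define MAX_STACK_BYTES (64 * 1024)

/* Hypothetical helper: touch every page of a large automatic variable so
 * the kernel maps (and, after mlockall(), locks) the stack pages now
 * rather than in the middle of time-critical code. Returns the number of
 * pages written to. */
size_t prefault_stack(void)
{
    /* volatile keeps the compiler from optimising the writes away */
    volatile unsigned char dummy[MAX_STACK_BYTES];
    long page = sysconf(_SC_PAGESIZE);
    size_t touched = 0;
    size_t i;

    if (page <= 0)
        page = 4096;            /* fall back to a common page size */

    for (i = 0; i < sizeof(dummy); i += (size_t)page) {
        dummy[i] = 0;
        touched++;
    }
    dummy[sizeof(dummy) - 1] = 0;   /* also touch the last byte */
    return touched;
}
```

Each real time thread has its own stack, so the function would need to be called once in every such thread, after memory has been locked.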
Timer frequency
Linux timer interrupts are raised HZ times per second.
HZ is configurable during kernel build: 100, 250 (i386 default) or 1000. See kernel/Kconfig.hz.
Compromise between system responsiveness and global throughput.
Caution: not just any value can be used. Constraints apply!
The Effect of Timer Frequency
(Plot, HZ=100: requested sleep time, in ms, against real sleep time, in ms.)
The Effect of Timer Frequency cont.
(Plot, HZ=1000: requested sleep time, in ms, against real sleep time, in ms.)
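The shape of such plots can be approximated with simple arithmetic. The sketch below is a simplified model that assumes a tick-driven sleep can only end on a tick boundary; it ignores any extra tick the kernel may add and is not taken from the talk. Both function names are assumptions for this example.

```c
/* Simplified model: duration of one timer tick in microseconds. */
long tick_period_us(long hz)
{
    return 1000000L / hz;
}

/* Simplified model: round a requested sleep up to a whole number of
 * ticks, since a tick-driven timer can only expire on a tick. */
long tick_rounded_sleep_us(long hz, long requested_us)
{
    long tick = tick_period_us(hz);
    long ticks = (requested_us + tick - 1) / tick;

    return ticks * tick;
}
```

Under this model a 1 ms request takes 10 ms with HZ=100 but 1 ms with HZ=1000, and a 5 ms request takes 8 ms with HZ=250.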
High-Res Timers
High-res timers use non-RTC interrupt timer sources to deliver timer interrupts between ticks.
Allows POSIX timers and nanosleep() to be as accurate as the hardware allows.
This feature is transparent.
When enabled, it makes these timers much more accurate than the current HZ resolution.
Around 1 usec on typical hardware.
Sources Of Latency
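(Diagram: the path from an interrupt to the task that handles it: PIC, context switch (saving registers...), finding the ISR, ISR, scheduler, context switch, task. The diagram marks the interrupt latency and the preemption latency along this path.)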
The Linux O(1) Scheduler
The kernel maintains 2 priority arrays: the active and the expired array.
Each array contains 140 entries, each with a queue of processes with the same priority.
The arrays are implemented such that it is possible to pick the queue with the runnable task with the highest priority in constant time, whatever the number of runnable tasks is.
Let's see how this helps us...
Choosing and Expiring Tasks
The scheduler finds the highest priority with a runnable task and executes the first task with that priority.
Non real time tasks are run until they exhaust their time slice and are then moved to the expired array.
Real time tasks are run until they yield the CPU [*].
This is done until there are no more tasks in the active array, then the two arrays are swapped and we start over.
Scheduling is O(1).
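The constant-time pick can be illustrated with a bitmap that has one bit per priority. The following is a user-space model of the idea, not the kernel's code: the type and function names are assumptions for this example, and __builtin_ctzll is a GCC/Clang builtin used here to find the lowest set bit. As in the kernel, a lower index means a higher priority.

```c
#include <stdint.h>
#include <string.h>

#define NUM_PRIOS 140                       /* entries per priority array */
#define BITMAP_WORDS ((NUM_PRIOS + 63) / 64)

/* Model of one priority array: a bit per priority that has queued tasks,
 * plus a count standing in for the per-priority queue. */
struct prio_array {
    uint64_t bitmap[BITMAP_WORDS];
    int nr_queued[NUM_PRIOS];
};

void prio_array_init(struct prio_array *a)
{
    memset(a, 0, sizeof(*a));
}

void prio_array_enqueue(struct prio_array *a, int prio)
{
    a->nr_queued[prio]++;
    a->bitmap[prio / 64] |= (uint64_t)1 << (prio % 64);
}

void prio_array_dequeue(struct prio_array *a, int prio)
{
    /* Clear the bit only when the last task of this priority leaves. */
    if (a->nr_queued[prio] > 0 && --a->nr_queued[prio] == 0)
        a->bitmap[prio / 64] &= ~((uint64_t)1 << (prio % 64));
}

/* Index of the highest priority non-empty queue, or -1 if the array is
 * empty. The loop runs at most BITMAP_WORDS times, however many tasks
 * are queued. */
int prio_array_highest(const struct prio_array *a)
{
    int w;

    for (w = 0; w < BITMAP_WORDS; w++)
        if (a->bitmap[w] != 0)
            return w * 64 + __builtin_ctzll(a->bitmap[w]);
    return -1;
}
```

In this model, swapping the active and expired arrays amounts to exchanging two struct prio_array pointers, which is also a constant-time operation.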
Kernel Preemption Options
Preemption means a high priority task taking the place of a low priority one.
User space code is always preemptible.
In kernel mode (when running a system call):
PREEMPT_NONE: non-preemptive kernel code.
PREEMPT_VOLUNTARY: voluntary preemption points.
PREEMPT: kernel is preemptible outside critical sections.
Soft Real Time
Deadline exists, but glitches do not propagate.
Typically 1 millisecond cycle time.
Vanilla Linux is well suited:
Preemptive kernel, O(1) scheduler, high resolution timers, priority inheritance futex support.
Good hardware design helps: interrupt queues, DMA ring buffers, FIFOs.
Hard and Ruby Hard Real Time
Deadline exists, glitches are fatal.
Typically sub-millisecond cycle times.
Can be done with Linux, two approaches available:
Nano-kernel: run Linux under a nano hard real time kernel.
PREEMPT_RT (RT patch): change Linux to meet hard real time needs.
Nano Kernel Approach
(Diagram labels: Linux, RT tasks, RTOS, RT nano-kernel, hardware.)
Nano Kernel Approach
Nano-kernels come in many flavors:
FSMLabs WindRiver RTLinux
RTAI project
Adeos I-Pipe
Jaluna VirtualLogix VLX
Problems:
Real time tasks written in non Linux API.
Limited Linux task support.
Linux and RT task integration can be difficult.
PREEMPT_RT
Fully preemptive kernel by Ingo Molnar
Kernel config option PREEMPT_RT
Patched 2.6
Preemptible critical sections
Preemptible interrupt handlers
Preemptible "interrupt disable" sequences
Kernel locks priority inheritance
Deferred operations
Latency-reduction measures
Vanilla Linux Contexts
(Diagram labels. Kernel space, interrupt context: interrupt handlers; SoftIRQs: hi prio tasklets, net stack, timers, regular tasklets, ... Scheduling points. User context: process thread, kernel thread. User space.)
PREEMPT_RT Linux Contexts
(Diagram labels. SA_NODELAY interrupt handlers. Kernel space: net stack, tasklets, timers, scheduling points, kernel threads. Process thread. User space.)
Interface Changes
Spinlocks and local_irq_save() no longer disable hardware interrupts.
Spinlocks no longer disable preemption.
raw_ variants for spinlocks and local_irq_save() preserve original meaning for SA_NODELAY interrupts.
Semaphores and spinlocks employ priority inheritance.
Deferred operations API.
Linux RT Benchmarking
The setup:
Dell PowerEdge SC420 machine with a Pentium 4 2.8 GHz CPU, 256 MB of RAM.
Kernel version for all tests: Linux 2.6.12.
Running various load generators: LMbench, LTP, flood ping, dd...
500,000 to 650,000 interrupts via parallel port.
Measure response time on another machine.
Interrupt Response Times
Technology                     Avg    Max     Min
Vanilla Linux 2.6.12 Kernel    6.8    555.6   5.6
PREEMPT_RT V0.7.51-02          7.3    70.5    5.6
Adeos/Ipipe-0.7                7.6    50.5    5.7
Numbers are the time from interrupt generation until the logger receives the answer over the parallel port (round trip), in microseconds.
Any Questions?
Gilad Ben-Yossef, CTO, Codefidence Ltd.
Copyright Notice
© 2007 Codefidence Ltd.
© 2007 Michael Opdenacker, Free Electrons.
Released under a CC-by-sa 2.0 License.