TRANSCRIPT
Scheduler Activations: Effective Kernel Support for the User-level
Management of Parallelism
Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, and
Henry M. Levy
Presenter: Yi Qiao
Outline
• Introduction
• User-level threads: advantages and limitations
• Effective kernel support for user-level management of parallelism
• Implementation
• Performance
• Summary
Introduction
• Effectiveness of parallel computing
– Largely depends on the performance and cost of the primitives used to express and control parallelism within programs
• Shared memory between multiple processes
– Better suited to a uniprocessor environment
• Use of threads
– Separates the notion of a sequential execution stream from other aspects such as address spaces and I/O descriptors
– A significant performance advantage over traditional processes
Problem with Threads
• User-level threads
– Execute within the context of traditional processes
• Thread management requires no kernel intervention
• Flexible, easily customized without kernel modification
– Each process acts as a virtual processor
• Multiprogramming, I/O, and page faults can lead to poor performance or incorrect behavior of user-level threads
• Kernel-level threads
– Avoid the system integration problem
• Directly mapped onto physical processors
– Too heavyweight
• An order of magnitude worse than the performance of the best user-level threads
Goal of the Work
• A kernel interface and a user-level thread package that combine the functionality of kernel threads with the performance and flexibility of user-level threads
– When no kernel intervention is needed, same performance as the best user-level threads
– When the kernel must be involved, mimics a kernel thread management system
• No processor idles while there are runnable threads
• No high-priority thread waits for a low-priority one
• A thread trapping into the kernel does not block the others
– Simple, application-specific customization
• Challenge
– The necessary control and scheduling information is distributed between the kernel and the application address space
Approach
• Each application is provided with a virtual multiprocessor and controls which of its threads run on those processors
• The OS kernel controls the allocation of processors among address spaces
• The kernel notifies the address-space thread scheduler of relevant kernel events
– Scheduler activation
• Vectors control to the thread scheduler on a kernel event
• The thread system notifies the kernel of the user-level thread events that affect processor allocation decisions
– Thread scheduler
• Executes user-level threads
• Makes requests to the kernel
User-level Threads: Performance Advantages and Functionality Limitations
• Inherent costs in kernel thread management
– Accessing thread management operations
• Kernel trap, parameter copying and checking
– Cost of generality
• A single underlying implementation is used by all applications
• User-level threads improve both performance and flexibility
User-level Threads: Performance Advantages and Functionality Limitations (Cont.)
• Poor integration of user-level threads with the kernel interface
– Kernel threads are the wrong abstraction for supporting user-level thread systems
• Kernel threads block, resume, and are preempted without notification to the user level
• Kernel threads are scheduled obliviously to user-level thread state
• Causes problems for both uniprogrammed and multiprogrammed systems
– I/O
– Page faults
Effective Kernel Support for the User-level Management of Parallelism
• A new kernel interface + a user-level thread system
– Functionality of kernel threads
– Performance and flexibility of user-level threads
– Each user-level thread system is provided with its own virtual multiprocessor, an abstraction of a dedicated physical machine
• The kernel allocates processors to address spaces, with complete control over the allocation
• Each user-level thread system has complete control over which threads run on its allocated processors
• The kernel vectors events to the appropriate thread scheduler
– Change in the number of processors, I/O, page faults
• The user-level thread system notifies the kernel when needed
– Only for the subset of user-level operations that may affect processor allocation
• The application programmer does the same thing as when programming with kernel threads
– Programmers are provided with a normal Topaz thread interface
Explicit Vectoring of Kernel Events to the User-level Thread Scheduler
• Scheduler Activation
– Each vectored event causes the user-level thread system to reconsider its scheduling decision
– Three roles
• Serves as a vessel (execution context) for running user-level threads
• Notifies the user-level thread system of a kernel event
• Saves the processor context of the activation's current user-level thread when the thread is stopped by the kernel (I/O or processor preemption)
– Similar data structure to a traditional kernel thread
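The slide notes that an activation's data structure resembles a kernel thread's. A minimal sketch of what such a record might hold, under the assumption that it needs two stacks and room for saved thread state; all field and method names here are invented for illustration, not taken from the paper:

```python
# Hypothetical sketch of a scheduler activation record. Like a kernel thread,
# it carries a kernel stack, a user stack (where upcalls run), and space for
# the saved state of a stopped user-level thread. Names are illustrative.

from dataclasses import dataclass, field

@dataclass
class SchedulerActivation:
    activation_id: int
    kernel_stack: list = field(default_factory=list)   # for kernel-mode execution
    user_stack: list = field(default_factory=list)     # where the upcall runs
    saved_context: dict = field(default_factory=dict)  # registers of the stopped
                                                       # user-level thread

    def save_thread_context(self, registers):
        """Record the stopped thread's state for the upcall to report upward."""
        self.saved_context = dict(registers)

# When the kernel stops the activation's thread, its registers are captured:
sa = SchedulerActivation(activation_id=1)
sa.save_thread_context({"pc": 4096, "sp": 8192})
```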
Scheduler Activations (Cont.)
• Distinction between scheduler activations and kernel threads
– Once an activation's user-level thread is stopped by the kernel, the thread is never directly resumed by the kernel
– Maintains the invariant that there are always exactly as many running scheduler activations as processors assigned to the address space
• Events are vectored to where a scheduling decision needs to be made
Example: I/O Request/Completion
• T1: Two processors allocated by the kernel, two upcalls
• T2: Thread 1 blocks in the kernel; the kernel notifies the thread system with another upcall
• T3: The I/O completes; the kernel preempts one processor and does the upcall
• T4: The upcall takes a thread off the ready list and runs it
• The same mechanism reallocates a processor from one address space to another
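The T1-T4 sequence above can be walked through with a toy model. This is a simplified sketch, not the Topaz implementation; all class, method, thread, and activation names are invented for illustration:

```python
# Toy walkthrough of the I/O request/completion example (T1-T4).
# Assumption: each upcall arrives on a fresh activation, and the user-level
# thread system tracks which activation runs which thread.

class AddressSpace:
    """User-level thread system driven by kernel upcalls."""
    def __init__(self, ready):
        self.ready = list(ready)   # runnable user-level thread IDs
        self.running = {}          # activation ID -> thread ID (None if idle)

    # Upcall: a processor was added, carried by a fresh activation.
    def new_processor(self, activation):
        self.running[activation] = self.ready.pop(0) if self.ready else None

    # Upcall: the thread on `old` blocked in the kernel; the notification
    # arrives on a fresh activation running on the same processor.
    def thread_blocked(self, old, new):
        del self.running[old]
        self.new_processor(new)

    # Upcall: a blocked thread can resume. The kernel preempted one processor
    # to deliver this, so the preempted thread's context comes along too.
    def thread_unblocked(self, unblocked, preempted, new):
        self.ready = [unblocked, preempted] + self.ready
        self.new_processor(new)

# T1: two processors allocated by the kernel, two upcalls
space = AddressSpace(["t1", "t2", "t3"])
space.new_processor("a1")
space.new_processor("a2")
# T2: thread t1 (on a1) blocks in the kernel; upcall on fresh activation a3
space.thread_blocked("a1", "a3")
# T3/T4: I/O completes; the kernel preempts a2's processor and upcalls on a4,
# which puts both threads on the ready list and runs the first one
preempted = space.running.pop("a2")
space.thread_unblocked("t1", preempted, "a4")

# Invariant: as many running activations as processors (here, two)
assert len(space.running) == 2
assert space.running["a4"] == "t1"   # the upcall resumed the unblocked thread
```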
Scheduler Activations (Cont.)
• Reallocating a processor from one address space to another (multiprogramming)
– Stop the old activation and use the processor to do an upcall into the new address space with a new activation
– A second processor in the old address space is needed for an upcall there, notifying it that two user-level threads have been stopped
• Some minor points
– If threads have priorities, an additional preemption may be needed
– The application is free to build any other concurrency model on top of scheduler activations
– Sometimes a user-level thread blocked in the kernel needs to execute further in kernel mode when the I/O completes
Notifying the Kernel of User-level Events
• Only the small subset of user-level events that affect the kernel's processor allocation decisions needs to be signaled
– The transition to a state where the address space has more runnable threads than processors
– The transition to a state where the address space has more processors than runnable threads
• How to keep applications honest?
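The two transitions above can be sketched as a pair of downcalls from the thread system into the kernel, issued only when the balance of runnable threads and processors crosses the allocation boundary. Class and call names here are illustrative assumptions, not the Topaz/FastThreads API:

```python
# Minimal sketch of the two notification points: the user-level thread system
# tells the kernel only about state transitions that affect processor
# allocation. All names are invented for illustration.

class Kernel:
    def __init__(self):
        self.requests = []
    def add_more_processors(self, space):
        self.requests.append("add")
    def processor_is_idle(self, space):
        self.requests.append("idle")

class Space:
    def __init__(self, processors):
        self.processors = processors   # processors currently allocated
        self.runnable = 0              # runnable user-level threads

def thread_ready(space, kernel):
    """Called when a thread becomes runnable (e.g. fork or wakeup)."""
    space.runnable += 1
    if space.runnable > space.processors:    # more work than processors
        kernel.add_more_processors(space)

def thread_done(space, kernel):
    """Called when a thread finishes or blocks at user level."""
    space.runnable -= 1
    if space.runnable < space.processors:    # an allocated processor is idle
        kernel.processor_is_idle(space)

kernel = Kernel()
space = Space(processors=1)
thread_ready(space, kernel)   # 1 runnable on 1 processor: no notification
thread_ready(space, kernel)   # 2 runnable > 1 processor: request another
thread_done(space, kernel)    # back to 1 runnable: no notification
thread_done(space, kernel)    # 0 runnable < 1 processor: return it
```

Only the boundary crossings generate traffic to the kernel, which is what keeps the common case free of kernel intervention.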
Critical Sections
• Blocking or preempting a user-level thread inside a critical section causes
– Poor performance
– Deadlock
• Solutions
– Prevention
• Requires the kernel to yield control over processor allocation to the user level
– Recovery
• The thread system checks whether the stopped thread was executing in a critical section
– If so, the thread is continued temporarily via a user-level context switch
– Once it exits the critical section, another context switch relinquishes control back to the original upcall
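The recovery path can be sketched as follows. This is a toy model under the assumption of a simple per-thread flag (the paper's cheaper variant uses copied code sections instead); all names are invented for illustration:

```python
# Toy model of recovery: if an upcall finds that the stopped thread was inside
# a critical section, the thread system resumes it at user level until it
# exits, then reschedules it normally. Names are illustrative.

class Thread:
    def __init__(self, name):
        self.name = name
        self.in_critical_section = False   # flag set/cleared around locks
        self.extra_runs = 0                # resumptions done just to exit

    def run_to_section_exit(self):
        # Stand-in for a user-level context switch that runs the thread until
        # it releases its lock, after which control returns to the upcall.
        self.extra_runs += 1
        self.in_critical_section = False

def handle_stopped_thread(thread, ready_list):
    if thread.in_critical_section:
        thread.run_to_section_exit()   # continue temporarily
    ready_list.append(thread)          # then reschedule as usual

ready = []
t = Thread("t1")
t.in_critical_section = True           # preempted while holding a lock
handle_stopped_thread(t, ready)
assert t.extra_runs == 1 and not t.in_critical_section
assert ready == [t]
```

This avoids the deadlock case where the upcall itself needs the lock the stopped thread is holding.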
Implementation
• Modifying Topaz
– Changed the Topaz thread management routines to implement scheduler activations
– Explicit allocation of processors to address spaces
• Modifying FastThreads
– Processes upcalls and provides Topaz with information related to processor allocation decisions
• A few hundred lines of code added to FastThreads, about 1200 lines to Topaz
Implementation (Cont.)
• Processor Allocation Policy
– Processors are divided evenly among the highest-priority address spaces
– Any remaining processors are then divided evenly among the rest
– Processors are time-sliced only if the number available is not an integer multiple of the number of address spaces that want them
– An address space may use kernel threads instead of scheduler activations
• Binary compatibility with existing Topaz applications
• Thread Scheduling Policy
– The application can choose any scheduling policy
• Default: per-processor ready lists accessed in LIFO order to improve cache locality
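The space-sharing allocation policy above can be sketched as a small function: divide processors evenly among the highest-priority address spaces, then divide any leftover among the next tier, and so on. This is a simplified model; the real Topaz policy also respects how many processors each space actually wants, and time-slices any remainder:

```python
# Sketch of tiered even division of processors by address-space priority.
# Simplified assumption: every space wants as many processors as it can get.

def allocate(processors, spaces):
    """spaces: list of (name, priority); returns {name: whole processors}."""
    alloc = {name: 0 for name, _ in spaces}
    remaining = sorted(spaces, key=lambda s: s[1], reverse=True)
    left = processors
    while remaining and left > 0:
        top = remaining[0][1]
        tier = [name for name, pri in remaining if pri == top]
        share = left // len(tier)
        if share == 0:
            break          # fewer processors than spaces: time-slice these
        for name in tier:
            alloc[name] += share
            left -= share
        remaining = [s for s in remaining if s[1] != top]
    return alloc

# Two high-priority spaces split the machine; the low-priority space waits.
assert allocate(6, [("A", 2), ("B", 2), ("C", 1)]) == {"A": 3, "B": 3, "C": 0}
# Equal priorities: an even split; the one leftover processor is time-sliced.
assert allocate(7, [("A", 1), ("B", 1), ("C", 1)]) == {"A": 2, "B": 2, "C": 2}
```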
Implementation (Cont.)
• Performance Enhancements
– Critical sections: need to check whether a preempted user-level thread holds a lock
• Simple approach: the thread sets a flag when entering a critical section and clears it on exit
– Adds overhead and latency to every lock acquisition
• Better approach: make a copy of every low-level critical section via post-processing of compiler-generated code, and continue the preempted thread at the copy of the critical section
– No overhead on lock latency in the common case
– Management of scheduler activations
• Discarded scheduler activations are cached for later reuse
Performance
• Goal: combine the functionality of kernel threads with the performance and flexibility of user-level threads
• Evaluation questions
– What is the cost of user-level thread operations?
• Fork, block
– What is the cost of communication between the kernel and the user level?
– What is the overall effect on application performance?
Performance (Cont.)
• Thread Performance
– Cost of user-level thread operations is close to that of the original FastThreads package
• Preserves the order-of-magnitude advantage over kernel threads
• Upcall Performance
– Helps determine the "break-even" point at which the system outperforms kernel threads
– Two user-level threads signal and wait through the kernel
• 2.4 milliseconds, five times worse than Topaz threads
– Built as a quick modification to the existing Topaz thread system
– Written in Modula-2+, much slower than assembler
• A production scheduler activation implementation could be considerably faster
Application Performance
• Comparison of Topaz kernel threads, original FastThreads, and FastThreads on top of scheduler activations
– Application: an O(N log N) solution to the N-body problem
• Can be either compute-bound or I/O-bound
– The memory available to the application can be controlled
• All tests run on a six-processor CVAX Firefly
Application Performance (Cont.)
• Case 1: the application makes minimal use of kernel services
– Enough memory, negligible I/O, no other applications
• Runs as fast as original FastThreads
– With 1 processor, all systems perform worse than a sequential implementation
– With more processors, kernel thread overhead prevents good performance
– Slight divergence between original FastThreads and new FastThreads at 4 or 5 processors
Application Performance (Cont.)
• Case 2: kernel involvement required for I/O
– New FastThreads performs best
• As less and less memory becomes available, all three systems degrade quickly
– Original FastThreads is the worst
» When a user-level thread blocks, the underlying kernel thread also blocks
– New FastThreads and Topaz threads can overlap I/O with useful computation
Application Performance (Cont.)
• Case 3: multiprogrammed environment
– Two copies of the N-body application on the six processors
• Speedup of new FastThreads is within 5% of a uniprogrammed run with 3 processors
• Original FastThreads and Topaz perform much worse
– Original FastThreads: physical processors idle waiting for a lock to be released while the lock holder is descheduled
– Topaz: common thread operations are more expensive
• Limitation of the experiments
– The limited number of processors rules out large parallel applications and higher multiprogramming levels
Conclusion
• Scheduler activations: a kernel interface combined with a user-level thread package
– Achieves the performance of user-level threads (in the common case) with the functionality of kernel threads (correct behavior in the infrequent cases)
– Division of responsibility
• Kernel
– Processor allocation
– Kernel event notification
• Application address space
– Thread scheduling
– Notifying the kernel of the subset of user-level events affecting processor allocation decisions
– Any user-level concurrency model can be supported