TRANSCRIPT
Scheduler Activations: Effective Kernel Support for the User-level
Management of Parallelism
Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, and
Henry M. Levy
Presenter: Yi Qiao
Outline
• Introduction
• User-level threads: advantages and limitations
• Effective kernel support for user-level management of parallelism
• Implementation
• Performance
• Summary
Introduction
• Effectiveness of parallel computing
– Largely depends on the performance and cost of the primitives used to express and control parallelism within programs
• Shared memory between multiple processes
– Better suited to a uniprocessor environment
• Use of threads
– Separates the notion of a sequential execution stream from other aspects such as address spaces and I/O descriptors
– A significant performance advantage over traditional processes
Problem with Threads
• User-level threads
– Execute within the context of traditional processes
• Thread management requires no kernel intervention
• Flexible, easily customized without kernel modification
– Each process acts as a virtual processor
• Multiprogramming, I/O, and page faults can lead to poor performance or incorrect behavior of user-level threads
• Kernel-level threads
– Avoid the system integration problem
• Directly mapped onto physical processors
– Too heavyweight
• An order of magnitude worse than the performance of the best user-level threads
Goal of the Work
• A kernel interface and a user-level thread package that combine the functionality of kernel threads with the performance and flexibility of user-level threads
– When no kernel intervention is needed, same performance as the best user-level threads
– When the kernel must be involved, mimics a kernel thread management system
• No processor idles while there are runnable threads
• No high-priority thread waits for a low-priority one
• A thread trapping into the kernel does not block the others
– Simple, application-specific customization
• Challenge
– The necessary control and scheduling information is distributed between the kernel and the application address space
Approach
• Each application is provided with a virtual multiprocessor and controls which of its threads run on those processors
• The OS kernel controls the allocation of processors among address spaces
• The kernel notifies the address-space thread scheduler of relevant kernel events
– Scheduler activation
• Vectors control to the thread scheduler on a kernel event
• The thread system notifies the kernel of the user-level thread events that affect processor allocation decisions
– Thread scheduler
• Executes user-level threads
• Makes requests to the kernel
User-level Threads: Performance Advantages and Functionality Limitations
• Inherent costs in kernel thread management
– Accessing thread management operations
• Kernel trap, parameter copying and checking
– Cost of generality
• A single underlying implementation is used by all applications
• User-level threads improve both performance and flexibility
User-level Threads: Performance Advantages and Functionality Limitations (Cont.)
• Poor integration of user-level threads with the kernel interface
– Kernel threads are the wrong abstraction for supporting user-level thread systems
• Kernel threads block, resume, and are preempted without notification to the user level
• Kernel threads are scheduled obliviously to user-level thread state
• Causes problems for both uniprogrammed and multiprogrammed systems
– I/O
– Page faults
Effective Kernel Support for the User-level Management of Parallelism
• A new kernel interface + a user-level thread system
– Functionality of kernel threads
– Performance and flexibility of user-level threads
– Each user-level thread system is provided with its own virtual multiprocessor, an abstraction of a dedicated physical machine
• The kernel allocates processors to address spaces, with complete control over the allocation
• Each user-level thread system has complete control over which threads run on its allocated processors
• The kernel vectors events to the appropriate thread scheduler
– Change in the number of processors, I/O, page faults
• The user-level thread system notifies the kernel when needed
– Only for the subset of user-level operations that may affect processor allocation
• The application programmer does the same thing as when programming with kernel threads
– Programmers are provided with a normal Topaz thread interface
Explicit Vectoring of Kernel Events to the User-level Thread Scheduler
• Scheduler Activation
– Each vectored event causes the user-level thread system to reconsider its scheduling decision
– Three roles
• Serves as a vessel (execution context) for running user-level threads
• Notifies the user-level thread system of a kernel event
• Saves the processor context of the activation's current user-level thread when the thread is stopped by the kernel (I/O or processor preemption)
– Similar data structure to a traditional kernel thread
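The slide notes that an activation's data structure resembles a kernel thread's. A minimal sketch of what such a record might hold, under the assumption that it needs two stacks and room for saved thread state; all field and method names here are invented for illustration, not taken from the paper:

```python
# Hypothetical sketch of a scheduler activation record. Like a kernel thread,
# it carries a kernel stack, a user stack (where upcalls run), and space for
# the saved state of a stopped user-level thread. Names are illustrative.

from dataclasses import dataclass, field

@dataclass
class SchedulerActivation:
    activation_id: int
    kernel_stack: list = field(default_factory=list)   # for kernel-mode execution
    user_stack: list = field(default_factory=list)     # where the upcall runs
    saved_context: dict = field(default_factory=dict)  # registers of the stopped
                                                       # user-level thread

    def save_thread_context(self, registers):
        """Record the stopped thread's state for the upcall to report upward."""
        self.saved_context = dict(registers)

# When the kernel stops the activation's thread, its registers are captured:
sa = SchedulerActivation(activation_id=1)
sa.save_thread_context({"pc": 4096, "sp": 8192})
```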
Scheduler Activations (Cont.)
• Distinction between scheduler activations and kernel threads
– Once an activation's user-level thread is stopped by the kernel, the thread is never directly resumed by the kernel
– Maintains the invariant that there are always exactly as many running scheduler activations as processors assigned to the address space
• Events are vectored to where a scheduling decision needs to be made
Example: I/O Request/Completion
• T1: Two processors allocated by the kernel, two upcalls
• T2: Thread 1 blocks in the kernel; the kernel notifies the thread system with another upcall
• T3: The I/O completes; the kernel preempts one processor and does the upcall
• T4: The upcall takes a thread off the ready list and runs it
• The same mechanism reallocates a processor from one address space to another
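The T1-T4 sequence above can be walked through with a toy model. This is a simplified sketch, not the Topaz implementation; all class, method, thread, and activation names are invented for illustration:

```python
# Toy walkthrough of the I/O request/completion example (T1-T4).
# Assumption: each upcall arrives on a fresh activation, and the user-level
# thread system tracks which activation runs which thread.

class AddressSpace:
    """User-level thread system driven by kernel upcalls."""
    def __init__(self, ready):
        self.ready = list(ready)   # runnable user-level thread IDs
        self.running = {}          # activation ID -> thread ID (None if idle)

    # Upcall: a processor was added, carried by a fresh activation.
    def new_processor(self, activation):
        self.running[activation] = self.ready.pop(0) if self.ready else None

    # Upcall: the thread on `old` blocked in the kernel; the notification
    # arrives on a fresh activation running on the same processor.
    def thread_blocked(self, old, new):
        del self.running[old]
        self.new_processor(new)

    # Upcall: a blocked thread can resume. The kernel preempted one processor
    # to deliver this, so the preempted thread's context comes along too.
    def thread_unblocked(self, unblocked, preempted, new):
        self.ready = [unblocked, preempted] + self.ready
        self.new_processor(new)

# T1: two processors allocated by the kernel, two upcalls
space = AddressSpace(["t1", "t2", "t3"])
space.new_processor("a1")
space.new_processor("a2")
# T2: thread t1 (on a1) blocks in the kernel; upcall on fresh activation a3
space.thread_blocked("a1", "a3")
# T3/T4: I/O completes; the kernel preempts a2's processor and upcalls on a4,
# which puts both threads on the ready list and runs the first one
preempted = space.running.pop("a2")
space.thread_unblocked("t1", preempted, "a4")

# Invariant: as many running activations as processors (here, two)
assert len(space.running) == 2
assert space.running["a4"] == "t1"   # the upcall resumed the unblocked thread
```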
Scheduler Activations (Cont.)
• Reallocating a processor from one address space to another (multiprogramming)
– Stop the old activation and use the processor to do an upcall into the new address space with a new activation
– A second processor in the old address space is needed for an upcall there, notifying it that two user-level threads have been stopped
• Some minor points
– If threads have priorities, an additional preemption may be needed
– The application is free to build any other concurrency model on top of scheduler activations
– Sometimes a user-level thread blocked in the kernel needs to execute further in kernel mode when the I/O completes
Notifying the Kernel of User-level Events
• Only the small subset of user-level events that affect the kernel's processor allocation decisions needs to be signaled
– The transition to a state where the address space has more runnable threads than processors
– The transition to a state where the address space has more processors than runnable threads
• How to keep applications honest?
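The two transitions above can be sketched as a pair of downcalls from the thread system into the kernel, issued only when the balance of runnable threads and processors crosses the allocation boundary. Class and call names here are illustrative assumptions, not the Topaz/FastThreads API:

```python
# Minimal sketch of the two notification points: the user-level thread system
# tells the kernel only about state transitions that affect processor
# allocation. All names are invented for illustration.

class Kernel:
    def __init__(self):
        self.requests = []
    def add_more_processors(self, space):
        self.requests.append("add")
    def processor_is_idle(self, space):
        self.requests.append("idle")

class Space:
    def __init__(self, processors):
        self.processors = processors   # processors currently allocated
        self.runnable = 0              # runnable user-level threads

def thread_ready(space, kernel):
    """Called when a thread becomes runnable (e.g. fork or wakeup)."""
    space.runnable += 1
    if space.runnable > space.processors:    # more work than processors
        kernel.add_more_processors(space)

def thread_done(space, kernel):
    """Called when a thread finishes or blocks at user level."""
    space.runnable -= 1
    if space.runnable < space.processors:    # an allocated processor is idle
        kernel.processor_is_idle(space)

kernel = Kernel()
space = Space(processors=1)
thread_ready(space, kernel)   # 1 runnable on 1 processor: no notification
thread_ready(space, kernel)   # 2 runnable > 1 processor: request another
thread_done(space, kernel)    # back to 1 runnable: no notification
thread_done(space, kernel)    # 0 runnable < 1 processor: return it
```

Only the boundary crossings generate traffic to the kernel, which is what keeps the common case free of kernel intervention.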
Critical Sections
• Blocking or preempting a user-level thread inside a critical section causes
– Poor performance
– Deadlock
• Solutions
– Prevention
• Requires the kernel to yield control over processor allocation to the user level
– Recovery
• The thread system checks whether the stopped thread was executing in a critical section
– If so, the thread is continued temporarily via a user-level context switch
– Once it exits the critical section, another context switch relinquishes control back to the original upcall
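The recovery path can be sketched as follows. This is a toy model under the assumption of a simple per-thread flag (the paper's cheaper variant uses copied code sections instead); all names are invented for illustration:

```python
# Toy model of recovery: if an upcall finds that the stopped thread was inside
# a critical section, the thread system resumes it at user level until it
# exits, then reschedules it normally. Names are illustrative.

class Thread:
    def __init__(self, name):
        self.name = name
        self.in_critical_section = False   # flag set/cleared around locks
        self.extra_runs = 0                # resumptions done just to exit

    def run_to_section_exit(self):
        # Stand-in for a user-level context switch that runs the thread until
        # it releases its lock, after which control returns to the upcall.
        self.extra_runs += 1
        self.in_critical_section = False

def handle_stopped_thread(thread, ready_list):
    if thread.in_critical_section:
        thread.run_to_section_exit()   # continue temporarily
    ready_list.append(thread)          # then reschedule as usual

ready = []
t = Thread("t1")
t.in_critical_section = True           # preempted while holding a lock
handle_stopped_thread(t, ready)
assert t.extra_runs == 1 and not t.in_critical_section
assert ready == [t]
```

This avoids the deadlock case where the upcall itself needs the lock the stopped thread is holding.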
Implementation
• Modifying Topaz
– Changed the Topaz thread management routines to implement scheduler activations
– Explicit allocation of processors to address spaces
• Modifying FastThreads
– Processes upcalls and provides Topaz with information related to processor allocation decisions
• A few hundred lines of code added to FastThreads, about 1200 lines to Topaz
Implementation (Cont.)
• Processor Allocation Policy
– Processors are divided evenly among the highest-priority address spaces
– Any remaining processors are then divided evenly among the rest
– Processors are time-sliced only if the number available is not an integer multiple of the number of address spaces that want them
– An address space may use kernel threads instead of scheduler activations
• Binary compatibility with existing Topaz applications
• Thread Scheduling Policy
– The application can choose any scheduling policy
• Default: per-processor ready lists accessed in LIFO order to improve cache locality
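The space-sharing allocation policy above can be sketched as a small function: divide processors evenly among the highest-priority address spaces, then divide any leftover among the next tier, and so on. This is a simplified model; the real Topaz policy also respects how many processors each space actually wants, and time-slices any remainder:

```python
# Sketch of tiered even division of processors by address-space priority.
# Simplified assumption: every space wants as many processors as it can get.

def allocate(processors, spaces):
    """spaces: list of (name, priority); returns {name: whole processors}."""
    alloc = {name: 0 for name, _ in spaces}
    remaining = sorted(spaces, key=lambda s: s[1], reverse=True)
    left = processors
    while remaining and left > 0:
        top = remaining[0][1]
        tier = [name for name, pri in remaining if pri == top]
        share = left // len(tier)
        if share == 0:
            break          # fewer processors than spaces: time-slice these
        for name in tier:
            alloc[name] += share
            left -= share
        remaining = [s for s in remaining if s[1] != top]
    return alloc

# Two high-priority spaces split the machine; the low-priority space waits.
assert allocate(6, [("A", 2), ("B", 2), ("C", 1)]) == {"A": 3, "B": 3, "C": 0}
# Equal priorities: an even split; the one leftover processor is time-sliced.
assert allocate(7, [("A", 1), ("B", 1), ("C", 1)]) == {"A": 2, "B": 2, "C": 2}
```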
Implementation (Cont.)
• Performance Enhancements
– Critical sections: need to check whether a preempted user-level thread holds a lock
• Simple approach: the thread sets a flag when entering a critical section and clears it on exit
– Adds overhead and latency to every lock acquisition
• Better approach: make a copy of every low-level critical section via post-processing of compiler-generated code, and continue the preempted thread at the copy of the critical section
– No overhead on lock latency in the common case
– Management of scheduler activations
• Discarded scheduler activations are cached for later reuse
Performance
• Goal: combine the functionality of kernel threads with the performance and flexibility of user-level threads
• Evaluation questions
– What is the cost of user-level thread operations?
• Fork, block
– What is the cost of communication between the kernel and the user level?
– What is the overall effect on application performance?
Performance (Cont.)
• Thread Performance
– Cost of user-level thread operations is close to that of the original FastThreads package
• Preserves the order-of-magnitude advantage over kernel threads
• Upcall Performance
– Helps determine the "break-even" point at which the system outperforms kernel threads
– Two user-level threads signal and wait through the kernel
• 2.4 milliseconds, five times worse than Topaz threads
– Built as a quick modification to the existing Topaz thread system
– Written in Modula-2+, much slower than assembler
• A production scheduler activation implementation could be considerably faster
Application Performance
• Comparison of Topaz kernel threads, original FastThreads, and FastThreads on top of scheduler activations
– Application: an O(N log N) solution to the N-body problem
• Can be either compute-bound or I/O-bound
– The memory available to the application can be controlled
• All tests run on a six-processor CVAX Firefly
Application Performance (Cont.)
• Case 1: the application makes minimal use of kernel services
– Enough memory, negligible I/O, no other applications
• Runs as fast as original FastThreads
– With 1 processor, all systems perform worse than a sequential implementation
– With more processors, kernel thread overhead prevents good performance
– Slight divergence between original FastThreads and new FastThreads at 4 or 5 processors
Application Performance (Cont.)
• Case 2: kernel involvement required for I/O
– New FastThreads performs best
• As less and less memory becomes available, all three systems degrade quickly
– Original FastThreads is the worst
» When a user-level thread blocks, the underlying kernel thread also blocks
– New FastThreads and Topaz threads can overlap I/O with useful computation
Application Performance (Cont.)
• Case 3: multiprogrammed environment
– Two copies of the N-body application on the six processors
• Speedup of new FastThreads is within 5% of a uniprogrammed run with 3 processors
• Original FastThreads and Topaz perform much worse
– Original FastThreads: physical processors idle waiting for a lock to be released while the lock holder is descheduled
– Topaz: common thread operations are more expensive
• Limitation of the experiments
– The limited number of processors rules out large parallel applications and higher multiprogramming levels
Conclusion
• Scheduler activations: a kernel interface combined with a user-level thread package
– Achieves the performance of user-level threads (in the common case) with the functionality of kernel threads (correct behavior in the infrequent cases)
– Division of responsibility
• Kernel
– Processor allocation
– Kernel event notification
• Application address space
– Thread scheduling
– Notifying the kernel of the subset of user-level events affecting processor allocation decisions
– Any user-level concurrency model can be supported