User-Level Interprocess Communication for Shared Memory Multiprocessors, by Bershad, B.N., Anderson, T.E., Lazowska, E.D., and Levy, H.M.


TRANSCRIPT

Page 1: User-Level Interprocess Communication for Shared Memory Multiprocessors

by Bershad, B.N., Anderson, T.E., Lazowska, E.D., and Levy, H.M.

Page 2: Introduction

RPC helps in implementing distributed applications by eliminating the need to implement a communication mechanism by hand.

A decomposed system provides the advantages of failure isolation, extensibility, and modularity, so RPC is used even when the call stays on the same machine.

Page 3: Introduction

RPC costs:
- Stub overhead
- Message buffer overhead (4 copies)
- Access validation
- Message transfer
- Scheduling
- Context switch
- Dispatch

Page 4: Introduction

LRPC costs:
- Stub overhead
- Message buffer overhead (1 copy)
- Only the necessary access validation
- Message transfer
- Only the necessary scheduling
- Context switch, minimized by domain caching

Page 5: Introduction

IPC main components (all done in the kernel):
- Processor reallocation (process context switch)
- Data transfer
- Thread management

Problems:
- Processor reallocation is expensive
- Parallel applications need user-level thread management

Page 6: URPC

URPC: User-Level Remote Procedure Call for shared memory multiprocessors.
- Processor reallocation: minimized
- Data transfer: at user level (in a package called URPC)
- Thread management: at user level (in a package called FastThreads)

Page 7: User-level components

Page 8: Processor Reallocation

Limit the frequency of processor reallocation. Why:
- A process context switch is more expensive than a thread context switch
- The kernel must be invoked on every call

Kernel-mediated call sequence:
- Client makes a procedure call into the server address space
- Kernel is invoked
- Kernel reallocates a processor to the server address space
- Server finishes the job
- Kernel is invoked again
- Kernel reallocates the processor back to the client address space
- Client resumes its work

Page 9: Processor Reallocation

Limit the frequency of processor reallocation. How:
- Optimistic reallocation policy: assume the client has other work to do, and that the server has, or will soon have, a processor available to do the job
- A uniprocessor can delay processor reallocation

Optimistic call sequence (a minimal code sketch follows below):
- Client makes a procedure call into the server address space
- Client does something else
- Server finishes the job
- Client resumes its work
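The optimistic path can be illustrated with a minimal C sketch. The single-slot channel, its field names, and the use of two pthreads to stand in for processors running in the client and server address spaces are assumptions made for illustration, not the paper's actual URPC interface.

/* Minimal sketch of the optimistic call path. All names here are
 * illustrative; in URPC the channel would live in memory shared pairwise
 * between the client and server address spaces. */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

typedef struct {
    _Atomic int state;   /* 0 = empty, 1 = request posted, 2 = reply posted */
    int arg, result;     /* message body, written directly into shared memory */
} urpc_channel;

static urpc_channel chan;   /* stands in for a pairwise shared segment */

/* Client side: post the request, then do other work until the reply appears.
 * No kernel call is needed on this path. */
static void *client(void *unused) {
    (void)unused;
    chan.arg = 21;
    atomic_store(&chan.state, 1);             /* post the request */
    while (atomic_load(&chan.state) != 2) {
        /* "Client does something else": here we just yield; a real
         * user-level scheduler would run another ready thread. */
        sched_yield();
    }
    printf("client got reply: %d\n", chan.result);
    return NULL;
}

/* Server side: a server processor notices the pending request and runs it. */
static void *server(void *unused) {
    (void)unused;
    while (atomic_load(&chan.state) != 1)     /* wait for a request */
        sched_yield();
    chan.result = chan.arg * 2;               /* "server finishes the job" */
    atomic_store(&chan.state, 2);             /* post the reply */
    return NULL;
}

int main(void) {
    pthread_t c, s;
    pthread_create(&s, NULL, server, NULL);
    pthread_create(&c, NULL, client, NULL);
    pthread_join(c, NULL);
    pthread_join(s, NULL);
    return 0;
}

Compiled with -pthread, the program prints the reply computed by the server thread; in URPC itself the two sides would be separate address spaces sharing only the channel memory.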

Page 10: Processor Reallocation

Problems:
- Inappropriate situations: single-threaded clients, real-time applications, and high-latency I/O applications.
  Solution: allow the client to force processor reallocation.
- Underpowered address spaces: no processor is available to handle the pending request from the client.
  Solution: donation, where an idle processor donates itself to the underpowered address space (see the sketch below).
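The donation idea can be sketched as the idle loop of the user-level scheduler. The paper's kernel primitive for this is Processor.Donate; the C signature used below and the helpers has_runnable_thread(), has_pending_messages(), and find_underpowered_space() are assumptions made for illustration, not a real interface.

/* Sketch of the idle-time donation decision. processor_donate() stands in
 * for the kernel primitive; everything else is stubbed out. */
#include <stdbool.h>
#include <stdio.h>

typedef int address_space_t;                 /* opaque handle, illustrative */

static bool has_runnable_thread(void)  { return false; }   /* stub */
static bool has_pending_messages(void) { return false; }   /* stub */
static address_space_t find_underpowered_space(void) { return 42; } /* stub */

/* Stand-in for the kernel call that hands this processor to another space. */
static void processor_donate(address_space_t target) {
    printf("donating this processor to address space %d\n", target);
}

/* Idle loop of the user-level scheduler: only when there is nothing to run
 * locally and nothing pending does it give the processor away, and that
 * donation is the only point where the kernel is involved. */
void scheduler_idle(void) {
    while (!has_runnable_thread() && !has_pending_messages()) {
        address_space_t target = find_underpowered_space();
        if (target >= 0) {
            processor_donate(target);
            return;
        }
    }
}

int main(void) {
    scheduler_idle();
    return 0;
}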

Page 11: Processor Reallocation

Problems:
- Voluntary return of the processor: a processor working in the server may never return to the client because it is too busy working on the requests of other clients.
  Solution: enforce processor reallocation when necessary, for example when a high-priority job is waiting while a low-priority job runs or a processor idles.

Page 12: Processor Reallocation

LRPC vs. URPC:
- LRPC's domain caching looks for an idle processor in the server context
- URPC's optimistic reallocation assumes there will be an available processor in the server context and queues the request to be handled later
- URPC needs two levels of scheduling decisions, looking both for an idle processor and for an underpowered address space, while LRPC does not

Page 13: Data Transfer

- Uses pairwise shared memory to avoid the need for copying in the kernel (a sketch of such a channel follows below)
- Both approaches give the same level of security, since data must be passed through the stubs before it can be used
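A pairwise shared-memory channel of the kind the slide describes can be sketched as a small bounded queue. The layout and the names (shm_queue, chan_put, chan_peek) are assumptions for illustration; the point is that the client stub's copy into the shared slot is the only data copy, and the server stub can read the arguments in place.

/* Sketch of a pairwise shared-memory message channel. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SLOTS 8u                 /* power of two, so wrap-around stays correct */

typedef struct { int proc_id; char args[64]; } msg_slot;

typedef struct {
    msg_slot slot[SLOTS];
    _Atomic unsigned head;       /* advanced by the sending address space   */
    _Atomic unsigned tail;       /* advanced by the receiving address space */
} shm_queue;                     /* would live in pairwise shared memory */

/* Client-stub side: the memcpy below is the single data copy. */
static bool chan_put(shm_queue *q, int proc_id, const void *args, size_t len) {
    unsigned h = atomic_load(&q->head);
    if (h - atomic_load(&q->tail) == SLOTS)
        return false;                              /* queue is full */
    msg_slot *s = &q->slot[h % SLOTS];
    s->proc_id = proc_id;
    memcpy(s->args, args, len);                    /* single copy */
    atomic_store(&q->head, h + 1);                 /* publish the message */
    return true;
}

/* Server-stub side: look at the next message in place; chan_release frees
 * the slot once the server procedure has consumed the arguments. */
static msg_slot *chan_peek(shm_queue *q) {
    unsigned t = atomic_load(&q->tail);
    if (t == atomic_load(&q->head))
        return NULL;                               /* queue is empty */
    return &q->slot[t % SLOTS];
}

static void chan_release(shm_queue *q) {
    atomic_fetch_add(&q->tail, 1);
}

int main(void) {
    static shm_queue q;          /* zero-initialized, both indexes at 0 */
    int x = 7;
    chan_put(&q, 3, &x, sizeof x);
    msg_slot *m = chan_peek(&q);
    if (m) {
        int first;
        memcpy(&first, m->args, sizeof first);
        printf("procedure %d called with argument %d\n", m->proc_id, first);
        chan_release(&q);
    }
    return 0;
}

Because each queue is used by exactly one sending and one receiving address space, simple head and tail counters are enough; no kernel lock sits on the data path.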

Page 14: Thread Management

Arguments:
- Fine-grained parallel applications need high-performance thread management, which can only be achieved by implementing it at user level
- Communication and thread management achieve very good performance when both are implemented at user level

Page 15: Thread Management

- Kernel features such as time slicing degrade the performance of applications
- Invoking a thread management operation implemented in the kernel requires a kernel trap
- A thread management policy implemented in the kernel is unlikely to be efficient for all parallel applications

Page 16: Thread Management

Threads block in order to:
- Synchronize their activities within the same address space
- Wait for external events from a different address space

Communication implemented at kernel level results in synchronization at both user level and kernel level (a user-level blocking sketch follows below).
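The claim that blocking need not involve the kernel can be illustrated with user-level context switching. The two-slot round-robin scheduler and the names below are only a sketch built on POSIX ucontext, not FastThreads' actual interface. (The portable ucontext calls may still touch the kernel for signal masks; a real package such as FastThreads switches contexts with a few user-mode instructions.)

/* Sketch of user-level blocking and resumption: a thread that must wait,
 * e.g. for a URPC reply, is switched out by the thread package itself and
 * another ready thread runs, without a kernel trap for the block/unblock. */
#include <stdio.h>
#include <ucontext.h>

static ucontext_t sched_ctx, worker_ctx[2];
static char stacks[2][64 * 1024];
static int current;                      /* index of the running thread */

/* "Block": save this thread's context and return to the scheduler. */
static void thread_yield(void) {
    swapcontext(&worker_ctx[current], &sched_ctx);
}

static void worker(int id) {
    for (int step = 0; step < 3; step++) {
        printf("thread %d, step %d\n", id, step);
        thread_yield();                  /* e.g. waiting for a URPC reply */
    }
}

int main(void) {
    for (int i = 0; i < 2; i++) {
        getcontext(&worker_ctx[i]);
        worker_ctx[i].uc_stack.ss_sp = stacks[i];
        worker_ctx[i].uc_stack.ss_size = sizeof stacks[i];
        worker_ctx[i].uc_link = &sched_ctx;          /* resume here on exit */
        makecontext(&worker_ctx[i], (void (*)(void))worker, 1, i);
    }
    /* Trivial round-robin scheduler: alternate between the two threads. */
    for (int sw = 0; sw < 6; sw++) {
        current = sw % 2;
        swapcontext(&sched_ctx, &worker_ctx[current]);
    }
    return 0;
}

Running it shows the two threads interleaving; the switch inside thread_yield is what a user-level package does when a thread blocks on a reply or a lock, instead of trapping into the kernel.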

Page 17: URPC

Page 18: Performance

- Thread management is faster at user level
- Component breakdown

Page 19: Performance

Call latency and throughput are at their worst when S = 0.

Page 20: Conclusion

- Move as much functionality as possible from the kernel to user level to improve performance
- To achieve good performance on multiprocessors, the system needs to be designed specifically to support them